Sunday, May 17, 2009

Disable CFC Type Check Proof

I usually question a lot of the things I hear, and on this one I just had to set the record straight. I can stand being corrected when I'm proven wrong, so I'm hoping this will come across as simply a clarification and not any kind of 'I told you so'.

In the CF8 CFAdmin 'Settings' page there is a 'Disable CFC Type Check' option that lets you turn ON the ability to ignore the argument 'types' of CFCs. For example, if I have a component of type 'com.adobe.reactor' and I pass it to a method expecting a 'com.adobe.transfer', then with this option ON there would be NO type checking and the argument would be passed in successfully (I'd assume something else would fail after that, however).

At cfObjective this year it was stated at a session that if this option was turned ON, an argument typed as 'numeric' would be able to accept a 'string' value, as there would be no type checking. This is simply not true.

Proof:

Visit your cfadmin and turn ON the 'Disable CFC Type Check' option and restart your server.
Run the following script:


<cffunction name="test" output="false" access="public" returntype="any" hint="">
<cfargument name="num" type="com.adobe.transfer" required="true">
<!--- Return the argument as-is so the dump shows what was accepted --->
<cfreturn arguments.num>
</cffunction>

<cfset x = test('hi')>
<cfdump var="#x#">


You'll notice that this works just fine, even though I'm passing the string 'hi' into a test method that is expecting a 'com.adobe.transfer' object. With Disable CFC Type Check ON, the argument type is simply treated as 'ANY'.

The CFAdmin states:
"When checked, UDF arguments of CFC type is not validated. The arguments are treated as type "ANY". Use this setting in a production environment only."


In this case 'com.adobe.transfer' is a CFC type, i.e. a user-defined type rather than a native one - I defined it in a directory as com/adobe/transfer.cfc.

However, when I change my test method to accept a 'numeric' value instead of this CFC type, you will see it throw an exception when called with the same string:


<cffunction name="test" output="false" access="public" returntype="any" hint="">
<cfargument name="num" type="numeric" required="true">
<cfreturn arguments.num>
</cffunction>


"The NUM argument passed to the test function is not of type numeric."

This proves that ONLY CFC-typed arguments are changed to 'ANY' types when 'Disable CFC Type Check' is turned ON in cfadmin (oh no I've gone cross eyed).
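If you want to see both results on one page without the exception killing the request, a minimal sketch along these lines should do it. It assumes 'Disable CFC Type Check' is ON and that com/adobe/transfer.cfc exists; the function names acceptsCfc and acceptsNumeric are placeholders of mine, not anything built in.

```cfm
<!--- Hypothetical helper: CFC-typed argument, treated as ANY when the option is ON --->
<cffunction name="acceptsCfc" output="false" access="public" returntype="any">
    <cfargument name="obj" type="com.adobe.transfer" required="true">
    <cfreturn arguments.obj>
</cffunction>

<!--- Hypothetical helper: native numeric argument, validated regardless of the option --->
<cffunction name="acceptsNumeric" output="false" access="public" returntype="any">
    <cfargument name="num" type="numeric" required="true">
    <cfreturn arguments.num>
</cffunction>

<!--- Pass the same string to both and report what happens --->
<cftry>
    <cfset a = acceptsCfc('hi')>
    <cfoutput>CFC-typed argument accepted: #a#<br></cfoutput>
    <cfcatch type="any">
        <cfoutput>CFC-typed argument rejected: #cfcatch.message#<br></cfoutput>
    </cfcatch>
</cftry>

<cftry>
    <cfset b = acceptsNumeric('hi')>
    <cfoutput>Numeric argument accepted: #b#<br></cfoutput>
    <cfcatch type="any">
        <cfoutput>Numeric argument rejected: #cfcatch.message#<br></cfoutput>
    </cfcatch>
</cftry>
```

With the option ON, the first call should go through and the second should land in its cfcatch.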

Imagine the security holes such an option would create if the native datatypes really were all treated as 'ANY'. SQL Injection could occur even if you had typed the cfargument as a 'numeric' datatype: a malicious user could simply pass in a string value and bypass your initial line of defense - that being your cfargument data type. This would not be a good situation (for more information on security and CFML, see Jason Dean's blog at 12Robots.com).

The fact is, regardless of whether 'Disable CFC Type Check' is ON or OFF in your CFAdmin Settings, native datatypes still need to be adhered to, or an exception will be thrown.

Monday, May 4, 2009

A ColdFusion Developer's Keyboard

Can you tell what kind of developer you are by your keyboard?

Check out my keyboard - notice the 'C' key is all but worn right off. This is from about 2 years of programming on this external keyboard - my previous laptop was the same way!

Just for fun, shoot me a picture of your keyboard - whether you're a ColdFusion programmer, .Net programmer or DBA (would your 'S' key be worn down from all the SELECTs?).

Enjoy

Friday, May 1, 2009

Clustered Application Needs Re-Loading? Here's How

A question came up at last year's cfObjective in Mike Brunt's presentation about reloading your application/model/ColdSpring config in mid-flight on a clustered application (your application has 2 or more CF instances running it).
We've spent some time investigating on our own, but mostly we've just ignored the issue and gone through the tedious process of RDP-ing into the server, MMC-ing into each server's Services panel, restarting each instance and verifying it comes back online - staggering the restarts over a period of time until all cluster members were synced. This sucks.

Some of the proposed solutions included:
1.) Using ActiveMQ and a CF Gateway to broadcast a 'reload' message and have the cluster members stagger their execution of it, so they don't all reload at once (taking the app offline), via a random thread sleep based on the JVM instance etc.
2.) Appreloading via the url X times until you think you've HIT them all
3.) Use JRun's internal Web Server

I figured I'd give number 3 a try as I knew it would be the quickest and easiest to implement if it worked. Turns out it does.

Here's how it works: each CF instance has its own dedicated IP and port running the application, which can be accessed via a browser or over HTTP (which can be automated, but we'll get into that later).

The IP:Port is normally used to get to a particular cluster member's cfide to do some configuration, e.g. 127.0.0.1:8303/cfide/administrator/index.cfm. However, if you browse to just 127.0.0.1:8303, you'll notice a nice directory listing that comes from your instance's installation directory, e.g. C:\JRun4\servers\{instancename}\cfusion.ear\cfusion.war if on JRun.

So what we'll do is modify your jrun-web.xml file by adding a virtual mapping to your application's root path AND, if you use a separate jvm.config for your instance, make sure that the default jvm.config is the same (more on that later).

Navigate to C:\JRun4\servers\{instancename}\cfusion.ear\cfusion.war\WEB-INF and open jrun-web.xml (if you don't have one you can try borrowing one from the default cfusion instance).

Add the following to the file and save:
<virtual-mapping>
<resource-path>/</resource-path>
<system-path>C:/webroot/approot</system-path>
</virtual-mapping>


This will make the root of JRun's internal web server your application's root directory.

Restart your CF instance.

If you are like most shops and use some kind of query parameter to execute your application reload procedure i.e. ?reloadsite=true or the like, then you can execute it via the ip:port?param and reload each cluster member by changing your ip:port.
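Once that works by hand in a browser, the same requests can be scripted. Here's a minimal sketch of what that might look like, run from a template outside the clustered application. The ip:port list, the index.cfm entry page, the 'reloadsite=true' parameter and the 30 second pause are all assumptions for illustration, so swap in whatever your own setup uses.

```cfm
<!--- Allow enough time for every member to reload (assumed value) --->
<cfsetting requesttimeout="600">

<!--- Hypothetical ip:port list; one entry per cluster member --->
<cfset members = "127.0.0.1:8301,127.0.0.1:8302,127.0.0.1:8303">
<!--- Assumed reload parameter and stagger interval --->
<cfset reloadParam = "reloadsite=true">
<cfset pauseMs = 30000>

<cfloop list="#members#" index="member">
    <!--- Hit this member directly through JRun's internal web server --->
    <cfhttp url="http://#member#/index.cfm?#reloadParam#" method="get" timeout="120" result="reloadResponse" />
    <cfoutput>#member# returned #reloadResponse.statusCode#<br></cfoutput>
    <!--- Wait before the next member so they are not all reloading at once --->
    <cfset sleep(pauseMs)>
</cfloop>
```

Going one member at a time with a pause in between keeps at least some of the cluster serving requests while the others reload.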

Advanced Note: if you're like us and have a separate JVM config for your CF instances (different from the stock jvm.config found in c:\JRun4\bin), then you'll need to either
a.) make sure the default jvm.config is the same as your CF instance's JVM config file, or
b.) make the default jvm.config the same as the CF instance's JVM config file, and then reconfigure that instance to use the 'jvm.config' file.
This is key, as the internal JRun web server will use the default jvm.config on startup. If you have separated these JVMs out, reloading your app from ip:port will do nothing, as you have only set up 2 separate JVMs running your application, and one has NO knowledge of the other.

This is one method anyway - just make sure you test it thoroughly, especially if you have environment-specific configurations that get loaded depending on the URL, as a request to the IP address may load an unexpected set of configuration data in your application.

We'll be trying this out over the next few weeks and likely integrate it into the application in some kind of threaded reload routine if some configuration needs to be done mid-flight.

Enjoy!

Search Engine Death Match - Solr Wins (Pt. 2)

Abstraction:
The key to integrating Solr and removing Verity in our model was to abstract the search functionality of the application. The process started by creating a 'CollectionManager', which held the methods needed to interact with the Verity server through cfsearch, and consolidating all calls to 'cfsearch' through this service.

Coldspring:
Once that was done and working fine, we were ready to dive into Solr, figure out the nuances of the product and ultimately create a 'SolrManager' which would replace the 'CollectionManager'. Since we were using ColdSpring, this was a mere class path change in the coldspring.xml and that was it (I love ColdSpring).
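To give a feel for how small that change is, here's a sketch of the kind of bean definition involved. The bean id and the 'model.search' package path are made up for illustration; only the component names come from our setup.

```xml
<beans>
    <!-- Before (hypothetical path): class="model.search.CollectionManager" -->
    <bean id="searchManager" class="model.search.SolrManager" />
</beans>
```

Anything that asks ColdSpring for 'searchManager' gets the Solr-backed component, as long as both components expose the same methods.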


Solr Data-Config:

I'll get a more formalized presentation together about swapping out Verity in favor of Solr, but in general, our data was in a SQL database rather than XML files, so we used 'data-config.xml' to define the main query that populates the index, plus delta and delete queries that update and remove collection data through JDBC. (Each Solr instance has a single index or collection, whereas with Verity there is one Verity server and many collections.) The data-config.xml file is simply an XML file in which you write SQL to select your columns based on some criteria.
Note: it's best to have created, lastmodified, id and active columns in the database table you wish to index, as this helps Solr determine what data has changed and what has been deleted.
Hint: keep these queries simple and use a 'view' if your SQL is at all complex. The view hides the complexity, makes the file easier to read, and lets you change the view without having to change data-config.xml each time.
It's also worth pointing out that the server time should be exactly the same on your database and Solr machines (if they are not the same box), as Solr stores the last time it indexed your data and exposes it to your SQL as '${dataimporter.last_index_time}'.
** If you have a 'smalldatetime' value for lastmodified, where SQL Server rounds the seconds of the field, you may run into issues where Solr won't pick up changes made since its last_index_time when comparing against your database timestamps.
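As a rough illustration of the above, a data-config.xml could look something like the sketch below. The connection details, the 'vw_job_index' view and its columns are hypothetical. The attribute names are the DataImportHandler's own, though as far as I know 'deltaImportQuery' is only available in newer DataImportHandler releases, so check the documentation for your Solr version.

```xml
<dataConfig>
    <!-- Hypothetical connection details; use your own driver, url and credentials -->
    <dataSource type="JdbcDataSource"
                driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://localhost:1433;databaseName=jobs"
                user="solr_reader" password="changeme" />
    <document>
        <!-- query: full import; deltaQuery: ids changed since the last index;
             deltaImportQuery: fetch one changed row; deletedPkQuery: ids to remove -->
        <entity name="job" pk="id"
                query="SELECT id, title, description, location FROM vw_job_index WHERE active = 1"
                deltaQuery="SELECT id FROM vw_job_index WHERE active = 1 AND lastmodified &gt; '${dataimporter.last_index_time}'"
                deltaImportQuery="SELECT id, title, description, location FROM vw_job_index WHERE id = '${dataimporter.delta.id}'"
                deletedPkQuery="SELECT id FROM vw_job_index WHERE active = 0 AND lastmodified &gt; '${dataimporter.last_index_time}'">
            <field column="id" name="id" />
            <field column="title" name="title" />
            <field column="description" name="description" />
            <field column="location" name="location" />
        </entity>
    </document>
</dataConfig>
```

Keeping the joins inside the view is what lets these queries stay this short.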

Solr Schema:
This is basically the definition of your data in a single schema.xml. You define your 'fields', which are the columns of data from the data-config, along with their datatypes etc. This is very flexible, including the ability to 'group' multiple fields together as a single field (copyField) as well as defining the default field that's searched if none is specified in the search (e.g. a search for 'MN jobs' could search job titles, job descriptions, locations etc).
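For example, the relevant part of a schema.xml might look roughly like this excerpt. The field names are hypothetical, and the 'string' and 'text' types are assumed to be defined in the types section, as they are in the example schema that ships with Solr.

```xml
<!-- Excerpt only: these elements live inside the <schema> root element -->
<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text" indexed="true" stored="true" />
    <field name="description" type="text" indexed="true" stored="true" />
    <field name="location" type="text" indexed="true" stored="true" />
    <!-- Catch-all field that the others are copied into -->
    <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
</fields>

<uniqueKey>id</uniqueKey>

<!-- Field searched when the query does not name one -->
<defaultSearchField>text</defaultSearchField>

<copyField source="title" dest="text" />
<copyField source="description" dest="text" />
<copyField source="location" dest="text" />
```

With this in place, a search for 'MN jobs' that names no field is run against the combined 'text' field.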

Solr Config:
There was a one line change here just to tell Solr to use the data-config for its data:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
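With that handler registered, an import can be kicked off over HTTP. Here's a minimal sketch, assuming Solr is running on the Jetty example's default of localhost:8983 under the /solr context (adjust to your own host and port). 'delta-import' and 'full-import' are the handler's import commands.

```cfm
<!--- Ask the DataImportHandler to pick up rows changed since the last index --->
<cfhttp url="http://localhost:8983/solr/dataimport" method="get" result="importResponse" timeout="30">
    <cfhttpparam type="url" name="command" value="delta-import">
</cfhttp>

<cfoutput>#importResponse.statusCode#</cfoutput>
```

Something like this could be run from a scheduled task to keep the index current.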


ColdFusion:
Wayne Graham has published an open source component called SolManager on RIAForge to assist with the low level cfhttp calls. I've added this as a helper to my main 'SolrManager' component, and I just call it to do the cfhttp calls to Solr and then do what I need with the data it returns.
Since the data is returned as XML, and the 'schema' of the data defines which columns / nodes are returned, I have full control over the result. I convert the XML to a query using queryNew, xmlParse and xmlSearch - and for our purposes, as long as the query is in the same format as the one returned by a cfsearch through Verity, the application won't know any difference.
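Here's a minimal sketch of that kind of conversion, not our production code. It assumes Solr's standard XML response, where each hit is a 'doc' element under /response/result and each field is a child element with a 'name' attribute, and it only handles simple single-valued fields. The function name solrXmlToQuery and the sample data are made up for illustration.

```cfm
<cffunction name="solrXmlToQuery" output="false" access="public" returntype="query">
    <cfargument name="solrXml" type="string" required="true">
    <cfargument name="columns" type="string" required="true">

    <cfset var result = queryNew(arguments.columns)>
    <cfset var docs = xmlSearch(xmlParse(trim(arguments.solrXml)), "/response/result/doc")>
    <cfset var fields = "">
    <cfset var fieldName = "">
    <cfset var i = 0>
    <cfset var j = 0>

    <!--- One query row per doc element --->
    <cfloop from="1" to="#arrayLen(docs)#" index="i">
        <cfset queryAddRow(result)>
        <cfset fields = docs[i].xmlChildren>
        <!--- Copy each named field that matches a requested column --->
        <cfloop from="1" to="#arrayLen(fields)#" index="j">
            <cfif structKeyExists(fields[j].xmlAttributes, "name")>
                <cfset fieldName = fields[j].xmlAttributes.name>
                <cfif listFindNoCase(arguments.columns, fieldName)>
                    <cfset querySetCell(result, fieldName, fields[j].xmlText, i)>
                </cfif>
            </cfif>
        </cfloop>
    </cfloop>

    <cfreturn result>
</cffunction>

<!--- Made-up sample response to try it out --->
<cfsavecontent variable="sampleXml">
<response>
    <result name="response" numFound="2" start="0">
        <doc><str name="id">1</str><str name="title">CF Developer</str></doc>
        <doc><str name="id">2</str><str name="title">DBA</str></doc>
    </result>
</response>
</cfsavecontent>

<cfset jobs = solrXmlToQuery(sampleXml, "id,title")>
<cfdump var="#jobs#">
```

Multi-valued fields come back from Solr as nested 'arr' elements, so they would need extra handling beyond this sketch.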

Performance:
FAST! Solr does its own caching and warming of searches and results, and from my metrics, depending on the size of the XML result set (which can EASILY be limited with a '&rows=x' parameter in the search query string), the performance difference versus Verity is negligible, if not better, across the board for searches.
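As a quick sketch of the 'rows' parameter in use, the request below asks for at most 10 results. The host, port and search terms are assumptions (localhost:8983 is the Jetty example's default), and it expects Solr to be up and returning its standard XML response.

```cfm
<!--- Limit the size of the XML coming back with the rows parameter --->
<cfhttp url="http://localhost:8983/solr/select" method="get" result="solrResponse" timeout="10">
    <cfhttpparam type="url" name="q" value="MN jobs">
    <cfhttpparam type="url" name="rows" value="10">
</cfhttp>

<cfset solrDoc = xmlParse(trim(solrResponse.fileContent))>
<cfset found = xmlSearch(solrDoc, "/response/result")>

<cfoutput>
    Total matches: #found[1].xmlAttributes.numFound#<br>
    Docs returned: #arrayLen(found[1].xmlChildren)#
</cfoutput>
```

The numFound attribute still reports the total number of matches, so paging is possible without pulling every document back.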

There's a lot more to discuss, such as search syntax, replication, the admin dashboard etc., but I'll leave a few stones unturned for you to dive in and get your feet wet. They have good documentation on the site, as well as an extremely easy and efficient Jetty package that you can start up and run a few commands against by following the tutorial on their site, to get a feel for what to expect.

All in all, we had reached the point where Verity was NO LONGER AN OPTION or a viable solution for our company and needed nearly hourly babysitting, so this solution was implemented JIT (Just in Time) and has saved us from feeling the effects of daily search stress.

Have fun and stay tuned for a full run down with code samples at MN CFUG soon.