Wednesday, October 15, 2008

Object Refactoring into Structures for Performance

Lately I've been tasked with refactoring some of our batch processes that take far too long to run, especially with large datasets (10-25K records). Every procedure I've had to rewrite has one thing in common: it relies heavily on objects to do the work. For example, when working on a 'job' object, for each job I call 'getJobById', which returns a populated job bean from the database; I manipulate the bean, call its 'save' method, and repeat 10-25 thousand times. On CF 8.0.1 we see the JVM heap fill up after a period of crunching through these methods. The processes slow to the point of being unresponsive and ultimately either bring the JVM down or exhaust the heap, halting the process.

Here are the highlights of what I've done in refactoring these processes:
1.) Query only for the data I need (i.e. NOT selecting every column when all I need are the id field and a location field).
2.) Verify that all local variables are var scoped.
3.) Remove any calls to 'getJobById' in favor of a simple, reusable job structure.
4.) Populate the job structure directly from the query, then pass this simplified structure into an alternate method that accepts the structure and does whatever massaging the data needs (structures are passed by reference, so the changes are visible to the caller).
5.) Create a specific 'update' method in my DAO that accepts this structure and saves ONLY the data that is needed, rather than the entire bean object.
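
The steps above can be sketched roughly as follows. This is only an illustration, not our production code: the query is faked with QueryNew so the snippet runs on a page by itself, and massageJob, updateJob, request.saved and the id/location columns are hypothetical stand-ins for your own DAO and transformation methods.

```cfml
<!--- Stand-in for the narrow DAO query: only id and location are selected --->
<cfset qJobs = QueryNew("id,location", "Integer,VarChar") />
<cfset QueryAddRow(qJobs, 2) />
<cfset QuerySetCell(qJobs, "id", 1, 1) />
<cfset QuerySetCell(qJobs, "location", "  Boston ", 1) />
<cfset QuerySetCell(qJobs, "id", 2, 2) />
<cfset QuerySetCell(qJobs, "location", " Denver", 2) />

<!--- Hypothetical "massaging" method: the struct arrives by reference,
      so the change is visible to the caller without returning anything --->
<cffunction name="massageJob" returntype="void" output="false">
    <cfargument name="job" type="struct" required="true" />
    <cfset arguments.job.location = Trim(arguments.job.location) />
</cffunction>

<!--- Hypothetical stand-in for the DAO's targeted update. A real DAO would
      run an UPDATE touching only the location column for the given id. --->
<cffunction name="updateJob" returntype="void" output="false">
    <cfargument name="job" type="struct" required="true" />
    <cfset ArrayAppend(request.saved, arguments.job.id & ":" & arguments.job.location) />
</cffunction>

<cfset request.saved = ArrayNew(1) />

<!--- One struct, created once and overwritten on every pass through the loop --->
<cfset jobStruct = StructNew() />
<cfloop query="qJobs">
    <cfset jobStruct.id = qJobs.id />
    <cfset jobStruct.location = qJobs.location />
    <cfset massageJob(jobStruct) />
    <cfset updateJob(jobStruct) />
</cfloop>
```

Inside a CFC method, qJobs and jobStruct would of course be declared with var at the top of the function, per step 2.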

The results? Extremely fast execution and stable memory throughout the entire process, regardless of how long it takes. We went from maxing out the JVM to maintaining a stable memory footprint on these executions, while still staying true to good programming techniques. We even 'introspect' a single object before the run and expose its internal 'metadata' to the job structure as a base for the operation, then append data to the structure and overwrite it each time through the loop,
i.e. StructAppend(jobStruct,newStruct,'true')
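
To make the overwrite behaviour concrete, here is a small sketch with made-up keys and values (id, title and location are assumptions for illustration). With the third argument set to true, StructAppend replaces the values of keys that already exist in the target structure.

```cfml
<!--- State left in the reusable struct by the previous loop iteration --->
<cfset jobStruct = StructNew() />
<cfset jobStruct.id = 1 />
<cfset jobStruct.title = "Old title" />
<cfset jobStruct.location = "Boston" />

<!--- Data for the current iteration (hypothetical values) --->
<cfset newStruct = StructNew() />
<cfset newStruct.id = 2 />
<cfset newStruct.title = "New title" />

<!--- true = overwrite keys that already exist in jobStruct --->
<cfset StructAppend(jobStruct, newStruct, true) />

<!--- jobStruct.id is now 2 and jobStruct.title is "New title".
      jobStruct.location is still "Boston", because newStruct had no such key. --->
```

One thing to watch for: any key that is missing from newStruct keeps the value from the previous pass, so each iteration should supply every field it relies on.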

We keep all our instance data in a 'variables.instance' structure within the object, and we also create a 'getInstance' method that returns that structure. That means we can work with this structure of the object's data instead of creating a new object, or repopulating the bean, each time. We're still able to change the bean metadata and have it available to the other calling methods that require it.
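
A minimal sketch of that pattern might look like the following. The component name (Job.cfc), the init method and the id/title fields are assumptions made for the example; only 'variables.instance' and 'getInstance' come from our actual setup.

```cfml
<!--- Job.cfc (hypothetical file name) --->
<cfcomponent displayname="Job" output="false">

    <cffunction name="init" access="public" returntype="any" output="false">
        <!--- all instance data lives in one private struct --->
        <cfset variables.instance = StructNew() />
        <cfset variables.instance.id = 0 />
        <cfset variables.instance.title = "" />
        <cfreturn this />
    </cffunction>

    <cffunction name="getInstance" access="public" returntype="struct" output="false" hint="returns the instance data struct">
        <cfreturn variables.instance />
    </cffunction>

</cfcomponent>
```

Before the batch starts, one object can then seed the working structure, for example with `<cfset jobStruct = Duplicate(CreateObject("component", "Job").init().getInstance()) />`. The Duplicate call is a choice for this sketch: getInstance hands back a reference to the object's own struct, and copying it keeps the loop from modifying the object's internal state.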

For example, we can call the 'getBeanConfig()' method on the job object, which returns a structure of metadata (a structure of structures) describing things like field size and nullability for each field in our bean.

<cffunction name="getBeanConfig" output="false" access="public" returntype="struct" hint="returns the entire bean config structure">
    <cfreturn variables.beanconfig />
</cffunction>

<cffunction name="setBeanConfig" access="private" returntype="void" output="false" displayname="category config" hint="I setup the configuration for the bean category.">
    <cfset variables.beanconfig = StructNew() />
    <cfset variables.beanconfig.title = StructNew() />
    <cfset variables.beanconfig.title.maxlength = 95 />
</cffunction>
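
As one illustration of how that metadata can serve the plain structure, the sketch below checks a job structure against the 'maxlength' entries in a bean config. The function name isWithinMaxLength is hypothetical, and the config is rebuilt inline here (using the title/maxlength values shown above) so the snippet stands on its own; in practice it would come from getBeanConfig().

```cfml
<!--- Inline copy of the config; normally this comes from getBeanConfig() --->
<cfset beanConfig = StructNew() />
<cfset beanConfig.title = StructNew() />
<cfset beanConfig.title.maxlength = 95 />

<!--- Hypothetical helper: true when every configured field fits its maxlength --->
<cffunction name="isWithinMaxLength" returntype="boolean" output="false">
    <cfargument name="job" type="struct" required="true" />
    <cfargument name="config" type="struct" required="true" />
    <cfset var field = "" />
    <cfloop collection="#arguments.config#" item="field">
        <cfif StructKeyExists(arguments.job, field)
              AND StructKeyExists(arguments.config[field], "maxlength")
              AND Len(arguments.job[field]) GT arguments.config[field].maxlength>
            <cfreturn false />
        </cfif>
    </cfloop>
    <cfreturn true />
</cffunction>

<cfset jobStruct = StructNew() />
<cfset jobStruct.title = "Senior ColdFusion Developer" />
<cfset titleFits = isWithinMaxLength(jobStruct, beanConfig) />
```

The point is that the validation rules stay defined once in the bean, while the batch loop only ever touches the lightweight structure.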

When working with thousands of beans in a process, it was originally a lot easier to get things working by just calling 'getBeanById()', manipulating the bean, and calling 'Save()' on it. Over time I've learned that this isn't sustainable and may need to be refactored. As long as you're careful to var scope all the local variables in your CFCs and take a lowest-common-denominator approach to large, long-running batch requests, you will see a large performance gain by extracting your object's structure data and working with a single copy of it, rather than populating an entire bean each time through a loop and saving the entire bean once you're done with it. Add the CFML Gateway to thread your executions and you'll be off the ground flying once again!
