Showing posts with label Objects. Show all posts

Wednesday, October 15, 2008

Object Refactoring into Structures for Performance

Lately I've been tasked with refactoring some of our batch processes that take far too long to run, especially with large datasets (10-25K records). Every procedure I've had to rewrite has one thing in common: heavy reliance on objects to do the work. For example, when working on 'job' objects, for each job I'll call 'getJobById', which returns a populated job bean from the db. I manipulate the bean, call its 'save' method, and repeat that 10-25 thousand times. With CF 8.0.1 we're seeing the JVM heap fill up after a period of crunching through these methods. The processes slow to the point of being unresponsive and ultimately either bring the JVM down or are halted once the heap grows too large.

Here are the highlights of what I've done in refactoring these processes:
1.) Only query for the data I need (i.e. NOT selecting every column when all I need are the id field and a location field).
2.) Verify that all local variables are var scoped.
3.) Remove any calls to 'getJobById' in favor of a simple reusable job Structure.
4.) Populate the job Structure from the query directly, then pass this simplified structure into an alternate method that accepts the structure and does whatever massaging the data needs (the structure is passed by reference, so the changes are visible to the caller).
5.) Create a specific 'update' method in my DAO that accepts this Structure and saves ONLY the data that is needed vs. the entire bean object.
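
The five steps above can be sketched roughly as follows. This is a minimal illustration rather than our production code: the datasource variable, the table and column names, and the 'massageJob' and 'updateLocation' methods are all hypothetical names made up for the example.

```cfml
<!--- Hypothetical sketch: variables.dsn, the job table, jobService.massageJob()
      and jobDAO.updateLocation() are assumed names, not the real ones. --->
<cffunction name="processJobs" access="public" returntype="void" output="false">
    <!--- step 2: every local variable is var scoped --->
    <cfset var qJobs = "" />
    <cfset var jobStruct = StructNew() />

    <!--- step 1: select only the columns the batch needs --->
    <cfquery name="qJobs" datasource="#variables.dsn#">
        SELECT id, location
        FROM job
    </cfquery>

    <cfloop query="qJobs">
        <!--- steps 3 and 4: refill one reusable struct instead of building a bean per row --->
        <cfset jobStruct.id = qJobs.id />
        <cfset jobStruct.location = qJobs.location />

        <!--- structs are passed by reference, so massageJob() changes jobStruct in place --->
        <cfset variables.jobService.massageJob(jobStruct) />

        <!--- step 5: a targeted update that writes only the columns that changed --->
        <cfset variables.jobDAO.updateLocation(jobStruct) />
    </cfloop>
</cffunction>
```

The point of the design is that the loop allocates nothing per row beyond the query itself: one struct is created before the loop and overwritten on every iteration.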


The results? Extremely fast execution and stable memory throughout the entire process, regardless of how long it takes. We went from roofing the JVM to maintaining a stable memory footprint on these executions, while staying true to good programming techniques. We even 'introspect' a single Object before the run and expose its internal 'metadata' to the job Structure as a base for the operation, then proceed from there, appending data and overwriting the Structure each time through the loop.
i.e. StructAppend(jobStruct,newStruct,'true')

CreateObject('component','job').init().getInstance()
We keep all our instance data in a 'variables.instance' structure within the object, and we create a 'getInstance' method which returns that structure. That means we can work with this structure of the object's data instead of creating a new object, or populating the bean, each time through the loop. We're still able to change the bean metadata and have it available to the other calling methods that require it.
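
As a rough sketch of that pattern (not the actual component), 'getInstance' can be a one-line accessor, and the batch loop can overwrite the returned structure with StructAppend. The query name 'qJobs' and its columns are assumptions for illustration.

```cfml
<!--- inside job.cfc: expose the instance data as a plain struct --->
<cffunction name="getInstance" access="public" returntype="struct" output="false">
    <cfreturn variables.instance />
</cffunction>
```

```cfml
<!--- calling code: create ONE object up front and keep only its instance struct --->
<cfset jobStruct = CreateObject("component", "job").init().getInstance() />

<!--- qJobs is a hypothetical query holding the rows to process --->
<cfloop query="qJobs">
    <cfset newStruct = StructNew() />
    <cfset newStruct.id = qJobs.id />
    <cfset newStruct.location = qJobs.location />

    <!--- third argument true: keys in newStruct overwrite matching keys in jobStruct --->
    <cfset StructAppend(jobStruct, newStruct, true) />
</cfloop>
```

Because structures are returned by reference, jobStruct here is the same structure the bean holds internally. If the bean itself must stay untouched, wrapping the call in Duplicate() would give the loop its own copy.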

For example, we can call the 'getBeanConfig()' method on the job object, which returns a structure of metadata (a structure of structures) describing things like the maximum length and nullability of each field in our bean.


<cffunction name="getBeanConfig" output="false" access="public" returntype="Struct" hint="returns the entire bean config structure">
    <cfreturn variables.beanconfig />
</cffunction>

<cffunction name="setBeanConfig" access="private" returntype="void" output="false" displayname="category config" hint="I setup the configuration for the bean category.">
    <cfscript>
        // one sub-structure per field, holding that field's constraints
        variables.beanconfig = StructNew();
        variables.beanconfig.title = StructNew();
        variables.beanconfig.title.isnullable = 0;
        variables.beanconfig.title.maxlength = 95;
    </cfscript>
</cffunction>
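
One way the config can be used against a plain structure is sketched below, assuming that init() calls setBeanConfig() (the snippet above does not show that) and using a made-up 'job.validation' exception type.

```cfml
<!--- Hypothetical sketch: check a struct field against the bean config before saving.
      Assumes job.init() has already populated variables.beanconfig. --->
<cfset beanConfig = CreateObject("component", "job").init().getBeanConfig() />

<cfset jobStruct = StructNew() />
<cfset jobStruct.title = "Bank Foreclosure Professional" />

<!--- reject empty values for fields that are not nullable --->
<cfif NOT beanConfig.title.isnullable AND Len(Trim(jobStruct.title)) EQ 0>
    <cfthrow type="job.validation" message="title is required" />
</cfif>

<!--- truncate values that exceed the configured maximum length --->
<cfif Len(jobStruct.title) GT beanConfig.title.maxlength>
    <cfset jobStruct.title = Left(jobStruct.title, beanConfig.title.maxlength) />
</cfif>
```

The config only needs to be fetched once before the batch loop starts, so the validation rules stay defined in the bean while the loop itself never instantiates one.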


When working with thousands of beans in a process, it was originally a lot easier to get the thing working by just calling 'getBeanById()', manipulating the bean and calling 'Save()' on it. Over time I've learned that this approach is not sustainable and may need to be refactored. As long as you're careful to var scope all the local variables in your CFCs and take a lowest-common-denominator approach to large, long-running batch requests, you will see a large performance gain by extracting your object's structure data and working with a single copy of it, vs. populating your entire bean each time through a loop and saving the entire bean once you're done with it. Add the CFML Gateway to thread your executions and you'll be off the ground and flying once again!

Friday, October 3, 2008

ColdFusion Gateway Data passed by Reference?

At our company tech summit held over the past few days (where all of us remote developers get together and discuss all things technical) I had the opportunity to show off some of the cool new CFML Gateway functionality that ships with CF 8. I had created a generic object that could be called at any point in the application to allow asynchronous threading of method executions wherever we needed it. I would execute a

SendGatewayMessage(gatewayid, structure)

call, which would then go through the CFML gateway and execute the method that ColdSpring knew about (code to come).
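
Until that code is posted, here is a minimal sketch of the general shape, not the actual implementation. The gateway ID 'asyncGateway', the struct keys 'beanName' and 'methodName', the target method 'rebuildIndex', and the 'application.beanFactory' reference are all assumptions for illustration. What comes from ColdFusion itself is SendGatewayMessage(), the listener method name onIncomingMessage, and the CFEvent.data structure it receives.

```cfml
<!--- calling code: describe the work to do and hand it to the gateway --->
<cfset message = StructNew() />
<cfset message.beanName = "jobService" />     <!--- hypothetical ColdSpring bean id --->
<cfset message.methodName = "rebuildIndex" /> <!--- hypothetical method on that bean --->
<cfset SendGatewayMessage("asyncGateway", message) />
```

```cfml
<!--- listener CFC registered for the CFML gateway instance --->
<cfcomponent output="false">
    <cffunction name="onIncomingMessage" access="public" returntype="void" output="false">
        <cfargument name="CFEvent" type="struct" required="true" />
        <!--- the struct passed to SendGatewayMessage arrives as CFEvent.data --->
        <cfset var data = arguments.CFEvent.data />
        <!--- assumes a ColdSpring bean factory was stored in the application scope --->
        <cfset var target = application.beanFactory.getBean(data.beanName) />
        <cfinvoke component="#target#" method="#data.methodName#" />
    </cffunction>
</cfcomponent>
```

The calling page returns as soon as the message is queued, and the listener runs in its own gateway thread.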

This was all great and fun, but what I wanted to know is this:

'What if I have an object that has its data changed from inside the gateway? Will the calling page know about that change, and does the calling page still have a reference to the object?'


Objects and Structures are passed by 'reference' in ColdFusion. But what if that object is being manipulated in an asynchronous process that is executing in a CFML Event Gateway, in a separate thread from the original? That was my 'Poll the audience' question, and I asked our developers to think about it and come up with their answer:

A - The Object would Change
B - The Object would not Change

I put together a CFC and Gateway call that tested this theory.

1.) Create an Object and assign a value through an accessor method i.e. setTitle('Bank Foreclosure Professional')

2.) Append that object to the data 'structure' that I am passing to 'SendGatewayMessage(gatewayid, structure)'

3.) Dump Data of the Object to a cfm page

4.) Call the Gateway and pass the structure containing the object

5.) Within the gateway call, do a thread sleep for 2.5 seconds and then call a setTitle('I have changed') on the Object.

6.) Meanwhile, back on the .cfm page, right after my call to the 'SendGatewayMessage' function, I will dump the value of the object to the page again (because the message is sent asynchronously, this dump happens immediately after the call returns)

7.) On the .cfm page I will do a thread sleep for 5 seconds to make sure the gateway has enough time to change the value of the object.

8.) Finally, once I feel I've waited long enough, I'll dump the results of the Original Object to the page....
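
For anyone who wants to try this before I post my own code, the steps above could be sketched roughly like this. It is an illustration under assumptions, not the test I actually ran: the gateway ID 'asyncGateway' and the 'getTitle' accessor are hypothetical, and Sleep() (new in CF 8) stands in for the thread sleep.

```cfml
<!--- test.cfm: steps 1-4 and 6-8 --->
<cfset job = CreateObject("component", "job").init() />
<cfset job.setTitle("Bank Foreclosure Professional") />

<!--- step 2: put the object into the struct handed to the gateway --->
<cfset data = StructNew() />
<cfset data.job = job />

<cfdump var="#job.getTitle()#" label="before the gateway call" />

<cfset SendGatewayMessage("asyncGateway", data) />

<!--- step 6: dumped right away, before the gateway has had time to act --->
<cfdump var="#job.getTitle()#" label="immediately after the call" />

<!--- step 7: wait longer than the gateway's 2.5 second sleep --->
<cfset Sleep(5000) />

<!--- step 8: has the original object changed? --->
<cfdump var="#job.getTitle()#" label="after waiting 5 seconds" />
```

```cfml
<!--- gateway listener CFC: step 5 --->
<cfcomponent output="false">
    <cffunction name="onIncomingMessage" access="public" returntype="void" output="false">
        <cfargument name="CFEvent" type="struct" required="true" />
        <cfset Sleep(2500) />
        <cfset arguments.CFEvent.data.job.setTitle("I have changed") />
    </cffunction>
</cfcomponent>
```

Comparing the first and last dumps answers the question.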

The Results????

You may be surprised at the results. In a room of 9 people, we split in half between 'No Change' and 'Change' votes, with the tie breaker coming from our Project Manager who (like myself) guessed 'No Change' (i.e. there is no reference to the original object once it gets passed into the gateway and executes).

Who was right? Please leave a comment with your guess and I'll let you know what I found out and who won the poll. (I'll post the code for those who want to see it and prove it on their own.)