Showing posts with label Verity. Show all posts

Thursday, July 23, 2009

Search Engine Death Match (Pt. 3) MPLS CFUG

I'll be speaking to the Minneapolis/St. Paul ColdFusion User Group (CFUG) on Aug. 6, 2009 about the Solr integration in ColdFusion 9 (Centaur Beta). If you're in the area, we'd love to see you. The session starts at 6:30 - more information at ColderFusion.com.

Preview:
Search Engine Death Match: Solr (Lucene) vs. Verity

Search is King.
See how the 2 engines that ship with ColdFusion 9 stack up.
Learn more about the newly integrated Solr (based on Java Lucene) engine and how it stacks up against Verity.
Listen to real world stories about Solr's performance in a large scale Enterprise application.
Preview the features of Solr in Centaur Beta and determine if your company could benefit from a better, more scalable, more reliable search solution.

Here is a preview of the session (pdf).

Hope to see you there!

Friday, May 1, 2009

Search Engine Death Match - Solr Wins (Pt. 2)

Abstraction:
The key to integrating Solr and removing Verity in our model was to abstract the search functionality of the application. The process started by creating a 'CollectionManager' that held the methods needed to interact with the Verity server through cfsearch, and consolidating all calls to cfsearch through this service.

Coldspring:
Once that was done and working fine, we were ready to dive into Solr, figure out the nuances of the product, and ultimately create a 'SolrManager' to replace the CollectionManager. Since we were using ColdSpring, this was a mere class path change in coldspring.xml and that was it (I love ColdSpring).
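As a rough illustration of that swap, a ColdSpring bean definition could look like the fragment below. The bean id `searchManager` and the component paths `model.search.CollectionManager` and `model.search.SolrManager` are hypothetical names chosen for this sketch; the point is only that the `class` attribute changes while the bean id stays the same.

```xml
<beans>
  <!-- Before: the Verity-backed implementation (hypothetical component path)
  <bean id="searchManager" class="model.search.CollectionManager" />
  -->

  <!-- After: same bean id, new class path. Code that asks ColdSpring
       for "searchManager" does not need to change. -->
  <bean id="searchManager" class="model.search.SolrManager" />
</beans>
```

This only works cleanly if both components expose the same public methods, which is what the abstraction step above was for.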


Solr Data-Config:

I'll get a more formalized presentation together about swapping out Verity in favor of Solr. In general, our data was in a SQL database rather than XML files, so we used 'data-config.xml' to define the main query that populates the index, plus delta and delete queries that update and remove collection data through JDBC. (Each Solr instance has a single index or collection, whereas with Verity there is one server and many collections.) The data-config.xml file is simply an XML file in which you write SQL to select your columns based on some criteria.
Note: it's best to have created, lastmodified, id, and active columns in the database table you wish to index, as these help Solr determine what data has changed and what has been deleted. Hint: keep these queries simple and use a view if your SQL is at all complex. A view hides the complexity, makes the file easier to read, and lets you change the view without having to change data-config.xml each time. It's also worth pointing out that the server time should be exactly the same on your database and Solr machines (if they are not the same box), because Solr stores the last time it indexed your data internally and exposes it to your SQL as '${dataimporter.last_index_time}'.
** If lastmodified is a 'smalldatetime' column, where SQL Server rounds away the seconds, you may run into issues where Solr doesn't pick up changes made since its last_index_time, because the rounded database timestamps no longer compare correctly against it.
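To make the main, delta, and delete queries concrete, here is a minimal data-config.xml sketch. The view name `vw_job_search`, its columns, and the connection details are all made up for illustration and are not our actual configuration. The attribute names follow the DataImportHandler documentation, and `deltaImportQuery` may not be available in older Solr releases.

```xml
<dataConfig>
  <!-- JDBC connection: driver, url, user and password are placeholders -->
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://dbhost:1433;databaseName=jobsdb"
              user="solr_reader"
              password="changeme" />
  <document>
    <!-- query:            full import, populates the whole index
         deltaQuery:       ids of rows changed since the last index run
         deltaImportQuery: fetches one changed row by the id found above
         deletedPkQuery:   ids of rows deactivated since the last index run -->
    <entity name="job" pk="id"
            query="SELECT id, title, description, location
                   FROM vw_job_search WHERE active = 1"
            deltaQuery="SELECT id FROM vw_job_search
                        WHERE active = 1
                        AND lastmodified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title, description, location
                              FROM vw_job_search
                              WHERE id = '${dataimporter.delta.id}'"
            deletedPkQuery="SELECT id FROM vw_job_search
                            WHERE active = 0
                            AND lastmodified &gt; '${dataimporter.last_index_time}'">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="location" name="location" />
    </entity>
  </document>
</dataConfig>
```

Because every query selects from the view, any joins or filtering logic can change in the database without touching this file.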

Solr Schema:
This is basically the definition of your data, in a single schema.xml. You define your 'fields', which are the columns of data from the data-config, along with their datatypes etc. It is very flexible, including the ability to group multiple fields together as a single field (copyField) and to define the default field that is searched when none is specified in the query (i.e. a search for 'MN jobs' could search job titles, job descriptions, locations etc).
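A minimal sketch of such a schema, in the Solr 1.x schema.xml layout, might look like this. The field names (`title`, `description`, `location`) and the catch-all `text` field are assumptions for illustration that match a hypothetical job-search index, not our production schema.

```xml
<schema name="jobs" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
  </types>

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text" indexed="true" stored="true" />
    <field name="description" type="text" indexed="true" stored="true" />
    <field name="location" type="text" indexed="true" stored="true" />
    <!-- Catch-all field: searchable but not stored, and multi-valued
         because several source fields are copied into it -->
    <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
  </fields>

  <uniqueKey>id</uniqueKey>

  <!-- Queries that name no field are run against the catch-all -->
  <defaultSearchField>text</defaultSearchField>

  <!-- Group title, description and location into the single "text" field -->
  <copyField source="title" dest="text" />
  <copyField source="description" dest="text" />
  <copyField source="location" dest="text" />
</schema>
```

With this layout, an unqualified search such as 'MN jobs' is matched against titles, descriptions, and locations at once.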

Solr Config:
There was a small change here, just to tell Solr to use the data-config for its data:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>


ColdFusion:
Wayne Graham has published an open source component on RIAForge, called SolManager, to assist with the low-level cfhttp calls. I've added it as a helper to my main 'SolrManager' component; I just call it to do the cfhttp calls to Solr, and then I can do what I need with the data it returns.
Since the data is returned as XML, and the schema defines which columns / nodes are returned, I have full control over the result. I convert the XML to a query using queryNew, xmlParse, and xmlSearch. For our purposes, as long as the query is in the same format as the one returned by a cfsearch through Verity, the application won't know any difference.
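For reference, a response from Solr's default XML response writer is shaped roughly like the sample below. The query, field names, and values here are invented for illustration. Each `doc` element becomes one row of the query, and each child element's `name` attribute maps to a column.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="q">title:developer</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <!-- numFound is the total hit count; rows limits how many docs come back -->
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1001</str>
      <str name="title">ColdFusion Developer</str>
      <str name="location">Minneapolis, MN</str>
    </doc>
  </result>
</response>
```

An XPath such as `/response/result/doc` passed to xmlSearch would select the document nodes to loop over when building the query.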

Performance:
FAST! Solr does its own caching and warming of searches and results. From my metrics, depending on the size of the XML result set (which can EASILY be limited with a '&rows=x' parameter in the search query string), performance is at least on par with Verity, if not better, across the board for searches.

There's a lot more to discuss, such as search syntax, replication, the admin dashboard etc, but I'll leave a few stones unturned for you to dive in and get your feet wet. They have good documentation on the site, as well as an extremely easy and efficient Jetty package that you can start up and run a few commands against by following a tutorial on their site, to get a feel for what to expect.

All in all, we had reached the point where Verity was NO LONGER AN OPTION or a viable solution for our company and needed nearly hourly babysitting, so this solution was implemented JIT (Just in Time) and has saved us from the effects of daily search stress.

Have fun and stay tuned for a full rundown with code samples at MN CFUG soon.

Thursday, April 30, 2009

Search Engine Death Match - Solr Wins

It's now been over a month since we rolled out our Lucene / Solr powered search engine replacement for Verity in our application and let me sum it up this way: Zero maintenance.

Even though the OEM Verity licensing states 250K record max, we ran into a ceiling on a few of our machines when the number of Verity collections reached 82-88. We were doing constant rcadmin fixes to drop the thread counts for each collection down to 2 from 3 - but on each restart or at random points in time, the thread counts would return to 3. Instability and random collection switches were the main reasons we moved off the platform, and we'll never look back.

Solr took some configuration, but it was more than flexible enough for our needs, and with the delta updates and deletes, it made the maintenance near zero.

As far as parsing the resulting XML, that was done efficiently through ColdFusion's xmlParse and xmlSearch to walk the nodes and return a Verity-like recordset to the application. With ColdSpring we were able to swap out implementations easily and everything just worked. Great planning and execution resulted in a seamless rollout, with Solr running on RedHat on a dedicated machine.

I will most likely do a presentation at our upcoming CFUG (ColdFusion User Group) in Minneapolis/St. Paul in the coming months about the transition, and talk about integration points before ColdFusion 9 comes out and makes it all available with a single tag ;)

Long Live Open Source software

Thursday, October 23, 2008

Lowering the Number of Verity Threads on Servers with Large Number of Collections

With our 55+ collections, we have some suggested ways to better manage the number of threads that Verity has to manage. Below is the suggestion direct from Adobe Tier 3 ColdFusion/JRun support consultants.

So, the suggested workaround to prevent the issue of collectionnames being ‘switched’ is to lower the number of threads assigned to each collection in Verity. Verity support had indicated this is an issue they are familiar with. They see it when the ratio of (total number of threads/CPU) gets high. Your total threads is about 165 now (55 collections * 3 threads). They suggest lowering the threads per collection to 2. Note, it is generally not recommended you use less than 2 threads per collection. One thread per collection may work for collections that are not heavily used.

You can update the threads per collection by running Verity RCAdmin. I have pasted below the output from my updating a “linuxarchive” collection. You will run the rcadmin from your verity_root\k2\_nt40\bin directory in a DOS window.
1. type in rcadmin
2. at rcadmin prompt type in indexattach
3. type in collectionname for index alias
4. enter ‘c’ for collection
5. enter the search server alias (ColdFusionK2_server1)
6. modify type is update = 0
7. index state is online = 2
8. Threads is the update you are making. All collections have 3 threads currently, which you should change to 2.
9. save changes=y
You should do this for each of your collections. Once you have updated all the collections you will need to stop and restart the Verity search service.


I have recently done this on our beta environment and the numbers fell from about 270 threads (Task Manager - Processes - View - Select Columns - Thread Count) to 210 or so. We'll see how this affects overall management of the collections. This is not necessary on servers with a lower number of managed collections.

Monday, September 22, 2008

Verity HotFix Received from Adobe

We have received Tier 3 support from Adobe, and they have gotten us a patch for the Verity collection issue documented in prior posts. The one thing that shocked me was that they took a look at our case and stated that "Ideally, Verity will run without incident efficiently on about 12 collections (especially if they are using CATEGORIES)." 12 collections? I've never seen this documented anywhere (as of 9.21.2008), and I've read about everything there is to read on the subject.

So there may be a hotfix for this and other Verity issues coming out soon from Adobe. The entire process took a month to resolve, but we're excited to implement the fix and I'll be sure to blog the results of the patch/hotfix.

Wednesday, August 27, 2008

Verity - Internal collection mappings

This is in response to my original post about Verity returning incorrect data.

This seems to confirm that there is something WRONG with Verity’s internal mapping of collections – as this is what I had executed prior to restarting the verity service. (listing 1.0)

Listing 1.0
D:\CFusionMX\verity\Data\services\ColdFusionK2_indexserver1\log\status.log

2008-08-27 09:55:25 Status: OdkWrk1 service worker thread [0] executing request K2IdxCollOpen
2008-08-27 09:55:25 Error: [VDKCB ws=cf_jrpp-1383_workspace] Error E0-0401 (Vdb): Couldn't find vdb descriptor d:\CFusionMX\verity\collections\ameriprise\assists\00000002.wld
2008-08-27 09:55:25 Status: OdkWrk1 service worker thread [0] executing request K2IdxCollSubmitBif
2008-08-27 09:55:26 Status: [VDKCB ws=cf_jrpp-1383_workspace] Bulk insert operation into d:/CFusionMX/verity/collections/ameriprise (7) from d:/CFusionMX/verity/Data/services/ColdFusionK2_indexserver1/tmp/cf_jrpp-1383_workspace_BIF (0, 0)
2008-08-27 09:55:26 Status: [VDKCB ws=cf_jrpp-1383_workspace] Initializing dataset 00000007.ddd, index 00000007.did
2008-08-27 09:55:27 Status: [VDKCB ws=cf_jrpp-1383_workspace] Totals (154 documents): 770 para 3336 sent 88918 word (1392 Kb used)
2008-08-27 09:55:27 Status: [VDKCB ws=cf_jrpp-1383_workspace] Optimizing database layout
2008-08-27 09:55:27 Status: [VDKCB ws=cf_jrpp-1383_workspace] (1734 ms) Indexed 154 docs into d:/CFusionMX/verity/collections/ameriprise/parts/00000007
2008-08-27 09:55:27 Status: [VDKCB ws=cf_jrpp-1383_workspace] Writing partition index data
2008-08-27 09:55:28 Error: [VDKCB ws=cf_jrpp-1383_workspace] Error E0-0401 (Vdb): Couldn't find vdb descriptor d:\CFusionMX\verity\collections\ameriprise\assists\00000002.wld


Note that you can see the correct data, 154 records, being added to the collection. This is also likely why this collection is listed in the ColdFusion Administrator as having 154 records.
However, as we know, when we ask for this collection, it seems to be returning data from a different collection.

It’s interesting that each client was off by one position according to this CollRef listing in D:\CFusionMX\verity\Data\host\admin\adminX.xml.
As in – ask for ‘ameriprise’, you’ll get ‘cingular’.

<CollRef alias="aigretirement" status="online" threads="3"/>
<CollRef alias="ameriprise" status="online" threads="3"/>
<CollRef alias="cingular" status="online" threads="3"/>

Thursday, August 14, 2008

Verity - Collections Taken Offline

In doing some more digging into what makes Verity tick (the standalone version that can be installed separately from the ColdFusion application server on a separate physical/virtual machine), I uncovered yet another clue about collections being taken 'offline' at various times. We have multiple collections registered through Verity for our clients, and almost monthly now we'll notice that one or more indexes just decide to stop responding to requests (and look like they have disappeared, as they are not listed in the cfadmin).

The directory and files to look at are here: \{verityroot}\verity\Data\host\admin.
Here you will find 2 files -
admin.dat - This file looks like it tells you the 'ID' of the adminX.xml file that it's currently using as its data store
adminX.xml - where X is the value from admin.dat file.

AdminX.xml is a file which has all configuration settings for the K2 Index engine, including names of collections, ports, paths, memory data etc.

You will also find an interesting area near the bottom of the file:


<CollRefs>
<CollRef alias="col1" status="online" threads="2"/>
<CollRef alias="col2" status="online" threads="3"/>
<CollRef alias="col3" status="offline" threads="3"/>


Here you may find your missing indexes, as they have been turned 'offline' for some reason (there is little logging going on to determine why an index gets taken offline).

This may help you in debugging. The next time this happens, I will be going straight to this file, noting any indexes that are marked as offline, and turning them back 'online'.

To be continued....

Wednesday, August 6, 2008

Verity - Ask for a Collection and Ye Shall not Receive

Today is another one of those 'I hate Verity' days (the Verity that ships with ColdFusion).

Every few months we have a disaster occur in production, where we have 50+ Verity collections on a separate server (a separate-server install of the Verity version that ships with CF 7). Seemingly at random, after a service restart of the instance that makes most of the calls to the Verity collections through cfsearch, cfcollection, and cfindex, Verity will simply return a resultset from a random collection.

So we'll send a
<cfsearch collection="CollectionA">
and we'll get back results from CollectionB, for no good reason.

Solution:
Restarting the ColdFusion MX 7 Search Service on the other machine 're-aligns' things and once again the application resumes normal processing.

This has occurred so many times, that I've decided to log a Bug with Adobe about the product.

We separated out the Verity Search service onto a separate machine because this was happening much more frequently when it was on the same machine (we'd ask for data from one collection and get back data from another). It still happens now, although less frequently, but with our current dependency on Verity to do much of the work for string comparisons and categorization, when Verity goes, so does the application.

If anyone else has experienced this please post your response. I've even logged to a cflog exactly what we're passing cfsearch when this issue is occurring, and all the data is correct.

It's as if the connection between the 2 machines gets severed and, on resuming that connection, the collection mapping gets thrown off, and incorrect data is the result.

Arg - Can't wait for Lucene Integration in cf9, but I'm afraid it will be too late by then (Summer 2009)

Sunday, July 27, 2008

Verity Separate Server Install Collections Fail to Create

Verity sucks some days, and today is one of those days.

We had another episode with Verity where we had a collection disappear. I am not sure where or why, but I was tasked with getting it back online. This is not the first time this has happened, and the solution should be easy: re-create the index. However, this is where things got UGLY.

I looked in the CFadmin to see if the collection was listed, which it was not. So I proceeded to attempt to create it and received:

Unable to create collection companyX.
An error occurred while creating the collection: com.verity.api.administration.ConfigurationException: Index exists. Cannot insert entry. (-6065)

So it isn't listing the collection, and yet it says it exists?? OK, so that's problem one.

Problem 2 is that when I execute a 'reindex', which does a
<cfcollection action="create" ..>
I received no errors from the application (things looked like they were successful)

However when I look into CFADMIN again, in Verity Collections, the collection was once again not listed.

Here's what I did to fix it:

In our installation, Verity (from the ColdFusion 7 Verity installer) is installed on a separate box from the web server. I told web1 to use the Verity K2Server at the web1 address instead of the web2 address it had been using. Then I recreated and populated the collection using the same server the application was running on (Verity was installed on both servers, but is typically not running on web1). Once done, I specified the web2 IP address again in cfadmin and copied the directory of the collection I wanted from web1 to web2.

At this point the collection was still NOT listed in cfadmin, but after I executed a <cfcollection action="create"> with that collection name, something behind the scenes worked that time, and the collection showed up in cfadmin WITH the correct amount of data in it.

This is strange for a number of reasons, but I had to repeat this process for another collection just after this, so I know this process works. Executing an 'action="create"' typically kills any data that is in the collection, which didn't happen when I used this approach.

Anyway, hope this helps, and I know it will help me again the next time it happens....because it WILL!