Closed (fixed)
Project:
Xapian integration
Version:
5.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
12 May 2008 at 06:40 UTC
Updated:
15 Jun 2008 at 08:45 UTC
Comments
Comment #1
jeremy commented
Are these 5,000 nodes generated by the devel module?
I'm testing this module on my development server, populating nodes using the devel module. I haven't tried the Perl indexer, but Xapian took nearly 3 minutes to index only 500 nodes, which I think qualifies as slow.
Reading through the code, I see that you open the database, write to it, and close/flush it for each and every node that is indexed -- wouldn't it make more sense to open it once, write a whole bunch of nodes, then close/flush it?
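To illustrate the difference between the two approaches, here is a minimal Python sketch. It does not use the Xapian API: `FakeWritableIndex` and `IndexStats` are hypothetical stand-ins that only count how often the index is opened and flushed, on the assumption that those are the expensive steps.

```python
from dataclasses import dataclass, field


@dataclass
class IndexStats:
    """Counts the operations assumed to be expensive."""
    opens: int = 0
    flushes: int = 0
    stored: list = field(default_factory=list)


class FakeWritableIndex:
    """Hypothetical stand-in for a writable index; not the Xapian API."""

    def __init__(self, stats):
        self.stats = stats
        self.stats.opens += 1
        self._pending = []

    def add(self, node_id):
        # Changes are only buffered here.
        self._pending.append(node_id)

    def close(self):
        # Closing writes the buffered changes out and counts as one flush.
        self.stats.stored.extend(self._pending)
        self._pending = []
        self.stats.flushes += 1


def index_one_at_a_time(node_ids):
    """Open, write, and close once per node (the slow pattern)."""
    stats = IndexStats()
    for node_id in node_ids:
        index = FakeWritableIndex(stats)
        index.add(node_id)
        index.close()
    return stats


def index_in_one_batch(node_ids):
    """Open once, write every node, then close once."""
    stats = IndexStats()
    index = FakeWritableIndex(stats)
    for node_id in node_ids:
        index.add(node_id)
    index.close()
    return stats


per_node = index_one_at_a_time(range(500))
batched = index_in_one_batch(range(500))
print(per_node.opens, per_node.flushes)
print(batched.opens, batched.flushes)
```

Both functions store the same nodes; only the number of open and flush operations differs, and that is the cost the batching suggestion is aimed at.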
Comment #2
jeremy commented
Ah, I see now that this is what you're talking about re the static in your comment, and I see the comment in your code now. Okay, I'll help track down why the index is unusable when you use a static.
Comment #3
jeremy commented
Passing the whole db_query result into the reindex function and looping through all the nodes there is extremely fast and generates a valid index. I'm still trying to understand why using a static does not work.
Comment #4
jeremy commented
The issue had nothing to do with using a static. Instead, it's the way the PHP bindings work. After you are done making changes to a writable Xapian database instance, you have to set it to NULL; otherwise the changes won't be flushed to disk.
The attached patch adds a new "flush" parameter to the _xapian_init_database() function, and updates all instances of opening a writable database to call the init function when finished, causing all changes to be flushed to disk.
It now takes ~35 seconds to index 5,000 devel-generated nodes on my test server.
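The general pattern behind the fix (changes stay buffered until the writer is explicitly released) can be sketched in Python. This is an analogy only, not the patch and not the Xapian bindings: `BufferedWriter`, `release()`, and `writable()` are hypothetical names, and a plain list stands in for the on-disk index.

```python
from contextlib import contextmanager


class BufferedWriter:
    """Hypothetical writer: changes stay in memory until release() runs."""

    def __init__(self, disk):
        self._disk = disk      # a list standing in for the on-disk index
        self._pending = []

    def add(self, item):
        self._pending.append(item)

    def release(self):
        # Only at this point do buffered changes reach the "disk".
        self._disk.extend(self._pending)
        self._pending = []


@contextmanager
def writable(disk):
    """Guarantee the writer is released, even if indexing raises."""
    writer = BufferedWriter(disk)
    try:
        yield writer
    finally:
        writer.release()


# Forgetting to release: nothing is persisted.
forgotten_disk = []
writer = BufferedWriter(forgotten_disk)
writer.add("node/1")

# Releasing through the context manager: the change is persisted.
flushed_disk = []
with writable(flushed_disk) as writer:
    writer.add("node/1")

print(forgotten_disk, flushed_disk)
```

Tying the release to a single helper means callers cannot skip it by accident, which is roughly what routing every writable open through one init function achieves.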
Comment #5
singularo commented
Yes, that seems to have solved the problem nicely. On my test db, it reindexes the whole db in 46 seconds for over 6000 nodes, which is not bad ;-)
Reindexed 6064 nodes successfully, 2 failed in: 46542.6700115 ms
Results afterwards are very good as well.
Xapian:
First query (1 match)
Query: Xapian::Query(Zfrog:(pos=1))
Query time: 25.1049995422ms
Subsequent (many matches)
Query: Xapian::Query(Zhonda:(pos=1))
Query time: 1.90591812134ms
vs core search
First Query (1 match)
Query: frog
Query time: 55.0000667572ms
Subsequent (many matches)
Query: honda
Query time: 463.091850281ms
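For a rough sense of scale, the figures quoted in this thread can be turned into indexing throughput and query speedups. The sketch below is plain arithmetic on those numbers; the "nearly 3 minutes" and "~35 seconds" timings are approximate, and the runs come from different servers and data sets, so the ratios are indicative only.

```python
def nodes_per_second(nodes, seconds):
    """Indexing throughput."""
    return nodes / seconds


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Indexing (timings for the first two are approximate in the thread).
before_patch = nodes_per_second(500, 3 * 60)              # "nearly 3 minutes"
after_patch = nodes_per_second(5000, 35)                   # "~35 seconds"
full_reindex = nodes_per_second(6064, 46542.6700115 / 1000)

# Queries: core search time divided by Xapian time.
first_query = speedup(55.0000667572, 25.1049995422)
many_matches = speedup(463.091850281, 1.90591812134)

print(f"indexing: {before_patch:.1f} -> {after_patch:.1f} nodes/s "
      f"(full reindex: {full_reindex:.1f} nodes/s)")
print(f"query speedup: {first_query:.1f}x (1 match), "
      f"{many_matches:.0f}x (many matches)")
```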
Comment #6
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.