Indexing is slow, yet using the static variable causes the index to be unusable.

We need to figure out why this happens, or work around it, so that index creation is much faster.

Perl takes only about a minute to index 5000 nodes; the PHP code should take close to the same time.

Comment  File                 Size     Author
#4       xapian.module.patch  1.78 KB  jeremy

Comments

jeremy

Are these 5,000 nodes generated by the devel module?

I'm testing this module on my development server, populating nodes using the devel module. I've not tried the Perl indexer, but Xapian took nearly 3 minutes to index only 500 nodes, which I think qualifies as slow.

Reading through the code, I see that you open the database, write to it, and close/flush it for each and every node that is indexed -- wouldn't it make more sense to open it once, write a whole bunch of nodes, then close/flush it?
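To make the suggestion concrete, here is a minimal sketch of the two patterns. It is written in Python with a stand-in class rather than the module's PHP or the real Xapian bindings; `FakeWritableDb`, `index_per_node`, and `index_batched` are hypothetical names, and the flush counter simply stands in for the expensive write to disk.

```python
class FakeWritableDb:
    """Stand-in for a writable index handle (not the Xapian API)."""

    flush_count = 0  # total flushes across all handles, for comparison

    def __init__(self, store):
        self.store = store   # dict acting as the on-disk index
        self.pending = {}    # changes buffered in memory

    def add_document(self, nid, text):
        self.pending[nid] = text

    def close(self):
        # Flushing is the costly step: pending changes go to "disk".
        self.store.update(self.pending)
        self.pending.clear()
        FakeWritableDb.flush_count += 1


def index_per_node(store, nodes):
    """Open, write, and flush once for every node (the slow pattern)."""
    for nid, text in nodes:
        db = FakeWritableDb(store)
        db.add_document(nid, text)
        db.close()


def index_batched(store, nodes):
    """Open once, write every node, then flush a single time."""
    db = FakeWritableDb(store)
    for nid, text in nodes:
        db.add_document(nid, text)
    db.close()


nodes = [(nid, f"body of node {nid}") for nid in range(1, 501)]

slow_store, fast_store = {}, {}
FakeWritableDb.flush_count = 0
index_per_node(slow_store, nodes)
per_node_flushes = FakeWritableDb.flush_count

FakeWritableDb.flush_count = 0
index_batched(fast_store, nodes)
batched_flushes = FakeWritableDb.flush_count

print(per_node_flushes, batched_flushes)
```

Both functions end up with the same stored content; the only difference is how many times the costly flush runs, which is where the per-node pattern would be expected to lose its time.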

jeremy

Ah, I see now that this is what you're talking about re the static in your comment, and I see the comment in your code now. Okay, I'll help track down why the index is unusable when you use a static.

jeremy

Passing the whole db_query result into the reindex function, and then looping there through all the nodes is extremely fast and generates a valid index. I'm still trying to understand why using a static does not work.

jeremy

Status: Active » Needs review
Status  File                 Size
new     xapian.module.patch  1.78 KB

The issue had nothing to do with using a static. Instead, it's the way the PHP bindings work. After you are done making changes to a writable Xapian database instance, you have to set it to NULL; otherwise the changes won't be flushed to disk.

The attached patch adds a new "flush" parameter to the _xapian_init_database() function, and updates every place that opens a writable database to call the init function again when finished, so that all changes are flushed to disk.
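The idea can be sketched as follows. This is an illustration in Python with a stand-in class, not the patch itself and not the Xapian PHP bindings: `init_database`, `FakeWritableDb`, and the module-level cache are hypothetical, and the sketch assumes CPython, where dropping the last reference to an object runs its destructor immediately.

```python
_writable_handle = None  # module-level cache, playing the role of the static


class FakeWritableDb:
    """Stand-in handle that only writes to "disk" when it is destroyed."""

    def __init__(self, store):
        self.store = store   # dict acting as the on-disk index
        self.pending = {}    # changes buffered in memory

    def add_document(self, nid, text):
        self.pending[nid] = text

    def __del__(self):
        # Runs when the last reference goes away (immediate in CPython).
        self.store.update(self.pending)


def init_database(store, flush=False):
    """Return the cached writable handle, or release it when flush is set."""
    global _writable_handle
    if flush:
        _writable_handle = None  # dropping the reference triggers the write
        return None
    if _writable_handle is None:
        _writable_handle = FakeWritableDb(store)
    return _writable_handle


disk = {}
for nid in range(1, 4):
    # Every call reuses the same cached handle, so nothing is written yet.
    init_database(disk).add_document(nid, f"node {nid}")

visible_before_flush = len(disk)
init_database(disk, flush=True)
visible_after_flush = len(disk)
print(visible_before_flush, visible_after_flush)
```

While the cached reference is held, the buffered documents never reach the store; only the explicit call with `flush=True` releases the handle and makes them visible. That mirrors the symptom described above, where a long-lived handle left the index unusable.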

It now takes ~35 seconds to index 5,000 devel generated nodes on my test server.

singularo

Status: Needs review » Fixed

Yes, that seems to have solved the problem nicely. On my test db, it reindexes the whole db, over 6000 nodes, in 46 seconds, which is not bad ;-)

Reindexed 6064 nodes successfully, 2 failed in: 46542.6700115 ms

Results afterwards are very good as well.

Xapian:
First query (1 match)
Query: Xapian::Query(Zfrog:(pos=1))
Query time: 25.1049995422ms

Subsequent (many matches)
Query: Xapian::Query(Zhonda:(pos=1))
Query time: 1.90591812134ms

vs core search

First query (1 match)
Query: frog
Query time: 55.0000667572ms

Subsequent (many matches)
Query: honda
Query time: 463.091850281ms

Anonymous

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.