indexing nodes in search/cron increasingly getting slower until it stops working
I have a 5.1 Drupal site with about 7000 nodes. When I started to index the site (using cron.php) everything seemed to work pretty well at the start. I was indexing about 100 nodes per cron run with no problems.
After several runs of cron, I started to get timeout errors (240 second timeouts) from php, so I reduced the number of search items to 50. Everything worked well for some time again and then I started to get the timeouts again. So I then reduced it to 20. Again, everything was happy for a while, but then, I started getting the timeouts. I reduced it to 10 - same thing happened again - after about 10-20 runs of cron.php, I started getting the timeouts.
So I can no longer run cron.php at the lowest setting (10) without getting the time outs. There are still 771 nodes to index (89%) but so far, I can't get any more to index no matter how many times I run cron.php (and time out).
I went through the database tuning documents to make sure I had things set up correctly. I'm using MySQL with the innodb engine (as recommended), query caching is turned on with a 16MB cache size. I've "optimized" the search_* tables. I've turned on query logging (both regular and slow queries). I've isolated the site so that no other operations are occuring other than the search indexing within cron.
At present, there are no log entries for any slow queries. The query log itself is quite large. In the query log, I see a very long list of insert/selects/updates that follow an obvious pattern. I see many of these types of queries
222 Query SELECT SUM(score) FROM search_index WHERE word = '742'
222 Query UPDATE search_total SET count = 0.0104654336782 WHERE word = '742'
222 Query INSERT INTO search_index (word, sid, type, score) VALUES ('grains', 6534,'node',4)
It would appear that the indexing gets slower and slower as the number of nodes that have been indexed grows. Since there are Drupal sites that have a lot more than 7000 nodes to index, I have to believe it is either some local issue or there is some optimization that I can't figure out.
Does anyone have any suggestions on how I can get the indexing to speed up so it can run to completion? And would such tricks help me as new nodes are added or if I need to reindex the site for some purpose?

found a slow query
I optimized all of the search tables again and reran the cron/indexer a few times. This time, I found a couple of entries for slow queries:
# Time: 070427 17:19:27
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 11 Lock_time: 0 Rows_sent: 0 Rows_examined: 450938
use nutrition;
SELECT t.word AS realword, i.word FROM search_total t LEFT JOIN search_index i ON t.word = i.word WHERE i.word
IS NULL;
# Time: 070427 17:27:28
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 11 Lock_time: 0 Rows_sent: 0 Rows_examined: 451605
SELECT t.word AS realword, i.word FROM search_total t LEFT JOIN search_index i ON t.word = i.word WHERE i.word
IS NULL;
These queries are called in search_update_totals(). But there really aren't a lot of these queries in the log.
The indexes for these tables seem to be in order but I noticed that the search_index table didn't have any primary keys defined. After poking around, I decided to try to assign the word, sid, and type fields as a primary key. Since the inserts into the search_index table make up almost 50% of the queries generated by the cron task, I figure anything to make those faster would be help.
After making the change to the database, the performance of the cron-initiated indexing for 10 nodes was reduced by more than 60%.
Does anyone see any issue or future problems with me defining word, sid, and type as the primary key for the search_index table?
Same problem here, though I
Same problem here, though I use MyISAM as backend. I only have 1650 items to be indexed and the indexing process renders slower and slower with each iteration. After I set 'word' to be the primary key it is slightly improving (can't set more primary keys, is that a InnoDB feature?).
Now, when I set it to '500' it takes 2m43 the first time. For the second run I've set it to '100' and it takes 2m34 (just a little bit faster). Setting it back to '500' will take 12m29 for the third run. Setting it back to 100 for the fourth run will take 10m ... still slow, don't you think?
This is on a hosting I can't tweak the mysql settings for. But 1650 items looks like a rather small site so why would I experience these drastic performance problems?
Could you clarify
When you added the primary key was the performance reduced by 60% or was the time used for the query reduced by 60%?
-----------------------------------------
Blog: Adding Understanding
Drupal Developer Search Engine