Performance: node_update_index() slow with large numbers of nodes
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | search.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | Performance |
I've been looking into using Drupal with millions of nodes. One problem: the search indexer is slow. This tends to cause CPU quota errors on shared hosting every time cron runs.
One of the slow queries is in node_update_index():
SELECT n.nid FROM node n LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC LIMIT 0, 100;
Which does a full table scan on the {node} table, and also has the painful "Using temporary; Using filesort"
It's possible to fix this with two changes:
1. Adding an index to search_dataset to avoid the filesort
2. Ensuring that every node in the database has a corresponding row in search_dataset
Change #2 would have the disadvantage of an extra database write every time a node is created. That may or may not be an acceptable tradeoff to the core maintainers.

#1
Sub.
#2
Subscribe... but I don't know if this will ever be fixed.
#3
Hi
Could any of you give me more details on how to fix this problem? I have little experience with mysql and the following two actions are not clear enough for me.
>1. Adding an index to search_dataset to avoid the filesort
>2. Ensuring that every node in the database has a corresponding row in search_dataset
Thanks in advance
#4
You can find instructions at
http://wtanaka.com/drupal/million-nodes-6
under the bug 312393 heading
That section header was previously mislabeled as bug 312939, but it's labeled correctly now.
#5
Hi
A few days ago I I already tried to fix bug 312939, according to the instructions at http://wtanaka.com/drupal/million-nodes-6.
As I posted to the blog at that page, at step 3, I got the following error:
INSERT INTO drupal_search_dataset( sid,
TYPE , reindex )
SELECT nid AS sid, 'node' AS
TYPE , 2147483647 AS reindex
FROM drupal_node
MySQL said:
#1062 - Duplicate entry '1-node' for key 1
I have no idea on what's wrong. Any clue?
I am really interested in solving this problem because, since a few weeks ago, the search capabilities of my system are blocked (more than 10,000 files attached. I also commented the problem at http://wtanaka.com/drupal/million-nodes-6).
I am considering other solutions different to core search (plus search files module, such as apache solr + solr attachment, but I have not been able to make it work correctly yet.
Thanks in advance and regards
#6
I see it now and have answered you there: http://wtanaka.com/drupal/million-nodes-6#comment-5941
#7
Here is a first pass at a patch for D7. Sooner or later, the tests will fail, because it uses a hook_update_70XX with the same number as the patch in #336483: Performance: SELECT MAX(comment_count) FROM node_comment_statistics does full table scan.
#8
#9
Just tagging appropriately
#10
*facepalms*
... and I forgot to attach the actual patch.
#11
The last submitted patch failed testing.