Queries on search admin and node indexing are slow for many-node sites [#312395]

Comment	File	Size	Author
#53	queries_on_search_admin-312395-53.patch	10.4 KB	manuel garcia
#47	explain2.png	15.66 KB	jhodgdon
#47	explain.png	15.28 KB	jhodgdon
#41	interdiff-312395-34-41.txt	8.61 KB	travis-bradbury
#41	indexing-slow-on-many-node-sites-312395-41.patch	11.2 KB	travis-bradbury
#41	indexing-slow-on-many-node-sites-312395-41-test-only.patch	7.79 KB	travis-bradbury
#34	interdiff-312395-31-34.txt	1.17 KB	travis-bradbury
#34	indexing-slow-on-many-node-sites-312395-34.patch	2.58 KB	travis-bradbury
#31	indexing-slow-on-many-node-sites-312395-31.patch	2.49 KB	travis-bradbury
#31	interdiff-312395-22-31.txt	2.52 KB	travis-bradbury
#22	indexing-slow-on-many-node-sites-312395-22.patch	2.61 KB	travis-bradbury

Comment #1

robertdouglass commented 25 September 2008 at 19:02

Sub.

Comment #2

slantview commented 25 September 2008 at 20:54

Have you tried converting your tables to InnoDB? Just wondering if there would be any performance gain from doing that.

Comment #3

Wesley Tanaka commented 26 September 2008 at 04:50

I haven't tried InnoDB.

Comment #4

caole261188 commented 29 September 2008 at 01:40

subscribe

Comment #5

jhodgdon commented 5 March 2010 at 18:45

Version:	6.4	» 7.x-dev

This needs to be fixed in Drupal 7 first, then backported to Drupal 6.

Comment #6

jhodgdon commented 6 March 2010 at 00:27

I looked into this for Drupal 7, where the path is now admin/config/search/settings -- just to see what it was doing and verify it's the same in D7... (it is):

In both D6 and D7, the respective path's page callback is search_admin_settings(), and the offending queries are called from code that looks like this in D7 (very similar in D6 except the active modules bit and it's hook_search($op = 'status') instead of hook_search_status()):

  $remaining = 0;
  $total = 0;
  foreach (variable_get('search_active_modules', array('node', 'user')) as $module) {
    if ($status = module_invoke($module, 'search_status')) {
      $remaining += $status['remaining'];
      $total += $status['total'];
    }
  }

node_search_status() [D7] or node_search($op='status') [D6] contains the slow query.

Comment #7

jhodgdon commented 6 March 2010 at 00:32

So there are a couple of things to consider here:
a) We'll probably be removing the WHERE -- see #239196: Indexing status shown on search settings page is incorrect
b) There is no index in the node table on field status -- there are just a couple of combined ones that include status and some other fields. Not sure why?
c) The only index in search_dataset is the primary key (combination of sid and type).

Maybe we could/should add indexes:
- status field for node
- sid, reindex, type for search_dataset (three different indexes)

I don't know enough about indexes and databases to know whether this would be a good idea or not?

Comment #8

jhodgdon commented 6 March 2010 at 16:22

Title:	admin/settings/search is slow with many nodes	» Queries on search admin and node indexing are slow for many-node sites

I've just closed #312393: Performance: node_update_index() slow with large numbers of nodes as a duplicate of this issue, because it's pointing out that nearly the same query that happens during node_update_index() is also slow. The query in question is:

SELECT n.nid FROM node n LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC LIMIT 0, 100;

This is nearly the same query as above. The difference is that the Search admin page version counts the matching entries, while this one finds the first N nodes to index.
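
To see the two query shapes side by side, here is a minimal sketch that runs them against SQLite through Python's standard sqlite3 module. The schema is a simplified, hypothetical stand-in for {node} and {search_dataset}, with only the columns these queries touch, and the sample rows are made up. SQLite, like MySQL, sorts NULL first under ORDER BY ... ASC. That is why never-indexed nodes (no search_dataset row) come out ahead of nodes that are merely flagged.

```python
import sqlite3

# Simplified, hypothetical versions of {node} and {search_dataset}.
SCHEMA = """
CREATE TABLE node (nid INTEGER PRIMARY KEY);
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    type TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, type)
);
"""

# Indexing (cron) side: the next N nodes that were never indexed
# (no search_dataset row, so d.sid is NULL) or are flagged for reindexing.
NEXT_BATCH = """
SELECT n.nid FROM node n
LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid
WHERE d.sid IS NULL OR d.reindex <> 0
ORDER BY d.reindex ASC, n.nid ASC
LIMIT ?
"""

# Admin page side: the same rows, counted instead of listed.
REMAINING = """
SELECT COUNT(*) FROM node n
LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid
WHERE d.sid IS NULL OR d.reindex <> 0
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO node (nid) VALUES (?)", [(i,) for i in range(1, 6)])
# Node 1 is indexed and up to date; node 2 is indexed but flagged with a timestamp.
db.execute("INSERT INTO search_dataset VALUES (1, 'node', 0)")
db.execute("INSERT INTO search_dataset VALUES (2, 'node', 1236355200)")
# Nodes 3-5 have no search_dataset row at all.

batch = [nid for (nid,) in db.execute(NEXT_BATCH, (100,))]
remaining = db.execute(REMAINING).fetchone()[0]
print(batch, remaining)  # [3, 4, 5, 2] 4
```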

Comment #9

jhodgdon commented 6 March 2010 at 16:23

Issue tags:	+Performance, +cron

adding tags

Comment #10

damien tournoud commented 6 March 2010 at 17:01

Counting will always be slow, even if you add an index, as long as the dataset you are counting is large.

These queries do two things: they select (or count) the nodes marked as needing reindexing in search_dataset, and they select (or count) the nodes that have never been indexed (i.e. are not yet in search_dataset).

There is only one real way to improve that: make sure that every node has an entry in search_dataset (as identified by Wesley Tanaka in the other issue). This way you can limit the query to one table, and you can add the indexes to make it reasonably fast. This is basically what the Apachesolr integration module does.
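
Here is a minimal sketch of the single-table approach. It assumes every node is guaranteed a {search_dataset} row from the moment it is created. It uses Python's sqlite3 with a simplified, hypothetical schema, and the index name search_dataset_type_reindex is made up for illustration. Once that invariant holds, both the count and the batch query read only {search_dataset}, so an index on (type, reindex) becomes usable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    type TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, type)
);
-- Hypothetical index: only useful once every node is guaranteed a row.
CREATE INDEX search_dataset_type_reindex ON search_dataset (type, reindex);
""")

# Assumed invariant: each node gets a row when it is created, and a non-zero
# reindex means "needs indexing" (1 for new nodes, a timestamp for edits).
rows = [(1, "node", 0), (2, "node", 1234567890), (3, "node", 1), (4, "node", 1)]
db.executemany("INSERT INTO search_dataset VALUES (?, ?, ?)", rows)


def remaining(db, search_type="node"):
    """Count items still needing indexing, touching only search_dataset."""
    (count,) = db.execute(
        "SELECT COUNT(*) FROM search_dataset WHERE type = ? AND reindex <> 0",
        (search_type,),
    ).fetchone()
    return count


def next_batch(db, limit, search_type="node"):
    """Return the next sids to index, lowest reindex value first."""
    return [sid for (sid,) in db.execute(
        "SELECT sid FROM search_dataset WHERE type = ? AND reindex <> 0 "
        "ORDER BY reindex ASC, sid ASC LIMIT ?",
        (search_type, limit),
    )]


print(remaining(db), next_batch(db, 10))  # 3 [3, 4, 2]
```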

Comment #11

jhodgdon commented 6 March 2010 at 17:28

Really, there is only one way to improve it? I was under the impression that adding indices to database tables generally improves speed, even if there are joins involved? But I'm not any kind of expert in database optimization...

Comment #12

damien tournoud commented 6 March 2010 at 17:49

@jhodgdon: in the current situation, the table scan cannot be avoided: we ask the database to list all the rows in {node} that don't have an entry in {search_dataset}. The only way to do that is to scan {node}, look up the value in {search_dataset} ([sid, type] is a primary key, so the database engine will use that) and return the row if no value is found. No index can improve the situation here. Adding a random index will only do one thing: reduce insert performance.
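
The point about the unavoidable scan can be checked on any engine that explains its query plans. This sketch asks SQLite, via Python's sqlite3 and a simplified, hypothetical schema, for the plan of the "never indexed" part of the query. The exact wording of the plan lines differs between SQLite versions, and MySQL's EXPLAIN output looks different, so treat the result as illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY);
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    type TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, type)
);
""")

# "Rows in {node} with no matching row in {search_dataset}" (an anti-join).
ANTI_JOIN = """
SELECT n.nid FROM node n
LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid
WHERE d.sid IS NULL
"""

# The last column of each EXPLAIN QUERY PLAN row is the readable description.
plan = [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN)]
for step in plan:
    print(step)
```

This typically prints a SCAN step for node, because every row is visited. It also prints a SEARCH step for search_dataset that uses the primary-key index. That matches the description above: the lookup side is indexed, but the outer scan of {node} remains.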

Log in or register to post comments

Comment #13

geerlingguy commented 6 March 2010 at 19:14

Subscribe.

Comment #14

jhodgdon commented 7 March 2010 at 15:25

Damien Tournoud: Thanks for the explanation... I thought that maybe having those indexes might help with the WHERE part of the operation, but I see your point that the main slowdown is in the join-with-nulls.

So.... Back to the idea of having the node module add each node to the search dataset table when it's first created....

Let's see.

node_update_index() currently finds nodes either never added to {search_dataset} or with their {search_dataset}.reindex value set to something non-zero, and orders them ascending by .reindex. search_node_update() and the other functions that call search_touch_node() to indicate "this node needs a reindex" currently put REQUEST_TIME in .reindex whenever a node is edited.

So if, during node creation, the node were added to search_dataset with .reindex=1, that could indicate "new node, high priority for reindex". The way to do that would be to add a search_node_insert() function, i.e. implement hook_node_insert().

We would also need to modify the query in node_search_status(), where it calculates $remaining, to remove the join and make it look only for a non-zero .reindex instead of NULL or non-zero.

That should work, I think?
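
The priority scheme described above can be sketched as follows, using Python's sqlite3 and a simplified, hypothetical table. node_inserted(), node_touched() and node_indexed() are illustrative stand-ins, not Drupal APIs. They model a hook_node_insert() implementation, roughly what search_touch_node() does, and the indexer clearing the flag. The timestamp is an arbitrary example value. Because 1 is smaller than any REQUEST_TIME, new nodes sort ahead of edited ones under ORDER BY reindex ASC.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    type TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, type)
)""")

NEW_NODE = 1  # Smaller than any real timestamp, so new nodes are indexed first.


def node_inserted(nid):
    """Stand-in for a hook_node_insert() implementation: add a flagged row."""
    db.execute("INSERT INTO search_dataset VALUES (?, 'node', ?)", (nid, NEW_NODE))


def node_touched(nid, request_time):
    """Stand-in for search_touch_node(): flag an edited node with a timestamp."""
    db.execute("UPDATE search_dataset SET reindex = ? WHERE sid = ? AND type = 'node'",
               (request_time, nid))


def node_indexed(nid):
    """Stand-in for the indexer: clear the flag once the node is indexed."""
    db.execute("UPDATE search_dataset SET reindex = 0 WHERE sid = ? AND type = 'node'",
               (nid,))


def next_batch(limit):
    """Single-table replacement for the LEFT JOIN ... IS NULL query."""
    return [sid for (sid,) in db.execute(
        "SELECT sid FROM search_dataset WHERE type = 'node' AND reindex <> 0 "
        "ORDER BY reindex ASC, sid ASC LIMIT ?", (limit,))]


node_inserted(1)
node_inserted(2)
node_indexed(1)
node_indexed(2)
node_touched(1, request_time=1267975500)  # node 1 edited
node_inserted(3)                          # node 3 created afterwards
print(next_batch(10))  # [3, 1]
```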

Comment #15

jhodgdon commented 7 March 2010 at 15:32

Here's a separate issue that I noticed while thinking about this issue:
#735154: search_touch_node()/search_mark_for_reindex() should not update if already touched

Comment #16

dawehner commented 19 May 2011 at 16:59

One big performance improvement could be gained if the ORDER BY criteria can be dropped.

At least the node.nid part isn't required from my perspective.
The reindex ordering itself makes sense.

Comment #17

sun.core commented 22 September 2011 at 12:33

Issue tags:

+Administration

Comment #18

mgifford commented 20 April 2013 at 13:27

Is this still likely to be an issue in D8?

Comment #19

jhodgdon commented 22 April 2013 at 15:00

Version:	7.x-dev	» 8.x-dev

As far as I know, the queries are identical in Drupal 8.x, so yes.

Comment #20

jhodgdon commented 14 April 2015 at 14:39

Issue summary:	View changes
Issue tags:		+Novice

We probably need to do this in 8 and it should be easy... maybe a good Novice issue?

Comment #21

julfabre commented 14 April 2015 at 14:46

Assigned:	Unassigned	» julfabre
Issue tags:		+drupaldevdays

OK, I'm at Drupal Dev Days and I'll take it!

Comment #22

travis-bradbury commented 16 June 2015 at 21:45

Status	File	Size
new	indexing-slow-on-many-node-sites-312395-22.patch	2.61 KB

Here's a start.

Before the patch, nodes are identified for indexing either by existing in the node table but not in search_dataset, or by existing in search_dataset with a non-zero reindex value.
This patch saves all new nodes to search_dataset with reindex=1. Queries to find nodes requiring indexing now use only the search_dataset table.

Behavior before the patch:
Created two new nodes.
/admin/config/search/pages indicates the status of nodes indexed (e.g. 0 of 2 items indexed).
Ran cron.
Confirmed nodes with a non-zero reindex value have been indexed (now have data in search_dataset and reindex is 0).
/admin/config/search/pages now indicates all items have been indexed.

After the patch:
Created two new nodes.
/admin/config/search/pages indicates the status of nodes indexed (e.g. 2 of 4 items indexed).
Ran cron.
Confirmed nodes with a non-zero reindex value have been indexed (now have data in search_dataset and reindex is 0).
/admin/config/search/pages now indicates all items have been indexed.

Comment #23

jhodgdon commented 16 June 2015 at 21:48

Status:	Active	» Needs review

Thanks very much for the patch!

Don't forget to (a) assign the issue to yourself and (b) set the status to "needs review" when you upload a patch file.

So... This looks pretty reasonable. I'll set it to Needs Review to see if the tests fail. Then if that works, there are a couple of nitpicks to take care of with the patch in code/comments.

Comment #24

jhodgdon commented 16 June 2015 at 21:59

Status:	Needs review	» Needs work

Cosmetic and coding standards things to fix up in this patch:

+++ b/core/modules/node/src/Entity/Node.php
@@ -122,6 +122,18 @@ public function postSave(EntityStorageInterface $storage, $update = TRUE) {
+      // add to search_dataset for performance when counting nodes to be indexed

Comments need to be sentences. The text is OK but needs to start with a capital letter and end with a period; this will probably make the line longer than 80 characters, so it will need to wrap.

+++ b/core/modules/node/src/Entity/Node.php
@@ -122,6 +122,18 @@ public function postSave(EntityStorageInterface $storage, $update = TRUE) {
+          'reindex' => 1, // high priority

We don't use comments on the ends of lines.

I guess I need to point you to the comment standards:

https://www.drupal.org/node/1354#inline

+++ b/core/modules/node/src/Plugin/Search/NodeSearch.php
@@ -415,7 +415,14 @@ public function updateIndex() {
+        "SELECT sid, sd.reindex FROM {search_dataset} sd"
+        . " WHERE sd.reindex <> 0 AND sd.type = :type"
+        . " ORDER BY sd.reindex ASC",
+        0,

I think we usually put operators on the ends of lines, not the beginning of the next line? Check the rest of Core and see.

Or better yet, just make a longer line with the whole string on one line. We do not have an 80-character restriction on code lines, only comments.

+++ b/core/modules/node/src/Plugin/Search/NodeSearch.php
@@ -481,7 +488,11 @@ public function markForReindex() {
+      "SELECT COUNT(DISTINCT sd.sid) FROM {search_dataset} sd"
+      . " WHERE sd.reindex <> 0 AND sd.type = :type",
+      array(':type' => $this->getPluginId()))

See previous comment.

Comment #25

jhodgdon commented 16 June 2015 at 22:01

Also, since this is a Performance issue, I think it would be helpful if we could see how well this query performs under high volume.

Comment #26

17 June 2015 at 05:16

The last submitted patch, 22: indexing-slow-on-many-node-sites-312395-22.patch, failed testing.

Comment #27

jhodgdon commented 17 June 2015 at 17:22

That seems to have been a general bot fail, retesting so we can see what the actual issues might be. Still Needs Work for #24

Comment #28

17 June 2015 at 17:22

Status:	Needs work	» Needs review

jhodgdon queued 22: indexing-slow-on-many-node-sites-312395-22.patch for re-testing.

Comment #29

jhodgdon commented 17 June 2015 at 17:22

Status:	Needs review	» Needs work

Comment #30

17 June 2015 at 17:23

The last submitted patch, 22: indexing-slow-on-many-node-sites-312395-22.patch, failed testing.

Comment #31

travis-bradbury commented 17 June 2015 at 21:53

Assigned:	julfabre	» travis-bradbury

Status	File	Size
new	interdiff-312395-22-31.txt	2.52 KB
new	indexing-slow-on-many-node-sites-312395-31.patch	2.49 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	indexing-slow-on-many-node-sites-312395-22.patch	2.61 KB

Failing test might still be testbot (Setup environment: failed to clear checkout directory.) but I'll submit a new patch rather than retest because I cleaned up the nitpicks from #24.

For performance, I'm seeing significant improvement on my test site with 18200 nodes.

MySQL's BENCHMARK() function, before and after removing the join:

> SELECT BENCHMARK(10000, (SELECT COUNT(DISTINCT n.nid) FROM node AS n LEFT JOIN search_dataset AS sd ON sd.sid=n.nid AND sd.type='node_search' WHERE sd.sid IS NULL OR sd.reindex <> 0)) AS PRE;                                            
+-----+
| PRE |
+-----+
|   0 |
+-----+
1 row in set (0.29 sec)

> SELECT BENCHMARK(10000, (SELECT COUNT(DISTINCT sd.sid) FROM search_dataset AS sd WHERE sd.reindex <> 0 AND sd.type='node_search')) AS POST;                                                                                                
+------+
| POST |
+------+
|    0 |
+------+
1 row in set (0.01 sec)

I also saw an improvement in time per request using ab (some information removed from output for brevity).

Before the patch:

$ ab -c1 -n100 -C "SESS=cookie" "http://drupal8-dev.localhost/admin/config/search/pages"

Server Software:        Apache/2.4.10
Document Path:          /admin/config/search/pages
Time taken for tests:   24.163 seconds
Requests per second:    4.14 [#/sec] (mean)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   194  242  29.0    239     304
Waiting:      185  231  28.2    229     292
Total:        194  242  29.0    239     304

After the patch:

$ ab -c1 -n100 -C "SESS=cookie" "http://drupal8-dev.localhost/admin/config/search/pages"

Server Software:        Apache/2.4.10
Document Path:          /admin/config/search/pages
Time taken for tests:   14.748 seconds
Requests per second:    6.78 [#/sec] (mean)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   109  147  24.9    147     247
Waiting:      100  137  24.0    136     233
Total:        109  147  24.9    147     248

Edit:
Forgot to mention an issue: nodes that were created but not indexed before applying the patch (i.e. cron hasn't run yet, or has run but didn't index everything before reaching its limit) will no longer be detected as requiring indexing.

Comment #32

travis-bradbury commented 17 June 2015 at 23:01

Status:	Needs work	» Needs review

Summoning testbot.

Edit: Okay, there is definitely a major problem with the patch. At first glance, I think a lot of tests are failing because they try to create a new node, which fails because there is no simpletest12345search_dataset table.

Comment #33

17 June 2015 at 22:17

Status:	Needs review	» Needs work

The last submitted patch, 31: indexing-slow-on-many-node-sites-312395-31.patch, failed testing.

Comment #34

travis-bradbury commented 18 June 2015 at 17:59

Status:	Needs work	» Needs review

Status	File	Size
new	indexing-slow-on-many-node-sites-312395-34.patch	2.58 KB
new	interdiff-312395-31-34.txt	1.17 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff-312395-22-31.txt	2.52 KB
hidden	indexing-slow-on-many-node-sites-312395-31.patch	2.49 KB

Locally, tests are passing if I require the search module in test classes or check that the search module is installed before trying to insert into {search_dataset}. I changed the patch to do the latter.

I'm not sure that it's the best solution. Would it be better to make a larger change, such as one of the following?
a) Create a function similar to node_reindex_node_search() that does the check for the search module and calls a function from the search module similar to search_mark_for_reindex().
b) Create functions to replace node_reindex_node_search() and search_mark_for_reindex(), where the new functions update search_dataset.reindex if the node already exists, or create a new entry in search_dataset where necessary.

Of the two, I think I'd prefer a) because it leaves detection of an update (versus a new node) to the node module, which already has if ($update).

That would still leave us with the problem of not-yet-indexed nodes no longer being detected for indexing after the patch is applied.
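
Option b) amounts to an "update, or insert if missing" helper. Here is a minimal sketch with Python's sqlite3 and a simplified, hypothetical table. mark_for_reindex() is an illustrative name, not the existing search_mark_for_reindex(). In Drupal itself, a merge query (db_merge() in D7, Connection::merge() in D8) would be the more natural way to express this, since it avoids the gap between the UPDATE and the INSERT.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    type TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, type)
)""")


def mark_for_reindex(db, sid, reindex, search_type="node_search"):
    """Flag an existing row for reindexing, or create the row if it is missing."""
    cur = db.execute(
        "UPDATE search_dataset SET reindex = ? WHERE sid = ? AND type = ?",
        (reindex, sid, search_type),
    )
    if cur.rowcount == 0:
        # No row yet (new node, or the index was wiped): insert one instead.
        db.execute(
            "INSERT INTO search_dataset (sid, type, reindex) VALUES (?, ?, ?)",
            (sid, search_type, reindex),
        )


mark_for_reindex(db, 7, 1)           # new node: row is created
mark_for_reindex(db, 7, 1434650000)  # later edit: the same row is updated
rows = db.execute("SELECT sid, reindex FROM search_dataset").fetchall()
print(rows)  # [(7, 1434650000)]
```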

Comment #35

18 June 2015 at 18:22

Status:	Needs review	» Needs work

The last submitted patch, 34: indexing-slow-on-many-node-sites-312395-34.patch, failed testing.

Comment #36

jhodgdon commented 18 June 2015 at 20:52

So... You know, the more I think about it, the more I think this approach is not going to work.

What happens if you have a site up and have some nodes in it already, and then you enable the (patched) Search module? The nodes that predate the Search module are not going to be in the index, and I think they'll never be found for indexing with the new queries, right?
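
The gap can be reproduced in a few lines. This sketch (Python's sqlite3, simplified hypothetical schema) models a site where nodes 1 to 3 existed before the patched Search module was enabled, so they never got a {search_dataset} row. Node 4 was created afterwards and got a row with reindex = 1. The single-table count only sees node 4, while the original LEFT JOIN count sees all four.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY);
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    type TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, type)
);
""")
# Nodes 1-3 predate the Search module, so nothing was written for them.
db.executemany("INSERT INTO node (nid) VALUES (?)", [(1,), (2,), (3,)])
# Node 4 was created afterwards; the patched save logic added it with reindex = 1.
db.execute("INSERT INTO node (nid) VALUES (4)")
db.execute("INSERT INTO search_dataset VALUES (4, 'node_search', 1)")

single_table = db.execute(
    "SELECT COUNT(*) FROM search_dataset WHERE type = 'node_search' AND reindex <> 0"
).fetchone()[0]
with_join = db.execute("""
    SELECT COUNT(*) FROM node n
    LEFT JOIN search_dataset d ON d.type = 'node_search' AND d.sid = n.nid
    WHERE d.sid IS NULL OR d.reindex <> 0
""").fetchone()[0]
print(single_table, with_join)  # 1 4
```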

Comment #37

travis-bradbury commented 18 June 2015 at 21:08

I've had a look at the last tests that failed.
\Drupal\search\Tests\SearchMultilingualEntityTest has a few tests that look at how many rows are in search_dataset to test indexing. Those tests are wrong now that the patch changes when nodes are added to search_dataset. Fixing them should be straightforward.

Quoting #36: "What happens if you have a site up and have some nodes in it already, and then you enable the (patched) Search module? The nodes that predate the Search module are not going to be in the index, and I think they'll never be found for indexing with the new queries, right?"

You're right. I'm not sure how to solve that. The only way I can see to guarantee every node has been indexed is to join {node} with {search_dataset} and look for nodes not in {search_dataset}, which is exactly what we started with. Is there another approach?

Comment #38

travis-bradbury commented 18 June 2015 at 21:56

Regarding existing sites enabling the search module, what if we added all nodes to search_dataset during install of the search module? For each row in {node} we'd insert a row into {search_dataset}.

Enabling search suddenly on a site that already has many nodes should be less work than the original query (joining tables and looking for nulls), right?

Comment #39

travis-bradbury commented 19 June 2015 at 00:09

I was playing around, assuming others would be on board with my suggestion in #38, and it looks like it'd be reasonably quick.

This statement inserts a row into search_dataset for every node and it could be used in search's hook_install().

MariaDB [drupal8_dev]> INSERT INTO search_dataset (sid, langcode, reindex, data, type) SELECT nid, langcode, 1, '', 'node_search' FROM node;
Query OK, 18200 rows affected (1.16 sec)

I'm sure that the performance of 1.16 seconds for 18,000 nodes would vary a lot in other environments, but even a few extra seconds to enable a module seems fair to me. If you're enabling search on an existing site that already has tens of thousands of nodes (or more), you've got bigger problems to worry about, like actually getting them all indexed, right?

Comment #40

jhodgdon commented 19 June 2015 at 13:49

I think that is a fine idea. We would need to make sure the INSERT was using ANSI SQL syntax so that it would work in all databases. We would also need a test for this new behavior that would create a couple of nodes, then enable the Search module, run indexing (you can see how other tests are doing that part), and verify those nodes show up in search results. As a bonus you could test the search status just after enabling Search (should say 2 items to index or however many nodes you made) and after indexing (should say 100%).

Great progress, thanks tbradbury!
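
Here is the test scenario from the comment above, reduced to a sketch with Python's sqlite3. The schema is a simplified, hypothetical version of {node} and {search_dataset}. enable_search() reuses the INSERT ... SELECT from #39. run_indexing() only marks rows as indexed rather than tokenizing anything. status() returns the (remaining, total) pair that the admin page's progress message is based on. A real core test would go through Drupal's test base classes and the Search module's own APIs instead.

```python
import sqlite3


def make_site():
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, langcode TEXT NOT NULL);
    CREATE TABLE search_dataset (
        sid INTEGER NOT NULL,
        langcode TEXT NOT NULL,
        type TEXT NOT NULL,
        data TEXT NOT NULL,
        reindex INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (sid, langcode, type)
    );
    """)
    return db


def enable_search(db):
    """Hypothetical install step: queue every existing node (statement from #39)."""
    db.execute(
        "INSERT INTO search_dataset (sid, langcode, reindex, data, type) "
        "SELECT nid, langcode, 1, '', 'node_search' FROM node"
    )


def run_indexing(db):
    """Stand-in for a cron run: 'index' every flagged row and clear its flag."""
    db.execute(
        "UPDATE search_dataset SET data = 'indexed', reindex = 0 "
        "WHERE type = 'node_search' AND reindex <> 0"
    )


def status(db):
    """Return (remaining, total) as the search admin page would report them."""
    remaining = db.execute(
        "SELECT COUNT(*) FROM search_dataset WHERE type = 'node_search' AND reindex <> 0"
    ).fetchone()[0]
    total = db.execute("SELECT COUNT(*) FROM node").fetchone()[0]
    return remaining, total


db = make_site()
db.executemany("INSERT INTO node VALUES (?, 'en')", [(1,), (2,)])  # created before Search
enable_search(db)
before = status(db)
run_indexing(db)
after = status(db)
print(before, after)  # (2, 2) (0, 2)
```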

Comment #41

travis-bradbury commented 1 July 2015 at 22:49

Status:	Needs work	» Needs review

Status	File	Size
new	indexing-slow-on-many-node-sites-312395-41-test-only.patch	7.79 KB
new	indexing-slow-on-many-node-sites-312395-41.patch	11.2 KB
new	interdiff-312395-34-41.txt	8.61 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	indexing-slow-on-many-node-sites-312395-34.patch	2.58 KB
hidden	interdiff-312395-31-34.txt	1.17 KB

Added search_install() to search.install to populate {search_dataset} with necessary information from {node}.
Wrote tests to:
- Ensure correct count of nodes needing to be indexed before and after indexing.
- Ensure indexed nodes actually appear in search results.
- Ensure nodes created before enabling the search module are queued for indexing.
Updated tests in SearchMultilingualEntityTest to reflect correct counts after changing the module's behavior.

The INSERT ... SELECT syntax used in search.install is supported by SQLite and PostgreSQL. I tested it with SQLite 2.8.17 to be sure, and I understand that Drupal requires 3.6.8 (up from 3.3), so I think we should be good for compatibility.

Comment #42

1 July 2015 at 22:19

The last submitted patch, 41: indexing-slow-on-many-node-sites-312395-41-test-only.patch, failed testing.

Comment #43

jhodgdon commented 2 July 2015 at 14:26

Status:	Needs review	» Needs work

I'm a bit worried about this solution. We've had an expectation since Drupal Ancient History that the search index tables were data that could be recreated:
- If a site's search database seems corrupted (which can happen), the usual solution you'd find all over our docs and the Internet is "Clear out these search database tables and run cron until they're all created again". This would no longer be possible, because it would never be recreated properly.
- The default settings for modules like Backup and Migrate are to create but not populate the search tables, for the same reason: their data can be recreated (or at least it could be until this patch).
- We have an issue out requesting a feature of a Clear Index button, to get rid of corruption.
- Etc.

So I am not all that comfortable with modifying that expectation, as this patch does.

But I had an idea: instead of doing this in search_install(), which is a one-time thing and really it shouldn't be assuming knowledge of the NodeSearch plugin anyway... What if we instead put a similar query at the beginning of each search indexing run? It might slow down the indexing again, but it would at least make it so that on the search admin page, the query could be faster as in this patch.

Then we could also put something on the screen that says that counts of completeness are only accurate as of the last cron run.

Thoughts?

Comment #44

jhodgdon commented 2 July 2015 at 14:39

Also one other point on this:

It is entirely possible (and even likely) to enable the core Search module but not enable/define any NodeSearch pages (you could be using a contrib module that defines its own search type). In this case, without this patch, nodes would never be added to the node_search index, because the indexing methods on NodeSearch are not called if there is no NodeSearch page defined.

So this patch, as it is, would slow down every node save operation by adding an unnecessary entry to the search database, and also would grow the search database unnecessarily, since it really shouldn't have any nodes in it at all in this case.

So... really I think we just need to do what I suggested in #43: at the beginning of search indexing in each cron run, add nodes so that the queries don't have to do the NULL check.

Comment #45

travis-bradbury commented 2 July 2015 at 22:13

I'm definitely on board with not changing the expectation that the search index can be wiped out and re-created. The patch relies on all nodes being added on creation or on install of the search module, which is definitely more fragile for the reasons you pointed out.

So, we're looking at:

- Updating the list of nodes to be indexed every cron run.
- Querying only {search_dataset} for the admin page, which means it'd be fast but only accurate as of the last cron run.

Are we left with the original slow query for each cron run? A reasonably quick way to update {search_dataset} with new nodes is INSERT IGNORE INTO search_dataset (sid, langcode, reindex, type, data) SELECT nid, langcode, 1, 'node_search', '' FROM node. In my test, it took 0.16s to add 150 new rows to {search_dataset} where {node} had 18350 rows (versus 1.22 seconds to take an empty {search_dataset} and copy every id from {node} into it). The drawback to INSERT IGNORE is that it ignores more than just duplicate keys. I'm not sure the other ignored cases are a concern to us, but we'd also need to worry about other databases (MySQL/MariaDB uses INSERT IGNORE, but SQLite appears to use INSERT OR IGNORE).
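
Here is a small illustration of that drawback, using SQLite's spelling (INSERT OR IGNORE) through Python's sqlite3 and a simplified, hypothetical schema. The statement skips the duplicate primary key for the already-tracked node, as intended. SQLite's IGNORE also silently skips a row that violates a NOT NULL constraint, so a bad row disappears with no error at all. MySQL's INSERT IGNORE differs in the details of which errors it downgrades and how. This shows the general risk rather than MySQL's exact behavior.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, langcode TEXT);
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    langcode TEXT NOT NULL,
    type TEXT NOT NULL,
    data TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, langcode, type)
);
""")
# Node 3 has a NULL langcode, which search_dataset does not allow.
db.executemany("INSERT INTO node VALUES (?, ?)", [(1, "en"), (2, "en"), (3, None)])
# Node 1 is already tracked and indexed.
db.execute("INSERT INTO search_dataset VALUES (1, 'en', 'node_search', '', 0)")

db.execute("""
    INSERT OR IGNORE INTO search_dataset (sid, langcode, reindex, type, data)
    SELECT nid, langcode, 1, 'node_search', '' FROM node
""")
rows = db.execute("SELECT sid, reindex FROM search_dataset ORDER BY sid").fetchall()
print(rows)  # [(1, 0), (2, 1)]
```

Node 1 keeps its existing row and node 2 is added. Node 3 is dropped without any error being raised.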

Do we still have people complaining of poor performance on these pages? I've seen issues similar to this one created around 2008, and Wesley Tanaka's blog, mentioned in the issue summary, has a number of comments from people having major issues (queries too slow to finish before the connection times out, and they didn't have an astronomical number of nodes). If people still run into real-world problems, it'd be interesting to know what their environment is. So far, I haven't really been able to replicate it. With 18,000 nodes on my test site I saw a pretty good reduction in admin page load time (comment 31: 240ms before, 140ms after), but I'm not seeing anything like the 60-second queries that people were describing years ago.

Next, I'll try to put together a patch that speeds up the admin page, leaves cron with the slow-but-reliable query, and tests that the admin page's index status is correct.

Comment #46

jhodgdon commented 3 July 2015 at 03:16

Hm, I'm confused...

I think in order to have the admin page even be correct as of the last cron run, the first thing we need to do in each cron run is make sure there's an entry in search_dataset for every missing node. Right?

I don't think we want to do an INSERT IGNORE though... that could (as you point out) cause other problems.

So let's think. The slow query in the node update search index function that was reported on the other issue marked as a duplicate of this one (#312393: Performance: node_update_index() slow with large numbers of nodes) was:

SELECT n.nid FROM node n LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC LIMIT 0, 100;

And the reporter said it was slow because it "... does a full table scan on the {node} table, and also has the painful 'Using temporary; Using filesort'".

But to do the insert, we don't need the ORDER BY, we would just need something like (probably have the syntax wrong, but something like this):

INSERT INTO search_dataset FROM SELECT n.nid as sid, 'node_search' as type, 1 as reindex FROM node n LEFT JOIN search_dataset ON d.type = 'node_search' AND d.sid = n.nid WHERE d.sid IS NULL

That doesn't seem like it would be as bad as the original query, but maybe... probably would need to do an EXPLAIN on it to see what it does (after fixing my likely wrong syntax)...
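
Here is one way the draft statement above could be written as valid SQL. It is shown as a minimal sketch against SQLite via Python's sqlite3, with a simplified, hypothetical schema. The real {search_dataset} also has langcode and data columns that would need values. The missing d alias is added, and the column list goes before the SELECT instead of a FROM. Because the LEFT JOIN ... IS NULL filter only selects nodes without a row, running the statement repeatedly is harmless. A second run inserts nothing, without relying on INSERT IGNORE.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY);
CREATE TABLE search_dataset (
    sid INTEGER NOT NULL,
    type TEXT NOT NULL,
    reindex INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (sid, type)
);
""")

# Add a row, flagged for indexing, for every node that does not have one yet.
ADD_MISSING = """
INSERT INTO search_dataset (sid, type, reindex)
SELECT n.nid, 'node_search', 1
FROM node n
LEFT JOIN search_dataset d ON d.type = 'node_search' AND d.sid = n.nid
WHERE d.sid IS NULL
"""


def count_rows():
    return db.execute("SELECT COUNT(*) FROM search_dataset").fetchone()[0]


db.executemany("INSERT INTO node (nid) VALUES (?)", [(1,), (2,), (3,)])
db.execute("INSERT INTO search_dataset VALUES (1, 'node_search', 0)")  # already indexed

db.execute(ADD_MISSING)
first = count_rows()   # nodes 2 and 3 added
db.execute(ADD_MISSING)
second = count_rows()  # nothing new to add, and no duplicate-key error
db.execute("INSERT INTO node (nid) VALUES (4)")
db.execute(ADD_MISSING)
third = count_rows()   # node 4 picked up on the next run
print(first, second, third)  # 3 3 4
```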

Also as a note, these problems were reported on sites with millions of nodes, not thousands.

Comment #47

jhodgdon commented 21 July 2015 at 18:40

Issue summary:	View changes

Status	File	Size
new	explain.png	15.28 KB
new	explain2.png	15.66 KB

I ran EXPLAIN on the SELECT part of this query (note: it had a small typo):

EXPLAIN SELECT n.nid AS sid, 'node_search' AS type, 1 AS reindex
FROM node n
LEFT JOIN search_dataset d ON d.type = 'node_search' AND d.sid = n.nid
WHERE d.sid IS NULL

It looks pretty good:
[Image: EXPLAIN output for the query above]

As opposed to the query we're trying to get rid of:

EXPLAIN SELECT n.nid
FROM node n
LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid
WHERE d.sid IS NULL OR d.reindex <> 0
ORDER BY d.reindex ASC, n.nid ASC

[Image: EXPLAIN output for the second query above]

So that seems like a good approach.

@tbradbury, do you still want to work on this?

Comment #48

travis-bradbury commented 21 July 2015 at 18:47

Sorry for being absent. I'm still happy to work on it. I'm a bit short on time but I should be able to create another patch in a couple days.

Comment #49

jhodgdon commented 21 July 2015 at 18:49

Great, thanks! I've been rather busy lately too -- anyway, no need to apologize. Your help, whenever it comes, is much appreciated. :)

Comment #50

21 July 2015 at 18:49

Version:	8.0.x-dev	» 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #51

21 July 2015 at 18:49

Version:	8.1.x-dev	» 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #52

michaellenahan commented 28 September 2016 at 16:47

Version:	8.2.x-dev	» 8.3.x-dev
Assigned:	travis-bradbury	» Unassigned
Issue tags:	-drupaldevdays	+Dublin2016, +Needs reroll

I looked at this at the novice issue triage at Dublin2016. It might be a good issue for someone new to Drupal but with database knowledge. One thing to consider is whether a database update is required.

Comment #53

manuel garcia commented 19 November 2016 at 15:32

Status:	Needs work	» Needs review
Issue tags:	-Needs reroll

Status	File	Size
new	queries_on_search_admin-312395-53.patch	10.4 KB

Rerolled #41.

Manually fixed a conflict in core/modules/node/src/Plugin/Search/NodeSearch.php: updateIndex() already appears to include the change the previous patch intended, so I have not changed that method. Needs review ;)

Comment #54

19 November 2016 at 16:16

Status:	Needs review	» Needs work

The last submitted patch, 53: queries_on_search_admin-312395-53.patch, failed testing.

Comment #55

19 November 2016 at 16:16

Version:	8.3.x-dev	» 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #56

19 November 2016 at 16:16

Version:	8.4.x-dev	» 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #57

19 November 2016 at 16:16

Version:	8.5.x-dev	» 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #58

19 November 2016 at 16:16

Version:	8.6.x-dev	» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #59

19 November 2016 at 16:16

Version:	8.7.x-dev	» 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #60

19 November 2016 at 16:16

Version:	8.8.x-dev	» 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #61

19 November 2016 at 16:16

Version:	8.9.x-dev	» 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #62

19 November 2016 at 16:16

Version:	9.1.x-dev	» 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Comment #63

19 November 2016 at 16:16

Version:	9.2.x-dev	» 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #64

19 November 2016 at 16:16

Version:	9.3.x-dev	» 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #65

kim.pepper commented 27 April 2022 at 05:43

Category:	Bug report	» Task

Reviewed as part of the Bug Smash Initiative. As this is a performance optimisation, I believe we can reclassify it as a task.

Also, after a quick scan of the patch in #53, I think the implementation needs to happen in the Search module, not the Node module.

Comment #66

kim.pepper commented 27 April 2022 at 22:11

Issue tags:	+Bug Smash Initiative

Comment #67

27 April 2022 at 22:11

Version:	9.4.x-dev	» 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #68

27 April 2022 at 22:11

Version:	9.5.x-dev	» 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #69

27 April 2022 at 22:11

Version:	10.1.x-dev	» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #70

quietone commented 5 August 2024 at 00:49

Issue tags:	-Novice

There doesn't seem to be any novice-level work here right now.

Comment #71

hablat commented 14 April 2025 at 17:12

This issue is also triggered when visiting the normal /admin/config page, and it causes timeouts on sites with lots of nodes. See https://www.drupal.org/project/drupal/issues/3336621#comment-16067058

Comment #72

14 April 2025 at 17:12

Version:	11.x-dev	» main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Comment #73

quietone commented 1 April 2026 at 04:00

Status:	Needs work	» Postponed

The Search module was approved for removal in #3476883: [Policy, no patch] Move Search module to contrib.

This is postponed. The status is set according to two policies: Remove a core extension and move it to a contributed project, and Extensions approved for removal.

The deprecation work is in #3565780: [meta] Tasks to deprecate the Search module, and the removal work is in #3565783: [meta] Tasks to remove the Search module.

Search will be moved to a contributed project before Drupal 12.0.0 is released.
