Hooks node_type, taxonomy and user knocks out our database server
| Project: | Apache Solr Search Integration |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Jump to:
The implementations of hook_node_type, hook_user and hook_taxonomy in apachesolr.module completely knocks out our MySQL-server.
For example when someone commits a change to a term, this code can swamp the MySQL-server with very slow queries. We have seen queries like this running for 5+ hours on our test server.
Troublesome queries:
db_query("UPDATE {apachesolr_search_node} SET changed = %d WHERE nid IN (SELECT nid FROM {node} WHERE uid = %d)", time(), $account->uid);
db_query("UPDATE {apachesolr_search_node} SET changed = %d WHERE nid IN (SELECT nid FROM {term_node} WHERE tid = %d)", time(), $edit['tid']);
db_query("UPDATE {apachesolr_search_node} SET changed = %d WHERE nid IN (SELECT nid FROM {node} WHERE type = '%s' OR type = '%s')", time(), $info->old_type, $info->type);This server is a dual quad core xeon with 8GB memory and 15K rpm disks, with MySQL optimized with our standard settings.
The Drupal installation is of medium size - around 1.6 million nodes and 2.4 million term node connections.
We have changed these queries to this:
db_query("UPDATE {UPDATE apachesolr_search_node} a INNER JOIN {node} b ON a.nid = b.nid SET a.changed = %d WHERE b.uid = %d", array(time(), $account->uid));
db_query("UPDATE {UPDATE apachesolr_search_node} a INNER JOIN {term_node} b ON a.nid = b.nid SET a.changed = %d WHERE b.tid = %d", array(time(), $edit['tid']));
db_query("UPDATE {apachesolr_search_node} a INNER JOIN {node} b ON a.nid = b.nid SET a.changed = %d WHERE (b.type = '%s' OR b.type = '%s')", array(time(), $info->old_type, $info->type));For us this is a critical flaw, because it renders the whole system unusable if the queries are not changed.

#1
We're certainly open to optimizing them - but I'm wondering if that syntax is MySQL only?
Also, how many nodes do you have - I imagine it must be very large?
#2
What part of that syntax do you consider to be MySQL only?
I would guess that other database systems supports joins of type inner join. Especially since a lot of Drupal core uses joins by default.
But optimizing or not, it seems that the current queries simply don't work on any Drupal installation of a certain size for those that use MySQL.
This site has 1.6 mill nodes and 2.4 million term node connections.
We are already setting up some other sites for Apache Solr support, with 5+ million nodes. But it's a bit tiresome to always have to rewrite "stock modules".
#3
Ah would love to see a patch (drupal.org/patch) and would love to see the EXPLAIN for before and after.
The debate for and against subqueries/JOINS continues on!
#4
it doesn't look like postgres supports this syntax
#5
Looks like Postgres can use a similar but incompatible syntax like:
UPDATE {apachesolr_search_node} sn SET changed = %d FROM {node} n WHERE sn.nid=n.nid AND n.uid = %dhttp://www.postgresql.org/docs/8.0/static/sql-update.html
#6
I'm not trying to start a join vs. subqueries discussion.
Just voicing my experience with the current code and MySQL. If 90% of all Drupal installations are on MySQL (anyone have any numbers?), having queries which are highly unoptimized for MySQL does not seem to be overly clever.
Scott Reynolds, is there a way to do an EXPLAIN on an update query with subquery?
#7
Discussed this with David Strauss - MySQL might handle this better in MySQL 6, but likely to support this better in the module we'll need to switch on the DB type. I have no idea how that will work in D7.
#8
Nope, apparently I hadn't had my coffee yet before posting that comment :-D
Incompatible how? Does it accomplish exactly what the original query was doing? I believe MySQL supports that syntax as well. http://dev.mysql.com/doc/refman/5.0/en/update.html
#9
Mysql orders the table references differently and does not use the "FROM" keyword. I tried it before posting the above. MySQL can do something like:
UPDATE {apachesolr_search_node} sn, {node} n SET changed = %d WHERE sn.nid=n.nid AND n.uid = %dcompared to Postgres, where the table being updated is the only one before the SET:
UPDATE {apachesolr_search_node} sn SET changed = %d FROM {node} n WHERE sn.nid=n.nid AND n.uid = %d#10
I've spent some time trying to find ways to make this work.
It seems that MySQL really has some problems with "WHERE IN (subquery)" because of it's query optimizing algorithms. Found some more technical documentation, where someone proved that doing a "WHERE IN (subquery (subquery))" would force MySQL to create a temporary result set for the inner most query. Didn't read too much into it, because it seems to me that would just be trying to solve the problem in a "wrong way" :)
But I am curious.
Does someone have a large data set to test these queries in PostgreSQL or any other database?
Is this the proper way to write such queries - query with subquery?
#11
Looks like we need to switch on DB type if we want to support this - anyone want to roll a patch and test it?
#12
David Straus mentions that MySQL 6 will handle this bettter, but that's a ways off: http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6
#13
let's get this fixed before the next release