db_rewrite_sql() breaks when working with SQLs that has subqueries because it assumes there is only 1 instance of WHERE, GROUP, ORDER, LIMIT keywords, while a subquery can of course include any of these keywords too, which breaks db_rewrite_sql().
I've stabbed the problem with this preliminary patch. To describe the patch approach in few words, it doesn't change db_rewrite_sql() internal SQL processing, but strips out all subqueries before processing, then add them again after processing. Not sure if that's a good approach or if the patch in its current status is worthy of core. But it's here for your review and in case someone needs it. Thank you.
Note: I hope this is considered a bug. I know Drupal core doesn't use subqueries so far, but as far as I know the only reason for this is to keep compatibility with MySQL 4.0 but I hope this compatibility shouldn't mean developers using later versions of MySQL won't be able to take advantage of it.
Comment | File | Size | Author |
---|---|---|---|
#2 | db_rewrite_sql_subqueries2.patch | 2.42 KB | AmrMostafa |
db_rewrite_sql_subqueries.patch | 2.17 KB | AmrMostafa | |
Comments
Comment #1
David StraussDrupal 6 does not support MySQL 4.0, so subqueries are technically acceptable. However, the subquery optimizer in MySQL is so poor that they're not very usable. Generally, the subqueries that do optimize properly can be rewritten as joins.
Comment #2
AmrMostafa CreditAttribution: AmrMostafa commentedFixing this bug doesn't necessarily mean we must use subqueries somewhere, it's simply just *allowing* its usage. (It's standard ANSI SQL after all).
Thanks for the tip though :-). I don't know much about subqueries performance, and I used them in cases where it looked like there no other simple ways to accomplish the same result (It was for calculating a record RANK in a set of records based on some factor. e.g. think of nodes with votings, and you want to display a table with every node and its rank. where rank is calculated by summing all the votes of a node and checking its position amongst other nodes).
Another patch (same approach) with some fixes.
Comment #3
David StraussAny time a sort if done on a calculated value, the calculated value should be cached and indexed (denormalized) to maintain performance. There are few things more painful in a SELECT query than an ORDER BY on calculated criteria.
That said, you can usually calculate ratings using LEFT JOIN, GROUP BY, and aggregate functions. It's just frighteningly slow.
Comment #4
devin.gardner CreditAttribution: devin.gardner commentedThis desperately needs to be fixed in Drupal 5.x. As you know, there are many, many things that can only be accomplished by subqueries if you have to provide SQL fragments through an API (especially views). With Drupal, many times to do complicated things without hacking other people's code, you must supply all the information for a field in a SELECT as part of the field itself, or you must put a subquery in as part of the WHERE clause, because you're including an additional table for some reason other than matching fields. This comes up very often when making custom fields and filters for the Views module, and there's no other way around it. A subquery must be used, regardless of how "quickly it performs".
I hope there's a solution to making db_rewrite_sql() handle subqueries without including a full-fledged parenthesis parser in it, though. Does anyone have any ideas?
Comment #5
David StraussNo, this doesn't need to be fixed for Drupal 5. Drupal 5 supports MySQL 4.0. So, support for MySQL 4.1+ features (like subqueries) should not be expected from Drupal 5.
Comment #6
devin.gardner CreditAttribution: devin.gardner commentedYou know, I should have looked at that patch before posting my reply, because sure enough someone has already implemented a parser.
I put it into Drupal 5.1 and it fixes several Views pages of mine that were previously giving SQL syntax errors.
But your new function _db_sql_subqueries() terminates in an infinite loop when trying this query (also from a Views page):
SELECT node.nid, (CASE WHEN (SELECT DISTINCT oga_1.nid FROM og_ancestry oga_1 WHERE oga_1.nid = node.nid) THEN 1 ELSE 0 END) AS group_membership, (CASE WHEN (SELECT DISTINCT oga_2.nid FROM og_ancestry oga_2 WHERE oga_2.nid = node.nid) THEN (SELECT oga_3.group_nid FROM og_ancestry oga_3 WHERE oga_3.nid = node.nid ORDER BY oga_3.group_nid LIMIT 1) ELSE 0 END) AS group_membership_nid, node.title AS node_title, node.changed AS node_changed, (CASE WHEN node.type = 'poll' THEN (SELECT COUNT(*) FROM poll_votes v WHERE node.nid = v.nid) ELSE (SELECT COUNT(*) FROM votingapi_vote v WHERE node.nid = v.content_id AND v.value_type = 'option') END) AS vote_count, users.name AS users_name, users.uid AS users_uid, node.created AS node_created, (CASE WHEN node.type = 'poll' THEN (SELECT p.active FROM poll p WHERE node.nid = p.nid) ELSE (SELECT p.active FROM advpoll p WHERE node.nid = p.nid) END) AS active, votingapi_cache_vote_percent_average.value AS votingapi_cache_vote_percent_average_value, node_comment_statistics.comment_count AS node_comment_statistics_comment_count FROM node node INNER JOIN users users ON node.uid = users.uid LEFT JOIN votingapi_cache votingapi_cache_vote_percent_average ON node.nid = votingapi_cache_vote_percent_average.content_id AND votingapi_cache_vote_percent_average.content_type = 'node' AND votingapi_cache_vote_percent_average.value_type = 'percent' AND votingapi_cache_vote_percent_average.tag = 'vote' AND votingapi_cache_vote_percent_average.function = 'average' LEFT JOIN node_comment_statistics node_comment_statistics ON node.nid = node_comment_statistics.nid WHERE (node.status = '1') AND (node.type IN ('advpoll_binary','advpoll_ranking','poll')) ORDER BY node_created DESC
I've just started looking at the issue now.
Comment #7
devin.gardner CreditAttribution: devin.gardner commentedYour second patch fixes the infinite loop. It appears to work correctly for everything I've thrown at it, good work.
Comment #8
moshe weitzman CreditAttribution: moshe weitzman commentedsubscribe
Comment #9
chx CreditAttribution: chx commentedAnd I read the patch and I find it ok and well commented. So, RTBC.
Comment #10
Gábor HojtsyErm, what if a subquery does a SELECT on the node table or something, and I have node permissions. This patch just moved it out of the way for the rewrite, the the subquery will not obey the permissions. Is this the case?
Comment #11
nedjoLooks like Gabor is right and we need to ensure that the subqueries are also rewritten. Do we need then to feed the subqueries back through db_rewrite_sql?
Comment #12
nedjoOn second thought, the best approach is probably just to document the need to pass subqueries through
db_rewrite_sql()
independently.We can't have
db_rewrite_sql()
rewrite subqueries automatically without a lot of new work, since e.g. the table alias might differ between a main query and a subquery.But what about something like this?
If that works, we're fine and the patch can go in as is.
alienbrain or devin.gardner, can you test to confirm that this 2-step approach works (first rewrite the subquery, then pass the result for rewriting)? If so I think this can go back to RTBC status.
Comment #13
colanSubscribing.
Comment #14
AmrMostafa CreditAttribution: AmrMostafa commentednedjo, that's what I had in mind too. I think that's the best approach.
The real win we can get out of making db_rewrite_sql() recursively rewriting subqueries on its own is that it would be possible to make it aware of content i.e. when the outer query of a subquery is a critical piece of information for the rewriting of that subquery, I think this can be a D7 functionality.
Comment #15
AmrMostafa CreditAttribution: AmrMostafa commenteds/content/context/
Comment #16
moshe weitzman CreditAttribution: moshe weitzman commenteddowngrading. we could ship without this if noone works on it (has not been touched for a month)
Comment #17
webernet CreditAttribution: webernet commented#151910: Support subqueries, db_rewrite_sql broken
Comment #18
abautu CreditAttribution: abautu commentedHi,
In my opinion, using regular expression for db queries it's not a good idea (mainly, for performance reasons, but not only that).
I've played a little with D5 db_rewrite_sql and here's the result. The basic idea is that I tried to rewrite only the main query without touching any subquery:
In case you are interested, here are some tests:
Description: All fields for all nodes
Original query: SELECT * FROM {node} node
Drupal rewrite: SELECT * FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner')))
Patched rewrite: SELECT * FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner')))
Same: yes, Drupal works: yes (0.063778877258301 sec), Patched works: yes (0.06314492225647 sec)
Description: Node IDs for all nodes
Original query: SELECT nid FROM {node} node
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner')))
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner')))
Same: yes, Drupal works: yes (0.042663097381592 sec), Patched works: yes (0.043194055557251 sec)
Description: Node IDs for uid=1 nodes
Original query: SELECT nid FROM {node} node WHERE node.uid=1
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid=1)
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid=1)
Same: yes, Drupal works: yes (0.054802894592285 sec), Patched works: yes (0.054560899734497 sec)
Description: Node IDs for all nodes ordered
Original query: SELECT nid FROM {node} node ORDER BY nid
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) ORDER BY nid
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) ORDER BY nid
Same: yes, Drupal works: yes (0.043607950210571 sec), Patched works: yes (0.044195175170898 sec)
Description: Node IDs for all nodes limited
Original query: SELECT nid FROM {node} node LIMIT 1
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) LIMIT 1
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) LIMIT 1
Same: yes, Drupal works: yes (0.0012619495391846 sec), Patched works: yes (0.0015439987182617 sec)
Description: Node IDs for uid=1 nodes ordered by nid
Original query: SELECT nid FROM {node} node WHERE node.uid=1 ORDER BY node.nid
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid=1 ) ORDER BY node.nid
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid=1 ) ORDER BY node.nid
Same: yes, Drupal works: yes (0.055071830749512 sec), Patched works: yes (0.05460000038147 sec)
Description: Node count per user for all nodes
Original query: SELECT COUNT(*) FROM {node} node GROUP BY node.uid
Drupal rewrite: SELECT COUNT(*) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) GROUP BY node.uid
Patched rewrite: SELECT COUNT(*) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) GROUP BY node.uid
Same: yes, Drupal works: yes (0.056019067764282 sec), Patched works: yes (0.055269002914429 sec)
Description: Node count per user for uid<>1 nodes
Original query: SELECT COUNT(*) FROM {node} node WHERE node.uid<>1 GROUP BY node.uid
Drupal rewrite: SELECT COUNT(*) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid<>1 ) GROUP BY node.uid
Patched rewrite: SELECT COUNT(*) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid<>1 ) GROUP BY node.uid
Same: yes, Drupal works: yes (0.055700063705444 sec), Patched works: yes (0.056375026702881 sec)
Description: Node IDs for all nodes, except belonging to last user (NOT IN)
Original query: SELECT nid FROM {node} node WHERE node.uid NOT IN (SELECT MAX(users.uid) FROM {users} users)
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT MAX(users.uid) FROM {users} users))
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT MAX(users.uid) FROM {users} users))
Same: yes, Drupal works: yes (0.059988021850586 sec), Patched works: yes (0.060559988021851 sec)
Description: Node IDs for all nodes, except belonging to last user (<>)
Original query: SELECT nid FROM {node} node WHERE node.uid<>(SELECT MAX(users.uid) FROM {users} users)
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid<>(SELECT MAX(users.uid) FROM {users} users))
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid<>(SELECT MAX(users.uid) FROM {users} users))
Same: yes, Drupal works: yes (0.060837030410767 sec), Patched works: yes (0.061212062835693 sec)
Description: Node IDs for users whose name doesn't start with a
Original query: SELECT nid FROM {node} node WHERE node.uid NOT IN (SELECT users.uid FROM {users} users WHERE users.name LIKE "a%")
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT users.uid FROM {users} users INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( users.name LIKE "a%"))
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT users.uid FROM {users} users WHERE users.name LIKE "a%"))
Same: no, Drupal works: no, Patched works: yes (0.076982021331787 sec)
Description: Node IDs for nodes not belonging to users with username like a*
Original query: SELECT nid FROM {node} node WHERE node.uid NOT IN (SELECT users.uid FROM {users} WHERE users.name LIKE "a%")
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT users.uid FROM {users} INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( users.name LIKE "a%"))
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT users.uid FROM {users} WHERE users.name LIKE "a%"))
Same: no, Drupal works: no, Patched works: yes (0.076756000518799 sec)
Description: Node IDs for nodes not belonging to users with less than 10 post
Original query: SELECT nid FROM {node} node WHERE node.uid NOT IN (SELECT node.uid FROM {node} node GROUP BY node.uid HAVING COUNT(*)<10)
Drupal rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT node.uid FROM {node} node ) GROUP BY node.uid HAVING COUNT(*)<10)
Patched rewrite: SELECT DISTINCT(node.nid) FROM {node} node INNER JOIN {node_access} na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( node.uid NOT IN (SELECT node.uid FROM {node} node GROUP BY node.uid HAVING COUNT(*)<10))
Same: no, Drupal works: no, Patched works: yes (6.1381869316101 sec)
I know that most of these queries can be rewriten with joins or other way, but think about views filters with multiple CCK nodereference fields or stuff like that.
Cheers,
Andrei
Comment #19
mtraherne CreditAttribution: mtraherne commentedI have tested Abautu's code above, in relation to the subquery I wrote in relation to this issue: #685824: Filter by Author Role "IS NONE OF" and it seems to be working fine (so far). Thanks Abautu!
Not being funny, but aren't sub queries pretty standard by now? Why not just up the requirements for drupal? This would seem like quite a core issue to me, especially with all new users downloading 6 (or even 7 soon). I can't think of a good reason not to upgrade MySQL if you are an existing user apart from lazyness. Call me a cynic if you like!