With the new 6.x mysql requirements (see 105855), we can now use sub-queries instead of temporary tables. Drupal currently does not allow sub-queries. And the implementation of db_rewrite_query actually prevents them from working.

Should we allow sub-queries? I hope so. Sub queries will eliminate many needs for db_query_temporary, which in turn will allow these queries to be cached. There's a performance gain to be had in making this work. The one I've been working on, is in the search engine. I think that if we can get this to work, page 2 3 4, etc, will use cached results. As it works now, going from page to page of the search results has to re-run the entire query.

Although there are other use cases, I'm only concerned with sub-queries as part of a JOIN. This is a somewhat simplified example of the actual query:

SELECT node.nid, temp_vfs.score, temp_vfs.score AS score FROM node node INNER JOIN (SELECT n.nid, i.score FROM node n LEFT JOIN search_index i ON n.nid=i.sid WHERE i.fromsid=0 AND i.word IN ('lorem') GROUP BY n.nid HAVING COUNT(*)=1) temp_vfs ON node.nid = temp_vfs.nid WHERE node.status = '1'

The problem is that db_rewrite_query does a blind str_replace on the WHERE in this statement and adds the rewritten JOIN clause to both uses of WHERE. It's not a simple problem. db_rewrite_query also looks for GROUP and LIMIT which can occur in the sub-query.

As this currently works, when the user is not logged in, node_access adds JOIN information, and I end up with the following incorrect query (simplified again):

SELECT node.nid, temp_vfs.score FROM node node INNER JOIN (SELECT n.nid, i.score FROM node n LEFT JOIN search_index i ON n.nid=i.sid INNER JOIN node_access na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( i.fromsid=0 AND i.word IN ('lorem') ) GROUP BY n.nid HAVING COUNT(*)=1) temp_vfs ON node.nid = temp_vfs.nid INNER JOIN node_access na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( (node.status = '1') ) 

I think that the only technical problem with this SQL is that the sub-query portion references node.nid when it should be n.nid. But the node_access JOIN in the sub-query wasn't expected or needed. I think that it would be better to limit the db_rewrite_query only to SQL not part of a sub-query.

I plan on offering a solution, but I first want to document the problem.

See also #143888 (support sub-queries in views) and views_fastsearch 5.x-dev release notes.

Comments

moshe weitzman’s picture

one obvious solution is to not send anything through db_rewrite_sql that should be rewritten. the caller can inject his subquery afterwards. if the subquery has to be rewritten, he can send that through separately.

an alternate solution is for the caller to add a hint in the db_rewrite_sql() call and replace some token with the suquery with example_rewrite_sql(). in other words, implement the hook yourself and inject there.

there may be better options though.

douggreen’s picture

StatusFileSize
new1.18 KB

This patch is an improvement over the current implementation in that it will properly handle all cases where the query (not the sub-query) has a WHERE clause. It does this by looking for the position of the last WHERE in the clause and does not change anything before it.

For the one case that this does not solve, that of a query that uses sub-queries with WHERE's in the sub-query but no WHERE in the query, the developer can easily resolve this by simply adding "WHERE (1)" to the query.

douggreen’s picture

The above patch was rolled against 5.x, but works on 6.x too.

douggreen’s picture

Title: Support subqueries in 6.x with the move to mysql 4.1 » get ready to Support subqueries, db_rewrite_sql broken

pgsql supports sub-queries as of 7.1 and mysql supports sub-queries as of 4.1. The current (5.x) Drupal requirements for pgsql of 7.3 already supports sub-queries and the coming 6.x requirements for mysql will support sub-queries.

webchick’s picture

Status: Active » Needs review

This is a patch.

catch’s picture

Status: Needs review » Needs work

which no longer applies.

douggreen’s picture

Title: get ready to Support subqueries, db_rewrite_sql broken » Support subqueries, db_rewrite_sql broken
Category: feature » bug
Status: Needs work » Needs review
StatusFileSize
new1.43 KB

Re-rolled for the latest 6.x Drupal Beta 4. Because both databases minimum versions support subqueries, contrib modules will try to use subqueries. Shipping 6.x without subquery support is IMO a bug so I've changed this from a feature request to a bug.

douggreen’s picture

Version: 6.x-dev » 6.0-beta4
douggreen’s picture

While re-reading the comments, I'd like to slightly contradict what I said in #2 above. I think that this is actually an adequate long term fix. With this fix, if someone did have a db_query SQL string that didn't have a WHERE clause in the primary SQL, could simple add WHERE (1), and thus get around the restriction.

To also clarify, I do not think that this is not a problem with the new search implementation... I was able to implement that without subqueries, except for the COUNT(*) version, that I don't think is affected by this issue.

catch’s picture

To also clarify, I do not think that this is not a problem with the new search implementation...

Er, could you clarify - it's not a problem, or it is a problem?

subqueries are lovely and it'd be great to get this in, but I'm not sure how to test this..

douggreen’s picture

StatusFileSize
new1.4 KB

This is NOT a problem with the new search implementation. Sorry if this was confusing. I only made this comment in #10 because the initial issue refers to views_fastsearch, and I didn't want people to think that this was a bug in the new search implementation. It's not. This is a bug for anyone who wants to write subqueries with WHERE clauses.

It's kinda hard to test, because it changes db_rewrite_query. To test, first I'd like to make sure that it doesn't break any existing queries. I'd apply the patch and run a full suite of tests, although that's never going to be exhaustive. Maybe keep this patch installed for a day while you test other things. Then I'd create a custom module that has a subquery that has a WHERE in the query and a WHERE in the subquery, and make sure that works.

What would really be helpful, is a code review from a seasoned developer. So that someone else can review the code, this is what it does:

First, it finds the last position of WHERE in the query. The current code finds the first position. Finding the first position of WHERE fails when you have a WHERE in your subquery. For example, this would fail with the current code, but will work with the proposed patch. This is just an example, you'd probably never write this:

SELECT node.nid FROM node INNER JOIN (SELECT uid FROM node WHERE uid > 1) u ON node.uid = u.uid WHERE node.uid > 1

Second, it only looks for GROUP, HAVING, ORDER and LIMIT after the last WHERE clause. This example looks for nodes from (non-admin uid=1) user's that have authored two or more nodes:

SELECT node.nid FROM node INNER JOIN (SELECT uid FROM node WHERE uid > 1 GROUP BY uid HAVING COUNT(*) > 1) u ON node.uid = u.uid WHERE node.uid > 1

The GROUP BY clause is in the subquery, not the primary query, so the search for GROUP should fail. And the replacement string will properly be based on the primary queries WHERE clause.

The proposed patch's test for WHERE is not perfect. For example, in the above examples, we really don't need the second WHERE clause -- it would be functionally equivalent without it. But the primary query must always have a WHERE clause. In cases that you don't really need the WHERE clause, I propose that developers just add WHERE 1. For example,

SELECT node.nid FROM node INNER JOIN (SELECT uid FROM node WHERE uid > 1) u ON node.uid = u.uid WHERE 1

This writeup helped me, because I think I found a bug in my patch. Attached is a new patch.

catch’s picture

Thanks for the write-up. I can confirm that testing with all core modules enabled, adding, deleting users, nodes, taxonomy terms etc. is fine. Complex queries like search and tracker are unaffected. Wish I could help with the seasoned developer bit!

moshe weitzman’s picture

Version: 6.0-beta4 » 6.0-rc1
Priority: Normal » Critical

i am a bit bold tonight and calling this critical. IMO, we upped system requirements in part so we can use subqueries in Drupal6. we can't have a key function like db_rewrite_sql() actively discourage their use. once can't avoid that function - it enforces our whole node access control. every Views query uses it as well.

chx’s picture

StatusFileSize
new1.87 KB

The WHERE you need is not the first nor the last.
SELECT nid FROM (SELECT nid FROM term_node WHERE tid = 1) AS x INNER JOIN (SELECT nid FROM term_node WHERE tid = 1) AS y USING (nid) INNER JOIN foo USING (nid) WHERE (a = B) OR (weight IN (SELECT weight FROM allowed_weights WHERE (allow = 1))) AND q = 1 and I guess I could write even uglier pieces. The WHERE you need is the one outside of balanced parentheses, in other words ^(?P<anonymous_view>(?:[^()]++|\((?P>anonymous_view)\))*).*?WHERE. However, we want the last of GROUP|ORDER|LIMIT and there is no point in putting HAVING in there because that requires the presence of GROUP.

chx’s picture

StatusFileSize
new1.86 KB

As the anonymous_view subpattern had a * quantifier having a ? after it is not needed.

catch’s picture

OK well I won't pretend to know what's going on in #16, but it doesn't break anything I tested either (adding/deleting nodes, users, taxonomy terms, searching, tracker etc.)

gerhard killesreiter’s picture

Status: Needs review » Needs work

I think the patch removes support for "HAVING".

chx’s picture

Status: Needs work » Needs review
StatusFileSize
new2.37 KB

But that's fine as said in my original post, GROUP BY comes before HAVING. Anyways, the patch above found the first group/order/limit instead of the last. We would need strrpos for strings. That's PHP5. We can emulate.

catch’s picture

No breakage with #19 either.

chx’s picture

StatusFileSize
new2.34 KB

No need to retest, this is a trivial cleanup: $needle is not used so I removed it.

douggreen’s picture

Status: Needs review » Needs work

I like where he's going with this, but I'm sorry to report that chx's patch doesn't work, yet mine does.

I wrote a simple module that has the following function (called from a menu handler):

function dbrewritesql_menu() {
  $items = array();
  $items['admin/db_rewrite_sql'] = array(
    'title' => t('DB Rewrite SQL Test'),
    'page callback' => 'dbrewritesql_test_page',
    'access arguments' => array('administer site configuration'),
    'type' => MENU_NORMAL_ITEM,
  );
  return $items;
}

function dbrewritesql_test_page() {
  $sql = "SELECT n.nid FROM {node} n INNER JOIN (SELECT uid, COUNT(*) AS num FROM {node} WHERE status = 1 GROUP BY uid HAVING num > 1) sub1 ON sub1.uid = n.uid WHERE n.uid > 0";
  drupal_set_message($sql);
  $sql = db_rewrite_sql($sql);
  drupal_set_message($sql);
  $result = db_query($sql);
  while ($node = db_fetch_object($result)) {
    $node = node_load($node->nid);
    $output .= node_view($node, TRUE);
  }
  return $output;
}

function dbrewritesql_db_rewrite_sql($query, $primary_table, $primary_field, $args) {
  if ($primary_table == 'n') {
    return array('where' => '1 = 1');
  }
}

You'll also need the info file:

name = dbrewritesql
description = DB Rewrite SQL Test
core = "6.x"

Look at the db_rewrite_sql that's displayed in the drupal_set_message. chx's version adds the WHERE clause to the subquery, whereas my version adds it to the final WHERE clause. This is what you should see:

SELECT n.nid FROM {node} n INNER JOIN (SELECT uid, COUNT(*) AS num FROM {node} WHERE status = 1 GROUP BY uid HAVING num > 1) sub1 ON sub1.uid = n.uid WHERE (1 = 1) AND ( n.uid > 0)

chx’s picture

Status: Needs work » Needs review
StatusFileSize
new2.27 KB

Try this and still note that your test case does not use a subquery after the main while. Note this version is also a cleanup: uses one less preg.

douggreen’s picture

I'm not sure what you mean by "your test case does not use a subquery after the main while," but this version does pass my test.

I notice that you dropped the comment about why you're doing the strrev (// PHP 4 does not support strrpos for strings. We emulate it.). I think the comment added something for me. Did you remove it intentionally?

It's too bad that we're still supporting php4 with Drupal 6.x -- that's some convoluted code to do the strrev check. gophp5!

If catch can make sure that this doesn't break anything else, I think it's RTBC. I can live without the comment, although I do think it helped.

catch’s picture

adding/deleting nodes, adding/deleting user, adding/deleting term all fine. No time for more comprehensive testing until later, possibly tomorrow - so wouldn't hurt for an additional review from someone else.

chx’s picture

StatusFileSize
new2.33 KB

No change at all, just added back that comment. SELECT nid FROM node WHERE nid IN (SELECT nid FROM somewhere WHERE foo = 'bar') this kind of subquery will make your version fail IMO.

douggreen’s picture

Status: Needs review » Reviewed & tested by the community

@chx, agreed, you're patch is better. I've also confirmed that it works on the second type of subselect, that you've shown above.

gábor hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Thanks for the patch and for all the testing. Hopefully we can make this a bit more nicer come Drupal 7 on PHP 5.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

killes@www.drop.org’s picture

Status: Closed (fixed) » Active
killes@www.drop.org’s picture

To elaborate: This patch has broken the use of HAVING without GROUP BY which is IMO perfectly valid SQL. http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt does not mention that GROUP BY would be required of HAVING is used:

If neither a <where clause> nor a <group by clause> is speci-
fied, then let T be the result of the preceding <from clause>;
if a <where clause> is specified, but a <group by clause> is
not specified, then let T be the result of the preceding <where
clause>; otherwise, let T be the result of the preceding <group
by clause>.

killes@www.drop.org’s picture

Version: 6.0-rc1 » 7.x-dev

moving

killes@www.drop.org’s picture

Version: 7.x-dev » 6.x-dev

Moving back to D6, chx explained that db_rewrite_sql will be removed in D7.

ingo86’s picture

Status: Active » Needs review
StatusFileSize
new1.02 KB

Hi everyone,
Here is a patch in order to enable HAVING clause without GROUP BY clause. It's what #31 report.

wedge’s picture

I was bit by this today and #26 solved my issue. Drupal5 postgresql 8.3. Thanks chx and doug!

colan’s picture

Yes, definitely #26++! I used it on 5.7 (without problems) to solve a huge MySQL error I was getting. I'm not sure what can go into point releases, but would it be possible to also release this as a bug fix for 5 when it finally gets resolved? I'm writing code for some 5.x contrib modules that will depend on this patch.

emok’s picture

(Should I have opened a new bug for this?)
db_rewrite_sql in Drupal 6.3 seems to be similar to #26. With it I get different results between PHP 4.3.11 and PHP 5.2.3. I've nailed the problem down to the actual preg_match-call. So I think there is some change in how PHP works between these versions, which affects queries where we do not want to use the last WHERE of the string.

 //Test case. The $pattern is equivalent to that in #26 but shorter to write.
$q = "SELECT (SELECT * FROM {x} WHERE 1) AS inner1
FROM {node} node_SHOULD_END_HERE WHERE
node.vid IN (SELECT tn.vid FROM {term_node} tn_ERROR_MATCHED_TOO_MUCH WHERE tid = %d)";
$pattern = '/^((?P<anonymous_view>(?:[^()]++|\((?P>anonymous_view)\))*)[^()]+WHERE)/';
preg_match($pattern, $q, $matches);
print "<pre>\n{$matches[1]}\n</pre>";

On PHP 5.2.3 I get the expected:

SELECT (SELECT * FROM {x} WHERE 1) AS inner1
 FROM {node} node_SHOULD_END_HERE WHERE

While on PHP 4.3.11 the output is:

SELECT (SELECT * FROM {x} WHERE 1) AS inner1
 FROM {node} node_SHOULD_END_HERE WHERE
 node.vid IN (SELECT tn.vid FROM {term_node} tn_ERROR_MATCHED_TOO_MUCH WHERE

The erroneous PHP4 output means that db_rewrite_sql will put additional WHERE-code (like node access restrictions) in the inner query ({term_node} in my case). In certain views this can cause a SQL error.

Any ideas about how to overcome this difference, so that PHP4 is equally supported? On PHP5 #26 (and probably #34) works fine.

emok’s picture

StatusFileSize
new1.92 KB

I now have a solution to #37 so that db_rewrite_sql works the same way on PHP 4.3.11 as on PHP 5.2.3. This version is probably a bit slower, 60-90% when only measuring the affected lines, but I doubt it would be worth using different code for PHP4 and PHP5.
The attached patch applies to the Drupal 6.3 release. If desired, #34 can still be applied on top of this.

abautu’s picture

I posted something related here (http://drupal.org/node/162970#comment-1009000) before notice that issue is a duplicate for this one.

arhak’s picture

subscribing

codecowboy’s picture

Obviously this isn't a major issue in D6 as a year without activity indicates. Maybe this issue should be won't fixed as community focus is shifting to D7 and code freeze and D6 seems to be humming along fine without this patch.

douggreen’s picture

Status: Needs review » Closed (won't fix)

Agreed, but I do think this is a problem. But I've done very little D6 work ... been stuck on 5 mostly and I hope my focus will be on 7 soon.

poshaughnessy’s picture

Thanks very much for the patches here, I applied emok's (#38) and it fixed a real headache I was having with SQL syntax errors which seem to have been down to how the original pattern worked. I'm afraid I haven't been able to figure out why it has fixed it (I'm new to PHP and Drupal) but my post explains the issue in case anyone's interested: http://drupal.org/node/585458.