In testing #681760: Try to improve performance and eliminate duplicates caused by node_access table joins, we found that we could not replicate this proposed Drupal 6 query syntax in Drupal 7.

SELECT n.nid, n.sticky, n.created FROM node n WHERE (EXISTS (SELECT na.nid FROM node_access na WHERE n.nid = na.nid AND (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))))) AND ( n.promote = 1 AND n.status = 1 )ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10;

When using $query->condition(), the fieldname is required, so we are stuck with something like:

$query->condition('n.nid', $subquery, 'EXISTS'), which returns malformed SQL.

SELECT n.nid, n.sticky, n.created FROM node n WHERE ( n.promote = 1 AND n.status = 1 ) AND (n.nid EXISTS (SELECT na.nid FROM node_access na WHERE n.nid = na.nid AND (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))))) ANDORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10;

The documentation at http://drupal.org/node/310086 suggests that EXISTS is not properly supported by DBTNG:

Subselects are generally useful only in two cases: Where the subselect results in only a single row and value returned and the operator is =, <, >, <=, or >=; or when the subselect returns a single column of information and the operator is IN. Most other combination would result in a syntax error.

So, how to fix?

Comments

pwolanin’s picture

Can you supply an empty string as the field name?

update: Ken says no.

pwolanin’s picture

Some similar:

"WHERE NOT EXISTS"
"WHERE NOT (boolean expression)"

but generally this is an unusual construction it seems.

pwolanin’s picture

Seems like to do this in a sane way we need to make the condition interface more complete, e.g.:

@@ -77,6 +77,28 @@ interface QueryConditionInterface {
   public function isNotNull($field);
 
   /**
+   * Sets a condition that the specified subquery matches one or more rows.
+   *
+   * @param $query
+   *   The query object corresponding to a subquery.
+   *
+   * @return QueryConditionInterface
+   *   The called object.
+   */
+  public function exists(SelectQuery $query);
+
+  /**
+   * Sets a condition that the specified subquery matches no rows.
+   *
+   * @param $query
+   *   The query object corresponding to a subquery.
+   *
+   * @return QueryConditionInterface
+   *   The called object.
+   */
+  public function notExists(SelectQuery $query);
+
+  /**
webchick’s picture

This sounds fairly non-obtrusive, so as long as there are tests, I'm comfortable squeezing this in to make our node access queries performant.

Crell’s picture

I second Angie's comment. With unit tests this should be a no-break addition so I'm comfortable with it. We want fast node access queries. :-)

webchick’s picture

Incidentally, can we get confirmation that EXISTS isn't some weird MySQL-ism? I never ran into that clause before.

Crell’s picture

agentrickard’s picture

Status: Active » Needs review
StatusFileSize
new2.79 KB

Here ya go, tests.

If this goes in, #681760: Try to improve performance and eliminate duplicates caused by node_access table joins needs revision to remove patch duplication.

NOTE: The basic db_select() syntax is not tested for other conditionals in core. Only for simple '=' operations. So this patch was held to a higher standard than the rest of core. Tests for items like BETWEEN are limited to Update queries. Someone else needs to fix that in another issue.

Crell’s picture

Status: Needs review » Needs work

Wouldn't pwolanin's approach in #3 be cleaner? That seems like it would be far more self-documenting, and make it easier to support the various variants of EXISTS that chx tells me, er, exist. (SOME, ALL, ANY.)

agentrickard’s picture

I knew you would say that. Doing so puts that squarely on you, IMO.

agentrickard’s picture

StatusFileSize
new7.5 KB

This is as far as I can go. The rest is a black box of OO Fu.

Something actually has to implement adding the EXISTS clause to the query string.

chx’s picture

Title: DBTNG support for EXISTS conditions » DBTNG support for subquery quantifiers
Status: Needs work » Reviewed & tested by the community

ANSI SQL has more quantifiers (namely ANY/SOME, ALL) we are not adding support for them because we ... can't. And that's fine. Noone uses them anyways :) Your query looks like SELECT * FROM node WHERE nid > ANY (SELECT nid FROM node_access). Read http://www.postgresql.org/docs/8.3/static/functions-subquery.html for more. Edit: this makes ANY and ALL unsupportable with just condition so we would need methods for these two.

So at this point adding more methods Edit: for just EXISTS is a farce and pointless. I am RTBC'ing #89. It has tests and does what we need. Making exists a new method is entirely pointless -- we will do that alongside with ANY/SOME and ALL in Drupal 8. With an eye on backport.

chx’s picture

Title: DBTNG support for subquery quantifiers » DBTNG support for EXISTS conditions

Sorry I changed the title before I decided what I wanted :)

pwolanin’s picture

Gicen that most of us have never even though of using "EXISTS" in a query, and we are still not sure whether it performs better for the node access use than "IN", I think the simpler patch in #9 is probably the right thing for now.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 1001242-exists-with-tests.patch, failed testing.

chx’s picture

Status: Needs work » Reviewed & tested by the community

We had a long chat on IRC about this, let me try to sum up. The problem is that we are two RCs in and I really would like to keep with current syntax. That's possible with ->condition('', $subquery, 'EXISTS') and it's really similar to the current IN support. This is as far as I would go with Drupal 7.

Now, you can't do that with ->condition('nid', $subquery, '> ANY') because you would need to map > ANY and all such constructs separately. That's IMO not acceptable as it would immediately triple the size of mapConditionOperator needlessly. So for complete quantifier support we need new method(s) or changing the condition function to be ->condition('', $subquery, '>', 'ANY') and then figure out mapping but that's Drupal 8 territory.

agentrickard’s picture

@pwolanin

Were you ever able to do any query/load testing on the difference between IN and EXISTS in #681760: Try to improve performance and eliminate duplicates caused by node_access table joins? I suppose I can fire up 10,000 nodes and give it a shot.

webchick’s picture

Status: Reviewed & tested by the community » Needs review

1) This still needs Crell's sign-off.
2) Do we still need this? #681760: Try to improve performance and eliminate duplicates caused by node_access table joins is talking about IN queries now, which we already have support for.

agentrickard’s picture

I'm waiting for @pwolanin or another power SQL user to tell us exactly which query format performs better. I also think we might have a COUNT query problem, since the current patch in #681760: Try to improve performance and eliminate duplicates caused by node_access table joins creates a COUNT(*) query with _two_ subselects.

pwolanin’s picture

I think we want this in our tool kit even if we decide to go with IN for the other patch.

Crell’s picture

Assigned: Unassigned » Crell

We do want to have the utility methods. Relying on the implementation detail that right now a '' happens to work does not make EXISTS supported. It's a clever hack. Even if we use that fact internally it should not be exposed to callers.

Assigning to me to handle the extra methods.

chx’s picture

Version: 7.x-dev » 8.x-dev

Well then.

agentrickard’s picture

@chx

If we bump that to D8, then the other patch has to get blessed using IN syntax. Which seems fine to me, but someone who is more critical of database performance really needs to stress-test that.

Crell’s picture

Version: 8.x-dev » 7.x-dev

*sigh*

agentrickard’s picture

@Crell, AFAIK the EXISTS v. IN doesn't make a bit of performance difference. But I should not have final say.

webchick’s picture

When I talked to Narayan about this the other day, he said he thought the MySQL query optimizer would automatically convert IN to EXISTS but he wasn't totally sure.

agentrickard’s picture

It seems to. Naryan and I were discussing that in IRC. But using a native EXISTS is likely better, since it avoids this conversion.

Crell’s picture

StatusFileSize
new7.6 KB

With a utility method. This is based on agentrickard's patch in #9, so credit accordingly.

The reason why we need a method for exists() but don't need one for ANY etc. is that the others do require a field. EXISTS does not. Therefore it is perfectly natural to do ->condition($field, $subquery, '< ANY') or ->condition($field, $subquery, '> ALL') just as we already routinely do ->condition($field, $subquery, 'IN'). However, EXISTS has no field, and we should not rely on the quirk of the current implementation that an empty string for $field just happens to work right now. Instead we wrap that into a utility method that takes advantage of that fact, offering a better separation between interface and implementation.

While MySQL may optimize EXISTS into IN or vice-versa, that doesn't mean every database will do so. Better to support the real SQL syntax and let the specific database handle it. That's what it's good at.

chx’s picture

You still can't do '< ANY' because there is no mapping for it and it needs a mapping and we do nto want to add a mapping for every combination of possible operation + some/any/all.... So you need methods or a double op condition which is too big a change for now. So is #29.

Crell’s picture

mapConditionOperator() is not a whitelist, just a translation list. If you map an operator it doesn't know about, it just upper-cases it and turns it into the array structure the compiler expects. I do believe your statement that we cannot do "< ANY" is false.

And seriously, we've done far worse to an RC before in the past 2 weeks.

chx’s picture

you need prefix/postfix ( ) for < ANY to work and those come from mapping.

andypost’s picture

I prefer using EXISTS from my oracle's times - if internal dataset is bigger then 10-20 rows then EXISTS gives more performance and eats less memory. I've googled around postgesql and it seems they are adopted same technique for IN and EXISTS

So +1 for commiting this! I think there would be follow-ups to optimize some queries by changing IN into EXISTS

agentrickard’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new7.6 KB

There were two tabs in the patch, this works as expected.

agentrickard’s picture

Note that this is not really an API change. It is a correction of an oversight, since EXISTS and NOT EXISTS are ANSI standard.

chx’s picture

For the sake of the node patch and peace, let it be but I am going to use this issue as a major argument for backporting ANY/SOME/ALL support into D7.1

Crell’s picture

Grrr Eclipse and the 15 places you have to change a setting to get rid of tabs...

For the record, I am not adverse to ANY/ALL backports at a later date if they can be done cleanly. (And chx's point in #32 is valid.)

webchick’s picture

Status: Reviewed & tested by the community » Fixed

I'm not adverse to that either as long as they're similar to this patch: minimally invasive, doesn't break any existing code, documented, comes with ample tests. ;)

Committed to HEAD. Thanks!

webchick’s picture

Status: Fixed » Needs work

Meh. This broke SQLite:

http://ci.drupal.org/result.php?job=drupal-test-sqlite&build=2011-01-03_...
Fail Other database_test.test 1712
Fetched name is correct using EXISTS query.

bfroehle’s picture

Here's my analysis of the failing SQLite test.

The SQL test in question, testExistsSubquerySelect(), relies on two tables, {test} and {test_people}.

mysql> select * from {test};
+----+--------+-----+------------+
| id | name   | age | job        |
+----+--------+-----+------------+
|  1 | John   |  25 | Singer     |
|  2 | George |  27 | Singer     |
|  3 | Ringo  |  28 | Drummer    |
|  4 | Paul   |  26 | Songwriter |
+----+--------+-----+------------+
4 rows in set (0.01 sec)

mysql> select * from {test_people};
+----------+-----+---------+
| name     | age | job     |
+----------+-----+---------+
| George   |  27 | Singer  |
| Meredith |  30 | Speaker |
+----------+-----+---------+
2 rows in set (0.00 sec)

It then builds and runs the following query:

SELECT t.name AS name FROM {test} t WHERE  ( EXISTS  (SELECT tp.name AS name FROM {test_people} tp WHERE  (name = 'George') ))

In both MySQL and SQLite, this returns all rows from {test}... however the ordering is different and this breaks the test.

In SQLite this returns:

Array
(
    [0] => stdClass Object
        (
            [name] => John
        )
 
    [1] => stdClass Object
        (
            [name] => George
        )
 
    [2] => stdClass Object
        (
            [name] => Ringo
        )
 
    [3] => stdClass Object
        (
            [name] => Paul
        )
)

and it MySQL

Array
(
    [0] => stdClass Object
        (
            [name] => George
        )
 
    [1] => stdClass Object
        (
            [name] => John
        )
 
    [2] => stdClass Object
        (
            [name] => Paul
        )
 
    [3] => stdClass Object
        (
            [name] => Ringo
        )
)

I think this test code is not what the author intended -- I think the goal was to just select one row from {test}.

bfroehle’s picture

Status: Needs work » Needs review
StatusFileSize
new1.89 KB

I've revised and simplified the test to now test the following query:

    // Base query to {test}.
    $query = db_select('test', 't')
      ->fields('t', array('name'))
      ->condition('t.name', 'George');
    // Subquery to {test_people}.
    $subquery = db_select('test_people', 'tp')
      ->fields('tp', array('name'))
      ->condition('tp.name', 'Meredith');
    $query->exists($subquery);
    $result = $query->execute();

Tests pass locally in both MySQL and SQLite.

steven jones’s picture

Assigned: Crell » Unassigned
StatusFileSize
new2.05 KB

Seems like the original intention of the exists test was to only select rows from test that have a matching name in the test_people table, i.e. just George.

Attached patch corrects the tests to do that.
Passing on PostgreSQL.

(This doesn't need to be assigned to Crell any more)

bfroehle’s picture

Status: Needs review » Reviewed & tested by the community

The patch in #42 is ready to go. It clearly captures the test author's original intent and adds some comments so it's clear what it is testing for. I've verified that SQLite passes on my local machine, Steven Jones verifies it works for PostgreSQL, and the testbot checked MySQL, so we are all set.

As a nice side benefit, this will make the SQLite Continuous Integration testing pass.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Awesome work folks, thanks!!

Committed to HEAD.

Crell’s picture

Nice Sleuthing, guys!

agentrickard’s picture

And correct, btw, my intent was to return only the 'George' row.

Status: Fixed » Closed (fixed)
Issue tags: -DBTNG Conversion

Automatically closed -- issue fixed for 2 weeks with no activity.