Download & Extend

Use subqueries for ->countQuery(), at least for MySQL

Project:Drupal core
Version:7.x-dev
Component:database system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:closed (fixed)

Issue Summary

Because MySQL has a query cache, doing:

SELECT COUNT(*) FROM (<the subquery>)

... should be lightening fast. Why not using that, at least for MySQL (PostgreSQL and SQLite have no query cache, so this will require testing on those).

Comments

#1

It might work on other DBMS too, if we can "optimize" the subquery, I'm thinking about:

- we don't need ORDER BY
- we don't need all the fields, the first one or so should be enough
- maybe other stuff ?

I had the same idea in the privatemsg module (we have our own simple query builder for 6.x there), because we have quite complex queries with HAVING and so on, and it's really hard to create count queries for them.

#2

Thinking about it a little bit more, this will not work as I intended on MySQL for two reasons:

- Because "The cache is not used for queries that are a subquery of an outer query" (from [1])
- And because the queries are generally different (the main query has a LIMIT statement, while the count query hasn't)

We still must study the best way of doing those count queries. It's really a shame that the database server generally has to do mostly the same work twice.

[1] http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html

#3

subscribe.

#4

An idea I had recently was that we could check if there are GROUP BY or HAVING condititions and if yes, use a subquery.

This would allow to always have a *working* count query. If the caller knows how to create a more efficient count query, he can still define it explicitly.

#5

While using a subquery may not give performance benefits, it has another impact - giving correct counts for more queries in more situations. This should "fix" pager_query to work in more instances.

(iirc, the other issue that was worked around in the privatemsg query builder was also incorrect count queries where the SELECT statement had a DISTINCT.)

EDIT - it was more likely GROUP BY as mentioned by Berdir and not DISTINCT that was the problem (DISTINCT being he solution in this case.)

#6

If the MySQL count query can be made better (where better is one of faster/more accurate/better tasting/less filling) with a subquery, let's go for it. The query builders support DB-specific subclasses for exactly that reason. It should be a very simple patch, but we'd need some way to proper test and benchmark it.

My question, though, is whether COUNT(*) FROM (subquery) is going to run into the same problem as COUNT(*) FROM normal query on InnoDB. COUNT(*) is only an O(1) operation on MyISAM tables, while we're now defaulting to InnoDB.

#7

@Crell: COUNT(*) is O(1) on MyISAM only for straight SELECT COUNT(*) FROM table. As long as the query is more complex, MySQL have to determine the full result set before counting, both on MyISAM and InnoDB.

The idea here was that maybe we could avoid having to determine the full result set twice. But apparently this is not possible, because MySQL is not able to use the query cache for subqueries.

I'm tempted to put that in the trash can of "potentially good ideas".

#8

If there's no performance benefit, is there an accuracy benefit as well? #5 hinted at that.

#9

There is certainly a reduction of the complexity, which is a great benefit. I'll try to benchmark some of this.

#10

Very minor, but we'd probably save a tiny bit in the query builder and PDO by building and running one query instead of two no? unlikely to be measurable but doesn't do any harm either.

#11

Version:7.x-dev» 8.x-dev

This is unlikely to be a major performance change, so we'll come back to it later maybe.

#12

Version:8.x-dev» 7.x-dev
Category:task» bug report

Please refer back to comments #4 and #5 - this isn't just a performance issue, it's out-and-out wrong with GROUP BY.

For this query:

<?php
    $query
= db_select('migrate_example_beer', 'b')
             ->
fields('b', array('bid', 'name', 'body', 'excerpt', 'aid', 'countries', 'image'));
   
$query->leftJoin('migrate_example_topic_beer', 'tb', 'b.bid = tb.bid');
   
// Gives a single comma-separated list of terms
   
$query->groupBy('tb.bid');
   
$query->addExpression('GROUP_CONCAT(tb.name)', 'terms');
?>

countQuery() generates this SQL:

SELECT COUNT(*) AS expression
FROM
{migrate_example_beer} b
LEFT OUTER JOIN {migrate_example_topic_beer} tb ON b.bid = tb.bid
GROUP BY tb.bid

The beer table has 3 rows - what we get back from this count query is 3 rows with 1, 1, and 2 respectively (the number of related topics for each beer row).

#13

Version:7.x-dev» 8.x-dev
Category:bug report» feature request

I don't think so.

Yes, the query is wrong, but you can create a separate countQuery and set that with $query->setCountQuery(). That's exactly how it is in D6 with pager_query(), so there is no regression. Of course, it would be great if a correct count query could be automatically generated but that not always possible. Well, the subquery approach probably would be, but that is in most cases also a lot slower than a custom query that just selects the count.

#14

Version:8.x-dev» 7.x-dev
Category:feature request» bug report

With GROUP BY queries, countQuery() does not behave as documented or as expected. That, to my mind, is a bug. The fact that a work-around exists does not make it not a bug. Neither does the fact that a similar (but different) function in D6 had the same bug.

If this is not going to be fixed in D7, then it at least needs to be a documented restriction, so people don't waste unnecessary time trying to figure out what's wrong (like I did).

I think you had it right in comment #4 - use the subquery approach only in the cases that need it, so countQuery() is functionally correct; the caller can always construct their own count query if the performance is inadequate.

#15

We can fix it in D7 if someone steps up to fix it. :-) I'm not entirely certain how we'd do that but if someone wants to work on it I will work with them on it.

Also, is it just in MySQL that Group By doesn't work? Would we need an entirely different count query in other DBs?

#16

<?php
      $count_query
= db_select($count_query);
     
$count_query->addExpression('COUNT(*)');
?>

This works for me. This allows groupby, having and everything else.
Sadly this has no optimisation yet.

What can be done

* Remove sorts
* Perhaps remove some of the fields.
* Perhaps remove some of the expressions.

#17

Sure, that *works*. But we didn't do that in D6 either so it is not a regression :)

#778050: Add support for database hints and make PagerDefault properly pluggeable even avoids the count query in most cases on MySQL.

#18

It may not be a regression but anything that makes Drupal faster and more robust is OK in my book assuming it doesn't add more bugs. It's the latter part we want to check there.

#19

Status:active» needs review

I'm pretty sure this doesn't actually work yet, but I need to go work on something else for a while and unit tests for me are running criminally slow for some strange reason. Anyone want to finish it off?

I also realized while doing this that we never added a getGroupBy() accessor method. Now we got one. :-)

AttachmentSizeStatusTest resultOperations
423888-subquery-count.patch3.26 KBIdleFAILED: [[SimpleTest]]: [MySQL] 20,368 pass(es), 0 fail(s), and 2 exception(es).View details

#20

Status:needs review» needs work

The last submitted patch, 423888-subquery-count.patch, failed testing.

#21

Almost there - that patch works for my case (comment #12), but as the tests show breaks on DISTINCT.

#22

OK, think I have a fix for the distinct case - in my environment, this works for my case and for the two tests that previously failed, let's see how the full test suite handles it.

Crell, thanks for taking this on!

AttachmentSizeStatusTest resultOperations
423888-subquery-count-1.patch3.76 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,432 pass(es).View details

#23

Status:needs work» needs review

Setting the status.

#24

Added test testCountQueryGroupBy().

AttachmentSizeStatusTest resultOperations
423888-subquery-count-2.patch4.74 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,445 pass(es).View details

#25

Status:needs review» reviewed & tested by the community

Works for me, thanks!

And imo either this should get in, or the situation has to be documented. Having a function only return correct results sometimes isn't really satisfactory.

#26

#24 looks OK to me, too. This should make life simpler for Views in D7, I believe. Yay!

#27

+++ includes/database/select.inc 24 May 2010 18:01:38 -0000
@@ -94,6 +94,24 @@ interface SelectQueryInterface extends Q
+   * Returns a reference to the group by array for this query.

I don't really understand this sentence.

+++ includes/database/select.inc 24 May 2010 18:01:38 -0000
@@ -1297,22 +1334,25 @@ class SelectQuery extends Query implemen
+    if ($count->distinct && !empty($group_by)) {
+      // If the query is distinct and contains a GROUP BY, we need to remove the
+      // distinct because SQL99 does not support counting on distinct multiple fields.
       $count->distinct = FALSE;
     }

Is this what we want? It seems like this kind of fixing things up could lead to unexpected behavior.

#28

> I don't really understand this sentence.

I think the sentence is just missing a dash: "Returns a reference to the group-by array for this query."
There were also a few places in the copied doc comment, where it stil said "order-by" (or similar) instead of "group-by".

Attached is a patch with updated comments, code is the same. Not sure about the DISTINCT "fix".

AttachmentSizeStatusTest resultOperations
423888-subquery-count-2.patch4.73 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,629 pass(es).View details

#29

@Dries: What else would you do with it? If the SQL spec doesn't allow a distinct, we can't have a distinct. The alternative is unpredictable "ignore or fail" which would be DB-dependent. (MySQL of course ignores it, but I'm not sure about other databases.)

#30

Status:reviewed & tested by the community» fixed

Looked at the surrounding code again, and you're right Crell. Code looks good and the documentation fix is good. Committed to CVS HEAD.

#31

Some other Field / Language patch got committed along with this one : http://drupal.org/cvs?commit=403880

[edit: That's #871310: field_language() does not work if language fallback is disabled, which Dries intented to commit as well. All is good. Leaving a note for later blame searches]

#32

#33

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

#34

Status:closed (fixed)» active

It seems that some of this code did not make it to HEAD--specifically, the changes to SelectQuery::countQuery() are absent. Most of the patch is present in the commit yched linked to (403880, includes/database/select.inc is relevant), but looking at the current HEAD of select.inc (around line 771) the code in the patch is not present.

#35

ER, I feel stupid. The code is present and I was looking in the wrong method implementation.

The test provided with this patch does have a bug, though: it references a db field by name, but should use $query->groupBy($pid_field); instead of $query->groupBy('pid');

#36

Status:active» closed (fixed)
nobody click here