When executing the COUNT() query, I am seeing SQL like:

SELECT COUNT(*) FROM (SELECT node.nid AS nid, node.title AS node_title, node.uid AS node_uid, node.type AS node_type, node_revisions.format AS node_revisions_format FROM node node LEFT JOIN node_revisions node_revisions ON node.vid = node_revisions.vid ) AS count_alias

There is no benefit and much problem with listing the fields in the subquery. That subquery should omit the fields and just hard code a 1 like SELECT 1 FROM .... The problem with keeping the fields is that a lot of data has to be selected out when there are 100,000 rows (for example). Omitting the fields dropped my count query to 1sec from 6sec.

Comments

moshe weitzman’s picture

Assigned: Unassigned » moshe weitzman
Status: Active » Needs review
StatusFileSize
new1 KB

Here is a possible patch. This once slices out the fields if we are not running a distinct query. I think more could be done in the case where we *are* running a distinct query, but this an improvement on its own.

moshe weitzman’s picture

Status: Needs review » Needs work

I am seeing an error here because the ORDER BY clause sometimes specifies fields using an alias. But we don't even want to ORDER in the count query. Thats yet another source of delay.

@Earl - do you think we could ask the query builder to build another query but without the fields and order by? Any other solution you can think of?

merlinofchaos’s picture

Yea, that's pretty much what has to happen. Might break group by though.

moshe weitzman’s picture

Yeah - perhaps bail on the optimization if the query has distinct or uses group by

merlinofchaos’s picture

That's the safest route -- I would be in favor of that.

query::query() has an argument that used to be used for this kind of thing (though the reason I switched to this method is that it had occasional miscounts). But if we check that argument, drop the orderby, I think we're ok.

Check for distinct; we should always get the first field (which should be the primary field) which won't add anything to the query -- as it's going to be the primary key for whatever base table is in operation.

Actually, I think we have to check each field for distinct, and just render them all if any one of them is set distinct (as we can't know what some random handler will need).

moshe weitzman’s picture

StatusFileSize
new2.92 KB

This one does as we discussed, except it only checks the base field. I will work on it some more.

moshe weitzman’s picture

Status: Needs work » Needs review
StatusFileSize
new3.42 KB

Now checks all fields for a distinct. This should be ready to go.

I am still am not too keen on our novel way of doing the count query. I guess the db_rewrite_sql() improvements we made at #151910: Support subqueries, db_rewrite_sql broken were not sufficient for Views. On my test site, the count query consistently takes longer than the original query, even after this patch. Anyway, thats a task for another day.

moshe weitzman’s picture

Anyone available to review this?

merlinofchaos’s picture

Status: Needs review » Fixed

Committed.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

vasi’s picture

Priority: Critical » Normal
Status: Closed (fixed) » Active

Views for Drupal 5 does this count-query optimization, but doesn't test for DISTINCT or GROUP BY first. Instead if there's a GROUP BY clause, the count query just ignores it, which will give bad results of course. I'd write up a patch, but the presence of both $this->distinct and $this->no_distinct is confusing me :-( Anybody feel up to it?

dergachev’s picture

deleted.

merlinofchaos’s picture

Status: Active » Closed (fixed)

vasi: A new issue that links to this one would be preferred, as it'll be difficult for people to follow the conversation as it changes versions.