Don't retrieve fields when they are not needed [#290132]

When executing the COUNT() query, I am seeing SQL like:

SELECT COUNT(*) FROM (SELECT node.nid AS nid, node.title AS node_title, node.uid AS node_uid, node.type AS node_type, node_revisions.format AS node_revisions_format FROM node node LEFT JOIN node_revisions node_revisions ON node.vid = node_revisions.vid ) AS count_alias

Listing the fields in the subquery has no benefit and causes real problems. The subquery should omit the fields and just hard-code a 1, like SELECT 1 FROM .... The problem with keeping the fields is that a lot of data has to be selected when there are many rows (100,000, for example). Omitting the fields dropped my count query from 6 seconds to 1 second.
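To illustrate the point, here is a minimal sketch showing that the two forms of the count query return the same number. It assumes SQLite (through Python's sqlite3 module) as a stand-in for the site's database, and the table definitions and sample rows are made up for the example. It only checks that the counts agree; it does not reproduce the timings reported above.

```python
import sqlite3

# In-memory database with hypothetical, cut-down versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, title TEXT, uid INTEGER, type TEXT);
    CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, format INTEGER);
""")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    [(i, i, f"title {i}", 1, "page") for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO node_revisions VALUES (?, ?)",
    [(i, 1) for i in range(1, 1001)],
)

FROM_CLAUSE = (
    "FROM node node LEFT JOIN node_revisions node_revisions "
    "ON node.vid = node_revisions.vid"
)

# The count query as reported: every field is listed in the subquery.
with_fields = (
    "SELECT COUNT(*) FROM (SELECT node.nid AS nid, node.title AS node_title, "
    "node.uid AS node_uid, node.type AS node_type, "
    f"node_revisions.format AS node_revisions_format {FROM_CLAUSE}) AS count_alias"
)

# The proposed form: the subquery selects a constant instead of the fields.
constant_only = f"SELECT COUNT(*) FROM (SELECT 1 {FROM_CLAUSE}) AS count_alias"

count_with_fields = conn.execute(with_fields).fetchone()[0]
count_constant = conn.execute(constant_only).fetchone()[0]
print(count_with_fields, count_constant)
```

Both queries count the same rows, so the field list in the subquery contributes nothing to the result.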

Comment	File	Size	Author
#7	mw.patch	3.42 KB	moshe weitzman
#6	mw.patch	2.92 KB	moshe weitzman
#1	mw.patch	1 KB	moshe weitzman

Comments

Comment #1

moshe weitzman commented 15 August 2008 at 15:10

Assigned:	Unassigned	» moshe weitzman
Status:	Active	» Needs review

Status	File	Size
new	mw.patch	1 KB

Here is a possible patch. This one slices out the fields if we are not running a distinct query. I think more could be done in the case where we *are* running a distinct query, but this is an improvement on its own.

Comment #2

moshe weitzman commented 15 August 2008 at 16:49

Status:	Needs review	» Needs work

I am seeing an error here because the ORDER BY clause sometimes specifies fields using an alias. But we don't even want to ORDER BY in the count query. That's yet another source of delay.

@Earl - do you think we could ask the query builder to build another query but without the fields and order by? Any other solution you can think of?
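The ORDER BY problem described in this comment can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 module rather than the site's actual database, and the table and alias names are made up for the example; the exact error text will differ between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?)", [(1, "b"), (2, "a")])

# Fields kept: the ORDER BY can refer to the alias node_title.
with_alias = (
    "SELECT COUNT(*) FROM (SELECT node.title AS node_title FROM node "
    "ORDER BY node_title) AS count_alias"
)
# Fields sliced out but ORDER BY kept: the alias no longer exists.
alias_removed = (
    "SELECT COUNT(*) FROM (SELECT 1 FROM node ORDER BY node_title) AS count_alias"
)
# Fields and ORDER BY both dropped.
no_orderby = "SELECT COUNT(*) FROM (SELECT 1 FROM node) AS count_alias"

count_with_alias = conn.execute(with_alias).fetchone()[0]

try:
    conn.execute(alias_removed)
    alias_error = None
except sqlite3.OperationalError as exc:
    # The database rejects the query because node_title cannot be resolved.
    alias_error = str(exc)

count_no_orderby = conn.execute(no_orderby).fetchone()[0]
print(count_with_alias, alias_error, count_no_orderby)
```

Dropping the ORDER BY along with the fields avoids the error and leaves the count unchanged, since sorting has no effect on how many rows there are.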

Comment #3

merlinofchaos commented 15 August 2008 at 16:53

Yea, that's pretty much what has to happen. Might break group by though.

Comment #4

moshe weitzman commented 15 August 2008 at 16:57

Yeah - perhaps bail on the optimization if the query has distinct or uses group by

Comment #5

merlinofchaos commented 15 August 2008 at 17:32

That's the safest route -- I would be in favor of that.

query::query() has an argument that used to be used for this kind of thing (though the reason I switched to this method is that it had occasional miscounts). But if we check that argument and drop the ORDER BY, I think we're OK.

Check for distinct; we should always get the first field (which should be the primary field) which won't add anything to the query -- as it's going to be the primary key for whatever base table is in operation.

Actually, I think we have to check each field for distinct, and just render them all if any one of them is set distinct (as we can't know what some random handler will need).
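The decision discussed in the last few comments could be sketched roughly as follows. This is not the patch and not the Views query builder: the function name build_count_query, its parameters, and the field dictionaries are all hypothetical, and keeping the ORDER BY when the optimization is skipped is an assumption made here for simplicity.

```python
def build_count_query(fields, from_clause, groupby=(), orderby=(), distinct=False):
    """Build a count query, skipping the optimization when it is unsafe.

    fields is a list of dicts with 'expr', 'alias' and an optional
    'distinct' flag (a hypothetical structure for this sketch).
    """
    # If the query or any single field is distinct, render all fields.
    any_distinct = distinct or any(f.get("distinct", False) for f in fields)

    if any_distinct or groupby:
        # Bail on the optimization: keep the full subquery as it was.
        select_list = ", ".join(f"{f['expr']} AS {f['alias']}" for f in fields)
        inner = f"SELECT {'DISTINCT ' if any_distinct else ''}{select_list} {from_clause}"
        if groupby:
            inner += " GROUP BY " + ", ".join(groupby)
        if orderby:
            inner += " ORDER BY " + ", ".join(orderby)
    else:
        # Safe case: no fields and no ORDER BY, just a constant.
        inner = f"SELECT 1 {from_clause}"

    return f"SELECT COUNT(*) FROM ({inner}) AS count_alias"


fields = [
    {"expr": "node.nid", "alias": "nid"},
    {"expr": "node.title", "alias": "node_title"},
]

plain = build_count_query(fields, "FROM node node", orderby=["node_title ASC"])
grouped = build_count_query(fields, "FROM node node", groupby=["node.uid"])
distinct_field = build_count_query(
    [{"expr": "node.nid", "alias": "nid", "distinct": True}], "FROM node node"
)
print(plain)
print(grouped)
print(distinct_field)
```

Only the plain case is rewritten to SELECT 1; the grouped and distinct cases keep their fields so the count still reflects what the real query would return.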

Comment #6

moshe weitzman commented 15 August 2008 at 19:19

Status	File	Size
new	mw.patch	2.92 KB

This one does as we discussed, except it only checks the base field. I will work on it some more.

Comment #7

moshe weitzman commented 16 August 2008 at 02:20

Status:	Needs work	» Needs review

Status	File	Size
new	mw.patch	3.42 KB

Now checks all fields for a distinct. This should be ready to go.

I am still not too keen on our novel way of doing the count query. I guess the db_rewrite_sql() improvements we made at #151910: Support subqueries, db_rewrite_sql broken were not sufficient for Views. On my test site, the count query consistently takes longer than the original query, even after this patch. Anyway, that's a task for another day.

Comment #8

moshe weitzman commented 2 September 2008 at 19:10

Anyone available to review this?

Comment #9

merlinofchaos commented 10 September 2008 at 20:16

Status:	Needs review	» Fixed

Committed.

Comment #10

Anonymous (not verified) commented 24 September 2008 at 20:54

Status:	Fixed	» Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

Comment #11

vasi commented 11 December 2008 at 16:24

Priority:	Critical	» Normal
Status:	Closed (fixed)	» Active

Views for Drupal 5 does this count-query optimization, but doesn't test for DISTINCT or GROUP BY first. Instead, if there's a GROUP BY clause, the count query just ignores it, which will give bad results of course. I'd write up a patch, but the presence of both $this->distinct and $this->no_distinct is confusing me :-( Anybody feel up to it?
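A small sketch of why ignoring the GROUP BY gives a wrong count. It assumes SQLite via Python's sqlite3 module and a made-up node table with five rows by two authors; it is not the Views for Drupal 5 code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2)],
)

# The view itself: one row per author.
view_rows = conn.execute(
    "SELECT node.uid AS node_uid, COUNT(node.nid) AS num FROM node GROUP BY node.uid"
).fetchall()

# Count query that ignores the GROUP BY: counts nodes, not grouped rows.
count_ignoring_groupby = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]

# Count query that keeps the GROUP BY inside a subquery.
count_keeping_groupby = conn.execute(
    "SELECT COUNT(*) FROM (SELECT node.uid FROM node GROUP BY node.uid) AS count_alias"
).fetchone()[0]

print(len(view_rows), count_ignoring_groupby, count_keeping_groupby)
```

The view shows two rows, but the count that ignores the GROUP BY reports five, so a pager built on it would show pages that do not exist.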

Comment #12

dergachev commented 12 December 2008 at 06:51

deleted.

Comment #13

merlinofchaos commented 27 December 2008 at 18:05

Status:	Active	» Closed (fixed)

vasi: A new issue that links to this one would be preferred, as it'll be difficult for people to follow the conversation as it changes versions.
