Pager.inc regexp misses whitespace use case

stephen.colson - July 30, 2008 - 01:03
Project:Drupal
Version:7.x-dev
Component:database system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:needs work
Description

function pager_query's count regexp is missing a use case. Currently, it searches for the following on line 65 of pager.inc:

$count_query = preg_replace(array('/SELECT.*?FROM /As', '/ORDER BY .*/'), array('SELECT COUNT(*) FROM ', ''), $query);

Note that after the FROM in the match side is a space. I recently ran in to a module having problems and after debugging found that it introduced a newline (and no space) immediately following the FROM keyword. Thus, I would propose the count gets changed to the following:

$count_query = preg_replace(array('/SELECT.*?FROM(\s)*/As', '/ORDER BY .*/'), array('SELECT COUNT(*) FROM ', ''), $query);

#1

stephen.colson - July 30, 2008 - 11:26
Version:6.3» 6.x-dev

I just confirmed the initial preg match is in CVS HEAD, so re-setting the applicable version.

#2

tobiassjosten - August 6, 2008 - 18:04

Matching that last space is, in my opinion, a waste of precious bytes.

Attached patch simply removes the space from after the FROM statement, from the regexp pattern and replacement, to allow a more structured style (see below). This fixes the pager counter, which would otherwise fail and return garbage page numbers.

$query = 'SELECT
my_field
FROM
my_table';

AttachmentSize
pager-288837.patch 586 bytes

#3

tobiassjosten - August 6, 2008 - 18:09
Status:active» needs review

#4

stephen.colson - August 6, 2008 - 18:14
Status:needs review» needs work

The problem with not matching a space afterward is there is the possibility that someone, for some reason, put in part of their query SELECT n.nid, FROMBLAH.uid, z.foo, FROM {node} as n, ...

The match, in my opinion, must be on FROM as a word by itself and not as the beginning of a word.

#5

webchick - August 6, 2008 - 18:17

Hm. As I see it, this would fail on:

SELECT NID, VID, FROMAGE FROM {cheese}

#6

tobiassjosten - August 6, 2008 - 19:04
Status:needs work» needs review

Here's a second attempt at solving this issue, now matching pretty much any acceptable character. I also added this character class to between SELECT and FROM, so that it also will not break, as HEAD currently does, on following query:

SELECT NID, VID, AGEFROM FROM {cheese}

Because of the similarity in characteristic of below issues, I also added the trim() method to the query, to solve them as well.
#163118: pager_query() misses multi.line SELECT\tab\FROM\tab.. SQL-Strings
#110093: pager_query does not work when there is whitespace in query

AttachmentSize
pager-288837-163118-110093.patch 606 bytes

#7

stephen.colson - August 6, 2008 - 19:16
Status:needs review» needs work

\s in the regexp is the same as looking for any tab, cr, or lf. I don't really see any reason to enumerate them...

#8

tobiassjosten - August 6, 2008 - 19:34
Status:needs work» needs review
AttachmentSize
pager-288837-163118-110093-2.patch 602 bytes

#9

Damien Tournoud - August 6, 2008 - 19:44
Status:needs review» needs work

Because the new database patch will change that part of the code extensively, there is no point in fixing this in D7, so the version here is correct.

Please take into account Gabor concerns in http://drupal.org/node/163118#comment-620588 ("Why do we restrict the ORDER BY to work with spaces only?").

#11

tobiassjosten - August 6, 2008 - 20:58
Status:needs work» needs review

Made the ORDER BY part also work with all sorts of whitespaces.

AttachmentSize
pager-288837-163118-110093-3.patch 606 bytes

#12

stephen.colson - August 7, 2008 - 13:04

Looks good to me.

#13

Damien Tournoud - August 27, 2008 - 09:25
Version:6.x-dev» 7.x-dev
Status:needs review» reviewed & tested by the community

This should be bumped to D7 (applies cleanly there). Looks good to me.

#14

Damien Tournoud - August 27, 2008 - 09:27

#39302: multi-line queries failure was a duplicate, even if much older.

#15

fgm - August 27, 2008 - 10:05
Status:reviewed & tested by the community» needs review

The old #39302: multi-line queries failure issue of which this one is a duplicate (and not the other way round) also raised another question: Killes wrote there that multi-line queries were not allowed by the coding standards, even if this is now fixed by the proposed patch.

If there is indeed a policy about not allowing multi-line queries, it should be added to the SQL coding conventions page which currently doesn't list such a requirement. Otherwise, that same page should probably be updated with a recommended formatting for long queries, even if only to say the single-line is recommended, as seems to be the case in core.

#16

Damien Tournoud - August 27, 2008 - 10:11
Status:needs review» reviewed & tested by the community

@fgm: that's a different issue. As far as I know, multiline queries *are* allowed (there are even some in core), and the SQL coding conventions made no reference to that issue. The current practice is to avoid multi-line queries when there are no readability issues.

#17

fgm - August 27, 2008 - 19:05

This sounds quite logical, but is not in agreement with what Killes wrote in #39302, so it would be better if the practice you describe was indeed codified. The multiline query which used to be in book.module is no longer there in D7, and I'm not sure there are others in core, so it's all the more important to have a proper documentation.

#18

Dries - August 28, 2008 - 08:19

Not sure. I'd rather not make those regex slower.

#19

tobiassjosten - August 28, 2008 - 08:51

How about skipping regexp all together by simply enclosing the actual query in a sub select when creating the counter query. Something like $counter = 'SELECT COUNT(*) FROM (' . $query . ') counter';. That'd make it faster and make it handle any query. I can't see any downside to this method either - maybe you can?

#20

Crell - August 28, 2008 - 09:34

A regex is the Wrong Way(tm) to be dealing with SQL. Instead we should be moving pager query to the new dynamic query builder. That requires first implementing #299267: Add "Extender" support to SELECT query builder. Let's get back to this after that goes in so we can do this right.

(Any place we're regexing a query needs to become a dynamic query, without exception.)

#21

Crell - August 28, 2008 - 09:34

A regex is the Wrong Way(tm) to be dealing with SQL. Instead we should be moving pager query to the new dynamic query builder. That requires first implementing #299267: Add "Extender" support to SELECT query builder. Let's get back to this after that goes in so we can do this right.

(Any place we're regexing a query needs to become a dynamic query, without exception.)

#22

Dries - August 30, 2008 - 09:58

Should this go back to 'code needs work' then? :)

#23

Crell - September 8, 2008 - 15:39
Status:reviewed & tested by the community» needs work

Yeah, probably. :-) I'm tempted to go postponed, but I am not 100% certain of how the dynamic pager builder will work yet.

Also, multi-line queries not allowed? That's nonsense. If that's a current standard that needs to change, as any query of remotely interesting complexity is going to go well past 80 characters anyway.

 
 

Drupal is a registered trademark of Dries Buytaert.