Pager.inc regexp misses whitespace use case
stephen.colson - July 30, 2008 - 01:03
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | database system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
Description
function pager_query's count regexp is missing a use case. Currently, it searches for the following on line 65 of pager.inc:
$count_query = preg_replace(array('/SELECT.*?FROM /As', '/ORDER BY .*/'), array('SELECT COUNT(*) FROM ', ''), $query);Note that after the FROM in the match side is a space. I recently ran in to a module having problems and after debugging found that it introduced a newline (and no space) immediately following the FROM keyword. Thus, I would propose the count gets changed to the following:
$count_query = preg_replace(array('/SELECT.*?FROM(\s)*/As', '/ORDER BY .*/'), array('SELECT COUNT(*) FROM ', ''), $query);
#1
I just confirmed the initial preg match is in CVS HEAD, so re-setting the applicable version.
#2
Matching that last space is, in my opinion, a waste of precious bytes.
Attached patch simply removes the space from after the FROM statement, from the regexp pattern and replacement, to allow a more structured style (see below). This fixes the pager counter, which would otherwise fail and return garbage page numbers.
$query = 'SELECT
my_field
FROM
my_table';
#3
#4
The problem with not matching a space afterward is there is the possibility that someone, for some reason, put in part of their query SELECT n.nid, FROMBLAH.uid, z.foo, FROM {node} as n, ...
The match, in my opinion, must be on FROM as a word by itself and not as the beginning of a word.
#5
Hm. As I see it, this would fail on:
SELECT NID, VID, FROMAGE FROM {cheese}#6
Here's a second attempt at solving this issue, now matching pretty much any acceptable character. I also added this character class to between SELECT and FROM, so that it also will not break, as HEAD currently does, on following query:
SELECT NID, VID, AGEFROM FROM {cheese}Because of the similarity in characteristic of below issues, I also added the trim() method to the query, to solve them as well.
#163118: pager_query() misses multi.line SELECT\tab\FROM\tab.. SQL-Strings
#110093: pager_query does not work when there is whitespace in query
#7
\s in the regexp is the same as looking for any tab, cr, or lf. I don't really see any reason to enumerate them...
#8
#9
Because the new database patch will change that part of the code extensively, there is no point in fixing this in D7, so the version here is correct.
Please take into account Gabor concerns in http://drupal.org/node/163118#comment-620588 ("Why do we restrict the ORDER BY to work with spaces only?").
#10
Were duplicates:
#163118: pager_query() misses multi.line SELECT\tab\FROM\tab.. SQL-Strings
#110093: pager_query does not work when there is whitespace in query
#11
Made the ORDER BY part also work with all sorts of whitespaces.
#12
Looks good to me.
#13
This should be bumped to D7 (applies cleanly there). Looks good to me.
#14
#39302: multi-line queries failure was a duplicate, even if much older.
#15
The old #39302: multi-line queries failure issue of which this one is a duplicate (and not the other way round) also raised another question: Killes wrote there that multi-line queries were not allowed by the coding standards, even if this is now fixed by the proposed patch.
If there is indeed a policy about not allowing multi-line queries, it should be added to the SQL coding conventions page which currently doesn't list such a requirement. Otherwise, that same page should probably be updated with a recommended formatting for long queries, even if only to say the single-line is recommended, as seems to be the case in core.
#16
@fgm: that's a different issue. As far as I know, multiline queries *are* allowed (there are even some in core), and the SQL coding conventions made no reference to that issue. The current practice is to avoid multi-line queries when there are no readability issues.
#17
This sounds quite logical, but is not in agreement with what Killes wrote in #39302, so it would be better if the practice you describe was indeed codified. The multiline query which used to be in book.module is no longer there in D7, and I'm not sure there are others in core, so it's all the more important to have a proper documentation.
#18
Not sure. I'd rather not make those regex slower.
#19
How about skipping regexp all together by simply enclosing the actual query in a sub select when creating the counter query. Something like
$counter = 'SELECT COUNT(*) FROM (' . $query . ') counter';. That'd make it faster and make it handle any query. I can't see any downside to this method either - maybe you can?#20
A regex is the Wrong Way(tm) to be dealing with SQL. Instead we should be moving pager query to the new dynamic query builder. That requires first implementing #299267: Add "Extender" support to SELECT query builder. Let's get back to this after that goes in so we can do this right.
(Any place we're regexing a query needs to become a dynamic query, without exception.)
#21
A regex is the Wrong Way(tm) to be dealing with SQL. Instead we should be moving pager query to the new dynamic query builder. That requires first implementing #299267: Add "Extender" support to SELECT query builder. Let's get back to this after that goes in so we can do this right.
(Any place we're regexing a query needs to become a dynamic query, without exception.)
#22
Should this go back to 'code needs work' then? :)
#23
Yeah, probably. :-) I'm tempted to go postponed, but I am not 100% certain of how the dynamic pager builder will work yet.
Also, multi-line queries not allowed? That's nonsense. If that's a current standard that needs to change, as any query of remotely interesting complexity is going to go well past 80 characters anyway.