We should make our database abstraction layer more robust and ensure that module authors can use it without string manipulations inside the query. Several queries use implode() to get their arguments into the query. This is undesirable as we rely on the module author to check the keys and values of such arrays for exploitation attempts.

I have created the attached patch which shouldbe able to allow us to not use implode anymore.

A minor problem is that all inserted values will be treated as strings. This might be a problem with PostgreSQL at least. However, the same strategy is already used in Drupal core without any complaints I know of.

Summary: This patch will alow us to simplify some code in node.module, user.module, taxonomy.module and probably others.

Comments

killes@www.drop.org’s picture

It's a patch.

killes@www.drop.org’s picture

StatusFileSize
new2.07 KB

Squeezed out two lines of code after consultation with Karoly. Adds only 10 loc (plus some docs).

chx’s picture

Do I need to say +1?

killes@www.drop.org’s picture

StatusFileSize
new2.69 KB

After some discussion with Adrian at Drupal Con we found out that we do not know why node_save currently works with pgsql. It currently assumes that all db columns are strings. It seems to work but we should not rely on it.

Here is a patch that checks for the type of field that is inserted.

It needs testing.

drumm’s picture

+1 for making this into an API. I've seen too many hacked together query builders in Drupal and Contrib. I have not tested.

Bèr Kessels’s picture

untested. a big +1 for the feature

killes@www.drop.org’s picture

the patch still applies. the new patch here updates node_save to use it. Untested.

killes@www.drop.org’s picture

StatusFileSize
new926 bytes

the patch still applies. the new patch here updates node_save to use it. Untested.

Cvbge’s picture

After some discussion with Adrian at Drupal Con we found out that we do not know why node_save currently works with pgsql.

Well, it does not work. Real life exampless of not-working include flexinode (my experience) and forum module (as someone reported). Probably also others.

The bug occurs when a normal user (but with necessarily rights) adds a node and he has no controls (the 'moderated', 'sticky', 'published' etc). The, in node_load() $node->sticky, $node->moderated (and maybe others) are set to FALSE (or TRUE, but in this case it works).
When doing printf("%s", FALSE) the FALSE is change to empty string. The sticky and moderated db fields are numeric and postgresql do not accept '' (empty string) as a value of integer type.

The result is for example such error:

warning: pg_query(): Query failed: ERROR:  invalid input syntax for integer: "" in ..../includes/database.pgsql.inc on line 45.

user error: 
query: INSERT INTO node (title, uid, type, teaser, status, moderate, promote, sticky, body, comment, created, changed, nid) VALUES('xx', '2', 'flexinode-1', '<div class="flexinode-body flexinode-1"><div class="flexinode-image-3"><div class="form-item">
 <label>Zdjęcie:</label><br />
 <img alt="xx" src="..../pliki/" /><br /><a href="..../pliki//tmp/male.jpg">Get original file (28KB)</a>
</div>
</div></div>', '1', '', '1', '', '<div class="flexinode-body flexinode-1"><div class="flexinode-image-3"><div class="form-item">
 <label>Zdjęcie:</label><br />
 <img alt="xx" src="..../pliki/" /><br /><a href="..../pliki//tmp/male.jpg">Get original file (28KB)</a>
</div>
< in ..../includes/database.pgsql.inc on line 62.
Cvbge’s picture

StatusFileSize
new500 bytes

Here's a quick fix for 4.6

killes@www.drop.org’s picture

StatusFileSize
new1.8 KB

the last patch wasn't good.

killes@www.drop.org’s picture

StatusFileSize
new3.5 KB

Here is an additional patch for taxonomy.module
it removes the two custom functions that are used for inserts and updates.
Needs testing.

@Cvbge: shoduld we have a better test than just is_numeric in _db_argument_type to fix the pgsql problems? We could also try to pass $value by reference and cast it to int. this function is only used for updates/inserts so we can afford an additional check.

killes@www.drop.org’s picture

StatusFileSize
new2.74 KB

Here's a revised version fo the original patch. _db_argument_type now gets the $value argument by reference and we check for bools too. numerics and bools are cast to int.

Bèr Kessels’s picture

I cannot say anything about performance for I do not know how to benchmark this. But I tried it, And rewrote some queries in a custom module, to use it, and like it a lot.
The code becomes cleaner, and better readable. But above all, i see this idea going into a very interesting and good direction: that of 'more' query builders in core'.
a +1!

Cvbge’s picture

I'll be talking about postgres db only.

The patch won't work as expected in some cases.

Let's say we have a text/char/other character column and we want to insert a text that looks like a number. The _db_argument_type() will recognize the value as a numeric/number etc and will use $key = %d format. Even not considering printf() numeric conversion this can lead to incorrect value inserted into db.
Examples of such problematic texts are: 01 (leading 0), 2e2 (scientific notation), probably also text with trailing/leading white spaces. When inserting such data into db without quotes the postgres db will think this is a number and will change 010 to 10, 2e2 to 200 . Here is an example:

piotr=> CREATE TABLE t(c CHAR(10));
piotr=> INSERT INTO t VALUES (01);
piotr=> INSERT INTO t VALUES (2e2);
piotr=> INSERT INTO t VALUES (-01);
piotr=> INSERT INTO t VALUES (000);
piotr=> SELECT * from t;
     c
------------
 1
 200
 -1
 0
(4 rows)

I think it's safe to use '%s' everywhere. I could not find it writtent directly in postgres documentation (although I remember reading it somewhere some time ago). Postgres will convert the string to correct integer/numeric/bool value.

killes@www.drop.org’s picture

StatusFileSize
new2.69 KB

Ok, the patch now does only distinguish between integerss and the rest. This is needed to decide whether we need quotes.

Cvbge’s picture

I'm sorry, but I see no reason for this integer check. Can you explain why there is a need to differentiate between integers and other types?

The only problem I see is with converting bool(FALSE). This value is converted to empty string ('') and when trying to insert it into numeric, float, bool, date (probably any column that is not text type) etc you get error.
One solution would be to check for bool(FALSE) and convert it to 0. This would work for numeric-like and bool fields. But it would not work for DATA columns (would produce error). Also, entering '0' into text column might not be what the author wanted.
But this is more a problem with the coder - if you have integer/date etc field, insert integer/date/etc data type, not bool!

Side note: the integer-version, the one without '', will be used only for integers (i.e. ..., -2, -1, 0, 1, 2, ... see http://php.net/manual/en/language.types.integer.php).

Cvbge’s picture

"DATA" should be "DATE"

killes@www.drop.org’s picture

StatusFileSize
new2.71 KB

Ok, another attempt. There is indeed no real reason to use %d for integers, they work fine as strings and are converted by sprintf to strings independend of %d or %s. But it is the Right Thing(tm), so I kept it.
Booleans, however, need the %d formatter in order to be converted to 0 and 1. TRUE to 1 always works, but FALSE to 0 only works with %d.

All tests on php 4.3.x.

Cvbge’s picture

The part I was objecting previously looks ok now. I haven't tested the code though.

I still think it'd be nice to change php NULL to real SQL NULLs, but I don't have how to do it (at least not with current approach). The best would be if there was (s)printf flag that would just ignore it's argument...

Thomas Ilsche’s picture

StatusFileSize
new907 bytes

+1, tested

Runs nicely but maybe NULL and floats might need special treatment.

I have attached an INCOMPLETE patch for node_save using the new feature that also fixes the following issue.

killes@www.drop.org’s picture

StatusFileSize
new3.65 KB

Here is an updated patch that takes a lot of the code I added to node_save out again (approx 20 loc). it also fixes a docs glitch in that function. Needs testing and is to be used with the patch from http://drupal.org/node/17656#comment-39222

Cvbge’s picture

A remark about NULL values:

in this patch they are treated as integers and thus allways changed to '0'.
In existing code the '' was INSERTed into text-like columns (%s) if didn't check if the variable was set (or did not care).
If they now use %a and still do not check for NULLs the '0' will be inserted which migh or might not create problems (it won't matter for if()s, but maybe for other uses?)

killes@www.drop.org’s picture

Cvbge: AFAIK we don't use NULL values inside the DB in Drupal core.

Cvbge’s picture

You're probably right, then
1. not checking if a variable is set is a bug (and I've already filled a bug for one of such bugs, can't find it - sticky, moderated etc. fields were not set at all when not selected when submitting a post)
2. There's already a lot of DEFAULT NULL in the database schema...

Cvbge’s picture

Hello, NULLs again.

Previously I wrote that NULLs are treated as integers and passed with %d and converted to 0.

This is of course not true [were I sick when writting it?]. They are treated as strings and converted to '' (empty string)

This will create similar problems as FALSE. Maybe NULL should be treated as %d (and converted to 0) after all?

killes@www.drop.org’s picture

StatusFileSize
new2.72 KB

Ok, I've included is_null() in the condition.

Cvbge’s picture

ok, last thing (really ;)): maybe add some information to docs saying that null is converted to '0' (not to '') ? :)

killes@www.drop.org’s picture

StatusFileSize
new2.85 KB

Kjartan prefers this version. I also added some comment.

killes@www.drop.org’s picture

After some more discussion the consensus was that we should not care too much about NULL values since they are not used in Drupal core and use the second patch (http://drupal.org/files/issues/db_query_2.patch).

chx’s picture

Status: Needs review » Reviewed & tested by the community

I tested said patch today. It applies and it's an absolute must for security purposes. If it would have been in, two out of the three flexinode vulnerabilities would not have happened.

chx’s picture

I meant: if it would have been available in the distant past -- it was not a lament "why this was not commited earlier". Also I was totally wrong because db_query array parameter usage would have been enough for flexinode :( .

Note to self: look around for db_escape_string usage and get those SELECTs reworked.

killes@www.drop.org’s picture

StatusFileSize
new2.82 KB

I've changed a fugly strpos to a less fugly substr and added a code comment for Cvbge.

killes@www.drop.org’s picture

StatusFileSize
new2.82 KB

silly typo in patch.

killes@helios:/home/killes/drupal-cvs$ php -l includes/database.inc
No syntax errors detected in includes/database.inc

:p

Cvbge’s picture

I'm trying to implement support for %b which is needed for storing binary data in the database (see http://drupal.org/node/10407#comment-47173).
I have a working implementation, but it's slower then killes' for queries not involving %a. Below some test results done on Celeron 850 without any load. Only db_query() was benchmarked, total and average of 10000 runs. Difference can be seen in test 2 and 5 mainly.

Test 1:
-------

Original:
Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Test time: 928.140ms, which is 0.093ms per execution

My:
Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Test time: 1009.840ms, which is 0.101ms per execution

Killes:
Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Test time: 1007.860ms, which is 0.101ms per execution


Test 2:
-------

Original:
Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Test time: 1528.720ms, which is 0.153ms per execution

My:
Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Test time: 2043.070ms, which is 0.204ms per execution

Killes:
Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Test time: 1656.920ms, which is 0.166ms per execution


Test 3:
-------

My:
Test 1:  Testing INSERT INTO {node} %a...
Test time: 2912.160ms, which is 0.291ms per execution

Killes:
Test 1:  Testing INSERT INTO {node} %a...
Test time: 2874.060ms, which is 0.287ms per execution


Test 4:
-------

My:
Test 1:  Testing UPDATE {node} SET %a WHERE nid = %d...
Test time: 3229.650ms, which is 0.323ms per execution

Killes:
Test 1:  Testing UPDATE {node} SET %a WHERE nid = %d...
Test time: 2929.280ms, which is 0.293ms per execution


Test 5: lots of arguments
-------

Original:
Test 1:  Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Test time: 2058.200ms, which is 0.206ms per execution

Me:
Test 1:  Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Test time: 3465.680ms, which is 0.347ms per execution

Killes:
Test 1:  Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Test time: 2216.160ms, which is 0.222ms per execution

Cvbge’s picture

Version: » 4.6.3
StatusFileSize
new2.97 KB

ok, last test was not done correctly, this time they are ;)
Attaching script for testing.
Again tests were done using 10000 runs.

Results seem to suggest that the longer the query, the slower my db_query() gets. I should test it more, I'll probably do later.
But I'd like to note that the query parsing time is a) much smaller then query execution time b) it's very small anyway.
Let's take the highest query parsing time from the test, 0.6ms. Even if there were 1000 such queries that would make total time of 600ms, that is 0.6 second. Such time is almost marginal.

But I'd like to see some real queries, from email I sent to drupal-devel:

I need some overview of what kind of queries are executed on real drupal
site and in what amount. I need a list of queries sent to _db_query().
IIRC devel module has such possibility, it can list all queries on a
page, with number of executions (and maybe time?).

The best would be moderately rich site (maybe a front page with some
stories/flexinode type or other and with blocks - latest posts, users
online, some images etc) - something with a variety of everything...
At least that's the best site for benchmarking. Some average rich site.

Results:
Original db_query(): (%a queries are not meaningfull because original function does not support it)

Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 708.65ms, avg query time: 0.07ms, std. dev.: 0.01ms

Test 2:  Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 1225.82ms, avg query time: 0.12ms, std. dev.: 0.10ms

Test 3:  Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 1820.12ms, avg query time: 0.18ms, std. dev.: 0.03ms

Test 4:  Testing INSERT INTO {node} %a...
Total test time: 1325.25ms, avg query time: 0.13ms, std. dev.: 0.02ms

Test 5:  Testing UPDATE {node} SET %a WHERE nid = %d...
Total test time: 1430.40ms, avg query time: 0.14ms, std. dev.: 0.03ms

My version:

Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 711.73ms, avg query time: 0.07ms, std. dev.: 0.01ms

Test 2:  Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 1715.24ms, avg query time: 0.17ms, std. dev.: 0.02ms

Test 3:  Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 3139.22ms, avg query time: 0.31ms, std. dev.: 0.02ms

Test 4:  Testing INSERT INTO {node} %a...
Total test time: 2623.54ms, avg query time: 0.26ms, std. dev.: 0.02ms

Test 5:  Testing UPDATE {node} SET %a WHERE nid = %d...
Total test time: 2835.83ms, avg query time: 0.28ms, std. dev.: 0.02ms

Test 6:  Testing UPDATE {node} SET %a WHERE nid = %d AND pid = '%s' OR fid = %d AND foo = '%b' AND aid = '%d' OR bid = '%d' AND cid = '%b' OR did = '%s' AND eid = '%s'  GROUP BY somethig ORDER BY 1,2...
Total test time: 5841.36ms, avg query time: 0.58ms, std. dev.: 0.03ms

Killes version:

Test 1:  Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 709.86ms, avg query time: 0.07ms, std. dev.: 0.01ms

Test 2:  Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 1256.60ms, avg query time: 0.13ms, std. dev.: 0.02ms

Test 3:  Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 1866.38ms, avg query time: 0.19ms, std. dev.: 0.01ms

Test 4:  Testing INSERT INTO {node} %a...
Total test time: 2619.78ms, avg query time: 0.26ms, std. dev.: 0.01ms

Test 5:  Testing UPDATE {node} SET %a WHERE nid = %d...
Total test time: 2667.97ms, avg query time: 0.27ms, std. dev.: 0.02ms

Test 6:  Testing UPDATE {node} SET %a WHERE nid = %d AND pid = '%s' OR fid = %d AND foo = '%b' AND aid = '%d' OR bid = '%d' AND cid = '%b' OR did = '%s' AND eid = '%s'  GROUP BY somethig ORDER BY 1,2...
Total test time: 4158.79ms, avg query time: 0.42ms, std. dev.: 0.02ms
Cvbge’s picture

Version: 4.6.3 »

Fixing version.

Cvbge’s picture

Version: » x.y.z

I've been busy for a couple of last days, but finally got some time and used devel.module on a copy of real site. I've also enabled all modules and all blocks.

For admin/access: "Executed 40 queries in 182.94 microseconds." for the first time (no cache), and "32 queries in 107.16 microseconds" for the second time (cached) (hmm for uid=1?)
Longest taking query was INSERT INTO {cache} - 60ms. Parsing time was 2.3ms.
Fastest one was SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d which took 1.6ms. Parsing time was 0.13ms.
SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC took 13ms and parsing time was 0.14ms.

For main page: "Executed 72 queries in 243.69 microseconds."
Again longest query were
INSERT INTO {cache} (53ms),
UPDATE {cache} (10ms),
SELECT * FROM menu ORDER BY mid ASC (18ms).
After being cached longest queries were
SELECT COUNT(*) FROM node_access WHERE nid = 0 AND CONCAT(realm, gid) IN ('all0') AND grant_view = 1 (6.5ms),
SELECT t.* FROM term_data t, term_node r WHERE r.tid = t.tid AND r.nid = 23 ORDER BY weight, name (6.75ms), and two
SELECT flexinode_1.textual_data AS flexinode_1, flexin ... flexinode's query that took 7 and 5ms.

I believe this proves that query parsing times need not to be taken into account.

Cvbge’s picture

BTW, "cache support" was disabled, so how come {cache} was used?

Cvbge’s picture

Status: Reviewed & tested by the community » Needs work

Well it's not going to get in so changing the status to remove from search results for RTBC issues ;)

moshe weitzman’s picture

well, whats the status here? seems like most think this is a good approach. i agree, but haven't tested yet.

killes@www.drop.org’s picture

The problem is that Dries doesn't like it...

svenax’s picture

Sorry for bumping an old issue, but is there a reason that the %a syntax hasn't been included? I think it would be absolutely terrific to be able to construct complex INSERT and UPDATE queries in this way. It sure would make life easier.

killes@www.drop.org’s picture

I think the main reason is "Dries doesn't like it" :p

I still think it is an excellent idea.

Crell’s picture

Dries shot down using a separate function, too. Similiar functionality is now in helpers.module. See: http://drupal.org/node/53488

Crell’s picture

Version: x.y.z » 7.x-dev
Status: Needs work » Closed (duplicate)

The functionality this patch offers has been reimplemented as part of the Database TNG patch: http://drupal.org/node/225450