Download & Extend

db_query() should whitelist queries

Project:Drupal core
Version:8.x-dev
Component:database system
Category:task
Priority:normal
Assigned:Unassigned
Status:needs work
Issue tags:API change

Issue Summary

As discussed during the "database mini-summit" inside Drupalcon (really a lunch), we need to whitelist queries that goes thru db_query(): only simple SELECT queries should use that interface.

As discussed, we should have an "unsafe" mode, in which queries go unchecked. For adventurous module writers.

Comments

#1

Status:active» needs review

This is pretty much the approach that we agreed upon, let's see if hell breaks loose.

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch26.95 KBIdleFAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in schema.inc.View details | Re-test

#2

Status:needs review» needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

#3

Status:needs work» needs review

Fixed a syntax error in pgsql/schema.inc. (oups)

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch26.95 KBIdleFAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.View details | Re-test

#4

Why should we have an unchecked mode? I'm afraid this will be the easy way out for module authors. Just flip the flag to TRUE and you're done updating.

#5

Just a couple of questions... would appreciate some insight.

Why is this critical?
Is 'whitelist' the proper term for what's happening here?
What are the performance implications of adding the regex?
Why is this necessary?

#6

Status:needs review» needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

#7

Priority:critical» normal

This looks like:

1. Babysitting broken code.
2. Introducing the equivalent of 'this function is deprecated' BC handling.

Neither are critical. Both are what we have coder module for.

#8

@catch, if we don't enfore SELECT only db_queries, we may end up with lots of modules that do not work across the different database drivers.

#9

Yes, that would be a bug in the module, which we would file bug reports for in the respective issue queue.

I've seen modules use mysql_query before too, but we don't regexp module files for mysql_query every page request and throw exceptions just in case either.

edit: also,

One of the main design goals of DBTNG is to avoid string parsing of queries whenever possible.

http://drupal.org/node/314464#comment-1078529

#10

Priority:normal» critical

This doesn't fall in the "Babysitting broken code" category. If you issue an INSERT query using db_query() it might not fail under MySQL, but might fail under PostgreSQL or SQLite. We are making something that doesn't fail (but should) fail.

This is a critical *task*, not a bug.

Adding a regexp here should not incur any performance hit (but this will have to proven).

#11

Status:needs work» needs review

Installer queries need to be marked as unsafe, too. This should make Drupal install again.

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch27.41 KBIdleFAILED: [[SimpleTest]]: [MySQL] 20,044 pass(es), 0 fail(s), and 1 exception(es).View details | Re-test

#12

Come on, this is over the top. You are trying to enforce your view of the world on other (non-core) modules.

#13

Status:needs review» needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

#14

Status:needs work» needs review

I can't believe we still had a direct UPDATE query in core!

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch28.26 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,075 pass(es).View details | Re-test

#15

If you issue an INSERT query using db_query() it might not fail under MySQL, but might fail under PostgreSQL or SQLite. We are making something that doesn't fail (but should) fail.

Once again, that's a bug, it can be fixed in a bug report for the module, it doesn't need cruft adding to core to protect against it. Also you can write SELECT queries with db_query() which fail on Postgres and SQL too.

#16

Yeah, how could a regexp for every single db_query() possibly have a performance penalty?

#17

The fact that we still had a db_query("UPDATE...") in core says a lot. That query should have failed. We might as well not bother with multiple db drivers if we're going to let db_query() cheat. Could we do this with a simple strpos() though? It would be a lot faster.

#18

@ksenzee: I believe preg_match() will be faster in every case. Regexps are fast and compiled, and generally faster then dumb string functions as soon as you are doing something more then once.

#19

I spoke to Damien in irc but still firmly disagree with this. I would not object to adding a (temporary, removable in D8) regexp (or simple strpos for common patterns) in hook_requirements() and/or hook_modules_enabled() - that could still throw an exception and prevent the module being used until it's cleaned up.

#20

Status:needs review» needs work

+        // Whitelist queries. Only simple SELECT queries should be passed to this
+        // function (and the db_query() wrapper). The queries should start with
+        // SELECT and should not contain any quotes.
+        if (empty($options['unsafe'])) {
+          if (!preg_match('/\s*SELECT\s+[^\']+/', $query)) {

For the regex to match the comment, shouldn't the regex have anchors:

<?php
         
if (!preg_match('/^\s*SELECT\s+[^\']+$/', $query)) {
?>

Also, why are only single quotes evil? Why not double quotes?

#21

Priority:critical» normal

We discussed this in SF as a way to ensure that module developers that haven't heard us ranting about not putting INSERT/UPDATE/DELETE queries into db_query() for the past two years or don't read the documentation don't end up writing code that is impossible to run on non-MySQL. (Any inserts/updates involving LOB or BLOB fields, and on Oracle anything with an empty string because Oracle can't tell the difference between empty string and NULL.) I am OK with it on condition that there is no measurable performance hit.

I am however going to downgrade from critical, though. Critical means "Drupal doesn't work without this". Wrong. Module developers can be too lazy without realizing it without this. That's not critical.

"unsafe" is the wrong key to be using, though. That reads as "hey Drupal, this query isn't safe." Which is not really what we mean here. "allow_full_query"? "allow_non_select"? "relax" => TRUE? Not sure here.

#22

What would prevent anyone encountering a problem from just flipping the flag to TRUE? Can't we make a non API function (eg _install_query) that allows this? (Then again, some people would use that without thinking twice).

#23

CMS that make blockades to development end up with people simply working around the blockade. There are other spots that Drupal does not protect developers from themselves, why such a strong stance on forcing people to make their code cross database compatible?

#24

I'm against babysitting developers who don't know what they're doing and throwing roadblocks in front of developers who do.

#25

I agree with @catch and others - this check would be really great for Coder module, but not here.

There are a million ways in Drupal to write code that works on some systems but breaks on others. I can use core PHP functions that were only introduced in PHP 5.3, use functions from PHP libraries that not everyone has installed, make direct system() calls that run programs that not everyone has, etc... Why is the database layer special? These are all bugs.

Also, there can be legitimate reasons to use db_query() for these queries sometimes. Maybe I'm writing a simple, compatible UPDATE query (and I'm a lazy developer), or more to the point, maybe I'm writing custom code that will only ever run on a single server and I want to use some MySQL-specific hack. Do we really want to make all those people set the 'yes_drupal_I_really_mean_it' => TRUE flag every time they do this?

I think as an alternative the documentation for db_query() could maybe go more in-depth about when and why it's a bad idea to use it? - right now it just states it pretty matter-of-factly without explaining why.

***

Oh, also, here is a more complete list of places in core that are theoretically using the API incorrectly (although a couple have notes suggesting that the database API doesn't support their query yet - don't know if that's still true):

./modules/comment/comment.install:199:  db_query('UPDATE {comment} SET created = changed');
./modules/system/system.install:1732:    db_query("UPDATE {poll_votes} SET chid = (SELECT chid FROM {poll_choices} c WHERE {poll_votes}.chorder = c.chorder AND {poll_votes}.nid = c.nid)");
./modules/system/system.test:1683:    db_query("UPDATE {users} SET pass = :pass WHERE uid = :uid", array(':pass' => $user1->pass, ':uid' => $user1->uid));
./modules/forum/forum.install:280:  db_query('INSERT INTO {forum_index} (SELECT n.nid, n.title, f.tid, n.sticky, n.created, ncs.last_comment_timestamp, ncs.comment_count FROM {node} n INNER JOIN {forum} f on n.vid = f.vid INNER JOIN {node_comment_statistics} ncs ON n.nid = ncs.nid)');
./modules/simpletest/drupal_web_test_case.php:1232:    db_query('INSERT INTO {registry} SELECT * FROM ' . $this->originalPrefix . 'registry');
./modules/simpletest/drupal_web_test_case.php:1233:    db_query('INSERT INTO {registry_file} SELECT * FROM ' . $this->originalPrefix . 'registry_file');
./modules/book/book.install:22:  db_query("DELETE FROM {menu_links} WHERE module = 'book'");
./includes/database/mysql/database.inc:107:    db_query('DELETE FROM {sequences} WHERE value < :value', array(':value' => $max_id));

I'm not sure any of these actually break anything, but for the sake of setting a good example it sounds like it makes sense to convert as many as possible.

#26

@nevets, I'm a little curious why you consider this a blockade to development. The new db layer is quite usable - easier than doing db_query() right, if you ask me. This is more like putting a "Road Closed" sign on the smaller, bumpier road, so people look up and notice the freshly paved highway right in front of them.

There are a million ways in Drupal to write code that works on some systems but breaks on others. ... Why is the database layer special?

This is a special case because it's way too easy to do this wrong. To do it right, contrib authors would have to be even more aware of the coding standards than core contributors, apparently, since we currently have a spot in core where it's done wrong. And if either core or contrib uses the DB layer wrong, we're putting huge barriers in front of anyone who wants to use Drupal on anything but MySQL. Those people's needs are important, and we've made a public commitment to them.

In hindsight, it might have been useful to rename db_query() to something else - maybe db_simple_select(), although that's a dorky name - to make it blindingly obvious that you're only supposed to use it for simple select queries. We didn't do that, and so what we have is a deceptive API. This is a simple change that will help correct that.

#27

@All

If you are going to provide a function that is intended to be used exclusively for SELECT queries then call it db_select_query! No confusion about its function then and no arguments about screening queries for SELECT. The overall API then makes more sense... use db_query if you want to break things, use db_select_query for SELECT, etc.

@crell (#21)

... to ensure that module developers that haven't heard us ranting about not putting INSERT/UPDATE/DELETE queries into db_query() for the past two years or don't read the documentation ...

There is nothing in the D6 documentation for db_query that mentions this although there is an informal comment added about 6 months ago that refers to INSERT. If you had amended the D5/D6 documentation 2 years ago to inform developers working with D5 and D6 that there was a non-backward compatible change coming in D7 and advising how they should code to avoid problems in the future then you would have grounds to complain. As it stands I think it is the developers who will be complaining.

@David_Rothstein (#25)

maybe I'm writing custom code that will only ever run on a single server and I want to use some MySQL-specific hack. Do we really want to make all those people set the 'yes_drupal_I_really_mean_it' => TRUE flag every time they do this?

Agree completely. If you force developers trying to use Drupal for purely in-house applications to jump through hoops to support multi-database capablities they have no interest in then they may well go somewhere else.

cheers

#28

If you force developers trying to use Drupal for purely in-house applications to jump through hoops to support multi-database capablities they have no interest in then they may well go somewhere else.

Or they will write:

<?php
function backdoor_db_query($query, array $args = array(), array $options = array()) {
  if (empty(
$options['target'])) {
   
$options['target'] = 'default';
  }
 
$options['unsafe'] = TRUE;

  return
Database::getConnection($options['target'])->query($query, $args, $options);
}
?>

So, really, what is the point? Developers who don't want to heed advice won't, all of a sudden, read the documentation if you throw a DatabaseInvalidQueryException; they will just look for the quickest, simplest work-around.

#29

@jpmckinney: This isn't aimed at developers who are hell-bent on using the APIs wrong. It's an attempt to help people who are trying to do it right. Why wouldn't your theoretical dev just skip the API entirely and use mysql_query()?

#30

Valid point about people trying to do it right. Still, I'm not convinced throwing exceptions is the best way to reach these developers.

#31

@lostchord: The entire D7 DB layer is a BC break, on every query. Drupal does that in major versions. We have the BC argument every version but we're still in alpha state for D7, so it's too early for the scheduled flame war. Let's wait a few weeks and then we can have it. :-)

@jpmckinney: I am open to suggestions on alternate ways to ensure that developers do things "right". We have an increasing number of BLOB fields in core (all serialized fields should be BLOBs, there's an open issue for that which is in progress), which impacts Postgres. Oracle and SQLServer both need help on both BLOB and LOB, I believe. We need to get people off of db_query() for insert and update queries, and schema queries are just very unstandard to begin with. If you have a better suggestion for how to do that for the 6000+ contrib modules out there, I am open to other ideas.

#32

@Crell - what's wrong with the hook_requirements() + hook_modules_enabled() (or equivalent) from #19.

#33

String parsing source code is icky business and not fool-proof. I could easily see unnacceptable false positives there. Also, that wouldn't allow for "Yes I know I'm not supposed to but I really do have a good reason on this site" cases.

#34

OK let's not do it at all then, I think this is won't fix, and a coder rule.

#35

If this is a won't fix, then we need to remove all the non MySQL drivers from core. Relying on *users* of those databases to fix the bugs is nothing more then pure wishful thinking (see: the history of Drupal < 7 on PostgreSQL).

Making sure that your modules are using the API correctly and that you issue portable queries is the job of *every single* module maintainers. It is not the job of the maintainers of the "third-party" drivers, it is not the job of the users of those engines.

Obviously, this simple regexp will not ensure that every SELECT query is portable. But it takes us a long way in making sure that module maintainers are aware of those issues and don't write queries the wrong way unknowingly.

#36

Its not Drupals job either to make sure people from database portable code, it's job is to facilitate portable database code.

#37

Facilitating good code is exactly what this does. If there's an "I know what I'm doing" option, then nobody's getting forced to do anything. Just add the option and you're done.

I would 100% agree that this is a coder module feature, not something for core, *if* we had renamed db_query(). As it is, 99.9% of contrib modules are going to screw this up somewhere. Damien is right: If we don't do something here, we might as well not pretend we're supporting anything but MySQL.

#38

EDIT: I spoke too soon @ksenzee. You're correct that it does at least offer an option as a way out of the exception.

@Damien, I'm not sure if you spoke incorrectly, but your assertion illustrates why this patch doesn't belong in core, IMO.

Making sure that your modules are using the API correctly and that you issue portable queries is the job of *every single* module maintainers [sic].

This should be documented and put in coder.module. That's how we've solved "making sure that module maintainers are aware of those issues and don't write queries the wrong way unknowingly" this far, as I can see.

#39

Can we reliably write a coder module check that flags non-SELECTs in db_query() and says "hey dude, stop that!"?

#40

Status:needs work» needs review

I'm surprised that no one in this thread has mentioned the simpletest framework, which, BTW tests every patch on d.o. Funny enough, d.o is that same place we host contrib modules. So why not use the test bots to test every recommended release of a contrib module and provide on the project page a box with crosses or ticks next to each database driver indicating driver compliance for that release. This will:

  • Encourage contrib maintainers to write tests for there modules. (Yay!)
  • Encourage contrib maintainers to write compatible code for there modules, or show that there module is pretty poor and anti-compliant. Same on them

The coder module is not a solution as there are many developers who don't use it. The thing about open source is that we cannot control how people use or develop Drupal solutions. But since most people go to the project page to download a module. We can give them guidance on whether to use that module or not. We already do this with usage stats.

So I don't agree with checking for illegal use - we don't do that in parts of the drivers. But I think it would be a good idea to rename the db_query function so that module maintainers do need to re-code there modules which will force them to implement DBTNG correctly, or rename the db_query function if they are lazy.

I propose db_static_query, as it is used for more than just SELECT queries and with good reason, as the drivers don't cover 100% of SQL requirements.

AttachmentSizeStatusTest resultOperations
783814-rename-db_query.patch342.31 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 783814-rename-db_query.patch.View details | Re-test

#41

Status:needs review» needs work

The last submitted patch, 783814-rename-db_query.patch, failed testing.

#42

The placeholders change from D6 to D7 ought to ensure that queries don't work without some kind of tweak anyway, unless they have no placeholders at all, so I'm not sure that a change which breaks all existing D7 modules, and breaks D6 modules a little bit more, just for the sake of breaking them, is much help.

I'm also not clear why multi-database type test bots aren't mentioned on here either, I remember that being discussed in irc and the argument, irrc, was that it was going to be too resource-intensive - I don't see how we can get away without that eventually, so I'd rather have the resources being expended in test slaves than in runtime code on every Drupal 7 site.

#43

We discussed multi-DB testbots in SF with the reps from MS and Oracle. The basic takeaway was that we wanted to have nightly "run everything and report somewhere outside of the queue" setup that MS and Oracle could run themselves on their infrastructure. Presumably we'd then host a similar setup for SQLite and PostgreSQL on the d.o infrastructure. Issues found that way would not be considered critical or issue blockers, but we would know very quickly if something breaks so we can go fix it.

That was only discussed in the context of core, however. I don't know what the resource needs would be to extend it to contrib. This issue is more about contrib. I'm open to the test-server approach instead, if that can be made to work.

Renaming db_query() is not, I think, necessary at this point, as catch said. That's also a huge API break to happen in alpha5, and not one that I would support. :-)

#44

As an example of all the messing we are getting ourselves into, see Randy's surprise that double quotes in static queries don't work consistently in #813310: db_query(): Static query with double quotes is not properly parsed.

Randy is far from being the random module author. I stand that we definitely need to automatically fail those queries, or our cross-database support will remain a pipe dream.

#45

Status:needs work» needs review

Rerolled my earlier patch from #14, that I'm pretty much convinced we need. With the following improvements:

  • properly anchor the regexp (thanks jpmckinney#20)
  • add the double quote to the regexp (thanks rfay)
  • rename 'unsafe' to 'relax_checks', hoping to make Crell happy

#46

The patch.

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch28.06 KBIdleFAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.View details | Re-test

#47

Subscribing... Not completely able to parse this issue.

I certainly think that:

  • If quotes aren't allowed they shouldn't be allowed (single or double)
  • This should be mentioned somewhere in the static queries docs page: http://drupal.org/node/310072

#48

Wrong patch, sorry.

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch28.02 KBIdleFAILED: [[SimpleTest]]: [MySQL] 353 pass(es), 340 fail(s), and 0 exception(es).View details | Re-test

#49

Rfay noted that we need to anchor the regexp on the right.

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch28.02 KBIdleFAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.View details | Re-test

#50

I tested my case and this works for me. Consistent, predictable. Needs to be documented still, of course.

#51

Status:needs review» needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

#52

I benchmarked this, and there is (as expected) no significant difference made by that additional check.

I tested the following:

  • full benchmark + 50 static queries per run
  • 100 runs total

The difference is below the noise level.

#53

Status:needs work» needs review

I don't believe how painful this is. At least this one should install, and I hope I haven't missed a broken query.

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch75.94 KBIdleFAILED: [[SimpleTest]]: [MySQL] 20,352 pass(es), 139 fail(s), and 49 exception(es).View details | Re-test

#54

Status:needs review» needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

#55

Status:needs work» needs review

Ok, so I missed some of those. This should bring us closer.

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch83.04 KBIdleFAILED: [[SimpleTest]]: [MySQL] 20,532 pass(es), 67 fail(s), and 5 exception(es).View details | Re-test

#56

And this one should fix the "Exception thrown without a stack frame" (triggered from DatabaseConnection_mysql::nextIdDelete() which is registered as a shutdown function).

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch83.52 KBIdleFAILED: [[SimpleTest]]: [MySQL] 20,539 pass(es), 2 fail(s), and 5 exception(es).View details | Re-test

#57

Status:needs review» needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

#58

Status:needs work» needs review

Fixing the remaining issues (temporary tables queries should be relaxed, one missing query in menu tests, and one typo in the action tests).

AttachmentSizeStatusTest resultOperations
783814-whitelist-queries.patch86.15 KBIdlePASSED: [[SimpleTest]]: [MySQL] 20,544 pass(es).View details | Re-test

#59

Status:needs review» reviewed & tested by the community

#60

Status:reviewed & tested by the community» needs review

What's going on in DatabasTasks_pgsql? I don't know Postgres, so what is this patch doing there with the LANGUAGE='sql'?

The exception thrown by the new regex check say a bad query was passed to db_query(). That's technically not true, since we do support calling $connection->query() as a first-class API; that's the better way to handle DB switching now.

Of course, that then immediately raises the question in my mind, would we be better off putting this check in db_query() and allowing $connection->query() as the bypass? That would dramatically reduce the amount of code we have to change within the DB drivers themselves to set the bypass flag all over the place. (Some of them are kinda ugly if we don't have any placeholders.)

#61

Can we see the overhead added to an individual db_query() rather than a verbal summary of ab runs please? xhprof output showing the relative cost of the preg compared to the rest of db_query() functions, for say 300 queries, along with microtime on 10,000 runs before and after ought to do.

And on the assumption this ends up going in (which I very strongly feel it shouldn't for the same reasons given repeatedly above), I still want to see a @todo to remove this from D8.

#62

I would still be open to a coder.module rule instead if we can show it will be sufficiently effective at weening people off of doing things in a non-portable way.

#63

To wean people, we need to have clearly stated rules (and hopefully consistent behavior when the rules are violated). I personally think that as far as software reliability over the long term this approach is good. It means that errors are treated as errors. I do understand the (potential) conflict between reliability and performance, but I always go back to reliability as the starting place.

#64

jesus, 10k db_query()'s resulted in 350k function calls.

Although I'd tend to agree with catch, adding the preg_match seems to have a miniscule impact on performance in the grand scheme of things. I don't see it hurting given these benchmarks. I hereby release jpegs into the world.

AttachmentSizeStatusTest resultOperations
783814-withpatch.jpg222.85 KBIgnored: Check issue status.NoneNone
783814-withoutpatch.jpg227.05 KBIgnored: Check issue status.NoneNone

#65

Status:needs review» needs work

@dhthwy - could you post the same thing including the cpu time? I doubt it's that much different but nice to know.

If those numbers are right this in in the region of 1.5% of each call to db_query() which is indeed fairly negligible, although I still wince at adding any overhead at all to the critical path especially when in this case there are non-runtime solutions.

Other issues with the patch. That was only from a cursory glance (I still don't consider the arguments made here for doing this at all to be convincing given alternatives which don't require critical path API changes, and while I might be in a minority, it's not just me, so didn't do a full review yet), however if you want to ignore all that, please don't mark it rtbc without fixing them:

1. Where $relax_checks is used, it should be documented, particularly the nextId() hunks stood out.

2. The patch has to rewrite valid db_query() calls where '' is used as a string literal in the query due to adding those to the regexp (presumably to account for mistakes like ':placeholders'). We should not break valid code using the API correctly just because it might be a false positive for cases when it's not. That's moving from enforcing an existing API to changing it. Those changes should just be reverted, or moved to a separate issue. The separate issue, if one is created, should also correct that fact that those query strings overall are still double quoted, which doesn't match coding standards - in fact if we decided to make such a change for consistency (with or without the regexp) one advantage would be that any normal calls to db_query() would use single quoted strings per coding-standards.

#66

A first pass at answering concerns here:

@Crell: What's going on in DatabasTasks_pgsql? I don't know Postgres, so what is this patch doing there with the LANGUAGE='sql'?

We are not doing anything here, just adding the relax check.

@Crell: Of course, that then immediately raises the question in my mind, would we be better off putting this check in db_query() and allowing $connection->query() as the bypass? That would dramatically reduce the amount of code we have to change within the DB drivers themselves to set the bypass flag all over the place. (Some of them are kinda ugly if we don't have any placeholders.)

I actually floated the idea in DCSF, and you rejected it on the basis of API consistency. And I agree with the you from DCSF.

@Crell: I would still be open to a coder.module rule instead if we can show it will be sufficiently effective at weening people off of doing things in a non-portable way.

For all those that think that social means (like the coder module, which might, one day, be deployed on the drupal.org test infrastructure) will solve this issue, think again. The support of PostgreSQL on Drupal core and modules has been a joke for years.

This patch aims at encouraging module developpers to do the right thing from the start. Once something is broken, it's generally very hard to unbreak. Of course, this complements other tools, like the Coder module.

Even if I'm maintaining the PostgreSQL and SQLite drivers in core, *I will not make sure that every module on Drupal.org works on PostgreSQL and SQLite*.

@catch: The patch has to rewrite valid db_query() calls where '' is used as a string literal in the query due to adding those to the regexp (presumably to account for mistakes like ':placeholders').

Those queries are using string literals. As a consequence, they are not valid.

The separate issue, if one is created, should also correct that fact that those query strings overall are still double quoted, which doesn't match coding standards - in fact if we decided to make such a change for consistency (with or without the regexp) one advantage would be that any normal calls to db_query() would use single quoted strings per coding-standards.

Indeed. The fact that those queries uses double quotes is a symptom that they need to be fixed. I considered doing that but rulled against it, as it would significantly increase the size of this patch.

#67

AttachmentSizeStatusTest resultOperations
783814-2-withpatch.jpg381.36 KBIgnored: Check issue status.NoneNone
783814-2-withoutpatch.jpg371.55 KBIgnored: Check issue status.NoneNone

#68

I don't remember string literals being invalid in db_query(), was unable to find any discussion prior to this issue, or docs either. Double quoted strings, or placing quotes around placeholders yes, but not the examples which are removed in this patch (which is probably why there's so many of them). So if they're invalid we need better documentation than "use the appropriate placeholders to pass string values.".

On this:

Even if I'm maintaining the PostgreSQL and SQLite drivers in core, *I will not make sure that every module on Drupal.org works on PostgreSQL and SQLite*.

We can equally change that sentence to:

I will not make sure that every module on Drupal.org works

And it would be even more reason not to add safety checks to core functions which run on every request.

This is getting repetitive on both sides, but to reiterate, I can't think of a single example where contrib modules being broken has resulted in the addition of a lot of runtime code to core, with the exception of security issues which are their own category (but even when core is extremely helpful the responsibility is on contrib modules to use the API correctly).

@dhthwy: thanks, that looks like 2.2% cpu usage added to db_query(), since db_query() cpu usage itself is only 2-5% of most requests in terms of CPU that confirms a negligible performance difference, so it's really down to whether the change is desirable in itself or not.

#69

I have to admit I am still baffled why a string literal would be a compatibility problem, assuming that SQL is the base for static queries at all. But I'm fine with it, as long as it's consistently handled.

#70

I suspect the quote issue is due to the way the DB itself treats " vs. ' in query strings. It's not something that the Drupal code cares about either way at the moment.

#71

@Crell, not wanting to get sidetracked, but in SQL (I think every variant) single and double quotes are equivalent and are appropriate for string literals. And the failing queries in #813310: db_query(): Static query with double quotes is not properly parsed work fine when issued directly to mysql.

But the question remains: Why are string literals considered "bad" in PDO/DBTNG? Because they can be used by stupid developers to introduce SQL injection? They are not "bad" in the SQL sense - they're a standard and expected part of the language. And we do support static queries.

#72

I believe string literals in queries were considered "bad" starting with D6, on the grounds that they made cross-DB support harder. I cannot find a reference to that in the D6 update guide, though. I thought it was there: http://drupal.org/update/modules/5/6

In practice I believe it's only an issue for BLOB and LOB fields, since those require special handling, including secondary queries, on some databases (Postgres and Oracle in particular). varchars and ints should be safe in practice, but as always it's easier to document "just don't do it" than "just don't do it on these types of fields in these cases under these conditions".

#73

String literals remain fundamental to D6. You have to use them with string-type placeholders in order to have a valid query: db_query('update {node} set ... where title = "%s"', $newtitle ); So I think this is a D7 thing.

#74

Single quotes and double quotes have never been the same in SQL (some discussions about this):

That said, there are several benefits in removing all string literals from queries, which is the point of this issue:

  • Without string literals, all queries are guaranteed to contain only identifiers, so third party drivers can quote reserved identifiers (that even Drupal core still uses) using (more) simple regexps
  • Without string literals, drivers can massage the strings, as needed for Oracle to workaround the empty string == NULL bug
  • Without string literals, we don't risk SQL injections anymore, #win

Ensuring that only SELECT queries are passed to db_query() and that all DML queries (INSERT, UPDATE, DELETE, TRUNCATE, MERGE) are using our query builders also has the following additional benefits:

  • Module writers don't write queries they think are valid, but really aren't,
  • BLOB / LOB fields can be properly massaged (see InsertQuery_pgsql, UpdateQuery_pgsql),
  • The database engine can workaround bugs or semantic differences in those queries (see UpdateQuery_sqlite, DeleteQuery_sqlite),
  • The database engine can implement constructs that are not supported by some database engines (see TruncateQuery_sqlite),

We have a nice database abstraction layer, which will make true database portability a reality, assuming module authors uses the API properly. Relying on social pressure to do that will fail, as we fail to support PostgreSQL for years.

#75

Thanks, DamZ. I'm +1 on clarifying this and enforcing it in the code, all the way.

#76

I'm neutral on removing single quoted literals from a code style perspective. However it's valid ANSI SQL and removing it is a very late API change which judging by the patch will affect a reasonable number of modules. Absolutely protecting against SQL injection might make it worth doing regardless (although we don't have a history of lots of SQL injection issues compare to XSS or access bypass), but adding the API change tag here either way so it gets documented properly if this goes in.

Once again on the social pressure argument, I still don't see what we get here that's not covered by automated testing, which feels like the proper place to be doing these kind of checks rather than in the critical code execution path.

#77

So you're telling me that once people find out "All I have to do is relax_checks = TRUE", that they'll somehow be 'socially pressured' to do the right thing in this case? lol

In other news, hey everybody, just use 'relax_checks' => TRUE,

#78

cha0s, I think you're misinterpreting Damien's comments. This patch is not intended to be social pressure. It is intended to make people say, "Whoa, I'm doing something wrong, guess I'll check the docs." See my road analogy in #26. What we have *now* is social pressure, and it's basically useless.

#79

Version:7.x-dev» 8.x-dev

It seems from the comments above that the more or less decided on way to go here includes a rather intrusive API change to the DB abstraction layer.
Moving to Drupal 8.
If you feel, this must be fixed in Drupal 7, please move back and elevate to critical.
Or if instead this issue should be fixed in Drupal 7 through documentation, hence without an API change, please move back and remove the API change tag.

nobody click here