to be database independent and ansi-sql compliant, our sql should not contain sql reserved words. i hacked a little perl script which checks a sql dump against http://developer.mimer.se/validator/sql-reserved-words.tml. here's the output of checking database.mysql:

line 22: module
line 33: module
line 39: path
line 91: data (SQL-99)
line 120: timestamp
line 131: user
line 135: timestamp
line 148: timestamp
line 161: timestamp
line 177: timestamp
line 192: timestamp
line 220: no
line 234: timestamp
line 270: static (SQL-99)
line 333: new (SQL-99)
line 334: old (SQL-99)
line 344: timestamp
line 366: count
line 379: size
line 380: timestamp
line 488: timestamp
line 493: language
line 496: session
line 497: data (SQL-99)
line 509: value
line 541: timestamp

quite some "errors" ... what do you think?

in any case: i'd would suggest anyone implementing database changes to check their sql at http://developer.mimer.se/validator/index.htm first

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Anonymous’s picture

Priority: Major » Normal

This is slowly getting better over time. Closing this bug report as it is more of a coding/naming standards issue.

Uwe Hermann’s picture

Component: Other » database system
Priority: Normal » Minor

Reopening, but setting to minor. I think this is still an issue which should be solved and I think it'll be forgotten if the bug is closed.

tostinni’s picture

May I post this thread to remember :
http://drupal.org/node/4907 It's about Oracle reserved Word, like UID :(

hswong3i’s picture

see also SQL naming conventions in the handbook.

carlmcdade’s picture

This is a critical bug as it effects every Drupal user. The fact that the issue is 3 years old and many releases later is really upsetting.

The use of timestamp as a column name has now caused a problem using standard mySQL tools for database recovery. Timestamp is also being inappropriately used as an SQL identifier and as a PHP variable. I feel for those that have db's in the 10mb+ range with this problem.

This should be taken care of before any final releases of 4.6 so the buck stops with that version.

Sorry, I don't have time to write a patch for this... I am too busy trying to get backups working on some sites.

What a friggin' mess!

Uwe Hermann’s picture

Any updates on this?

Cvbge’s picture

Fixing this would be quite hard, wouldn't it?

RobRoy’s picture

Isn't this just as easy as adding a backtick (`) around the all table names? That is what MySQL recommends. Would this break pgSQL, etc. or could we just do a strip('`') for those functions?

magico’s picture

Title: SQL Reserved Words » Remove (or sanitize) the use of SQL reserved words in query statements
Priority: Minor » Normal

RobRoy gave the solution to mysql.
We should discuss further this problem, because it can have potential problems in the future!

Anonymous’s picture

Version: x.y.z » 5.x-dev

Mostly chiming in to subscribe but also to remind of the issue. RobRoy's question goes unanswered about psSQL and using the backtick character. The backtick would need to be used for the table column name as well as the table name to fully resolve this issue. What about other DB engines and the use of the backtick, e.g. Oracle was mentioned in this thread?

chx’s picture

Title: Remove (or sanitize) the use of SQL reserved words in query statements » Remove the use of SQL reserved words in query statements
Version: 5.x-dev » 6.x-dev
Assigned: Unassigned » chx
Priority: Normal » Critical
NaX’s picture

I did a search on this issue and found a few things that might help with this.

phpBB http://area51.phpbb.com/phpBB/viewtopic.php?t=20911

Table, index and column names are user supplied identifiers. To minimize portability related issues, all user supplied identifiers are subject to the following rules:

Backticked (quoted) identifiers: Some SQL servers allow to use non-ASCII characters, or even Reserved Keywords, when identifiers are backticked (MySQL) or double quoted (other SQL servers). However, even if backticked, a valid identifier for MySQL can be considered illegal on other SQL servers. Therefore, backticks are ignored and the rest of the rules enforced to minimize portability related issues.

http://www.radicore.org/installation.php

In order to allow the use of database names with '-' (hyphen) instead of '_' (underscore) you must use the configuration option sql_mode='ANSI_QUOTES' so that database names can be enclosed in double quotes (the SQL standard) as well as backticks (specific to MySQL). Please refer to Selecting SQL Modes for details. If your version of MySQL is before 4.1 then you must go info MySQL Administrator->Startup Variables->Advanced and ensure that Use ANSI SQL is checked ON. If your version of MySQL is 4.1 or later this option will be set programmatically.

mysql: sql modes
http://dev.mysql.com/doc/refman/4.1/en/server-sql-mode.html

From what I can tell MySQL uses backticks and pgSQL uses double quotes, and I think Oracle supports both but I cant be sure. I can only test MySQL.

hickory’s picture

Still broken for table names with hyphens - putting backticks outside the curly brackets is a workaround.

Anonymous’s picture

Perhaps an alternative could be to have define ('DB_QUOTE', '`'); in settings.php and use the defined constant in the code? The backtick would be the default since MySql is the default.

RobRoy’s picture

No, IMO. We just need to change those reserved words to something non-reserved and keep it that way.

Anonymous’s picture

@RobRoy: That works for today but tomorrow when SQL engine version x introduces new reserved words we must go change our tables and columns. Then there are the contributed developers who have SQL engine X and then the contributed user trys to use it with SQL engine Y and we discover new reserved words. IMO, your fix is the incorrect fix for the long term even though it works for 99% of the users today.

magico’s picture

+1 to earnie suggestion at #14.

ff1’s picture

Many users (including myself) wouldn't have a clue what character their database uses to enclose table and column names. I would be much happier if the installation script added earnie's code define('DB_QUOTE', '`'); to the settings.php file for me.

BioALIEN’s picture

I think earnie's suggestion is valid. Stick the define in settings.php and default it to mysql and keep everything consistent with the current Drupal way of doing things. Might I add, that this is also the simplest approach to fixing this problem?

It's already dragged on for years, lets not hold it back any longer please.

ff1’s picture

Fair comment. :) I aggree it is better to have a solution in there now than to attempt sort out every aspect of usability and end up with no solution in 6.x.

+1 to earnir suggestion in #14.

hswong3i’s picture

i found that drupal-6.x-dev is still using "session" within {sessions}, which is a reserved word under ANSI SQL-99 standard (http://developer.mimer.se/validator/sql-reserved-words.tml), and also listed within our reserved word list: http://drupal.org/node/141051.

Oracle (http://drupal.org/node/39260) and DB2 (http://drupal.org/node/165788) driver for D6 try to rewrite ALL reserved words that are beside ALL version of ANSI reserved word (which also means: at least Oracle and DB2 driver will able to due with all ANSI standard). should we remove such column name on time?

hswong3i’s picture

after a complete research, i found that we are still using the following ANSI SQL-92/99/2003 reserved words within drupal-6.x-dev: timestamp, data, path, position, external, depth, module, session, language, value, translate.

since it is too much for improve and remove these reserved words, i will try to escape them within oracle and db2 driver, internally. may be someday we will able to clean them up, but i guess now shouldn't be a good timing :(

chx’s picture

Version: 6.x-dev » 7.x-dev
Assigned: chx » Unassigned
Category: bug » task
Priority: Critical » Normal
hswong3i’s picture

Assigned: Unassigned » hswong3i

i guess this issue will able to be solved by at least 3 solutions:

  1. correct all of our core table/column/constrain name as no conflict with EVERY database system, as like as previous discussion within this issue. but the CONS is very simple (#16):

    @RobRoy: That works for today but tomorrow when SQL engine version x introduces new reserved words we must go change our tables and columns. Then there are the contributed developers who have SQL engine X and then the contributed user try to use it with SQL engine Y and we discover new reserved words. IMO, your fix is the incorrect fix for the long term even though it works for 99% of the users today.

  2. so how about manually escape all database specific reserved keywords within database abstract driver, as like as latest Oracle (http://drupal.org/node/39260) and DB2 (http://drupal.org/node/165788)? again, it is functioning on today, but soon be out dated when database vendor update their standard. we just can seems this as a temporary solution. BTW, this is a good starting point, as it can handle the case of todays Oracle and DB2 implementation
  3. so how about extending our {} syntax, says enclose all the stuffs that need to be escaped by using [], as like as Gallery2's handling? e.g.
    <?php
    // Free-form SQL
    $query = 'SELECT [SomeTable::someField], [SomeTable::otherField], [OtherTable::foo]
                    FROM [SomeTable], [OtherTable]
                    WHERE [SomeTable::id] = [OtherTable::id]
                        AND  [SomeTable::someField] = ?';
    $searchResults = $storage->search($query, array($dataToBind), $someQueryOptions);
    // Now iterate over the search results
    ?>
    

    yes, it seems to be a bit fancy syntax, but this will able to solve our problem permanently. surly that the replacement will be something that is database dependent. e.g. change the syntax into this format:

    <?php
    $result = db_query("SELECT [nid], [votes], [score], [users] FROM [{node}]");
    ?>
    

    P.S. as Oracle server will ALWAYS translate table/column/constrain name into UPPER case, UNLES manually enclose it with ", this change of syntax can also serve for this purpose :)

seems the last handling would be the best among both 3 solutions, as it is already proved as successful by Gallery2's implementation. i will try to implement this within technical preview of D7 database abstraction layer (http://drupal.org/node/172541), and hope this handling can solve the problem that have been annoying us for more than 5 years, permanently :)

hswong3i’s picture

Title: Remove the use of SQL reserved words in query statements » DB compatibility: resolve ANSI SQL-92/99/2003 reserved words conflict in query statements
Category: task » bug

After an indeed study, the [] syntax for all table/column/constraint will benefit in many areas:

  1. It is always TRUTH for resolving reserved words conflict.

    Name which are escaped will ALWAYS seems as text, but NEVER as SQL reserved words. Developers will never need to study the variation between different database standard: just use [], and their code will always correct and portable, whatever MySQL, PgSQL, etc.

  2. This can handle problem caused by case sensitive naming database, e.g. Oracle and DB2.

    In case of Oracle, name without escape will ALWAYS transform as UPPER case automatically, which not only cause problem during query input, but also a big problem when fetching data (returned array/object will contain upper case index).

    BTW, name escaped will ALWAYS keep in natural case as user input; no database automatically transformation will taken, so no need for any post-action after data fetching. (Case is also similar in DB2)

  3. This can give a big hand to database specific constraint name sizing limitation.

    E.g. Oracle restrict max.30 characters for all table/column/constraint name; DB2 even restrict constraint name within 18 characters. My current solution is preform a long-to-short and short-to-long mapping (similar as MS WIN95/DOS filename convert), but we will need to "search" out all table/column/constraint name for this driver level conversation (yes, this trade performance, but code will be more portable...).

    Enclose all table/column/constraint name with [] syntax can simplify the regex checking, on the other hand increase mapping accuracy.

Changing into PDO won't help any of the above problem; avoid the use of reserved word will just keep this issue as ALWAYS OPEN (moodle face the similar problem: http://drupal.org/node/182890#comment-624695). The logic of this change is not complicated (but huge).

hswong3i’s picture

Priority: Normal » Critical
Anonymous’s picture

Priority: Critical » Normal

Every reserved word can be quoted to be used as both table names and column names. The column terms you list are expressive to the data with which they hold so they are valid. I have suggested Drupal SQL to apply an abstraction layer to take care of the differences between MySql and any other DB engine. Let's discuss the possiblities there.

@hswong3i: Do you have a reference for the [] syntax?

hswong3i’s picture

@earnie: please check the point 7 of my previous research result in http://edin.no-ip.com/html/?q=node/310. My previous closed issue for D6 also mention about some detail of the needs of this [] syntax (http://drupal.org/node/182890).

Anonymous’s picture

@hswong3i: As I study your proposal more I think it may be a little too much. We can stick with the current methods we have just as easily, use the back quote ` character to quote the words and if the DB engine doesn't like the back quote then we change the back quote character to match the engine's quoting style. We could even use left/right braces to quote the column names like we do for the table name and substitute the brace for the correct quoting character.

-1 for the [] syntax change.

hswong3i’s picture

@earnie: using back quote ` is just friendly for MySQL, but not even for PgSQL and other DB. Moreover, as the use of back quote ` is not in pair, it is not easy to catch by using regex; but [] syntax is in pair style. In case of MySQL: the regex handling for [] can just as simple as following:

<?php
function db_escape_quote($sql) {
  return preg_replace('/\[([A-Za-z0-9_]+)\]/', DB_QUOTE_OPERATOR . '\1' . DB_QUOTE_OPERATOR, $sql);
}
?>
BTW, In case of Oracle, this [] give us a lot of flexibility for some advance and hidden abstraction handling (e.g. overcome the max.30 characters limitation in table/column/constraint naming):
<?php
function db_escape_quote($sql) {
  return preg_replace_callback('/\[([A-Za-z0-9_]+)\]/', '_db_escape_quote_callback', $sql);
}

/**
 * Helper function for db_escape_quote().
 */
function _db_escape_quote_callback($matches) {
  $match = $matches[1];
  if (strlen($match) > 30) {
    // Try to fetch mapping, if not exists, set it up.
    $trim = variable_get('oci8_' . $match, NULL);
    if (!$trim) {
      // Fetch last counter, and +1 for next mapping.
      $count = variable_get('oci8_max30', 0) + 1;
      variable_set('oci8_max30', $count);
      // Setup mapping and reverse mapping.
      $trim = substr($match, 0, 30 - strlen($count)) . $count;
      variable_set('oci8_' . $match, $trim);
      variable_set('oci8_' . $trim, $match);
      $match = $trim;
    }
  }
  return DB_QUOTE_OPERATOR . $match . DB_QUOTE_OPERATOR;
}
?>

For clarify the idea of change in coding style, here is some simple code sample:

<?php
-  $result = @db_query("SELECT value FROM {variable} WHERE name = '%s'", 'install_task');
+  $result = @db_query("SELECT [value] FROM [{variable}] WHERE [name] = %s", 'install_task');
?>
hswong3i’s picture

Title: DB compatibility: resolve ANSI SQL-92/99/2003 reserved words conflict in query statements » Siren #1: resolve ANSI SQL-92/99/2003 reserved words conflict in query statements
chx’s picture

Title: Siren #1: resolve ANSI SQL-92/99/2003 reserved words conflict in query statements » resolve ANSI SQL-92/99/2003 reserved words conflict in query statements

As Siren is not a project on Drupal.org but an unofficial fork, I am removing it from issue titles to avoid confusion.

hswong3i’s picture

Category: bug » task
Priority: Normal » Critical
Status: Active » Needs review
FileSize
12.83 KB

Since D7 is now open for public development, hope it is now a good timing to complete this task. According to the progress of my personal research project Siren and personal battle target, simply escape ALL potential reserved word can solve this problem for sure, based on 8 successful database drivers implementation, with 4 different database backend (MySQL, PostgreSQL, Oracle and SQLite).

This patch simply introduce the ability of phasing the [] syntax for reserved words escape, plus an example implementation in session.inc. We also need to update tablesort.inc, since we may fill in normal column in table.column pattern. I will introduce other core queries update into number of small patches, after the successful commit of this patch (as we will need to alter EVERY core queries...)

mike2854’s picture

Status: Needs review » Reviewed & tested by the community

For a Oracle-Drupal user,
The great news is it solved the problem of Oracle 30-character limitation.
Also it greatly eliminates those awful table name replacements.

I believe this is a step to push Drupal to enterprise world.

webchick’s picture

Status: Reviewed & tested by the community » Needs review

about [{sessions}]

Why can't the database abstraction layer handle quoting reserved keywords, rather than the developer having to keep a list in her head of all the various names that need to be escaped when writing her queries?

Couldn't we just enclose /all/ table names in DB_QUOTE_OPERATOR during the prefixing operation, to be on the safe side?

hswong3i’s picture

I try to handle all reserved word escape within each DB driver implementation once before, and http://drupal.org/node/39260#comment-632120 contain my past implementation. _db_query_callback_ansi_reserved() try to search out all potential reserved words and escape them, dependent on different database requirement. But this is not work at all, at least too complicated and always miss...

This is because we are not going to escape table name only, but also column and constraint name. According to our existing implementation, there is no way to search them out without ANY miss. Build a universal query analyzer to phase out all query elements for escape is not a possible solution (it should be the duty of DB...), too. IMHO, what DB abstraction can handle is quite limited ;-(

Moreover, using [] should be similar idea as {}, so this is not a flash new idea. As mentioned in #25, this syntax also come with numbers of potential benefit.

Anonymous’s picture

Just quote all table names and column names with DB_QUOTE_OPERATOR. Don't worry with just the reserved words. DB_QUOTE_OPERATOR is set in settings.php file with the default being the backquote ` character since MySql is the default database.

cburschka’s picture

[{sessions}] strikes me as ugly and inconvenient too. Surely if you're already recognizing {} to rewrite table names, you can rewrite the quotes in table names while you're at it?

Perhaps rewrite table names from {table} to [prefix_table] which the other function then rewrites to `prefix_table`. Surely the rewriting happens in a defined order...

chx’s picture

Status: Needs review » Needs work

The burden of handling strange databases should never fall on the shoulders of Drupal developers. The DB driver should carry this burden. I am all for MySQL compliance but this patch is just unacceptable.

For example, we can change Drupal schema and code to not use reserved words, document them and supply a validator script or a link to a validator so others comply too. This is still a burden for the developers but it's a hell lot less, once you have your schema done, it's done. You do not need to use some arcane syntax for every single query.

hickory’s picture

Can we not just put DB_QUOTE_OPERATOR around all the table names (which are already marked out with {}, so it's no extra work for the developer), close this issue, then worry about column names later?

hswong3i’s picture

@Arancaytar: [{}] sounds duplicate and complicate for me, but seems to be a more clean implementation. Since {} is target for table prefix, where [] is target for escape character, they are totally different in meanings. Split can be more independent and flexible, but merge will result as 2 individual syntax: {} for table but [] for column/constraint. IMHO, split should be more elegant.

[prefix_tablename] sounds elegant for me, too. BTW, this may invoke more workload for upgrade our existing implementation. If we both agree this as the optimal solution, I would like to dig with it (upgrade all core queries syntax as [{}] cost me 2 weeks workload for Siren...).

@chx: totally agree with you; on the other hand, this is what I have be done around 3 months ago but fair (see #36). Handle within driver is not too effective: 1. usually have miss, 2. need upgrade (seems like moodle), 3. slow in regex matching, and 4. complicated + not elegant. I don't hope to push the workload to end developers; but that is just something similar as existing {} syntax.

Not using reserved word is similar as moodle approach (http://drupal.org/node/182890#comment-624695). IMHO, they don't solve the problem; on the other hand, we also try to do so since the open of this issue (June 30, 2002) but still not success. This may just left this issue as ALWAYS OPEN and never end. On the other hand, fully use of escape characters is widely applied in phpmyadmin, phppgadmin, Oracle/DB2/MSSQL EM. They always generate queries with escape characters and always solve the problem.

@hickory: it is not possible... Since if column name belongs to reserved word list of database A, it will never function. Resolve reserved word conflict is something can't just handle as partly: it must be done, or else meaningless. Moreover, after using 2 weeks for escape ALL core queries in Siren, I am able to implement DB2 and SQLite drivers within days. Once we get all done, we will never need to review this issue in the coming future :-)

chx’s picture

Can we use backtick? That at least is familiar to mysql developers which are basically all Drupal developers.

Anonymous’s picture

With the other discussions of moving other DB to contrib +1 for backtick. Should be easy enough to use a regex to change ` to ' or whatever is needed.

hswong3i’s picture

Status: Needs work » Needs review

You means `? There is some limitation:

  • It is a MySQL specific syntax. Most databases use ", e.g. PostgreSQL, Oracle, DB2 and SQLite. On the other hand, MySQL also support " (may need some tricky configuration? I don't really remember...). As I remember, " is a SQL-99 standard, too.
  • It is not a pair up syntax. A pair up syntax such as {} is very elegant, since the implementation of its regex is very simple. On the other hand, we don't need to consider about its sequence. Mostly likely, using pair up syntax would be less buggy.

Since () is used for most database internally as special meaning, and {} is already be chosen for our table prefix handling, [] become my one-and-the-only-one choose :-(

chx’s picture

Priority: Critical » Normal
Status: Needs review » Needs work

So... let's see what we have here. The usual problem. In order to support about .0001% of Drupal users, you want to change all queries to read something ungodly. I can't see this happen. Can we get back to discussiong the "change Drupal schema and code to not use reserved words, document them and supply a validator script or a link to a validator so others comply too" approach?

BTW. we are about to submit a new database layer where quite some queries are dynamically built and have their field names supplied separately. That will solve a number of these -- but not all -- because we still support raw queries.

hswong3i’s picture

Priority: Normal » Critical
Status: Needs work » Needs review

@chx: My always position about market sharing is: please don't ask for a compare until we have its build-in support; or else compare with something always zero should be always result as absolute 100%, not ~99.9999%.

Well, the [] is not such ungodly: at least it is already applied to Gallery2. They use this syntax and so able to support both 6 database (MySQL, PostgreSQL, Oracle, DB2, MSSQL and SQLite). It is proved by successful story, and so should be a possible solution.

On the other hand, Gallery2 also provide something similar as our proposed Data API abstraction. By using this type of high level abstraction, raw queries will build for us automatically and internally, and so developers may have less needs to use raw syntax directly. This bi-solution should be the most elegant format according to my research.

Finally, Moodle already did your suggested approach (http://docs.moodle.org/en/XMLDB_reserved_words#The_list). Err... We will need to keep trace on the list update, we will need to have similar resource for support, we may also need script to check our syntax, etc, etc... But a simple [] syntax is very strict forward for database newbie, and it is always truth: no exceptional case, no extra handling :-)

Anonymous’s picture

Priority: Critical » Normal
Status: Needs review » Postponed

Ugh, the suggested [] is just ugly. I'm setting this to postponed to be revisited after the new database methods are in place. I agree that it is important but it certainly isn't critical.

hswong3i’s picture

Priority: Normal » Critical
Status: Postponed » Needs review

@earnie: that may not be totally correct... Since we are going to introduce SQLite support in Drupal 7.x, we will face the reserved word conflict once again. According to my understanding, Data API should be suitable as high level abstraction, e.g. pass in data object, Data API build the raw query, DB abstraction layer handle the database access. Data API can simplify and reduce the use of raw queries directly, but that's not means we can escape from the definition of a suitable raw query syntax.

We may seems this issue as a common base for further more development and abstraction. We need this issue in parallel with Data API or new DB driver development, they are not interchangeable or replaceable.

chx’s picture

*shrug* i have better things to do than battle this. Please do not set back to critical. Critical is when a) Drupal is fundamentally broken b) something we can't or don't want to release without. For those, see Dries' posts. I know that this is important to *you* but not to Drupal as a whole.

I heartily recommend counting what percentage of the current critical queue are your patches and then think again "are these really critical"?

hswong3i’s picture

Well, I guess D7 + SQLite is important to *you*, too. Resolve reserved word conflict can let its development become much easy than what you can imagine. I say this because I get it done, and that's all.

catch’s picture

Priority: Critical » Normal

If/when this is directly holding up SQLite, mark it back to critical. At the moment, it's not.

chx’s picture

After a lot of thought, I believe we can come to a solution here. When the new database system gets in, a lot of queries will go through the builder -- as the field names are separated out in the builder, that's not a problem. The remaining queries will need editing anyways to remove the old %d style placeholders in favor of shiny new :foo placeholders and when that edit happens, well, we might as well add [ ] around the field names. The table names already have their { } so I really do not think we should use [{ }].

Anonymous’s picture

Chx: Why use characters[] that are guaranteed to need to be parsed and replaced? Why not use DB_QUOTE_OPERATOR to avoid having to further parse the field?

hswong3i’s picture

@earnie: may I have more detail? Are you means using "DB_QUOTE_OPERATOR" directly, or using "`" directly? Choosing [ ] since it is a pair up token and so regex will be much simple. We need to quota table/column/constrain name since it is (nearly) not possible to parse those wording from query, within driver internal implementation...

@chx: I take some brain storming about using { } for table name and [ ] for column name, too. In this case, we may need to redefine the meaning of { } as "token for table prefix + escape character", where [ ] just means "token only for escape character": well, they seems duplicated... If we can handle them as separate meanings, code may seems a bit fancy, but the duty division of these tokens could be more clear, so normal developers can simply follow it without bother (P.S. as you mentioned, with the help of new wrappers, we may have lesser chance to use raw syntax directly, so hopefully this may not be too complicated for our future).

Dries’s picture

@hickory: it is not possible... Since if column name belongs to reserved word list of database A, it will never function. Resolve reserved word conflict is something can't just handle as partly: it must be done, or else meaningless. Moreover, after using 2 weeks for escape ALL core queries in Siren, I am able to implement DB2 and SQLite drivers within days. Once we get all done, we will never need to review this issue in the coming future :-)

Do you have a Drupal example of this? It seems to me that fixing the table names gets us a long way -- we'd only have to rename database columns.

Also, with the work on the PDO-fication of Drupal, I'm not keen on committing this patch. It's probably best to let the PDO driver go in first.

hswong3i’s picture

Up to D6, our core conflict with the following ANSI-standard reserved words: data|depth|external|language|module|path|position|session|timestamp|translate|value. Moreover, it is for sure that uid is conflict with Oracle, and so on (sorry that I forget to mark down those conflict with DB2, MSSQL and SQLite: after I implement with [ ] syntax, I don't even need to find them out since all are now safe).

Since the conflict can always solved by using escape character, we won't need to ask contribute developers as multi-database professional, or keep on tracing a huge list of reserved word. A simplified guideline may trigger more developers to build their work as portable.

On the other hand, PDO can't solve this problem, as like as ADOdb can't. E.g. Both Moodle and Gallery2 are using ADOdb, but they solve the resolved word conflict in totally different direction: Moodle choose to avoid the use of conflict, where Gallery2 handle with escape character. PDO can simplify the variation between legacy driver implementation, but it is nothing other than a data-access API.

This issue is all about the last gate (raw query coding syntax) before data injected into database, and so should be no dependence with our next generation PDO + DB API development. It need to be solved individually.

webchick’s picture

Status: Needs review » Postponed

IMO this gets postponed until the PDO stuff. If we have to wildly change how every single database query in Drupal is written, let's not do that twice. :P

My hope is that PDO's query object is smart enough to separate field names from table names from values and so we can apply whatever syntax makes sense just before the query is passed to the DB abstraction layers of choice.

dropiopio’s picture

Status: Postponed » Reviewed & tested by the community

Why not have a solution for most database now but wait long for PDO project?
Loss of time is critical, for drupal loosing ground comparing to competitive CMS.
Isn't it so?
Why not let this minor change go in, then handle the rest of follow up with PDO solution?
is it a loss of time?
thanks

webchick’s picture

Status: Reviewed & tested by the community » Postponed

Because it requires re-writing every single query in Drupal to use [ ] around the fields. If we're going to do that, let's roll it into the PDO changes which are already going to require rewriting every single query.

catch’s picture

Also, there's no time loss here, changes that go in now will be released at the same time as changes that go in four or eight months from now - when D7 is ready.

hswong3i’s picture

Status: Postponed » Needs review
FileSize
15.3 KB

A simple summarize of this issue:

  1. Based on the latest few comment and feedback, we found that this issue can be solved by the use of escape characters. We may still have some variation about its implementation, but the general idea should be acceptable.
  2. BTW, for completely solve this issue, we will need to change every single core query, and the workload is countable, too. E.g. this will result as a patch file with around 14xxx lines (result of cvs diff -uRpN).
  3. We all agree that Data API + PDO will prove some handy abstraction in higher level, so the use of raw query will much decrease in case of D7. This will also simplify the workload as stated in point 2.
  4. According to point 3, we are now postpone this issue until the success of PDO stuffs.

Up to this point, I would like to reactivate this issues, since changing every single query is not its current main focus. The patch itself just trying to introduce a simple feature to db_query() which will not affect any existing implementation, plus 2 examples (session.inc and user_admin_account()) as demo, and that's all. All of the follow up action should be postpone as stated in point 3 and 4, but that shouldn't apply to the patch itself. Any suggestion?

lilou’s picture

Status: Needs review » Needs work

Patch non longer applied ... need to be re-rolled.

chx’s picture

pdo is in. you can use the query builder. your chance is now before we rewrite every query -- for sure we won't do it twice.

I am absolutely not convinced about this all. The main drive for the new database driver is that we dislike string operations... and now you would mandate string operations for every Drupal install when only a small minority really uses it. And dont tell me that it's a small minority because Drupal does not support it, we support postgresql and very few use it and even fewer help. I am deeply concerned over adding stuff to core which relies on a single developer.

Crell’s picture

I am open to changing field/table names as needed to be ANSI complaint. I'd much rather do that then try to add another bit of abstraction on top of it, as that will break down at some point.

The new PDO layer would allow for automated field escaping on all dynamic queries, but not for static queries. (db_query() is not dead.) Unless we introduce another escaping syntax (I believe [] has been proposed before) we cannot cleanly do that, and that would mean still more string parsing of queries, which is a bad thing. Let's just stop using reserved words and be done with it.

catch’s picture

I agree with Crell, if we have to, let's just change the table/field names.

Crell’s picture

I think the way to proceed here would be a series of smaller patches, each one correcting one keyword or set of related keywords (eg, correct the node table). That stands a better chance of being reviewed and getting in.

hswong3i’s picture

Hmm... I would like to recall my previous comment for this issue (e.g. #25, #41 and #46). The avoid of reserved word is already implemented in Moodle for many years; as a result they ALWAYS need to be careful about the update of reserved word list, therefore an endless battle with DB vendor...

Adding a minor functionality of replacing [] into DB abstraction layer is just as simple as case of {}. We don't need to update all core queries at once as they all are still functioning with MySQL and PostgreSQL, only need to change them phase by phase (or maybe directly upgrade those queries with new query builder).

Moreover, my research project CVS repository is ready for the overall change, where patch file can also obtain easily (e.g. patch for Drupal 6.4). May we give some chance for this proposal?

hswong3i’s picture

Replay for #64: I have implement this [] syntax for almost an year, and keep on checking its performance with Apache ab. The result show that there is not much different between normal and [] syntax (just around 1% in case of MySQL and PostgreSQL). Performance degrade shouldn't be our primary concern.

Unless we will remove all static queries from core and replace as query builder, this [] can always ensure a valid static query syntax for every developers; we don't need to ask everyone become a cross database expert :-)

Damien Tournoud’s picture

I have to agree with hswong3i on this one. Implementing a thin abstraction for this one will probably be better in the long run. After all, we have the {} abstraction, the placeholders, PDO, the query builder for exactly the same reason: for not having to worry everywhere about prefixes, quoting or the exact syntax of the queries.

For builded query, the impact of this will probably be unmesurable. For other, it's nothing more that a simple regexp, that could probably be added to the prefix one.

hswong3i’s picture

Category: task » feature
Priority: Normal » Critical
Status: Needs work » Needs review
FileSize
4.95 KB

This patch add $nameQuote (which is overlapped with issue http://drupal.org/node/167335), protected function escapeQuote(), and preform the filtering within prepareQuery(). MySQL will override the escape character with ` internally. Update session.inc as example (which MUST pass or else page can't render correctly), and DELETE queries as query builder.

As mentioned in http://drupal.org/node/371#comment-639343, this abstraction is not only useful for reserved word issue only, which also give a great hand to some database specific limitation.

According to the support of my research project CVS repository, I will able to go though ALL core queries, and upgrade them for both TNG DB and [] syntax, or replace them as query builders if possible. Hope this can speed up the transaction from legacy queries to TNG syntax, too.

Patch via CVS HEAD, tested with MySQL and PostgreSQL. Both pass. May need some code cleanup if http://drupal.org/node/167335 also go into core. May someone give a kindly hand for it?

hswong3i’s picture

I summarize most possible solutions, and blog for a horizontal comparison: http://edin.no-ip.com/node/205. Hopefully this may provide some idea for our brainstorming. Based on this founding, I would like to suggest for:

This 2 approaches are both simple enough for frontend and backend, where also keep overall performance unchanged.

dalin’s picture

Status: Needs review » Needs work

Keep in mind that table prefixing is not always just adding a few characters to the front of a table name. In some cases it needs to add a database name. This happens primarily in the case of multisite installations. Take the following case. You have two sites lets call them Master and Partner. They live on the same server in a multisite configuration and share some tables. Their settings.php files might look something like:

// For master site:
$db_url = 'mysqli://user:pass@server/master_db';
$db_prefix = '';
 
// For partner sites:
$db_url = 'mysqli://user:pass@server/master_db';
$db_prefix = array(
  'default' => 'partner_db.',
  'users' => '',
  'authmap' => '',
  'sequences' => '',
  'vocabulary' => '',
  'taxonomy_term' => '',
  'term_data' => '',
  'term_hierarchy' => '',
  'term_relation' => '',
  'term_synonym' => '',
  'role' => '',
  'profile_fields' => '',
  'profile_values' => '',
  'multisite_login' => '',
  'masquerade' => '',
);

In this case any new escaping function is going to need to turn {table} into `db`.`table` or the like.

dalin’s picture

hswong3i showed me on IRC that this comment for db_prefix_tables implies that multiple databases are not used:

http://api.drupal.org/api/function/db_prefix_tables/6

This chunk of documentation was written over 4 years ago and was committed without any peer review:

http://drupal.org/node/9287

However the setup that I described in #72 is so common that I believe the comment for db_prefix_tables needs to be ignored. Here are a host of tutorials that explain how to setup multisites with shared tables:
http://drupal.org/node/43816
http://groups.drupal.org/multisite
http://www.practicalweb.co.uk/blog/08/08/07/drupal-multisite-shared-tables

Also some of the largest and highest profile Drupal sites run in this configuration. The Politicker Network is the first that comes to mind. There are also many dev shops that run all of their clients on one Drupal install with each client in a different database and a few shared tables for administration.

hswong3i’s picture

Status: Needs work » Needs review
FileSize
4.99 KB

@dalin: Thanks for summarize our IRC discussion as above comments. As you mention that if this $db_prefix => database_name.table_prefix is widely used for some huge Drupal site or dev shop, we may need to consider about the review of existing comment for prefixTables():

Queries sent to Drupal should wrap all table names in curly brackets. This function searches for this syntax and adds Drupal's table prefix to all tables, allowing Drupal to coexist with other systems in the same database if necessary.

For sure that if we follow the guideline strictly, we should only use $db_prefix as table_prefix but not database_name.table_prefix. If we do review our comment, this patch should also update prefixTables() which handle . => ].[ conversion correctly.

On the other hand, the use of $db_prefix => database_name.table_prefix (as like as your example) may introduce another problem: the master_db and partner_db MUST come with SAME username, password, IP, connection port and so on. This is because we only define ONE SET of connection parameter correctly. In this case the use of split DB is just equal effect with all tables located within single database instance. As MySQL come with nearly none of maximum number of tables limitation within single database instance, I can't find the usefulness of split DB handling.

Finally, I go though most of the multi-site setup reference from your comment, but most of them didn't mention for "using $db_prefix as shared database name + table prefix" (Sorry that I just scan and scheme them with 5min...).

P.S. patch via CVS HEAD, update document and replace preg_replace as strtr based on performance concern (http://reinholdweber.com/?p=3). Tested with MySQL and PostgreSQL and pass.

Damien Tournoud’s picture

Status: Needs review » Needs work

@hswong3i: removing the ability to share tables across database in MySQL install is simply not an option. Please find another way.

hswong3i’s picture

Status: Needs work » Needs review

@Damien: is that means you agree with "we should review comment for prefixTables(), and also consider $db_prefix => database_name.table_prefix syntax for this issue, even that is the illegal use of prefixTables() for share tables across database as not expected in original design"?

Test passed automated testing. More information can be found at http://testing.drupal.org/node/13865

Damien Tournoud’s picture

is that means you agree with "we should review comment for prefixTables(), and also consider $db_prefix => database_name.table_prefix syntax for this issue, even that is the illegal use of prefixTables() for share tables across database as not expected in original design"?

The documentation is wrong. That syntax is supported, and a lot of people use it.

hswong3i’s picture

From my point of view this is not a technical but principle concern:

  • Document is already there for more than 4 years (don't say we have no reviewer: at that moment work division is not such clear and Dries have more time to review issue I guess). Please don't simply judge that as wrong and seems everyone as fool. Even reference resources from #73 don't mention the use of $db_prefix => database_name.table_prefix. As handbook from official site is maintained by many contributors, I trust its information.
  • The syntax support != the function is designed for it. E.g. even db_escape_string() is used for tablesort.inc since 2004/10/15, as db_escape_string() is not design for such replacement, we also change its handling with correct implementation. And now prefixTables() is correct in both documentation and implementation, what we need to do is correct the use of wrong syntax, but not change the core as buggy.
  • A lot of people using it wrongly != everyone should agree with it. There maybe a company with 1000 clients which use it wrongly, but for sure that we have more then 1000 Drupal multi-site setup correctly.
  • We can propose some suitable correction for those wrong setup. E.g. Backup DB, change all database tables from backup with suitable prefix, re-import to single database, and also correct the settings.php. We have solution for it.

Support wrong syntax is not such difficult as patch attached. BTW, there is no point to patch core as buddy if the approach is incorrect.

Damien Tournoud’s picture

Status: Needs review » Needs work

@hswong3i: that's full of crap. Having a db_prefix containing a database part is supported under mysql. It is useful, and supported everywhere in Drupal. We don't need to change anything: it's already there.

Anonymous’s picture

$db_prefix => database_name.table_prefix

I don't know how widely used it is. From the text in settings.php one gets the feeling that alphanumeric and underscore should be the only characters used in the prefix. I can see where the quoted syntax would be handy for assigning a common user table.

$db_url['default'] = 'mysql:...';
$db_url['myuserdb'] = 'mysql:...';

$db_prefix['default']['users'] = 'myuserdb.';
hswong3i’s picture

Status: Needs work » Needs review

@Damien: Please be patience and give an indeed review for patch in #79. That patch already handle the exceptional $db_prefix => database_name.table_prefix as you mentioned. Please compare with #74 for more idea.

@earnie: I think that syntax can't help for different connection parameter of each database, as no db_set_active() is ever called. The $db_url will only load during connection establish, where db_set_active() will take action for. As we do a tricky cross database switching with $db_prefix => database_name.table_prefix, Drupal is being cheated as all tables belongs to same database. It is function just because we hit the backdoor of both Drupal and MySQL, but not means this syntax is expected as valid :S

chx’s picture

Status: Needs review » Needs work

See #52 for [{ }] -- I still do not like having _two_ , { } for tables , [ ] for columns . Maybe. Edit: The latest patch is unreadable, totally. strtr inside strtr? come now.

Anonymous’s picture

@hswong3i: If $db_prefix => 'database_name.table_prefix' works at all then what I suggested will as well. db_set_active isn't needed for what I suggest or it would need to be used for what you suggest.

hswong3i’s picture

Status: Needs work » Needs review

@chx: understand that [{}] maybe not your cup of tea, and so I have already give a (bias) horizontal comparison for each possible solution. I would like to have your comment and so understand your idea more clearly:

Here is a quick rundown of my review:

For the problem of dot in $db_prefix, may we refer to its original issue (http://drupal.org/node/103519) for indeed discussion? As example patches from both #74 and #79, adding support for dot in $db_prefix is not such difficult (even nested strtr() is not too elegant as chx mentioned). Keeping focus on suitable escape character placeholder/syntax may be more close for this issue.

hswong3i’s picture

Still I can't believe escapeTables() should support $db_prefix with dot exists (see http://drupal.org/node/302327); BTW, the workload for review and revamp all core queries are much heavy, and so shouldn't delay so much with that issue.

Patch via CVS HEAD, add dot filtering for escapeTables() and temporary slow preg_replace() in escapeConstraints() for zombie ].[ syntax. Tested with MySQL as below settings.php and all pass:

$databases = array (
  'default' => 
  array (
    'default' => 
    array (
      'driver' => 'mysql',
      'database' => 'DRUPAL_7',
      'username' => 'root',
      'password' => 'CHANGE',
      'host' => 'localhost',
      'port' => '',
    ),
  ),
);
$db_prefix = 'DRUPAL_7.';
hswong3i’s picture

Just recall the original motivation from #25. As an DB driver contributor for Drupal, my primary concerns are:

  • All core queries are well escaped.
  • A function hook for escape characters replacement is available.

Patches below demonstrate most possible solutions from my review. Solution 1 and 3 are skipped since difficult for implementation and not too attractive, too:

All patches tested with MySQL and PostgreSQL. The only different are all about the RAW SQL syntax and prefixTables() handling. We can choose either one as the solution. Since that is a heavy workload for review every single core queries as reserved word safe, maybe we should give more time to the follow up of this issues?

lilou’s picture

I prefer solution 2.

Crell’s picture

Status: Needs review » Needs work

Edison: We already said that we are not interested in an approach that adds still more string parsing to every query. Please provide new patches that eliminate the use of reserved words incrementally in separate issues that we can properly review.

hswong3i’s picture

Status: Needs work » Needs review

@Crell: According to the TNG DB abstraction SELECT/INSERT/UPDATE/DELETE query builder, it is for sure that we will come with smaller and smaller chance to use RAW SQL syntax directly. If we just seems the use of escape character as an always truth ability of the abstraction layer, for sure that would be a better choice when compare with eliminate the use of reserved words, which leave this issue keep open forever. It should be a nightmare if Drupal also come with similar reserved word list as Moodle (http://docs.moodle.org/en/XMLDB_reserved_words)...

hswong3i’s picture

I vote for Solution 4 ([@tbl_name] + [col_name]) because it is simple and easy to understand; on the other hand, also close to our Form API or drupal_set_message syntax (using @/#/!/etc as placeholder).

Solution 5 ([{tbl_name}] + [col_name]) is just a lazy syntax based on backward compatibility concern. For solution 2 ({tbl_name} + [col_name]) it seems to be a bit hybrid and not too elegant for me...

dalin’s picture

I suppose the simple solution to keep hswong3i's patch working with tables shared across DBs is to change $db_prefix['default'] to include the appropriate escape character. To this for MySQL:

$db_url = 'mysqli://user:pass@server/master_db';
$db_prefix = array(
  'default' => 'partner_db`.`',
  'users' => '',
  'sequences' => '',
  'vocabulary' => '',
  'taxonomy_term' => '',
  'term_data' => '',
  'term_hierarchy' => '',
  'term_relation' => '',
  'term_synonym' => '',
);

and to this for anything else:

$db_url = 'mysqli://user:pass@server/master_db';
$db_prefix = array(
  'default' => 'partner_db"."',
  'users' => '',
  'sequences' => '',
  'vocabulary' => '',
  'taxonomy_term' => '',
  'term_data' => '',
  'term_hierarchy' => '',
  'term_relation' => '',
  'term_synonym' => '',
);
dalin’s picture

+1 for proposed solution #2: {tbl_name} + [col_name] . I believe this will be one of the easiest solutions for developers to grasp. As opposed to proposed solution #5 ([{tbl_name}] + [col_name]) , many people will forget the order of brackets for tables: is it [{table}] or {[table]}?

This solution also maintains good backwards compatibility (whereas solutions #3 & #4 don't) and all at a negligible performance impact.

I don't think that simply removing all reserved words is the way to go as it is a moving target. We really only want to do this mass rewrite once. If a year from now a database vendor decides to make "node" a reserved word then we are back to square one (Albeit maintaining a list of reserved words could be made easier by adding rules to coder module).

I find it unfortunate that some people are approaching this issue from an exclusively MySQL viewpoint. I think we need to be forward looking. If Drupal is to maintain its growth it must become available to the widest audience possible. There are many organizations around the world who are wedded to one database vendor, whether for contractual or ideological reasons. These are usually larger organizations using "enterprise" database vendors. In its current state Drupal is unusable with these other database vendors without extreme hacks to core. This should be unacceptable to any organisation in their right mind who are looking for a CMS/CMF.

In terms of performance I see the cost of this wider interoperability as a negligible. In terms of workload I think we have never seen a better time (and perhaps never will again) due to the opportunity to piggyback the query rewriting that is already happening for TNG DB.

hswong3i’s picture

@dalin: I have think about this solution once before. But since we will need more document for that, maybe just left the follow up action belong to http://drupal.org/node/302327 whenever this issue is committed (at least, db_name.prefix is now function for both #87's patches). Commit the preferred RAW SQL syntax to core is much important than others for this issue. May I have your comment for those suggested syntax?

dalin’s picture

@hswong3i: I apologize but the language barrier is possibly the biggest source of misunderstandings, miscommunications, and offensive tones in response to your posts. I for one wasn't able to understand a single sentence from your last post.

hswong3i’s picture

@dalin: Sorry for my poor English. I am totally agree your comment for #93.

The different between solution 2 and 4 is my only concern (forget 5 since it is all about lazy). Solution 2 seems to be less workload than 4 but actually they are the same. In case of solution 2 {} will means for both table prefix AND escape character placeholder (so it is a mix); in case of solution 4 the meanings of [] and @ are split absolutely so maybe even simpler for Drupal new comers. Backward compatibility is not such important since we will work together with TNG DB revamp.

hswong3i’s picture

@dalin: thanks for your suggestion in #drupal. Seems solution 4 is not suitable to handle with @ prefix, because it is too close to that of t() syntax (http://api.drupal.org/api/function/t/7, already use !, @ and %). So how about prefix with #?

Here is the patches with solution 4 transformation (both tested with MySQL and PostgreSQl):

  • Solution 4a: [@tbl_name] + [col_name] (original design)
  • Solution 4b: [#tbl_name] + [col_name]

I also give a try with ^ prefix but not work. !@$%&*,.<>?:'"`(){}[] are all come with special meaning or not suitable. I even think about :: but maybe mix with PDO's :var placeholder syntax. # is almost the most possible choice...

webchick’s picture

I'm with Crell here. It's bad form for us to be using ANSI-reserved keywords in our column names, and I'd rather simply address that and keep a list documented in the API docs rather than adding overhead to each query.

Can we see some patches that rename fields like "timestamp" to a more semantic representation?

hswong3i’s picture

@Crell and webchick: So how about if handle this issue with both methods? I think there is no conflict for either solution.

Avoid the use of ANSI-reserved keywords is always truth. {node_counter}.timestamp is a good example that need some love. #22 list those for D6 (I guess D7 should be similar).

But only avoid the use of ANSI-reserved keywords don't solve everything at all. E.g. MySQL and PostgreSQL also come with words besides ANSI standard, which also keep on grouping (check the new reserved words in MySQL 5.0 and 5.1). Introduce escape character abstraction to Drupal core can give a great hand for this problem.

Crell’s picture

Status: Needs review » Closed (won't fix)

The D7 maintainer and the database maintainer, among others, have all now said that we should simply not use reserved words rather than trying to introduce more Drupal-specific string parsing. I am therefore marking this issue won't fix, as it is a dead-end approach until we have at least tried that approach in practice.

Instead, the following should happen:

1) Document all reserved words we should avoid in the handbooks.

2) Submit multiple patches that change field names in core where appropriate.

3) Submit a patch to coder module to check for SQL reserved words.

chx’s picture

Status: Closed (won't fix) » Active

If we are to support every database engine in this world then we have no ways to know ahead of the time. To give you a far fetched idea, ARITH_OVERFLOW is a sybase reserved word, while Arith is the name of a mountain (in Albania) so what if your table has a boolean flag whether that mountain is overflown or not (I can figure out more twisted examples if you so want).

I find the [columnname] and {tablename} easy.

The code is truly ungodly. Edit: I suggest the following:

  $class = get_class();
  if (is_callable(array($class, 'escapeKeywords'))) {
    $query = preg_replace_callback('!\[(.+)\]!U', "$class::escapeKeywords", $query);
 }
 else {
    $query = str_replace(array('[', ']'), '', $query);
 }

I believe this is enough and quite simple. I doubt core needs an escapeKeywords implementation.

hswong3i’s picture

@chx: Solution 2 ({tbl_name} + [col_name] + [constraint_name]) is totally fine for the issue (I just care if this will give some mix up meaning for {}). BTW, I have some suggestion for the code snippet.

strtr() should be better (refer to http://reinholdweber.com/?p=3):

11. str_replace is faster than preg_replace, but strtr is faster than str_replace by a factor of 4.

The cost of the additional function call should be very cheap, so IFELSE for MySQL/PostgreSQL may not give too much benefit.

On the other hand, keep MySQL/PostgreSQL safe should be a good idea, too (reserved words for MySQL is keep on increasing).

P.S. maybe we should request for some input from Dries?

dalin’s picture

To be more consistent with existing db_* functions would it make more sense to add some abstract methods and let the extending class (ie. the engine) deal with it:


// Adds prefixes to table names and escapes table names in the database appropriate format.
abstract protected function escapeTables($sql);

// Escapes column and constraint names in the database appropriate format.
abstract protected function escapeColumnsAndConstraints($sql);

// What used to be prefixTables()
protected function escapeKeywords($sql) {
  $sql = $this->escapeTables($sql);
  return $this->escapeColumnsAndConstraints($sql);
}

I'm not sure about having the table prefixing in the extending class, but I think this general idea is cleaner than using is_callable.

Crell’s picture

We're still talking a major change to the syntax to reintroduce a preg_replace that we just went to a lot of trouble to remove. The goal is not to support every database in the world; we currently couldn't support a non-PDO database, for instance, and the odds of anyone actually using more than 4 DBs is minimal. It's good form to avoid reserved words anyway. Let's do that first, before we introduce another regex and dozen function calls per query.

hswong3i’s picture

@Crell: Sorry that I can't agree with some of your points.

we currently couldn't support a non-PDO database

My research project already show the ability of D6 + MySQL/PostgreSQL/Oracle/SQLite in PDO style, so both 4 databases meet this minimum requirement for D7.

The goal is not to support every database in the world

At least add support for databases that able to integrate with Drupal should be reasonable and acceptable.

and the odds of anyone actually using more than 4 DBs is minimal

But everyone should have the chance to choose what they really needed for. That's not means MySQL can serve every case and requirement, and so the other database can't, too.

before we introduce another regex and dozen function calls per query

As proved from my patches (#87 and #97), no regex is needed (but strtr() instead, as like as escapeTables()), and there is nearly none of performance degrade (>+-1%). When compare with TNG DB this change should be very minor.

hswong3i’s picture

In case if we are still thinking about performance impact, here are some benchmarking result as your reference, for my >+/-1% assumption. CVS HEAD setup with:

Apache/2.2.9 (Unix) PHP/5.2.6
MySQL 5.0.51a-12
Default profile + menu + path + devel CVS HEAD
2000 users
15 taxonomies
250 terms
5000 nodes + terms + patch alias
10000 comments

The best record (i.e. with smallest SD) among 3 concurrent testes (ab -c1 -n100, front page with 10 nodes):

  • CVS HEAD: 504 +/- 82.6ms
  • Solution 2 ({tbl_name} + [col_name]): 515 +/- 75.5ms
  • Solution 4a ([@tbl_name] + [col_name]): 491 +/- 33.1ms
  • Solution 4b ([#tbl_name] + [col_name]): 489 +/- 33.6ms

Please note that both #2/#4a/#4b run escapeConstraints() for EVERY queries (even MySQL don't really needed for) without any performance impact; on the other hand, #4a/#4b is now coming with legacy {} support, so their final performance can even be better.

catch’s picture

I agree 100% with Crell - until there's patches for existing reserved words, and then we actually run into issues after those are committed, I can't see any reason to introduce additional complexity.

hswong3i’s picture

Status: Active » Needs review
FileSize
8.46 KB
347.72 KB
36.94 KB

Here I give a try for complete implementation of solution 5, which also test its correctness and performance with simpletest and ab.

Simpletetest result:
-------------------------
Block: 73 passes, 0 fails, 0 exceptions
Bootstrap: 13 passes, 0 fails, 0 exceptions
Comment: 513 passes, 0 fails, 0 exceptions
Filter: 115 passes, 0 fails, 0 exceptions
Form API: 38 passes, 0 fails, 0 exceptions
Help: 112 passes, 0 fails, 0 exceptions
Menu: 322 passes, 0 fails, 0 exceptions
Node: 250 passes, 0 fails, 0 exceptions
Path: 80 passes, 0 fails, 0 exceptions
Session: 82 passes, 0 fails, 0 exceptions
Simpletest: 46 passes, 4 fails, 0 exceptions (same as D7)
System: 273 passes, 8 fails, 0 exceptions (same as D7)
Taxonomy: 114 passes, 0 fails, 0 exceptions
User: 110 passes, 4 fails, 0 exceptions (same as D7)
Database: fail with Insert tests, Update tests, "Update tests, Complex", "Select tests, complex" (same as D7)

ab result (best record among 5 tests):
-------------------------
CVS HEAD: 394 +/- 47.3ms
with patch: 399 +/- 35.6ms

Testbed setup:
-------------------------
Host: Debian sid (KVM virtual client)
CPU: 2.2GHz AMD single core
Ram: 2GB
PHP: 5.2.6
MySQL: 5.0

Drupal setup:
-------------------------
users: 2000
categories: 15
terms: 250
nodes: 5000, with term and path alias
comments: 10000

The patch handle both normal and expert install profile, running with simpletest, generate dummy data with devel and so on. I don't give any changes to core schema and logic.

I split the patch into 2 part: the 371_db-01.patch focus on changes for our TNG DB layer, where 371_core-01.patch handle all others core queries syntax update (with some minor coding style clean up). Queries syntax update is kept as minimum as possible so no TNG DB syntax change is included.

If this approach is acceptable, I would like to keep on going for other optional core modules :D

P.S. 4 special preg_match() with syslog() is existing for queries debug. I didn't remove them during benchmarking so the real performance may even be more close to CVS HEAD.

webchick’s picture

Status: Needs review » Needs work

Dries @ #55:

It seems to me that fixing the table names gets us a long way -- we'd only have to rename database columns.

Crell @ #64:

I am open to changing field/table names as needed to be ANSI complaint. I'd much rather do that then try to add another bit of abstraction on top of it, as that will break down at some point.

webchick @ #98:

I'm with Crell here. It's bad form for us to be using ANSI-reserved keywords in our column names, and I'd rather simply address that and keep a list documented in the API docs rather than adding overhead to each query.

By my count, that's both core committers and the database subsystem maintainer asking for a patch that renames the tables/fields violating the ANSI SQL 2003 standard. I completely fail to see why we are still going in this [field]/[table] direction.

The [field]/[table] approach, despite its benefits listed @ #25, passes a tremendous amount of manual, error-prone effort for our thousands of day-to-day Drupal developers in favour of tweaky edge cases in proprietary databases that will ultimately benefit 0.00000001% of our users.

Resolving the ANSI SQL 2003 reserved words gets us a good chunk of the way there towards support for *all* database systems, without placing unnecessary burden on our developer base.

As for systems like Oracle, and its proprietary reserved keyword extensions like 'uid', I spoke to Crell about this. He pointed out that one of the advantages of the OO nature of PDO is that various database subsystem libraries can override methods and classes and do whatever the heck they want in, for example, DatabaseConnection::query() or DatabaseStatement. You also might be able to use something like hook_schema_alter() to add a property onto fields that require escaping that you can check for in the query.inc. But all of that is out of scope for this issue.

Let's keep the crazy logic that escapes the 'uid' column to databases/oracle/query.inc, but resolve the keyword problems that benefit 100% of the databases everywhere else.

webchick’s picture

Priority: Critical » Normal
hswong3i’s picture

FileSize
37.91 KB
513.28 KB
16.49 KB

The [field]/[table] approach, despite its benefits listed @ #25, passes a tremendous amount of manual, error-prone effort for our thousands of day-to-day Drupal developers in favour of tweaky edge cases in proprietary databases that will ultimately benefit 0.00000001% of our users.

I have some other point of view. Since D7 come with new query builders, e.g. db_select/db_insert/db_update/db_delete, based on MySQL master/slave requirement and other hidden benefit, normal Drupal developers will no longer require to duel with RAW query syntax daily.

The syntax changes of core will soon become tiny when most core queries are updated with those query builders. That's why I split the patch into 2 parts for simpler review: the 371_core-xx.patch will soon become dummy, so we just need to focus on the changes within 371_db-xx.patch.

Therefore besides the benefits list at #25, the most important requirement of this patch are:

  • If it will harm or destroy any programming logic? Thanks for simpletest this is already proved as safe.
  • Will the patch degrade overall performance? Again, ab test give us a systematic and scientific positive support.
  • Will the patch give a great revolution for D7 developers? No, since we will even use query builders more than RAW syntax, or else MySQL's master/slave setup will not function correctly.

It is not necessary to seems this patch as evil. This fine tune for TNG DB will only give us more benefit but not drawback, and also give a great hand for other database driver implementation :D

Some detail for patches: around 95% of core queries are updated with check of simpletest. Most schema update queries from *.install are left behind. It is completely tested with both MySQL/PostgreSQL and all works fine.

Simpletetest result:
-------------------------
MySQL: Identical result when compare with CVS HEAD.
PostgreSQL: Identical result when compare with CVS HEAD.

ab result (best record among 5 tests):
-------------------------
MySQL + CVS HEAD: 346 +/- 14.8ms
MySQL + patch: 362 +/- 15.4ms
PostgreSQL + CVS HEAD: 584 +/- 20.8ms
PostgreSQL + patch: 587 +/- 21.6ms

Testbed and Drupal setup:
-------------------------
PostgreSQL: 8.3
Others are same as case before.

hswong3i’s picture

Status: Needs work » Needs review
catch’s picture

Priority: Normal » Minor
Status: Needs review » Needs work

@hswong3i "normal Drupal developers will no longer require to duel with RAW query syntax daily."

Yes they will, 95% of SELECT queries will still use db_query(). The query builder for SELECTS should only be used for dynamic queries since it has it's own performance overhead.

chx’s picture

you still use [{ }] which i do not approve... and it's quite visible from the that it's superflous as you add a call to remove them after the prefixtables call.

hswong3i’s picture

Priority: Minor » Critical
Status: Needs work » Needs review

@catch: Maybe think from this way?

  • For core developers: This syntax do simplify the work as mentioned in #25, because it is always truth. As this patch already handle ~95% of core queries and quality checked with simpletest, we only need to concern with daily maintenance. BTW, if we do hope to have Drupal core + other databases support without any trouble, this should be our duty and responsibility for sure.
  • For contribute developers: This syntax is NOT a MUST for contribute modules, because [] just means as escape characters. Existing MySQL-only modules can still works happily without any quote. Unless contribute developers: 1. have enough resource, and 2. coming with cross database skill set, and 3. start to contribute module with multiple database support, this syntax is optional for them.

@chx: So the benefit of [{}] is completely observable: because [] is just optional, the impact for contribute developers can be minimized. ab also prove there is nearly none of performance degrade with this new syntax.

Crell’s picture

Priority: Critical » Minor
Status: Needs review » Needs work

hswong3i: The project lead, D7 branch maintainer, database maintainer, and most prolific core developer have all said no to this approach and given ample reason. The answer is still no. No matter how many times you try to set this to critical, the answer will still be no until an effort is made to eliminate the user of reserved words in the first place, as we've been saying for several months.

You are not accomplishing anything in this thread but holding back Drupal's development. Please stop. If you want to contribute, submit patches that rename fields to not use reserved words, in new issues.

Please do not hurt Drupal any more than your actions already have.

hswong3i’s picture

Priority: Minor » Normal
Status: Needs work » Needs review
FileSize
38.2 KB
513.66 KB

Update 371_db-*.patch based on includes/database/mysql/query.inc (http://drupal.org/node/301501).

Anonymous’s picture

Status: Needs review » Needs work

The last submitted patch failed testing.

hswong3i’s picture

Status: Needs work » Needs review
FileSize
22 KB

As we hope to solve this issue with avoid the use of reserved words, I clone the handling from Moodle. In case of Moodle, there is an admin page which report all reserved words for all supported database backend, where also report conflict keywords if there exists. With this implementation developer will no longer require to check the reserved words list manually.

In order to test the effect of this patch, 1. activate all target modules, 2. access admin/reports/database, 3. get the report. For CVS HEAD, we are now conflict with following keywords (including MySQL, PostgreSQL, SQLite and Oracle):

accesslog.uid
authmap.uid
blogapi_files.uid
comments.comment
comments.uid
files.uid
history.uid
node.comment
node.uid
node_revisions.uid
poll_votes.uid
profile_values.uid
search_dataset.reindex
sessions.session
sessions.uid
simpletest.file
users.uid
users_roles.uid
watchdog.uid

I have prepare the reserved word list for ALL ANSI standard and also some well-know keywords. BTW, I didn't include them during checking. If we do hope to follow with strict ANSI standard, please uncomment those word list and browse the result. BTW, as this an endless-battle with reserved word, I guess we don't need to be perfect but just enough.

The report page is just a prototype. It show off its logic and functionality, and should implement as more elegant. I am going to submit patches for remove the use of above reserved words.

webchick’s picture

Can you post a screenshot of the new screen?

It sounds like something that belongs in devel module rather than core, but I'll withold judgment until I see it in action. :)

hswong3i’s picture

FileSize
95.48 KB
98.86 KB


I even think this report page should belongs to coder :D

dalin’s picture

-1 I think this should go into Coder module instead of Drupal core.

Crell’s picture

This does not belong in core, but definitely should be in coder.module. hswong, can you refile it as a patch against Coder?

hswong3i’s picture

Sure :D

One more suggestion: should we maintain the reserved word list within each driver's schema, so each database driver maintainer should take care of that, and only move the GUI to coder.module?

Crell’s picture

Having the list be per-driver means that someone developing on MySQL will never know about reserved words in Postgres or Oracle. That rather defeats the purpose of having a common list of reserved words to avoid. :-)

We should maintain in coder either a list of the ANSI reserved words (and nothing else), or ANSI plus some reasonable set of words reserved by some database that we are likely to want to support. Maintaining that list outside of core also makes it easier to keep up to date if some database vendor decides to be stupid and reserve even more words in order to break existing code.

grub3’s picture

Great patch hswong3i .

Reserved keywords are a real problem in Drupal database abstraction.
When testing Drupal 6 modules, I found several issues linked to reserved-keywords.

I hope that your test can help developers access to the list of reserved keywords ...

hswong3i’s picture

Or just keep the implementation in simple? E.g. we keep the reserved word list within each driver (I think this should be the duty of each database driver developer), then create a new simpletest for testing core schema. For additional search and report GUI, move it into Coder module. Any idea?

dalin’s picture

@hswong3i you didn't address Crell's comment:

"Having the list be per-driver means that someone developing on MySQL will never know about reserved words in Postgres or Oracle. That rather defeats the purpose of having a common list of reserved words to avoid. :-)"

I don't think simpletests will work either. As far as I understand the simpletest framework, a test for reserved words that will always fail until those reserved words are removed from core cannot be committed to core since it will cause *all* patches to Drupal core to fail testing.

I think coder is really the only way to go. And I think it would be a lot simpler than your latest patch. It would just be creating new rules:
"uid" is a reserved word for Oracle and should not be used as a field or table name
"data" is a reserved word for ANSI/ISO SQL 99 standard and should not be used as field or table name
...

Dries’s picture

Project: Drupal core » Coder
Version: 7.x-dev » 6.x-2.x-dev
Component: database system » Review/Rules

I agree that this does not belong in core. It is for coder module -- moving this issue to coder module.

alexanderpas’s picture

Project: Coder » Drupal core
Version: 6.x-2.x-dev » 7.x-dev
Component: Review/Rules » database system
Priority: Normal » Critical

A technical standard is an established norm or requirement. therefor, in my opinion, as long as the database schema contains ANSI/ISO SQL 92/99/2003 (which is almost 6/10/17 years old) reserved words, this should be considered as a critical issue against drupal core.
note that i'm explicitly ignoring product specific reseved words in the above sentence.

short:
- We MUST not use ANSI/ISO SQL 92/99/2003 reserved words.
- We SHOULD not use product specific reserved words, and if used they MUST (automatically) be escaped properly.

I think we should create a roadmap to fix this issue.
1) Create rules etc... for Coder (both standard and seperate product specific) (and that should actually be the another issue)
2) Fix Drupal Core to adhere at least ANSI/ISO SQL 92/99/2003 standard. (which is this issue)
3) Create Simpletest (or similar) for ANSI/ISO SQL 92/99/2003 standard, to actively prevent any of that code entering core. (which should pass as per previous item, and are not subject to change) (also, let the bot do some of the reviewing ;))
4) Done? am I missing anything?
5) Update coder module product specific lists when a (new) vendor/database decides to add (new) reserved words.

each database product should have a seperate list in the coder module to alloe for modular extension and easy and faster patches.

@hswong3i
Great work, keep on rockin!

@Dries
Yes, I know i'm going right against you ;)
However, you still decide ;)

Status: Needs review » Needs work

The last submitted patch failed testing.

grub3’s picture

A good way to establish database standard would be to use a standard and reference SQL database, like PostgreSQL with pgAdmin3 interface. Don't flame me, but this seems a simple solution. Plus: I runs very well and very fast. I have some 500.000 pages site. It never returns nothing like MySQL sometimes does. One of my site (kdenlive.org) sometimes returns crap when 100 users are connecting! MySQL is really for Windows95 users (joke).

lambic’s picture

Priority: Critical » Normal
FileSize
9.33 KB

I've made a stab at starting this by renaming comment.timestamp to comment.comment_timestamp in the attached patch. If anyone has a better naming standard for renaming these columns I'm all ears.

Anyway, let's see how it goes with the testbot.

lambic’s picture

Status: Needs work » Needs review
catch’s picture

Status: Needs review » Needs work

comment.timestamp should be split into created and updated anyway.

lambic’s picture

Status: Needs work » Needs review
FileSize
9.83 KB

oops, missed a spot

Status: Needs review » Needs work

The last submitted patch failed testing.

lambic’s picture

Status: Needs work » Needs review
FileSize
14.43 KB

once more then I'm going to bed

lambic’s picture

FileSize
14.43 KB

once more then I'm going to bed

Status: Needs review » Needs work

The last submitted patch failed testing.

lambic’s picture

Status: Needs work » Needs review
FileSize
14.44 KB

so close, this one should do it.

Crell’s picture

Status: Needs review » Needs work

This is not performance related, so I'm not sure that it's still kosher to do in code slush. If it is, then as catch says above we really ought to split it into two separate timestamps to match nodes, and then the field names become self-evident: Use the same field names as the node table.

catch’s picture

I think we can still do the schema change as API cleanup (and there's one or two outstanding comment bugs to do with ordering by timestamp iirc) -but I don't think the ANSI issue is enough to warrant the schema change by itself.

lambic’s picture

rename timestamp to created and add updated

or

rename timestamp to updated and add created?

Damien Tournoud’s picture

The split of comment.timestamp is handled by #122098: Split comment.timestamp into 'created' and 'updated' columns.

lambic’s picture

I'm not sure how much of this should be attempted in code slush. The comment one is probably the simplest, and already being taken care of by Damien. The others should maybe wait until D8?

Another consideration for timestamp and other date columns: They are currently int(11) which is not future-proof, should the datatype be changed?

Crell’s picture

Version: 7.x-dev » 8.x-dev

Honestly our timestamp fields should become "real" date/time fields, not integers, but that's not a subject for this patch.

Since the easy-peasy stuff is already in progress elsewhere, I'm bumping this up to Drupal 8.

Cyberwolf’s picture

Subscribing.

chx’s picture

Status: Needs work » Closed (won't fix)

Now that we have DBTNG, you can write a driver and do whatever you want without problems.

alexpott’s picture

Issue summary: View changes

Note that with the advent of MySQL8 this has come up again. See #2986452: Database reserved keywords need to be quoted as per the ANSI standard and #2966523: MySQL 8 Support. As this issue is over 6 years old I don't think re-opening is the right thing to do but commenting here in case any of the original posters are still interested in how this might play out in future Drupals.

catch’s picture

Status: Closed (won't fix) » Closed (outdated)

#2986452: Database reserved keywords need to be quoted as per the ANSI standard has landed. Updating status to 'closed (outdated)' after nearly 18 years.