Seems a bit conflict when checking DB schema: node_revisions.body and node_revisions.teaser are not allow NULL, but miss out default value. In some rare case, it may cause an illegal INSERT/UPDATE handling from drupal_write_record(), as it is depending on cached schema.

Please correct me if this 2 fields should/must miss out default value for some hidden/critical reason.

Comments

webchick’s picture

Status: Needs review » Closed (works as designed)

MYSQL 5 does not support a default value applied to text fields. See http://dev.mysql.com/doc/refman/5.0/en/blob.html

hswong3i’s picture

Status: Closed (works as designed) » Needs review

But the schema API is not just target for MySQL, isn't? How about if other database need a default value for "NOT NULL" column?

IT IS a conflict if we don't have any default value for a NOT NULL column. What will be happened if we are calling INSERT/UPDATE from drupal_write_record(), but object with omit body and teaser? IT IS really happening: poll.module have a 'has_body' => FALSE attribute, so its object input for drupal_write_record() will ALWAYS without a VALID body and teaser.

This may cause some hidden bug, even it is now functioning for MySQL; provide a default value for NOT NULL is much logical ;-(

webchick’s picture

Priority: Critical » Normal
Status: Needs review » Needs work

Unless you can provide reproducible steps that cripple the use of Drupal without these default values (fatal errors when saving polls when using pgsql, for example), then this is not a critical bug.

Your patch is incomplete if you intend to add default values to all text/blob columns in all of core. You also need to deal with the fact that MySQL will yield errors when these have default values, which I imagine would need to be done somewhere in database.mysql.inc.

hswong3i’s picture

Title: node_revisions.body and node_revisions.teaser miss default » NOT NULL fields miss out default value
StatusFileSize
new14.96 KB

Patch tested for both MySQL and PgSQL with no error during install and modules installation. I grep all columns with "NOT NULL" where miss out default value, by using following script:

<?php

$array = array();

module_load_all_includes('install');
foreach (module_implements('schema') as $module) {
  $current = module_invoke($module, 'schema');
  foreach ($current as $table => $v1) {
    foreach ($v1['fields'] as $column => $v2) {
      if (isset ($v2['not null']) && $v2['not null'] && !isset($v2['default']) && $v2['type'] != 'serial') {
        $array[] = $table .".". $column .": ". $v2['type']; 
      }
    }
  }
}

print_r($array);

?>
hswong3i’s picture

Status: Needs work » Needs review
gábor hojtsy’s picture

wechick: the "cannot create polls in pgsql" issue you asked for is here: http://drupal.org/node/199161

gábor hojtsy’s picture

BTW IMHO it makes sense to add the defaults and ignore then when it does not make sense for the database handler at hand (in this case mysql). Although this would require comment from schema API designers, hopefully bjaspan.

hswong3i’s picture

Title: NOT NULL fields miss out default value » Siren #10: NOT NULL fields need default value
Assigned: Unassigned » hswong3i
Priority: Normal » Critical

I mark http://drupal.org/node/199161 as duplicated, as that is an real bug example of this issue. Here is its bug report in case of PostgreSQL:

Got the following page:
============
Page not found

* warning: pg_query() [function.pg-query]: Query failed: ERROR: null value in column "body" violates not-null constraint in /home/maximkr/drupal/includes/database.pgsql.inc on line 155.
* user warning: query: INSERT INTO node_revisions (nid, uid, title, teaser, log, timestamp, format) VALUES (14, 1, 'Do you like X', '* Yes * No ', '', 1197154493, 0) in /home/maximkr/drupal/includes/common.inc on line 3374.

Your Poll has been created.
The requested page could not be found.
=====

I mark this into my personal research project issue.

gábor hojtsy’s picture

I sent Barry a contact mail to ask him to look into this.

bjaspan’s picture

Subscribe. Feedback forthcoming.

bjaspan’s picture

StatusFileSize
new890 bytes

The bug here, as hswong3i points out, is that we have columns (namely node_revisions.{teaser,body}) that are NOT NULL, have no default value, but sometimes are given no value during an INSERT statement.

I see a number of problems with hswong3i's proposal:

1. Drupal will no longer support MySQL Strict mode.

If we add a default value to a text column, people will write queries that rely on the default value by not providing a value for that column (indeed, that is the exact bug we've encountered here). With hswong3i's patch, the MySQL engine will strip out the default value before creating the table. In MySQL normal mode, an "implicit" default value of the empty string will be used, so all will appear good. But in MySQL strict mode, no implicit default value is provided and so the query will fail. The only solution will be to provide an explicit value for the column (which is what I suggest below is the actual solution).

Postel's Law is wrong; we should never depend on "liberal acceptance" for any API or interface. Code that does so (without at least an extremely good reason) is broken.

2. It litters the schema will semantically invalid information.

Many columns have no legitimate default value. What is a default node id, or term id, or any id? What is a meaningful default date? What does the "body" of a poll node mean? We should not change Drupal's design to *require* a meaningless schema definition.

3. It obscures a bug with additional complexity instead of just fixing the bug.

The problem is that node_revisions.{teaser,body} are declared NOT NULL when, in fact, "no value" is legitimate for those columns; some nodes simply do not have a teaser and body (e.g. poll). hswong3i's proposed suggestion is to change the MySQL driver to ignore default value statements it cannot support and change the schema of a large number of tables to add default values to every column, effectively declaring default values to mandatory for every field in Drupal's schema. The hides the problem because poll nodes are forced to have a body when they shouldn't be but it does not fix it.

So, my suggestion? Fix the bug, in one of two ways:

1. Change node_revisions' schema to allow NULL values for teaser and body, matching the actual semantics of the object type. This is a simple change to make but could possibly have unintended side-effects this late in the development cycle, so I suggest this fix for D7.

2. In general, always provide values for every column that has no default value. If you are writing an INSERT statement, just list all the columns (a default value is never anything different than specifying the same value explicitly). If you are calling an API function like drupal_write_record() that builds an INSERT statement based on an object, make sure the object's fields are set. In node_save()'s case, we're already doing this for node_revision.log. An untested patch for teaser and body is attached. I recommend this for D6. (Frankly, I'd prefer *forbidding* default values rather than making them mandatory. Default values provide nothing useful; they just cause confusion and incompatibility, as we are seeing here.)

3. We can very easily enhance drupal_write_record() to detect when it is called with an object that has no field corresponding to a NOT NULL column and log an error to watchdog (if we were using our automated unit tests, it could make them fail). I do not know the community's taste for "assertion code" like this but since there basically isn't any in Drupal I assume it is not well-liked, so I haven't bothered submitting an issue to suggest it.

I point out that even if we accept hswong3i's patch we will *still* need to always provide explicit values for text and blob columns as per my second suggestion and patch if we want to support MySQL strict mode. So, we can either just fix the bug, or fix the bug and make semantically misleading and unnecessarily complex changes to Drupal. I vote for the former.

hswong3i’s picture

Please correct me if it is wrong:

  1. The change should not break MySQL strict mode. The define of schema provide required information for developer (e.g. NOT NULL field should at least have default with xxx/yyy/zzz), but that's not means developer can/should simply omit its value when working with manual INSERT/UPDATE. This seems to be the duty of developer and application level design, but not schema.
  2. No default value are added to serial fields, as script appended in #4. According to the logic of schema design, serial fields should NEVER handling manually: it should always maintained by DB internally.
  3. NOT NULL field without DEFAULT value is always a conflict in schema design. We should just seems the default value of NOT NULL fields as a "final catch" before data are written into DB. The default value of "body" of a poll node need not to have any meaning (as we don't use it among application level); BTW, that is always meaningful for schema design: it is a NOT NULL field, so just give me something (but not simply ignore it).

On the other hand, if default value is documented within schema, helper API (e.g. drupal_write_record()) can simply fill in the omit information with valid value. It should simplify the difficulty of application design (as helper API really HELP the "catch and replace" handling).

IMHO, patch in #11 is all about application design fix. It is always truth and we should have this checking, too. BTW, adding default value for NOT NULL field is the duty of schema design; and this design shouldn't depend on ANY ONE of DB engine: if it is logically correct, apply it; BTW, if it is not support by XXX/YYY/ZZZ DB engine, omit and fix it within driver/schema implementation internally.

P.S. as I am now apply this "default value" logic to my personal research project, and it is now able to handle both MySQL/PostgreSQL/Oracle with same patched schema design. I am also working with IBM DB2 (good in progress), without retouch *any* schema. Hope this can be a reference about the correctness of patch in #4.

gábor hojtsy’s picture

Priority: Critical » Normal
Status: Needs review » Active

hwsong3i: Since you agree in point 1 about the developer being required to specify the default explicitly anyway (as we have seen, we cannot blindly ignore any schema set defaults in the MySQL driver for text fields, as the behavior depends on MySQL configuration, which we cannot control). So I committed bjaspan's patch from #11 to fix the issue at hand (that was actually a patch for http://drupal.org/node/199161, but I cross-linked these issues in both comments and in the commit message).

From here onwards, discussing schema default values is not critical for Drupal 6.

bjaspan’s picture

Priority: Normal » Critical
Status: Active » Needs review

Re #12:

Perhaps the most straightforward difference between our views is your statement that "a NOT NULL field without DEFAULT value is always a conflict in schema design." I completely disagree. A NOT NULL field without a DEFAULT value simply indicates that it is mandatory to provide a value for that field during INSERT. Sometimes, you really want fields to be mandatory. The "final catch" of providing a default value eliminates the mandatory requirement; instead, the "final catch" ought to be that the query fails indicating that you are trying to do something the schema was specifically designed to prevent. In that case, the query failing is the correct response of the program; it indicates a developer mistake and the error message allows it to be detected and fixed.

As Gabor noted, you seem to agree that developers need to provide explicit values for columns without default values or, if we accepted your patch, for columns with default values that nevertheless might not use those default values on some database engines. Note the confusion that would arise if *some* columns with default values actually required an explicit value anyway (under certain circumstances)! It is much better to have the column always have a default or never have a default, not sometimes have a default.

I'm not sure how your observation about serial fields is relevant. They do not have a default value because they have special semantics that provide for an automatically incrementing default value; in fact, it ought to be illegal to INSERT a serial field with an explicit value (but I'm not sure Schema API specifies this).

Now, there are two points that I think are important:

1. It is not good that, because MySQL cannot support default values on 'text' columns, that Schema API does not allow them. That forces Schema API to the lowest common denominator. It would be better if Schema API allowed defaults on text columns and the mysql driver performed whatever query transformations were necessary to make it work (e.g. always providing an empty string for INSERT queries when the developer does not). Drupal's current design of passing explicit SQL statements to db_query() makes this kind of functionality unreasonable to implement but, if we switch away from explicit SQL (to object-based load/save APIs, or an abstract SQL query data structure, or whatever), I would support this solve-it-in-the-driver approach.

But remember that I'm saying I would support *allowing* text columns to have default values (if there is a semantically meaningful default), not *requiring* them to have default values or forbidding NOT NULL, no-default columns.

2. If you have identified some reason that NOT NULL, no-default columns prevent Drupal from supporting databases other than mysql or pgsql, that is important. Please identify what those reasons are.

And thanks for all your hard work on database portability issues.

hswong3i’s picture

Component: node system » database system
Priority: Critical » Normal

I guess the agree of point 1 is not something special: MySQL TEXT field ALWAYS need to OMIT default value, whatever we have/haven't schema API. This is the limitation for MySQL, but this is also a conflict for general and generic database schema design. This is always the developer's responsibility about INSERT/UPDATE correct information into DB, especially when we are handling NOT NULL fields.

Moreover, as now we having both schema API and drupal_write_record(), the correct and explicitly well defined default value for NOT NULL field can greatly increase the usability of these helper function. E.g. IF default value is ALREADY well defined for {node}, http://drupal.org/node/199161 won't appear. Again, this is not a new idea, let's review code from drupal_write_record():

<?php
    // For inserts, populate defaults from Schema if not already provided
    if (!isset($object->$field)  && !count($update) && isset($info['default'])) {
      $object->$field = $info['default'];
    }
?>

Recall the design of OSI 7-layers model: each layer should take their duty and responsibility correctly, in order to ensure the correctness of overall system. Patch in #11 take the responsibility in application level; on the other hand, patch in #4 take the responsibility of a correct database schema design and implementation. As Schema API is targeting for universal database abstraction, this is a CRITICAL logical bug if we don't have default value for NOT NULL fields correctly.

Patch in #4 is still valid for D6 CVS HEAD.

hswong3i’s picture

Priority: Normal » Critical

@bjaspan: That's why I don't hope to reply the question of #3 directly but waiting for the appear of http://drupal.org/node/199161: it will affect other database implementation, especially for Oracle and MSSQL.

In case of Oracle, it face a complicated difficulty when working with current schema implementation. This is because Oracle will handle empty string as NULL *incorrectly*. This will cause a critical error if we use = '' and != '' as inline query (in case of Oracle, we will need to use IS NULL or IS NOT NULL, which is totally incorrect). To help this issue, some special handling are needed to be taken within Oracle driver implementation INTERNALLY:

  1. If fields come with empty string as default value, _db_create_field_sql() will catch and replace it as OCI8_EMPTY_STRING_PLACEHOLDER (in this case, chr(0) as dummy placeholder).
  2. Don't use inline empty string SQL compare, but escape it as '%s'. db_escape_string() will catch it and replace as OCI8_EMPTY_STRING_PLACEHOLDER, so developer won't need to aware about this different between = '' and IS NULL.
  3. Whenever fetching information by db_fetch_object(), db_fetch_array() and db_result(), replace OCI8_EMPTY_STRING_PLACEHOLDER as empty string. So Oracle driver will mask the special handling completely internal.

In case of MSSQL, according to the buggy .dll for connecting PHP with MSSQL, fetching an empty string from DB will ALWAYS result as a single space character: ' ', so the == '' check after fetching will always fail. Again, we can handle the mask-and-replace handling within MSSQL implementation, just as like as Oracle.

I guess we won't like to discuss this topic now, since both Oracle and MSSQL will not officially supported by D6. And that's why I start my personal research project, to test my idea on or before contribute back for D7, and so prove its correctness with solid production-level implementation.

Anyway, the general idea is: implement schema correctly without conflict nor database bias, and so each driver implementation can catch-and-replace as its suitable handling, internally.

agentrickard’s picture

Version: 6.x-dev » 7.x-dev
Priority: Critical » Normal

Reviewing issue and moving to D7, status normal, per #13.

hswong3i’s picture

Version: 7.x-dev » 6.x-dev
Priority: Normal » Critical

@agentrickard: may I have some feedback for #15 and #16, before moving this issue to D7? Status critical, per #14.

catch’s picture

Version: 6.x-dev » 7.x-dev

hswong3i: neither #15 nor #16 show that anything is critically broken in D6.

So this is either not critical, or not for 6.x - I'm moving it back to 7.x since personally I think it's the latter.

webchick’s picture

Yes, please respect the core maintainers' wishes on these issues. I realize that you're eager to make our db schema support as good as it can be, but if Goba says no, you need to respect that.

hswong3i’s picture

I would like to give my final comment for the discussions of this issue in D6:

  1. The case of PostgreSQL is similar as Oracle; according to my research, it is also MSSQL related. The auto-translation of NULL -> empty string (or better say, convert as a valid INSERT for TEXT) is a MySQL-specific issue. My patch in #4 shouldn't judge as a trade-off of the lowest common denominator; moreover, what we are now having IS a MySQL-bias handling, which conflict with the design of schema API.
  2. drupal_write_record() will sometime seems buggy, when default value is not well defined, e.g. the case as http://drupal.org/node/199161. This helper function couldn't fully utilize its functionality, and so decrease its usability.
  3. The complicated NOT NULL + w/o default value is too fancy, and require for a very clear mind in developing and application level debug. If the idea is simplify as "NOT NULL == with default value", the schema design will become more consistence and simple. As we can foresee about the widely use of drupal_write_record() among D6/D7, this database specific handling can be completely masked.
  4. There is 36 lines of change in patch of #4, which also means that we have 36 fields need to be careful during INSERT/UPDATE EVEN we are now using drupal_write_record(), and need to preform AT LEAST 36 times of complicated application level debug handling. If schema is well defined and migrate the handling from manual INSERT/UPDATE to drupal_write_record(), case as http://drupal.org/node/199161 will not even happened.

I figure out this logical hidden bug with Oracle driver development, and try to report before it is become really harmful (Yes, it is even earlier than the announcement of http://drupal.org/node/199161. And finally, it is proved as harmful for PostgreSQL, too). According to complicated case as listed above, we can simply foreseeing that similar case will keep on reproduce among core and contribute modules within D6/D7, unless we accept for this "lowest common denominator" trade-off, whenever we are working with PostgreSQL or exploring for other database supporting possibility. This is not a threaten: it is a real case that really happening.

I have complete my duty and responsibility of reporting this bug, and propose a suitable handling which is proved as function among case of other database, too. I will come back for this in D7; BTW, I have no duty about similar issues which are causing by this miss-handling among D6.

bjaspan’s picture

hswong3i: I appreciate that you are working hard to try to improve Drupal's database support. I also realize that the language barrier is making precise communication difficult. I'd like to ask a few questions to clarify what you are saying.

Consider this table declaration:

CREATE TABLE t1 (
  row INTEGER NOT NULL,
  i2 INTEGER NOT NULL,
  i3 INTEGER,
)

Do you think it is semantically wrong in any way? Do you think a Drupal module using Schema API should be allowed to declare this table?

Now, assuming the table declaration above, consider these SQL statements:

INSERT INTO t1 (row, i2) VALUES (1, 100); -- query 1
INSERT INTO t1 (row, i3) VALUES (2, 200); -- query 2
INSERT INTO t1 (row, i2, i3) VALUES (3, 300, 400); -- query 3

If a Drupal module created the table as above and executed these queries, what do you think should happen?

SELECT row, i2, i3 FROM t1 WHERE i3 != 0; -- query 4

What do you think this query should return?

chx’s picture

Title: Siren #10: NOT NULL fields need default value » NOT NULL fields need default value

As Siren is not a project on Drupal.org but an unofficial fork, I am removing it from issue titles to avoid confusion.

catch’s picture

Status: Needs review » Closed (duplicate)