Hi,

I'm no DB expert by any means, but I've noticed a few inconsistencies with the current schema (in MySQL):

  • From what I've read, the aim is to be as ANSI compliant as possible, implying that there should not be any use of AUTO_INCREMENT at all.
  • Fixing (removing) the AUTO_INCREMENTs breaks Drupal as a number of modules (watchdog, locale) seem to rely on it (rather than using db_next_id(), which is not good.
  • The sequences table indicates this quite plainly as well. My current table has:

    aggregator_category_cid
    comments_cid
    files_fid
    menu_mid
    node_nid
    node_revisions_vid
    term_data_tid
    users_uid
    vocabulary_vid

    IDs from watchdog, access_log, url_aliases etc. are all missing..

  • There is a lot of inconsistency with the use of UNSIGNED fields. For e.g. {user}_uid is set to be UNSIGNED. But when used as a FK elsewhere, it is not set to be UNSIGNED.. While, in real world situations we won't see an issue until we hit the record no. 2^31, it is nevertheless an error. And it seems to be happening with smaller types as well..
  • Naming conventions: There are occurences of duplicate field names that can be very confusing. For e.g. there is {vocabulary}_vid which indicates the vocabulary id, and then there's {forum}_vid which indicates a "version id"... There are multiple instances of such duplicates..

I'm sure a lot of people have wasted a lot of their time because of these issues. The sooner they are fixed, the better. I'm happy to start rolling out patches where I can, but I'd appreciate some feedback on this before I dig in.

Thanks
Karthik.

CommentFileSizeAuthor
#12 unsigned_update.sql2.56 KBZen
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Zen’s picture

The following queries remove all auto_increments and add in UNSIGNED attributes where necessary:

ALTER TABLE access CHANGE aid aid TINYINT(10) UNSIGNED NOT NULL;
ALTER TABLE accesslog CHANGE aid aid INT(10) UNSIGNED NOT NULL;
ALTER TABLE aggregator_category CHANGE cid cid INT(10) UNSIGNED NOT NULL;

ALTER TABLE aggregator_category_feed CHANGE fid fid INT(10) UNSIGNED NOT NULL, 
CHANGE cid cid INT(10) UNSIGNED NOT NULL;

ALTER TABLE aggregator_category_item CHANGE iid iid INT(10) UNSIGNED NOT NULL, 
CHANGE cid cid INT(10) UNSIGNED NOT NULL;

ALTER TABLE aggregator_feed CHANGE fid fid INT(10) UNSIGNED NOT NULL;

ALTER TABLE aggregator_item CHANGE iid iid INT(10) UNSIGNED NOT NULL, 
CHANGE fid fid INT(10) UNSIGNED NOT NULL;

ALTER TABLE authmap CHANGE aid aid INT(10) UNSIGNED NOT NULL, 
CHANGE uid uid INT(10) UNSIGNED NOT NULL;

ALTER TABLE boxes CHANGE bid bid TINYINT(4) UNSIGNED NOT NULL;

ALTER TABLE comments CHANGE cid cid INT(10) NOT NULL, 
CHANGE pid pid INT(10) UNSIGNED NOT NULL, 
CHANGE nid nid INT(10) UNSIGNED NOT NULL, 
CHANGE uid uid INT(10) UNSIGNED NOT NULL;

ALTER TABLE filter_formats CHANGE format format INT(4) NOT NULL;

ALTER TABLE history CHANGE uid uid INT(10) UNSIGNED NOT NULL, 
CHANGE nid nid INT(10) UNSIGNED NOT NULL;

ALTER TABLE locales_source CHANGE lid lid INT(11) UNSIGNED NOT NULL;
ALTER TABLE locales_target CHANGE lid lid INT(11) UNSIGNED NOT NULL;
ALTER TABLE moderation_filters CHANGE fid fid INT(10) UNSIGNED NOT NULL;
ALTER TABLE moderation_votes CHANGE mid mid INT(10) UNSIGNED NOT NULL;

ALTER TABLE node CHANGE nid nid INT(10) UNSIGNED NOT NULL, 
CHANGE uid uid INT(10) UNSIGNED NOT NULL;

ALTER TABLE node_comment_statistics CHANGE nid nid INT(10) UNSIGNED NOT NULL;
ALTER TABLE node_counter CHANGE nid nid INT(11) UNSIGNED NOT NULL;
ALTER TABLE node_revisions CHANGE uid uid INT(10) UNSIGNED NOT NULL;
ALTER TABLE poll_choices CHANGE chid chid INT(10) UNSIGNED NOT NULL;
ALTER TABLE profile_fields CHANGE fid fid INT(10) UNSIGNED NOT NULL;
ALTER TABLE role CHANGE rid rid INT(10) UNSIGNED NOT NULL;
ALTER TABLE term_data CHANGE tid tid INT(10) UNSIGNED NOT NULL;
ALTER TABLE url_alias CHANGE pid pid INT(10) UNSIGNED NOT NULL;
ALTER TABLE vocabulary CHANGE vid vid INT(10) UNSIGNED NOT NULL;

ALTER TABLE watchdog CHANGE wid wid INT(5) UNSIGNED NOT NULL, 
CHANGE uid uid INT(10) UNSIGNED NOT NULL;

Updating the DB leads to the first of possibly many AUTO_INCREMENT related errors:

Fatal error: Duplicate entry '0' for key 1
query: INSERT INTO watchdog (uid, type, message, severity, link, location, referer, hostname, timestamp) VALUES (0, 'php', 'Duplicate entry \'0\' for key 1\nquery: INSERT INTO locales_source (location, source) VALUES (\'/~karthik/projects/drupal_cvs/\', \'OPML feed\') in /data2/www/drupal_cvs/includes/database.mysql.inc on line 108.', 2, '', '/~karthik/projects/drupal_cvs/', '', '192.168.0.1', 1132722094) in /data2/www/drupal_cvs/includes/database.mysql.inc on line 108

Typical example from locale.inc of code that relies on the AUTO_INCREMENT attribute:

        db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", $comments, $english);
        $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english));
        $lid = $loc->lid;
        db_query("INSERT INTO {locales_target} (lid, locale, translation) VALUES (%d, '%s', '%s')", $lid, $lang, $translation);

Cheers
Karthik.

Zen’s picture

Just added two patches to the locale and watchdog modules [as above]..

Thanks
Karthik.

killes@www.drop.org’s picture

While we try to be ANSI compliant in the queries, the database definition files are permitted to use db specific features where possible. auto_increment is actually much faster than using db_next_id to get a new unique ID, so it has its uses especially where we do not need to know the ID in advance (watchdig, accesslog, etc).

Zen’s picture

I agree it saves a query, but I don't like these on/off levels of compliance.. IMO Drupal should either be compliant or not. A black and white rule like this will ensure that all the queries are more portable, and ease some of the work for other maintainers - pgSQL etc.

Most contribs abuse this quite regularly. Also, this locale module patch is an example in core where the ID is retrieved with a second query. Such inconsistencies will stop with a hard-and-fast rule.

I would really just like to see some consistency :)
Karthik.

killes@www.drop.org’s picture

I personally prefer pure, raw speed of page execution over anything else including consistency for consistency's sake. In any case the individual use cases of auto_increment need to be evaluated individually.

scroogie’s picture

These issues could be handled per database if drupal had a sensible database abstraction. The mySQL abstraction could use auto_increment, and the pgSQL could use sequences, like in PEAR::DB, etc.

killes@www.drop.org’s picture

scroogie: we do that since one or two years... even without "sensible database abstraction"

Is today global idiot day?

chx’s picture

Killes: apparently.

Zen’s picture

I don't think it's a very good sign when senior developers on an open source project start calling contributors and/or people who voice slightly negative opinions.. "Idiots"..

-K

scroogie’s picture

I dont know whats wrong with the term "sensible database abstraction" (im german, maybe i hit the wrong words), anyways, it was not meant to be provocative.
Back to the topic, i dont think drupal does it as i mentioned. I thought Drupal used sequences in any case (at least thats what db_next_id() in database.mysql.inc seems to do). Whatever, please dont take this as an attack, because it was not meant to be one.

killes@www.drop.org’s picture

" if drupal had a sensible database abstraction" ~ "wenn Drupal eine vernünftige DB-Abstraktion hätte." Was wir momentan haben ist durchaus vernünftig weil klein und nicht übermäßig kompliziert.

Zen’s picture

FileSize
2.56 KB

Since we are at loggerheads over the use of AUTO_INCREMENT, the attachment fixes all the UNSIGNED inconsistencies, while preserving the existing AUTO_INCREMENTs. I'll prepare a patch for updates.inc if needed.

And don't get me wrong, I have no problems with using AUTO_INCREMENTs. However, if it is fine to use them, then there is absolutely *no* apparent reason for using the sequences table. The whole point as far as I can tell of using the sequences table is to completely *avoid* the use of AUTO_INCREMENT. The current situation is neither here nor there.

If raw speed is the preference, then AUTO_INCREMENTs should be used everywhere. This saves you atleast three queries every time [The {sequences} LOCK, REPLACE and UNLOCK..].

This just doesn't make sense to me..
Regards,
Karthik.

Geary’s picture

id·i·ot (n) A person of profound mental retardation having a mental age below three years and generally being unable to learn connected speech or guard against common dangers. The term belongs to a classification system no longer in use and is now considered offensive.

Surely nobody meant to say that, did they? Good, I didn't think so.

+1 on AUTO_INCREMENT (pun intended) if it would eliminate the LOCK TABLES. LOCK TABLES is a big hassle in many hosting situations. It's been argued that hosting companies should be more willing to allow it, but it remains a real nuisance.

Removing LOCK TABLES would greatly benefit Drupal's ease of installation. I have personally lost many hours to this one issue. Even if it were database specific it would be well worthwhile: the hosts where it's a problem are running MySQL.

killes@www.drop.org’s picture

I think the unsigned part shoudl be made into a patch.

Cvbge’s picture

+1 on AUTO_INCREMENT (pun intended) if it would eliminate the LOCK TABLES. LOCK TABLES is a big hassle in many hosting situations. It's been argued that hosting companies should be more willing to allow it, but it remains a real nuisance.

LOCK TABLES was introduced to guard against a situation, where two concurrent connections try to insert the the same values. It has nothing to do with AUTO_INCREMENT.

Looking from PostgreSQL point of view I think it'd be possibile to use AUTO_INCREMENT/SERIAL column types (would have to investigate to be sure)

Zen’s picture

Title: Database Consistency Issues - usage of AUTO_INCREMENT and UNSIGNED; Field naming conventions » Database Consistency Issues - usage of AUTO_INCREMENT; Field naming conventions

I've added the UNSIGNED patch as a separate issue. Modified topic accordingly.

-K

knite’s picture

I would love to see a patch that changed everything to use AUTO_INCREMENT. Like Geary, I've lost countless hours in installation headaches related to the LOCK TABLES privilege problem. The bonus in improved performance sounds great.

Karthik, would it be quick for you to whip up a patch that replaces the LOCKs with AUTO_INCREMENT everywhere? I'd volunteer to help test it.

moshe weitzman’s picture

Title: Database Consistency Issues - usage of AUTO_INCREMENT; Field naming conventions » Use AUTO_INCREMENT and similar instead of sequences

renaming acording to the current consensus and focus of this issue

kbahey’s picture

+1 on using AUTO_INCREMENT.

This, combined with LAST_INSERT_ID(), eliminates the need for the locking we do today.

However, see this discussion in the development mailing list about db_next_id() and auto increments.

Seems safe to do unless someone uses MySQL pconnect.

Other databases has similar mechanism, so this does not preclude them. In Fact, the PostgreSQL layer, uses "SELECT nextval() ...".

chx’s picture

Status: Active » Closed (duplicate)