Issue #140666 represents a regression in the database schema. We need to fix this pronto. Using the right keys in the database is one of the most important parts of database architecture. It affects performance and data integrity.

How to create the keys for a table:
1. Identify the candidate keys for a table. Usually, there's only one. If there's more than one, choose the one most likely to be used as a query criterion.
2. Identify fractional parts of the key that are likely query criteria. For example, if a table has the candidate key (nid, uid), and we're likely to query by uid, the uid portion is a useful fractional part of the candidate key. Order the primary key by placing the most useful subsets first. It makes the key many times more useful because a database can use key prefixes just as much as the whole key.
3. Set the primary key to be this candidate key in the most useful order.
4. Other candidate keys should have UNIQUE set on them. The ordering of the elements in these keys should be determined the same way as for the primary key.
5. Add other keys as standard INDEX keys so that they satisfy the WHERE criteria for queries as a prefix and have exactly the ORDER BY criteria left over.
6. Celebrate that your database is fast and prevents impossible data duplication.

How not to create a primary key:
1. Add an autoincrement column.
2. Make it the primary key.

The "how not to create a primary key" can be used as a last resort if no candidate key exists already and individual rows must be referenced.

Patch to follow.

CommentFileSizeAuthor
#2 schemas.patch7.93 KBdavid strauss

Comments

david strauss’s picture

One more exception that justifies using the autoincrement hack: if you have lots of foreign keys pointing to the table and don't want to have a multipart foreign key.

david strauss’s picture

StatusFileSize
new7.93 KB

This is draft 1. There remains lots to be done, including updating the modules to use the keys properly.

david strauss’s picture

Assigned: Unassigned » david strauss
Status: Active » Needs work
bjaspan’s picture

David, my goal in http://drupal.org/node/140666 was to make the minimum possible change in order to give each table a primary key and to fix the fact that node's primary key should be nid, not nid,vid. So, for example, turning nid,vid from primary key into a unique key constituted a smaller change than removing it completely. Sure, it is sub-optimal because vid is already unique, but that situation already existed before my patch. Adding an auto-increment primary key to the tables that did not have one is not as good as adding the right key either, but at least it makes functionality that requires a primary key possible and does not require re-examining all code that touches that table (i.e. "including updating the modules to use the keys properly."). I did not want the new schema data structure to be unable to live up to its promise until D7 came around just because some tables did not have a primary key at all.

That said, I certainly support a patch that sets all the primary keys correctly and will help review it when it is ready.

david strauss’s picture

@bjaspan The problem is that modules will start using the new autoincrement keys, making them difficult to remove. Several of the tables with the new column don't need a new column to get a primary key; they just need the primary key assigned.

moshe weitzman’s picture

@David - any updates on this?

david strauss’s picture

@moshe weitzman I'm still working on it. Despite the name, this is a pretty massive undertaking. There's very little core code I can ignore.

pwolanin’s picture

we need to combaine as much as possible with: http://drupal.org/node/164532

and get whatever we can in ASAP.

I just looked at your patch briefly - for {blocks} I think the primary key needs to be array('module', 'delta', 'theme')?

david strauss’s picture

Assigned: david strauss » Unassigned
Status: Needs work » Closed (fixed)

#164532 is a better issue to work from, now.

Reg’s picture

subscribe