Hello,
I'm considering using Drupal for a new project and up until now have been most impressed.
I have just been looking at the database structure for a few minutes and am very disappointed to notice the following problems:
No unique naming convention for primary keys (3 letters for most)
multi-column primary keys (not good practice)
non-integer PKs (again, not a good practice for indexing performance and size)
PKs that use indexes such as a file path name(see above!)
no primary keys on some tables (e.g. filters and blocks)
no naming convention for tables (some have singular names and others plural)
redundant and duplicated indexes( e.g. view_view.name_2)
Although some of these are simply a bit sloppy, others will have a fundamental affect on performance and really should be addressed.
Are there any plans to put these things right? I am happy to help.
Cheers!
Comments
Boolean
The following query is a good start towards optimizing the database but do run it in a test system first of course! It's not complete but you should see what I am doing:
1. Remove redundant indexes
2. Make ID and boolean fields unsigned
3. Make boolean field INT(1)
4. Change VARCHAR() into CHAR() (where possible) because fixed length rows improve disk retrieval speed. It would probably be possible to reduce the lengths of some 255 character columns at the same time but I DON'T recommend reducing column lengths without care.
alter table `drupal`.`node` drop key `status`;
alter table `drupal`.`node` drop key `nid`;
alter table `drupal`.`term_hierarchy` drop key `tid`;
alter table `drupal`.`term_node` drop key `tid`;
alter table `drupal`.`view_view` drop key `name_2`;
alter table `drupal`.`contact` change `selected` `selected` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL;
alter table `drupal`.`comments` change `status` `status` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL;
alter table `drupal`.`blocks` change `status` `status` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL , change `visibility` `visibility` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL;
alter table `drupal`.`access` change `status` `status` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL;
alter table `drupal`.`content_type_supplier` change `field_address_1_value` `field_address_1_value` char (50) DEFAULT '' NOT NULL , change `field_address_2_value` `field_address_2_value` char (50) DEFAULT '' NOT NULL;
alter table `drupal`.`custom_pager` change `reverse_list` `reverse_list` tinyint (1)UNSIGNED DEFAULT '1' NOT NULL , change `cache_list` `cache_list` tinyint (1)UNSIGNED DEFAULT '1' NOT NULL;
alter table `drupal`.`file_revisions` change `description` `description` char (255) DEFAULT '' NOT NULL , change `list` `list` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL;
alter table `drupal`.`filters` change `format` `format` int (11)UNSIGNED DEFAULT '0' NOT NULL , change `module` `module` char (64) DEFAULT '' NOT NULL , change `delta` `delta` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL , change `weight` `weight` tinyint (1)UNSIGNED DEFAULT '0' NOT NULL;
Work in progress
Thanks for your interest!
It's extremely unlikely that any changes will happen for Drupal 6 at this point, as it's in feature freeze. However, please keep these in mind for when Drupal 7 opens up, as we are planning some database changes in Drupal 7.
There are two main challenges for the database schema are that we have to support multiple databases, and we have to support the way that Drupal actually uses the data in the database. That means the database may not be as efficient as it could theoretically be, but it does improve the overall performance and functionality of Drupal itself.
As to your specific recommendations:
No unique naming convention for primary keys (3 letters for most)
That is mostly for legacy reasons, and to make typing them shorter. It has become more of an issue more recently as Drupal has grown, and some folks have suggested using longer field names. This will likely be an open discussion for Drupal 7. Your input at that point would certainly be welcome.
multi-column primary keys (not good practice)
non-integer PKs (again, not a good practice for indexing performance and size)
Not true. There's nothing inherently wrong with a compound primary key, if that is legitimately the unique identifier for a record. Surrogate keys make sense in a lot of places, and I dare say most places, but not everywhere. While integers are faster to join on than strings, frequently a string is the most logical primary key. It would be a potential performance hit for extra joins as well as added conceptual overhead to force every table to only have a surrogate key. For example, blocks are keyed by the module that provides them, the internal delta of the block within the module that provided it, and the theme the block is active on. That triplet uniquely identifies a block record. Adding an additional block_entry_id column to the table would actually make the code less clean, and harder to query against. The majority of the time the blocks table is queried, it's queried by that triplet or a subset of it.
PKs that use indexes such as a file path name(see above!)
As above, not every table needs a surrogate key if the data itself provides a natural key already.
no primary keys on some tables (e.g. filters and blocks)
I believe this has already been fixed in Drupal 6 as part of the Schema API cleanup.
no naming convention for tables (some have singular names and others plural)
No dispute there. We've been saying for a while that someone needs to go through and standardize core one way or another, but it's been too low on the todo list for anyone to step up and do the leg work for it. If you'd like to help out there, that would be great!
redundant and duplicated indexes( e.g. view_view.name_2)
If you see any indexes you feel are complete duplicates in Drupal 6, please file a bug report along with a patch to the schema hook for the relevant module to remove them. Those should be removed if they serve no purpose.
Welcome to Drupal!
--
Larry Garfield
http://www.garfieldtech.com/blog
http://www.palantir.net
--
Larry Garfield
http://www.garfieldtech.com/
Thinking Functionally in PHP: https://leanpub.com/thinking-functionally-in-php