Dear Friends,
We have found an issue in the database scheme for profiles which has caused us some difficulty and I hope we can get the change I describe applied to the core system.
In brief we should not allow multiple rows in profile_values which have the same (fid, uid), as this represents multiple values for a given profile field, which we believe to be wrong.
Background
We are using Drupal in an environment where we have approximately 2,500 users whose usernames, passwords, e-mail addresses etc are imported periodically from another system. As part of this we have some profile fields which describe the person (profile_fullname is one) and we have found a problem with the database definitions.
Details
Specifically, profile_values is defined in MySQL as:
CREATE TABLE profile_values (
fid int(10) unsigned default '0',
uid int(10) unsigned default '0',
value text,
KEY uid (uid),
KEY fid (fid)
)
DEFAULT CHARACTER SET utf8;
We believe that we should change the last KEY into
unique key fiduid (fid, uid)
We have done this post-hoc by adding an index
create unique index fiduid on profile_values(fid, uid);
Why is this a problem?
The reason we needed to make this change is because our user list is updated periodically by another script producing an SQL script which writes thousands of rows into profile_values table as follows:
insert ignore into profile_values
select fid, 1234, 'Winston Smith' from profile_fields
where (name = 'profile_fullname');
This script is designed to be stateless: we run it as many times as we like with the same end result. However unless you have the unique index as described above, you get lots of entries for the user's full name. Multiple rows with the same values don't appear to break anything within Drupal however, which reports only a single full name per person.
We noticed it because our 20-second update script started to take minutes -- it grows linearly on the length of the table -- and it made a cryptic mess of our cron jobs.
Discussion
- Normalisation rules require that every row in the table profile_values has a unique key.
- Our examples in /admin/settings/profile are "country, real name, age ...", all of which are singular.
- If we wanted to have multiple entries for a profile (such as might would make sense in the example of profile_car for people who have several cars), we should do this a different way, either using the 'freeform list' or another more-structured way.
Please can we add the unique indexes as described?
Many thanks for any thoughts on this.
Jonathan.
Version details
We are using Drupal 4.7.4, MySQL 4.1.3, Apache 2.0.54, FreeBSD 4.11.
The problem appears to be identical in all Drupal 4.7.4 database files and also Drupal 5.0-beta2. I have only tried the fix with our system as described (and imagine it needs to be slightly different syntax for Postgres) but examined the following source files:
Drupal 4.7.4: (database/)
$Id: database.4.1.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
$Id: database.4.0.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
$Id: database.pgsql,v 1.173.2.2 2006/05/23 18:53:11 killes Exp $
Drupal 5.0-beta2: (modules/profile/)
$Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $
| Comment | File | Size | Author |
|---|---|---|---|
| #8 | profile.install.patch_1.txt | 1.01 KB | jonathanl |
| #7 | profile.install.patch_0.txt | 1.01 KB | jonathanl |
| #6 | database.pgsql.patch_0.txt | 525 bytes | jonathanl |
| #5 | database.4.1.mysql.patch.txt | 421 bytes | jonathanl |
| #4 | database.4.0.mysql.patch.txt | 369 bytes | jonathanl |
Comments
Comment #1
jonathanl commentedDear Drupal Friends -- Perhaps if I change this to a support request someone will comment on it. Many thanks, Jonathan.
Comment #2
jonathanl commentedDear Friends --
I don't know what I have to do to get this bug noticed: I'll change it back to bug report to see
if that helps.
Anyways attached are some patches for review, for MySQL 4.1.
Though unfortunately I have no way to test the others I have also made the patches for
MySQL 4.0 and postgresql.
Best regards,
Jonathan.
Comment #3
gerhard killesreiter commentedPlease attach patches as plain textfiles. Moving to D5.
Comment #4
jonathanl commentedDear Drupal -- Attached are plain text patches, as requested; also a patch for Drupal 5. The only one
I have been able to test is the Drupal 4.7.4, MySQL 4.1 version but I believe the others are correct.
Many thanks, Jonathan.
These are the 4.7.4 files which I patched:
-- $Id: database.4.0.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
-- $Id: database.4.1.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
-- $Id: database.pgsql,v 1.173.2.2 2006/05/23 18:53:11 killes Exp $
This is the 5 file:
// $Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $
Comment #5
jonathanl commentedPatch for :
-- $Id: database.4.1.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
Comment #6
jonathanl commented(Untested) patch for:
-- $Id: database.pgsql,v 1.173.2.2 2006/05/23 18:53:11 killes Exp $
Comment #7
jonathanl commented(Untested) patch for:
// $Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $
Comment #8
jonathanl commentedSorry, repeat patch for
// $Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $
(previous one had spurious lowercase, fixed)
Comment #9
rkerr commentedI'm not sure allowing multiple values in profile_values is entirely a bad idea, regardless it is hell on performance, and the profile.module does not currently support it anyway. So I can't think of a reason not to have a key on both fields as well as separate indexes.
For your import script, I would suggest using REPLACE INTO to avoid duplicates instead of running into errors with duplicate key values with or without this patch.
Whether this patch is critical to 5 being released, I'm not sure.
Comment #10
jonathanl commentedRokerr: Thanks for your comments.
Regarding REPLACE as opposed to INSERT IGNORE: they are effectively the same in that they both rest on the same unique keys.
From MySQL http://dev.mysql.com/doc/refman/4.1/en/replace.html
And from http://dev.mysql.com/doc/refman/4.1/en/insert.html
For what it's worth, we actually do an INSERT IGNORE to make sure we have a fid-uid pair in the table, followed by an UPDATE to ensure the values are correct. And as you point out, there are
indeed ways of doing this with REPLACE or indeed INSERT ... ON DUPLICATE KEY UPDATE.
You say "I can't think of a reason not to have a key on both fields as well as separate indexes": if you have an index on (fid, uid) this also acts as an index on (fid) as this is a leftmost prefix (MySQL: http://dev.mysql.com/doc/refman/4.1/en/mysql-indexes.html, Postgresql: http://www.postgresql.org/docs/7.3/interactive/indexes-multicolumn.html). So we only need two indexes: uid and fiduid.
On the question of whether we need this for Drupal 5 ... Essentially this is a database desiqn question: are multiple values allowed? My opinion is that they should not be, for the reasons given. And if we were to decide multiples are allowed, then I'd be advocating appropriate fixes to supply unique keys per normalisation rules.
It's clear that there are areas where the database design isn't very tight; and I'd like to strongly advocate we fix these things. This is not theoretical for us: we are a large design firm committing many resources to getting web sites made for our clients, and this kind of thing is an active problem.
Best regards,
Jonathan.
Comment #11
rkerr commentedJonathan,
Dries just made a call for improved database design in the next release:
http://lists.drupal.org/archives/development/2007-01/msg00401.html
I'm sure any improvements you can suggest would we welcomed. :)
Comment #12
piersg commentedUPDATE works for me, and solves the problems described above with INSERT and REPLACE
http://dev.mysql.com/doc/refman/5.1/en/update.html
For example,
$sql = "UPDATE {profile_values} p SET p.value = \"$expiry_date_string\" WHERE p.uid = $this_drupal_user_id AND p.fid = 2";You should check the return value from the sql command to check how many rows were updated; if none then you should use INSERT as before.
Comment #13
dpearcefl commentedConsidering the time elapsed between now and the last comment plus the fact that D5 is no longer supported, I am closing this ticket.