Dear Friends,

We have found an issue in the database scheme for profiles which has caused us some difficulty and I hope we can get the change I describe applied to the core system.

In brief we should not allow multiple rows in profile_values which have the same (fid, uid), as this represents multiple values for a given profile field, which we believe to be wrong.

Background
We are using Drupal in an environment where we have approximately 2,500 users whose usernames, passwords, e-mail addresses etc are imported periodically from another system. As part of this we have some profile fields which describe the person (profile_fullname is one) and we have found a problem with the database definitions.

Details
Specifically, profile_values is defined in MySQL as:

CREATE TABLE profile_values (
  fid int(10) unsigned default '0',
  uid int(10) unsigned default '0',
  value text,
  KEY uid (uid),
  KEY fid (fid)
)
DEFAULT CHARACTER SET utf8;

We believe that we should change the last KEY into

  unique key fiduid (fid, uid)

We have done this post-hoc by adding an index

create unique index fiduid on profile_values(fid, uid);

Why is this a problem?
The reason we needed to make this change is because our user list is updated periodically by another script producing an SQL script which writes thousands of rows into profile_values table as follows:

insert ignore into profile_values
 select fid, 1234, 'Winston Smith' from profile_fields
 where (name = 'profile_fullname');

This script is designed to be stateless: we run it as many times as we like with the same end result. However unless you have the unique index as described above, you get lots of entries for the user's full name. Multiple rows with the same values don't appear to break anything within Drupal however, which reports only a single full name per person.

We noticed it because our 20-second update script started to take minutes -- it grows linearly on the length of the table -- and it made a cryptic mess of our cron jobs.

Discussion

  • Normalisation rules require that every row in the table profile_values has a unique key.
  • Our examples in /admin/settings/profile are "country, real name, age ...", all of which are singular.
  • If we wanted to have multiple entries for a profile (such as might would make sense in the example of profile_car for people who have several cars), we should do this a different way, either using the 'freeform list' or another more-structured way.

Please can we add the unique indexes as described?

Many thanks for any thoughts on this.
Jonathan.

Version details
We are using Drupal 4.7.4, MySQL 4.1.3, Apache 2.0.54, FreeBSD 4.11.

The problem appears to be identical in all Drupal 4.7.4 database files and also Drupal 5.0-beta2. I have only tried the fix with our system as described (and imagine it needs to be slightly different syntax for Postgres) but examined the following source files:

Drupal 4.7.4: (database/)
$Id: database.4.1.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
$Id: database.4.0.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
$Id: database.pgsql,v 1.173.2.2 2006/05/23 18:53:11 killes Exp $

Drupal 5.0-beta2: (modules/profile/)
$Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $

Comments

jonathanl’s picture

Category: bug » support

Dear Drupal Friends -- Perhaps if I change this to a support request someone will comment on it. Many thanks, Jonathan.

jonathanl’s picture

Category: support » bug
Status: Active » Needs review
StatusFileSize
new10 KB

Dear Friends --

I don't know what I have to do to get this bug noticed: I'll change it back to bug report to see
if that helps.

Anyways attached are some patches for review, for MySQL 4.1.
Though unfortunately I have no way to test the others I have also made the patches for
MySQL 4.0 and postgresql.

Best regards,
Jonathan.

gerhard killesreiter’s picture

Version: 4.7.4 » 5.x-dev
Status: Needs review » Needs work

Please attach patches as plain textfiles. Moving to D5.

jonathanl’s picture

StatusFileSize
new369 bytes

Dear Drupal -- Attached are plain text patches, as requested; also a patch for Drupal 5. The only one
I have been able to test is the Drupal 4.7.4, MySQL 4.1 version but I believe the others are correct.
Many thanks, Jonathan.

These are the 4.7.4 files which I patched:
-- $Id: database.4.0.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
-- $Id: database.4.1.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $
-- $Id: database.pgsql,v 1.173.2.2 2006/05/23 18:53:11 killes Exp $
This is the 5 file:
// $Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $

jonathanl’s picture

StatusFileSize
new421 bytes

Patch for :
-- $Id: database.4.1.mysql,v 1.1.2.1 2006/05/23 18:45:47 killes Exp $

jonathanl’s picture

StatusFileSize
new525 bytes

(Untested) patch for:
-- $Id: database.pgsql,v 1.173.2.2 2006/05/23 18:53:11 killes Exp $

jonathanl’s picture

StatusFileSize
new1.01 KB

(Untested) patch for:
// $Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $

jonathanl’s picture

StatusFileSize
new1.01 KB

Sorry, repeat patch for
// $Id: profile.install,v 1.8 2006/11/28 14:37:44 dries Exp $

(previous one had spurious lowercase, fixed)

rkerr’s picture

I'm not sure allowing multiple values in profile_values is entirely a bad idea, regardless it is hell on performance, and the profile.module does not currently support it anyway. So I can't think of a reason not to have a key on both fields as well as separate indexes.

For your import script, I would suggest using REPLACE INTO to avoid duplicates instead of running into errors with duplicate key values with or without this patch.

Whether this patch is critical to 5 being released, I'm not sure.

jonathanl’s picture

Rokerr: Thanks for your comments.

Regarding REPLACE as opposed to INSERT IGNORE: they are effectively the same in that they both rest on the same unique keys.

From MySQL http://dev.mysql.com/doc/refman/4.1/en/replace.html

works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted. See Section 13.2.4, “INSERT Syntax”.
[...]
Note that unless the table has a PRIMARY KEY or UNIQUE index, using a REPLACE statement makes no sense. It becomes equivalent to INSERT, because there is no index to be used to determine whether a new row duplicates another.

And from http://dev.mysql.com/doc/refman/4.1/en/insert.html

f you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.

For what it's worth, we actually do an INSERT IGNORE to make sure we have a fid-uid pair in the table, followed by an UPDATE to ensure the values are correct. And as you point out, there are
indeed ways of doing this with REPLACE or indeed INSERT ... ON DUPLICATE KEY UPDATE.

You say "I can't think of a reason not to have a key on both fields as well as separate indexes": if you have an index on (fid, uid) this also acts as an index on (fid) as this is a leftmost prefix (MySQL: http://dev.mysql.com/doc/refman/4.1/en/mysql-indexes.html, Postgresql: http://www.postgresql.org/docs/7.3/interactive/indexes-multicolumn.html). So we only need two indexes: uid and fiduid.

On the question of whether we need this for Drupal 5 ... Essentially this is a database desiqn question: are multiple values allowed? My opinion is that they should not be, for the reasons given. And if we were to decide multiples are allowed, then I'd be advocating appropriate fixes to supply unique keys per normalisation rules.

It's clear that there are areas where the database design isn't very tight; and I'd like to strongly advocate we fix these things. This is not theoretical for us: we are a large design firm committing many resources to getting web sites made for our clients, and this kind of thing is an active problem.

Best regards,
Jonathan.

rkerr’s picture

Jonathan,

Dries just made a call for improved database design in the next release:
http://lists.drupal.org/archives/development/2007-01/msg00401.html

I'm sure any improvements you can suggest would we welcomed. :)

piersg’s picture

UPDATE works for me, and solves the problems described above with INSERT and REPLACE

http://dev.mysql.com/doc/refman/5.1/en/update.html

For example,

$sql = "UPDATE {profile_values} p SET p.value = \"$expiry_date_string\" WHERE p.uid = $this_drupal_user_id AND p.fid = 2";

You should check the return value from the sql command to check how many rows were updated; if none then you should use INSERT as before.

dpearcefl’s picture

Status: Needs work » Closed (won't fix)

Considering the time elapsed between now and the last comment plus the fact that D5 is no longer supported, I am closing this ticket.