Problem

The Schema module reports a schema mismatch for the Weblinks module's url index:

  • declared: array('url (2048)')
  • actual: array(array('url', 333))

PhpMyAdmin confirms that the table weblinks has an index url on field url with length 333. The MySQL version is 5.0.67-0ubuntu6.

In weblinks.install, this code adds the url index, specifying a length of 2048 characters, not 333 characters:

db_add_index($ret, 'weblinks', 'url', array('url (2048)'));

The problem is caused by MySQL Bug #4541 http://bugs.mysql.com/bug.php?id=4541 from 2004. MySQL's maximum key length is 1000 bytes or 333 UTF-8 characters.

Impacts

The url index is used in weblinks.module:

  • Line 175: Check for duplicate URL in weblinks_validate().
    • If two different long URLs have the same first 1000 characters, they would incorrectly be reported as duplicates.
  • Line 551: Update links status in weblinks_cron().
    • No impact because line 175 prevents two urls from matching their first 1000 characters.

Possible Workarounds

Duplicate checking: Hash urls with md5(). Store and index the hashed url values. Change the WHERE clause to:

$url_exists_nid = db_result(db_query("SELECT nid FROM {weblinks} l
                                      WHERE l.url_hash='%s' AND l.url='%s'",
                                      $url_hash, $url));

Link status update: Use nid and vid as keys for updating instead of the url:

db_query("UPDATE {weblinks} l SET l.last_status='%s', l.last_checked=%d, l.url='%s'
          WHERE l.nid='%d' AND l.vid='%d'", $qargs);
CommentFileSizeAuthor
#3 weblinks45.patch6.6 KBnancydru

Comments

nancydru’s picture

Yes, I noticed that key length change, but all was working correctly at the time. I had thought of changing the collation type to a single byte code (like the ubiquitous Swedish), but I'm not sure what this might do to sites with multi-byte characters. I don't think URLs are allowed to be in a multi-byte character set - that is if I read the RFC correctly.

Another option is to drop that index. This would have a performance impact on the duplicate checking and the Pralexa add-on.

nancydru’s picture

Status: Active » Postponed (maintainer needs more info)

I need some suggestions and guidance here.

nancydru’s picture

Status: Postponed (maintainer needs more info) » Needs review
StatusFileSize
new6.6 KB

Try the attached patch to the latest 6.x-2.x-dev . You will have to run update.php.

jshprentz’s picture

Thank you for providing weblinks45.patch.

I downloaded, reviewed, and installed the patch. The new urlhash field and index work as expected.

I recommend two changes:

Speedup to weblinks.install: In weblinks_update_6110(), replace the loop that computes new md5 checksums

  $result = db_query('SELECT * FROM {weblinks}');
  while ($link = db_fetch_object($result)) {
    $link->urlhash = md5($link->url);
    drupal_write_record('weblinks', $link, array('nid', 'vid'));
  }

with an update that uses MySQL's MD5 function

  update_sql("UPDATE {weblinks} SET urlhash=MD5(url)");

Correction to weblinks.module: In weblinks_cron(), when the link checker receives a 301 status, it saves the new url, but does not compute a new urlhash. Near the end of the function, set $link->urlhash:

//    $qargs = array($status, $now, $newurl, $url);
    $link->status = $status;
    $link->last_checked = $now;
    $link->url = $newurl;
    $link->urlhash = md5($newurl);
//    db_query("UPDATE {weblinks} l SET l.last_status='%s', l.last_checked=%d, l.url='%s' WHERE l.url='%s'", $qargs);
    drupal_write_record('weblinks', $link, array('nid', 'vid'));
nancydru’s picture

Thanks for checking it out. I think I discovered the missing hash after I posted this, but I will double check before I commit it.

I considered using the MySql hash, but could not determine if that is standard SQL - so I don't know if it would work with Postgres. If someone knows for sure, then your method is better.

jshprentz’s picture

MD5() is not standard SQL, but MySQL and PostgreSQL implement it.

PostgreSQL lists the md5() function here: http://www.postgresql.org/docs/current/static/functions-string.html

Oracle 10g and Sybase provide md5 capabilities with different interfaces. SQL Server and DB2 do not provide built-in md5 capability.

nancydru’s picture

Okay, I'll change it.

jshprentz’s picture

The Schema module spotted another minor discrepancy in your patch:

The urlhash field is defined with 'not null' => TRUE:

    'urlhash' => array(
      'description' => 'The hashed value of the URL',
      'type' => 'char',
      'length' => 32,
      'not null' => TRUE,
      'default' => '',
      ),

In function weblinks_update_6110(), the field is added with 'not null' => FALSE:

  db_add_field($ret, $table, 'urlhash', array('type' => 'char', 'length' => 32, 'not null' => FALSE, 'default' => ''));

You could make the definition and db_add_field values match, or you could alter the field to 'not null' => TRUE after assigning urlhash values to all records.

nancydru’s picture

Status: Needs review » Fixed

committed to 6.x-2.x

nancydru’s picture

Status: Fixed » Closed (fixed)

Clearing the issue list.