Problem
The Schema module reports a schema mismatch for the Weblinks module's url index:
- declared: array('url (2048)')
- actual: array(array('url', 333))
PhpMyAdmin confirms that the table weblinks has an index url on field url with length 333. The MySQL version is 5.0.67-0ubuntu6.
In weblinks.install, this code adds the url index, specifying a length of 2048 characters, not 333 characters:
db_add_index($ret, 'weblinks', 'url', array('url (2048)'));
The problem is caused by MySQL Bug #4541 http://bugs.mysql.com/bug.php?id=4541 from 2004. MySQL's maximum key length is 1000 bytes or 333 UTF-8 characters.
Impacts
The url index is used in weblinks.module:
- Line 175: Check for duplicate URL in weblinks_validate().
- If two different long URLs have the same first 1000 characters, they would incorrectly be reported as duplicates.
- Line 551: Update links status in weblinks_cron().
- No impact because line 175 prevents two urls from matching their first 1000 characters.
Possible Workarounds
Duplicate checking: Hash urls with md5(). Store and index the hashed url values. Change the WHERE clause to:
$url_exists_nid = db_result(db_query("SELECT nid FROM {weblinks} l
WHERE l.url_hash='%s' AND l.url='%s'",
$url_hash, $url));
Link status update: Use nid and vid as keys for updating instead of the url:
db_query("UPDATE {weblinks} l SET l.last_status='%s', l.last_checked=%d, l.url='%s'
WHERE l.nid='%d' AND l.vid='%d'", $qargs);
| Comment | File | Size | Author |
|---|---|---|---|
| #3 | weblinks45.patch | 6.6 KB | nancydru |
Comments
Comment #1
nancydruYes, I noticed that key length change, but all was working correctly at the time. I had thought of changing the collation type to a single byte code (like the ubiquitous Swedish), but I'm not sure what this might do to sites with multi-byte characters. I don't think URLs are allowed to be in a multi-byte character set - that is if I read the RFC correctly.
Another option is to drop that index. This would have a performance impact on the duplicate checking and the Pralexa add-on.
Comment #2
nancydruI need some suggestions and guidance here.
Comment #3
nancydruTry the attached patch to the latest 6.x-2.x-dev . You will have to run update.php.
Comment #4
jshprentz commentedThank you for providing weblinks45.patch.
I downloaded, reviewed, and installed the patch. The new urlhash field and index work as expected.
I recommend two changes:
Speedup to weblinks.install: In
weblinks_update_6110(), replace the loop that computes new md5 checksumswith an update that uses MySQL's MD5 function
Correction to weblinks.module: In
weblinks_cron(), when the link checker receives a 301 status, it saves the new url, but does not compute a new urlhash. Near the end of the function, set$link->urlhash:Comment #5
nancydruThanks for checking it out. I think I discovered the missing hash after I posted this, but I will double check before I commit it.
I considered using the MySql hash, but could not determine if that is standard SQL - so I don't know if it would work with Postgres. If someone knows for sure, then your method is better.
Comment #6
jshprentz commentedMD5() is not standard SQL, but MySQL and PostgreSQL implement it.
PostgreSQL lists the md5() function here: http://www.postgresql.org/docs/current/static/functions-string.html
Oracle 10g and Sybase provide md5 capabilities with different interfaces. SQL Server and DB2 do not provide built-in md5 capability.
Comment #7
nancydruOkay, I'll change it.
Comment #8
jshprentz commentedThe Schema module spotted another minor discrepancy in your patch:
The urlhash field is defined with
'not null' => TRUE:In function weblinks_update_6110(), the field is added with
'not null' => FALSE:You could make the definition and db_add_field values match, or you could alter the field to
'not null' => TRUEafter assigning urlhash values to all records.Comment #9
nancydrucommitted to 6.x-2.x
Comment #10
nancydruClearing the issue list.