Hash Comments for performance [#134139]

There is this query:

$duplicate = db_result(db_query("SELECT COUNT(cid) FROM {comments} WHERE pid = %d AND nid = %d AND comment_md5 = '%s'", $edit['pid'], $edit['nid'], $edit['subject'],  $edit['comment'])), 0);

To check for duplicates. you are searching non-indexed fields for one, and then checking for the duplication of information that has a high entropy (meaning pretty random date.

This patch creates a CHAR(32) field. When a comment is added, a md5 hash is made from the subject and comment:

$duplicate = db_result(db_query("SELECT COUNT(cid) FROM {comments} WHERE pid = %d AND nid = %d AND comment_md5 = '%s'", $edit['pid'], $edit['nid'], md5(trim($edit['subject']) . trim($edit['comment']))), 0);

The code also re-hashes when a comment is updated. This should greatly improve comments and clean up some sql.

Comment	File	Size	Author
	comment_md5_speedup.patch	4.78 KB	Souvent22

Comments

Comment #1

m3avrck commented 6 April 2007 at 20:52

Status:

Needs review

» Needs work

Great indeed! But it can be faster:

  $duplicate = db_result(db_query("SELECT COUNT(cid) FROM {comments} WHERE pid = %d AND nid = %d AND comment_md5 = '%s'", $edit['pid'], $edit['nid'], md5(trim($edit['subject']) . trim($edit['comment']))), 0);

What about doing...

$duplicate = db_result(db_query(SELECT * FROM comments WHERE comment_md5 = md5(subject . comment))));

if ($duplicate) {
 // do something
}

That gets rid of the slower count, since it there is one result, then it's a duplicate. No need for the count.

Comment #2

m3avrck commented 4 June 2007 at 22:33

Status:

Needs work

» Closed (duplicate)

Here's a patch to remove this query since it offers no real value and just slows things down: http://drupal.org/node/149339

Hash Comments for performance

Comments

Comment #1

Comment #2

News items

Our community

Documentation

Drupal code base

Governance of community