There is this query:

$duplicate = db_result(db_query("SELECT COUNT(cid) FROM {comments} WHERE pid = %d AND nid = %d AND comment_md5 = '%s'", $edit['pid'], $edit['nid'], $edit['subject'],  $edit['comment'])), 0);

To check for duplicates. you are searching non-indexed fields for one, and then checking for the duplication of information that has a high entropy (meaning pretty random date.

This patch creates a CHAR(32) field. When a comment is added, a md5 hash is made from the subject and comment:

$duplicate = db_result(db_query("SELECT COUNT(cid) FROM {comments} WHERE pid = %d AND nid = %d AND comment_md5 = '%s'", $edit['pid'], $edit['nid'], md5(trim($edit['subject']) . trim($edit['comment']))), 0);

The code also re-hashes when a comment is updated. This should greatly improve comments and clean up some sql.

CommentFileSizeAuthor
comment_md5_speedup.patch4.78 KBSouvent22

Comments

m3avrck’s picture

Status: Needs review » Needs work

Great indeed! But it can be faster:

  $duplicate = db_result(db_query("SELECT COUNT(cid) FROM {comments} WHERE pid = %d AND nid = %d AND comment_md5 = '%s'", $edit['pid'], $edit['nid'], md5(trim($edit['subject']) . trim($edit['comment']))), 0);

What about doing...

$duplicate = db_result(db_query(SELECT * FROM comments WHERE comment_md5 = md5(subject . comment))));

if ($duplicate) {
 // do something
}

That gets rid of the slower count, since it there is one result, then it's a duplicate. No need for the count.

m3avrck’s picture

Status: Needs work » Closed (duplicate)

Here's a patch to remove this query since it offers no real value and just slows things down: http://drupal.org/node/149339