
improve regex for reCaptcha mailhide to reliably catch emails

Project: reCAPTCHA
Version: 6.x-1.7
Component: reCAPTCHA Mailhide
Category: bug report
Priority: normal
Assigned: Unassigned
Status: needs review

Issue Summary

Mailhide finds email addresses that appear in plain text without surrounding markup, but it does not find addresses inside HTML tables. Changing the order in which the input filters are applied makes no difference.

Comments

#1

The address in the following line was not replaced by a mailhide link:

<td>mail@domain.tld</td>

A simple workaround is to add spaces before and after the email address:

<td> mail@domain.tld </td>

A better solution is to replace the following line in modules/recaptcha/recaptcha_mailhide.module:

$text = preg_replace_callback("!(<p>|<li>|<br\s*/?>|[ \n\r\t\(])([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?]?)(?=(</p>|</li>|<br\s*/?>|[$

with the following line, which adds the <td> and </td> tags to the regex:

$text = preg_replace_callback("!(<p>|<li>|<br\s*/?>|<td>|[ \n\r\t\(])([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?]?)(?=(</p>|</li>|<br\s*/?>|</td>|[$
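To see why the <td> case fails and the spaced-out workaround succeeds, here is a minimal sketch that runs simplified versions of both patterns against the test lines from this comment. The quoted lines are cut off after the final "[", so the closing character class used below is only an assumption for the demo, and mailhide_demo_matches() is a hypothetical helper, not a function from the module.

```php
<?php
// Shared email sub-pattern, copied from the regex quoted above.
$email = '([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})';

// ASSUMPTION: the quoted lines are truncated after "[", so the trailing
// character class "[ \n\r\t\)]" is a guess made only for this demo.
$original = '!(<p>|<li>|<br\s*/?>|[ \n\r\t\(])' . $email
  . '([.,?]?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))!';
$with_td = '!(<p>|<li>|<br\s*/?>|<td>|[ \n\r\t\(])' . $email
  . '([.,?]?)(?=(</p>|</li>|<br\s*/?>|</td>|[ \n\r\t\)]))!';

// Hypothetical helper: TRUE when the pattern finds an address in $html.
function mailhide_demo_matches($pattern, $html) {
  return preg_match($pattern, $html) === 1;
}

// No <td> alternative, no surrounding whitespace: the address is missed.
var_dump(mailhide_demo_matches($original, '<td>mail@domain.tld</td>'));   // bool(false)
// The spaces satisfy the whitespace alternatives on both sides.
var_dump(mailhide_demo_matches($original, '<td> mail@domain.tld </td>')); // bool(true)
// With <td> and </td> added, the unpadded cell matches as well.
var_dump(mailhide_demo_matches($with_td, '<td>mail@domain.tld</td>'));    // bool(true)
```

The same gap would show up for any other tag that is not in the list (for example <th> or <span>), which is the limitation of listing allowed neighbours one by one.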

#2

Neither solution worked for me. See here:
http://txgifted.org/~txgift5/texas-parent-affiliate-groups-0

#3

I changed the regex to:

!\b[A-Z0-9+_.-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b!i

Note that you need to apply the patch below rather than just swapping the pattern, since some minor changes are needed elsewhere in the module.
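As a rough illustration of how the word-boundary pattern behaves, the sketch below runs it with preg_match_all() over a few inputs. It only exercises the regex from this comment; mailhide_demo_find_emails() is a hypothetical helper and does not reflect how the patch wires the pattern into the module.

```php
<?php
// Pattern quoted in this comment: word boundaries instead of a tag list,
// with the "i" modifier so the A-Z classes also match lowercase.
$pattern = '!\b[A-Z0-9+_.-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b!i';

// Hypothetical helper: return every address the pattern finds in $html.
function mailhide_demo_find_emails($pattern, $html) {
  preg_match_all($pattern, $html, $matches);
  return $matches[0];
}

// Addresses inside table cells are found without listing <td> explicitly.
$row = '<tr><td>mail@domain.tld</td><td>Jane.Doe+x@example.co.uk</td></tr>';
print_r(mailhide_demo_find_emails($pattern, $row));

// A sentence-ending period is left out of the match.
print_r(mailhide_demo_find_emails($pattern, 'Contact: mail@domain.tld.'));
```

One thing to keep in mind: because the pattern no longer looks at the surrounding markup, it would also match an address inside an attribute such as href="mailto:mail@domain.tld".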

#4

Title: reCaptcha mailhide doesn't find emails inside of html tables » improve regex for reCaptcha mailhide to reliably catch emails
Version: 6.x-1.0 » 6.x-1.7
Status: active » needs review

Patch to improve regex:
Note: emails still need to be entered as plain text, but they should be captured wherever they appear, including in tables.

Attachment: improve-email-regex-266197-4.patch (1.52 KB)

#5

Re-rolled the patch using a --relative diff.

Attachment: improve-email-regex-266197-5.patch (1.38 KB)