SpamSpan should not touch URLs at all, I think.

[scheme]://[user]:[pass]@[host]/[path]?[query]#[fragment]
The "[scheme]://" part could be optional.

Try this as an example:

<a href="user:password@www.drupal.org/something?q=user@drupal.org#user@drupal.org">Link 1</a>
<a href="https://user:password@www.drupal.org/something?q=user@drupal.org#user@drupal.org">Link 2</a>

You could use the PHP function parse_url() to check that:
http://php.net/manual/en/function.parse-url.php

Perhaps with an additional check for whether an invalid URL becomes valid once a scheme is prepended.
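
As a rough illustration of that check, here is a minimal sketch in Python, with urllib.parse.urlsplit standing in for PHP's parse_url (the two parsers may differ in detail). The helper name looks_like_url and the rule applied to scheme-less tokens are assumptions made for the example, not anything SpamSpan implements.

```python
from urllib.parse import urlsplit


def looks_like_url(token: str) -> bool:
    """Heuristic: does this whitespace-free token parse as a URL with a host?"""
    has_scheme = "://" in token
    # If no scheme is present, retry with one prepended, as suggested above.
    candidate = token if has_scheme else "http://" + token
    try:
        parts = urlsplit(candidate)
        host = parts.hostname
    except ValueError:
        return False
    if not host:
        return False
    if has_scheme:
        return True
    # A bare address such as user@drupal.org also parses as userinfo@host
    # once a scheme is prepended, so require at least one more URL component.
    return bool(parts.password or parts.path or parts.query or parts.fragment)
```

Under this rule both example hrefs above count as URLs, while a bare user@drupal.org does not. The extra-component requirement is a judgment call, and other cut-offs are possible.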

Comments

gaele

Version: 6.x-1.5-beta2 » 6.x-1.6
Priority: Minor » Normal

This sucks.
Flickr uses "@" a lot in its URLs, e.g.
http://farm6.static.flickr.com/5060/buddyicons/1606911@N23.jpg

peterx

Status: Active » Postponed (maintainer needs more info)

There is no maintainer for the D6 version. The D7 version selects addresses via a regular expression, and regular expressions usually break when you change them, introducing more errors than you fix.

A change like this should be made in the D7 version and then backported. D7 has a test system. This change needs someone who is an expert on both regular expressions and the D7 test system, with the time to experiment and to create test cases in the Drupal test system.

peterx

Status: Postponed (maintainer needs more info) » Closed (won't fix)

The following line is the example presented as a failure.
user:password@www.drupal.org/something?q=user@drupal.org#user@drupal.org
I thought about changing the regular expression to exclude email addresses preceded by a colon, but then I found a site with the following text:
Email example:fred@example.com
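
To make the trade-off concrete, here is a small Python sketch of the colon-exclusion idea. The pattern is a deliberately simplified, hypothetical address matcher, not the expression SpamSpan actually uses.

```python
import re

# Hypothetical simplified address pattern; NOT the expression SpamSpan uses.
LOCAL = r"[A-Za-z0-9._%+-]+"
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

# The lookbehind rejects a match that starts right after ":" or in the
# middle of a longer local part (so the regex cannot just skip one letter).
NO_COLON = re.compile(rf"(?<![:A-Za-z0-9._%+-]){LOCAL}@{DOMAIN}")


def find_addresses(text: str) -> list:
    """Return every substring the colon-excluding pattern treats as an address."""
    return NO_COLON.findall(text)


url = "user:password@www.drupal.org/something?q=user@drupal.org#user@drupal.org"
print(find_addresses(url))                                # password@... is skipped
print(find_addresses("Email example:fred@example.com"))   # the real address is lost
```

With this pattern the password part of the URL is no longer matched, but the query and fragment parts still are, and the genuine address after "example:" is missed entirely.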

A regular expression will not fix the problem. The other addresses need a span or div around them to protect them, and SpamSpan would need appropriate code to identify the protected addresses. SpamSpan could have an option to process only addresses identified by a span or a div, but it would have to be off by default, and you would have to find someone to develop the change.
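
A minimal Python sketch of what such an opt-in mode might look like follows. The marker class protect-email, the function name, and the "[at]"/"[dot]" output are all placeholders invented for the example; SpamSpan has no such option today and its real output markup differs.

```python
import re

# Hypothetical marker class; SpamSpan defines no such option today.
MARKED = re.compile(
    r'<span class="protect-email">\s*([^<>\s@]+)@([^<>\s@]+)\s*</span>'
)


def obfuscate_marked(html: str) -> str:
    """Rewrite only addresses the author explicitly wrapped in the marker span."""
    def replace(match):
        user, domain = match.group(1), match.group(2)
        # Placeholder obfuscation, just to make the effect visible.
        return f"{user} [at] {domain.replace('.', ' [dot] ')}"
    return MARKED.sub(replace, html)
```

Because only marked spans are rewritten, an "@" inside an href or an image URL is never touched. The cost is that authors must mark every address by hand.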

You could also add specialised fields to the content type and insert them into text through tokens. There are a few modules for that type of change.

If there is an easy, reliable way to tell an email address apart from the examples you provide, talk with regex experts about submitting a change.