Closed (fixed)
Project:
External Links
Version:
5.x-1.6
Component:
Code
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
22 Jun 2008 at 13:24 UTC
Updated:
20 Dec 2009 at 01:10 UTC
The current regex for host name and subdomain determination:
- requires the primary domain name to be at least 4 characters long ({4,}), whereas domain names (e.g. under .com) can be as short as two characters.
- does not require a top-level domain ((\.[a-z]{1,4})*), which breaks compatibility with short primary domain names.
Attached patch fixes both regular expressions.
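To illustrate the first point, here is a minimal sketch. The two patterns below are hypothetical: they are not the module's actual expressions and only reuse the {4,} quantifier quoted above for the primary domain label. The names strictPrimary, relaxedPrimary and check are made up for this example.

```javascript
// Hypothetical pattern (NOT the module's real regex): optional subdomain
// labels, a primary label of at least 4 characters, then one TLD label.
const strictPrimary = /^(?:[a-z0-9-]+\.)*[a-z0-9-]{4,}\.[a-z]{2,}$/i;

// Same shape, but the primary label may be as short as one character.
const relaxedPrimary = /^(?:[a-z0-9-]+\.)*[a-z0-9-]+\.[a-z]{2,}$/i;

// Report whether a host name is accepted by each pattern.
function check(host) {
  return {
    strict: strictPrimary.test(host),
    relaxed: relaxedPrimary.test(host),
  };
}

console.log(check('example.com'));
console.log(check('hp.com'));
console.log(check('dvd.de'));
```

With the {4,} requirement, short names such as hp.com and dvd.de are rejected, while the relaxed variant accepts them.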
| Comment | File | Size | Author |
|---|---|---|---|
| | extlink.js_.patch | 806 bytes | sun |
Comments
Comment #1
quicksketch
Could you post an example of a domain that doesn't work with this check? I require the primary domain to be at least 4 characters long only because that makes it easy to keep it separate from the top-level domain (though even this breaks now that there are "museum" top-level domains, which also need to be added).
The second change also seems likely to misidentify subdomains: for a domain like "bbc.co.uk", "bbc" would end up as the subdomain and "co" as the primary domain. Any site in the UK (or in many other countries) would then consider all links within the country internal.
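The "bbc.co.uk" problem can be sketched as follows. This is only an illustration of a naive label split, not the module's code; naiveSplit and naiveSameSite are hypothetical helpers.

```javascript
// Naive split, assumed for illustration: the last label is the TLD, the one
// before it is the primary domain, and everything else is the subdomain.
function naiveSplit(host) {
  const labels = host.toLowerCase().split('.');
  const tld = labels.pop();
  const primary = labels.pop() || '';
  return { subdomain: labels.join('.'), primary: primary, tld: tld };
}

// Two hosts count as the same site if primary domain and TLD agree.
function naiveSameSite(a, b) {
  const x = naiveSplit(a);
  const y = naiveSplit(b);
  return x.primary === y.primary && x.tld === y.tld;
}

console.log(naiveSplit('bbc.co.uk'));
console.log(naiveSameSite('bbc.co.uk', 'amazon.co.uk'));
```

Because "co" is taken as the primary domain, two unrelated .co.uk hosts compare as the same site, so links between them would be treated as internal.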
Comment #2
sun
Damn. I didn't think of .co.uk... you're absolutely right! :)
Examples of domains that don't work: hp.com, dvd.de, etc.
Well... the list of top-level domain names and valid official domain names is pretty long, and changes quite often. I don't think we will be able to maintain all exceptions in this module.
Hence, how about removing this auto-detection part completely and replacing it with a textarea on the settings page (much like the block visibility path pattern matching)?
So users would be able to enter their domains manually.
That would also open the door to support not only subdomains, but also different domain names, which might be served from the same Drupal site.
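A rough sketch of how such a textarea could be evaluated, assuming one pattern per line with * as a wildcard (similar in spirit to path patterns). The syntax, the function names and the example.com / example.org hosts are all assumptions made for illustration; nothing here is taken from the module.

```javascript
// Escape regex metacharacters, then turn the escaped "*" back into ".*".
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\\\*/g, '.*') + '$', 'i');
}

// Parse textarea contents: one pattern per line, blank lines ignored.
function parseDomainList(text) {
  return text
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(Boolean)
    .map(patternToRegExp);
}

// A host is internal if any configured pattern matches it completely.
function isInternalHost(host, patterns) {
  return patterns.some(function (re) { return re.test(host); });
}

// Hypothetical textarea value covering two domains served by one site.
const patterns = parseDomainList('example.com\n*.example.com\nexample.org');
console.log(isInternalHost('shop.example.com', patterns));
console.log(isInternalHost('example.net', patterns));
```

An explicit list like this sidesteps the question of which part of a host name is the top-level domain, at the cost of requiring manual configuration.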
Comment #3
quicksketch
I'd prefer to keep the subdomains checkbox option; I think that's a very common scenario. The white/blacklist is a good idea, though. If we decide to go with that, we should continue in #227020: Exempt links to particular domains.
Comment #4
serkanuz commented
"(\\b(?https?|ftp)://" +
"(?[-A-Z0-9.]+)\\." +
"(?([-A-Z0-9]+)(.com|.net|.org|.biz|.us|.edu|.[a-z][a-z]\\.[a-z][a-z]))" +
"(?[-A-Z0-9+&@#/%=~_|!:,.;]*)?" +
"(?\\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?)\r\n"
Hey guys, this is my code. It can capture co.uk, com.tr and others, but it has a shortcoming if you want to parse http://serkanuz.com.
I hope it helps.
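For readers who want to try the idea in JavaScript, here is a loose adaptation of the expression above, not a faithful port. The group names (protocol, subdomain, domain, path) are assumptions, the dots in the suffix list are escaped, the path is required to start with a slash, and the query-string part is left out.

```javascript
// Loose adaptation; group names and simplifications are assumptions.
const urlPattern = new RegExp(
  '\\b(?<protocol>https?|ftp)://' +
  '(?<subdomain>[-a-z0-9.]+)\\.' +
  '(?<domain>[-a-z0-9]+\\.(?:com|net|org|biz|us|edu|[a-z]{2}\\.[a-z]{2}))' +
  '(?<path>/[-a-z0-9+&@#/%=~_|!:,.;]*)?',
  'i'
);

// Return the captured subdomain and domain, or null if nothing matches.
function parseUrl(url) {
  const m = urlPattern.exec(url);
  if (!m) {
    return null;
  }
  return { subdomain: m.groups.subdomain, domain: m.groups.domain };
}

console.log(parseUrl('http://news.bbc.co.uk/sport'));
console.log(parseUrl('http://serkanuz.com'));
```

Because the subdomain group is mandatory, a URL without a subdomain such as http://serkanuz.com does not match at all, which is the limitation mentioned above.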
Comment #5
quicksketch
#321690: Method to include certain internal links now allows a white/black list of URLs to include or exclude. If you come up with a better regex, you can just enter it in the whitelist. Of course, if you get a perfect regex, it'd be great to include it in the module directly. But currently we still don't have anything better.
Comment #6
quicksketch