I came to the conclusion that we need to sort out and discuss a better concept for URL field validation, and maybe for input/output form filtering in general, since all the different needs out there are causing tons of cluttering, unsolved issues (see below). Not only for the Link module. Basically it is all about the same need: filtering. It is often discussed separately, but even URL validation is nothing more than a set of options to restrict or allow more character combinations in a given URL, which is then compared against certain validation patterns. Which is, in the end, nothing other than filtering. Good. But ...

There are too many corner cases and feature requests to implement them ALL one after the other without losing focus on the link field itself. That's why I have started this issue. We would end up with a cluttered, daily expanding settings form for input/output filtering only, with options randomly conflicting with each other. And I doubt that this kind of issue will ever come to an end, given the winds of change on the web. One person may want a URL input filter restricted to Twitter, another wants to convert Facebook links only on output, another wants mailto links or anchors only. The next wants to convert all input to absolute http URLs, the next wants to restrict input to existing internal link URLs, etc. Try to imagine this as a checkbox form. Folks, this is a module of its own, maybe even an API, but not a group of features extending the Link field module.

One solution could be to add an optional "regex" kind of input field with only a few checkboxes to create negative and positive validation restrictions. This would make it future-proof, and if somebody is missing a validation pattern it could be added manually. The Custom filter module is already a good example of this, and we should put on our checklist whether it makes sense to support the Custom filter module in the Link field module.

That being said, I think the right (better) way is to find a maybe more complex but all-embracing configuration method, which lets the admin better decide how and when to validate or filter the input/output of the URL field.
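To make the "regex kind of field with negative and positive restrictions" idea a bit more concrete, here is a minimal sketch in plain PHP. It is not Link module code: the function name `example_url_passes_patterns()` and the sample patterns are hypothetical, and the patterns are deliberately simplistic. The point is only that two short pattern lists can express "only http and mailto, but not Facebook" without a dedicated checkbox per use case.

```php
<?php
/**
 * Hypothetical helper, not part of the Link module.
 *
 * A URL passes if it matches none of the "deny" patterns and, when any
 * "allow" patterns are configured, at least one of them.
 */
function example_url_passes_patterns($url, array $allow, array $deny) {
  // Negative restrictions win: one match is enough to reject.
  foreach ($deny as $pattern) {
    if (preg_match($pattern, $url) === 1) {
      return FALSE;
    }
  }
  // No positive restrictions configured means everything else is allowed.
  if (empty($allow)) {
    return TRUE;
  }
  foreach ($allow as $pattern) {
    if (preg_match($pattern, $url) === 1) {
      return TRUE;
    }
  }
  return FALSE;
}

// Example configuration an admin might type into the two fields.
$allow = array('#^https?://#i', '#^mailto:#i');
$deny = array('#^https?://(www\.)?facebook\.com/#i');

foreach (array('http://example.com/page', 'http://www.facebook.com/someone', 'ftp://example.com/file') as $candidate) {
  echo $candidate . ' => ' . (example_url_passes_patterns($candidate, $allow, $deny) ? 'accepted' : 'rejected') . "\n";
}
```

Evaluating the deny list first is a design choice made for this sketch; a real implementation would also have to validate the admin-supplied patterns themselves.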

Dear followers, please use the follow button up right instead of posting "subscribe" messages in the queue. Thanks

Issues I have closed in order to lead the discussion here. Feel free to read through them to gather information:

#722524: DO NOT accept local links
#863396: Link Field Treats Bad URLS as Internal URLS
#831980: disallow internal links / validation
#992408: Add Absolute Plain Text Formatter (D6 - patch provided in the issue queue)
#1092442: URL validation support for custom PHP validation code
#1115354: URL validation should match RFC 3986 (D7 - patch provided in the issue queue)
#725730: URL validation using tokens (D6 - patch provided in the issue queue)
#1422604: Validation and display of <front> links
#920312: Add Spanish and Catalan chars support in validation (D6 - patch provided in the issue queue)
#1047444: Token replacement supports node entity only (D7 - patch provided in the issue queue)
#739854: blacklist certain URLs
#369311: Use parse_url instead of regex
#1306352: Link adding equals signs to URL
#840902: URL not validated for multiple values
#575344: Spaces in URL cause link_validate_url() to fail
#412404: CCK link should take token as part of URL
#621164: ADDING "banned URLs" to LINK Field
#733640: Does not allow greek characters in URL
#1072304: only named anchor in link (D6 - patch provided in the issue queue)
#1042012: validate function does not allow unicode characters

And finally, part of me is also thinking about a new, or maybe an already existing, other module(?) which can provide all of this, not only for the link field but also for other fields, for example URL validation in body fields. Google shows me very clearly that there is a need for something like this: http://www.google.com/search?client=ubuntu&channel=fs&q=Drupal+URL+valid...

Comments

dqd’s picture

Title: Gathered: a new and more all-embracing purpose of URL validation » General discussion: future-proof choice of limiting validation pattern (link types)
Version: 7.x-1.x-dev » master

@TODO

Since some of the issues above, and some of the suggestions in the comments below, are duplicates of the same feature request, I have merged them into the list below:

Input / Output filtering and URL validation feature list

  1. Adding "banned URLs" to a blacklist of disallowed URLs
  2. Limit allowed input to mailto
  3. Limit allowed input to absolute/external (http://)
  4. Limit allowed input to internal
  5. Allow language-specific special chars and unicode as valid URL exceptions
    • (Feature seems to be half implemented in D7 ?)
  6. URL validation support for custom PHP validation code
  7. Validation and display of <front> links
  8. Support for named # anchor links
  9. URL validation using tokens

SeanA’s picture

Title: Optional input restrictions and future-proof list of choice for limited validation pattern (please join the discussion!) » Gathered: future-proof choice of limiting validation pattern (link types)

To start with, when we create a link widget, a simple choice between http and mailto would be nice. As it is now, I can label a field "email" but the user can actually enter any kind of link she wants. (By the way, "Limit field instance to certain link types" is still a pretty good description of this feature.)

If I understand your proposal, admins would be messing around with a complex regex in order to accomplish things like this? Sounds like it might generate a lot of support requests.

dqd’s picture

Issue summary: View changes

Updated issue summary.

dqd’s picture

Funny: this comment on another duplicate URL validation issue for the link field mentions quicksketch's extlink module, which uses a concept very close to what I meant by a "regex" kind of field at the beginning of this issue.

dqd’s picture

Title: Gathered: a new and more all-embracing purpose of URL validation » Gathered: future-proof choice of limiting validation pattern (link types)
Category: task » feature

SeanA: limiting the URL field input to mailto or http links only could indeed be one of many use cases for URL validation options. I've added your suggestion under #1 (TODO).

But as I said above, let's collect them all here to find a more general solution for ALL of them, which would cover your request as well. When I said "regex kind of", I meant that we should rather have one or two input fields (like tokens) and one or two checkboxes than thousands of fields and checkboxes, each with a single purpose, each catching a single scenario with yet another form addition. And as the extlink module and the Custom filter module show, the idea isn't that bad at all ;) Nothing to "mess around" with. The queue of feature requests for each new single URL validation option would be much longer (and already is). 50% of all issues in the Link module queue are about URL validation and input filtering.

Let's face it: there isn't, and probably never will be, a final, all-embracing input/output module for links only. That makes no sense. Something that works like a self-expanding Christmas wishlist for any URL validation or input/output filtering method? And with an "easy" form of checkboxes covering tons of single-use-case validation scenarios "with a mouse click", while new scenarios arrive daily and get implemented and carried along automatically, again and again? How? And all this only to avoid a "think twice about what you type" kind of input field, so that you as a "web admin" don't have to "mess around" with one smart solution that demands a little more attention? Since allowed URLs keep expanding and changing over time, I think what is needed is smart user input rather than a never up-to-date select field or checkbox group.

Modules like Better formats, Field formatters, Display suite (with its own input form for field container markup), Views, extlink, Custom filters, Wiki tags, and many more do not seem to be put off by the fact that you may have to "mess around" and think a little more before you type in order to benefit from their approach.

dqd’s picture

If anyone finds other URL validation or input/output filtering issues in the issue queue of the Link field module, please post them here and let people over there know that we are collecting them all here. I've already spent a whole night collecting the tons of them listed above. I could use some help with this.

dqd’s picture

Title: Gathered: future-proof choice of limiting validation pattern (link types) » Gathered: a new and more all-embracing purpose of URL validation
Version: master » 7.x-1.x-dev

Hm, I think this should rather be set up as a task, to be done ASAP, out of respect for all the issues that were closed in favor of this one.

Additionally, it has already crossed my mind to split the input filter issues and the URL validation options concept into two separate issues (or even [sub]modules?), integrating nicely with each other of course, as already mentioned at the top of this issue. This could make the two features available for many other scenarios as well, not only for the link field: link recognition in the text input formats, for example, or the core Menu module, or any social links modules, and so on.

Anonymous’s picture

Title: Gathered: future-proof choice of limiting validation pattern (link types) » Gathered: a new and more all-embracing purpose of URL validation
Category: feature » task

Subscribe. Interested to see if unicode support gets added.

dqd’s picture

@sypl:

http://drupal.org/node/1306444 => Stop subscribing, start following => 79 comments, 10 IRC mentions

;-)

( btw: it's on the list -> #5 )

SeanA’s picture

Title: General discussion: future-proof choice of limiting validation pattern (link types) » Gathered: a new and more all-embracing purpose of URL validation
Version: master » 7.x-1.x-dev

Maybe the 2 concepts, easy and advanced, can be combined. That is, have a few checkboxes (not hundreds or thousands) covering the main use cases. Along with each checkbox, a text field containing the regex for that checkbox, hidden in a fieldset (collapsed by default) which allows editing of that regex snippet for advanced users. So for example, the admin can just select "mailto" for a link field to create an email link, and it works with no further messing around. But also, if fancy custom validation is needed, the mailto regex can be edited.

Something like that should cover 90% of people's needs for link fields. For the other 10% and into the future, perhaps have the ability to add additional checkboxes/textfields where the admin can create custom regex snippets. Does that make sense?
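As a rough illustration of the "checkbox plus editable regex" combination described above, here is a small plain-PHP sketch. Everything in it is an assumption made for illustration: the function names, the `enabled`/`overrides` settings keys, and the preset patterns (which are intentionally loose and not the Link module's real patterns).

```php
<?php
/**
 * Hypothetical presets: one default pattern per "checkbox".
 */
function example_link_type_presets() {
  return array(
    'http' => '#^https?://\S+$#i',
    'mailto' => '#^mailto:[^@\s]+@[^@\s]+$#i',
  );
}

/**
 * Returns the first enabled link type whose pattern matches, or FALSE.
 *
 * $settings['enabled'] lists the ticked checkboxes; $settings['overrides']
 * optionally replaces a preset pattern (the "advanced" collapsed fieldset).
 */
function example_validate_link($url, array $settings) {
  $presets = example_link_type_presets();
  $overrides = isset($settings['overrides']) ? $settings['overrides'] : array();
  foreach ($settings['enabled'] as $type) {
    if (isset($overrides[$type])) {
      $pattern = $overrides[$type];
    }
    elseif (isset($presets[$type])) {
      $pattern = $presets[$type];
    }
    else {
      // Unknown type without a custom pattern: nothing to test against.
      continue;
    }
    if (preg_match($pattern, $url) === 1) {
      return $type;
    }
  }
  return FALSE;
}

// "Email" field: only the mailto checkbox is ticked.
$email_only = array('enabled' => array('mailto'));
// Advanced admin: mailto restricted to one domain via an edited pattern.
$company_mail = array(
  'enabled' => array('mailto'),
  'overrides' => array('mailto' => '#^mailto:[^@\s]+@example\.org$#i'),
);

var_dump(example_validate_link('mailto:someone@example.com', $email_only));
var_dump(example_validate_link('http://example.com', $email_only));
var_dump(example_validate_link('mailto:someone@example.com', $company_mail));
```

Because an override can be stored for a type that has no preset, the same structure would also cover the "additional custom checkboxes" mentioned for the remaining 10% of cases.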

It does sound like a good idea to have separate "URL Validation API" and "Link Field" modules. Some old discussion about field validation in general here: http://drupal.org/node/52051

dqd’s picture

Title: An all-embracing attempt for URL validation and Input / Output Form Filtering » Gathered: a new and more all-embracing purpose of URL validation

SeanA,
that's exactly what I have in mind. Look at what I said in the opening post:

One solution could be to add an optional "regex" kind of input field with only a few checkboxes to create negative and positive validation restrictions. This would make it future-proof, and if somebody is missing a validation pattern it could be added manually.

Good to know that our paths cross here. And thanks for the link. I didn't know that old discussion. But right at the beginning it shows exactly what you and I are saying: an extended form and validation API is missing, one that covers all the validation and input/output filtering wishes flying around here. And not only in the issue queue of the link field, by the way.

Let's face it, sooner or later we have to hitch up our knickers to ...

  1. convince some of the ancient Britons that "but we already have the Form API" or "we have had valid_url() and such for donkey's years" is not the answer here
  2. get some helpers, because I can't write all the code alone
  3. keep up good communication, to be future-proof and to avoid reinventing the wheel while something may already be going on for D8.
    • Actually I would like to say: "No, I don't want to move this topic to a D8 core issue/debate, because we need it now and not in a year and a half." Buuuut, maybe I am wrong on this. I am pretty sure there is a reason for the strategy often seen on d.o. of marking something for the next major release and then backporting it. If it comes to this, let's keep at it to get it done soon. The sooner the result, the sooner the backport.
  4. keep up the energy on this, and on the fact that we also need something for D7, even if something is already going on for D8. Motivation tends to drop in that case, because we know we are feeding a dead horse. Maybe it would then be best to help the D8 solution along and fight for a backport from there.
  5. keep the table clean for everyone who joins later, so that they can easily understand what this is all about.

Well ... From my experience with Drupal, points 4 and 5 will probably be the hardest to keep up, if it comes to this.

dqd’s picture

Title: Gathered: a new and more all-embracing purpose of URL validation » An all-embracing attempt for URL validation and Input / Output Form Filtering

I would also like to point to a module which seems to come from a somewhat similar direction: http://drupal.org/project/fapi_validation

I would also suggest regularly reading the core issue queue filtered by "validation": http://drupal.org/project/issues/drupal?text=validation&status=All

adam_b’s picture

Title: Gathered: a new and more all-embracing purpose of URL validation » An all-embracing attempt for URL validation and Input / Output Form Filtering

+1 for:

...have a few checkboxes (not hundreds or thousands) covering the main use cases. Along with each checkbox, a text field containing the regex for that checkbox, hidden in a fieldset (collapsed by default) which allows editing of that regex snippet for advanced users.

Regex gives me a severe headache but even I would be prepared to use it for specialised cases.

SeanA’s picture

-

dqd’s picture

@adam_b: as I said above, there is a rhetorical difference between regex input fields and regex kind-of input fields. Please read that part of my posts again ;)

@SeanA: as I have also already explained, there are already enough functions implemented in Drupal for URL validation as a discrete method. No need to check the links above if you don't yet know what they are about. *g* And separating URL validation is exactly what I wanted to avoid: two of the three parts are the same here, and under a D.R.Y. concept there is no need to have the code twice. So NOT separating it from any other validation and filtering was actually part of the whole idea here. If you come to another conclusion on this, it would help to formulate it exactly as such, so that I know you have read my posts. And please provide more explanation and arguments in that case. But thanks for your effort so far. :) (EDIT: This part no longer makes sense since SeanA has removed his comment)

Generally speaking: this issue is about optimizing the site builder's experience with input/output filtering as a whole, which logically -includes- URL validation, plus extending the evaluation by combining and expanding the functionality provided by Drupal core. There are good reasons why this topic could move to a Drupal core discussion, but that only has a chance if we are a little more careful about what we post in this issue. The Link module would profit from that. So this issue should be discussed with the ability to move up to a conceptual level, apart from existing methods. Please don't get me wrong here, but I would love to prevent thread cluttering by hastily thrown-in words. Please read carefully what this is all about, and let us respond to and build on each other, supporting the conversation. Thanks for understanding. [No offense]

g089h515r806’s picture

1. Adding "banned URLs" to a blacklist of disallowed URLs
2. Limit allowed input to mailto
3. Limit allowed input to absolute/external (http://)
4. Limit allowed input to internal

6. URL validation support for custom PHP validation code

All of these issues could be resolved with the Field validation module.
http://drupal.org/project/field_validation

5. Allow language-specific special chars and unicode as valid URL exceptions (Feature seems to be half implemented in D7 ?)
7. Validation and display of <front> links
9. URL validation using tokens
These could maybe also be resolved by the Field validation module.

adammalone’s picture

Having browsed through a few issues on this particular module after wanting to insert skype links I've had two ideas.

The first would use some kind of admin interface for the link module to allow users to 'select levels of validation' In there, users could allow/disallow some of the protocols (http(s) allowed but ftp not allowed kind of thing) with a few other checkboxes for validation as users have suggested above.

The other thing I was thinking about which would make a bit more sense in a wider context, would be simply to invoke a hook where specified:

   if (preg_match($internal_pattern_file, $text)) {
     return LINK_INTERNAL;
   }
 // Insert hook here
   return FALSE;
}
 
/**
 * Implements hook_migrate_field_alter().
 */

This hook could take the URL and allow other modules to do their own checking and override any checking that the link module does. So if there is a special use case like skype: or callto:// it can be written into a custom module.

This could be taken even further, to a setup where the link module provides the basic framework and there are a number of plugins which users can enable, which use the link hook and return true for other things. I.e. a skype plugin would do regex checking for callto:// and skype: links, whereas an smb plugin would ensure that smb links are correctly formatted, etc. If a user does not need that particular link type to be checked (they don't know what samba is, etc.) then they simply do not enable the plugin.
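A minimal, self-contained sketch of that hook idea follows. It deliberately avoids Drupal itself: in Drupal 7 the "Insert hook here" step would more likely go through something like module_invoke_all(), whereas here a plain callback registry stands in for it. All names (`example_validators()`, `example_validate_url()`, the `EXAMPLE_LINK_*` constants) are hypothetical, and the patterns are simplified.

```php
<?php
define('EXAMPLE_LINK_EXTERNAL', 'external');
define('EXAMPLE_LINK_CUSTOM', 'custom');

/**
 * Hypothetical registry standing in for Drupal's hook system.
 *
 * Call with a callback to register it; call without arguments to list all.
 */
function example_validators($callback = NULL) {
  static $callbacks = array();
  if ($callback !== NULL) {
    $callbacks[] = $callback;
  }
  return $callbacks;
}

/**
 * Built-in check first, then the "hook" for everything it did not recognize.
 */
function example_validate_url($text) {
  if (preg_match('#^https?://\S+$#i', $text) === 1) {
    return EXAMPLE_LINK_EXTERNAL;
  }
  // Hook point: let plugins claim link types the core check does not know.
  foreach (example_validators() as $callback) {
    $type = call_user_func($callback, $text);
    if ($type !== FALSE && $type !== NULL) {
      return $type;
    }
  }
  return FALSE;
}

/**
 * Example "skype plugin": accepts skype: and callto:// links.
 */
function example_skype_validator($text) {
  return preg_match('#^(skype:|callto://)[\w.\-]+#i', $text) === 1 ? EXAMPLE_LINK_CUSTOM : FALSE;
}

// Enabling the plugin amounts to registering its callback.
example_validators('example_skype_validator');

var_dump(example_validate_url('skype:echo123'));
var_dump(example_validate_url('smb://fileserver/share'));
```

With no smb plugin registered, the smb link is still rejected, which matches the "only enable what you need" behavior described above.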

Thoughts?

adammalone’s picture

And after further digging: the link module eventually calls url(), which in turn calls drupal_strip_dangerous_protocols(), which strips out every protocol except those listed in the variable filter_allowed_protocols (created by the filter module). Anything that is not one of 'ftp', 'http', 'https', 'irc', 'mailto', 'news', 'nntp', 'rtsp', 'sftp', 'ssh', 'tel', 'telnet', 'webcal' just gets stripped out automatically by Drupal core, regardless of what is done in the link module.

Following this, I might have to agree with #15 that this be moved into a core discussion as a lot of things the link module does, it does in conjunction with filter module and functions with specific variables that are unable to be changed without custom modules.
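To show why a skype: link never survives no matter what the link module validates, here is a simplified stand-in for that protocol stripping. It is not the core implementation of drupal_strip_dangerous_protocols(), only a sketch of the behavior described above; the function name `example_strip_disallowed_protocol()` is hypothetical, and the allowed list is the one quoted in the comment.

```php
<?php
/**
 * Simplified illustration of whitelist-based protocol stripping.
 *
 * If the text before the first colon looks like a scheme and is not in the
 * allowed list, the scheme (and the colon) is removed.
 */
function example_strip_disallowed_protocol($uri, array $allowed) {
  $colon = strpos($uri, ':');
  if ($colon === FALSE) {
    return $uri;
  }
  $scheme = substr($uri, 0, $colon);
  // A colon after "/", "?" or "#" belongs to the path or query, not a scheme.
  if (preg_match('![/?#]!', $scheme) === 1) {
    return $uri;
  }
  if (in_array(strtolower($scheme), $allowed, TRUE)) {
    return $uri;
  }
  return substr($uri, $colon + 1);
}

// The protocol list quoted above.
$allowed = array('ftp', 'http', 'https', 'irc', 'mailto', 'news', 'nntp', 'rtsp', 'sftp', 'ssh', 'tel', 'telnet', 'webcal');

echo example_strip_disallowed_protocol('skype:echo123', $allowed) . "\n";
echo example_strip_disallowed_protocol('http://example.com', $allowed) . "\n";

// Extending the list is what a custom module would have to arrange.
$allowed[] = 'skype';
echo example_strip_disallowed_protocol('skype:echo123', $allowed) . "\n";
```

In a real Drupal 7 site the list lives in the filter_allowed_protocols variable mentioned above, so any link-level solution would have to be coordinated with that setting.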

gmclelland’s picture

Not sure this is relevant to the discussion, but in case you didn't know there is http://drupal.org/project/filter_protocols

Eugene Fidelin’s picture

Some of the URL filtering described here in #1 is already implemented in the Advanced Link module: http://drupal.org/project/advanced_link

justindodge’s picture

I got here from the issue #1306352: Link adding equals signs to URL

The statement that is still needed in your summary of this issue is:
"Support links with query strings that do not have value assignments, for example: http://test.com/doc?mydocid"

The issue that I'm citing provides a patch which improves current behavior, but doesn't handle all possible scenarios.
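The extra equals sign is easy to reproduce with PHP's own query functions, which is presumably close to what happens when a URL is parsed and rebuilt. The snippet below only demonstrates that round trip; `example_build_query()` is a hypothetical helper and not the patch from the cited issue.

```php
<?php
// Parsing a value-less query string yields an empty-string value ...
parse_str('mydocid', $params);

// ... and rebuilding it with http_build_query() adds the "=".
$rebuilt = http_build_query($params);
echo $rebuilt . "\n";

/**
 * Hypothetical builder that omits "=" for empty values.
 *
 * Limitation: it cannot tell "?mydocid" from "?mydocid=", because that
 * distinction is already lost after parsing.
 */
function example_build_query(array $params) {
  $parts = array();
  foreach ($params as $key => $value) {
    if ($value === '' || $value === NULL) {
      $parts[] = rawurlencode($key);
    }
    else {
      $parts[] = rawurlencode($key) . '=' . rawurlencode($value);
    }
  }
  return implode('&', $parts);
}

echo example_build_query($params) . "\n";
```

The stated limitation is one reason a patch along these lines cannot handle every scenario.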

MTecknology’s picture

file: link/link.module

  $type = link_validate_url($item['url']);
  // If we can't determine the type of url, and we've been told not to validate it,
  // then we assume it's a LINK_EXTERNAL type for later processing. #357604
  if ($type == FALSE && $instance['settings']['validate_url'] === 0) {
    $type = LINK_EXTERNAL;
  }
  if ($type == LINK_EMAIL && !preg_match('/mailto:/', $item['url'])) {
    $item['url'] = 'mailto:' . $item['url'];
  }
  $url = link_cleanup_url($item['url']);
  //$email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';
  $email_pattern = '/^(mailto:|)'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

It's a really quick hack in regards to making email links work the way that's been commonly requested. I'm sure there are drawbacks, but this is exactly what I need.

firfin’s picture

I have noted in #1179944: Support custom TLDs from the current hostname that it might be a 'duplicate' of this thread. Someone please confirm and set the status accordingly. I haven't seen RFC2606 / internal domain names mentioned in this discussion yet.

Perhaps it might also be a good idea to add it to issues in the overview?

firfin’s picture

Issue summary: View changes

Updated issue summary.

jantoine’s picture

Thanks to #16, I just tried out the field validation module and it is amazing. I would challenge anyone on this list to give a use case that it doesn't cover. I think a good solution to this would be to add documentation directing users who need additional validation capabilities to download and install that module.

GiorgosK’s picture

@jantoine
does that mean that link module actually passes the validation of textbox inputs to field_validation module ?
without any modification to the link module ?

jantoine’s picture

@GiorgosK
The field_validation module allows administrators to specify additional validation handlers for individual fields. This is all done via the field_validation_ui sub-module, so the answer for you is yes, the field_validation module allows you to add validation to the link module without any code modifications to that module!

GiorgosK’s picture

@jantoine
Thanks for the answer, but my problem is that the link module's own validation does not allow Greek characters, so the field_validation rules will never be reached.

The link module needs to be modified so that it either lets Greek characters pass or lets field_validation handle ALL the validation.

siretfeL’s picture

I second that. Greek characters should pass validation in the link module.

@GiorgosK
Is the patch mentioned at http://drupal.org/node/733640 the only solution at the moment? Are there any performance issues related to it, apart from having to patch the module? Thank you.

GiorgosK’s picture

@ siretfeL
You can try disabling VALIDATION in the link field settings (there is a setting for that) and then let the field_validation module take over validation. Perhaps something like that could work, in which case my previous statement was wrong, but I have not worked with the field_validation module so I can't say for certain.

Other than that, the patch you mentioned is the only one available, and it should not have any performance issues (it's a very simple patch).

siretfeL’s picture

@GiorgosK
...thank you for your reply...I will try both your suggestions and let you know the results for reference sake...

torpy’s picture

As far as the 'extra equal sign in empty query parameters' problem is concerned (#21) #1306352: Link adding equals signs to URL it's a problem in core: #1425588: \Drupal\Component\Utility\UrlHelper::buildQuery() adds extra ampersands.

yang_yi_cn’s picture

I created a patch that adds a Unicode-aware link check.

For external links, it uses PHP's filter_var().

For internal links, it uses the old regex.

Neither of the above supports Unicode, so I adapted the idea from http://php.net/manual/en/function.filter-var.php#104160, which is to check whether the string contains multibyte characters and, if so, replace all Unicode characters with "X" and test again.

I use this idea for both external and internal checks, so I don't need to maintain a list of unicode characters.

This might not be the most elegant way but it works.
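For readers who want to see the trick without opening the patch, here is a minimal sketch of the replace-and-retest idea for the external-link case. It is not the patch itself: the function name `example_valid_unicode_url()` is hypothetical, and it detects non-ASCII bytes with a plain regex instead of the mbstring functions.

```php
<?php
/**
 * Sketch: validate a URL that may contain non-ASCII characters.
 *
 * filter_var() rejects non-ASCII input, so every non-ASCII character is
 * replaced with "X" and the ASCII-only result is validated instead.
 */
function example_valid_unicode_url($url) {
  if (filter_var($url, FILTER_VALIDATE_URL) !== FALSE) {
    return TRUE;
  }
  // Only retry if there is something non-ASCII to replace.
  if (preg_match('/[^\x00-\x7F]/', $url) !== 1) {
    return FALSE;
  }
  // With the "u" modifier, invalid UTF-8 makes preg_replace() return NULL.
  $ascii = preg_replace('/[^\x00-\x7F]/u', 'X', $url);
  if ($ascii === NULL) {
    return FALSE;
  }
  return filter_var($ascii, FILTER_VALIDATE_URL) !== FALSE;
}

var_dump(example_valid_unicode_url('http://παράδειγμα.gr/σελίδα'));
var_dump(example_valid_unicode_url('not a url'));
```

The obvious trade-off is that any non-ASCII character is treated as acceptable wherever an "X" would be, so this is a permissive check rather than a strict one.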

Here are some test cases: http://mathiasbynens.be/demo/url-regex

yang_yi_cn’s picture

Issue summary: View changes

Updated issue summary. typos.

dqd’s picture

Status: Active » Closed (duplicate)

Sorry for being away from this for so long. Can someone involved here provide an updated patch against the latest dev? Thanks for all your effort.

Otherwise I have to agree with #16 (#1318938-16: An all-embracing attempt for URL validation and Input / Output Form Filtering) and close this issue in favor of another module that already provides (most of) those features, not only for Link but for (m)any field modules ...

http://drupal.org/project/field_validation

Another question (bug?) is whether URLs break because of special chars like (?). These issues should be collected in another issue. But none of it would be a problem if the link module didn't try to do the job of another module. :-)

Feel free to reopen the issue if you have good reason or a good patch against latest dev for this. Thanks for understanding

3magnus’s picture

I just gave up and added this line

// Added code to remove http(s):// from link title
  $vars['element']['title'] = !empty($vars['element']['title']) ? preg_replace('/^http(s)?:\/\//i', '', $vars['element']['title']) : $vars['element']['title'];

to link.module file at line 990 (inside "theme_link_formatter_link_default" function), as @emartin suggested for an older version.

This accomplished the objective (problem) of removing protocol text from link titles (http://, https://).