while developing locally, I entered both the eventual domain as my localhost (127.0.0.1) in the domain field and got presented with this

so I had a lil looksie and noticed that the domain validation is brutally strict and entirely not future-proof

  foreach ($nofollow_domains as $nofollow_domain) {
    if (!preg_match('#^([a-z0-9]([-a-z0-9]*)?\.)+([a-z]+)$#i', $nofollow_domain)) {
      form_set_error('wysiwyg_filter_nofollow_domains_'. $format, t('Invalid domain %domain. Please, enter a comma separated list of valid domain names.', array('%domain' => $nofollow_domain)));
    }
  }

the following domains would be perfectly valid, but are denied by this lil preg_match:
localhost, 127.0.0.1 (or any other IP), íntérnàtîõnàlízéd.com (http://www.icann.org/en/topics/idn/)

so I would like to ask the maintainer what the reason is to even do a check on this? who cares if they entered an invalid domain, it'll just allow that invalid domain to be used, where's the harm? I really only see a downside to this, as the range of valid domains will only expand in the future...

attached patch gets rid of the domain validation

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

markus_petrux’s picture

Category: bug » feature
Status: Needs review » Needs work

Rather than removing, the validation logic could be enhanced.

gaellafond’s picture

The filter also add the nofollow directive to internal relative and absolute path with the Whitelist. They are obviously on the same domain and should never have that directives;

<a href="/path/to/page.html>Page</a>
become
<a href="/path/to/page.html" rel="nofollow">Page</a>

and

<a href="path/to/page.html">Page</a>
become
<a href="path/to/page.html" rel="nofollow">Page</a>

I think the filter should check for the protocol (http://, https://, ftp://, etc.). If there is none, the path is considered as internal and the nofollow should not be added.

seutje’s picture

some new domain TLDs were added, extension امارات for example

pretty sure this fails, as it doesn't consist of a-z

markus_petrux’s picture

@gaellafond: AFAICT, the parser checks for '://' + domain pattern, so internal paths should never match with domains specified here.

@all: I've been looking at the code, and now I see how we can get rid of this 'strict validation'. It could be completely free, but we need to do something more than just removing the validation code. We need:

1) Remove the validation code (as in the patch above), but also...
2) The domain patterns are now stored with dots scaped. Well, we should not scape anything now at storage time (when the settings form is submitted).
3) Since we would not store the patterns escaped, we need to escape them during text parsing, and we need to use preg_quote() for this job.
4) Siince we will be changing the way this setting is stored, we need to implement hook_update_N() to migrate existing installations.

It would be much appreciated if someone can come up with a patch for review. Sadly, I'm too busy to work on this. Thanks

doitDave’s picture

Category: feature » bug

Changed category. This is definitely a bug. Imagine any intranet service trying to whitelist e.g. a server named "docserver" here. Even if this is probably a rather academic thought.

Sorry, but I really don't see a reason to validate what the site admin considers being a domain. And after all even aside consideration, domains may consist of /[a-z0-9\-\.]/. Why do you stick to it?

Danny Englander’s picture

Hi, I am having this same issue with Drupal 7. Should I open a new issue since this one is marked for Drupal 6? Thanks.

scottshipman’s picture

Has this ever been resolved? I cannot edit my text filters on Drupal 7. I have installed the flexifilter module, which I believe may have reintroduced this issue.

Danny Englander’s picture

I figured out a work around for this. Instead of using "mysite" as the domain (as configured in MAMP Pro), I changed it to "mysite.dev". This fooled Drupal into thinking it was a real domain and it solved the issue here.

xtfer’s picture

Had to remove WYSIWYG filter entirely as a result of this bug... might be worth addressing...

brentratliff’s picture

Same issue here. This affects my entire team who develop locally with different conventions for local environments. For now my solution is to comment out the validation locally and make sure I don't commit my changes upstream; less than ideal.

Danny Englander’s picture

@brentratliff & @xtfer -- did you see my comment above? I simply changed mysite to mysite.dev and it fixed this issue albiet a workaround...

xtfer’s picture

@highrockmedia Thanks for the suggestion, but it fails on a path like mysite.local, which is also a legitimate domain.

jakew’s picture

On Drupal 7, It also blocks saving the Input Format admin form, even if WYSIWYG Filter is disabled! To save e.g. the Full HTML format with a disabled WYSIWYG Filter, you need to check the WYSIWYG Filter box to enable it and show the config, change the host name to something more acceptable, uncheck the WYSIWYG Filter box to disable it, and then save.

Alan D.’s picture

Version: 6.x-1.4 » 7.x-1.x-dev

@jakew, see #1687794: Validation occurs on disabled filter. I just opened that

Moving to the latest branch.

Personally, I think people will need to stop domain validation with the i18n changes, or at least allow users to bypass this.

In the link module, the is the instance setting 'validate_url' to bypass, that may be an option here. And it's own "lenient" validation function is:

/**
 * A lenient verification for URLs. Accepts all URLs following RFC 1738 standard
 * for URL formation and all email addresses following the RFC 2368 standard for
 * mailto address formation.
 *
 * @param string $text
 * @return mixed Returns boolean FALSE if the URL is not valid. On success, returns an object with
 * the following attributes: protocol, hostname, ip, and port.
 */
function link_validate_url($text) {
  $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
    "&#x00E6;", // æ
    "&#x00C6;", // Æ
    "&#x00C0;", // À
    "&#x00E0;", // à
    "&#x00C1;", // Á
    "&#x00E1;", // á
    "&#x00C2;", // Â
    "&#x00E2;", // â
    "&#x00E5;", // å
    "&#x00C5;", // Å
    "&#x00E4;", // ä
    "&#x00C4;", // Ä
    "&#x00C7;", // Ç
    "&#x00E7;", // ç
    "&#x00D0;", // Ð
    "&#x00F0;", // ð
    "&#x00C8;", // È
    "&#x00E8;", // è
    "&#x00C9;", // É
    "&#x00E9;", // é
    "&#x00CA;", // Ê
    "&#x00EA;", // ê
    "&#x00CB;", // Ë
    "&#x00EB;", // ë
    "&#x00CE;", // Î
    "&#x00EE;", // î
    "&#x00CF;", // Ï
    "&#x00EF;", // ï
    "&#x00F8;", // ø
    "&#x00D8;", // Ø
    "&#x00F6;", // ö
    "&#x00D6;", // Ö
    "&#x00D4;", // Ô
    "&#x00F4;", // ô
    "&#x00D5;",	// Õ
    "&#x00F5;",	// õ
    "&#x0152;", // Œ
    "&#x0153;", // œ
    "&#x00FC;", // ü
    "&#x00DC;", // Ü
    "&#x00D9;", // Ù
    "&#x00F9;", // ù
    "&#x00DB;", // Û
    "&#x00FB;", // û
    "&#x0178;", // Ÿ
    "&#x00FF;", // ÿ
    "&#x00D1;", // Ñ
    "&#x00F1;", // ñ
    "&#x00FE;", // þ
    "&#x00DE;", // Þ
    "&#x00FD;", // ý
    "&#x00DD;", // Ý
    "&#x00BF;", // ¿
  )), ENT_QUOTES, 'UTF-8');

  $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
    "&#x00DF;", // ß
  )), ENT_QUOTES, 'UTF-8');
  $allowed_protocols = variable_get('filter_allowed_protocols', array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal'));

  // Starting a parenthesis group with (?: means that it is grouped, but is not captured
  $protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
  $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
  $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. LINK_DOMAINS .'|[a-z]{2}))?)';
  $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
  $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
  $port = '(?::([0-9]{1,5}))';

  // Pattern specific to external links.
  $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .' |localhost)'. $port .'?';

  // Pattern specific to internal links.
  $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
  $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";

  $directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
  // Yes, four backslashes == a single backslash.
  $query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
  $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\ ? ]*)";

  // The rest of the path for a standard URL.
  $end = $directories . '?' . $query . '?' . $anchor . '?' . '$/i';

  $message_id = '[^@].*@'. $domain;
  $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
  $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

  $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
  $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

  if (strpos($text, '<front>') === 0) {
    return LINK_FRONT;
  }
  if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
    return LINK_EMAIL;
  }
  if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
    return LINK_NEWS;
  }
  if (preg_match($internal_pattern . $end, $text)) {
    return LINK_INTERNAL;
  }
  if (preg_match($external_pattern . $end, $text)) {
    return LINK_EXTERNAL;
  }
  if (preg_match($internal_pattern_file, $text)) {
    return LINK_INTERNAL;
  }

  return FALSE;
}
bob.hinrichs’s picture

I will lend my voice to this. It is creating a time-suck for me trying unsuccessfully to set up a local mail infrastructure when all we really need is for the validation to allow a perfectly valid ip mail server name.

kerrycurtain’s picture

Have same problem with Wysiwyg Filter - when enabled it:

1. Produced a red message 'Invalid domain localhost. Please, enter a comma separated list of valid domain names', and

2. Obstructed the 'save changes' function in the filtered HTML text format config making it impossible to save any changes to settings

Changing the domain name from localhost to localhost.dev in the Wysiwyg Filter settings appears to have worked for me having only one site. Thanks highrockmedia.

kifuzzy’s picture

the #13 works for me in my case, this time. thx :) @jakew - to prevent it by "system" working on this "issue" en details would be sthg to do otherdays.
(edit: drupal 7.22 - CKEditor 3.6.5.7647 - Wysiwyg 7.x-2.2)

squidhaven’s picture

FileSize
986 bytes

Fixed for my situation by just modifying the conditional to also allow a value of 'default'

doitDave’s picture

Honestly. Why validate a domain at all. What has this got to do with a WYSIWYG filter at all? Is it intended to be some security snake oil? If so, you're trying to reinvent the wheel although you are not even automotive. If not, is it a symptom of vanity ("look how much we care")? In that case it's even more evitable. No offense, but the world's best-performing tools have always been the ones to focus on core competence.

This module is intended to validate HTML syntax, not URL syntax. Validating a domain name ist best done by a) the guy to enter the markup (it's him who wants to link somewhere!) and b) the browser rendering the site, which is besides much more likely to happen correctly as browsers are maintained by far more than a handful of people and (hopefully) by the ones always up to date with continuous changes in such things as "what is a valid URL".

Sorry for the little sidestep, let's get back to the point: Drop it, move it into a separate plugin so site-builders are free to decide or make it optional. I really don't understand the big deal about it. No offense, again, but three years for a ridiculous issue like this? Without any real feedback? Has the module been dropped? Wouldn't it be time to at least invite an additional maintainer? Someone to fork it if nothing else helps?

Sad sad, because it's an important and helpful module after all. :-(

markus_petrux’s picture

Please, read #4. No one provided a patch in that direction since then. In addition to that, I may not have the time to deal with it, not even to keep track of follow ups nor the rest of the issues queue, so I have updated the project page as looking for additional co-maintainers. This should state more clearly that others may chime in here to help. ;)

doitDave’s picture

Hell, I overlooked that. Sorry. Well I would at least be able to free some minutes and add a "no validation" option in a first step.

alibama’s picture

am still getting this error - tried several patches (#1 here as well as #1687794: Validation occurs on disabled filter - no luck, am disabling

doitDave’s picture

Find a patch attached which adds a checkbox for each filter format. Now, the domain validation will only take place if this checkbox is checked.

Hope this helps and finds its way into the module quickly. A second step (but not necessary for this quick solution) could be a rework of the validation itself - to meet latest W3C best practices. If this is still an option, how would you want to validate strings that may consist of almost everything now, from /[a-z]+/ to /[^\.]+(\.[^\.]+)+/?

I would still vote for removing this validation completely. It does not make any sense for those who really would use it as they will know theirselves how their local domains are validly spelled, and for all others who do not want to bother with nofollow options, it is useless anyway.

Either way: HTH!

dave

doitDave’s picture

Status: Needs work » Needs review
doitDave’s picture

...

aubjr_drupal’s picture

I had this problem (using Panopoly distro from Pantheon, has WYSIWYG filter). I tried the patch in #23. Got an error the first time I went to /admin/config/content/formats/panopoly_wysiwyg_text:

Notice: Undefined index: nofollow_url_validation in wysiwyg_filter_filter_wysiwyg_settings() (line 151 of /home/me/mysite/profiles/panopoly/modules/contrib/wysiwyg_filter/wysiwyg_filter.admin.inc).

It looks like there's no default set already for $settings['nofollow_url_validation']. When I checked the new checkbox created by this patch and saved it, the error went away.

Also, upon clicking "Add filter" to create a new content filter, I get the same error. I also get the error upon submitting that form, though the new filter appears to have been created. (The latter may be attributable to Drupal's error message lag I've seen?) When I go back to configure the new filter and save it, the error is gone.

Again, this occurs whether "WYSIWYG Filter" is checked or not in any case.

Panopoly is currently using 7.x-1.6-rc2, not the dev version.

EDIT: I also vote for turning off this particular validation.

WeAreGeek’s picture

I also installed the patch from #23.
It works fine, but I also got the error message mentioned in #26.

I'm not using Panopoly btw.

And I also vote for turning off the validation, or at least to make it optional.

doitDave’s picture

Fixed the missing initialization check mentioned in #26, thanks for the hint. Here is a new patch checking this, although I barely believe that anything will happen here after all. Sorry to say that.

revagomes’s picture

+++ b/wysiwyg_filter.admin.incundefined
@@ -144,6 +144,13 @@ This option allows you to specify which HTML elements and attributes are allowed
+    '#description' => t('If enabled, domains you enter in the field below will run through a simple, but sometims too restrictive check.'),

There is a minor typo in the nofollow_url_validation form field description. Despite the typo the patch worked fine. Thanks!

revagomes’s picture

Status: Needs review » Reviewed & tested by the community

Changing the status.

HyperGlide’s picture

HyperGlide’s picture

Ping.

HyperGlide’s picture

Updating patch per #29 comments.

studiotwelve’s picture

Went the workaround route, enabling wysywig filter changing the hostname(s) to a valid one.

I think the best idea for a patch to this would be to simply disable this filters form_alter for the input format form so that it only tries to validate if the filter is enabled.

maybe...

<?php
if($form_state['values']['filters']['wysiwyg']['status'] == 1){
  // *** validate nofollow_domains ***
  $nofollow_domains = array_filter(explode(',', preg_replace('#\s+#', ',', $values['nofollow_domains']))); // form2db
  foreach ($nofollow_domains as $nofollow_domain) {
    if (!preg_match('#^([a-z0-9]([-a-z0-9]*)?\.)+([a-z]+)$#i', $nofollow_domain)) {
      form_set_error('nofollow_domains', t('Invalid domain %domain. Please, enter a comma separated list of valid domain names.', array('%domain' => $nofollow_domain)));
    }
  }
}
?>

- or -
wrap the entire contents of the validate function in an if statement ...should'nt it be standard for all _validate functions to check if their form should even expect input or not?

<?php
/**
 * Validate filter settings form.
 *
 * @ingroup forms
 */
function wysiwyg_filter_filter_wysiwyg_settings_validate($form, &$form_state) {
	if($form_state['values']['filters']['wysiwyg']['status']==1){
	  $values =& $form_state['values']['filters']['wysiwyg']['settings'];
	
	  // *** validate valid_elements ***
	  // Check elements against hardcoded backlist.
	  $elements_blacklist = wysiwyg_filter_get_elements_blacklist();
	  $valid_elements = trim($values['valid_elements']);
	  $valid_elements = wysiwyg_filter_parse_valid_elements($valid_elements);
	  $forbidden_elements = array();
	  foreach (array_keys($valid_elements) as $element) {
	    if (in_array($element, $elements_blacklist)) {
	      $forbidden_elements[] = $element;
	    }
	  }
	  if (!empty($forbidden_elements)) {
	    form_set_error('valid_elements', t('The following elements cannot be allowed: %elements.', array('%elements' => implode(', ', $forbidden_elements))));
	  }
	
	  // *** validate nofollow_domains ***
	  foreach (wysiwyg_filter_get_advanced_rules() as $rule_key => $rule_info) {
	    $field_name = "rule_$rule_key";
	    $expressions = array_filter(explode(',', preg_replace('#\s+#', ',', trim($values[$field_name])))); // form2db
	    $errors = array();
	    foreach ($expressions as $expression) {
	      if (preg_match('`[*?]\*|\*\?`', $expression)) {
	        $errors[] = t('Invalid expression %expression. Please, do not use more than one consecutive asterisk (**) or one that is next to a question mark wildcard (?* or *?).', array('%expression' => $expression));
	      }
	      if (!preg_match($rule_info['validate_regexp'], $expression)) {
	        $errors[] = t('Invalid expression %expression. Please, check the syntax of the %field field.', array('%expression' => $expression, '%field' => $rule_info['title']));
	      }
	    }
	    if (!empty($errors)) {
	      form_set_error($field_name, implode('<br />', $errors));
	    }
	  }
	
	  // *** validate nofollow_domains ***
	  $nofollow_domains = array_filter(explode(',', preg_replace('#\s+#', ',', $values['nofollow_domains']))); // form2db
	  foreach ($nofollow_domains as $nofollow_domain) {
	    if (!preg_match('#^([a-z0-9]([-a-z0-9]*)?\.)+([a-z]+)$#i', $nofollow_domain)) {
	      form_set_error('nofollow_domains', t('Invalid domain %domain. Please, enter a comma separated list of valid domain names.', array('%domain' => $nofollow_domain)));
	    }
	  }
	}
}
?>
studiotwelve’s picture

Issue summary: View changes
FileSize
1.13 KB

created a patch for this very simple approach...

studiotwelve’s picture

okay now it's patched so that wysiwyg doesn't do anything with it's form unless you enable it by checking the 'Wysiwyg Filter' checkbox under 'Enabled Filters' this should take some headaches away...

Maya2013’s picture

Both patches are not working when changing lines manually :
# 35: Unexpected if in line 171
# 36: unexpected if in line 234

Just for you information. When there's someone who could send the working wyswig_filter.admin.inc file to me I'd be very grateful!

Maya2013’s picture

For those who want to create a new text format to use full html (like for a paypal button-block):

I found a solution by changing the CKEditor profile settings like here. And disabling the Wysiwyg-Module. So the new text format was not necessary anymore.

HeathN’s picture

#13 is the most helpful.

pameeela’s picture

Thanks HeathN and jakew, I read through this issue several times before I saw your comment and read #13. There is a workaround without any code changes:
If you are just trying to save the settings:

  • Enable the WYSWIYG filter (even though you don't need it) and update the domain options, either enter the valid ones or just delete the default and leave it blank
  • Save the updated settings
  • Disable the WYSIWYG filter and save again
Mixologic’s picture

Issue tags: +IDN
dgtlmoon’s picture

Status: Reviewed & tested by the community » Needs review
geek-merlin’s picture

Status: Needs review » Closed (outdated)

This seems fixed according to current wysiwyg_filter_filter_wysiwyg_settings_validate().
Please reopen a new issue if not as this one is a real mess.