Hi all

Attached is a patch that is a major rewrite to function _filter_url($text, $format). It is based on a recent version of the 5.x development branch, but it only touches parts that are unchanged in 6.x so it can be applied as is to that too.

Rationale

The current version of the URL filter is really basic. The regular expressions used only find url's in plaintext, inside p, li and br tags. In addition it has some bad bugs, such as it might manipulate url's that are really part of some html element's attribute.

This patch extends the "contexts" in which URL's are searched for, and on the other hand makes the regular expressions more accurate so as to avoid manipulating url's that absolutely must not be touched.

What changed?

//From line 1109...
/**
 * URL filter. Automatically converts text web addresses (URLs, e-mail addresses,
 * ftp links, etc.) into hyperlinks.
 */
function _filter_url($text, $format) {
  //The following tags may contain text with a URL that should be converted to a link
  //List also contains standalone/empty tags like img and br
  $tags_begin =  'caption|td|th|div|dd|dt|li|blockquote|address|p|h[1-6]|i|b|u|tt|big|small|strike|s|em|strong|font|dfn|samp|kbd|var|cite|abbr|acronym|sub|sup|q|ins|del|img|br|object|applet';


  //transform each tagname from 'p' to '<\/?p[^>]*>'
  //this pattern will match <p>, </p>, <p class="foo"> and <p />
  //the last one is significant for xhtml standalone tags like <img /> and <br />
  //note: <img/> and <br/>, won't work. While valid xhtml their use is discouraged anyway
  //note: the pattern will obviously match many invalid tags too like <p &&&>    
  $arr_tags_begin = explode( "|", $tags_begin );
  for ( $i = 0; $i < count( $arr_tags_begin ); $i++ ){
    $arr_tags_begin[$i] = '<\/?' . $arr_tags_begin[$i] . '[^>]*>'; 
  }
  $tags_begin = implode( "|", $arr_tags_begin );

  // Pass length to regexp callback
  _filter_url_trim(NULL, variable_get('filter_url_length_'. $format, 72));

  $text   = ' '. $text .' ';

  //The structure of each three matches below is
  // 1. a tag from the list above or newline or </a>
  // 2. any characters except "<" 
  // 3. the url to match
  // 4. possible period or other marks at end of url are considered separate (end of sentence)
  // 5. end url in space, newline or tag ("<" character)

  //In many cases the patterns below only match the first url if there are many in the same paragraph. 
  //Therefore the preg_replace are encapsulate in loops that will re-run the replace as many times as needed.

  // Match absolute URLs. 
  $protocols = 'http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://';
  $urlpattern = "($protocols)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-])";
  $re = "`(($tags_begin|\n|\r|</a>)([^<])*?)($urlpattern)(([\.\,\?\!]*?)[\<\s\n\r])`i";
  while(preg_match($re, $text)) {
    $text = preg_replace_callback($re, '_filter_url_parse_full_links', $text);
  }
  
  // Match e-mail addresses.
  $urlpattern = '[A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4}';
  $re = "`(($tags_begin|\n|\r|</a>)([^<])*?)($urlpattern)(([\.\,\?\!]*?)[\<\s\n\r])`i";
  while(preg_match($re, $text)) {
    $text = preg_replace($re, '\1<a href="mailto:\4">\4</a>\5', $text);
  }
  
  // Match www domains/addresses.
  $urlpattern = 'www\.[a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+~#\&=/;-]';
  $re = "`(($tags_begin|\n|\r|</a>)([^<])*?)($urlpattern)(([\.\,\?\!]*?)[\<\s\n\r])`i";
  while(preg_match($re, $text)) {
    $text = preg_replace_callback($re, '_filter_url_parse_partial_links', $text);
  }
  
  $text = substr($text, 1, -1);
  return $text;
}

/**
 * Make links out of absolute URLs.
 */
function _filter_url_parse_full_links($match) {
  //which parenthesis in the regexp contains the url?
  $i = 4;
  $match[$i] = decode_entities($match[$i]);
  $caption = check_plain(_filter_url_trim($match[$i]));
  $match[$i] = check_url($match[$i]);
  return $match[1] . '<a href="'. $match[$i] .'" title="'. $match[$i] .'">'. $caption .'</a>'. $match[$i+3];
}

/**
 * Make links out of domain names starting with "www."
 */
function _filter_url_parse_partial_links($match) {
  //which parenthesis in the regexp contains the url?
  $i = 4;
  $match[$i] = decode_entities($match[$i]);
  $caption = check_plain(_filter_url_trim($match[$i]));
  $match[$i] = check_plain($match[$i]);
  return $match[1] . '<a href="http://'. $match[$i] .'" title="'. $match[$i] .'">'. $caption .'</a>'. $match[$i+1];
}

Attached are a patch that can be applied against 5.x-dev and also the entire filter.module file itself. (So you can just replace your current filter.module to test.)

CommentFileSizeAuthor
#153 drupal.filter-url.153.patch35.4 KBsun
#151 drupal.filter-url.150.patch160.31 KBsun
#144 drupal.filter-url.144.patch36.21 KBsun
#142 drupal.filter-url.141.patch33.99 KBsun
#139 drupal.filter-url.137.patch28.87 KBhingo
#139 test.tar_.gz1.16 KBhingo
#137 drupal.filter-url.137.patch24.43 KBsun
#131 drupal.filter-url.131.patch24.51 KBsun
#127 drupal.filter-url.127.patch24.14 KBsun
#126 drupal.filter-url.126.patch23.98 KBsun
#124 drupal.filter-url.123.patch16.57 KBsun
#122 drupal.filter-url.122.patch17.49 KBsun
#113 161217-url-filter.patch17.3 KBklausi
#111 161217-filter_url-3.patch8.82 KBredndahead
#109 161217-filter_url-2.patch8.78 KBredndahead
#103 filter_url.patch7.65 KBcatch
#99 filter_url.patch8.96 KBcatch
#91 url_filter.patch15.92 KBcatch
#89 url_filter.patch16.96 KBcatch
#87 url_filter.patch16.8 KBcatch
#86 url_filter.patch16.79 KBcatch
#84 url_filter.patch17.26 KBcatch
#80 urlfilter-80.patch8 KBdmitrig01
#74 locale_confirm_280628_7x_13.patch7.47 KBcwgordon7
#71 urlfilter-71.patch7.41 KBhingo
#68 urlfilteragain_0.patch6.29 KBhingo
#65 urlfilter.patch7.63 KBchx
#64 url_filter-161217-64.patch7.75 KBcatch
#63 url_filter-161217-63.patch7.75 KBfloretan
#62 urlfilteragain.patch6.29 KBcatch
urlfilter.patch6.29 KBcatch
urlfilter.patch6.29 KBcatch
#46 urlfilter-161217-46.patch6.3 KBfloretan
#43 UrlFilterTestCases.txt2.08 KBhingo
#36 filter.module.urlfilter-rewrite.HEAD-0.9.patch6.07 KBhingo
#33 filter.module.urlfilter-rewrite.HEAD-0.8.patch6.19 KBhingo
#22 filter.module.urlfilter-rewrite.HEAD-0.7.patch5.88 KBhingo
#17 filter.testdata.txt2.12 KBhingo
#16 filter.module.urlfilter-rewrite.HEAD-0.6.patch5.87 KBhingo
#10 filter.module.urlfilter-rewrite.HEAD-0.5.patch5.26 KBhingo
#9 filter.module.urlfilter-rewrite.HEAD-0.4-badidea.patch5.37 KBhingo
#8 filter.module.urlfilter-rewrite.HEAD-0.3.patch5.56 KBhingo
#5 filter.module.urlfilter-rewrite.HEAD-0.2.patch5.21 KBhingo
#3 filter.module.urlfilter-rewrite.HEAD-0.1.patch4.56 KBhingo
#2 urlfilter-rewrite.testdata.txt1.04 KBhingo
#1 filter.module.txt59.01 KBhingo
filter.module.urlfilter-rewrite.v0.1.patch4.55 KBhingo
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

hingo’s picture

Title: Better URL filter in filter module » Better URL filter in filter module (filter.module as attachment)
FileSize
59.01 KB

Here is the filter.module attached

hingo’s picture

Title: Better URL filter in filter module (filter.module as attachment) » Better URL filter in filter module (testdata attachment)
FileSize
1.04 KB

And here is a file that can be used as testdata.

hingo’s picture

Title: Better URL filter in filter module (testdata attachment) » Better URL filter in filter module (patch against HEAD)
FileSize
4.56 KB

Attached is the same patch against the most recent HEAD / 6.0

So no you are all set to go to review and apply this on any system you wish.

drumm’s picture

Version: 5.x-dev » 7.x-dev
Status: Needs review » Needs work

Changes such as this will only be committed to the development branch, which I think will be for the 7.x release cycle, 6.x is currently in feature freeze.

This patch is not properly formatted, it should be in unified diff format. See http://drupal.org/diffandpatch.

hingo’s picture

Title: Better URL filter in filter module (patch against HEAD) » Better URL filter in filter module
Version: 7.x-dev » 6.x-dev
Status: Needs work » Needs review
FileSize
5.21 KB

Oh right, sorry about the patch. Here is a diff -up patch against head attached.

About the 6 vs 7 issue. I would like to point out that this patch contains absolutely no new features. It does the same replacements as the current urlfilter module (the core regexp patterns didn't change) but it does the replacements in some new places where the current one doesn't but imho should (inside many new tags) and otoh it doesn't touch several places the current one shouldn't but does. In this sense I would argue this fix is within the scope of "During the initial stage of the code freeze, documentation updates, usability improvements, and performance improvements will still be accepted. New functionality or API changes / additions, on the other hand, will not be unless deemed critical. As of today, the focus is to strengthen the code base's performance, usability, and stability. As we progress, focus will shift towards stability and, near the end of the code freeze, only bug fixes will be allowed, until no release critical bugs are left." (http://drupal.org/drupal-6.0-code-freeze)

I do realise this patch is not a simple one-liner bug fix and the URL filter has worked fine for many of you for many years now. So if you won't include this in 6.x I'm not going to whine about it more than this, but I do want to stress that nothing new is introduced in this patch, it is a bug fix.

I'm changing version back to 6.x-dev just to get attention from the right people. Please just change it back to 7.x if that is your judgement.

Steven’s picture

Status: Needs review » Needs work

If the goal is to make it context-aware, then I don't see why we don't do a full tag split like other core code does (e.g. htmlcorrector, the search indexer, drupal_html_to_text). That way we can simplify the extremely complicated regular expression into a simpler, more readable heuristic. Ignoring any content between <a> </a> tags or ignoring any HTML attributes would be a simple if check at the right point.

The combined regular expressions that result from all this code seem incredibly convoluted, and contain slow, ungreedy operators.

Also, the following is just not acceptable:

  //note: <img/> and <br/>, won't work. While valid xhtml their use is discouraged anyway
webchick’s picture

Also, from a code-style standpoint, please ensure there is one space between // and the comment and that the comment uses sentence capitalization. For example:

+  //this pattern will match <p>, </p>, <p class="foo"> and <p />

should be:

+  // This pattern will match <p>, </p>, <p class="foo"> and <p />.

There are places in core where the comments don't conform to this standard (yet), but we shouldn't introduce anymore new ones that don't. :)

hingo’s picture

Hi again. Thanks for the good comments. It took me a couple of days to go through them, but now I'm actually back with two new patches.

Attached here is a patch 0.3 that corrects all of the minor problems you have pointed out for 0.2. In particular:

Also, the following is just not acceptable:
//note: Only local images are allowed. and
, won't work. While valid xhtml their use is discouraged anyway

Actually, it seems that comment was wrong and those tags actually did work. The comment is now removed. However, the regexp at that position has now been reworked to be a bit more precise than before, since I realised I had tought about it the wrong way.

Also, from a code-style standpoint, please ensure there is one space between // and the comment and that the comment uses sentence capitalization. For example:

Done.

hingo’s picture

And here is the other patch, addressing the final suggestion made by Steven.

Please not that this patch is not intended to be reviewed or included anywhere. I'm just attaching it as an attachment to the explanation that follows. Please consider 0.3 for inclusion into Drupal core.

If the goal is to make it context-aware, then I don't see why we don't do a full tag split like other core code does (e.g. htmlcorrector, the search indexer, drupal_html_to_text). That way we can simplify the extremely complicated regular expression into a simpler, more readable heuristic. Ignoring any content between tags or ignoring any HTML attributes would be a simple if check at the right point.

In the attached patch I tried to follow this path instead. It is almost a complete rewrite of the whole _filter_url_filter() function. (Whereas patch 0.3. actually is more like additions onto the simple regexps in current HEAD.)

The problem is that URL filter actually makes 3 different replacements. It makes links out of

1) http://any.full.url
2) email@addresses.com
3) www.domains.com

The problem is that once you do the split (you can just split on all tags, it's really simple)
(1) works fine,
(2) works fine since it (practically) never confilcts with (1),
but then at step (3) you end up actually hitting all of the addresses that were converted in step (1), thus messing up the links created in (1). So the problem is that while we preg_split away all tags present in the input, (1) and (2) create new <a> tags and there is no new split function and then you mess everything up at (3).

Possible solutions to this dilemma:

1) Make the last regexp more intelligent/complex so it doesn't do this -> This would end up being the "complex" regexp used in patch v0.3, kind of defeating the whole point of the excercise: Then we'd have both more complex code and the complex regexp.

2) Separate the 3 regexps so that only one is done at a time. For instance have a loop that goes 3 times, doing the preg_split anew for each loop. This way the new <a> tags created in steps (1) and (2) would become protected at step (3).

I personally am of the opinion that while the regexps in v0.3 may be a bit wordy, the complexity of the if-else structures of v0.4 are not that simple either. Adding another loop or other logic doesn't seem worthwile to me - I'd rather vote for v0.3 where at least the code is straightforward, albeit the regexps may be on the long side.

(Another comment on patch v0.2)

The combined regular expressions that result from all this code seem incredibly convoluted, and contain slow, ungreedy operators.

Is the slowness of a regexp in a filter an issue, since the filter is cached? I do admit that URL filter is one of the most used ones, but nobody runs filters without having the caching on, right?

Whether the regexps are readable or not is of course a matter of taste, to the author everything always looks simple. But I'd argue the if-else logic in v0.4 takes about as much time to understand as the regexps in v0.3. (v.0.4 also took longer to get right than modding the regexps took for v0.3.)

So in summary, I'm still proposing the patch v0.3.

hingo’s picture

Small restructuring of the regexp in $tags_begin variable. (Refinement based on patch 0.3, see previous comment.)

Changes line 1089 (line 10 in this patch)

  $tags_begin = '<\/?(?:' . $tags_begin . ')(?:\s[^>]|\/)*>';
  // This pattern will match <p>, </p>, <p class="foo">, <p /> and <p/>
  // The last ones are significant for xhtml standalone tags like <img/> and <br/>
  // Note: the pattern will obviously match many invalid tag-like constructs too 
  // like <p &&&>    

Removes the for-loop that did the same transformation around each tag separately.

This change makes the regexps a bit shorter, so should be slightly simpler to read and also to execute.

hingo’s picture

Status: Needs work » Needs review

Oops, forgot to change status...

hingo’s picture

How should silence in a "needs review" situation be interpreted?

Where is 6.x at the moment, is it still realistic to target this for 6.x or are we close to RC time?

shap’s picture

I applied your patch 0.5 to test. There seems to be a bug here. Given the input

blah blah http://mumble.net

the url is getting filtered, with the effect that the code example is obfuscated. This occurs even though the code tag is NOT on the list of allowable contexts for filtering. Any idea what is happening?

Hmm. Actually, things get substituted in a code tag even without the surrounding blockquote. This doesn't seem right if I am understanding the module code's intentions.

hingo’s picture

Hi shap

Thanks for testing this.

Unfortunately, I cannot reproduce what you are getting. I tested on a vanilla 6.x-dev, with 0.5 patch applied. For instance,

This is just a www.test.com. paragraph with person@test.com. some http://www.test.com. urls thrown in and also <code>using www.test.com the code tag<  /code>.

<blockquote>
This is just a www.test.com. paragraph with person@test.com. some http://www.test.com. urls thrown in and also <code>using www.test.com the code tag<  /code>.
</blockquote>

<code>So what's this about? http://www.test.com abc<  /code>

http://www.test.com
www.test.com
person@test.com
<code>www.test.com<  /code>

All of the other url's get modified, but the url's inside the code tags are not modified.

On the other hand, the behaviour you describe matches the current drupal url filter. (And yes it is wrong and that is exactly the reason I'm trying to fix it.) So could you please check if:

* you really are running filter.module with the patch applied
* you are not seeing a cached version of your content. For instance, write foo123 at the end of the text to make sure it is different from some old version.

hingo’s picture

Status: Needs review » Needs work

I found one instance where actually url's are modified when they shouldn't:

<code>In this example www.test.com the first url is not modified, but <em>the second www.test.com is</em> even if it shouldn't. </ code>

The problem is that v0.5 only pays attention to the innermost tags. I haven't thought of a case in normal html where this would lead to problems, but indeed the code tag is different.

I'll work on this and see how to solve it. Thanks shap for bringin this to my attention - I don't know if this is exactly the bug you were describing, but thanks still.

hingo’s picture

Status: Needs work » Needs review
FileSize
5.87 KB

Well, to fix the problem with tags nested inside code tags, I had to revert to the preg_split/implode strategy from 0.4. I've completed and refined that approach and this patch is a refined version of 0.4.

I would appreciate if you can test this out too. I'm pretty sure this is close to perfect, I dare you to come up with examples to break this :-)

hingo’s picture

FileSize
2.12 KB

Attached is the testdata I'm currently using.

hingo’s picture

Version: 6.x-dev » 7.x-dev

Since 6.0 has now reached beta (congratulations, btw) I'm changing this to target 7.x-dev.

catch’s picture

Version: 7.x-dev » 6.x-dev
Category: feature » bug

Well I've had the url filter mess with urls created by the footnotes filter before (i.e. within a href), so I think this counts as a bug. I'm tentatively moving it back to 6.x-dev since other bugs will get fixed before rc, and changing it to a bug report since that's really what it is.

Sorry no time to review this morning.

hingo’s picture

Great! I certainly agree with classifying this as a bug, if there are more than me who will agree.

I was trying to be cautious, since I'm under the impression that the URL filter is "good enough" for many people and some may not agree that what has now become a complete rewrite would be considered for 6.x. But as this thread too shows, there are planty of things the old URL filter does wrong, and this patch should fix all of them.

hass’s picture

Status: Needs review » Needs work

You fileencoding is wrong (ISO). You must use UTF8 without BOM's. Check your editor, please.

hingo’s picture

Sorry about that, I didn't realise there were any characters where that would have mattered. Here is the same patch but now in UTF-8 format.

hingo’s picture

Status: Needs work » Needs review

Forgot to change status again...

hingo’s picture

...

StevenPatz’s picture

Version: 6.x-dev » 7.x-dev

Beta 2 is out for 6.x, so I'm moving this to the development queue.

catch’s picture

Title: Better URL filter in filter module » URL filter breaks generated href tags
Version: 7.x-dev » 6.x-dev

Bugs are still being fixed in 6.x, and at least the footnotes module is caught by this bug requiring careful weighting of the various filters. It's possible the patch may be too much of a change at this stage, but it'd be nice to get more reviews before bumping it.

hingo’s picture

As someone who doesn't follow the dev mailing list, I'll ask here: Is HEAD already taking in new patches or is activity still focused on D6?

catch’s picture

hingo, currently 6.x and HEAD are the same thing.

jpsalter’s picture

I just ran into this problem in 5.7. Has this thread gone cold? Or, is there a dupe somewhere?

My use case:

<p class="foo">www.example.com</p> --> no conversion
<p>www.example.com</p> --> works

Any chance this will make it into 5.x or should I hack core?

hingo’s picture

Hi jpsalter

You've found the right place, this is THE thread and it is still active.

6.0 is now out and this wasn't applied. I'm actually wondering myself what the correct Version tag for this would be? 6.0 (since this bug exists in 6.0) or 6.x (where it should be applied) or 6.1 or 7.x?

I doubt it will be applied to 5.x, but you should be pretty safe applying it yourself. From what I can tell the _filter_url() functionality has been untouched through the whole 5.x series to 6.x. Even if the patch wouldn't apply you could still just replace the relevant functions with copy pasting.

jpsalter’s picture

Thanks for the confirmation and all your hard work. I hope others recognize it too.

keith.smith’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs review » Needs work

This should be fixed in HEAD (now 7.x-dev) and then backported if appropriate.

The patch in #22 has some code style issues with comments (lacks periods at ends of comments, for one).

hingo’s picture

Status: Needs work » Needs review
FileSize
6.19 KB

Thanks Keith

Old habits die hard... I already had the periods in place, but rewriting parts of the code it seems I introduced periodless comments again.

This patch 0.8
- fixes style issues with comments (periods)
- diff against 7.x-dev (these functions untouched but file changed so line numbers changed from 6.x-dev)

hingo’s picture

Could someone please see if there are any more punctuation or style issues with the comments and help get this committed to 7.x.

I first submitted this patch during my last years Summer vacation, and now the snow is melting and the grass is green again, I would be disappointed not to have such a straightforward fix committed before my next Summer holiday due in 3 months...

keith.smith’s picture

Status: Needs review » Needs work

hingo: Well, since you ask, on a quick read-through, we do use 'HTML' and 'URL' in comments (rather than 'html' and 'url', respectively).

hingo’s picture

Thanks Keith

Would you find in yourself enough energy to see if there is still something that needs to be corrected. I'm afraid next you'll tell me that www should also be uppercase? Speaking of which, is there an official style guide where I could get all this advice in one shot?

v0.9
* style: uppercase HTML and URL

hass’s picture

http://drupal.org/coding-standards is a good starting point and try coder module... it helps, too.

hingo’s picture

Thanks. Looking at it now I realise I already read through it last summer. But it isn't as detailed as some of the advise I got here, which is more like grammatical stuff almost. *Which I'm completely ok with*, btw, it just seems there is an additional "oral tradition" to style issues at Drupal, so it seems I will have to continue relying on the grace of core developers giving some feedback every once in a while.

keith.smith’s picture

hingo: Some of this is contained in pages similar to the coding-standards page (but that are more general in scope), like http://drupal.org/about/authoring.

hingo’s picture

Ah, Thanks Kev. Had not seen that one.

In any case, we are digressing now a bit from the main question? What should yet be done so that this can be committed to 7.x?

hingo’s picture

Status: Needs work » Needs review
floretan’s picture

@hingo: Can you provide some sample content with URLs, some of which are inappropriately processed by the current filter, so that the efficiency of this patch can be tested?

hingo’s picture

FileSize
2.08 KB

Attached is my most developed "test case". The patch works good with all of these. The current url-filter fails at least:
* urls inside a dl list (not processed, should be)
* urls in a title attribute in a link (are processed, shouldn't)
* urls in html comments (are processed, shouldn't)
* urls inside some other html tags also are not processed but should
(This is out of my memory, I haven't used the official urlfilter in a while :-)

hingo’s picture

ping

floretan’s picture

Issue #249200: Test individual filters. adds some test cases that show the failure of the URL filter in the cases mentioned in #43. Please review and extend those tests, and it will help to get a revised URL filter into core.

floretan’s picture

FileSize
6.3 KB

Re-rolled patch from #36.

The tests written in #249200: Test individual filters. have been committed. 12 assertions fail with the current link detection algorithm, but all of them pass successfully with this patch.

hingo’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, flobruit. I'm immediately going to take the benefit of not being the submitter of the last patch anymore and put a reviewer hat on. Let's see if we can finally get this committed!

/me reviewing... Looks good!

We've had several people looking at this without complaints for months now, and now we also have the unit tests committed in filter.test that show the improvement over the existing url filter. I vote for committing the patch by flobruit in #46.

catch’s picture

Still applies with offset, and fixes those failures in filter.test (although there's some other unrelated ones).

catch’s picture

The other failures in filter.test are dealt with here: http://drupal.org/node/266539

sun’s picture

Status: Reviewed & tested by the community » Needs work

Please remove line-break here:

+    // Note: PHP ensures the array consists of alternating delimiters and literals
+    // and begins and ends with a literal (inserting NULL as required).
+
+    // If an ignoretag is found, it is stored here and removed when the 
+    // closing tag is found. Until then no replacements are made.
+    // Think of this as a stack that always has 0 or 1 items.

Please read coding standards again:

+  for ( $task = 1; $task <= 3; $task++ ) {
+    for ( $i = 0; $i < count( $chunks ); $i++ ) {
+        if ( $opentag == '' ) { 

Please improve these comments:

+      // Even/odd = text/tag.
+      if ( $i % 2 == 0 ) { 
+        // Text...
+
+        // Only do replacements when there are no unclosed ignoretags.
+      else { 
+        // Tag...

Please insert a newline between breaks and cases inside of switch statements:

+              break;
+            case 2:

$task seems not to be defined and poorly named.

+  for ( $task = 1; $task <= 3; $task++ ) {

Please improve these comments:

+          // No open "ignoretag".
+          // Is it an ignoretag? (opening tag.)

We do not use such "reminders" in Drupal: (if that's needed, the function body is likely too large)

+        } // if ( $opentag == '' )
+      } // if ( $i % 2 )
+    } // for chunks

...probably more, but please fix those first.

hingo’s picture

Status: Needs work » Needs review

Wow, this looks like a real review! I have to admit I had given up hope...

Sun,

Please remove line-break here:

+    // Note: PHP ensures the array consists of alternating delimiters and literals
+    // and begins and ends with a literal (inserting NULL as required).
+
+    // If an ignoretag is found, it is stored here and removed when the
+    // closing tag is found. Until then no replacements are made.
+    // Think of this as a stack that always has 0 or 1 items.

Please don't! Those are not the same comment. The 2 first lines refer to the previous line, while the three last ones to the following line. (And since next you are going to say that comments cannot appear after the statement they are a comment on, I might as well add right now that this code is a copy paste from existing code in Drupal core and is like that in 4 other places too.)

Please read coding standards again:

This is a rather unhelpful comment (I'm being polite here) and no, I won't read them again. If there is a style issue, say what it is.

We do not use such "reminders" in Drupal: (if that's needed, the function body is likely too large)

Sure, they can be removed. (This must be the first time anyone complained of there being too many comments in a patch I wrote :-)

...probably more, but please fix those first.

Sorry, this won't work. We've had several people now flying in and "helpfully" pointing out punctuation, capitalisation and whatnot issues. I'm not interested in that game anymore. This patch fixes a known bug in Drupal which several people have complained about. Since September 2007 (that's 9 months, I have made a baby in less!) there hasn't been any code updates, just grammar/style discussion. If we get a list of the style/comment issues that should be fixed we will fix them. But at least for my part I'm not gonna participate in a never ending game of "fix these first" reviews. This is a short patch, it is not too hard for you to give a finite list of problems and we'll fix them and then the patch can be committed.

Thanks.

(FWIW, your comments I didn't comment on are perfectly good comments and I'll be happy to improve on those together with the complete list of remarks that I hope to get soon. I also want to point out that it's obviously not your fault that we've already had too many unhelpful helpful reviewers pass by without any progress, you just happened unluckily to be the last one - in fact your review was the best so far.)

sun’s picture

Status: Needs review » Needs work

We've had several people now flying in and "helpfully" pointing out punctuation, capitalisation and whatnot issues. I'm not interested in that game anymore. This patch fixes a known bug in Drupal which several people have complained about.

Hrm. This is called peer review. We, the community, have advanced in this area a lot in the past. No patch will hit core that contains coding-style issues or any other logical/syntactical weaks. If we would slip such patches in, Drupal core would quickly look like err... tryin' hard to avoid names of other CMS. Please note that I have not tested your patch, just reviewed the code. Before any other core or core-alike developer will stumble upon this issue, you should really try hard to make this patch ready to be committed. That means, your patch should include no coding-style or coding-standards issues, and it must stand the objections of performance-related freaks that are yet to come.

I wouldn't have noted the following if it wasn't contained right in the first code example snippet of the first section of Drupal's coding standards:
Please read coding standards again:

-  for ( $task = 1; $task <= 3; $task++ ) {
+  for ($task = 1; $task <= 3; $task++) {
-    for ( $i = 0; $i < count( $chunks ); $i++ ) {
+    for ($i = 0; $i < count($chunks); $i++) {
-        if ( $opentag == '' ) {
+        if ($opentag == '') {

There's no patch attached.

hingo’s picture

Status: Needs work » Needs review

Sun, I have nothing against peer review. We have been waiting for soon a year now for it to happen, and for nine months this patch has been *nothing but* waiting for peer review to happen. So I'm not objecting to the issues you pointed out, I'm objecting to you saying that there are more that you didn't point out! Please do so.

I'm moving this back to Code Needs Review because that's what we really need here. When we know that we have a final list of issues to correct, I'll correct them.

(Thanks for clarification on that one point.)

jpsalter’s picture

A search-n-replace of '( ' => ')' and ' ) => ')' cleans up the spaces in functions.

There are a few comments left that do not begin with a capital letter and end with a period (mostly comments that end loops:

+ } // if ($opentag == '')
+ } // if ($i % 2)
+ } // for chunks

I'm not sure of the Drupal policy of closing } having comments. Maybe it is easier to remove them.

I've been struggling to get my code to meet Drupal's conventions. It is hard to undo years of coding habits. But, after taking the pain on a module or two -- it is getting easier.

Thanks for keeping at this.
Hope this helps.

catch’s picture

@hingo: there are over 270 patches in the queue against D7 that need review, some of those patches have been around a lot longer than this one, and been through many, many more re-rolls, even for comment style issues.

I didn't look at the patch yet, just ran it through tests, but will do so tomorrow.

hingo’s picture

jpsalter, catch: Thanks.

@catch: Sure, and I understand this is how projects like Drupal works, and I'm not in a hurry, for myself I can apply the patch anyway and if no-one else cares then it is ok to focus on higher priority issues.

What is unproductive imho, is that there are people passing by, pretending to do a review when it's just really very very superficial. So for instance in this last instance, 1) the reviewer himself says that his review is incomplete and he may come back for more comments later and 2) he later also says "I didn't even look at the code yet" (which I didn't want to unnecessarily point out, but was also apparent from the first comments, sorry to say). From my experience if those are fixed, then for 2 months it will be quiet again, then someone else will pop by saying "Here are a few more comments but this is not all..." And at no point in these reviews are we really moving closer to actually committing anything.

So by all means, feel free to leave even more such comments, but there is no need to change the Status to "code needs work" when you know yourself you didn't really do a proper review. What this patch needs is more review, and when we've had a proper review it's just 1 minutes work to incorporate all the issues also from the partial reviews, such as those made by Sun.

So what I'm waiting for now is for someone to say: "I have really looked at the code *and* comments and here are *all the issues as far as I can see*." Even better if we get someone to say: "After fixing these I can (or I will propose) commit this patch."

catch’s picture

FileSize
6.29 KB

Attachment problems again. This should get us at least part of the way with the code style issues.

floretan’s picture

FileSize
7.75 KB
// Pass length to regexp callback.
_filter_url_trim(NULL, variable_get('filter_url_length_' . $format, 72));

Why pass a drupal variable as a parameter to _filter_url_trim()? We can simply access that variable from inside the function.

$chunks = preg_split('/(<.+?>)/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
// Note: PHP ensures the array consists of alternating delimiters and literals
// and begins and ends with a literal (inserting $null as required).

This comment was copied from _filter_htmlcorrector(). $null should be NULL.
Also, when positioned after the line of code, the comment is not really clear. What array?
I suggest this variant (the same comment appears in other functions in filter.module, I changed those as well):

$chunks = preg_split('/(<.+?>)/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
// Note: The return value of preg_split() consists of an array of 
// alternating delimiters and literals that begins and ends with a 
// literal (inserting NULL as required). 

In most of core, drupal uses / or ! to delimit regular expressions. The use of ` isn't wrong, but we might as well make it consistent with the rest of the code.

Finally, a PHP-esque

foreach (range(1, 3) as $task) {

would look better in this context than the C++esque

for ($task = 1; $task <= 3; $task++) {
catch’s picture

FileSize
7.75 KB

@flobruit: those changes all look good.

Found a couple more stray spaces.

chx’s picture

FileSize
7.63 KB

I have removed a bit of code duplication and simplified a bit.

catch’s picture

Priority: Normal » Critical
Status: Needs review » Reviewed & tested by the community

Much tidier, and it still fixes all the failures in filter.test

Since this is a blocker to all tests passing, marking to critical.

catch’s picture

Status: Reviewed & tested by the community » Needs work

Good old fashioned clicking and posting found:

* notice: Undefined offset: 1 in /modules/filter/filter.module on line 771.

hingo’s picture

Status: Needs work » Needs review
FileSize
6.29 KB

Guys, I'm sorry to say, but this "cleanup work" has broken quite many things. It was nice to see so many genuinly helpful inputs and I'm sorry I wasn't able to follow up instantly on changes you made. However, looking at this patch now, there are several things that have been broken, you ahve each contributed some of it. To simplify this, could we just go back to #62 which is the last working patch as far as I can see.

Things broken in the new patch:

* A crucial } else { has inexplicably disappeared, this is a major bug of course. (missing at line 77 in patch).

* Removing $length parameter from _filter_url_trim() has broken that function ($format is now undefined). Probably buggy at least for long URL's?

Further, I strongly disagree with moving the fetching of the length parameter into the function in the first place. _filter_url_trim() is a generic utility function which should just take an input and return an output, it shouldn't have any drupal stuff inside of it. Utility functions should be generic, fetching a drupal variable from db belongs to the main code.

* I don't agree that moving the patterns into an array and away from their actual regexps calls help readability at all. If you want to be clever, go all the way and make the switch-case statement go away completely (email task needs to be modified to use callback too), that would make code shorter and at least arguably easier to read.

* Minor: in the callback functions I find the new comment "+ // Find the parenthesis in the regexp containing the URL." misleading, since this is not about finding anything. My original comment was just supposed to be a description of what variable $i is, not any actual action.

***

So, back to main topic, attached is the patch from #62 which we need to get back to. This contains the whitespace fixes but none of the restructuring of code which all introduces bugs. (It contains the new comment in the callback functions I don't like as listed above in the last minor point, so that could be reverted in the next edit.)

I'm switching back to code needs review and we can continue discussion from where it was at #62. (ie, shouldn't be any bugs, you may have style opinions you want to re-introduce...)

I'm thankful for more help on this patch, but it is not tolerable to introduce real bugs while discussing style issues. Please be more careful and remember that you are editing a working patch.

chx’s picture

hingo, my patch removed the whole if ($i % 2 == 0) business because I found it ugly. It's not just an else disappeared -- I replace that with an $i++. I am not debating merely code style here. I was thinking on how to remove the switch, but it never came to me to replace that with a preg_replace_callback. Honestly: good idea. (Sigh, I wish we had lambda functions, would be useful here) What bugs did I introduce?

Can you reroll with the if ($i % 2 == 0) and the switch removed?

Edit: I agree wholeheartedly that introducing bugs into working patches is very frustrating. However, I am not introducing new features here, but trying to get a better code (in my eyes, better, at least). Really sorry if I broke your patch -- I know how it feels.

hingo’s picture

Status: Needs review » Needs work

@chx: I tested your patch, with the particular input I had in mind. You're almost right, cause you didn't break what I thought. However, you are to blame for the 3 new notices we get:

# notice: Undefined offset: 13 in /var/www/drupal7/modules/filter/filter.module on line 771.
# notice: Undefined offset: 17 in /var/www/drupal7/modules/filter/filter.module on line 771.
# notice: Undefined offset: 21 in /var/www/drupal7/modules/filter/filter.module on line 771.

(The fourth notice is because of the _filter_url_trim() change introducing a bug.)

This is because the $chunks array always ends with a text chunk, but in your patch the code always ends with trying to process a tag, so these notices are the PHP equivalent of ArrayIndexOutOfBounds complaints. It's not fatal (output actually is still ok) but it is obviously not acceptable either. In a more general description: Your trick doesn't work if there is an odd number of array elements, and in this case there often (always?) is.

I agree that the if($i % 2 == 0) thing may not be the nicest code either. What do you think about the following:
1) Keep my original code hierarcy
2) Replace my modulo stuff with a variable that takes text values "text" and "tag" instead.
I mean:

// $chunks now contains a series of text and tags (literals and delimiters)
// always beginning with text.
$chunk_type = "text";

....

for ($i = 0; $i < count($chunks); $i++) {
  if($chunk_type == "text") {

  ...

   // Done processing text chunk, so next chunk is a tag.
   $chunk_type == "tag";
  }
  else {

  ...

   // Done processing tag chunk, so next chunk is text.
   $chunk_type == "text";  
  }
}

What do you say? Btw, this introduces an "unnecessary" variable just for readability, is that considered good or bad in the Drupal community?

***

Going forward: I need to go to sleep now and will be on a business trip the whole next week. If nothing happens during that time, I will update the patch with the below changes. If someone is still bold enough, feel free to do these updates and continue the debate, then I'll come back a week from now and tell you what I think again.

FWIW, I've been a little cranky here earlier, I am in fact happy now that the last contributions have been addressing code and other substance, not just whitespace and grammar. (Note to self: check whitespace when done, I'd hate to be caught a fourth time for it!)

TODO, wrt patch in #68:

1) Agree with chx (and others) how to replace if($i % 2 ... and replace it

2) Make the switch statement go away completely. Email regexp processing needs to be made into using a callback function too, and then use the name of each callback function as $task (See $patterns array in#65). (This is another "clever" thing I'm still lukewarm about, but it will make code significantly shorter and less indented, so if we do it with good comments I hope it will increase readability. I really never liked that switch, have to admit.)

3) Do NOT make any changes to _filter_url_trim().

4) Revert the changed comment inside the callback functions so that it is a comment introducing variable $i. If there is need to comment further on what the function is doing, do that too.

5) Double check, but I believe all issues in #50 have been addressed. (Either included or rejected.)

No patch attached, I'll be back in a week (plus perhaps a few days).

hingo’s picture

Status: Needs work » Needs review
FileSize
7.41 KB

Attached patch implements changes as discussed in comment #70. There are no outstanding issues so patch is ready to be reviewed and committed.

catch’s picture

All filter tests pass, no notices. Not looked at code changes.

hingo’s picture

Just a short note to everyone following this thread. I have today committed the above version of URL filter into the Footnotes module, of which I'm the maintainer. You can now easily install and use a working version of URL filter by

  1. Download Footnotes 6.x-2.x-dev snapshot (it is safe, there is no wild development going on.
  2. Activate the Footnotes module
  3. Replace "URL filter" with "Better URL filter" everywhere
  4. Note: You can take advantage of the Better URL filter whether you use Footnotes or not.
  5. Note: Even though this is a Drupal 6.x module, it can be installed into earlier versions too.

Please see #281143: Better URL filter now available within Footnotes for more info. Please comment on Better URL filter in that thread and not here!

I hope this will resolve the standstill we have here and users who need a working URL filter can finally have it without letting this discussion hold them hostage for another year or two.

cwgordon7’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
7.47 KB

This causes core filters tests to pass, well done. I've looked through the code, and all looks good, except for one thing:

-  // Use +3 for '...' string length.
-  if (strlen($text) > $_length + 3) {
+  if (strlen($text) > $_length) {

That makes me very very very unhappy. I don't have the heart, though, to mark it back to cnw, so rerolled (just those lines changed) and happily rtbc'ing instead. :) Please don't credit me on commit.

cwgordon7’s picture

Please excuse mislabeled patch in previous comment. It is the correct patch, only mislabeled. :)

catch’s picture

I've patched and run tests with this, confirmed RTBC.

hingo’s picture

@cwgordon

I originally removed the +3 since I don't understand the logic of it. If anything, it should be -3.

Now, if user has set max length for URL's to be 72 (for instance), then we should cut all URL's that are longer than 72, not longer than 75. Even so, this is not a major issue for me, so I'm letting your patch stand. I'm just saying I don't understand it and I don't see it doing what the comment says it is doing.

catch’s picture

Title: URL filter breaks generated href tags » URL filter breaks generated href tags (and filter.test)

;)

cwgordon7’s picture

Bump.

dmitrig01’s picture

FileSize
8 KB

don't smashtogether variable names. This patch moves comments around, removes windows linebrakes, and renames some variables. no actual logic has changed.

catch’s picture

Status: Reviewed & tested by the community » Needs work

Patch adds an extra _filter_url_parse_full_links() rather than replacing it.

catch’s picture

#260484: Regression: Filter tests fail was committed, so those tests need to be rolled in with this patch.

hingo’s picture

Indeed, patch #80 doesn't seem to remove much lines at all, same problem also with the other *_parse_*_links() functions for starters. Very unhelpful.

As I'm the original proponent of this patch, I guess it is fair to announce publicly that what happened in #260484: Regression: Filter tests fail made me loose whatever little faith I had left in this project and I don't intend to lift another finger working on this patch anymore.

(The good news is I will release version 2.0 of Footnotes during the weekend and users can just enable a working URL filter (this patch) from there instead, as it will be included.)

catch’s picture

Status: Needs work » Needs review
FileSize
17.26 KB

I've gone through and re-rolled #74 with some of the comment and variable naming improvements in #80 + the filter tests themselves. Back to needs review so we won't just be pretending we have no bugs in core :p.

hass’s picture

Why are you not using "example.com" for tests?

catch’s picture

FileSize
16.79 KB

These tests were just rolled back from core this morning, I didn't change anything when rolling into this patch. Either way, fixed.

catch’s picture

FileSize
16.8 KB

Missed some.

catch’s picture

doppelpost

catch’s picture

FileSize
16.96 KB

Whoops: http://drupal.org/node/260484#comment-966252

re-rolled to get this back to a functioning state, although it'd be nice to clean this up later so it's not using non-legit domain names.

Damien Tournoud’s picture

Assigned: hingo » Damien Tournoud
Status: Needs review » Needs work

I'm tired of this, mine for a few hours.

catch’s picture

Status: Needs work » Needs review
FileSize
15.92 KB

Re-rolled #89 and marking back to needs review since it still passes its own tests. We should either apply this, or get expectedFail() in. More elegant solutions can come later, leaving stuff broken in core when there's a working patch with tests for this long is bad.

Status: Needs review » Needs work

The last submitted patch failed testing.

lilou’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

webchick’s picture

Status: Needs work » Needs review

Oh you silly, silly test bot.

Also, wow. Somehow this managed to stay off my radar for a very long time. I guess my RTBC queue sprints and this patch's CNW statuses were not well-timed at all. Really sorry for that, hingo. :( You might have better luck asking for reviews in IRC? There's usually a group of us hanging out and talking D7 at all hours of the day.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Marking back to RTBC per #74 and pending non-broken test bot. There's been no significant changes to the patch since then apart from re-rolls if you don't count going around in circles.

webchick’s picture

Status: Reviewed & tested by the community » Needs work

This no longer applies after committing #276597: TestingParty08: filter.module. Some tests for the URL Filter were added as a part of that patch, so the tests here will have to be re-worked slightly to fit in there. Sorry for that, but I had to make a judgment call and XSS filter testing seemed pretty important.

Here's the format for assertions used in the other patch:

+    $f = _filter_url('http://www.example.com/', 'f');
+    $this->assertEqual($f, '<a href="http://www.example.com/" title="http://www.example.com/">http://www.example.com/</a>', t('Converting URLs.'));
+
+    $f = _filter_url('http://www.example.com/?a=1&b=2', 'f');
+    $this->assertEqual($f, '<a href="http://www.example.com/?a=1&amp;b=2" title="http://www.example.com/?a=1&amp;b=2">http://www.example.com/?a=1&amp;b=2</a>', t('Converting URLs -- ampersands.'));
+

We should make this patch fit that as well. Having the entire contents of the test string in HEREDOC syntax is very odd. This exercise will also help reviewers to understand the specific issues being fixed by this patch, since right now it's not very clear.

Unsurprisingly ;), the patch itself conforms well to coding standards, and is very well documented, etc. I'm not positive about the logic therein, but if the tests prove it works, we can probably just commit it as is unless someone comes up with any alarming reasons why not to. Although before that, it would be good to have some baseline benchmarks just to ensure we're not increasing the time spent in this function by 400%.

sun’s picture

Only nitpick:

 /**
- * URL filter. Automatically converts text web addresses (URLs, e-mail addresses,
- * ftp links, etc.) into hyperlinks.
+ * URL filter. Automatically converts text web addresses (URLs, e-mail
+ * addresses, ftp links, etc.) into hyperlinks.

PHPDoc and comments should wrap at 80 chars - except the PHPDoc summary.

catch’s picture

Status: Needs work » Needs review
FileSize
8.96 KB

This fixes the comment issue sun noticed, and adds two of wrwrwr's failing tests from http://drupal.org/node/276597#comment-1196760

May or may not get to looking at adding more assertions.

Status: Needs review » Needs work

The last submitted patch failed testing.

webchick’s picture

Note: Henrik contacted me off-list and noted that one real advantage of the HEREDOC syntax is that you can test each of these conditions on "real world" data, rather than in isolation. I think that's a really good point.

Probably what makes the most sense though rather than embedding this data within the test itself is making a separate .html file in the modules/filter/tests/ folder that we can continue to add to as we come up with other crazy ways to break various filters.

sun’s picture

Would be awesome to get this finally done for D7.

The list of tags to exclude should also contain PRE.

catch’s picture

FileSize
7.65 KB

Untested re-roll.

catch’s picture

Status: Needs work » Needs review

Just to see how tests look, not really CNR 'cos this will need some tests adding back in.

Status: Needs review » Needs work

The last submitted patch, filter_url.patch, failed testing.

sun’s picture

Assigned: Damien Tournoud » Unassigned
+++ modules/filter/filter.module	29 Jan 2010 18:25:47 -0000
@@ -1058,24 +1058,100 @@ function _filter_url_settings($form, &$f
+  // List of tags - the content of which must be skipped.
+  $ignore_tags = 'a|script|style|code';

We also need to skip HTML comments.

Powered by Dreditor.

hingo’s picture

HTML comments are properly skipped. Not because of that line, but because of line 70 in the most recent patch.

+    // Split at all tags.
+    // This ensures that nothing that is a tagname or attribute will be processed.
+    $chunks = preg_split('/(<.+?>)/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

It could be added to the preceding code comment of course, if wanting to be pedantic, but it does also exclude html comments.

kenorb’s picture

Title: URL filter breaks generated href tags » URL filter breaks generated href tags (and filter.test)
Assigned: sun » Unassigned
Priority: Major » Critical
Status: Closed (fixed) » Needs work
redndahead’s picture

Status: Needs work » Needs review
FileSize
8.78 KB

This patch fixes the failing tests, plus this doesn't seem right. I don't see this patch adding a variable and one doesn't exist at the moment. I have changed it. Tests now pass.

+++ modules/filter/filter.module	29 Jan 2010 18:25:47 -0000
@@ -1058,24 +1058,100 @@ function _filter_url_settings($form, &$f
+  _filter_url_trim(NULL, variable_get('filter_url_length_' . $filter->settings['filter_url_length'], 72));

Powered by Dreditor.

sun’s picture

Title: URL filter breaks generated href tags (and filter.test) » URL filter breaks generated href tags
Priority: Critical » Normal
Status: Needs review » Needs work

Note that there are quite a few other bug fix patches for URL filter in the queue. Looking at the changes of this patch, it may be worth to tackle the other issues first. Just an idea.

Also, this issue was bumped to critical in 2008. Filter tests pass in the meantime, so this is just a regular bug.

+++ modules/filter/filter.module	22 Apr 2010 18:31:36 -0000
@@ -1333,24 +1333,100 @@ function _filter_url_settings($form, &$f
+  // List of tags - the content of which must be skipped.
+  $ignore_tags = 'a|script|style|code';

Still seems to be missing PRE.

+++ modules/filter/filter.module	22 Apr 2010 18:31:36 -0000
@@ -1333,24 +1333,100 @@ function _filter_url_settings($form, &$f
+  // is to return the replacement for the match. The array is used and ¶

Trailing white-space.

+++ modules/filter/filter.module	22 Apr 2010 18:31:36 -0000
@@ -1333,24 +1333,100 @@ function _filter_url_settings($form, &$f
+  // Note: The ICANN seems to be on track towards accepting more diverse top level domains,
+  // so this pattern has been "future-proofed" to allow for TLD's of length 2-64.
...
+    // This ensures that nothing that is a tagname or attribute will be processed.
...
+    // Note: PHP ensures the array consists of alternating delimiters and literals
...
+    // closing tag is found. Until the closing tag is found, no replacements are made.
...
+          // There is an $ignore_tag open. See if this is a matching closing tag.

Exceeds 80 chars.

94 critical left. Go review some!

redndahead’s picture

Status: Needs work » Needs review
FileSize
8.82 KB

This integrates sun's comments.

hingo’s picture

@sun

Note that this patch will fix almost half of the other URL filter bugs since most of them are just various symptoms of what this patch fixes. So it certainly makes sense to take this first, then see how many other bugs you can immediately close. #190466, #168227, #191744, #728380, #555280... from the first page of the list should all go away. (This is also the reason this was marked critical, the old URL filter fails in quite many ways and this addresses the root problem.)

For tests passing, it is because tests were removed, not because bugs were fixed. Add back the tests from comment #91 and you'll see a different picture. (How critical the issues are is of course another thing, but as you can see, there are quite many bugs filed against URL filter that are various permutations of this one.)

PRE tag: Personally, I don't see a reason why URL's inside a PRE tag couldn't be made clickable, but don't have a strong opinion either way. I agree that a strict interpretation of "preformatted" means there should be no other formatting inside the tag.

Only thing to still change:

Please use the tests from #91. The current "one liner" tests will not actually catch many of the bugs this patch addresses. It is important to test on data that contains multiple strings to make clickable, and multiple rows, and multiple paragraphs. If HEREDOC syntax is frowned upon, please separate the big test data chunk into a separate file and use it from there.

On a quick reading the last committed patch is good and would pass all tests from #91, so committing it is ok. But since the tests were written, I wouldn't mind adding them to the test harness too. This seems to be an area where some contributors may easily break things.

PS: I'm glad to see this patch is finally getting some serious attention. I'm happy to help if there are any questions. Unfortunately I have been affected by the European flight chaos, and will now embark on a 5 day trip back home, but will be back online mid next week.

klausi’s picture

FileSize
17.3 KB

Same patch as in #111 plus reused tests from #91.

David Latapie’s picture

How would your patch compare to HTML Purifier? It lloks like we have at least three different tools to do the same thing (core URL filter, forked URL filter, HTML Purifier module). And I see no advantage in this.

Thanks in advance for your answer.

hingo’s picture

(This is based on a quick read of HTML Purifier code, not actually installing it.)

HTML Purify mainly does the same job as HTML filter (making HTML safe). But indeed, it contains also the "Linkify" class as some kind of addon service.

In general HTML Purify seems to be a rather complex beast comparing with Drupal's very simple core modules for doing the same thing. Also it is (probably?) based on actually validating the HTML with an HTML parser (correct, but slower), whereas Drupal filters tend to just use some regexps to get their job done (good enough if done right, faster).

As for the URL filter vs Linkify comparison, it is surprising to note that my patch here addresses more URI types and the Linkify class in HTML Purifier turns out to be rather simple. In fact also the original URL filter in Drupal core does more than the one in HTML Purify. This can probably be explained by the fact that this is not the main task of HTML Purify, so the implementation has been left as rather simplistic.

HTML Purify will only convert into links strings beginning with
http://|https://|ftp://

This patch is more complete and will catch all of:
http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://

Also the original Drupal URL filter was more versatile than the one in HTML purify.

In addition both this patch and the old URL filter also convert email addresses (...@...com) and "plain" URL's without the "http://" (www.example.com or ftp.example.com).

In summary, Drupal's URL filter does more than the one in HTML Purify, and is (very probably) simpler and faster.

(And to those just tuning in: there is no difference with this patch vs the forked URL filter, they are the same. The difference between this patch and the old Drupal URL filter is that the old filter is rather buggy, it will miss many URL's that should be converted (especially if other HTML is involved) and it will mangle some places it shouldn't touch.)

hingo’s picture

Priority: Normal » Critical

I didn't notice earlier that this was downgraded from critical in #110. As I explain in comment #112, the statement that "tests pass" is not correct and the argument that there are many other bugs in the queue is questionable too, since this patch will fix about half of those bugs! I apologize for not setting back to critical at that point, I was stuck in an airport somewhere with bad connectivity.

If there is some other reason not to regard this as critical, feel free to revisit, I'm just saying the earlier downgrade seems to have been done based on wrong assumptions.

redndahead’s picture

#113: 161217-url-filter.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 161217-url-filter.patch, failed testing.

sun’s picture

Priority: Critical » Normal

A bug in one of Filter module's filters is very very unlikely to be more important than normal. That is, because you can replace any filter with one of a contributed module, and D7 makes this even easier (e.g., swapping out the processing callback only).

Anyway, let's fix those tests. Will try to do a fresh review asap.

sun’s picture

Priority: Normal » Major

ok, talked to hingo. Given that this patch seems to fix a full range of other open bug reports filed against URL filter currently, promoting to major.

Ideally, we should write (only) tests for those other issues to show that they fail.

+++ modules/filter/filter.module
@@ -1333,24 +1333,104 @@ function _filter_url_settings($form, &$form_state, $filter, $format, $defaults)
+  // Create an array which contains the regexps for each type of link.
...
+  $tasks = NULL;

Why do we initialize a variable with NULL that's supposed to be an array?

+++ modules/filter/filter.module
@@ -1333,24 +1333,104 @@ function _filter_url_settings($form, &$form_state, $filter, $format, $defaults)
   // Match absolute URLs.
...
+  $replacement = "`($url_pattern)([\.\,\?\!]*?)`i";

I don't think we need to escape .,?! here. Also, I wonder why *? instead of just ? ? Lastly, we don't need that matched value, so this should cut it:

$replacement = "`($url_pattern)(?:[.,?!])?`i";

Or am I missing something? If I do, then the code is missing an inline comment.

+++ modules/filter/filter.module
@@ -1359,20 +1439,39 @@ function _filter_url($text, $filter) {
+  return '<a href="' . $match[$i] . '" title="' . $match[$i] . '">' . $caption . '</a>' . $match[$i+1];

IIRC, the (duplicating) title attribute has been removed by another patch in the meantime.

+++ modules/filter/filter.test
@@ -1059,6 +1059,114 @@ class FilterUnitTestCase extends DrupalUnitTestCase {
+$text=<<<END
+Testing wwwstring with period at end www.example1.com. Testing email with period at end person@example2.com. Testing HTTP URL with period at end http://www.example3.com. Also test < code>using www.example4.com the code tag< /code>.
...
+    $this->assertTrue(strpos($filtered_text, 'href="http://www.example1.com"'), t('Parse simple www-string but not the end-of-sentence period.'));

Some phrases in this testing text sufficiently explain what is being tested and expected, but some of them do not. We should make sure that expectations are sufficiently explained.

Alternatively, and I think I'd almost prefer it that way, we could build an array of testing strings and assertions to keep them together, and implode them afterwards, like so:

  $tests = array(
    'Testing wwwstring with period at end www .example1.com.' => array(
      'href="http://www.example1.com"' => TRUE,
      'href="http://www.example1.com."' => FALSE,
      '<a href="http://www.example1.com">http://www.example1.com</a>.' => TRUE,
      '<a href="http://www.example1.com">http://www.example1.com.</a>' => FALSE,
    ),
    // ...
  );
  $string = implode(array_keys($tests), "\n");
  // ...
  foreach ($tests as $tasks) {
    foreach ($tasks as $value => $expected) {
      // Not using assertIdentical, since combination with strpos() is hard to grok.
      if ($expected) {
        $this->assertTrue(strpos($string, $value) !== FALSE, t('@value found in @string.', array(
          '@value' => var_export($value, TRUE),
          '@string' => var_export($string, TRUE),
        )));
      }
      else {
        $this->assertTrue(strpos($string, $value) === FALSE, t('@value not found in @string.', array(
          '@value' => var_export($value, TRUE),
          '@string' => var_export($string, TRUE),
        )));
      }
    }
  }

As you can see, this allows to cleanly define a single test string and test multiple expectations. For the single period (punctuation) test string, we can assert at least 4 things. Larger/multiline testing strings are also no problem.

It might make sense to put the actual testing code into a helper method. So we can build separate $test task lists for individual filter functionality, i.e. $test_absolute, $test_mail, $test_partial, or rather, any separation that would make sense. Technically, I think one could separate by topics characters/encoding, surrounding markup, security, and potentially more - splitting assertions up into such sections probably makes more sense.

In a separate follow-up issue, we should convert most of the existing filter unit tests to use this construct, as we have a lot of awkward assertion messages there.

Powered by Dreditor.

hingo’s picture

Hi sun, and thanks for the additional review. (Would've forgotten about the title attribute otherwise, is *that* checked by a test? :-)

So just for the public record, we did a thorough review and out of 9 open bugs, following are fixed by this patch:

#168227: URL filter fails on URLs surrounded by HTML entities
#877050: URL filter does not convert a link inside a div
#555280: Filter "Convert URLs into links" converts URLS within inline css background-image property

I can work on this during next week.

A few questions to sun:

1)
It would be easy to also fix
#190466: URL filter fails on square brackets
#728380: URL filter fails on parenthesis (RFC2396 Unreserved Characters; i.e., ( and ))
...should I alter the regexps here to cover entire unicode alphabet, or do you want to do that in a separate issue?

2)
When those other bugs have test cased posted, but they are not in D7 CVS yet, should I copy them into this patch?

I'm willing to do both if you answer yes.

sun’s picture

Status: Needs work » Needs review
FileSize
17.49 KB

Re-rolled against HEAD.

1) Let's deal with unicode separately. 2) Let's keep those other issues open, get this one in, and revisit the other issues afterwards.

Status: Needs review » Needs work

The last submitted patch, drupal.filter-url.122.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
16.57 KB

Reworked the tests. Clarified some comments.

The remaining test failures are for real.

Status: Needs review » Needs work

The last submitted patch, drupal.filter-url.123.patch, failed testing.

sun’s picture

Assigned: Unassigned » sun
Status: Needs work » Needs review
FileSize
23.98 KB

Fixed those regular expressions. I think this patch is ready to fly now. Anyone available for review + potential RTBC?

The new tests are entirely replacing the old tests now, as they're much more precise. I've taken extra care and double-checked that all of the existing assertions are indeed covered by the new ones.

Also, I had to mostly inline similar changes as already proposed in #257193: URL filter: Use filter_allowed_protocols variable, since my new tests tried to assert that a link using gopher:// gets converted, but apparently, that hard-coded list of protocols was totally outdated, and thus, check_url() replaced those invalid bad protocols with http:// ;)

sun’s picture

FileSize
24.14 KB

Additional took over a useful @see and assertion from my original patch over in #257193-5: URL filter: Use filter_allowed_protocols variable

hingo’s picture

Hi sun

Good work pushing this forward and getting the patch to apply again. Here's a review. I have errands today, but can work on these myself too if nobody else does it before me.

+  // Prepare protocols pattern for absolute URLs.
+  // check_url() will replace any bad protocols with HTTP, so we need to support
+  // the identical list. While '//' is technically optional for MAILTO only,
+  // we cannot cleanly differ between protocols here without hard-coding MAILTO,
+  // so '//' is optional for all protocols.
+  // @see filter_xss_bad_protocol()
+  $protocols = variable_get('filter_allowed_protocols', array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal', 'rtsp'));
+  $protocols = implode(':(?://)?|', $protocols) . ':(?://)?';
+

As you note in #257193: URL filter: Use filter_allowed_protocols variable, / and : are valid characters anywhere in the URL. So we could simply omit the addition of (?://)? and allow those characters liberally in the next part of the URL. The current regexp doesn't however allow them. So we would need to back to the plan of allowing most unicode characters anywhere in a URL. It can either be done right now, or we let the above stay as it is and then remember to remove it once we change the other regexp so that the above (?://)? isn't needed anymore.

(For clarity, the above needn't necessarily be fixed in this path.)

+  $domain = '(?:[A-Za-z0-9._+-]+\.)?[A-Za-z]{2,64}\b';
+  $ip = '(?:[0-9]{1,3}\.){3}[0-9]{1,3}';
...
+  $url_pattern = "[A-Za-z0-9._-]+@(?:$domain)";

Good call to add $ip (whoever did it).

Same comment here: All of this will be much simpler if we in the future change this regexp to something that just matches a string of unicode characters until next whitespace. (Nothing to change here.)

From #120 above:

Also, I wonder why *? instead of just ? ? ... Or am I missing something? If I do, then the code is missing an inline comment.

The *? is supposed to catch multiple trailing punctuation such as www.example.com... or (blablabla www.example.com.) or example.com!!! Note that it may not always repeat the same character, eg .) must match too.

+  $pattern = "`($url_pattern)`i";
+  $tasks['_filter_url_parse_email_links'] = $pattern;

It seems the email task doesn't contain that regexp at all so me@example.com. wouldn't be matched correctly. (This omission exists already in my patches and original code.)

+    // Split at all tags; ensures that no tags or attributes are processed.
+    $chunks = preg_split('/(<.+?>)/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

The above code correctly ignores also multi-line comments. A separate test for this could be added though, or rather just add \n to the existing test.

That is all for the functional code. I have a meeting now, will also review test code separately (look forward to what you've done :-). Hopefully tonight.

hingo’s picture

Heh, funny to watch how drupal.org itself behaves when I give examples of strings that should or shouldn't be converted.

In this case, this comment:

It seems the email task doesn't contain that regexp at all so me@example.com. wouldn't be matched correctly. (This omission exists already in my patches and original code.)

...is wrong, it works correctly for other reasons. Please ignore.

sun’s picture

Thanks for reviewing, hingo!

/ and : are valid characters anywhere in the URL. So we could simply omit the addition of (?://)? and allow those characters liberally in the next part of the URL. The current regexp doesn't however allow them.
[...]
All of this will be much simpler if we in the future change this regexp to something that just matches a string of unicode characters until next whitespace.

I may be mistaken, but this is not a valid domain name:

$domain = 'foo/bar:baz.com';
$url = 'http://' . $domain;
# var_dump($url);
(string) 'http://foo/bar:baz.com'

We need a separate pattern for $domain, so as to be able to properly match valid domain names and not match invalid domain names. At minimum, there are domain name length restrictions we have to account for, but also, not all characters can be used in all parts of a URL.

For now, if this is debatable, I'd recommend to move detailed discussion about protocols into #257193: URL filter: Use filter_allowed_protocols variable

Good call to add $ip (whoever did it).

Yeah, I had to get those tests passing. ;)

The *? is supposed to catch multiple trailing punctuation such as www.example.com... or (blablabla www.example.com.) or example.com!!!

That's what I figured, too. However, we are not interested in further trailing text; just want to optionally exclude a certain (single) trailing punctuation character from the URL, if there happens to be one.

I agree that e-mails should get that punctuation treatment, too, and we should move the pattern into a new common $punctuation variable.

The above code correctly ignores also multi-line comments. A separate test for this could be added

Good catch! Let's add a separate string test for multi-line comments then. Also not sure whether we should add another containing HTML markup within a comment.

sun’s picture

FileSize
24.51 KB

Alright, incorporated all points from #130.

Problem: URLs in multi-line comments are not ignored. @hingo: Can you fix that?

Status: Needs review » Needs work

The last submitted patch, drupal.filter-url.131.patch, failed testing.

hingo’s picture

I'm not at my home computer at the moment, so cannot test, but since you are so active on this, a few quickie comments from my head. (...which may be wrong).

Problem: URLs in multi-line comments are not ignored. @hingo: Can you fix that?

I can take a look later, but more importantly I actually tested a multi-line comment before I wrote my previous review and it worked correctly. I only had one newline, is your problem with more than that?

Adding yet another test with html markup in comments is a very good idea. (That may actually break, I realize now! The URL should be after the html...)

The *? is supposed to catch multiple trailing punctuation such as www.example.com... or (blablabla www.example.com.) or example.com!!!

That's what I figured, too. However, we are not interested in further trailing text; just want to optionally exclude a certain (single) trailing punctuation character from the URL, if there happens to be one.

Yes, but now if you have such multicharacter punctuation, it stops on the last punctuation character, not the first? Ie it produces <a href="http://www.example.com..">www.example.com..</a>.

I agree that e-mails should get that punctuation treatment, too, and we should move the pattern into a new common $punctuation variable.

Turns out emails never needed them, since they always end in a letter anyway. (like the m in .com)

Your comment about domain names is correct. But I was thinking there could be other protocols than http which don't have a domain name at the start. Even in mailto, I could see someone having "//@example.com" as their email address (yeah, probably illegal, need to check)

hingo’s picture

Status: Needs work » Needs review

Looked at the failed test now. I think a multiline comment works, it's your p tag that breaks it. The problem is the > character of the p tag matches here:

$chunks = preg_split('/(<.+?>)/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

My first reaction is that we need to handle html comments as a special case. Probably the cleanest solution is to add a step that escapes all html inside comments, like some filters do in the $op="prepare" phase.

sun’s picture

Adding yet another test with html markup in comments is a very good idea. (That may actually break, I realize now! The URL should be after the html...)
...
My first reaction is that we need to handle html comments as a special case.

If I'm not mistaken, then #222926: HTML Corrector filter escapes HTML comments solved a similar problem for the HTMLCorrector filter in D6 (where it still was regex-based). We may borrow a solution attempt from over there.

Yes, but now if you have such multicharacter punctuation, it stops on the last punctuation character, not the first? Ie it produces www.example.com.

Let's assert this in a test. All of our expectations have to be asserted.

But I was thinking there could be other protocols than http which don't have a domain name at the start.

Let's not overcomplicate things here. ;)

hingo’s picture

Here is a review of the tests, finally. As before, if nobody else is faster, I will also implement these myself asap.

Since I haven't really looked at these tests before, I'm reviewing the filter.test file itself, not just the most recent patch.

Line 996

    // Filter selection/pattern matching.
    $tests = array(
      // HTTP URLs.
      '
http://example.com or www.example.com
' => array(
        '<a href="http://example.com">http://example.com</a>' => TRUE,
        '<a href="http://example.com">example.com</a>' => FALSE,
        '<a href="http://www.example.com">www.example.com</a>' => TRUE,
        '<a href="http://www.example.com">http://www.example.com</a>' => FALSE,
      ),

There is no reason to test both positively and negatively the same test. If the first test passes, then the second test automatically passes. Same for 3 and 4.

Also it is in general not a very useful approach to test some specific ways things could go wrong, since there are always a million ways things go wrong. We should just assert that the result is correct and that's enough. If it is known to be correct, we also know it is not wrong.

In other words this is sufficient:

    // Filter selection/pattern matching.
    $tests = array(
      // HTTP URLs.
      '
http://example.com or www.example.com
' => array(
        '<a href="http://example.com">http://example.com</a>' => TRUE,
        '<a href="http://www.example.com">www.example.com</a>' => TRUE,
      ),

This comment applies to many of the following tests too.

Line 1050:

      // Absolute URL protocols.
      '
https://example.com,
ftp://ftp.example.com,
news://example.net,
telnet://example,
irc://example.host,
ssh://odd.geek,
sftp://secure.host?,
webcal://calendar,
rtsp://127.0.0.1,
not foo://disallowed.com.

Maybe add cross referencing comments here and line 1325 in filter.module so that one remembers to update the test if the list of supported protocls changes.

Good that you also test for a not supported protocol.

Line 1078:

    // Surrounding text/punctuation.
    $tests = array(
      '
Partial URL with trailing period www.partial.com.
E-mail with trailing period person@example.com.
Absolute URL with trailing period http://www.absolute.com.
Query string with trailing period www.query.com/index.php?a=.
'

Need to add back examples with multiple trailing punctuation characters from comment #128.

Line 1194:

      '
<!--
Skip any URLs like www.comment.com in comments.
<p>Also ignore http://commented.out/markup.</p>
-->

Learning from your previous comment, this test is ambiguous since when it fails you don't know if it fails on the HTML tag or because of the newlines. My favorite solution would be to also add newline to the previous test. Alternatively remove newline here and have a separate third test for it.

Ok, so now we have good test coverage.

From the email discussion we had, the approach you showed in building tests suggests it would be possible to easily test all HTML tags by iterating over an array that contains them. (You seem to have further developed that approach here so now it's slightly different.) Now we just test for DL list, P and some other random picks. But for me it is perhaps better to postpone that exercise to a separate issue and close this with the tests we have.

Even if I've already objected to it many times, you have now removed the one test I actually provided. The current tests do cover the various issues well in isolation, however it is also important to test on a larger document that combines many of the cases into one document. With the work you have done, it needn't be as complex as the one I did earlier, but I want to add back one test which would start with the following comment:

// It is important to also test on a more complex document than just short one 
// liners testing one function in isolation.  Note that common regexp errors 
// include: accidental (greedy)* match instead of (minimal)*? match, only 
// match first occurrence rather than all, and newlines not matching .*.
// Additionally early versions of URL filter had bugs in cases where you have
// combinations of the 3 types of URLs to convert.
// This test covers:
//  * Document with multiple newlines and paragraphs (two newlines).
//  * Mix of several HTML tags, invalid non-HTML tags, tags to ignore and
//    HTML comments.
//  * Empty HTML tags (br, img)
//  * Mix of http://... email@... and www... strings in same document.

PS:

But I was thinking there could be other protocols than http which don't have a domain name at the start.

Let's not overcomplicate things here. ;)

That's why I'm interested in the unicode support: the reason that is a tempting path to explore is that the regexps would be much simpler, like ($protocol:\S)$punctuation*?\s

sun’s picture

FileSize
24.43 KB

Fixed those tests. I need you to fix the multiline comments, not the tests. :)

Status: Needs review » Needs work

The last submitted patch, drupal.filter-url.137.patch, failed testing.

hingo’s picture

Status: Needs work » Needs review
FileSize
1.16 KB
28.87 KB

Ok, here's a patch that
- reverts email handling to not use the punctuation stuff (I tried to say so in #129)
- includes fixes for the tests from my previous comment, including the test with large document
- adds back support for multiple trailing punctuation (and tests)
- correctly handles commented out HTML
- since I needed to touch the code anyway, finally fixes _filter_url_trim(). See http://drupal.org/node/260484#comment-966659

The code added to handle HTML comments definitively needs review.

I didn't use anything from #222926: HTML Corrector filter escapes HTML comments since it seems they don't have a working patch themselves. Maybe they can borrow my solution :-)

(Hmm... I cannot add new files with a diff, so I'm attaching a tar file separately that contains a new test/ directory.)

hingo’s picture

Note that automated test will obviously fail now that 2 new files are missing from the patch. Tests naturally pass on my own machine.

Status: Needs review » Needs work

The last submitted patch, drupal.filter-url.137.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
33.99 KB

Yet another reason to put .test files into a /tests subdirectory... grml.

Status: Needs review » Needs work

The last submitted patch, drupal.filter-url.141.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
36.21 KB

Fixed those tests. Not entirely happy with the new test document -- takes hours + lots of debugging tools/code to find the actual cause for failing assertions. But we can go with this for now.

hingo’s picture

Line numbers refer to the latest patch, not the source file:

Line 230:

@@ -1350,7 +1506,7 @@ function _filter_url_trim($text, $length
   }
 
   // Use +3 for '...' string length.
-  if (strlen($text) > $_length + 3) {
+  if ($_length && strlen($text) > $_length + 3) {
     $text = substr($text, 0, $_length) . '...';
   }

Line 552:

+    $filter->settings['filter_url_length'] = 20;
+    $tests = array(
+      'www.trimmed.com/d/ff.ext?a=1&b=2#a1' => array(
+        '<a href="http://www.trimmed.com/d/ff.ext?a=1&amp;b=2#a1">www.trimmed.com/d/ff...</a>' => TRUE,
+      ),
+    );

Do you want to elaborate on why you have switched back to the broken version of _filter_url_trim()? In #141 the test was correct and the code was wrong. Now you have changed the code to match the test and they are both wrong.

In particular, you set filter_url_length to 20 characters, but now it cuts after 23 characters. (For instance, try with a URL with 22 characters and you see what I mean.) This is a minor issue, I'm just curious why you took the extra effort to remove my correct code?

**

FWIW, I'm ok with changes to _filter_url_escape_comments(). Using boolean variable doesn't semantically contain information about everything that the argument does, but I guess it saves so many bytes it is ok.

736:

Index: modules/simpletest/drupal_web_test_case.php
===================================================================
RCS file: /cvs/drupal/drupal/modules/simpletest/drupal_web_test_case.php,v
retrieving revision 1.226
diff -u -p -r1.226 drupal_web_test_case.php
--- modules/simpletest/drupal_web_test_case.php	22 Aug 2010 15:31:18 -0000	1.226
+++ modules/simpletest/drupal_web_test_case.php	22 Aug 2010 20:23:44 -0000

This is probably here by mistake? Also, it just moves a function a little bit, doesn't seem to do anything.

sun’s picture

1) Sorry for reverting that URL length trimming change. I don't think it was mentioned in this issue yet, so I wasn't aware that it really belongs to this patch. I'd prefer to move this fix into a separate issue.

2) The verbose() method is moved from DrupalWebTestCase into the base class, DrupalTestCase, as it cannot be used in unit tests currently. To be able to debug URL filter's processing of the entire document, verbose() is needed, as that's the only way to look at the actual results.

So it looks like we're done here?

hingo’s picture

1) Ok. It is a different issue and can be handled separately. I mentioned it in #139. I'm ok with taking it away, just wanted to raise a flag since you reverted it without commenting.

2) If it was intentional, I have no comment, I'll focus on filter.module only.

+1 for #144 then.

sun’s picture

Status: Needs review » Reviewed & tested by the community

I guess you meant RTBC.

sun’s picture

This patch blocks various other issues.

Dries’s picture

Instead of calling those files filter.url.input.test.txt, can't we call them filter-url-input-test.txt?

sun’s picture

FileSize
160.31 KB

Since that is the only review issue, here's a proper patch. Don't forget to:

cvs rm modules/filter/filter.test
cvs add modules/filter/tests/filter.test
cvs add modules/filter/tests/filter.url-input.txt
cvs add modules/filter/tests/filter.url-output.txt
webchick’s picture

Status: Reviewed & tested by the community » Needs work

We need to back out moving modules/filter/filter.test to modules/filter/tests/filter.test. That's inconsistent with every other module in core, and has nothing to do with fixing this problem.

sun’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
35.4 KB

That's not true. We have approx. the same amount of .test files directly in the module's folder as in a tests/ sub-folder. Additionally, having all testing stuff together in one location is only beneficial. Basically eliminates the t() in test assertion messages issue.

Anyway.

redndahead’s picture

Just to support Sun's claims these modules use "tests" directories.

Aggregator
Block
Field
File
Image
Locale
Node
Openid
RDF
Search
Simpletest
Trigger
Update

sun’s picture

Let's discuss that detail in a separate issue. Latest patch no longer moves filter.test into the tests sub-directory. But of course, the testing text files have to be in there.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks!
Let's talk about the location of tests files in a seperate issue indeed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

redndahead’s picture

Title: URL filter breaks generated href tags (and filter.test) » URL filter breaks generated href tags
Assigned: Unassigned » sun
Priority: Critical » Minor
Status: Needs work » Closed (fixed)

grumble grumble spammers.

ckng’s picture

This has not been backport to D6, open as new issue #1955590: URL filter breaks generated href tags (D6)

ckng’s picture

Issue summary: View changes

Care home Swadlincote

sun’s picture

Issue summary: View changes

Copy of the revision from April 2, 2011 - 13:12.