The freelinking filter seems to choke on embedded URLs that have CamelCase text. For example, if I try to create a link to the CamelCase Wikipedia article by entering the text below:

For more information on WikiWords and CamelCase refer to the Wikipedia article

It tries to filter the word CamelCase in the URL and screws up the link in the output.

How do I prevent this?

Comments

eafarris’s picture

Assigned: Unassigned » eafarris

This is fixed (well, kinda) in HEAD: You can disable CamelCase linking. Still looking for the perfect way to leave URLs alone.

Roland Tanglao@bryght.com’s picture

Isn't it just a "simple matter" :-) of disabling CamelCase detection and automatic link generation inside HTML code, i.e. inside link tags?

moggy’s picture

Version: 4.6.x-1.x-dev » master
Category: support » bug
Status: Active » Needs review
Attached file (status: new, size: 1.42 KB)

OK, this patch should help.

I opted for fixing the damage caused rather than trying to prevent it, so after all the wikiwords are linked in, it then removes nested instances of [code]

It seems to work. Could someone check it out, and perhaps look at the regex? I'm not too hot with it :-s

One side effect: the links are stored in the database and then removed from the text, so they appear never to have existed.

eafarris’s picture

That patch doesn't appear to work in my testing.

moggy’s picture

Attached file (status: new, size: 1.42 KB)

Oops. Told you I wasn't that good with regex :D

Try this one.

eafarris’s picture

Committed an alternative fix to HEAD.

PHP.net's on-line documentation for the PCRE syntax has an example of matching inside and outside of HTML tags. With a slight modification, this seems to do the trick.

Please give this a good test and update the status here as necessary. Many thanks, moggy, for all your work on this issue.

moggy’s picture

Very neat.

It seems to work fine, but it still recognises wikiwords in URLs: it doesn't replace them, but it does store them in the database. I fixed that by adding:

      if (!preg_match($pattern, $text)) {
        continue;
      }
      else {
        ...

after the line $pattern = '/(?![^<]a.*?>)\b' . $wikiword . '\b/';

The patch I've just submitted for #26412 includes this.

moggy’s picture

I'm having problems with the following links. Do they work properly for you?

CammelCase link
[[LinkAgain]]

eafarris’s picture

Status: Needs review » Active

Umm. No. Very strange indeed. What's going on here?

moggy’s picture

Could the problem be this line with the arrays the wrong way round?

 $wikiwords = array_merge($ccmatches[0], $flmatches[0]);

BTW: Found another link that's not right:

<a href="/testing/CammelCase">not a freelink</a> relative with cammel case at end

moggy’s picture

Think I've got it.

The pattern should be
$pattern = '/\b' . $wikiword . '\b(?![^<]*>)/';

not
$pattern = '/(?![^<]a.*?>)\b' . $wikiword . '\b/';

I think this is why, quoting the PCRE documentation:

foo(?!bar) matches any occurrence of "foo" that is not followed by "bar". Note that the apparently similar pattern (?!foo)bar does not find an occurrence of "bar" that is preceded by something other than "foo"; it finds any occurrence of "bar" whatsoever, because the assertion (?!foo) is always TRUE when the next three characters are "bar". A lookbehind assertion is needed to achieve this effect.
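
To make the difference concrete, here is a minimal sketch that runs both patterns against the same text. The sample text, the wikiword, the variable names and the /wiki/ link target are made up for illustration and are not taken from the module; only the two pattern strings come from this thread.

```php
<?php
// Illustrative values only; not taken from the module.
$wikiword = 'WikiWord';
$text = '<a href="/testing/WikiWord">not a freelink</a> and a bare WikiWord';

// Lookahead placed after the word: reject the match when a ">" follows
// with no "<" before it, which means the word sits inside a tag.
$after = '/\b' . $wikiword . '\b(?![^<]*>)/';

// Lookahead placed before the word: it is tested against the word's own
// first characters ("Wi" does not fit "[^<]a"), so it rejects nothing here.
$before = '/(?![^<]a.*?>)\b' . $wikiword . '\b/';

$after_count = preg_match_all($after, $text, $after_matches);
$before_count = preg_match_all($before, $text, $before_matches);

printf("lookahead after: %d, lookahead before: %d\n", $after_count, $before_count);

// Hypothetical link target, for demonstration only.
$linked = preg_replace($after, '<a href="/wiki/$0">$0</a>', $text);
echo $linked, "\n";
```

With this input the first pattern matches once (only the bare word) and the second matches twice (the href value as well). Note that the lookahead only protects text between "<" and ">"; a wikiword used as the visible text of an existing link is not inside a tag and would still match.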

eafarris’s picture

Status: Active » Fixed

Yep, that works perfectly in my testing. Committed to HEAD and 4.6. Many thanks for the assistance and testing!

Anonymous’s picture

Status: Fixed » Closed (fixed)