I suspect this is caused by #1763696: Paths outside of Drupal root incorrectly detected.

It seems that _somehow_ users on the system ended up with newlines inside the href attribute.
For example:

<a href="/sites/all/files/hello_world.pdf
">Hello World</a>

The files are in fact local, but because the newlines make the path invalid, the local-file test wrongly fails.

A temporary solution is to strip out newlines, but I imagine plenty of other characters could break the test too, depending on the OS and file system in use.
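A minimal sketch of the newline-stripping workaround, written in Python for illustration. It assumes a hypothetical `is_local_file()` check that resolves the href against a document root on disk; neither helper is Pathologic's actual code, and the workaround handles CR/LF only, not the other characters mentioned above.

```python
import os
import tempfile


def strip_newlines(href: str) -> str:
    """Remove CR and LF characters that may have ended up inside an href."""
    return href.replace("\r", "").replace("\n", "")


def is_local_file(href: str, docroot: str) -> bool:
    """Hypothetical local-file test: does the href name an existing file under docroot?"""
    path = os.path.join(docroot, href.lstrip("/"))
    return os.path.isfile(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        # Create a stand-in for /sites/all/files/hello_world.pdf.
        target_dir = os.path.join(docroot, "sites", "all", "files")
        os.makedirs(target_dir)
        open(os.path.join(target_dir, "hello_world.pdf"), "wb").close()

        raw = "/sites/all/files/hello_world.pdf\n"
        print(is_local_file(raw, docroot))                  # False: the trailing newline is part of the name
        print(is_local_file(strip_newlines(raw), docroot))  # True
```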

Comments

thekevinday

It seems to happen with htmlspecialchars_decode().
This means that the following issue also contributes to the regression: #1672932: Character double-encoding/erroneous rawurlencode().

Attached is a workaround patch for newlines only.

Garrett Albright

I'm confused. How are the newlines getting in there? Are you saying htmlspecialchars_decode() is inserting them, or that they were in the content originally? If the former, that's very strange, but I'll work around it. If the latter, then I don't understand why you think this is Pathologic's fault. If something else, please explain.

thekevinday

Beats me, that's why my patch is a workaround and not a solution.

I investigated this further, and it turns out the culprit is not htmlspecialchars_decode() but parse_url().
(Which surprised me even more.)

According to the PHP documentation (http://us.php.net/manual/en/function.parse-url.php):

The URL to parse. Invalid characters are replaced by _.

This means that the newline introduced into the content is converted into an '_'.
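To show why the local-file test then sees a different path, here is a Python sketch that mimics the documented behavior quoted above. It assumes "invalid characters" means ASCII control characters (0x00 to 0x1F and 0x7F); the helper `php_like_sanitize` is hypothetical and is an emulation, not PHP's own implementation.

```python
def php_like_sanitize(url: str) -> str:
    """Replace each ASCII control character with '_'.

    Assumption: this approximates the "Invalid characters are replaced by _"
    behavior from the PHP manual; it is not PHP's source code.
    """
    return "".join("_" if ord(ch) < 0x20 or ord(ch) == 0x7F else ch for ch in url)


href = "/sites/all/files/hello_world.pdf\n"
print(php_like_sanitize(href))  # /sites/all/files/hello_world.pdf_
```

The resulting path ends in `.pdf_`, which does not exist on disk, so a file-existence check on it fails.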

The newline could easily have been introduced in several ways, such as:
- The user did not use the WYSIWYG editor and pressed Enter while editing the HTML markup by hand.
- The WYSIWYG editor is set to hard-wrap lines longer than 80 characters, and part of the link extended past that limit.
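The second scenario can be sketched in Python, assuming an editor that hard-wraps at a fixed column without regard to word or attribute boundaries. The `hard_wrap()` helper is hypothetical; real WYSIWYG editors differ in how and where they wrap.

```python
import re


def hard_wrap(text: str, width: int = 80) -> str:
    """Insert a newline every `width` characters, ignoring markup boundaries."""
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


html = '<p>Download the file here: <a href="/sites/all/files/hello_world.pdf">Hello World</a></p>'
wrapped = hard_wrap(html, 60)

# [^"]* also matches newlines, so this captures the whole (now broken) href value.
href = re.search(r'href="([^"]*)"', wrapped).group(1)
print(repr(href))  # '/sites/all/files/hello_w\norld.pdf'
```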

When an editor wraps lines like this, it seems possible to write markup such as the following that still works in browsers:

<a href="W
T
F
.ht
ml">WTF
</a>


(The Drupal website seems to have a filter that converts newlines to line breaks, making things even weirder.)
In raw, unfiltered HTML, all the browsers I've tested ignore the newline characters.
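The browser behavior described above is consistent with the WHATWG URL Standard: before parsing, its basic URL parser trims leading and trailing C0 control or space characters and then removes every ASCII tab or newline from the input. A Python sketch of just that preprocessing step follows; `whatwg_preprocess` is a hypothetical name, and this is not a full URL parser.

```python
def _is_c0_control_or_space(ch: str) -> bool:
    # C0 control or space: U+0000 through U+0020 inclusive.
    return ord(ch) <= 0x20


def whatwg_preprocess(url: str) -> str:
    # Step 1: trim leading and trailing C0 control or space characters.
    start, end = 0, len(url)
    while start < end and _is_c0_control_or_space(url[start]):
        start += 1
    while end > start and _is_c0_control_or_space(url[end - 1]):
        end -= 1
    trimmed = url[start:end]
    # Step 2: remove every ASCII tab or newline (U+0009, U+000A, U+000D).
    return "".join(ch for ch in trimmed if ch not in "\t\n\r")


print(whatwg_preprocess("W\nT\nF\n.ht\nml"))                     # WTF.html
print(whatwg_preprocess("/sites/all/files/hello_world.pdf\n"))   # /sites/all/files/hello_world.pdf
```

This is why the wrapped `href="W…ml"` link resolves to `WTF.html` in a browser, whereas a server-side filter that uses a different normalization sees a different path.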

Garrett Albright

Status: Active » Closed (won't fix)

Okay, well, since this isn't Pathologic's fault, I'm marking this as won't fix.