Sometimes my users upload files with spaces in their filenames. The link works fine on the page because the browser automatically converts the spaces to %20. But the Broken Links report shows them as 404 - Not Found errors, so I have to go through and uncheck the "Check this link" option.

Is it possible for Link checker to account for spaces in the filename so that it doesn't highlight these as broken links?

thanks
Steve Hutchison
Brisbane, Australia

Comments

hass’s picture

I strongly suggest you run the Transliteration module. This may solve the issue and fix your invalid links automatically. Can you try it, please?

I'm not sure if this is really a bug... Maybe it's just being too nitpicky about broken URIs.

dingo’s picture

Hi Hass - thanks for responding.

OK, I've just had a look at the Transliteration module. It's designed to handle non-ASCII characters in filenames, such as Greek and Russian script. The space character is an ASCII character, so I don't think it's an ideal solution.

It's pretty common for users to want spaces in their filenames, and since Drupal allows you to upload file attachments which include spaces, I don't think it's too nitpicky to say this shouldn't come up as a broken link.

I'm no Drupal module developer so maybe this is too simplistic, but couldn't a line of code be added? Something like:
$link = str_replace(" ", "%20", $link);

I looked at trying it myself but I wasn't sure where it should go.

thanks for your help,
Steve

hass’s picture

Then it's the Pathauto module... One of them replaces common strings. As an aside, I'm not sure if your backup software can handle Cyrillic :-).

I will look into it. I know where it needs to go (the link extraction process).

dingo’s picture

I did some research and the Transliteration module does actually convert spaces to underscores when the user uploads a file, in addition to transliterating non-ASCII script. It feels like overkill, but there doesn't seem to be anything else that does it, so I'm going with it.

thanks

hass’s picture

Status: Active » Postponed

The fix needs to wait until after #1441574: Port D6 access bypass bugfixes to D7 has been committed.

hwasem’s picture

In case it helps...I'm now having this problem after an upgrade of Link Checker 2.4 -> 2.5 and Path Auto 1.5 -> 1.6 in March. File names with spaces were not an issue prior to that.

aaron1234nz’s picture

FileSize
463 bytes

This patch appears to have solved the issue for me.

hass’s picture

This is the wrong line to fix this issue and you have only fixed nodes.

hass’s picture

Version: 6.x-2.5 » 7.x-1.x-dev
Status: Postponed » Postponed (maintainer needs more info)
FileSize
920 bytes

I ran a quick test with the link http://example.com/foo%20bar/ and it is extracted correctly as http://example.com/foo%20bar/. This shows that the problem is not really linkchecker itself; the URL in the content is definitely invalid. Spaces are not allowed in URLs. See http://stackoverflow.com/a/497972. These invalid URLs also do not pass valid_url().

Patch attached, but I feel a bit reluctant to add this type of workaround to module code. It may give you the impression that everything is good, but it isn't really.

How are these URLs created? Are they created by "insert" modules?
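
To illustrate the difference between the two forms, here is a minimal sketch in plain PHP. It is not linkchecker or Drupal code, and url_has_raw_space() is a hypothetical helper name chosen only for this example. It only detects a literal space; it does not perform the full validation that valid_url() does.

```php
<?php
// Hypothetical helper, not part of linkchecker or Drupal core.
// Returns TRUE when the URL contains a literal (unencoded) space.
function url_has_raw_space($url) {
  return strpos($url, ' ') !== FALSE;
}

// A link with a literal space in the path is flagged.
var_dump(url_has_raw_space('http://example.com/foo bar/'));   // bool(true)
// The same link with the space encoded as %20 is not.
var_dump(url_has_raw_space('http://example.com/foo%20bar/')); // bool(false)
```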

hass’s picture

Title: Space in filename incorrectly triggers 404 » Space in filename triggers 404

hwasem’s picture

In my case the files were uploaded with the Upload Module (part of core).

The link is displayed as .../files/Facility%20Closures.pdf, for the file name "Facility Closures.pdf".

hass’s picture

Cannot repro with an image and file upload. Both URLs have spaces encoded properly.

<img typeof="foaf:Image" src="http://example.com/drupal7/sites/default/files/picture-52_0%20-%20copy.png" width="200" height="200" alt="">

<a href="http://example.com/drupal7/sites/default/files/foo%20bar.TXT" type="text/plain; length=4">foo bar.TXT</a>

hwasem’s picture

I just noticed the version was changed since I posted. I'm on Drupal 6. Does that make a difference?

hass’s picture

I'm not sure how I can reproduce this issue. Is one of you able to share a public link where I can review the page source, please? It looks like http://api.drupal.org/api/drupal/includes!file.inc/function/file_directo... also uses drupal_urlencode(). Code-wise, I see no reason why the space wouldn't be encoded as %20 in a URL.

hass’s picture

Version: 7.x-1.x-dev » 6.x-2.x-dev

If you would like to see this fixed in the upcoming release, I need to be able to reproduce it.

hwasem’s picture

Sorry for the delay. I'm not clear on what you need as far as source code. Do you just want a page that has a link that reports as broken in link checker but works?

The volunteer application links at the bottom here work, but are showing up in the link checker report as 404 Not Found. http://www.midcolumbialibraries.org/about-mcl/volunteer

hass’s picture

Thanks for this link. It clearly shows that the links are invalid per RFC standards. Spaces are not allowed in links.

How are these links created?

  • Did you upload a file with spaces in its name to Drupal? With the Drupal core upload or another module?
  • How are the files embedded into the content? WYSIWYG editor, Insert module, anything else?

If the latter, then the WYSIWYG editor is broken. We could work around these bugs by encoding all links, but that sounds a bit wrong to me based on what the RFCs say.
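
As a rough sketch of what such a workaround could look like (an assumption for illustration, not the attached patch and not linkchecker's actual code), the following plain PHP encodes literal spaces inside double-quoted href and src attributes. The function names are hypothetical.

```php
<?php
// Hypothetical helper, not linkchecker code: percent-encode literal spaces
// inside double-quoted href/src attribute values. Link text is left alone.
function encode_spaces_in_links($html) {
  return preg_replace_callback(
    '/\b(href|src)="([^"]*)"/i',
    'encode_spaces_in_links_callback',
    $html
  );
}

// Rebuilds the attribute with every space in its value replaced by %20.
function encode_spaces_in_links_callback($matches) {
  return $matches[1] . '="' . str_replace(' ', '%20', $matches[2]) . '"';
}

echo encode_spaces_in_links('<a href="/files/Facility Closures.pdf">Facility Closures.pdf</a>');
// <a href="/files/Facility%20Closures.pdf">Facility Closures.pdf</a>
```

This handles only spaces and only double-quoted attributes, and it hides the invalid markup rather than fixing it at the source.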

hwasem’s picture

Those links were created by using the Core Upload module.

Users uncheck the List option, copy the URL provided by File Attachments, and add a link into WYSIWYG's body field.

We do use IMCE, but the files are not being uploaded via an IMCE field. I don't think this is the issue.

hass’s picture

Category: bug » support
Status: Postponed (maintainer needs more info) » Fixed

It's reproducible under D6, but not D7. Also tested with D7 Insert module (no issues).

While looking for the source of the bug, it seems this is a core bug. file_create_url() (http://api.drupal.org/api/drupal/includes%21file.inc/function/file_creat...) does not create valid URLs. I encoded the path created in file_create_url() and got a properly encoded path. After a quick search I found a lot of issues about this.
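
For illustration only, encoding a file path segment by segment could be sketched in plain PHP as below. This is not the code from the core patch, and encode_path_segments() is a hypothetical name.

```php
<?php
// Hypothetical helper, not the core patch: percent-encode each path
// segment with rawurlencode() while keeping the slashes between them.
function encode_path_segments($path) {
  return implode('/', array_map('rawurlencode', explode('/', $path)));
}

echo encode_path_segments('sites/default/files/Facility Closures.pdf');
// sites/default/files/Facility%20Closures.pdf
```

Note that a path which is already encoded would be encoded a second time (% becomes %25), so a helper like this should only be given raw file paths.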

It looks like #1277140: file_create_url() creates invalid public paths is RTBC. Please try this patch and fix all these broken links with spaces.

hwasem’s picture

Thank you for helping me on this. I was researching, but going in the wrong direction. I'm pretty new to coding. I applied the patch and will report back to both threads in hopes of getting it released for others.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

hwasem’s picture

Status: Closed (fixed) » Closed (duplicate)

Yes, that patch fixed the problem. http://drupal.org/node/1277140#comment-5379396

We just need to go back and edit all of the incorrect links created from the bad URLs. Thanks again. Issue is still closed.

hass’s picture

Thanks for your feedback. I have asked several times why this is still not committed after 1 year at RTBC. No feedback. Unbelievable!

Please urge Gabor to commit this to D6 ASAP.

hass’s picture

Issue summary: View changes

fixed typo

jasonlttl’s picture

We have a bunch of content migrated into D7 from D6 and hit this issue. There were too many sites to fix manually and we didn't feel comfortable running an update on all the nodes/fields. However, we discovered that the D7 version of Pathologic, a popular text filter focusing on links, appears to correct spaces in URLs by converting them to %20.

https://www.drupal.org/project/pathologic

This may be a reasonable solution for many people. I don't know if the d6 pathologic does the same thing or not, but probably.