Sometimes my users upload files with spaces in their filenames. The link works fine on the page because the browser automatically converts the spaces to %20, but the Broken Links report shows them as 404 - Not Found errors. So I have to go through and uncheck the "Check this link" option.
Is it possible for Link checker to account for spaces in the filename so that it doesn't highlight these as broken links?
thanks
Steve Hutchison
Brisbane, Australia
Comment | File | Size | Author |
---|---|---|---|
#9 | linkchecker_1525146+Space+in+filename+incorrectly+triggers+404.patch | 920 bytes | hass |
#7 | linkchecker.module.patch | 463 bytes | aaron1234nz |
Comments
Comment #1
hass commented: I strongly suggest you run the Transliteration module. It may solve the issue and fix your invalid links automatically. Can you try it, please?
I'm not sure this is really a bug... Maybe it's just too nitpicky about broken URIs.
Comment #2
dingo commented: Hi Hass, thanks for responding.
OK, I've just had a look at the Transliteration module. It's designed to handle non-ASCII characters in filenames, such as Greek and Russian script. The space character is an ASCII character, so I don't think it's an ideal solution.
It's pretty common for users to want spaces in their filenames, and since Drupal allows you to upload file attachments whose names include spaces, I don't think it's too nitpicky to say this shouldn't come up as a broken link.
I'm no Drupal module developer, so maybe this is too simplistic, but couldn't a line of code be added? Something like:
$link = str_replace(' ', '%20', $link);
I looked at trying it myself but I wasn't sure where it should go.
thanks for your help,
Steve
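A note on the one-liner suggested above: PHP's str_replace() takes its arguments in the order (search, replace, subject). A minimal sketch of the idea, assuming the extracted link is a plain string; normalize_link_spaces() is a hypothetical helper, not part of Link checker:

```php
<?php
// Hypothetical helper, not part of Link checker: percent-encode literal
// spaces in a link before it is checked.
function normalize_link_spaces(string $link): string {
    // str_replace(search, replace, subject): every " " becomes "%20".
    return str_replace(' ', '%20', $link);
}

echo normalize_link_spaces('http://example.com/files/foo bar.pdf'), "\n";
```

This only touches spaces; any other characters that are not allowed in URLs are left as they are.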
Comment #3
hass commented: Then it's the Pathauto module... one of its options replaces common strings. Aside from that, I'm not sure your backup software can handle Cyrillic :-).
I will look into it. I know where the fix needs to go (the link extraction process).
Comment #4
dingo commented: I did some research, and the Transliteration module does actually convert spaces to underscores when the user uploads a file, in addition to transliterating non-ASCII script. It feels like overkill, but there doesn't seem to be anything else that does it, so I'm going with it.
thanks
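The upload-time rename described above can be illustrated in a few lines. This is a hypothetical sketch, not the Transliteration module's code, and it only handles whitespace:

```php
<?php
// Hypothetical upload-time rename, not the Transliteration module's code:
// trim the filename and collapse each run of whitespace into one underscore.
function underscore_filename(string $filename): string {
    return preg_replace('/\s+/', '_', trim($filename));
}

echo underscore_filename('My File.pdf'), "\n";
```

Renaming at upload time avoids the problem for new files, but it does nothing for links to files that were already uploaded with spaces in their names.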
Comment #5
hass commented: The fix needs to wait until #1441574: Port D6 access bypass bugfixes to D7 has been committed.
Comment #6
hwasem commented: In case it helps... I'm now having this problem after upgrading Link Checker 2.4 -> 2.5 and Pathauto 1.5 -> 1.6 in March. Filenames with spaces were not an issue before that.
Comment #7
aaron1234nz commented: This patch appears to have solved the issue for me.
Comment #8
hass commented: This is the wrong line to fix this issue and you have only fixed nodes.
Comment #9
hass commented: I've made a quick test with the link http://example.com/foo%20bar/ and it is extracted correctly as http://example.com/foo%20bar/. This shows that the problem is not really in Link checker itself, as the URL in the content is definitely invalid. Spaces are not allowed in URLs; see http://stackoverflow.com/a/497972. These invalid URLs also do not pass valid_url().
Patch attached, but I'm a bit reluctant to add this kind of workaround to the module code. It may give you the impression that everything is fine, but it isn't really.
How are these URLs created? Are they created by "insert" modules?
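The point that a raw space makes a URL invalid can be checked in plain PHP. The sketch below uses PHP's built-in filter_var() with FILTER_VALIDATE_URL as a stand-in for Drupal's valid_url(); the two functions do not apply identical rules, so treat this as an illustration only. is_well_formed_url() is a hypothetical name:

```php
<?php
// Illustration only: filter_var() is used instead of Drupal's valid_url(),
// and the two functions do not apply identical rules.
function is_well_formed_url(string $url): bool {
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

foreach (['http://example.com/foo%20bar/', 'http://example.com/foo bar/'] as $url) {
    printf("%-32s %s\n", $url, is_well_formed_url($url) ? 'valid' : 'invalid');
}
```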
Comment #10
hass commented
Comment #11
hwasem commented: In my case the files were uploaded with the Upload Module (part of core).
The link is displayed as .../files/Facility%20Closures.pdf, for the file name "Facility Closures.pdf".
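For reference, PHP's rawurlencode() produces exactly this encoding for that filename, whereas urlencode() encodes a space as "+", which is form encoding rather than path encoding. A minimal sketch:

```php
<?php
$filename = 'Facility Closures.pdf';

// rawurlencode() follows RFC 3986: a space becomes %20.
echo rawurlencode($filename), "\n"; // Facility%20Closures.pdf

// urlencode() follows application/x-www-form-urlencoded: a space becomes +.
echo urlencode($filename), "\n";    // Facility+Closures.pdf
```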
Comment #12
hass commented: Cannot repro with an image and file upload. Both URLs have spaces encoded properly.
<img typeof="foaf:Image" src="http://example.com/drupal7/sites/default/files/picture-52_0%20-%20copy.png" width="200" height="200" alt="">
<a href="http://example.com/drupal7/sites/default/files/foo%20bar.TXT" type="text/plain; length=4">foo bar.TXT</a>
Comment #13
hwasem commented: I just noticed the version was changed since I posted. I'm on Drupal 6. Does that make a difference?
Comment #14
hass commented: I'm not sure how I can reproduce this issue. Is one of you able to share a public link where I can review the source code, please? It looks like http://api.drupal.org/api/drupal/includes!file.inc/function/file_directo... also uses drupal_urlencode(). Code-wise, I see no reason why the URL shouldn't contain %20.
Comment #15
hass commented: If you'd like to see this fixed in the upcoming release, I need to be able to reproduce it.
Comment #16
hwasem commented: Sorry for the delay. I'm not clear on what you need as far as source code. Do you just want a page that has a link that reports as broken in Link checker but works?
The volunteer application links at the bottom of this page work, but they show up in the Link checker report as 404 Not Found: http://www.midcolumbialibraries.org/about-mcl/volunteer
Comment #17
hass commented: Thanks for this link. It clearly shows that the links are invalid per the RFC standards. Spaces are not allowed in links.
How are these links created?
If the last, then the WYSIWYG is broken. We could work around these bugs by encoding all links, but that sounds a bit wrong to me given what the RFCs say.
Comment #18
hwasem commented: Those links were created by using the Core Upload module.
Users uncheck the List option, copy the URL provided by File Attachments, and add a link into WYSIWYG's body field.
We do use IMCE, but the files are not being uploaded via an IMCE field. I don't think this is the issue.
Comment #19
hass commented: It's reproducible under D6, but not D7. I also tested with the D7 Insert module (no issues).
While looking for the source of the bug, it seems this is a core bug. file_create_url() (http://api.drupal.org/api/drupal/includes%21file.inc/function/file_creat...) does not create valid URLs. I encoded the path created in file_create_url() and got a properly encoded path. After some quick searching I found a lot of issues about this. It looks like #1277140: file_create_url() creates invalid public paths is RTBC. Please try this patch and fix all of these broken links with spaces.
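The fix described here amounts to percent-encoding the file path before it becomes part of a URL, while keeping the "/" separators. A minimal sketch of that idea; encode_file_path() is a hypothetical helper and not the code from #1277140:

```php
<?php
// Hypothetical helper: percent-encode each path segment (RFC 3986)
// while keeping the "/" separators intact.
function encode_file_path(string $path): string {
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

$base = 'http://www.example.com/';
echo $base . encode_file_path('sites/default/files/Facility Closures.pdf'), "\n";
// http://www.example.com/sites/default/files/Facility%20Closures.pdf
```

Running an already-encoded path through this again would turn "%20" into "%2520", so the encoding should happen exactly once, at the point where the path is converted into a URL.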
Comment #20
hwasem commented: Thank you for helping me on this. I was researching, but going in the wrong direction. I'm pretty new to coding. I applied the patch and will report back to both threads in hopes of getting it released for others.
Comment #22
hwasem commented: Yes, that patch fixed the problem. http://drupal.org/node/1277140#comment-5379396
We just need to go back and edit all of the incorrect links created from the bad URLs. Thanks again. Issue is still closed.
Comment #23
hass commented: Thanks for your feedback. I have asked several times why this is still not committed after a year at RTBC. No feedback. Unbelievable!
Please press Gabor to commit this to D6 asap.
Comment #23.0
hass commented: fixed typo
Comment #24
hass commented
Comment #25
jasonlttl commented: We have a bunch of content migrated into d7 from d6 and hit this issue. There were too many sites to fix manually and we didn't feel comfortable running an update on all the nodes/fields. However, we discovered the d7 version of pathologic, a popular text filter focusing on links, appears to correct spaces in urls by converting them to %20.
https://www.drupal.org/project/pathologic
This may be a reasonable solution for many people. I don't know if the d6 pathologic does the same thing or not, but probably.
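The approach of fixing links at render time instead of editing every node can be sketched as a simple text filter. This is not Pathologic's code; the function name and the regular expression are assumptions, and a production filter would be better off using an HTML parser than a regex:

```php
<?php
// Hypothetical render-time filter (not Pathologic's implementation):
// replace literal spaces with %20 inside double-quoted href/src attributes.
function encode_spaces_in_links(string $html): string {
    return preg_replace_callback(
        '/\b(href|src)="([^"]*)"/i',
        function (array $m): string {
            return $m[1] . '="' . str_replace(' ', '%20', $m[2]) . '"';
        },
        $html
    );
}

echo encode_spaces_in_links(
    '<a href="/files/Facility Closures.pdf">Facility Closures</a>'
), "\n";
// <a href="/files/Facility%20Closures.pdf">Facility Closures</a>
```

Only the attribute values change; the link text and other attributes such as alt are left untouched.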