I've got a lot of perfectly OK links on my site that are being reported by the broken link checker for various reasons. I'd like to be able to tell the link checker not to worry about these, and to exclude them from the report. The report currently requires the admin to read through a whole lot of irrelevant stuff in order to check for possible important information.
Some examples:
1/ I've got a donation link to paymate (like paypal). It's not too surprising that paypal generates a 500 error response. They don't really want robots crawling such things. It works fine for human users though, and I don't want to continue checking or reporting on this one.
I'd like to be able to click a link in the broken link report to disable further checking of and reporting on this link.
2/ I've got a bunch of links like http://au.youtube.com/watch?v=prn3lXTex2U which are perfectly valid, and extracted from a youtube RSS feed using FeedAPI/FeedAPI Mapper. The 302 response code here is expected, and appropriate.
The second example is more complex. There's an ongoing stream of new URLs being added. They're all of a single format though, and I'd like to be able to provide a regex to match these. If this just excluded these links from checking it would be better than the current situation, but the ideal would be to check the links, report 404s, and treat 302s (or 301s) as acceptable and not needing to be in the report.
There's a lot of advantages to running my link checking from within drupal, but if the signal to noise ratio in the reporting is too low, then it's not workable.
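To make the request concrete, here is a rough sketch of the filtering I have in mind, written in Python rather than as Drupal code. All names here (`EXCLUDE_PATTERNS`, `REDIRECT_OK_PATTERNS`, `should_check`, `should_report`) and the `paymate.example` host are made up for illustration; none of them are existing module settings.

```python
import re

# Hypothetical settings for illustration only, not options of the module.
# URLs matching these patterns are never checked or reported.
EXCLUDE_PATTERNS = [re.compile(r"^https?://(www\.)?paymate\.example/")]
# URLs matching these patterns are still checked, but redirects are accepted.
REDIRECT_OK_PATTERNS = [re.compile(r"^http://au\.youtube\.com/watch\?v=[\w-]+$")]
REDIRECT_CODES = {301, 302}


def should_check(url):
    """Return False for URLs the admin has excluded from checking."""
    return not any(p.search(url) for p in EXCLUDE_PATTERNS)


def should_report(url, status):
    """Return True if this response belongs in the broken link report."""
    if 200 <= status < 300:
        return False  # successful responses are never reported
    if status in REDIRECT_CODES and any(p.search(url) for p in REDIRECT_OK_PATTERNS):
        return False  # expected redirect for this URL format
    return True  # everything else (404, 500, unexpected redirects) is reported
```

With this, a 404 on one of the youtube links still shows up in the report, while a 302 on the same link does not.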
Comments
Comment #1
hass commented
#1. Try the latest DEV. There is an option to edit a link. On the edit link page you are able to change the check mode from HEAD to GET. I'm sure this will solve your 500 error. Duplicate of #427362: Add UI to change request method.
#2. Also in the latest DEV only, there is an option on the edit link page where you are able to disable link checking. But for your 302 link I would add the 302 code to the ignored status codes list. I have also done this on my machine after some time of testing. The 302 status code may be added to the default ignored status code list in the future. Duplicate of #268946: Add ignore filter for links with buggy servers.
Hope this solves your questions.
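The three options described above (per-link request method, per-link disable, ignored status codes) can be sketched roughly as follows. This is an illustrative Python model, not the module's code: the `Link` class, the `check` function, and the injected `send` callable are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Link:
    url: str
    method: str = "HEAD"  # per-link request method, switchable to "GET"
    enabled: bool = True  # False disables checking for this link


def check(link, send, ignored_codes):
    """Return the status code to report, or None if nothing should be reported.

    `send` is a hypothetical callable (method, url) -> status code, injected
    so the sketch does not depend on a particular HTTP library.
    """
    if not link.enabled:
        return None  # checking was disabled on the edit link page
    status = send(link.method, link.url)
    if 200 <= status < 300 or status in ignored_codes:
        return None  # success, or a code on the ignored status codes list
    return status
```

A server that rejects HEAD with a 500 but answers GET normally stops being reported once the link's method is set to "GET", and adding 302 to `ignored_codes` silences the redirecting links.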
Comment #2
ngaur commented
That's somewhat helpful in these particular cases, but there's still a need to be able to provide an exclusion list.
robots.txt support for off-site links is also important.
Comment #3
hass commented
If you could provide a patch for such a filter I would be very happy to take a look at the code, but it's not easy.
robots.txt? I do not care about a robots.txt configuration. I'd like to verify my links. If someone wants to turn off their server, it can return a 404 to us and you are able to remove the link. We do not "crawl" a remote site and display its content somewhere else.
Comment #4
ngaur commented
Comment #5
hass commented
Comment #6
hass commented
Seems to be fixed, no feedback from opener.
Comment #7
ngaur commented
Comment #8
hass commented
Comment #9
ngaur commented
In what way is this fixed? You've said you aren't interested in fixing it yourself. I'd like to have time to do it, but I've got some other priorities that come first. So this looks like it's going to remain unfixed for a while.
Apart from looking tidy on your issues queue, is there a reason why you think this should have a 'fixed' status?
Comment #10
hass commented
This support request has been answered in #1. The feature still exists as explained. Therefore fixed. There was no feedback on what you need if this doesn't work.
Comment #12
hass commented
Domain exclusion list has been added to DEV.
http://drupal.org/cvs?commit=241028
http://drupal.org/cvs?commit=241038
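For readers who want a feel for what a domain exclusion list does, here is a minimal Python sketch. It is not taken from the commits above and does not claim to match how the module implements the feature; `EXCLUDED_DOMAINS`, `is_excluded`, and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion list; the hosts are placeholders.
EXCLUDED_DOMAINS = {"paymate.example", "example.org"}


def is_excluded(url, excluded=EXCLUDED_DOMAINS):
    """Return True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in excluded)
```

Matching on the parsed hostname rather than on a substring of the URL avoids excluding unrelated hosts that merely contain an excluded name.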
Comment #13
hass commented
Comment #14
hass commented