I've got a lot of perfectly good links on my site that the broken link checker reports for various reasons. I'd like to be able to tell the link checker not to worry about these and to exclude them from the report. As it stands, the admin has to read through a lot of irrelevant entries to find the information that actually matters.

Some examples:

1/ I've got a donation link to Paymate (like PayPal). It's not too surprising that a payment provider returns a 500 error response to a link checker; they don't really want robots crawling such things. It works fine for human users though, and I don't want to continue checking or reporting on this one.

I'd like to be able to click a link in the broken link report to disable further checking of, and reporting on, this link.

2/ I've got a bunch of links like http://au.youtube.com/watch?v=prn3lXTex2U which are perfectly valid, and are extracted from a YouTube RSS feed using FeedAPI/FeedAPI Mapper. The 302 response code here is expected and appropriate.

The second example is more complex. There's an ongoing stream of new URLs being added. They're all of a single format though, and I'd like to be able to provide a regex to match them. If this just excluded these links from checking, it would be better than the current situation, but the ideal would be to check the links, report 404s, and treat 302s (or 301s) as acceptable, so they don't need to be in the report.
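To make the request concrete, here's a rough sketch in Python (not Drupal code) of the behaviour I'm after. All the names (`LinkRule`, `should_report`, `RULES`) and the donation URL are made up for illustration; none of them come from the module.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Pattern

# Status codes that never need to appear in the report (assumed default).
DEFAULT_ACCEPTED: FrozenSet[int] = frozenset({200})


@dataclass(frozen=True)
class LinkRule:
    """Hypothetical per-pattern rule for the broken link report."""
    pattern: Pattern[str]                         # regex matched against the URL
    skip_check: bool = False                      # True: never report this link
    extra_accepted: FrozenSet[int] = frozenset()  # extra codes treated as OK


def should_report(url: str, status: int, rules: List[LinkRule]) -> bool:
    """Return True if the link belongs in the broken link report."""
    for rule in rules:
        if rule.pattern.search(url):
            if rule.skip_check:
                return False
            # First matching rule wins: accept the default codes plus its extras.
            return status not in (DEFAULT_ACCEPTED | rule.extra_accepted)
    return status not in DEFAULT_ACCEPTED


RULES = [
    # Example 2: still check YouTube watch links, but accept redirects.
    LinkRule(re.compile(r"^http://au\.youtube\.com/watch\?v=[\w-]+$"),
             extra_accepted=frozenset({301, 302})),
    # Example 1: placeholder donation URL, excluded from the report entirely.
    LinkRule(re.compile(r"^https?://donations\.example\.com/"),
             skip_check=True),
]
```

With these rules a 302 from a matching YouTube link stays out of the report, a 404 from the same link still shows up, and the donation link is never reported whatever it returns.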

There are a lot of advantages to running my link checking from within Drupal, but if the signal-to-noise ratio in the reporting is too low, it's not workable.

Comments

hass’s picture

Category: feature » support
Status: Active » Fixed

#1. Try the latest DEV. There is an option to edit a link. On the edit link page you can change the check mode from HEAD to GET. I'm sure this will solve your 500 error. Duplicate of #427362: Add UI to change request method.

#2. Also in the latest DEV only, there is an option on the edit link page that lets you disable link checking. But for your 302 link I would add the 302 code to the ignored status codes list... I have also done this on my machine after some time of testing... the 302 status code may be added to the default ignored status code list in the future. Duplicate of #268946: Add ignore filter for links with buggy servers.

Hope this answers your questions.
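As a rough illustration of the per-link request method from #1, here is a minimal Python sketch. It is not the module's actual code: `check_status`, `urllib_fetch`, `method_overrides`, and the example URL are hypothetical names chosen for the example.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# A fetcher takes (method, url) and returns the HTTP status code.
Fetch = Callable[[str, str], int]


def urllib_fetch(method: str, url: str, timeout: float = 10.0) -> int:
    """Fetch a URL with the given method and return its status code.

    Note: urllib follows redirects itself, so this returns the final
    status rather than an intermediate 301/302.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; report their code.
        return error.code


def check_status(url: str, fetch: Fetch,
                 method_overrides: Dict[str, str],
                 default_method: str = "HEAD") -> int:
    """Check a link using its per-link method, falling back to the default."""
    method = method_overrides.get(url, default_method)
    return fetch(method, url)


# Hypothetical per-link setting: this server mishandles HEAD, so use GET.
METHOD_OVERRIDES = {"https://donations.example.com/give": "GET"}
```

Calling `check_status(url, urllib_fetch, METHOD_OVERRIDES)` would then send GET for the overridden link and HEAD for everything else.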

ngaur’s picture

That's somewhat helpful in these particular cases, but there's still a need to be able to provide an exclusion list.

robots.txt support for off-site links is also important.

hass’s picture

If you could provide a patch for such a filter I would be very happy to take a look at the code, but it's not easy.

robots.txt? I do not care about a robots.txt configuration... I'd like to verify my links. If someone wants to take his server offline, it can return a 404 to us and you can then remove the link. We do not "crawl" a remote site and display its content somewhere else...

ngaur’s picture

Status: Fixed » Needs work
hass’s picture

Status: Needs work » Postponed (maintainer needs more info)
hass’s picture

Status: Postponed (maintainer needs more info) » Fixed

Seems to be fixed, no feedback from opener.

ngaur’s picture

Status: Fixed » Closed (won't fix)
hass’s picture

Status: Closed (won't fix) » Fixed
ngaur’s picture

In what way is this fixed? You've said you aren't interested in fixing it yourself. I'd like to have time to do it, but I've got some other priorities that come first. So this looks like it's going to remain unfixed for a while.

Apart from looking tidy in your issue queue, is there a reason why you think this should have a 'fixed' status?

hass’s picture

This support request was answered in #1. The feature exists as explained, therefore fixed. There was no feedback on what you need if this doesn't work.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

hass’s picture

Title: Need exclusions list. » Add domain exclusions list
hass’s picture

Category: support » feature