I have written some enhancements to the weblinks checker functionality to give admins more options for checking URLs and would like you to consider adding them in to the module. In summary, the changes are:
- For links reporting errors, give the option to specify 'unpublish after N days' or 'unpublish after N cron runs' instead of the current 'on second run'
- For the log, give the option to show all URLs checked or just those which changed status
- If the link status has changed, show what it was before and what it is now
- Option for the summary message to include the titles of the links checked
- When selecting which error codes to ignore, add 0 to the list, to cater for local testing off-line
The actual files that are changed:
weblinks.install
- added new field last_status_info to weblinks_schema()
- added function weblinks_update_6115() to add this field
weblinks_checker.admin.inc
- weblinks_checker_settings() - new options, moved webklinks_checker_hide closing div from suffix of 'weblinks_checker_ignore' into prefix of 'rescue' so that fieldset and divs are properly nested.
- new function weblinks_checker_settings_validate()
weblinks_checker.install
- weblinks_checker_uninstall() - mass delete of all variables starting with weblinks_checker_% or weblinks_rescue_%
weblinks_checker.module
- added hook_nodeapi()
- weblinks_checker_validate() - cater for allowing http code 0
- weblinks_checker_cron() - lots!
removed 'global $user'
$checked is now an array to hold the titles of the checked links
change sql select where and order by
If the link is OK, don't reset the status to 200, but leave as the value returned (no need to lose this information)
plus all the other code changes for the new checking options
I have attached three screen shots of the admin checker tab.
I created the patch against 2.x-dev, and updated it this morning after the 25th May release. I hope these additions are useful, please let me know if you would like more info or want anything changed.
Jonathan
| Comment | File | Size | Author |
|---|---|---|---|
| #15 | checker_settings.jpg | 91.73 KB | nancydru |
| #14 | weblinks-checker.jpg | 55.65 KB | 2createwdrupal |
| #10 | _weblinks_checker_enhancements.patch3_.txt | 37.61 KB | jonathan1055 |
| #8 | _weblinks_checker_enhancements.patch2_.txt | 37.24 KB | jonathan1055 |
| #4 | weblinks_checker4.jpg | 159.53 KB | jonathan1055 |
Comments
Comment #1
jonathan1055 commentedI forgot to mention these other new options I have included:
6. Give priority to rechecking links in error or select the links in strict time rotation (as now)
7. Also select links which have been unpublished by weblinks_checker to get an up-to-date status
8. Option to republish the bad links if they are now OK
The options for dealing with links in error are now in a separate collapsed fieldset.
Jonathan
Comment #2
nancydruWow, looks pretty good. First question/concern: If they give priority to erroneous links, does that mean that some unchecked links might never get checked (e.g. "Check 3 links" and 4 have errors)? If this is the case, perhaps the description should warn about that.
I would like to see "Do not unpublish" hide the fields that aren't needed, like number of crons/days.
Comment #3
jonathan1055 commentedYes, if priority was given to erroring links, and there were as many links in error as were being checked on a run, then all the links in error would get checked in turn, over however many runs were needed, but only when some of them return a good status (or the editor/owner publishes or unpublishes the link manually, or fixes the URL) would other links get their turn. I will add a warning to the dblog at the end of the run if this situation has occurred.
Good idea with hiding the fields that are not applicable. I should have thought of that, as I like the way you have done it elsewhere in the module.
I'll get on and do these changes.
Jonathan
Comment #4
jonathan1055 commentedHere is an updated patch which also includes the following:
9. If weblinks checker is enabled, validate that 'Links to include' is in the range 1 .. 50
10. If unpublishing bad links, validate that 'unpublish after N' is in the range 1 .. 99
11. If 'Do not unpublish' is selected then the inputs for 'N Days/Crons' and 'Action on Unpublished' are hidden (see jpegs 4 and 5).
12. If the number of links found in error during a run is equal to the $limit (it can never be more than) put a warning in the dblog (see jpeg 6)
13. Removed some unnecessary calls to check_plain() when the variable is prefixed with @, because this is done anyway.
Hiding the redundant unpublish items meant a re-ordering of the form values in the fieldset. I have moved the 'Update on permanent redirect' and 'Checking Sequence' to the top, as these are always shown. If the default is collapsed=TRUE then the javascript to hide the div does not work on first loading the page and the div is always shown initially regardless the value of the radio button. It works ok when clicking the radio buttons, just not when first opening the collapsed fieldset. Maybe if the outside div is hidden on page load then the show/hide on the inner div is ignored? Anyway, setting the default to collapsed=FALSE is fine and then it all works correctly.
Do you think that the validation ranges are OK?
Jonathan
Comment #5
nancydruWow, thanks again, Johnathan. Yes, I think the ranges are okay, but possibly should include some verbiage warning that there are limits. Also, when I do "...after..." I tend to do selection boxes that would have values already provided (e.g. 1 day, 2 days, 1 week, 2 weeks, 1 month, etc.) but this is fine.
One feature I've been looking at lately is switching to batch processing for this kind of function so there is much less possibility of a PHP time out and greater ranges could be allowed. Have you looked at that at all?
Comment #6
nancydruAlso, while we do something like this with the blocks variables because it is easier than a major loop, "db_query("DELETE FROM {variable} WHERE name LIKE '%s_%%' or name LIKE '%s_%%'", 'weblinks_checker', 'weblinks_rescue')" is not considered best practice; individually listing variables is better.
"weblinks_checker_nodeapi" needs to verify that it is a weblinks node. We recently discovered this oversight ourselves.
A very minor thing: Drupal coding standards call for comments to be sentence structure (start with a capital, end with a period), and preferably on a separate line.
Comment #7
nancydruI added a component for the Checker contrib.
BTW, this will go into the module after 6.x-2.0 is released.
Comment #8
jonathan1055 commentedHi Nancy,
Thanks for the feedback, I have responded to them all.
a) I have added a note about the valid ranges in the descriptions for Limit and Unpublish After.
b) Having a drop-down for '1 day, 2 days etc' would be confusing when 'N Cron runs' is selected. I had considered having two separate entry fields and hiding the one that is not applicable to the current selection of days/crons, but that seemed like overkill.
c) No, I have not looked at batch processing.
d) Regarding "DELETE FROM {variable} WHERE name LIKE '%s_%%' .." I did not know this was not good practice. Other modules use this so I just followed! I thought that it would reduce code and be safer to catch all variables. I have changed it back to being explicit deletes, and added my five new rows, plus I found two existing variables that were missing from the list (weblinks_checker_enabled and weblinks_checker_ignore), so I added them too.
e) Good catch in weblinks_checker_nodeapi(), done.
f) Coding standards are not minor, they are very important. I have changed all the comments.
With regard to standards I have also replaced tabs with spaces, removed trailing blanks from the end of rows, changed to the standard @file text for both install files, added format_plural() to weblinks_checker_user, checked that t() was used in every drupal_set_message() and was not used in any watchdog() or format_plural() calls, and shuffled the form items to put them in a standard order for consistency and easy copying/reading. These have made the patch a bit more complex to read, so I hope you are OK with it.
Patch file created against dev dated 3rd June.
Jonathan
Comment #9
nancydruWow, I can't believe that Coder let me slip that many trailing blanks in. But it also let you omit spaces around equal signs in a few places -- also on some logical operators.
Comment #10
jonathan1055 commentedRule for Jonathan to remember: "After removing all trailing spaces then editing the file, re-check for trailing spaces"
Also I forgot about having spaces each side of =, ==, != etc. This is not actually mentioned (at least I could not find it) in the coding standards pages at http://drupal.org/coding-standards and Coder module does not check for these either. I found a few manually, then used scripts/code-style.pl which found lots more (including a couple that were there before my changes ;-)
Comment #11
nancydruThe only place I tend not to use the spaces around "=" is within queries, where, to me at least, it is more readable without them. I have opened an issue for the Documentation team.
Comment #12
2createwdrupal commentedThis is not exactly what you are discussing, but may be helpful at some point -
I've made a list of links to Podcasts. If I have checker on, it returns error "302" and won't allow editing. The url is correct. I have to turn off checker, then all is well, but this makes the checker feature unusable.
Comment #13
nancydruNo, it's not. In the checker settings you can tell it to ignore 302's; that should take care of you.
Comment #14
2createwdrupal commentedOn Web Links checker settings, I have this: screenshot attached.
Apparently, there is more, but the rest disappears as the page is rendered.
I realize that the module probably wasn't intended as a podcast links module.
Comment #15
nancydruIt looks like you are using an old version. Here's what you should see. If you are current and don't see this, open a new issue.
Comment #16
nancydruI'm working on this now. Note that we do not delete the vocabulary in uninstall, because we may be using an existing vocabulary that the admin already had.
Comment #17
nancydruCommitted to -dev.
Comment #18
nancydruIncluded in 6.x-2.1.