It would be great if Link Checker module would show summary on the status report page (admin/reports/status).

It can be done by implementing hook_requirements() for ($phase == 'runtime')

Files: 
CommentFileSizeAuthor
#23 linkchecker_status-1900516-23.patch3.93 KBiva2k
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]
#20 linkchecker_status-1900516-20.patch3.57 KBiva2k
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]
#14 status_screenshot.png19.22 KBiva2k
#12 linkchecker_status-1900516-12.patch3.62 KBiva2k
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]
#4 linkchecker_status-1900516-5.patch3.2 KBiva2k
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]
#3 linkchecker_status-1900516-4.patch3.13 KBiva2k
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]
#2 linkchecker_status-1900516-3.patch3.13 KBiva2k
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]
#1 linkchecker_status-1900516-2.patch3.01 KBiva2k
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

Comments

Status:Active» Needs review
StatusFileSize
new3.01 KB
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

Here's a patch that does just that.
Please review and commit.

StatusFileSize
new3.13 KB
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

The bove patch missed the linkchecker_block_custom table in broken links query.

Use this patch.

Status:Needs review» Needs work
StatusFileSize
new3.13 KB
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

Oops, typo in #2 patch. Here's the one that displays status.
However it is still not matching the results on the broken links report, so needs some more work.

Status:Needs work» Needs review
StatusFileSize
new3.2 KB
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

Now this one fixes all issues that I noticed: ->isNotNull() is used and same broken link in multiple nodes is counted only once.

Thanks for sharing your patch. Have you seen that nearly the same info is shown on the report page if there are unchecked links? A subquery with distinct/union looks more efficient to me. It may be a good idea to move this out of the report page into the status page. What type of problem have you faced when you wrote the patch?

Status:Needs review» Needs work

+++ b/linkchecker.module
@@ -2488,3 +2488,67 @@ function _linkchecker_array_values_recursive(array $array) {
+    $query->leftJoin('linkchecker_node', 'ln', 'ln.lid = ll.lid');
+    $query->leftJoin('linkchecker_comment', 'lc', 'lc.lid = ll.lid');
+    $query->leftJoin('linkchecker_block_custom', 'lb', 'lb.lid = ll.lid');
+    $query->condition('ll.last_checked', 0, '<>');
+    $query->condition('ll.status', 1);
+    $query->condition('ll.code', $ignore_response_codes, 'NOT IN');
+    $query->condition(db_or()
+        ->isNotNull('ln.nid')
+        ->isNotNull('lc.cid')
+        ->isNotNull('lb.bid')
+    );

UNION ALL, distinct in a subselect.

+++ b/linkchecker.module
@@ -2488,3 +2488,67 @@ function _linkchecker_array_values_recursive(array $array) {
+        $description .= ' ' . t('You can <a href="@cron">run cron manually</a>.', array('@cron' => url('admin/reports/status/run-cron')));

The cron url requires the hash param or you cannot execute it.

+++ b/linkchecker.module
@@ -2488,3 +2488,67 @@ function _linkchecker_array_values_recursive(array $array) {
+      'severity' => ($links_broken > 0) ? REQUIREMENT_WARNING : REQUIREMENT_OK,

REQUIREMENT_WARNING may be a problem as it may cause false alarms. Need to be REQUIREMENT_INFO or just leave it always as OK.

Hi @hass
Thanks for very quick review and very good feedback. I agree with all comments and want to discuss the query as it is most obscure part.

When I did the query I looked at the query code in linkchecker_admin_report_page() and tried to reuse it. Unfortunately the code is structured in the way that prevents any code reuse, and I did not want to refactor it without talking to you or to blindly copy-paste the code from it, as there is a comment that it will be reworked ("// @todo Try to make UNION'ed subselect resultset smaller."). So I tried to do the query from scratch.

Do you think LEFT JOIN query gets incorrect results or has worse performance that UNION ALL? I would not be too concerned with performance as it is an admin page that is not looked at by many or too often, but I would no doubt be concerned with incorrect results.

Are you willing to refactor (or let someone else refactor) linkchecker_admin_report_page() so there is no code duplication between counting records and displaying records?

@hass

The cron url requires the hash param or you cannot execute it.

It only requires hash param if run from outside of the site. The following code was copied verbatim from modules/system/system.install

$description .= ' ' . t('You can <a href="@cron">run cron manually</a>.', array('@cron' => url('admin/reports/status/run-cron')));

See [system.install]

<?php
    $description
.= ' ' . $t('You can <a href="@cron">run cron manually</a>.', array('@cron' => url('admin/reports/status/run-cron')));
   
$description .= '<br />' . $t('To run cron from outside the site, go to <a href="!cron">!cron</a>', array('!cron' => url($base_url . '/cron.php', array('external' => TRUE, 'query' => array('cron_key' => variable_get('cron_key', 'drupal'))))));
?>

Why do we need the LEFT joins at all? If you just count the status of checked urls it looks like wasted DB processing time. Maybe you can explain a bit more in detail what your intention of this patch was. I'm still not sure about the value of these counters... :-). I thought about a small table that lists all status codes with counters... however I'm not sure about the value of this, too.

Re #9 - [system.install] - I don't see your point... are you saying that cron won't run without a hash when a logged in admin clicks the URL on the status page? I disagree with that. It runs perfectly fine. The first line in the snippet of code that you provided does exactly that without a hash. Hash is only needed to allow running cron without logging in. I made sure I copied from the code that generates the link on the status page, but not the one that describes how to set up an external cron call.

Re #10
See the top post for the goal - I wanted to have a quick glance in the status page telling me that there are broken links on the site and a link to go investigate and fix them. I feel like that's the central place where I want to see the issues with the website, and have no status from linkchecker module creates a void that I wanted to fill. To achieve that I need a simple count of broken links.

So the history of how I did it goes...
I tried making a counter query that only looks at the {linkchecker_link} table and returns how many broken links there are. I found that status alone does not tell if the link is broken or not, but I needed to exclude `last_checked`=0 and exclude `code` if it is found in the settings of what HTTP codes to ignore.
In my test the query I arrived to reported 6 broken links while the linkchecker_admin_report_page() showed zero broken links. Somewhat puzzled I dove into the database and found that there are 6 records that do not tie to any of nodes, comments or blocks via the other {linkchecker_*} tables. Perhaps they were left over when some of the nodes were edited and links deleted. Thus to address the incorrect counts I extended the query to eliminate those orphan records, and the count then matched the linkchecker_admin_report_page() table. I used left joins, a NOT NULL condition on them and a DISTINCT.

In essence the distinct leftJoin() query and isNotNull() does the same as distinct UNION ALL query in linkchecker_admin_report_page(), so I don't see a design problem.

I don't mean to start a big debate - I feel it is not worth it. I'd rather focus on solving the feature request. Perhaps we can close the issue without debating which query is better if you can authoritatively provide a query that returns total count of broken links which will match the records shown by linkchecker_admin_report_page().

StatusFileSize
new3.62 KB
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

I propose a move forward (see attached patch).

I created new function _linkchecker_query_broken_links() by copying existing linkchecker_admin_report_page() and making necessary edits (removed "->extend('PagerDefault')->extend('TableSort')" and the call to _linkchecker_report_page(), instead returning $query).

I did not touch linkchecker_admin_report_page(), however I feel it can be changed to use _linkchecker_query_broken_links() if desired (thus making code reuse according to DRY principle):

function linkchecker_admin_report_page() {
  $query = _linkchecker_broken_links_query();
  $query->extend('PagerDefault')->extend('TableSort');
  return _linkchecker_report_page($query);
}

Status:Needs work» Needs review

StatusFileSize
new19.22 KB

Note that I used REQUIREMENT_WARNING for broken links, REQUIREMENT_INFO for uncheked links and REQUIREMENT_OK if both are 0. Having REQUIREMENT_WARNING is consistent with other status page entries - if something that needs attention, REQUIREMENT_WARNING is used, if something is seriously broken, REQUIREMENT_ERROR is used.

Here's how the status page looks when there are broken links:

Status:Needs review» Needs work

Something is wrong here and there may be missunderstandings about the link table. This table contains all found links. That a link is in this table does not automatically mean it has been checked. This is a background task that can take weeks to complete.

  • last_checked = -1 means the link is unckecked.
  • There is no need to join the node/comment/block table to get the number of broken links. Keep it simple.

@hass
Can you explain the existing query used to display the table of broken links in linkchecker_admin_report_page()? I used it verbatim in patch #12. Any simpler query gives incorrect count of broken links that does not match the report page. I would be happy to use any other query, but you have to take a bit of time and review the linkchecker_admin_report_page().

Any simpler query gives incorrect count of broken links that does not match the report page.

This sounds strange. What differences do you have? It's only possible if you deleted a lot of nodes and the 24h cleanup has not run yet.

@hass
I'm confused, which question are you answering?

Fixed #17

See http://drupalcode.org/project/linkchecker.git/blob/refs/heads/7.x-1.x:/l... how I count the links on report page.

Aside "3 broken links" as shown in your screenshot doesn't mean that anything is wrong with the modules status. This is status OK from my point of view. At least I understood the status page as informal page if something is wrong or not with the modules/site itself and needs attention. A broken link does not qualify for such an extreme condition from my point of view. Please keep in mind that you may not able to fix all this links and than such a warning on every admin page visit will annoy you for ever.

StatusFileSize
new3.57 KB
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

You convinced me to not use REQUIREMENT_WARNING but only REQUIREMENT_INFO. Attached is a modified patch.

> See http://drupalcode.org/project/linkchecker.git/blob/refs/heads/7.x-1.x:/l... how I count the links on report page.
I've seen _linkchecker_report_page(). It has counter queries for $links_unchecked and $links_all. It does not have a query for broken links, the query comes as an argument from linkchecker_user_report_page() that calls _linkchecker_report_page(). And query in linkchecker_user_report_page() was copied VERBATIM into patch #12. Are you telling me that the query in linkchecker_user_report_page() is incorrect?

#17 response is clear now.
>This sounds strange. What differences do you have? It's only possible if you deleted a lot of nodes and the 24h cleanup has not run yet.
1) I did not delete the nodes. I edited them and replaced broken links with correct ones. External pages moved around and links needed to be updated.
2) 24hr cleanup is great, but why should the status page be ever able to display misleading information? The broken link query (that I copied into the patch #12 from the report page code from this module as it exists in the repository) gives exact count. Why would you want to modify that query and show incorrect information, even if it is only going to be incorrect for 24 hours. I'm as an admin don't want to be spooked either by the broken links counter showing up and being unable to find these links on the report. It will be terrible user experience. What if nodes are actively edited all the time, links come and go on any reasonable active website, there will always be a possibility for orphaned links and simplified query will keep giving incorrect counts. This query is executed rarely for an admin only and why castrate it and worsen the feature? I just don't get it.

Status:Needs work» Needs review

hook_requirements() is normally located in .install

StatusFileSize
new3.93 KB
PASSED: [[SimpleTest]]: [MySQL] 159 pass(es).
[ View ]

> hook_requirements() is normally located in .install
It is only true for phase = 'install', but no problem. Here's a new patch.

#23: linkchecker_status-1900516-23.patch queued for re-testing.