What a neat module!

Just wondered (if it doesn't already) whether it could use the job_queue module to email the author of a node to say something like

"The following links on your page at: --node_url---
do no resolve to accessible pages:
http://xyz - Error 404 - Page not found
etc"

That way the authors of the node can do the work! ;-)

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

hass’s picture

Title: notify node author » Email notification for node authors

Good idea... nevertheless I don't like this notification / spamming method very much. I've already have had the idea to use job_queue for a more reliable link check processing, but this is one more reason to use it :-)

hass’s picture

How this should be build:

1. Make email notification user configurable. By default ON if globally activated. Add select box to user page.
2. Make sure we do not send one mail per unpublished node. All broken nodes should be collected first per author and then one email with all unpublished nodes listed should be send.
3. May add an early warning email before unpublishing a node.
4. Don't forget to limit this mailings to "3" for dead mail addresses and authors that don't care about their articles and links inside.
5. Limit mailings once per day/week/month.

scedwar’s picture

Check out notifications - well worth hooking into that for most of this functionality.
http://drupal.org/project/notifications

hass’s picture

Can you provide a patch?

scedwar’s picture

patches like this are far beyond my coding skills, sorry!

Anonymous’s picture

Version: 5.x-1.x-dev » 6.x-2.3

Want to bring this request back to life. I'm running a CMS comparison site, and authors are publisher who maintain their own listings. I really need to send them automatic emails on CRON runs in case their URLs are out of date.

So on CRON run, simply:

- check URL (implemented)
- if theshold e.g. >3 fails, then: (implemented)
- send email to author (* new)

With a custom email field with tokens. for author email, name & failed URL + node id.

If I have time, I'll patch it myself. May take a few weeks, very busy right now. (We all are, I know.)

hass’s picture

Yeah, please see #2 for the proposed functionality... I hope to have nothing missed. Would be great to have a patch for the next release. Do not waste much time with point 4... it's an issue, but not so important :-).

chadd’s picture

i would also like to see this implemented.

the functionality in #2 looks fine to me.

r_honey’s picture

FileSize
4.2 KB

Hi everyone, the attached zip provides mailing functionality. It contains minor patches to 2 linkchecker files, linkchecker.module and linkchecker.admin.inc.

Then there is an additional file called linkchecker.mail.inc in the includes folder. This file contains the actual code for mailing on founding broken links.

The first patch to .module file adds a couple of lines to incorporate mailing at appropriate time during link checking. The patch to admin file adds options for mailing.

I am still testing these patches. But tought of sharing them here.

r_honey’s picture

As it currently stands, the patched code sends out an email to each user whose node/comment was found to contain a broken link after a cron run.

Features in #2 are highly desirable. However, for implementing them, we might need to maintain when was last mail sent to a user. This can be accomplished by either adding a brand new table, linkcheker_mails, or by adding a last_mail column to the linkchecker_nodes and linkchecker_comments tables.

I would be willing to adapt code for incorporating these features. As it requires some changes to the module, it might be better if I can make these directly to the CVS. Wonder if the module owners would be willing to provide me CVS access for this module!!

hass’s picture

Status: Active » Needs work

Thank you for sharing your work. Do you know http://drupal.org/coding-standards?

1. Try to use require_once conditionally in hook_init()
2. Patch contains context sensitive bugs in translatable strings. (e.g. $email_token_help)
3. Always use db_placeholders() and never variables in an SQL strings.
4. Always make a comma on the end of arrays and a newline '#collapsible' => FALSE);, see core for examples for better code readability.
5. Code indentation does not meet drupal standards
6. All titles need to be upper-case first, lower case rest, e.g. t('Broken Link Email Subject') -> t('Broken link email subject')
7. Never post ZIP files, please. Post one patch. This module is also monitored by the code review and testing robot... so we know if the patch applies and all tests are still working.
8. if you post a patch, set the case to "needs review".

hass’s picture

This patch looks a bit like a spam engine to me!?!? We need to make sure that mails are not send on every cron run. Many people may run cron every 5 minutes and this would cause status emails every 5 minutes. Individual users MUST be able to configure - send me an report once per day, once per week (default), once per month. I thought in past - using a table named linkchecker_users. Most of the points in #2 are really important and a must have to me. Without this features the functionality is not really usable for the most people. The day should also be customizable as many people don't like to get such type of to-do's on sunday/monday on their desk... :-)

r_honey’s picture

Well, maybe yes. As I said, satisfying some of the requirements in #2 required schema support, and I stopped short of making changes to the schema.

linkchecker_users with columns for whether a user wants the email, last time an email was sent etc. would satisfy most of the points in #2.

Regarding the coding standards, well I use Zend Studio 7 for development, and I have configured the development environment to mimic the coding standards as closely as possible (indentation etc.). But I was not able to completely set the options as pr the Drupal standards.

For db_placeholders(), the user input has been properly escaped before (db_escape_string). However, I do accept that I should have been more vigilant for things like capitalization, translatable string etc.

r_honey’s picture

Status: Needs work » Needs review

Okay, the attached files should satisfy everything in #2, and in #12 (except early warning email before unpublishing a node).

There's a global Admin setting which controls whether linkchecker emails are sent at all. Users can further opt-in/opt-out of linkchecker emails, together with minimum number of days between emails and days on which to receive an email if at all.

The first of the 2 attached file patches linkchecker.module, linkchecker.admin.inc, and linkcehcker.install to add support for sending emails. The second is a completely new file that need to be put in the included folder with the name linkchecker.mail.inc.

As you would be knowing, cvs diff does not diffes newly added files. I develop an Win machine, and therefore a couple of options mentioned in doc pages (fakediff etc). did not apply to me.

The patches are for the latest version of 6.x-2.x branch of the code downloaded this morning by me. Everything is taken care of, except for providing an upgrade path to add the new schema table, linkchecker_users. I thought the module maintainers would be able to decide the upgrade path better. So, these patches should be tested on a new install of linkchecker.

To @hass, I have tried to follow coding standards (although I followed last time also):
1) I could not find a doc page saying require_once in hook_init.
2) $email_token_help was actually copied from the core user module. And this is how it was in the admin file for the core module also.

It might be possible that I left out one or two of your earlier suggestions. However, if if you think the patch adds value to the module, those issues can always be taken care of before committing to the CVS.

r_honey’s picture

FileSize
4.91 KB
10.33 KB

Strange, but the files attached by me did not show up in the previously posted comment.

Status: Needs review » Needs work

The last submitted patch, mail.patch, failed testing.

r_honey’s picture

Status: Needs work » Needs review

#15: mail.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, mail.patch, failed testing.

hass’s picture

+++ linkchecker.module	17 May 2010 07:38:15 -0000
@@ -8,6 +8,8 @@
+require_once (drupal_get_path('module', 'linkchecker') . '/includes/linkchecker.mail.inc');

There is no rule except the one I made :-). Normally it's best practice to include files only if they are required. So - if we send mails, we should include it - otherwise not. The logic will change in D7... but we can see this the hook_init workaround in many modules.

+++ linkchecker.module	17 May 2010 07:38:15 -0000
@@ -148,6 +153,115 @@ function linkchecker_cron() {
+        $form['linkchecker'] = array(
+            '#type' => 'fieldset',
+            '#title' => t('Linkchecker settings'),
+            '#description' => t('Customize settings for e-mails sent when broken links are found in the content created by you on the site.'),
+            '#tree' => TRUE,
+            '#weight' => 4,
+            '#collapsible' => TRUE,
+            '#collapsed' => TRUE);

You are using 4 spaces for one tab, in Drupal we use 2 spaces for one tab.

Also make a comma after TRUE and a line break so that the bracket is in the same line where $form starts. Nitpicking, but clean code :-).

+++ includes/linkchecker.admin.inc	17 May 2010 07:38:17 -0000
@@ -197,6 +199,40 @@ function linkchecker_admin_settings_form
+  $email_token_help = t('Available variables are:') . ' !site, !user_link, !uri, !uri_brief, !username, !mailto, !date.';

No idea who have written this, but if this is in core, it looks like a bug :-). I think it need to be: t('Available variables are: !site, !user_link, !uri, !uri_brief, !username, !mailto, !date.')

The only idea I can come up is - the variables are text and not placeholders... this may be a reason.

+++ includes/linkchecker.admin.inc	17 May 2010 07:38:17 -0000
@@ -197,6 +199,40 @@ function linkchecker_admin_settings_form
+      '#title' => t('Broken Link Email Subject'),

t('Broken link email subject')

+++ includes/linkchecker.admin.inc	17 May 2010 07:38:17 -0000
@@ -197,6 +199,40 @@ function linkchecker_admin_settings_form
+      '#title' => t('Broken Link Email Body'),

t('Broken link email body')

I hope I will find some time soon to give this great feature a test.

I'm also using Windows and I have not heard that a file need to be faked... I'm using Eclipse... but your patches seem to have real issues... this need to fixed first. If you don't get the green light here - something should wrong with your patch.

r_honey’s picture

Regarding the $email_token_help variable, the text that is outside the t() are really placeholders, that the code replaces in the mail subject/body.
Because these are replacement patterns, maybe therefore they are outside theming.

Regarding patches, well I am really surprised here. The patches have been created by cygwin's cvs package by diffing them to the Drupal CVS. I really have no idea why these patches are having issues.

Even more confusing to me is the fact that once a patch has failed testing, why is there an option for retesting it. If it has failed once, it would fail again.

I acknowledge my deviation from Drupal coding standards. But apart from that, I really believe that the code does exactly what it advertises. Would wait to see your feedback on the code (functionality, not appearanace;)).

hass’s picture

Status: Needs work » Needs review
FileSize
17.15 KB

I have fixed the last code style issues.

Fixed:
1. Update hook was missing to add linkchecker_users table to existing installations
2. Tested includes/linkchecker.mail.inc inclusion in attached patch and it works well with Eclipse. I suggest you change to Eclipse - it's a great tool :-)

Untested:
3. db_placeholder in mail.inc is completly untested
4. require_once should be enough to be added once in .module, no need in admin or have I missed something? module_load_include is the normal way to load the .inc. May change to hook_init(). Needs more investigation.
5. Have you tested the schema creation with 'foreign keys' definition? It's a new feature in D7... I'm not sure if it would cause failures in D6. Never tried it yet.
6. UI strings and schema strings should get a review.
7. We may need to remove the & from &$edit and &$user for PHP 5.3 compatibility
8. Variable naming in linkchecker_user may need to be prefixed with linkchecker and more general about linkchecker_mail_* - Need to think more about this.
9. We should add some tests if possible :-)

hass’s picture

FileSize
17.15 KB

The patch is not correctly named. Re-attach for robot

Status: Needs review » Needs work

The last submitted patch, linkchecker_mail-D7.patch, failed testing.

hass’s picture

Status: Needs work » Needs review
FileSize
17.15 KB

New try without suffix

Status: Needs review » Needs work

The last submitted patch, linkchecker_mail.patch, failed testing.

Anonymous’s picture

A patch with the name ending in -D6.patch will not queued for testing; that is why that patch is not tested. See the description for the allowed attachments reported at the bottom of this page.

r_honey’s picture

To @hass:

1) Regarding hook_update, I mentioned it in the post earlier that I left hook_update to the module team (did not had a look at the update numbers ;))
2) Changing to Eclipse - Well, I will give it a try when I get time. But I think Zend Studio is a great IDE (I know it's further based on Eclipe).

3) I have found a minor issue in the code. In linkchecker.mai.inc, this line:
$receive_mail = $user->receive_mail == NULL ? 1 : $user->receive_mail == NULL;
should be changed to:
$receive_mail = $user->receive_mail == NULL ? 1 : $user->receive_mail;

4) I did not find a reference to db_placeholder in mail.inc. I was not quite clear what you meant here.

5) Yes, require_once should have been sufficient once in .module file.

6) Schema creation with foreign keys - Yes, on my D6 install, it did not cause any issues. I haven't gone into D6 core Schema API code, but my understanding is that in all Drupal API methods which expect arrays with specific keys (Form API, Schema API etc.), the core API simply ignores keys in the arrays it does not understand/expect.

So, just left that foreign key part to be able to use it on both D7 & D6 (where it should be ignored).

8) Variable naming in linkcheker_users - I am not sure, but probably you are talking of the table columns here. I don't see any reason for this. The module does not hook into user_load and loads the column data when a user objects load.
linkchecker_users is rather a private table for the module. The only place where data is fetched from this table is in mail.inc, in the _linkchecker_notify_broken_links method. Following is the select statement for this:

SELECT u.*, lu.receive_mail, lu.min_days, lu.days, lu.last_mailed
We can alias columns here if required.

9) I was thinking about scrip timeout. The current behavior of the module is stop once it uses over half of the maximum execution time. Do we need to change this to let's say for 1/3 of the execution time, if the module setting variable linkchecker_mail_broken is set the TRUE, meaning that mails are to be sent.

This is because sending mails can itself take some time, especially is an installation uses the smtp module to connect to an external smtp server (e.g. smtp.google.com) for sending out mails.

hass’s picture

Version: 6.x-2.3 » 6.x-2.x-dev
hass’s picture

Status: Needs work » Needs review

#24: linkchecker_mail.patch queued for re-testing.

hass’s picture

@kiam: D6 code cannot tested at all or what does Setup environment: failed to drop checkout database. mean?

Status: Needs review » Needs work

The last submitted patch, linkchecker_mail.patch, failed testing.

Anonymous’s picture

@hass: I was referring to the description reported in the fieldset File attachments; it could be the text needs to be updated, as the description seems to report that tests for Drupal 6 are not done (which doesn't seems true, as tests for Drupal 6 modules are correctly done). It could also be there is code that filters out the patches to test basing on the file name, and that code has not been removed.

scubasmurf’s picture

subscribing

hass’s picture

Status: Needs work » Needs review
FileSize
16.72 KB

Fixed Unix LF's

Status: Needs review » Needs work

The last submitted patch, linkchecker_mail2.patch, failed testing.

r_honey’s picture

Status: Needs work » Needs review

hass, I pointed this out above also. This line:
$receive_mail = $user->receive_mail == NULL ? 1 : $user->receive_mail == NULL;

in linkchecker.mail.php needs to be changed to:
$receive_mail = $user->receive_mail == NULL ? 1 : $user->receive_mail;

Also, would it make sense to reduce the linkchecker script timeout if email sending is enabled in these lines in linkchecker.module:

    if ((timer_read('page') / 1000) > ($max_execution_time / 2)) {
      break; // Stop once we have used over half of the maximum execution time.

We can reduce the time to $max_execution_time/3 or something suitable to leave sufficient time for email sending.

hass’s picture

FileSize
16.71 KB

*Damn* robot, there was one Windows LF inside... also fixed #36.

I'm not sure about $max_execution_time/3. "$max_execution_time/3" is really a short timeframe - at least if people have only 60s $max_execution_time on shared hosts (nevertheless we are not able to support such *short* cron runs). Have you seen cron failing?

Sending Mails shouldn't be a problem at all... if a mail hasn't been send it should be send at a later cron run, isn't it? We should only prevent cron failures with this logic as this may break a site. I do not feel strong about a delayed email... We may can change the logic a bit that sending emails is also calculated into the time frame of $max_execution_time/2. D7 also have some changes we can use...

r_honey’s picture

I agree that sending emails should not be a priority that much. But one issue might be that if a timeout prevents emails from being sent for nodes that are found to contain broken links, subsequent cron runs would also flag those nodes with broken links, and unless the overall number of nodes with broken links reduce (either by members who received mail or by administrators), such members who did not receive mail on a cron run might also not receive it on subsequent runs.

hass’s picture

How about running the mails upfront and than do the linkchecking? Than you are not sending the mails after a check, but in the very beginning of the next cron run one hour later and at least before you do the next checks. Dirty, but should work. Normally this is really no problem as links are only checked once per 4 weeks (default) and I strongly discourage using any other value shorter than 4 weeks...

r_honey’s picture

Hmmm... that might actually be a good idea to spread out load during a cron run. As I said earlier, I myself use a third-party SMTP server for sending out mails on my Drupal installations. This might take some time depending upon the number of mails that need to be sent out.

Implementing this would require some changes to the code. One thing that immediately comes to my mind is to store a module variable linkchecker_scan_time. This stores the last timestamp when module scanned content for broken links. At the beginning of the cron run (for sending emails), we execute the Sql statement that fetches all users whose content have broken links and whose last email time is less than the last scan time. After an email is sent, the user's last email time is updated immediately (instead of a batch at the end when all emails have been sent).

This would ensure that no user gets duplicate emails in case email sending is terminated in between due to script timeout or something else (This is where I cherish ASP.NET more, the ability to create background threads instead of blocking the UI threads and imposing timeouts on the UI threads).

r_honey’s picture

Another issue I found with the code... In the hook_user, we show the linkchecker settings only for registered users (but not while creating a new user etc.).

Therefore validate for hook_user should only validate if an existing user is being edited. Therefore, inside hook_user in linkchecker.module, 2 lines need to be added:

      if (!$user->uid) {
        return;
      }

This means the validate case should change to:

    case 'validate':
      if (!$user->uid) {
        return;
      }
      $min_days = $edit['linkchecker']['min_days'];
hass’s picture

Status: Needs review » Needs work

Please update the patch

r_honey’s picture

Status: Needs work » Needs review
FileSize
16.76 KB

Resolution to #41

j0nathan’s picture

subscribing

itrivino’s picture

Subscribing. This feature should be part of the module. A great module!!

camaraderie’s picture

Subscribing, unless anyone knows how to do this using Notifications?

rkodrupal’s picture

subscribing

hass’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev

D7 need to be done first

hass’s picture

#43: linkchecker_mail4.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, linkchecker_mail4.patch, failed testing.

jastern’s picture

+1 vote for this patch! :-)

we have a drupal 6 installation with many pages, and while it is not a money-making site, it is a professional university site that faculty, staff, and students all count on, so unpublishing a node -- even temporarily -- is just not an option -- we would catch too much flack from our departments. it would be much better to send email notifications to authors with all the specifications mentioned above (esp #2).

we were just about to write our own rule or even a separate module to scan through the linkchecker table and find such pages, then email the author, but saw this patch.

what's the status of this patch?

thanks!

hass’s picture

See #50

Yuri’s picture

Would be nice to have this as a Rules trigger, this gives the most flexibility and is probably the most simple solution.

backslash856’s picture

Issue summary: View changes

subscribe

backslash856’s picture

I suppose patch #43 is for drupal 6 only, isn't it? For drupal 7 there are some solution or work around?
Thanks