i've installed linkchecker successfully and set it up (i think) correctly, but it never seems to check any links.
cron has run automatically and i've run it manually but the broken links page still gives the "There are 3821 unchecked links of about 3821 links in the database." message.

the links table has a '0' for every link in the 'last checked' column.

there are no cron errors and cron runs successfully.
i've added code to the module_invoke_all() function to report cron calls/returns (found here: http://drupal.org/node/382682 ) and it shows this in the recent log entries table:

module_invoke_al	09/17/2009 - 07:49	calling linkchecker
module_invoke_al	09/17/2009 - 07:49	return from linkchecker after 0.00184 sec
[--snip--]
cron	09/17/2009 - 07:49	Cron run completed.

any ideas?
is there a setting i'm missing somewhere?

thanks for the help...

Comment  File                       Size     Author
#38      drupal-193383_15-D6.patch  1.23 KB  TheRec

Comments

hass’s picture

Status: Active » Postponed (maintainer needs more info)

1. 'last checked' = 0 is ok, it shows you that the link has never been checked. It will be changed if the link has been checked.

2. "There are 3821 unchecked links of about 3821 links in the database." shows you that the linkchecker_links table has links to check... so nothing wrong there either.

So... the only idea I can think of is that in hook linkchecker_cron() the $check_links_max_per_cron_run = ini_get('max_execution_time'); may not work... are you able to verify that ini_get() works well on your host? If this gives you a 0 or FALSE it cannot work... so at the end of the day it should be an issue in PHP or Apache.

chadd’s picture

i realize that 1 & 2 are ok, and just telling me that there are links to check that have never been checked... my issue is that the links don't seem to ever get checked, no matter how many times cron is run.

ini_get() seems to be working fine on our server.

<?php
echo 'max_execution_time = ' . ini_get('max_execution_time');
?>

returns:

max_execution_time = 30

is there any other info i can provide or other tests i can run to help figure out why the links aren't getting checked?

chadd’s picture

Status: Postponed (maintainer needs more info) » Active
hass’s picture

Status: Active » Postponed (maintainer needs more info)

First of all, 30 seconds is not much time! This should be 240 seconds... I'm not sure why this is only 30s, because Drupal cron overrides the standard ini setting with 240s... not sure how you have tested this. Without a watchdog call you may not see the correct time.

Please verify whether this gives you a result by adding a watchdog line inside the while() loop.

$result = db_query_range("SELECT * FROM {linkchecker_links} WHERE last_checked < %d AND status = %d ORDER BY last_checked, lid ASC", time() - $check_links_interval, 1, 0, $check_links_max_per_cron_run);
  while ($link = db_fetch_object($result)) {
    watchdog('linkchecker', 'Fetch URL %link.', array('%link' => $link->url), WATCHDOG_INFO);
...

What is your $check_links_interval?

Are you sure that drupal_http_request() works well on your site?

Also verify if your Drupal system status page shows anything in red, please.

Sylvain Lasnier’s picture

Hi guys,
I have same problem.

I updated max_execution_time from 30 to 60s. No change.
I removed poormanscron and launched the cron job manually. No error displayed, but no links checked.

I added the watchdog hack. No trace in the watchdog table.

I checked the argument values of the db_query_range() call:
time() - check_links_interval: 1253185536
check_links_max_per_cron_run: not set?

Sylvain Lasnier’s picture

I see the SQL query in mysql.log.

The result is empty:

mysql> SELECT * FROM linkchecker_links WHERE last_checked < 1253186993 AND status = 1 ORDER BY last_checked, lid ASC LIMIT 0, 0;
Empty set (0.00 sec)

All status values are set to 0.
The LIMIT clause in the SQL is wrong; is the second argument value missing?

My linkchecker_links table contains 109 rows.
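
For anyone following along, here is a minimal sketch of how an unset count can end up as "LIMIT 0, 0". It assumes the range arguments are cast to integers before being appended to the query; example_limit_clause() is a hypothetical helper written for illustration, not Drupal's db_query_range().

```php
<?php
// Hypothetical helper, not part of Drupal: it only mimics the integer cast
// that is assumed to be applied to the range arguments.
function example_limit_clause($from, $count) {
  return 'LIMIT ' . (int) $from . ', ' . (int) $count;
}

// A usable value from ini_get() arrives as a numeric string.
echo example_limit_clause(0, '60'), "\n";   // LIMIT 0, 60

// An empty string or FALSE casts to 0, so the query can never return a row.
echo example_limit_clause(0, ''), "\n";     // LIMIT 0, 0
echo example_limit_clause(0, FALSE), "\n";  // LIMIT 0, 0
```

If that assumption holds, an empty $check_links_max_per_cron_run would be enough on its own to make the query return nothing, regardless of the status column.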

Sylvain Lasnier’s picture

I deleted the contents of the linkchecker_links table and clicked on "Clear link data and analyze content link".
No error displayed.
109 rows are created, with status set to 0.

The first one is:

mysql> SELECT * FROM linkchecker_links limit 1;
+-----+----------------------------------+----------------------+--------+------+-------+------------+--------------+--------+
| lid | token                            | url                  | method | code | error | fail_count | last_checked | status |
+-----+----------------------------------+----------------------+--------+------+-------+------------+--------------+--------+
|   1 | 7997244e7efac3e9e4279f3265e0a17b | http://www.infoq.com | HEAD   |   -1 | NULL  |          0 |            0 |      0 | 
hass’s picture

@Sylvain Lasnier: This is all ok... but "check_links_max_per_cron_run: not set?" is BAD as it shows there is a problem with the PHP/Apache config. I cannot say why, but this must be the source of the issue. The statement needs to return 60 links if max_execution_time = 60s.

EDIT: Uhm... why is your status = 0? It needs to be 1! Have you added your domain to the domain blacklist??? Please post the result from SELECT * FROM variable WHERE name LIKE 'linkchecker%'.

Sylvain Lasnier’s picture

Component: User interface » Code

Damn,

<?php echo ini_get('max_execution_time'); ?> is ok in a dedicated test file => no Apache or PHP issue,
but it returns FALSE inside the linkchecker_cron() function => ?

If you update the linkchecker_cron() function as follows:

$check_links_max_per_cron_run = ini_get('max_execution_time');
$check_links_max_per_cron_run = variable_get('linkchecker_check_links_max', 60);

and

$result = db_query_range("SELECT * FROM {linkchecker_links} WHERE last_checked < %d AND status = %d ORDER BY last_checked, lid ASC", time() - $check_links_interval, 0, 0, $check_links_max_per_cron_run);

it works.

Does the timer_read('page') call in the while loop accumulate time?

hass’s picture

I removed variable_get('linkchecker_check_links_max', 60); for very good reasons (usability). ini_get() needs to work. If not, you will run into several issues with core. Make sure that PHP safe mode is really disabled.

Does timer_read('page') function in while loop cumulate time?

http://api.drupal.org/api/function/timer_read/6

And linkchecker_links.status must be 1. Why is this 0 for all your links? You have two issues. One is that PHP is not configured properly and the second could be a cluttered linkchecker URL blacklist configuration.

hass’s picture

Category: bug » support
Sylvain Lasnier’s picture

Category: support » bug

Strange

safe_mode = Off
and I have no other issues on my multisite server.

About the 0 value: it comes from the INSERT queries. Extract from mysql.log:

1287 Query SELECT url FROM linkchecker_links ll INNER JOIN linkchecker_nodes ln ON ll.lid = ln.lid WHERE ln.nid = 3 AND token IN ('7997244e7efac3e9e4279f3265e0a17b')
1287 Query SELECT lid FROM linkchecker_links WHERE token = '7997244e7efac3e9e4279f3265e0a17b'
1287 Query INSERT INTO linkchecker_links (token, url, method, code, fail_count, last_checked, status) VALUES ('7997244e7efac3e9e4279f3265e0a17b', 'http://www.infoq.com', 'HEAD', -1, 0, 0, 0)

hass’s picture

This status is only 0 if infoq.com is on your domain blacklist! Keep the example.com/net/org domains in the list, remove everything else you have added, and clear the linkchecker data with a re-scan.

hass’s picture

Can you verify that max_execution_time exists in your php.ini? And/or whether other modules use ini_set()/ini_get() and may clutter the PHP config?

Sylvain Lasnier’s picture

max_execution_time is set and <?php echo ini_get('max_execution_time'); ?> is ok in a separate PHP file.

Other modules use ini_get :
# grep -Ri ini_get * -l
includes/mail.inc
includes/database.inc
includes/bootstrap.inc
includes/common.inc
includes/unicode.inc
includes/database.pgsql.inc
includes/file.inc
includes/image.imagemagick.inc
install.php
modules/user/user.module
modules/system/system.admin.inc
modules/system/system.install
modules/color/color.module
sites/all/modules/click_heatmap/clickheat/config.php
sites/all/modules/token/token_actions.module
sites/all/modules/image/image.imagemagick.inc
sites/all/modules/linkchecker/linkchecker.module

About ini_set :
# grep -Ri ini_set * | grep max_execution_time
#

Sylvain Lasnier’s picture

I added a block with ini_get('max_execution_time') at the bottom of http://www.webstrat.fr/

Value is set

hass’s picture

Category: bug » support

It's not a bug of linkchecker :-)

Are you able to investigate why ini_get() gives NULL/0, please? I'd like to learn how this is possible...

chadd’s picture

i think i'm having the same problem with max_execution_time being set but not carrying into the linkchecker module.
i added a line in htaccess to set 'max_execution_time' to 240.

outside of drupal php ini_get('max_execution_time') returns 240.
i created a page in drupal and ran the same php and it also returns 240, so drupal itself isn't changing the value.

i added a watchdog line to line 105 of the linkchecker.module:

  watchdog('linkchecker-max', 'check_links_max_per_cron_run =  %c1 :: max_execution_time = %c2', array('%c1' => $check_links_max_per_cron_run,'%c2' => ini_get('max_execution_time')), WATCHDOG_INFO); 

when cron is run it returns:

check_links_max_per_cron_run = :: max_execution_time = 

why would the ini_get('max_execution_time') return NULL when called from within the linkchecker.module file but return a valid result both inside and outside of drupal on the same server?
is there a php setting i need to check? is this a bug in linkchecker?

i've run these tests both with and without the htaccess setting the max_execution_time and the results are the same, the only difference being that the time returned is 30 instead of 240

hass’s picture

I really have no idea... it works for me and many others... so we need to find out why it's not working. Try to figure out what other core ini_get() calls return... this may shed some light if more things are broken, but you haven't noticed them yet.

chadd’s picture

i created a page in drupal and ran the ini_get('max_execution_time') and it also returns 240, so drupal itself isn't changing the value.

where else can i test?

hass’s picture

Can you try

error_reporting(E_ALL);
$check_links_max_per_cron_run = ini_get('max_execution_time');

and see if you get something like "ini_set() has been disabled for security reasons"? I cannot help here - you need to find the reason on your box. It's not linkchecker... As an aside - how do you execute cron - via wget or poormanscron?

hass’s picture

Also check your apache logs... or enable debug logging in PHP...

chadd’s picture

i added the error_reporting(E_ALL); and i get nothing in my apache error or access logs.

i'm calling cron by hand (clicking the link on the status page). normally cron is run on the hour via an apache cron job. the log shows:

"GET /cron.php HTTP/1.0" 200 - "-" "Lynx/2.8.6rel.5 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.8h"
chadd’s picture

i also noticed that linkchecker_links.status is '0' for all links.
i reset the module config to defaults and rescanned for links and they always get set back to 0.

could that be related?

Sylvain Lasnier’s picture

Category: support » bug

Horrible,

If I add this :

function linkchecker_cron() {
  //SLA
  print_r (ini_get_all());
  ini_restore("max_execution_time");
  print_r (ini_get_all());

and run curl http://www/cron.php

I get this :

    [max_execution_time] => Array
        (
            [global_value] => 60
            [local_value] => 
            [access] => 63
        )
....
    [max_execution_time] => Array
        (
            [global_value] => 60
            [local_value] => 60
            [access] => 7
        )

I don't know why the max_execution_time local value is set to null here, or why it is set to 60 in other functions, like linkchecker_nodeapi() for example.
I didn't find any ini_set('max_execution_time',...)

Perhaps ini_restore("max_execution_time"); is a valid hack while waiting for a cleaner solution?

hass’s picture

Category: bug » support

You need to find out what is inside the "linkchecker_disable_link_check_for_urls" variable and why.

This all works for me, and I have no clue what kind of crippled machine you are using there.

chadd’s picture

where do i find that "linkchecker_disable_link_check_for_urls" variable?

and how can i find out why it is being set as it is being set?

chadd’s picture

i again cleared my link data and re-scanned for links and this time it set all the status to 1, so that seems to be working.

but i still can't figure out what is going on with the max execution time.
i have that ini_get() running successfully both in and out of drupal and the only place i can see that it returns nothing is in the linkchecker module...

hass’s picture

SELECT * FROM variable WHERE name LIKE 'linkchecker%'

hass’s picture

Category: support » bug

I'm able to repro this on my Linux box, but not on my Windows box.

hass’s picture

Title: Links not checked, 'max_execution_time' returns 0 on Linux if set_time_limit() is used before » links not being checked
Priority: Critical » Normal

This is really an annoying bug... I do not really understand it yet, but in drupal_cron_run() we have this code

  if (function_exists('set_time_limit')) {
    @set_time_limit($time_limit);
  }

and *this* causes the value to become 0. Afterwards, in all cron hooks, you are no longer able to get the 'max_execution_time' with ini_get(). I WONDER why this works on Windows, but not on Linux. This could be a PHP bug... maybe you are able to post your PHP version here, please. I'm able to repro this on PHP 5.2.9 (Linux), but not PHP 5.2.6 (Windows).

We need to figure out why this is broken - or how we are able to get the current set_time_limit().
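
A small probe along these lines can show what ini_get() reports before and after a set_time_limit() call on a given box. This is only a sketch: example_time_limit_probe() is a hypothetical helper, and it passes an explicit number of seconds rather than reproducing the exact call in drupal_cron_run().

```php
<?php
// Hypothetical probe, not part of Drupal or linkchecker.
function example_time_limit_probe($seconds) {
  // Value as seen before touching the time limit.
  $before = ini_get('max_execution_time');

  // Same guard as the core snippet quoted above; FALSE if unavailable.
  $applied = function_exists('set_time_limit') ? @set_time_limit($seconds) : FALSE;

  // Value as seen by any code that runs afterwards, e.g. a cron hook.
  $after = ini_get('max_execution_time');

  return array('before' => $before, 'applied' => $applied, 'after' => $after);
}

print_r(example_time_limit_probe(240));
```

Running it once from a standalone file and once from inside a cron hook should make it clear whether the value is lost by the time the hook runs.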

hass’s picture

Title: links not being checked » Links not checked, 'max_execution_time' returns 0 on Linux if set_time_limit() is used before
Priority: Normal » Critical

$time_limit is undefined in drupal_cron_run(), so set_time_limit() is called with an undefined value... looks like a core bug.

hass’s picture

Title: Links not checked, 'max_execution_time' returns 0 on Linux if set_time_limit() is used before » Links not checked, 'max_execution_time' returns 0 if set_time_limit($time_limit) is used before
damien tournoud’s picture

Title: Links not checked, 'max_execution_time' returns 0 if set_time_limit($time_limit) is used before » Links not checked, 'max_execution_time' returns 0 on Linux if set_time_limit() is used before
Status: Postponed (maintainer needs more info) » Active

You don't have to know the max_execution_time, because it makes little sense by itself (it doesn't give you the number of seconds your script can run, but the number of seconds it was allowed to run when it started). set_time_limit() messes with that big time: after a call to set_time_limit() this value makes even less sense than it did before, because set_time_limit(n) gives the script the guarantee that it can execute for n *additional* seconds.

Drupal 6.14 now guarantees each hook_cron() that it will be able to execute for at least 30s. If you believe that this is not enough for your use case, call drupal_set_time_limit() yourself at the beginning of your function.
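
One way to avoid depending on max_execution_time at all is to measure elapsed time inside the hook and stop when a self-chosen budget is used up. The sketch below does that with microtime(); it does not call drupal_set_time_limit(), and example_process_with_budget(), the 5-second budget and the URLs are all hypothetical.

```php
<?php
// Hypothetical helper: runs $worker on each item until the budget is spent.
function example_process_with_budget(array $items, $worker, $budget_seconds) {
  $started = microtime(TRUE);
  $done = 0;
  foreach ($items as $item) {
    // Stop before starting new work once the budget has been used up.
    if (microtime(TRUE) - $started >= $budget_seconds) {
      break;
    }
    call_user_func($worker, $item);
    $done++;
  }
  return $done;
}

$checked = example_process_with_budget(
  array('http://example.com/a', 'http://example.com/b'),
  function ($url) { /* a real worker would request $url here */ },
  5
);
echo "processed $checked item(s)\n";
```

Items that do not fit into the budget are simply left for the next cron run.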

hass’s picture

Currently it runs for an unlimited time.

Job queue and linkchecker are now broken!

damien tournoud’s picture

By the way, you should take a look at the Job queue module, that would allow you to do that right.

TheRec’s picture

Title: Links not checked, 'max_execution_time' returns 0 on Linux if set_time_limit() is used before » Links not checked, 'max_execution_time' returns 0 if set_time_limit($time_limit) is used before
Status: Active » Needs review
Status  File  Size
new           1.23 KB

Yup... seems like the review process for D6 was kinda skipped ;) Back to a literal value... There was the same copy/paste error in node.module.

TheRec’s picture

Title: Links not checked, 'max_execution_time' returns 0 if set_time_limit($time_limit) is used before » Links not checked, 'max_execution_time' returns 0 on Linux if set_time_limit() is used before
Status: Needs review » Active

I did not post it in the right issue I guess.. sorry, disregard the patch in here.. I'll post it in #193383: set_time_limit: Centralize calls and prevent warnings and errors.

hass’s picture

Title: Links not checked, 'max_execution_time' returns 0 on Linux if set_time_limit() is used before » Drupal 6.14: Links not checked, 'max_execution_time' is overridden to "undefined" by drupal_cron_run()

Corrected title.

hass’s picture

Added a "Known issues" section on linkchecker project home and suggested to apply patch http://drupal.org/node/193383#comment-2060262. I leave this open for now in the hope that Damien is able to explain http://drupal.org/node/193383#comment-2061014 here or over there and in the hope that others see the case if they have the problem.

This case will become a duplicate of #193383: set_time_limit: Centralize calls and prevent warnings and errors afterwards.

hass’s picture

TODO: After some reading of other cases I think the module needs to handle max_execution_time = 0. I'm not sure if someone would be soooo crazy to use this value, but in such a case the current linkchecker logic will break.

hass’s picture

Status: Active » Fixed

Patches to prevent link check failure if 'max_execution_time' = 0 (unlimited) have been committed to D5 and D6.
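
For readers who want a rough idea of what guarding against a zero or missing value can look like, here is a minimal sketch. It is not the committed patch: example_links_per_cron_run() and the fallback of 240 are assumptions made for illustration.

```php
<?php
// Hypothetical helper; the name and the 240 fallback are assumptions.
function example_links_per_cron_run($raw_limit, $fallback = 240) {
  // ini_get() returns a string, or FALSE if the directive does not exist.
  // Treat anything non-numeric, zero (unlimited) or negative as "no usable limit".
  if (!is_numeric($raw_limit) || (int) $raw_limit <= 0) {
    return $fallback;
  }
  return (int) $raw_limit;
}

echo example_links_per_cron_run(ini_get('max_execution_time')), "\n";
```

With a guard like this, the count passed to the query stays positive even when max_execution_time is 0 or comes back empty.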

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.