i've installed linkchecker successfully and set up (i think) correctly but it never seems to check any links.
cron has run automatically and i've run it manually but the broken links page still gives the "There are 3821 unchecked links of about 3821 links in the database." message.
the links table has a '0' for every link in the 'last checked' column.
there are no cron errors and cron runs successfully.
i've added code to invoke_all function to report cron calls/returns (found here: http://drupal.org/node/382682 ) and it shows this in the recent log entries table:
module_invoke_al 09/17/2009 - 07:49 calling linkchecker
module_invoke_al 09/17/2009 - 07:49 return from linkchecker after 0.00184 sec
[--snip--]
cron 09/17/2009 - 07:49 Cron run completed.
any ideas?
is there a setting i'm missing somewhere?
thanks for the help...
Comments
Comment #1
hass commented
1. 'last checked' = 0 is ok, it shows you that the link has never been checked. It will be changed once the link has been checked.
2. "There are 3821 unchecked links of about 3821 links in the database." shows you that the linkchecker_links table has links to check... so nothing wrong there either.
So... the only idea I can think of is that in linkchecker_cron() the
$check_links_max_per_cron_run = ini_get('max_execution_time');
may not work... are you able to verify if ini_get() works well on your host? If this gives you a 0 or FALSE it cannot work... so at the end of the day it should be an issue in PHP or Apache.
Comment #2
chadd commented
i realize that 1 & 2 are ok, and just telling me that there are links to check that have never been checked... my issue is that the links don't seem to ever get checked, no matter how many times cron is run.
ini_get() seems to be working fine on our server.
returns:
is there any other info i can provide or other tests i can run to help figure out why the links aren't getting checked?
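For anyone repeating this check, here is a minimal sketch of how the return value of ini_get() could be classified. The helper name describe_time_limit() is hypothetical and not part of linkchecker or Drupal; it only relies on PHP returning FALSE from ini_get() for an unknown option and a string otherwise.

```php
<?php
/**
 * Hypothetical diagnostic helper: describes an ini_get() result for
 * max_execution_time. Not part of linkchecker or Drupal core.
 */
function describe_time_limit($value) {
  if ($value === FALSE) {
    // ini_get() returns FALSE when the option does not exist.
    return 'option does not exist';
  }
  if ($value === NULL || $value === '') {
    return 'empty value';
  }
  if ((int) $value === 0) {
    return 'zero (no time limit)';
  }
  return (int) $value . ' seconds';
}

// Print the diagnosis for the setting discussed in this thread.
print describe_time_limit(ini_get('max_execution_time')) . "\n";
```

Running the same lines from a standalone file and from inside the cron hook would show whether the two contexts really see different values.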
Comment #3
chadd commented
Comment #4
hass commented
First of all, 30 seconds is not much time! This should be 240 seconds... I'm not sure why this is only 30s, given that Drupal cron overrides the standard ini setting with 240s... not sure how you have tested this. Without a watchdog call you may not see the correct time.
Please verify if this gives you a result by adding a watchdog line into the while().
What is your $check_links_interval?
Are you sure that drupal_http_request() works well on your site?
Also verify if your Drupal system status page shows anything in red, please.
Comment #5
Sylvain Lasnier commented
Hi guys,
I have the same problem.
I updated max_execution_time from 30 to 60s. No change.
I removed poormanscron and launched the cron job manually. No error displayed, but no link checked.
I added the watchdog hack. No trace in the watchdog table.
I checked the argument values of the db_query_range() call:
time - check_links_interval: 1253185536*
check_links_max_per_cron_run:* no set?
Comment #6
Sylvain Lasnier commented
I see the SQL request in mysql.log.
The result is empty:
mysql> SELECT * FROM linkchecker_links WHERE last_checked < 1253186993 AND status = 1 ORDER BY last_checked, lid ASC LIMIT 0, 0;
Empty set (0.00 sec)
All status rows are set to 0.
The LIMIT clause does not look right; is the value of the second argument missing?
My linkchecker_links table contains 109 rows.
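To illustrate how a query can end in LIMIT 0, 0: a minimal sketch, assuming the range clause is built by casting both arguments to integers, which is how Drupal 6's MySQL db_query_range() appears to work. build_limit_clause() is a hypothetical stand-in, not the core function.

```php
<?php
/**
 * Hypothetical stand-in for the LIMIT part of a range query.
 * Assumes both arguments are cast to integers before use.
 */
function build_limit_clause($from, $count) {
  return 'LIMIT ' . (int) $from . ', ' . (int) $count;
}

// A missing count (FALSE or NULL) silently becomes 0, so no rows come back.
print build_limit_clause(0, FALSE) . "\n";
// With a 60 second limit the query would fetch up to 60 rows.
print build_limit_clause(0, '60') . "\n";
```

Under that assumption, an empty $check_links_max_per_cron_run alone is enough to make the query return nothing, independent of the status column.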
Comment #7
Sylvain Lasnier commented
I deleted the content of the linkchecker_links table and clicked on "Clear link data and analyze content link".
No error displayed.
109 rows are created, with status set to 0.
The 1st is :
Comment #8
hass commented
@Sylvain Lasnier: This is all ok... but
check_links_max_per_cron_run:* no set?
is BAD as it shows there is a problem with the PHP/Apache config. I cannot say why, but this must be the source of the issue. The statement needs to return 60 links if max_execution_time = 60s.
EDIT: Ähm... why is your status = 0? It needs to be 1! Have you added your domain to the domain blacklist??? Please post the result from
SELECT * FROM variable WHERE name LIKE 'linkchecker%'.
Comment #9
Sylvain Lasnier commented
Damned,
<?php echo ini_get('max_execution_time'); ?> is ok in a dedicated test file. => No Apache & PHP issue
but it returns FALSE inside the linkchecker_cron() function => ?
If you update the linkchecker_cron() function as follows:
$check_links_max_per_cron_run = ini_get('max_execution_time');
$check_links_max_per_cron_run = variable_get('linkchecker_check_links_max', 60);
and
$result = db_query_range("SELECT * FROM {linkchecker_links} WHERE last_checked < %d AND status = %d ORDER BY last_checked, lid ASC", time() - $check_links_interval, 0, 0, $check_links_max_per_cron_run);
it works.
Does the timer_read('page') call in the while loop accumulate time?
Comment #10
hass commented
I removed variable_get('linkchecker_check_links_max', 60); for very good reasons (usability). ini_get() needs to work. If not, you will run into several issues with core. Make sure that PHP safe mode is definitely disabled.
http://api.drupal.org/api/function/timer_read/6
And linkchecker_links.status must be 1. Why is this 0 for all your links? You have two issues. One is that PHP is not configured properly and the second could be a cluttered linkchecker URL blacklist configuration.
Comment #11
hass commented
Comment #12
Sylvain Lasnier commented
Strange
safe_mode = Off
and I have no other issues on my multisite server.
About the 0 value, it comes from the INSERT SQL requests. Extract from mysql.log:
1287 Query SELECT url FROM linkchecker_links ll INNER JOIN linkchecker_nodes ln ON ll.lid = ln.lid WHERE ln.nid = 3 AND token IN ('7997244e7efac3e9e4279f3265e0a17b')
1287 Query SELECT lid FROM linkchecker_links WHERE token = '7997244e7efac3e9e4279f3265e0a17b'
1287 Query INSERT INTO linkchecker_links (token, url, method, code, fail_count, last_checked, status) VALUES ('7997244e7efac3e9e4279f3265e0a17b', 'http://www.infoq.com', 'HEAD', -1, 0, 0, 0)
Comment #13
hass commented
This status is only 0 if infoq.com is on your domain blacklist! Keep the example.com/net/org domains in the list - remove everything else you have added and clear the linkchecker data with a re-scan.
Comment #14
hass commented
Can you verify that max_execution_time exists in your php.ini? And/or whether other modules use any ini_set/get and may clutter the PHP config?
Comment #15
Sylvain Lasnier commented
max_execution_time is set and <?php echo ini_get('max_execution_time'); ?> is ok in a separate PHP file.
Other modules use ini_get:
# grep -Ri ini_get * -l
includes/mail.inc
includes/database.inc
includes/bootstrap.inc
includes/common.inc
includes/unicode.inc
includes/database.pgsql.inc
includes/file.inc
includes/image.imagemagick.inc
install.php
modules/user/user.module
modules/system/system.admin.inc
modules/system/system.install
modules/color/color.module
sites/all/modules/click_heatmap/clickheat/config.php
sites/all/modules/token/token_actions.module
sites/all/modules/image/image.imagemagick.inc
sites/all/modules/linkchecker/linkchecker.module
About ini_set :
# grep -Ri ini_set * | grep max_execution_time
#
Comment #16
Sylvain Lasnier commented
I added a block with ini_get('max_execution_time') at the bottom of http://www.webstrat.fr/
The value is set.
Comment #17
hass commented
It's not a bug in linkchecker :-)
Are you able to investigate why ini_get() gives NULL/0, please? I'd like to learn how this is possible...
Comment #18
chadd commented
i think i'm having the same problem with max_execution_time being set but not carrying into the linkchecker module.
i added a line in htaccess to set 'max_execution_time' to 240.
outside of drupal php ini_get('max_execution_time') returns 240.
i created a page in drupal and ran the same php and it also returns 240, so drupal itself isn't changing the value.
i added a watchdog line to line 105 of the linkchecker.module:
when cron is run it returns:
why would the ini_get('max_execution_time') return NULL when called from within the linkchecker.module file but return a valid result both inside and outside of drupal on the same server?
is there a php setting i need to check? is this a bug in linkchecker?
i've run these tests both with and without the htaccess setting the max_execution_time and the results are the same, the only difference being that the time returned is 30 instead of 240
Comment #19
hass commented
I really have no idea... it works for me and many others... so we need to find out why it's not working. Try to figure out what other core ini_get() calls return... this may shed some light on whether more things are broken that you haven't noticed yet.
Comment #20
chadd commented
i created a page in drupal and ran the ini_get('max_execution_time') and it also returns 240, so drupal itself isn't changing the value.
where else can i test?
Comment #21
hass commented
Can you try
and see if you get something like "ini_set() has been disabled for security reasons"? I cannot help here - you need to find the reason on your box. It's not linkchecker... As an aside - how do you execute cron - via wget or poormanscron?
Comment #22
hass commented
Also check your apache logs... or enable debug logging in PHP...
Comment #23
chadd commented
i added the error_reporting(E_ALL); and i get nothing in my apache error or access logs.
i'm calling cron by hand (clicking the link on the status page). normally the cron is run on the hour via an apache cron job. the log shows:
Comment #24
chadd commented
i also noticed that linkchecker_links.status is '0' for all links.
i reset the module config to defaults and rescanned for links and they always get set back to 0.
could that be related?
Comment #25
Sylvain Lasnier commented
Horrible,
If I add this :
and run curl http://www/cron.php
I get this :
I don't know why the local value of max_execution_time is set to null here, and why it is set to 60 in other functions, like linkchecker_nodeapi for example.
I didn't find any ini_set('max_execution_time',...)
Perhaps ini_restore("max_execution_time"); is a valid hack while waiting for a proper solution?
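A minimal sketch of the ini_restore() idea, outside Drupal. It only shows that ini_restore() discards a runtime change and returns to the configured value; whether that is appropriate inside a cron hook is not established in this thread. The value 45 is an arbitrary example.

```php
<?php
// Remember the configured value before anything changes it at runtime.
$configured = ini_get('max_execution_time');

// Simulate other code overriding the setting (45 is an arbitrary example).
ini_set('max_execution_time', '45');
$overridden = ini_get('max_execution_time');

// ini_restore() discards the runtime override.
ini_restore('max_execution_time');
$restored = ini_get('max_execution_time');

print "configured=$configured overridden=$overridden restored=$restored\n";
```

Note that this brings back the configured value, which is not the same thing as the time actually remaining for the current request.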
Comment #26
hass commented
You need to find out what is inside the "linkchecker_disable_link_check_for_urls" variable and why.
This all works here and I have no clue what kind of crippled machine you are using there.
Comment #27
chadd commented
where do i find that "linkchecker_disable_link_check_for_urls" variable?
and how can i find out why it is being set as it is being set?
Comment #28
chadd commented
i again cleared my link data and re-scanned for links and this time it set all the status to 1, so that seems to be working.
but i still can't figure out what is going on with the max execution time.
i have that ini_get() running successfully both in and out of drupal and the only place i can see that it returns nothing is in the linkchecker module...
Comment #29
hass commented
SELECT * FROM variable WHERE name LIKE 'linkchecker%'
Comment #30
hass commented
I'm able to repro this on my Linux box, but not on my Windows one.
Comment #31
hass commented
This is really an annoying bug... I do not really understand it yet, but in
drupal_cron_run()
we have this code
and *this* causes the value to become 0. Afterwards, in all cron hooks you are no longer able to get the 'max_execution_time' with ini_get(). I WONDER why this works on Windows, but not on Linux. This could be a PHP bug... maybe you are able to post your PHP version here, please. I'm able to repro this on PHP 5.2.9 (Linux), but not PHP 5.2.6 (Windows).
We need to figure out why this is broken - or how we are able to get the current set_time_limit().
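The effect described here can be sketched outside Drupal. This assumes set_time_limit() is not disabled on the host and relies on PHP updating the runtime value of max_execution_time whenever set_time_limit() is called; the literal values are examples only.

```php
<?php
// set_time_limit() rewrites the runtime value of max_execution_time.
set_time_limit(240);
$after_explicit = ini_get('max_execution_time');

// A limit of 0 means "no limit", and ini_get() then reports 0 as well.
set_time_limit(0);
$after_zero = ini_get('max_execution_time');

print "after 240: $after_explicit, after 0: $after_zero\n";
```

If the value handed to set_time_limit() in core ends up empty, a cron hook that reads ini_get('max_execution_time') afterwards would presumably see 0 in the same way.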
Comment #32
hass commented
Comment #33
hass commented
$time_limit is undefined in drupal_cron_run()... looks like a core bug.
Comment #34
hass commented
This core bug has been introduced by #193383: set_time_limit: Centralize calls and prevent warnings and errors
Comment #35
damien tournoud commented
You don't have to know the max_execution_time, because it makes little sense by itself (it doesn't give you the number of seconds your script can run, but the number of seconds it was allowed to run when it started). set_time_limit() messes with that big time: after a call to set_time_limit() this value makes even less sense than it did before: set_time_limit(n) gives the script the guarantee that it can execute for n *additional* seconds.
Drupal 6.14 now guarantees each hook_cron() that it will be able to execute for at least 30s. If you believe that this is not enough for your use case, call drupal_set_time_limit() yourself at the beginning of your function.
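One way to follow this advice is to stop reading max_execution_time and give the hook its own time budget. A rough sketch in plain PHP; process_with_budget() and the worker callback are hypothetical names, and the 30 second figure is simply the one mentioned above.

```php
<?php
/**
 * Hypothetical helper: processes items until a time budget is used up.
 *
 * @param array $items Work items, e.g. link rows.
 * @param callable $worker Called once per item.
 * @param float $budget Seconds this run may spend.
 * @return int Number of items processed.
 */
function process_with_budget(array $items, $worker, $budget) {
  $start = microtime(TRUE);
  $done = 0;
  foreach ($items as $item) {
    // Stop before starting new work once the budget is spent.
    if (microtime(TRUE) - $start >= $budget) {
      break;
    }
    call_user_func($worker, $item);
    $done++;
  }
  return $done;
}

// Example: a generous budget lets all three items through.
print process_with_budget(array('a', 'b', 'c'), 'strtoupper', 30.0) . "\n";
```

The budget is measured from the start of the hook itself, so it does not depend on whatever value ini_get() reports at that moment.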
Comment #36
hass commented
Currently it runs for an unlimited time.
Job queue and linkchecker are now broken!
Comment #37
damien tournoud commented
By the way, you should take a look at the Job queue module, that would allow you to do that right.
Comment #38
TheRec commented
Yup... seems like the review process for D6 was kinda skipped ;) Back to a literal value... There was the same copy/paste error in node.module.
Comment #39
TheRec commented
I did not post it in the right issue I guess... sorry, disregard the patch in here... I'll post it in #193383: set_time_limit: Centralize calls and prevent warnings and errors.
Comment #40
hass commented
Corrected title.
Comment #41
hass commented
Added a "Known issues" section on the linkchecker project home and suggested applying the patch http://drupal.org/node/193383#comment-2060262. I leave this open for now in the hope that Damien is able to explain http://drupal.org/node/193383#comment-2061014 here or over there and in the hope that others see the case if they have the problem.
This case will become a duplicate of #193383: set_time_limit: Centralize calls and prevent warnings and errors afterwards.
Comment #42
hass commented
TODO: After some reading of other cases I think the module needs to handle max_execution_time = 0. I'm not sure if someone would be soooo crazy as to use this value, but in such a case the current linkchecker logic will break.
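A sketch of what such a guard might look like. This is not the committed patch; links_per_cron_run() and the fallback of 240 are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical guard: derives a per-run link limit from max_execution_time.
 * Falls back to a default when the value is missing or 0 (unlimited).
 */
function links_per_cron_run($max_execution_time, $fallback = 240) {
  $limit = (int) $max_execution_time;
  return $limit > 0 ? $limit : $fallback;
}

// Use whatever the current request reports, with the fallback as safety net.
print links_per_cron_run(ini_get('max_execution_time')) . "\n";
```

With a guard like this the range query would never be issued with a count of 0, whatever ini_get() returns.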
Comment #43
hass commented
Marked #583810: make "max links checked" a configurable option in admin as duplicate.
Comment #44
hass commented
Patches to prevent link check failure if 'max_execution_time' = 0 (unlimited) have been committed to D5 and D6.