Closed (won't fix)
Project:
XML sitemap
Version:
6.x-2.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
5 May 2009 at 22:47 UTC
Updated:
28 Apr 2010 at 22:18 UTC
Jump to comment: Most recent
Comments
Comment #1
dave reidmax_execution_time is not a reliable variable. For now these should be no problem with the 'batch size' setting because what if we want the xmlsitemap modules to run equally on cron. Using this logic, the first module will use all the time, and not leave any time for the rest. I do think I want to research into this a little more, so I'll mark as postponed since it's not too high-priority.
Comment #2
dave reidPlus, this is a feature request. :)
Comment #3
hass commentedOnly for my interest - could you explain me why
max_execution_timeis not a reliable variable?This code make sure that a module B stops processing after half max exec time is reached... if module A runs longer than the half exec time - above B cron cannot execute as half of the max exec time has already been reached... B needs to run it's cron tasks the next time cron have more time...
Comment #4
dave reidWell, the only major thing I thought of is that max_execution_time can be 0 (default for PHP CLI). What should happen in the case that PHP has no limit? Should the module process all nodes on a site with 10,000,000 nodes and takes 30 minutes?
Plus, by using a limit, we are helping save database bandwidth by only getting x number of items, instead of running a full query. Again, on a site with 10,000,000 nodes, that's not a good thing to be doing.
Comment #5
hass commentedI do not expect that a site having 10,000,000 nodes have a sysadmin who is such stupid not having an PHP max exec limit... and the default PHP value is 60 seconds that is often overridden by core/modules to 240 seconds. Cron tasks override to 240s if
ini_set()is not blocked by php config. There are a few places in core where max_execution_time is set to 240s... you could also set an more appropriate setting if 240s shouldn't be enough for xmlsitemap.Aside - I do not expect that core works well with sites having 10.000.000 nodes and high traffic... we see how slow it often works in d.o with only ~500.000 nodes... But this is OT here and it couldn't be wrong to develop for hypothetically 10.000.000 node sites :-)
Comment #6
avpadernoWhat Dave said about the 10,000,000 nodes is clearly an exaggeration, but the point is still valid.
The current setting does make more sense than a setting where the user is asked to set the time execution.
Comment #7
hass commentedWith the above code we do not need to ask the user again to make an assumption that may often fails and cause other issues. The current setting is hard-wired over all cron tasks of a day and sometimes cron have to do more work and sometimes less. If there is more other work to do - it might fail.
Take a look to #386612: Prevent possible cron failures caused by too many link checks how difficult it could be to calculate an appropriate value and how difficult it is to use a good configuration that wouldn't delay a complete link check to finish in weeks (linkchecker module) and what users do if you give them such a setting (cron overload). A "save" value that should make cron never time out is 10 (for linkchecker), but with the new approach the module can check ~120 links (or more) without any cron failure. If I transfer this to xmlsitemap the module may be able to add/rebuild only ~10 links safely per cron run, but with the timer check you might be able to add 1000 (guessing).
Users don't need to setup a max time execution with the above approach - it's already defined in the php.ini
PS: It was planed to add job_queue module to D7 core... if this will happen this logic goes into core.
Comment #8
dave reidThe Job Queue API has been added to core (see http://drupal.org/cvs?commit=207566), but there is nothing regarding cron and using timers to run. For right now, I think it is wise to follow the core standard of search module that has a defined maximum number of nodes to process each cron run. However hass, I don't want to give you the idea that I'm completely disregarding your idea. It has good potential, but it needs more discussion.
Personally I don't think we need to worry too much since the linkchecker cron code has to run cURL requests which take longer than just saving a sitemap link to the database.
Comment #9
avpadernoI perfectly agree with Dave.
Comment #10
Anonymous (not verified) commentedPercentage of cron execution is used by FeedAPI and maybe others. Number of rows to process is used by pathauto and others. I myself prefer the percentage of cron time but there is no master control for that number such that one could end up with 200% of cron time which then becomes meaningless. Therefore this has to become "won't fix" until there is some API methods in core to control percentage of cron time which won't be in the 6.x series.
Comment #11
hass commentedDave, how can you say that it's possible to add - let's say 1000 links? If linkchecker and other modules consume all the available cron processing time the job fails... - if no other module have cron tasks it will work... but only in such a situation. If you think about high performance module such a solution need to be implemented.
Thank you very much for the D7 link - I wasn't aware about this commit.
Comment #12
hass commentedOnly to note - the patch in http://drupal.org/node/410656 seems to be the followup and have this code:
Comment #13
Anonymous (not verified) commentedI don't see the Job Scheduler patch being backported to 6.x. I still say this is a won't fix.
Comment #14
hass commentedTherefore you need job_queue or implement your own solution that works like above. Don't close cases you do not understand, please. Dave said "postponed".
Comment #15
avpadernoWe should not start a battle of status changes.
So far, the 6.x-2 branch is business of Dave, who is able to decide by himself what he thinks to do in that branch. If he needs a opinion from us, he will have all the support he needs; in any cases, I think it's just him who can decide if this report can be set to won't fix.
For what I can see, Dave set this report to postponed, and until he doesn't change his mind, this is the status the feature request should have.
Comment #16
dave reidI'd like to be able to consider a lot of different solutions to problems/improvements in the 6.x-2.x code. For now, I'd like to mark these as postponed since the major focus is getting the modules all ported/rewritten. Once I can get to an alpha release, I'm going to start going through this issues more in detail. I still appreciate any input or patches that happen in the mean time.
Comment #17
dave reidBecause of the unreliability of checking max execution time, especially now that we have Drush integration, I'm marking this as won't fix. I'm potentially looking into using the native Queue API in D7, but that really doesn't have much to do with this issue.