Would it be possible to have a setting that allows some cron jobs to be run from a separate crontab entry rather than from the main call to cron.php or supercron?
The search task, even when multi-tasked, bogs down my site and sometimes causes cron to fail. It would be nice if I could separate that task from the rest and have it run by itself every few hours.
Comments
Comment #1
MisterSpeed commented: I'm having significant issues with Search as well; it's an area of Drupal that could benefit tremendously from some new engineering. I like the idea a lot, though; a few options come to mind:
1. We could define sets of calls and let supercron.php start any set individually; or
2. We could make supercron.php accept a module= parameter. It already does, but a changing safety token is in place to prevent an outside user from, say, calling search indexing repeatedly until the site dies. We could instead publish a magic URL with a task-specific, constant safety token; the whole URL could then be listed in crontab as-is, even when that cron handler is disabled in SuperCron's main job list.
I like the first idea on a conceptual level, but #2 might be more realistic: there are few "clusters of cron tasks" one could think of defining at this stage in Drupal 6 history (why do I have a feeling I will regret having said that?), and implementing #2 would be a lot simpler. What do you think?
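A minimal sketch of how option #2's constant, task-specific token could work, written in Python purely for illustration (SuperCron itself is PHP, and this is not its code). It assumes a private per-site secret (`SITE_SECRET`), derives each task's token as an HMAC of the module name, and formats a crontab entry that fetches the resulting URL with wget. The `token` query parameter, `example.com` and all helper names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical per-site secret; a real site would generate it once and keep it private.
SITE_SECRET = b"replace-with-a-long-random-site-secret"


def task_token(module: str, secret: bytes = SITE_SECRET) -> str:
    """Derive a constant, task-specific token: the same module always yields
    the same token, but it cannot be guessed without the secret."""
    return hmac.new(secret, module.encode("utf-8"), hashlib.sha256).hexdigest()


def is_authorized(module: str, token: str, secret: bytes = SITE_SECRET) -> bool:
    """Check a token received by the cron endpoint, using a constant-time comparison."""
    return hmac.compare_digest(task_token(module, secret), token)


def task_url(base_url: str, module: str, secret: bytes = SITE_SECRET) -> str:
    """Build the 'magic URL' for one task (the token parameter name is an assumption)."""
    query = urlencode({"module": module, "token": task_token(module, secret)})
    return f"{base_url}?{query}"


def crontab_line(schedule: str, url: str) -> str:
    """Format a crontab entry that fetches the URL quietly with wget."""
    return f"{schedule} wget -q -O /dev/null '{url}'"


if __name__ == "__main__":
    url = task_url("http://example.com/supercron.php", "search")
    # Run search indexing by itself every three hours.
    print(crontab_line("0 */3 * * *", url))
```

Because the token depends only on the secret and the module name, it stays constant and can be pasted into crontab, yet someone who does not know the secret cannot forge a URL for another task. Changing the secret invalidates every published URL at once.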
Comment #2
geerlingguy commented: I like #2. If someone is dedicated enough to know that their search indexing problems have to do with cron, I don't think you'd be stretching their abilities too far by having them set up a unique URL.
For my site, search and the ed_classified cron tasks are usually the culprits in any problem, and being able to move those off the main job would be a godsend!
Comment #3
MisterSpeed commented: OK, we'll fast-track that; I could use it too.
Just to whet your appetite: the next project we'll likely publish is internally called SuperIndexer. We are looking at why indexing takes so long and needs to be redone so gratuitously often. It'll take a while to get something out, though.
Looks like we are facing the same problems!
Comment #4
geerlingguy commented: @63reasons: I guess so. I was at my wits' end on it, and SuperCron is helping, but every time new content is added, the next cron run goes belly up, and it's always during indexing.
I switched the search tables to InnoDB, thinking that might help (with all the locking that goes on), but it didn't make much difference. I guess having over 18,000 articles (each with 500 to 1,200 words) can tax the indexing engine!
Comment #5
mikeytown2 commented: +1 subscribe
Looking for this feature so I can recommend it with Boost.
Comment #6
geerlingguy commented: Boost + this + some good server hardware can make Drupal fly.
Comment #7
MisterSpeed commented: It's in CVS; please try it and let me know how it goes! I've also added a configuration page to help users figure out the proper format for the crontab command lines. I think it could look a lot better, though.
Comment #8
geerlingguy commented: I'll grab it next week (I'm away part of this weekend) and give it a try. Once I've used it a little, I'll share any suggestions I have for an improved UI.
Comment #9
piloro commented: That would be great.
Elysia Cron (http://drupal.org/project/elysia_cron) does that but lacks some SuperCron features (output capture).
Comment #10
geerlingguy commented: Could you make a -dev release so I can grab it from there? I'm not too comfortable with CVS, and if the -dev branch is stable enough, it might encourage others to test as well.
Thanks for your work!
Comment #11
MisterSpeed commented: Yup; I'll work in some ideas to get rid of that ugly new tab and post something, hopefully next week.
Comment #12
shunting commented: Actually, I like #1, and I had the indexing scenario in mind: for really complex and recursive forms of indexing, bundles or sets of cron jobs could be very useful. For example, I might want to schedule a series of Lucene tasks that have dependencies.
So, while better indexing via SuperIndexer is always good, I also think the type of index (and hence indexer) chosen will depend on the content and the use cases being addressed.
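Option #1 with dependency ordering could look roughly like this Python sketch (hypothetical, not SuperCron code). Each named set maps a task to the tasks that must finish before it, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) yields an order that respects those dependencies. The set and task names (`indexing`, `lucene_fetch`, and so on) are made up for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical task sets: each maps a task name to the tasks it depends on.
TASK_SETS: Dict[str, Dict[str, Set[str]]] = {
    "indexing": {
        "lucene_fetch": set(),
        "lucene_index": {"lucene_fetch"},
        "lucene_optimize": {"lucene_index"},
    },
    "housekeeping": {
        "clear_cache": set(),
        "rotate_logs": set(),
    },
}


def run_set(name: str, handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every task in one set, each only after its dependencies.

    Returns the order in which the tasks were executed."""
    # static_order() lists each task after all of its dependencies and
    # raises graphlib.CycleError if the set's dependencies form a loop.
    order = list(TopologicalSorter(TASK_SETS[name]).static_order())
    for task in order:
        handlers[task]()
    return order
```

Each set could then get its own crontab entry, for example a URL carrying a hypothetical `set=indexing` parameter. Because a dependency loop raises `CycleError` before any task runs, a misconfigured set fails fast instead of half-running.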
Comment #13
geerlingguy commented: I haven't been able to test this module yet, and unfortunately I can't connect to CVS from work (the port is blocked). Could you update the latest -dev release to HEAD so I can test this functionality?
Comment #14
MisterSpeed commented: Yup; we are finishing work on SuperSearch* and will merge all existing changes (including a cool self-updating view of timings and stats, so you don't have to reload the page) right after that is done.
* Check it out; it will resolve a lot of search indexing issues. For one, it indexes much faster: we have measured indexing speed-ups of 11x to 17x. It also solves a fundamental flaw in search indexing that we discovered on customer sites. Search asks you to take a wild guess at something you should never have to think about: how many items should be indexed per cron run? That number is almost always wrong. Aim too low and indexing slows to a crawl. Aim too high and a serious problem occurs: while one cron run is still finishing its indexing and overshooting the cron call period, the next one starts and interferes with the first one's work, crashing in the best case or, in the worst case, silently corrupting your index.
So SuperSearch does this: it asks how long it should spend indexing content (in other words, your cron call period) and tries to fit as much work as possible within that period. On the off chance that it indexes one node too many, the next cron call gets this message: hold your horses and come back in a minute, we're not done here yet. So at worst you lose one cron call's worth of indexing time, instead of wasting your entire index.
(Plus it caches results, allows theming, has reordering and search-enrichment hooks, an SEO magic wand, and all sorts of goodies. We need testers and new ideas, so if you care to, please join the effort.)
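The time-budget idea described for SuperSearch can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not SuperSearch's implementation: an exclusive lock file (created with `O_EXCL`) stands in for whatever locking the module actually uses, `budget_seconds` plays the role of the cron call period, and `index_item` is a hypothetical callback that indexes one node. The deadline is checked before each item, so a run overshoots by at most one item, and a second run that arrives while the lock is held gets `IndexerBusy` instead of interfering.

```python
import os
import time
from typing import Callable, Iterable


class IndexerBusy(Exception):
    """Raised when a previous cron run still holds the indexing lock."""


def index_for(
    budget_seconds: float,
    items: Iterable[int],
    index_item: Callable[[int], None],
    lock_path: str,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Index items until the time budget runs out; return how many were indexed."""
    # Atomically create the lock file; fail if another run already created it.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise IndexerBusy("Hold your horses and come back in a minute.") from None
    os.close(fd)

    deadline = clock() + budget_seconds
    done = 0
    try:
        for item in items:
            # Check the budget before starting each item, so the run can
            # overshoot the deadline by at most one item.
            if clock() >= deadline:
                break
            index_item(item)
            done += 1
    finally:
        # Release the lock even if indexing raised an exception.
        os.remove(lock_path)
    return done
```

A production version would also need to expire stale locks left behind by a run that was killed outright (the `finally` clause never runs in that case); otherwise one crash would block indexing indefinitely.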
Comment #15
geerlingguy commented: Gasp! That sounds awesome; sign me up!
So... just wondering: is there a project page yet?
Comment #16
MisterSpeed commented: Yup, here it is:
http://drupal.org/project/supersearch
We are still battling some strange drupal.org CVS gremlins, but the first package should be up shortly.
Comment #17
MisterSpeed commented: This feature is now implemented; please try 1.4-beta1!