Hello.

I have a site that from time to time gives me this error:
Boost: Crawler is already running. Attempt to start crawler failed.

I have checked drupal.org about it, and I can't find anything on how to check how long the crawler takes. I run cron every hour, but we have a lot of nodes (around 6000); could that be causing a timeout?

Thanks

Oskar

Comments

sillygwailo’s picture

Category: bug » support
kollo-dherbois’s picture

I get the same report: "Boost: Crawler is already running. Attempt to start crawler failed."

mikeytown2’s picture

If you have a lot of nodes and your server can't generate that many pages within the timespan of your cron run, then the crawler is still running the next time cron runs. Thus you will get the error "Boost: Crawler is already running. Attempt to start crawler failed."
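The overlap described above can be illustrated with a small simulation. This is only a sketch: `CrawlerLock` and `simulate` are hypothetical names, and the code does not reflect how Boost actually tracks a running crawl. It just shows that a crawl lasting longer than the cron interval makes every other cron run report the error.

```python
class CrawlerLock:
    """Hypothetical stand-in for whatever flag marks a crawl as running."""

    def __init__(self):
        self.running = False

    def try_start(self):
        # Refuse to start if a previous crawl has not finished yet.
        if self.running:
            return False
        self.running = True
        return True

    def finish(self):
        self.running = False


def simulate(cron_interval, crawl_duration, runs):
    """Return one message per cron run; all times are in seconds."""
    lock = CrawlerLock()
    crawl_ends_at = None
    messages = []
    for i in range(runs):
        now = i * cron_interval
        # Release the lock if the previous crawl finished before this run.
        if crawl_ends_at is not None and now >= crawl_ends_at:
            lock.finish()
            crawl_ends_at = None
        if lock.try_start():
            crawl_ends_at = now + crawl_duration
            messages.append("started")
        else:
            messages.append("Crawler is already running. Attempt to start crawler failed.")
    return messages


if __name__ == "__main__":
    # Hourly cron, crawl that needs 90 minutes: every second run fails to start.
    for message in simulate(cron_interval=3600, crawl_duration=5400, runs=4):
        print(message)
```

With a crawl that finishes well inside the cron interval (say 600 seconds against 3600), every run starts cleanly, so on a 20-30 node site this kind of overlap is probably not the whole story.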

kollo-dherbois’s picture

mikeytown2, I have no more than 20-30 nodes; it is a very simple site on fast, Drupal-oriented hosting...

kollo-dherbois’s picture

Could it be that I didn't configure the cron crawler for Boost properly?

I just enabled it here /admin/settings/performance/boost - "Boost crawler: Enable the cron crawler". Settings are:

Crawler Throttle: 0
Crawler Batch Size: 3
Number Of Threads: 2

Boost - HTML - Default maximum cache lifetime: One day

(and automatic cron via Poormanscron is set to run every 12 hours)

fehin’s picture

Hi kollo-dherbois, did your settings solve the problem for you? I'm also getting the same message.

achton’s picture

Same here. Subscribing.

marios88’s picture

Make sure the crawler is not getting any 404s or permission denied errors (the crawler runs as anonymous). Also set your config to:

Crawler Throttle: 2000000 (2 second delay; increase as needed)
Crawler Batch Size: 15
Number Of Threads: 1

and make sure you have "Do not flush expired content on cron run, instead recrawl and overwrite it." unchecked.
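As a rough sanity check on these numbers, the throttle can be turned into a lower bound on crawl time. This is a back-of-the-envelope sketch only: it assumes the throttle delay (in microseconds) is applied once per page and that every page needs crawling, neither of which is confirmed in this thread, and the function names are made up for illustration.

```python
import math


def min_crawl_seconds(pages, throttle_microseconds, threads=1):
    """Lower bound on crawl time, assuming one throttle delay per page per thread."""
    delay_seconds = throttle_microseconds / 1_000_000
    pages_per_thread = math.ceil(pages / threads)
    return pages_per_thread * delay_seconds


def cron_runs_spanned(pages, throttle_microseconds, cron_interval_seconds, threads=1):
    """How many cron intervals a full crawl would span under the same assumption."""
    total = min_crawl_seconds(pages, throttle_microseconds, threads)
    return math.ceil(total / cron_interval_seconds)


if __name__ == "__main__":
    # The original poster's site: about 6000 nodes, cron every hour.
    print(min_crawl_seconds(6000, 2_000_000))        # seconds of throttle delay alone
    print(cron_runs_spanned(6000, 2_000_000, 3600))  # hourly cron runs spanned
```

Under those assumptions, a 2 second throttle on about 6000 pages is 12000 seconds of delay alone (3 hours 20 minutes), so a full recrawl cannot fit inside an hourly cron run, while a 20-30 node site would need about a minute.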

yan’s picture

I'm using the settings mentioned in #8 for a site with a couple of thousand nodes. I get the error message "Crawler already running" very often, along with a couple of other problems with Boost. I have cron (for Boost) running every 15 minutes. But I don't really understand which settings are right for a large site: is the whole site crawled on every cron run (which would mean it is better to run it less often), or just the nodes that have expired (which would mean shorter cron intervals)?

Anonymous’s picture

That's a tricky question, as it depends on the site: for example, whether the pages change frequently and whether expiry is needed because user comments are being added. The message you are receiving is not really an issue unless the crawler is doing nothing at all. It's just a warning that "maybe" your cron interval is too short, but on a frequently changing site that would not be the case (especially as cron also indexes the site for search).

When a page expires it is added to the queue table (along with its family), which is then processed on the next cron run for anonymous users. You may need to examine that table to check exactly how many items are queued at any time. In version 7 (my version 6 VM is currently down) there is a maximum time for the crawler to run, e.g. 30 seconds, so unless your entire site can be crawled in that period, it is not going to happen in one cron run. You need to strike a balance between expiry, the amount of time allocated to crawling (and remember that anon users are going to generate the file instead of the crawler), and the length of time in the cache. The really slow sites are the ones with no anon users, or not many of them, because the db tables are not in memory and so there are a lot of disk reads to create a cached page that may expire before a second visit.
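The time-limited behaviour described above can be sketched roughly as follows. This is an illustration, not Boost's code: `crawl_with_budget`, the fixed per-page cost and the in-memory queue are assumptions standing in for the real queue table and HTTP requests.

```python
from collections import deque


def crawl_with_budget(queue, budget_seconds, seconds_per_page):
    """Crawl queued URLs until the time budget is used up.

    Returns the URLs crawled in this run; anything not reached stays in
    `queue` and waits for the next cron run.
    """
    crawled = []
    elapsed = 0.0
    while queue and elapsed + seconds_per_page <= budget_seconds:
        crawled.append(queue.popleft())  # a real crawler would fetch the page here
        elapsed += seconds_per_page
    return crawled


if __name__ == "__main__":
    # 100 expired pages, a 30 second budget, half a second per page (assumed cost).
    queue = deque(f"/node/{n}" for n in range(1, 101))
    done = crawl_with_budget(queue, budget_seconds=30, seconds_per_page=0.5)
    print(len(done), "crawled,", len(queue), "left for the next cron run")
```

The point of the sketch is that a budget-limited run does not fail when the queue is too long. It simply stops, and the leftover items are picked up on later runs, which is why the length of the queue matters more than any single run.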

Sorry that there is no perfect solution. I'd examine the kind of traffic you get, but you are not exactly timing out; you are just deferring the remaining pages to the next cron run after X seconds. Unfortunately, cron in Drupal 6 is very complicated. To start with, I'd increase the length of time your pages stay in the cache, so that the crawler is not trying to refresh every one of those 6000 nodes.

plato1123’s picture

I see this error is shown to anonymous users too. I thought we agreed that's a no no:/

Anonymous’s picture

"I see this error is shown to anonymous users too. I thought we agreed that's a no no:/"

Would this error be appearing on a Boost cached page, or in a message block?