As a user, how can I know which settings to use for the cron crawler, such as "crawler batch size," "number of threads," "crawler throttle" and "number of URLs to grab"? I checked the handbook page at http://drupal.org/node/545908 and the readme.txt but didn't see any information on this.
I'm finding that the site is a lot faster for anonymous users, but the server has crashed a couple of times and it can be very slow for logged-in users. So I want to optimize it as much as possible. Pages for logged-in users sometimes load normally and sometimes load very slowly.
Comments
Comment #1
mikeytown2 commented: It's probably loading slowly when the crawler is running. Do you get normal load times when the crawler is not running?
Number of URLs to grab - Lower this if MySQL is getting hammered by Boost.
Number Of Threads - Set this to 1, since your logged-in users are noticing a slowdown.
Crawler Throttle - If it's still slow with 1 thread, set this to 0.5 seconds; go up to 5 seconds if needed.
Crawler Batch Size - You only need to change this if the crawler is having issues.
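To make the four settings above more concrete, here is a hypothetical Python model of a cache-warming crawler. It is not Boost's actual code (Boost is a PHP module and its internals aren't described in this thread). The function `crawl` and its parameter names are made up for illustration, and mapping batch size to "URLs handed to a worker at a time" is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], None],
    max_urls: int,       # models "Number of URLs to grab": cap per run
    threads: int,        # models "Number of threads": concurrent workers
    throttle_s: float,   # models "Crawler throttle": pause after each request
    batch_size: int,     # models "Crawler batch size" (assumed meaning)
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Hypothetical sketch: warm up to max_urls pages, return the URLs fetched."""
    queue = list(urls)[:max_urls]
    # Split the work into batches that are handed to worker threads.
    batches = [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]

    def worker(batch: List[str]) -> List[str]:
        done = []
        for url in batch:
            fetch(url)              # request the page so it lands in the cache
            done.append(url)
            if throttle_s > 0:
                sleep(throttle_s)   # leave the server some room for real visitors
        return done

    fetched: List[str] = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as the batches.
        for done in pool.map(worker, batches):
            fetched.extend(done)
    return fetched
```

In this model, `threads=1` means only one request is ever in flight, and a nonzero `throttle_s` adds a fixed pause after every page, which is why those two knobs are the ones to turn when logged-in users notice a slowdown.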
Comment #2
OneTwoTait commented: With the crawler disabled, it still seems pretty slow while logged in. However, with the crawler enabled, pages sometimes wouldn't load even within a minute, and I'm not seeing that now. That only happened about 10% of the time when the crawler was enabled. It's rather hard to test because the speed goes up and down.
Perhaps one of these facts is relevant.
- My server shows a large increase in "CPU Usage." I remember seeing CPU usage was 2% before and it is usually over 100% now. There's plenty of memory though.
- I also have Memcache installed. It was fine with Memcache and no Boost.
- It isn't as slow when I'm logged in as a user with restricted permissions, but it is quite slow when I'm using the administrator account.
- This is on a live site with about 10,000 page views per day. So, I guess you could say it is being thoroughly tested here.
Comment #3
mikeytown2 commented: I have it running on a site with over 20k page views a day; glad to hear of more high-traffic sites using Boost!
As with any Drupal module, each installed module adds a slight overhead, but that doesn't sound like the issue you're describing. The crawler will use close to 100% CPU because it hits every single URL as fast as possible, one after another, in order to get a hot copy of the site into the cache ASAP. That way, even the first hit on a page is served from the Boost cache. If you have a short cache lifetime, the crawler could end up running nonstop, because it never finishes its previous run. Be aware of this; you might want to lengthen the expiration time or not use the crawler.
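The "running nonstop" case can be illustrated with a rough model. It assumes that a full crawl is restarted once per cache lifetime, which is a simplification rather than a description of how Boost schedules its crawler. The function name `crawler_busy_fraction` is hypothetical.

```python
def crawler_busy_fraction(crawl_hours: float, cache_lifetime_hours: float) -> float:
    """Rough share of the time the crawler is running, assuming one full
    crawl is started per cache lifetime (a simplified, hypothetical model).

    A result of 1.0 means a crawl takes at least as long as pages stay
    cached, so the crawler never gets to idle.
    """
    if cache_lifetime_hours <= 0:
        raise ValueError("cache lifetime must be positive")
    return min(1.0, crawl_hours / cache_lifetime_hours)
```

For example, a 4-hour crawl against a 1-hour cache lifetime gives 1.0 (the crawler is always busy), while the same crawl against a 24-hour lifetime keeps it busy only about a sixth of the time.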
More details would help in figuring out the source of the issue.
Comment #4
OneTwoTait commented: Cache lifetime is set at 18 hours.
On the "Stop crawler" button, I always see that there are some "URLs left" - usually 1 or 2 thousand, but sometimes 4 thousand something. I've limited the number of URLs to grab to 5000.
There are currently about 26 thousand pages cached according to the "clear all Boost cache data" button.
Yesterday I re-enabled the crawler but reduced the number of threads from 2 to 1. It seems better... but then again, it's hard to tell. Let's just say there hasn't been a majorly slow period for me (logged in with the administrator account) today.
I've also now set the crawler throttle to 500,000 (0.5 seconds).
CPU usage is definitely down now.
Comment #5
mikeytown2 commented: With your settings and site parameters, it will take about 4+ hours to crawl your site. Let me know if you encounter anything odd.
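The estimate above can be roughly reproduced from the numbers in this thread: about 26,000 cached pages, 1 thread, and a throttle of 500,000 (microseconds, i.e. 0.5 seconds). The sketch below assumes every cached page is re-crawled and that each thread pauses for the full throttle after every request. `estimate_crawl_hours` and its `avg_page_s` parameter (time to generate one page) are hypothetical, not Boost settings.

```python
def estimate_crawl_hours(
    num_urls: int,
    threads: int,
    throttle_us: int,          # throttle as entered in the setting, in microseconds
    avg_page_s: float = 0.0,   # hypothetical: average time to generate one page
) -> float:
    """Rough crawl duration: each URL costs generation time plus the throttle
    pause, and the work is split evenly across threads."""
    throttle_s = throttle_us / 1_000_000
    per_url_s = avg_page_s + throttle_s
    return num_urls * per_url_s / threads / 3600
```

With `estimate_crawl_hours(26000, 1, 500_000)` the throttle alone accounts for 13,000 seconds, roughly 3.6 hours; any page-generation time comes on top of that, which fits the "4+ hours" figure and is still well inside the 18-hour cache lifetime.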
Comment #6
mikeytown2 commented: Reopen if you're still having issues.
Comment #8
Ela commented: Thanks for these tips! My site was also crashing while the crawler was running. I applied some of these changes and it seems to work a lot better :)