By Mombee on
Hi,
I’m looking at building a site that runs the following predicted metrics each month:
• 320k visits
• 200k unique visits
• 5m page views
• 14k uploads
• 16 GB total uploads
I'm keen to use Drupal and have done a fair bit of reading about tuning for higher volumes (using Memcache and the like), but are these figures pushing the boundaries of what Drupal can handle? We'd be using Drupal 6.0.
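For a sense of scale, the monthly figures above can be turned into rough averages. This is only a back-of-the-envelope sketch: it assumes a 30-day month, evenly spread traffic, and 1 GB = 1024 MB, none of which comes from the post. Real traffic is bursty, so peaks will typically be well above these averages.

```python
# Rough averages from the predicted monthly metrics.
# Assumptions (not from the post): 30-day month, evenly spread traffic,
# and 1 GB counted as 1024 MB.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

MONTHLY_VISITS = 320_000
MONTHLY_PAGE_VIEWS = 5_000_000
MONTHLY_UPLOADS = 14_000
MONTHLY_UPLOAD_GB = 16

# Average request rate if page views were spread evenly over the month.
avg_page_views_per_second = MONTHLY_PAGE_VIEWS / SECONDS_PER_MONTH

# How many pages a typical visit would load.
avg_pages_per_visit = MONTHLY_PAGE_VIEWS / MONTHLY_VISITS

# Average size of a single upload, in MB.
avg_upload_mb = MONTHLY_UPLOAD_GB * 1024 / MONTHLY_UPLOADS

print(f"{avg_page_views_per_second:.2f} page views per second on average")
print(f"{avg_pages_per_visit:.1f} pages per visit")
print(f"{avg_upload_mb:.2f} MB per upload")
```

On these assumptions the average works out to about two page views per second, so the sizing question is mostly about peak load rather than the monthly total.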
Cheers.
Comments
There are sites with similar volumes
A fair amount of server architecting and tuning is required, but it is certainly achievable. The memcache module needs porting to D6 (something I've urgently wanted to do for a while).
- Robert Douglass
-----
my Drupal book
Hi Robert, Assuming that
Hi Robert,
Assuming that we’ve dealt with the ‘standard’ server issues (i.e. bandwidth, CPU, memory, IO) effectively with dedicated hardware, is there any specific Drupal tuning that we should focus on and get up to speed with?
I know that cron jobs can be an issue, but I suspect that’s unlikely to affect us. Simultaneous users could be a Drupal problem, but my understanding is that Memcache will help there.
Cheers.
high performance is for you
Hello Mombee,
I think you might like to join the High Performance group:
"This group is dedicated to solutions and approaches for high traffic, high performing Drupal sites. As such, it will deal with a lot of information around the rest of a typical Drupal "stack" -- the operating system, web server, database, and PHP tweaks that combine to support the Drupal application."
http://groups.drupal.org/high-performance
There's a lot of good help
There's a lot of good help in the high performance group. I just finished moving a high traffic site from WP to Drupal this weekend (crooksandliars.com). We average about a quarter of a million visitors and 400,000 page views a day. Yesterday while monitoring we had periods with 6,000+ guests and 50+ logged-in users on the site, and the servers were sitting almost idle. We also had one hour where we hit over 30,000 page views and the servers never broke a sweat.
On WP, this kind of traffic brought us down instantly, even with WP caching plugins. We average about 2,000 comments a day on the site, which means cached pages are constantly being invalidated. One of the biggest problems was flock waits on the WP cache files, since those plugins do all their caching via the filesystem. On high traffic sites with constantly updating content, filesystem caching is something that shouldn't be used.
We are running on two quad-core machines. The first has 8 GB and handles Apache + memcache. The second also has 8 GB and handles MySQL plus our static content. There are some hacks you can do to the core in order to get all the static content served from a static subdomain (and I think plans to include this option in future versions of Drupal are in the works). Basically, we have an rsync script firing every minute that updates the static directory tree on the static server with the entire Drupal directory structure. We use excludes so that PHP, INC, MODULE, and similar files aren't copied over. I also have a mod_rewrite rule in the static directory that checks whether the file exists; if not, it rewrites the request back to the Drupal server. This helps when the rsync is lagging, especially with things like JavaScript and CSS aggregation files.
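The two pieces of that static-server setup can be sketched roughly as follows. This is an illustration of the idea, not our actual script: the paths, the host name, and the function names are hypothetical, and the exclude list contains only the three file types named above.

```python
# Hypothetical locations, for illustration only.
DRUPAL_ROOT = "/var/www/drupal/"
STATIC_TARGET = "static.example.com:/var/www/static/"

# File types that must never be copied to the static server.
CODE_SUFFIXES = (".php", ".inc", ".module")


def build_rsync_command(source=DRUPAL_ROOT, target=STATIC_TARGET):
    """Build an rsync argv that mirrors the tree but skips code files."""
    command = ["rsync", "-a"]
    command += [f"--exclude=*{suffix}" for suffix in CODE_SUFFIXES]
    command += [source, target]
    return command


def choose_server(url_path, files_on_static_server):
    """Mimic the fallback rule: use the static copy if it exists,
    otherwise send the request back to the Drupal server."""
    if url_path in files_on_static_server:
        return "static"
    return "drupal"


print(" ".join(build_rsync_command()))
```

A scheduler such as cron would run the rsync command every minute. The fallback rule is what covers the gap when a freshly aggregated CSS or JavaScript file has not been synced yet.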
For memcache, we are using the cache router module. This thing has been a lifesaver. Right now we are using the most basic configuration for it (a single cache bucket and one server). As traffic grows we will be expanding that.
Another hack I ended up doing to the core was to the path alias lookup. I have it skip a bunch of paths that won't be aliased, like comment edit, admin and similar ones. The remaining path lookups are cached in memcache, which greatly reduced the number of queries we need.
I keep our own Drupal with hacks under version control so I can easily create patch files and apply them when new versions of Drupal are released. The same goes for any contributed modules we are using, of which there are only a couple.
Depending on your needs, it can be beneficial to weigh using a contributed module against writing your own. Drupal has a lot of great modules out there, but a lot of them also end up using a lot of needless resources. This isn't a fault of the module designer, but rather an evil necessity of having modules that can be customized to fit any site. If it's something you can write easily enough, do so. I ended up with about 12 custom modules for this site. Most could have come from contributed modules, but it was much more beneficial on the server end to write my own tailored to our exact needs.
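The idea behind the path lookup hack looks roughly like this. It is a sketch of the technique rather than the actual core patch: the class, the prefix list, and the callable are hypothetical, and a plain dict stands in for memcache.

```python
# Hypothetical prefixes for paths that are never aliased.
UNALIASED_PREFIXES = ("admin", "comment/edit")


def is_unaliased(path):
    """True if the path is one we never bother looking up."""
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in UNALIASED_PREFIXES
    )


class AliasLookup:
    def __init__(self, query_database, cache=None):
        # query_database: callable taking a path, returning an alias or None.
        self.query_database = query_database
        # A dict stands in for memcache in this sketch.
        self.cache = {} if cache is None else cache
        self.database_queries = 0

    def alias_for(self, path):
        # 1. Skip paths that are known never to have an alias.
        if is_unaliased(path):
            return path
        # 2. Serve repeat lookups from the cache.
        if path in self.cache:
            return self.cache[path]
        # 3. Only now hit the database, and remember the answer.
        self.database_queries += 1
        alias = self.query_database(path) or path
        self.cache[path] = alias
        return alias
```

With this in place, repeated lookups of the same path cost one database query in total, and ignored paths cost none.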
Also do check out the high performance group. The information there is invaluable.
---------------------
HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.
Depends on your hardware. I
Depends on your hardware. I have a Drupal site with about 300k visits and 6 million page views, and it is handling it fine. However, that includes the forum, which takes up 1/3 of the traffic.
If you have enough memory (I have done this with 1 GB, but 2 GB is much better), a somewhat modern processor (anything better than an Atom), and a PHP accelerator of some type running, you should be fine.
It also depends on how many modules you use. Views and heavy dependency on search can mean life or death by themselves. Bandwidth would probably be the least of your worries.
The caches help -a lot-, especially a PHP accelerator. The block and page caches of Drupal only help maybe 5 - 10% in my experience on the normal setting.