I've been evangelising PHP/MySQL in general and Drupal in particular as the basis for a website that's currently built on a mashup of technologies (PHP/MySQL/Java/JSP/Oracle, etc). We're currently doing between 5 and 6 million pageviews/month (around 200K/day), and will be adding a bunch of social networking functions over the coming 6 months. We're achieving 99.9% (ish) availability (incl scheduled outages), against an agreed target of only 97%. My current (beta) tech strategy says that we'll move to PHP/MySQL as our sole environment; it's being met with varying levels of acceptance up and down the food chain, but seems likely to be adopted and implemented at the moment.

We're trying to get all development into a single framework/CMS so that we can focus our developers on a single channel, rather than the assortment of things we're currently doing. We'll be looking to contribute back to the community once we're up and running, whether through patches, modules, doco, or any/all of the above.

Has anyone had experience in actually delivering infrastructure that'll cope with these types of loads? I know Wikipedia and the like are doing it with heaps of hardware and all, but I'm looking for experience in the Drupal space in particular. I'm guessing that The Onion and Air America are probably pulling way bigger numbers, but I can't find much information on how they're built under the hood.

Any/all hints welcome (and, yeah, I've read all the scalability/perf stuff in the handbooks and all, and that all seems eminently sensible).

Things I'm looking at doing (there's a rough config sketch after the list):

  • Load-balanced web farm of dual-Xeon-based Dell webservers, 2GB memory (minimum of 2 servers, preferably more), CentOS 4.4, Apache 2.0 with mod_deflate. Cisco hardware load balancer(s) with hardware SSL support.
  • Gigabit (or maybe FC) connected shared disk for static content/file cache(s).
  • 4-way 3GHz+ Xeon-based MySQL engine, 2-4GB memory, Gigabit-connected to the web farm. Database replication to a standby server. Possibly MySQL clustering, but I haven't had experience with that yet.
  • Dual redundant 100Mb internet uplinks (shared with a much larger organisation; we've got plenty of bandwidth, though).
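
For what it's worth, here's a minimal sketch of what a Drupal 5-era settings.php might look like on each web head in a farm like this (hostnames, credentials and paths below are placeholders, not recommendations):

<?php
// settings.php - kept identical on every web head in the farm.
// All hostnames/credentials/paths here are placeholders.

// Single shared MySQL master; replication to the standby happens
// at the MySQL level and is invisible to Drupal.
$db_url = 'mysql://drupal_user:secret@db-master.internal/drupal';

// Same base URL everywhere so the load balancer can round-robin.
$base_url = 'http://www.example.com';

$conf = array(
  // Built-in page cache for anonymous users:
  // 0 = disabled, 1 = normal (5.0 also adds an aggressive mode).
  'cache' => 1,
  // Uploaded files live on the shared (NFS/FC) disk so every
  // web head sees the same content.
  'file_directory_path' => '/mnt/shared/files',
);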

Any suggestions on modifications/upgrades/whatever to the above would be gratefully accepted.

Comments

Robardi56’s picture

Hi,
I won't directly answer your post about hardware, but here is a hint: there is a file-based cache system that can dramatically help big sites like yours. I don't know if it is supported by 4.7 or only the upcoming Drupal 5.0, but this is certainly something that you should investigate.

Also, I read that Drupal 5.0 will have lots of performance improvements.

Brakkar

drawk’s picture

I believe Brakkar is referring to fastpath_fscache http://drupal.org/project/fastpath_fscache.

A couple of caveats though ... the primary one being that it isn't production-ready. There is also new aggressive caching in 5.0 which introduces a significant performance boost.

---
www.whatwoulddrupaldo.org

bradsw57’s picture

Cool - thanks guys. We were gonna look at a bunch of caching strategies, including a couple of Squids out front, page and block caching, PHP acceleration, MySQL query caching etc etc. If there's FS-based caching in 5.x, that'll help even more :) There may be some stuff that we can get out of the Zend platform(s) that'll help too, but I'm not convinced that it'd be well enough integrated with the Drupal core to be reliable.

I'll have a look at fastpath_fscache to get a feel for how it works and what it does :)

jamesJonas’s picture

If you wish to kick it up, Squid is an alternative that may need to be considered. I will also say that it demands some work, as configuration is not simple. First, don't bother with version 2.5; move directly to 2.6. 3.0 is not yet production-ready, but many of the 3.0 features were backported to 2.6. Next, as far as I have read (and I have not read everything on the drupal site) there is a very large misconception about caching strategies. The focus seems to be on caching strategy according to user (anonymous or registered). What is important is the state of the content, not who the user is. The events of significance are update and delete. New content is automatically picked up by the cache. On the update or delete event of a node you need to perform a PURGE via squidclient. This is a command-line call that deletes the object from your cache.

I just hacked a highly site-specific squid.module and ran my first test using 'ab' (ApacheBench), and was hitting 500+ responses per second. Before you get too excited, this is a very idealized test: I was only going against a single URL and was using localhost, which means connection time is 0 and the object is moved into memory, so there's no disk access latency. But I am happy with the potential. Since I'm deleting the object when my node is updated or deleted, my Squid cache does not care who the user is. Again, the state of the content is what is key. I hope this is helpful.
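
For the curious, here's a minimal sketch of the shape of such a module against the Drupal 4.7/5 hook API; the module name and URLs are made up, and it assumes squidclient is installed and squid.conf has an ACL allowing PURGE from the web server:

<?php
// squid_purge.module - hypothetical sketch of purging Squid
// objects when content changes, per the approach above.

/**
 * Implementation of hook_nodeapi() (Drupal 4.7/5 signature).
 */
function squid_purge_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  // Only the state of the content matters: purge on update/delete;
  // new content is picked up by the cache on first request.
  if ($op == 'update' || $op == 'delete') {
    $urls = array(
      'http://www.example.com/node/' . $node->nid,
      'http://www.example.com/',  // front page may list this node
    );
    foreach ($urls as $url) {
      // squidclient ships with Squid; -m PURGE removes the object.
      exec('squidclient -m PURGE ' . escapeshellarg($url));
    }
  }
}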

- james

testhosts’s picture

SpreadFirefox also runs Drupal, and it must handle a really big load.
But I don't know how many modifications they made...
I'm also anxious to test Drupal 5.0 ...

alex_b’s picture

I just put resultcache online, a cache that processes heavy functions at cron time. It helps us a lot with expensive operations on things like tag clouds etc.
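
The pattern, roughly (sketched here with invented names against the Drupal 4.7/5 cache API, where the cache table is the second argument):

<?php
// Hypothetical sketch of "compute at cron time, read at request
// time"; mymodule_build_tag_cloud() stands in for any heavy query.

function mymodule_cron() {
  $cloud = mymodule_build_tag_cloud();  // the expensive part
  // Drupal 4.7/5 signature: cache_set($cid, $table, $data, ...).
  cache_set('mymodule_tag_cloud', 'cache', serialize($cloud));
}

function mymodule_get_tag_cloud() {
  // The request path only ever reads; on a miss it degrades to an
  // empty result rather than blocking the page on a rebuild.
  $cached = cache_get('mymodule_tag_cloud', 'cache');
  return $cached ? unserialize($cached->data) : array();
}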

alex

-
former username: lx_barth

Robert Castelo’s picture

There's a section in the handbook about performance and scalability:

Tuning your server for optimal Drupal performance

Note also that there are some fairly busy sites running Drupal: MTV UK, The Onion, Ain't It Cool News.

Hope that helps.

Cortext Communications
Drupal Themes & Modules


kbahey’s picture

This is perfectly doable.

I have a client with a site that peaks at 179,000 page views per day (97,000 average) WITHOUT any caching.

So far we do not need to go to tiered server farms, just one big box.

See more details here.
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com


bradsw57’s picture

Thanks all - that's pretty much what I was hoping/expecting. A bit of careful configuring and system optimisation and we should be able to handle the load without batting an eye.

Cheers

Brad

markus_petrux’s picture

Hi,

I believe you should also consider how much content the site is going to host. With little content you may just survive serving from the query cache and/or buffer pool. However, if the content or the update/insert activity grows, then dynamic processing time or disk access may become a bottleneck.

I'm currently managing a site that has 45 million pageviews a month, with a proprietary CMS (meristation.com). First, I've been trying to optimize the forums, which are phpBB-based. Now, I've just started to take over the innards of the CMS itself. Here, almost everything is generated off-line. However, that makes the logic of the processes quite hard to maintain and, because of the way the CMS works, problems are difficult to isolate when they arise. So, since this is a proprietary CMS, rather than rewriting it, I'm currently trying to optimize the weakest points and planning to migrate to Drupal sometime in the near future. However, I believe I'll wait to see how the caching techniques in Drupal 5 evolve.

It would be nice to see a bit more experience from high traffic sites that also have a lot of content. :-)

Bests

Doubt is the beginning, not the end of wisdom.

Dries’s picture

Drupal.org does 8,000,000 page views a month (260,000 page views a day). However, the load on your servers will not depend on the number of page views but on the efficiency of the caching, which is determined by the number of concurrent authenticated users, the number of nodes and the frequency of updates (cache flushes). This is explained in the handbook, but also on my personal website (http://buytaert.net/drupal-webserver-configurations-compared and http://buytaert.net/drupal-vs-joomla-performance). Page views alone are not a good enough metric.
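
To put back-of-envelope numbers on that (the anonymous/authenticated split below is illustrative, not drupal.org's actual figure): 8,000,000 page views a month is only about 3 requests per second on average (8,000,000 / 30 / 86,400 ≈ 3.1), maybe 10-15/s at peak. If 95% of that traffic is anonymous and served straight from the page cache, well under one request a second needs a full Drupal bootstrap; flip the ratio towards authenticated users and nearly every request does. That's why cache efficiency, not raw page views, sizes the hardware.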

bradsw57’s picture

Thanks Dries. The site I'm referring to (just realised I'd not posted the URL anywhere! *duh*) is http://www.ourbrisbane.com. We've got about 5,000 "static" content pages, about 3,000-ish event calendar-type entries and between 25 and 30K directory entries, as well as webmail and stuff (that's handled by a different server farm using a somewhat customised version of Horde).

The bulk of the site is not currently personalised/personalisable in any meaningful way - which is one of the reasons for moving to a CMS/framework like Drupal. Our current mash of platforms and technologies makes it challenging to do anything in a reasonable timeframe and have it both supportable and reliable. We don't support much in the way of interactivity (like forums and comments and polls and things), and we're not real big on building an active user community, for a range of historical reasons. That, however, is planned to change and we'll be putting up a whole raft of more engaging offerings and hopefully get a bit of a real user base happening over the next while.

As a consequence the number of concurrent authenticated users is very low at the moment, but it's one of the things we're hoping to increase. That, of course, will mean that we'll (possibly exponentially) increase the load on our infrastructure as (as I currently understand it) the caching'll become less efficient as we increase the number of concurrent authenticated users we're serving. Just how far and how fast that grows, however, is something I don't have a real feel for at the moment.

In the worst case, I'm guessing that we could split the site physically across a number of federated Drupal instances (each with their own infrastructure) and use something like the drupal.module to handle a single sign-on setup for us, so that users don't realise that they're using a whole bunch of different instances. I'm hoping that we don't have to go that far tho :) (Our current user management system is custom code in Java, backed by an Oracle database with triggers that manage a shadow MySQL database for the Horde webmail system, so we're not afraid of hacking things together. Hopefully, though, if we hack things in the future we'll be able to do them in a framework that'll make them useful to more than just us - one of the reasons I've been looking for a well-documented and active framework and system like Drupal appears to be.)

Cheers

Brad

deplifer’s picture

Read your articles.
But when facing a lot of concurrent users, how would you split the load across the servers?
How does drupal.org split the load across its servers?
Do you split MySQL, PHP and Apache onto different servers?

wmostrey’s picture

Hey Brad,

We at RealROOT have a set-up much like yours running currently. We have one Drupal installation with a couple of sites running on a web farm. We connect to a central fileserver over NFS for static content/cache. We have about 180,000 unique visitors each day on this web farm and Drupal 4.7 is coping more than fine. But as Dries said, more important here are the number of concurrent authenticated users, the number of nodes and the amount of updates.

bradsw57’s picture

Thanks :) All I'm hearing is sounding more and more positive. We're currently presenting the proposal to our exec management team (marketing, content operations, etc) for their endorsement. Given that the driver for moving to something like Drupal has come from the strategies that they've prepared for the next couple of years, I'm hoping that it'll be a relatively easy sell. Then it'll be a few months of verrrrrry steep learning as we basically re-architect what we've already got into the brave new world.

I'm thinking about cache flushing, and whether it'd be possible to have nodes cache themselves (kinda like blocks, I guess) and delete their cached copies when they're updated, then use a "file not found" exception type of thing to re-cache when someone requests their content. That way, the frequency of cache misses would be completely dependent on the frequency of updates, not (necessarily) on the number of concurrent authenticated users. Of course, it's reeeeaaaallll early days for me and Drupal at the moment so all of the preceding may be utter garbage :))
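
Something like this minimal sketch is what I'm picturing (all names are invented, it targets the Drupal 4.7/5-era cache API, and none of it is tested):

<?php
// Made-up sketch: nodes drop their own cached copy on update or
// delete, and the next request for them rebuilds it.

/**
 * Implementation of hook_nodeapi() (Drupal 4.7/5 signature).
 */
function nodecache_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  if ($op == 'update' || $op == 'delete') {
    // Invalidate just this node; everything else stays cached, so
    // misses track update frequency, not authenticated user counts.
    cache_clear_all('nodecache:' . $node->nid, 'cache');
  }
}

function nodecache_view($nid) {
  // The "file not found" fallback: serve from cache, rebuild on miss.
  $cached = cache_get('nodecache:' . $nid, 'cache');
  if ($cached) {
    return $cached->data;
  }
  $html = node_view(node_load($nid));  // the expensive full render
  cache_set('nodecache:' . $nid, 'cache', $html);
  return $html;
}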

Cheers

Brad