A friend of mine doubts that Drupal can scale, so I wanted to hear from anyone here about their real experience with very large websites. Can Drupal scale to more than 1 million users without issues?

Thanks so much!

Comments

kbahey’s picture

The question as asked is meaningless. Not to be dismissive, but let me explain:

Having a million users will make for a large database, but other than that, it all depends on what these users are doing.

Do they register once and then never visit again?
Do they register and post content (pages, comments, ...etc)? If so what percentage of those do that and how often?
Do they register and use a certain application (e.g. chat or something custom)? How often?

What is really needed is prototyping the application, and then doing a benchmark using ab or siege. There is a page in the handbook on how to benchmark Drupal.
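
To give a rough idea of what a tool like ab measures, here is a minimal Python sketch. It is an illustration only, not a replacement for ab or siege, and the names `fetch_url` and `run_benchmark` are hypothetical. It sends a fixed number of requests with a fixed concurrency and reports the mean and worst latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_url(url):
    """Fetch one page and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urlopen(url) as response:
        response.read()
    return time.perf_counter() - start


def run_benchmark(fetch, target, requests=100, concurrency=10):
    """Call fetch(target) `requests` times using `concurrency` threads.

    Returns the request count plus the mean and slowest latency in seconds.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(fetch, [target] * requests))
    return {
        "requests": len(latencies),
        "mean": sum(latencies) / len(latencies),
        "max": max(latencies),
    }
```

Calling `run_benchmark(fetch_url, "http://localhost/", requests=50, concurrency=5)` against a prototype (the URL is a placeholder) corresponds roughly to `ab -n 50 -c 5 http://localhost/`. Only benchmark a site you control.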

Check the previous threads in the performance forum, there are past discussions on this.
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com


newms’s picture

I would think that drupal.org has, if not the most registered users of any Drupal site, then close to it. drupal.org has more than 100,000 users. Scalability shouldn't be a problem with Drupal.

newms

cinquetooty’s picture

If I was serious about building a commercial site that would have that many users, I'd be VERY worried about the legal ramifications of Drupal NOT being able to handle it. Anyone commissioning a site of that scale will be wanting to generate a significant revenue from it and should be looking for a team with dedicated 24/7 support in place and should be prepared to pay for it too.

If it's a non-commercial site then the consequences are not so great, but you still need load balanced, co-located servers. At 100,000 users this site is pretty slow already.

I don't know if Drupal could handle it or not as I'm pretty much a low-scale developer catering for modest ambitions - but still I can't get Professional Indemnity Insurance at a decent premium (anyone in the UK got any suggestions?).

Just my opinion.

Ian

halfer’s picture

To be honest, everything's scalable. Drupal's just a web application, so you could put your users through a load-balancing proxy that routes users to a server cluster, which of course can be as big as you need. The MySQL database meanwhile could be clustered (though the discussion referenced below casts doubt on this) or put on a very juicy server.

A quick search on the net reveals this extensive conversation - I haven't read all of it, but it looks like a thorough treatment of this topic:

kbahey’s picture

There are some bottlenecks in Drupal, even if you spread the load onto more than one front end.

Say you do load balancing via DNS round robin on 4-5 Apache/PHP servers that mount the /files directory over NFS.

You still have ONE database. Every node view has to do certain things like write the node counters, the session table, maybe users table, the accesslog, ...etc. So, these parts cause some contention, and also some LOCKing.
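
The effect of the single database can be put as a rough upper bound. The sketch below assumes a deliberately simple model that was not measured on any Drupal site: each front end handles one request at a time, each request takes `total_ms`, and `locked_ms` of that is spent holding a table lock that only one request can hold at once. The function name and all numbers are hypothetical.

```python
def max_throughput(front_ends, total_ms, locked_ms):
    """Upper bound on requests per second under a simple locking model."""
    if locked_ms > total_ms:
        raise ValueError("locked_ms cannot exceed total_ms")
    # Front-end capacity: each front end completes 1000 / total_ms requests/s.
    front_end_bound = front_ends * 1000.0 / total_ms
    if locked_ms == 0:
        return front_end_bound
    # Shared lock: only one request at a time can be inside the locked part.
    lock_bound = 1000.0 / locked_ms
    return min(front_end_bound, lock_bound)


# Made-up figures: 200 ms per request, 50 ms of it under the lock.
for n in (1, 2, 4, 8):
    print(n, max_throughput(n, total_ms=200, locked_ms=50))
```

In this model the bound stops growing once the lock is saturated, no matter how many front ends are added, which is why shortening or removing the locked portion matters more than adding web servers.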

However, many of these things are being worked upon:

The locking part has a patch: http://drupal.org/node/55516. The users table is being split to reduce contention (so the last access time is in its own table).

There is also aggressive caching, a fast file cache (http://drupal.org/project/fastpath_fscache), and boost (http://drupal.org/project/boost). There is also the new pressflow preempt cache (http://drupal.org/project/pressflow_preempt), and there is a memcache effort too.

For some uses, Drupal is already fast. A site that I help technically run got 394,000 human page views in one day last week (7.85 million page views last month), and it does not even have the page cache on. This is a single dedicated server, not a separate DB backend. Nothing fancy, and it keeps on ticking.
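
To put those figures in perspective, they can be converted to average page views per second. This is plain arithmetic on the numbers quoted above; the 30-day month is an assumption, and the averages say nothing about peak load, which will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(page_views, days=1):
    """Average page views per second over the given number of days."""
    return page_views / (days * SECONDS_PER_DAY)


daily = average_per_second(394_000)
monthly = average_per_second(7_850_000, days=30)  # assumes a 30-day month

print(f"busy day:   {daily:.2f} page views/s")
print(f"month avg:  {monthly:.2f} page views/s")
```

That works out to roughly 4.6 page views per second on the busy day and about 3 per second averaged over the month.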

Sites with special scalability needs will hire people to customize core for them to make it scale more, as well as other people to manage the infrastructure for them.

Art Morgan’s picture

I've been wondering about Drupal scalability lately too, and logged on today to see if anyone has suggestions on how I can speed up inserts for new or edited nodes. I run a Drupal 4.7 site that gets pretty good traffic. We have over 20K users now, 20-30K unique visits per day, and up to 100k page views per day. We have two servers -- one for apache/drupal and one for the DB.

The site runs pretty fast for serving up content, but we have some problems now when people add or edit a blog post, or add a comment. Pulling up a page only takes a second or so, but adding or editing a post can take 30 seconds or more.

We have almost 20k nodes now, most of them blog posts, and about 90K comments. But I'm not sure if size of the node tables is the problem; adding a "page" only takes a few seconds, and while there are many fewer pages, both pages and blog posts use the same tables, as far as I can tell.

Anyone have any ideas on how to speed things up, or what I should check?

BTW kbahey, I just installed your nodevote module today -- looks great -- thanks for contributing it!

VM’s picture

In case they haven't been seen:

Drupal server tuning: http://drupal.org/node/2601
At the bottom of that page you will find Drupal MySQL optimization: http://drupal.org/node/51263
as well as other optimization techniques.

kbahey’s picture

I think we had a similar problem a while back before we moved servers. My guess is that cache clearing ties up the database and causes a lot of locks, making everything wait. We run with the cache off for this reason, as well as a server-specific reason. We have not needed the cache so far anyway.

Try turning off the cache and see what happens. Yes, it is counterintuitive, but it is true: the cache can be a scalability issue for Drupal. In your case it may hurt something else, YMMV.

Another thing is to install the lock elimination patch, and switch to InnoDB on some tables (cache, session, ...etc.)
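
The engine switch itself is one `ALTER TABLE ... ENGINE=InnoDB` statement per table in MySQL. Below is a small Python sketch that only builds those statements so they can be reviewed before anyone runs them. The helper name is hypothetical, and the table names `cache` and `sessions` are assumptions: check the real names (including any table prefix) on your own install, and take a backup first.

```python
import re


def innodb_statements(tables):
    """Return one ALTER TABLE statement per table name."""
    statements = []
    for name in tables:
        # Accept only plain identifiers so nothing unexpected ends up in the SQL.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unexpected table name: {name!r}")
        statements.append(f"ALTER TABLE {name} ENGINE=InnoDB;")
    return statements


# Assumed table names; adjust to match your database.
for statement in innodb_statements(["cache", "sessions"]):
    print(statement)
```

Converting one table at a time, and measuring in between, keeps it clear which change made a difference.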

By the way, our stats are like this: 4,700 nodes, 29,000 comments, and 8,500 users.

Art Morgan’s picture

Both of those changes helped a HUGE amount. Thanks!

The funny thing is I didn't even realize how much I was using the cache. I already had it turned off in the admin settings, and I thought that was that. Then I looked at my cache table and it was 75MB, w/ 60,000 rows. I realized that this was because of the default settings in the settings.php file, so I adjusted those and it made a big difference.

----
Art Morgan
ProgressiveU.org

kbahey’s picture

Did you implement them one at a time to see which one had the most impact? Or did you do them all at once?

In these situations, it is better to change one thing at a time, do a reassessment, then change the next thing. This way you know exactly if one is responsible for 100% of the gain and the other is useless, or they are 50/50, ...etc.

Art Morgan’s picture

Very good point. And alas, it appears I spoke too soon -- things soon got worse again, so it turns out neither change really helped in the long run. Let me explain what I did so far:

I had actually turned off the Drupal page cache (the one available under admin -- I think that is called "page cache") a few days before I posted my comment. But I thought that maybe it had cluttered up the cache table, so I decided to truncate the cache table and see if that helped. It seemed like page loads were faster at that point, but it did not help speed up adds of comments and posts.

So, half a day later I applied the no-lock patch. Initially things seemed faster for adding posts. That is when I joyfully added my comment here on Drupal saying that the changes had helped. But a little while later I tried adding a comment on ProgressiveU.org, and found that things were still bad for adding comments. I should point out that I have not yet tried to switch from MyISAM to InnoDB tables. I read that takes a lot more disk space, so I wasn't ready to do that yet.

Meanwhile, the cache table kept growing and growing. Within a day it grew to 75MB and 60,000 or so rows. I was confused, because I thought I had turned off the cache! I found other posts on drupal.org that explained that you can't turn off the cache completely -- it's just the page cache that you turn off, but I'm still worried that I'm caching too much.

Looking at what's in the table, it is almost all filter entries. Can anyone point me to an explanation of what those do, and how you control that aspect of the cache? In my case this is mostly comment snippets, and sometimes posts.
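
One way to see what fills the cache table is to count rows by the leading part of the cache ID. The sketch below is an illustration under stated assumptions: it assumes a table named `cache` with a text column `cid` whose values start with a prefix that ends at the first colon (for example `filter:`), and it uses an in-memory SQLite database with made-up rows as a stand-in for the real MySQL table so that it runs anywhere. The function name is hypothetical.

```python
import sqlite3
from collections import Counter


def count_by_prefix(connection):
    """Count cache rows grouped by the part of cid before the first colon."""
    counts = Counter()
    for (cid,) in connection.execute("SELECT cid FROM cache"):
        prefix = cid.split(":", 1)[0]
        counts[prefix] += 1
    # Largest groups first.
    return counts.most_common()


# Stand-in table with invented rows; a real check would query the live database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE cache (cid TEXT PRIMARY KEY, data BLOB)")
sample_rows = [
    ("filter:1:aaa", b""),
    ("filter:1:bbb", b""),
    ("filter:3:ccc", b""),
    ("menu:1:en", b""),
    ("variables", b""),
]
connection.executemany("INSERT INTO cache (cid, data) VALUES (?, ?)", sample_rows)

print(count_by_prefix(connection))
```

If one prefix dominates, as the filter entries do here, that narrows down which part of the system is producing the rows.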

I also installed mytop (an alternative to mtop). For the most part I don't see anything unusual, but from time to time I do see "Query LOCK TABLES cache WRITE," indicating (I think) that the no-lock patch isn't really stopping the locks that I need to stop.

Another thing that one of my users pointed out: when you add a comment or post, if you wait a few seconds and then click STOP on the browser (virtually all posts and comments hang the browser for 30 seconds or more now) and then refresh, your comment or post has gone through. So it's something that happens after the update to the comment or node tables that is hanging things.

My watchdog tables don't show any unusual errors, so I'm not sure what to do next. Is there something I should be monitoring on the apache side that would tell me where things are hanging?

Sorry for all the newbie questions. We are well over 20,000 users now, and I am confident we can get to 100,000+ users and several million page views per month by the end of the year -- but only if I can get this optimized for multi-user/heavy posting scenarios.

Art Morgan’s picture

I just read about this workshop on scalability featuring Dries:
http://2007.oscms-summit.org/node/43

I'm planning to go, but I can't wait until March 23 to figure all this out. We're expecting a flood of new traffic starting February 15. Could be chaos if I don't figure this out soon...

-------------------------------------
ProgressiveU.org
"Like MySpace...but for smart people"