We all know that Drupal can scale to handle large sites such as Drupal.org, SpreadFirefox and MTV Europe. The good folks at Lullabot have put up some great slides discussing scalability and performance.
Now without getting all technical, I thought it would be useful to get a discussion going to come up with some sort of "idiot's guide to Drupal scalability", so that newcomers to Drupal have a better idea of what their hardware requirements will be. I don't want this to turn into a "tuning Drupal" thread - rather, just to get a rough picture of out-of-the-box performance. I know there isn't an easy answer - it depends on the modules installed and various other factors - but I'm sure we can come up with something generic.
So here are some questions for you:
1. How many users before a site is considered a large Drupal installation (1,000, 5,000, 10,000...?)
2. How many nodes before the DB is considered big?
3. How many online users at a time before a site is considered a busy Drupal site?
4. At what points for the above questions would you move from shared hosting to a VPS, to a dedicated server, to a multi-server environment?
These questions are based on the growth of one of the sites I am involved in - I have over 10k nodes, 8k users and generally 100 users online at a time during the day. When my online users grew from 60 to 100 I found that MySQL CPU usage was going through the roof and the site became extremely slow on my VPS with 256 MB RAM. I upgraded to 512 MB but this hardly helped. I then upgraded from Drupal 4.7 to Drupal 5.1, which helped a little more, but I am running with a few views turned off.
Comments
4. At what points for the above questions would you move...
Great idea for a topic.
I can only think about this question at the moment...
Because I have not used Drupal on many high traffic sites - yet.
I would start out with dedicated servers, with 12 GHz of processing power and at least 2 GB of RAM.
If the bandwidth is good then that setup should be fine for quite a while, but I don't have any quantitative figures to answer your other questions at this time. I would love to find out though - better to be prepared.
It would be great if SpreadFirefox or MTV Europe could share that data, as not many websites are successful enough to actually gather it.
really good idea.
This is a really good idea. This will help others in scaling their site.
--
Sharique uddin Ahmed Farooqui
IT head, Managefolio.com
--
Subscribing to this thread.
Caroline
A coder's guide to file download in Drupal
Who am I | Where are we
11 heavens
subscribing
Just keeping an eye on the thread.
-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world
also subscribing...
also subscribing...
Subscribing
Subscribing
-zach
--
harkey design
: z
subscribing
subscribing
Subscribing
Subscribing
Subscribing
Subscribing
subscribe
subscribing
subscribing
subscribing
subscribing
subscribing
Subscribing
Subscribing..
subscribe
subscribe
How did you subscribe to this thread? I can't find how.
I'd like to subscribe to this thread but can't see how to do it.
Do you have to submit a comment to subscribe, or can you click a button somewhere so you can just follow the thread?
Many thanks :-)
I don't know for sure
I don't know for sure, but I think they mean that if they post to the thread, it will then be listed in their account's 'track' and 'my recent posts' views, and so be easier to find again. It's not really like a proper forum 'tracker' feature though.
Hit the nail on the head
You hit the nail on the head. By commenting the tracker will now notify me of additions to the thread. Drupal does have a module that does just plain subscribing but it doesn't seem to be on Drupal.org :-(
-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front (Drupal)
http://www.SunflowerChildren.org/ Helping children around the world ( soon to be Drupal)
Thanks both - wish they had
Thanks both - wish they had the tracker module enabled for this forum - lots of times you just want to follow because your issue has already been mentioned in the thread - cheers :-)
They don't seem to want to
They don't seem to want to run anything but the 'out-of-the-box' install here.
I think the main reason is keeping resource usage to a minimum (and therefore hosting cost).
Perhaps minimising development, upgrade and maintenance resources too.
Good topic...
I guess a lot depends also on how the site is used, and how its content is presented to and navigated by users. For example, a Drupal site that depends heavily on internal search to display its content to users would have a very different performance profile from a site that displays its content in a paginated fashion, and that is how users engage its content. In the first example, heavy traffic could mean thousands of unique queries to the search index per minute. In the second example, the same amount of traffic might just be reloading the same small number of pages over and over. A large number of nodes times a high traffic volume does not necessarily indicate a higher resource burden than a smaller number of nodes. Search is only one example; modules can also dramatically impact site performance. Of course there is also the question of whether your site is heavily user-contributed (like Drupal.org), or whether its content is added mostly by the admin.
eyepassport.com
User Contributed
The case I'm most interested in is where the users participate heavily, i.e. a community site.
-
Qatar - A Community Site
good start of the topic !!!
good start of the topic !!!
-- Sree --
IRC Nick: sreeveturi
Size of the Search Index
At over 100MB, the search index begins to get really unwieldy. At 200MB, you start to wonder if you really need it. At 300MB, mine "blew up". I don't think I'll use the Drupal internal search again, but I would love some tips on managing a very large search index.
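As a quick way to put numbers on this, the index tables can be measured directly. This is a sketch assuming MySQL 5.x (which provides information_schema) and a database named 'drupal' -- substitute your own database name:

```sql
-- Size of Drupal's search tables in MB (data + indexes);
-- replace 'drupal' with your database name.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'drupal'
  AND table_name LIKE 'search%';
```

search_index is usually by far the largest; search_dataset and search_total also grow steadily with content.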
Right now all I want is to surgically remove the remnants of the broken index:
and save myself the trouble of replacing a backup (which has the whole index in it, of course). I have DBA 4.7, 1.2 -- Could it be as simple as deleting the offending tables? Which tables? What is the difference between "Empty" and "Drop"?
kerblogger.com
Fixed, but...
Well, a simple "repair table search_index" via phpmyadmin did the trick, but the search module is now turned OFF and is going to stay that way.
kerblogger.com
=-=
The difference between empty and drop.
Empty = truncates the table: removes all rows but keeps the table itself.
Drop = deletes the table entirely, structure and all.
Optimizing the tables at regular intervals is also good to do.
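In raw MySQL terms, using the search_index table from the posts above as the example, the three operations look like this (a sketch -- run against your own Drupal database at your own risk):

```sql
TRUNCATE TABLE search_index;  -- "Empty": deletes all rows, keeps the table
DROP TABLE search_index;      -- "Drop": removes the table itself, data and structure
OPTIMIZE TABLE search_index;  -- defragments and reclaims space after heavy churn
```

Note that after a DROP the search module will throw errors until the table is recreated, so emptying is the safer of the two.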
Thanks...
Thanks for that, but I still have to answer for myself whether Drupal core search is going to be a part of my site as it grows. This is a scalability thread, so I had hoped to get some tips regarding search performance in cases of a very large index. If the basic advice is "optimize often", I can try that and see what effect there is. But, I gotta tell ya, right now the combination of gsitemap and "on your own site" search from Google is hard to beat.
kerblogger.com
=-=
My comment about optimizing tables was not just in relation to the search tables.
Drupal limiting mid-sized sites due to performance issues
Hi mohamedn,
I stumbled over this thread when searching for recommendations on handling even small to mid-sized sites with Drupal without having a server farm. I'd suggest the following terminology regarding a site's size: small (up to 10k nodes), medium (10k-100k nodes), large (> 100k nodes). Regarding the registered users of a site I don't have much experience; however, it's important to distinguish the number of anonymous users from logged-in users and to keep an eye on the dynamic elements delivered to these different user profiles. E.g. some blocks require a significant amount of the server's resources, while others don't use many SQL queries and/or can be cached. Thus the important question is not how many users Drupal can manage, but how many concurrent logged-in users Drupal can handle without a zoo of servers.
Some notes from my experience so far:
* I'm running three sites (1 quite large, 2 mid-sized) with lots of scripting and some database backing on a dedicated Windows server with 1 GB of RAM; the large site consists of >700k nodes, mostly stored statically on the HDD; this server can handle some thousands of concurrent users without Squid or proxy caching and without noticeable server load; it also delivers a few hundred GB of traffic/month without a glitch.
* I'm hosting one mid-sized (>20k nodes) and four small Drupal sites (each 10k nodes) on a dedicated server (Dual-Core AMD Opteron Processor 1212 HE with 2 GB of RAM) on Debian GNU/Linux "Etch" with a standard LAMP configuration (MySQL 5.0.32, PHP 5.2.0-8+etch7, Apache 2.2.3) and some APC caching; this server can handle a few dozen (!) concurrent Drupal users, then both CPU cores reach 100% load and the sites begin to crawl like a snail. This is mostly caused by background activities like cron jobs indexing the site's content and some stuff in Drupal 5.x that I consider bugs. The limiting factor is the LAMP configuration, which has also been confirmed by Dries in a post some time ago (the combination of Apache 2 with PHP 5 is the slowest backend for Drupal). Also, two other applications are running on the server for each site: Gallery2 and MediaWiki. G2 is very CPU intensive, while MW offers file caching that relieves the server of much calculation; however, dynamically rendered pages are the limiting factor in MW as well, especially if you can't block search engines' spiders from hitting the sites with hundreds of requests per minute. It's a bit different from Drupal but basically the same problem: it doesn't scale with logged-in users and growing content until you take additional (non-trivial) measures.
Two of the Drupal sites were hosted until a few months ago on a smaller dedicated server with 512 MB of RAM, which was a nice playground but became unusable when going live. It might be possible to run _one_ small site on a server with 512 MB of RAM by disabling lots of core modules and not using 3rd-party extensions, but I doubt it would be much fun; if there are experiences with low-end hardware I'd like to hear them.
Large sites like the Wikimedia projects document their server infrastructure pretty transparently: they do a lot of multi-level caching to handle the traffic. While some of these caching concepts for MediaWiki are documented, I wasn't able to locate similar documents for Drupal. And while caching support (for APC, Squid, etc.) is embedded into MediaWiki's core, in Drupal you have to utilize unstable and unsupported 3rd-party modules, as far as I understand.
Bottomline: With Drupal, you'll reach the hardware limits of your server frighteningly fast; no average (non-commercial, private, hobby) site can afford to scale the hardware as Drupal would require; IMHO the key to scaling Drupal is not hardware, but caching concepts. I'd like to hear experiences from other users in this area, e.g. if someone has figured out working Squid configurations to handle Drupal traffic and stuff like that. Maybe we can put together a guide when enough experiences become available.
For that, we need to consider a lot of factors, e.g.
* server type, hardware (CPU, RAM, HDD [IDE/SATA/SCSI/Array etc.])
* operating system, applications running, other jobs the OS has to handle (mail server, cron jobs...)
* configuration of drupal (theme, blocks, modules, etc.)
* specifics of the Drupal installation (e.g. shared code base, number of tables in database, age of installation, etc.)
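On the Squid question above, a minimal accelerator (reverse-proxy) setup in Squid 2.6 syntax might look like the sketch below. The hostname and backend port are placeholders, and note that pages for logged-in Drupal users carry session cookies and will mostly bypass the cache anyway:

```
# squid.conf sketch: Squid answers on port 80, Apache moved to port 8080
http_port 80 accel defaultsite=www.example.com vhost
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=drupal
acl our_sites dstdomain www.example.com
http_access allow our_sites
cache_peer_access drupal allow our_sites
```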
Regards, -asb
boost module = nice caching
Bottomline: With Drupal, you'll reach the hardware limits of your server frighteningly fast; no average (non-commercial, private, hobby) site can afford to scale the hardware as Drupal would require; IMHO the key to scaling Drupal is not hardware, but caching concepts. I'd like to hear experiences from other users in this area, e.g. if someone has figured out working Squid configurations to handle Drupal traffic and stuff like that. Maybe we can put together a guide when enough experiences become available.
The boost module serves static files to anonymous users. It looks pretty cool. I'm not using it in a production environment. I would love to hear from others if they have production experience w/ boost.
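For anyone curious how Boost-style caching works in principle: it boils down to mod_rewrite rules that hand anonymous visitors a pre-generated HTML file before PHP is ever invoked. The rules below are a simplified illustration, not the exact rules the module generates, and they assume cached copies live under cache/:

```
# .htaccess sketch
RewriteEngine On
# Only GET requests from visitors without a Drupal session cookie (SESS...)
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{HTTP_COOKIE} !SESS
# If a static copy of this URL exists, serve it and stop
RewriteCond %{DOCUMENT_ROOT}/cache/%{REQUEST_URI}.html -f
RewriteRule .* cache/%{REQUEST_URI}.html [L]
```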
Search Engine Scalability
I'd like to learn how people here deal with Search Engine Scalability issues in general. The built-in search is known to have serious problems. What are people doing to circumvent this problem as long as no fix is available? Not using a search engine?
Google Search is not always the solution, because of poor website integration and delayed indexing of constantly changing content.
Does using Google search
Does using Google search (this site) also help with Google SEO?
subscribe
My site (very low traffic) runs out of memory after moving to a VPS (256 MB). Very sad.
------------------------------
social networking | enhanced speech | noisy speech
WallStreetOasis.com
Hey guys, I'm on Drupal 5.2 and have had similar issues as my userbase (11,400 registered users, ~50k pageviews a day) and database have grown -- a lot of activity in the forums and changing dynamic content / blocks.
I was on a dedicated box, 2 GB RAM, dual-core AMD, that was working fine with normal caching up until a few weeks ago. Over the period of a few days the overall site, and especially the forums, started to crawl. The load on the CPUs was way too high, so I upgraded to an 8-CPU "Barcelona" server with 4 GB of RAM that should theoretically be able to handle millions of pageviews a day. Unfortunately, throwing hardware at the problem did nothing. After days of research it seemed MySQL was STILL overloading the CPUs.
After enabling the Boost module and Block Cache, changing the forum module (with help from a Drupal expert) so that it wouldn't query through the node access table, and disabling the xtracker module, forum access and taxonomy access (the access table was too complex, causing huge MySQL queries and loads), the speed of the site improved and the load went down and is now under control. But it still seems slower than it was a few weeks ago, which is very troubling (again, especially loading and posting to the forums).
I had xtracker installed, the normal forums and neither Boost nor Block Cache installed on a weaker server and was getting faster page load times and post times -- just a few short weeks ago. It has been very discouraging and I have spent a lot of time and money trying to get the site back to the same speed it was before.
It is currently usable, but my server costs are now over $460 a month and I will be spending more $ trying to have a Drupal expert analyze the bottlenecks and solve them. One thing I have noticed is the presence of more automated "bots" crawling the site... unfortunately, with little technical background, it is tough for me to decide which bot is harmful and which is just from Google and indexing. I can tell by looking up the IP address, but when is a Google bot's load too much? I would think that on my current system I could handle this (only around 14k nodes).
Plus, I use a lot of modules so I won't be able to upgrade to D6 for a while. (until all of the contributed modules are up and supported)
Just thought I'd throw my story out there...any tips / suggestions would be appreciated.
I still LOVE drupal. I think once I simplify my theme (make it compatible across all browsers), that might solve some of my speed issues. I hate Internet Explorer 6.
Thanks guys,
Patrick
...
Drupal 5.2? There have been several security and bug fix releases since 5.2; please consider updating as soon as possible. Drupal.org had an issue with the Google bot for a while, along with other bots. With Google you can use their Webmaster Tools to request a slowdown of crawls to see if this helps mitigate issues. To help deal with the other bots, take a look at http-bl. As to other performance issues, you'll have to check with your expert.
-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain
Thanks
Steven, thanks.
I would love to be able to move up to a higher version of Drupal, but it also costs $ every time I have to do incremental upgrades. I prefer to make sure the release and 3rd party modules I use are fully supported before upgrading, and do it all at once every 6-9 months depending on the releases.
I wish I had more expertise, but I have 0 coding / drupal / technical experience. Everything I've learned has been on the Drupal platform with help from hired web developers, so I really depend on others to help me get the site to the next level. (in terms of modules, functionality and performance tuning)
I am impressed that someone with my background can manage so much through the admin menus but when it comes to the complexity of mySQL, PHP, Apache, I am a bit lost. I know what MySQL queries are my bottlenecks, I am just not sure what modules are causing them and how to solve them. Although my site is "medium sized", the ad revenue is not sufficient to cover the web dev costs and performance tuning. But that is ok, because I really enjoy running the site and if I have to dip into my own pocket to make it go fast again I will.
I know my delays are coming from MySQL. I have a knowledgeable host that has been great (Liquidweb), but something with the indexing / optimization of the database is screwy. I am not sure if it's the node_access table, but I did notice there are both a nodeaccess and a node_access table in phpmyadmin (not sure why there are 2). I asked my host how I could look at the MySQL queries and see which ones were slowing the site down, and his response is below:
"Look at 'Show MySQL runtime information' -- you will see that your biggest problem seems to be 'Handler_read_rnd_next', where you are getting 3,299.76 M hits, indicating that this needs better indexing for your data so that the next row of data can be called without being polled from the db with another call/sort of the table, instead using the index to go directly to the row in question. All these RED entries are areas that can be used to optimize your databases and your queries."
any thoughts on how to proceed from there?
By the way, I am using Boost and Block Cache and they seem to help a lot (again, with anonymous users), but the forum page loads (~5+ sec) and post times to the database (~2+ sec) for logged-in users are my concern, especially since I will be growing a lot over the next 2-3 months and I am already on an expensive / powerful server.
Thanks,
Patrick
subs
subs
Link not working
This link given here is not working
http://drupal.org/node/httpbl
---------------
Webmaster
free press release website
Correct url
The correct url is http://drupal.org/project/httpbl
Your robots.txt should help
Your robots.txt should help cut down on bot traffic especially if the bot traffic is coming mainly from search engines. For other bots, it'll only help if they pay attention to it.
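For the polite crawlers, robots.txt can both slow them down and keep them out of expensive pages. Crawl-delay is non-standard (Yahoo and MSN bots honor it; Google ignores it and is throttled via Webmaster Tools instead), and the paths below are just examples of query-heavy Drupal URLs:

```
User-agent: *
Crawl-delay: 10
Disallow: /search/
Disallow: /aggregator/
```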
Is your server hitting the swap space a lot? I.e. are you running out of RAM and having pages swapped out to disk? Add more RAM or try to get some of the more memory-hungry apps to use a little less. MySQL will want a lot of RAM, since that's the fastest way for a db to run.
Reconfiguring Apache to meet your site's needs can help too, or it may expose the need for additional capacity of some sort. Run Apache Benchmark sometime when you tend not to have site traffic for a few minutes, if there is such a time.
Can you cut down on your cron jobs, at least during peak access times? Or are they running something that is time-sensitive?
Use Devel module to help you figure out some db related speed issues.
Think about separating out your servers so that each can be configured for what it needs the most.
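To trace a counter like Handler_read_rnd_next back to actual queries, the usual route in MySQL 5.0 is the slow query log plus EXPLAIN. A sketch -- the paths, thresholds and sample query are illustrative, not taken from this site:

```sql
-- In my.cnf / my.ini, then restart mysqld:
--   log_slow_queries = /var/log/mysql-slow.log
--   long_query_time  = 2
--   log-queries-not-using-indexes

-- Take a query from the log and ask MySQL how it executes it:
EXPLAIN SELECT nid, title FROM node WHERE created > 1199145600;
-- "type: ALL" in the result means a full table scan: a candidate for an index.
```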
Turn off statistics
Our site was serving close to 100k page views a day with an average of 100 logged-in users during the daytime. We run it on a dedicated server with 4 GB RAM.
MySQL would continually die on us, causing much pain... I was running Block Cache but still faced problems. I started disabling modules at an alarming rate just to try to keep the site from falling over.
Finally a Drupal consultant suggested we stop running Statistics module - voila! Everything seemed to work okay after that...
-
Qatar - A Community Site
good tip!
good tip!
Drupal hogs resource
I am using Drupal for our free press release distribution website. It has some 15,000 nodes, 3,000 users and medium traffic of some 500 unique visitors per day.
Initially it was hosted on a shared hosting service provider (dailyrazor). Their service is good, with some temporary downtimes, and customer support was excellent. Then I made one big mistake: installing the Category module. I also installed other modules like free tagging and Tagadelic without realizing the performance implications. The Category module is feature-rich but poorly coded and consumed too much memory. I exhausted my PHP memory limit of 32 MB, the maximum allowed in a shared hosting plan. I tried some easy suggestions for optimizing the Category module for better performance and less memory consumption, but saw no big improvement. So I moved to a Linode VPS, where I set the PHP memory limit to 128 MB, which solved all memory issues.
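For reference, the memory change described above is a one-line php.ini edit (128M is simply the value that worked here, not a universal recommendation):

```ini
; php.ini
memory_limit = 128M
```

On Drupal 5 the equivalent per-site override can go in settings.php: ini_set('memory_limit', '128M');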
Our marketing campaigns attracted more visitors to the free press release distribution website, and the site started crawling. Sometimes the VPS would freeze because of too much swap usage. I followed many Drupal optimization techniques, including using Squid to serve static content like CSS, JavaScript and images, freeing Apache to serve only PHP pages. I also reduced the Apache process count to avoid too much swap and too many database connections. In the meantime I also installed PHP eAccelerator. I tried many other suggestions which I cannot recollect now. Still, the site crawls during busy hours. It is important for us to maintain good server response time because we are using Ubercart for eCommerce. So I am planning to do the following to improve site performance:
1. Immediately move the VPS from UML to Xen based VPS.
2. Move to a Xeon Core 2 Duo based dedicated server later. (I am interested to know if anyone here wants to share a dedicated server. Can you suggest some good value-for-money dedicated plans? How about ThePlanet or RackSpace?)
3. Install memcached.
4. Upgrade to Drupal 6.x when all the dependent modules used by our site are ready.
5. Install boost and blockcache.
6. Serve static pages for Search Bots.
...Anything you suggest.
--------------------------
Webmaster
PressReleasePoint.
I'd be interested in what
I'd be interested in what you have found out. I moved from a co-hosted server to a dedicated server, but as the site www.revisionworld.co.uk now has over 7000 pages and uses the Category module, it falls over regularly. This is not great, and it only seems to happen when you either edit or add new content.
I'm not sure a dedicated server is your option unless you plan to keep on adding memory (and cost) indefinitely.
Anyone else got any suggestions?
I'm seeing similar issues with
slow loading times as well, on a Win2003 server, Apache 2.0.x, MySQL 5, PHP 5 with 2 GB of RAM. We are using private files and can't use any of Drupal's built-in caching mechanisms. It's a school district site and we have a lot of anonymous visitors but only about 50-100 users, and when more than 10 are trying to post at a time page loading screeches to a halt; I think it took about 30 secs for them to access the form to post a simple content type. Frustrating to no end.
I've been tweaking the config files and have seen a slight increase in page load speed, and it did help to disable the Statistics and Devel modules too. What's really helped is using MySQL Administrator and the command-line tool and running SHOW GLOBAL STATUS;. I've used this to modify the my.ini file, and used Apache's huge.config file as the basis of my Apache setup. I've increased PHP's memory limit as well.
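For others going the SHOW GLOBAL STATUS route, here are a few counters usually worth watching when adjusting my.ini. The annotations are rules of thumb, not hard limits -- interpret trends rather than absolute numbers:

```sql
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';   -- high and climbing: table scans
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables'; -- temp tables spilling to disk
SHOW GLOBAL STATUS LIKE 'Qcache_hits';             -- query cache effectiveness
SHOW GLOBAL STATUS LIKE 'Table_locks_waited';      -- MyISAM lock contention
```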
But I'm really stumped when it comes to figuring out how to cache images in the private files directory. They don't cache and I'm not sure how to go about getting that to work. I've read a number of other forum topics posted ( http://drupal.org/node/227294 http://drupal.org/node/272082) dealing with this same issue but unfortunately no solution.
I am also using category module and have quite a bit of views created and know this will affect performance as well.
Is it possible that...
... it makes sense for you to use private download for some files, but not for others, e.g. most image files...?
A terrible (IMO) limitation in Drupal core is having all uploads use either private or public download; it's all or nothing.
Maybe you could use a contributed module like http://drupal.org/project/private_upload. With it you can have both sorts of upload, but I am not sure it would be easy for your 'community' to use that.
Caroline
11 heavens.com
unfortunately....
I went the recommended route of using private files before I really understood the implications. The site is rather large and I've uploaded loads of files (docs, images, etc.) to the private directory, and I'm afraid of making changes to the file setting since that could break the whole site.
After my experience and research, I do agree it is a bigtime limitation with no real solution afaik so at this point I'm feeling very much stuck.
wwaaaahh..
Subscribing...
Do I get subscribed like this?
Apache 2
Just my twopence worth! Try out http://www.lighttpd.net/ instead of Apache 2. You might just get a nice surprise :)
For those doing the
For those doing the subscribe thing - there is no subscribe feature. People simply reply to threads with "Subscribe" in the body so that the reply will show up under My Account in the sidebar.
subscribe post (sigh) This
subscribe post (sigh)
Will this subscribing issue be fixed in the new drupal.org site? I hope so; this thread is a mess.
Interested to hear more opinions though. We're trying to gauge hardware requirements for an Ubercart site.
subscribe
subscribe
subscribe
subscribe