Hello,
I was wondering if someone could offer some advice about a few performance issues as we convert a medium-to-large site to Drupal (the old platform is failing, and a V2 attempt at a custom platform also went down in flames). I'm trying Drupal now because I've had so much success with it on other projects.
We have about 150,000 nodes (profiles as nodes) and this could easily grow to a million or more over the next year or two (if we get this to work). Drupal 6.x has some pretty good options for managing a lot of the issues that we've run into...
Can't open taxonomies with millions of terms? Taxonomy Manager works pretty well.
Drupal Search too slow... Solr
Slow loading pages in the front end... Boost, Authcache, Cacherouter, etc...
Fine tune SQL, build indexes, ok
Throw some hardware at it - a cluster server config (yeah... in the offing)
Drupal 6.x works great for small to medium sites, but on medium to large sites (not to mention just plain large, 1,000,000 nodes plus), performance starts to become, er, rather challenging. I'm beginning to feel like Sisyphus. Even on our test site with only a handful of test users... the site is pretty much unmanageable, a WSOD-fest.
Here are a few questions... (and any other advice on issues to come would be most appreciated as well).
I can't seem to find a solution for content (node) management, i.e. Content Management > Content. Is there a module that manages content the way Taxonomy Manager does for taxonomies, bypassing the native Drupal functionality?
Would it be better to use something other than MySQL?
How is Drupal 7 for those that have used it?
Do you have any Drupal tips or tricks about how you manage your large site (outside of throwing hardware at it and fine tuning databases, which are pretty universal to all platforms)? Let's define large as 200,000 plus nodes, 20,000 users online at a time and assume 35% of users are logged in at all times, searching, modifying their content, etc...
Basically, I need a performance advice dump so we don't waste time traveling down all the dead ends... thanks!
Comments
_
The best sources of performance info I have found are: http://2bits.com/contents/articles and http://groups.drupal.org/high-performance
_
Don't be a Help Vampire - read and abide the forum guidelines.
If you find my assistance useful, please pay it forward to your fellow drupalers.
Drupal 7 just hit final code freeze. It certainly isn't at a point where anyone could provide valid information on how it scales. Focus on D6 at this point.
---
"Nice to meet you Rose...run for your life." - The Doctor
My first public Drupal site - EyeOnThe503
Yes...
I'm also trying to be forward looking... from what I understand performance is one of the major goals set forth for Drupal 7. If it turns out to be much more scalable than its predecessor, that leaves room for growth as sooner or later we're going to hit the wall with Drupal 6 and will have to progressively dilute/sacrifice a lot of its native Drupalness (which I don't want to do if avoidable) in order to keep it going. Content Management with large numbers of nodes is just one example.
Pressflow
http://pressflow.org/ is a good start
If you have multiple users changing stuff in the database, using InnoDB would be helpful: it locks individual rows on write, where MyISAM locks the whole table.
APC is a must, same with Cache Router
Page Caching that doesn't use PHP is a must. Look into Boost or Varnish.
Keep an eye out for a stable release of Mercury, as it's aiming to be something close to a drop-in like Pressflow currently is; it's a build of Drupal that comes with APC, Cache Router, Pressflow & Varnish.
Personally I find Authcache too much of a burden to set up correctly. If you're serving mainly anonymous users, Boost with Nginx is a very fast combo (somewhere around 5k requests a second should be achievable). Configuring Apache correctly helps as well if you don't want to use Nginx; you should be able to get over 2k a second with Apache.
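To put the anonymous-cache advice next to the numbers in the original post, here is a back-of-the-envelope sketch in Python. It only splits the stated 20,000 concurrent users by the stated 35% logged-in share; the per-user request rate is a made-up placeholder, not a measurement, so swap in figures from your own logs.

```python
# Rough load split using the figures quoted in this thread.
CONCURRENT_USERS = 20_000   # from the original post
LOGGED_IN_SHARE = 0.35      # from the original post
REQS_PER_USER_PER_MIN = 2   # ASSUMPTION: hypothetical click rate, measure your own


def split_load(users, logged_in_share, reqs_per_user_per_min):
    """Return (anonymous_rps, authenticated_rps) for a given traffic mix."""
    logged_in = round(users * logged_in_share)
    anonymous = users - logged_in
    per_user_rps = reqs_per_user_per_min / 60
    return anonymous * per_user_rps, logged_in * per_user_rps


anon_rps, auth_rps = split_load(
    CONCURRENT_USERS, LOGGED_IN_SHARE, REQS_PER_USER_PER_MIN
)
print(f"anonymous:     {anon_rps:.0f} req/s (candidates for a static page cache)")
print(f"authenticated: {auth_rps:.0f} req/s (still reach PHP and the database)")
```

The point of the split: a static page cache such as Boost typically serves only the anonymous share, so the 7,000 or so logged-in users are the part that still needs APC, Cache Router and database tuning.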
Yeah, Authcache does look a bit complex. I haven't ventured to try it yet, but it's on my list of things to look into. Between that and keeping Solr up, it could be tough. I'm familiar with Boost and Cache Router. I'll look into the others... The main problem is exactly what you mentioned: lots of logged-in users modifying profiles and such. I'm trying to keep things fairly modest (not trying to replicate Facebook by any means). Lots of internal email, automails, who viewed who, favorite lists, etc... the site itself is fairly simple, just one major content type and some views, but the performance is a big hurdle. The first two versions of the site were performance disasters (not Drupal), which is why it hasn't got millions of members today...
Drupal.org works great so I know it's all possible...