When I unveiled the Drupal-powered version of my webcomic to the world, I was proud of what I'd done -- legitimately so, I think. A ten-year-old webcomic, with ten years' worth of comic archives, completely converted over to a database-driven site, is nothing to sneeze at.

Unfortunately, this pride was short-lived: after three days up, my hosting provider suspended my account because I was getting too much traffic (thanks, in part, to a link from Drupal's front page) and it was killing the server it was on. They moved me to a test server for about a week, where they monitored how much of the server's resources my site was using... after that week they reported that ubersoft.net had been consuming a full 4% of the server's resources, and consequently I'd need to move up, at the very least, to a virtual host provider package.

I had neither the time nor the finances to do this, so I sadly moved back to my static pages. The Drupal version of the site is currently sitting in my test area, while I try to figure out how to redesign it so it can minimize the consumption of site resources.

Since reverting back to the static pages I've been trying to figure out what specifically caused the site to go down. At this point, I think I can point to the following areas:

1. Inaccurate reporting of site hosting capabilities on the part of the provider, and a lack of tools that allow customers to accurately measure site resource usage.

2. Database activity from Drupal on top of standard site resource consumption.

A more complete analysis follows.

Site Hosting: What is Advertised vs. What is Given

First, I don't want this to come off sounding like an attack on my host provider. A2hosting.com was very helpful during this time, their tech support was very responsive, and they allowed me to essentially sit on a dedicated server at shared hosting prices for an entire week while we monitored the site traffic. The problem, in my opinion, is that the industry standards used for bandwidth metering predate the common adoption of database-driven sites, and they are no longer reliable benchmarks.

I purchased a shared hosting plan from them as a jumping-off point. It was their highest-level shared plan, with a bandwidth cap of 200GB/month. The bandwidth cap is, unfortunately, the only unit of measurement most people have when trying to determine what kind of account to lease from a hosting provider, and 200GB/month was well above my monthly site traffic. My static site ranges anywhere from 30GB to 55GB/month, so I figured an upper limit of 200 would give me some room to grow before moving to a semi-dedicated or dedicated hosting plan.

Unfortunately, the bandwidth limit does not accurately measure every aspect of site resource consumption -- specifically, it doesn't measure how much database activity is going on behind the scenes, and that is far more important to a database-driven site.

I had access to my Webalizer stats from both my Drupal site and my older static site. I was able to find a period of spiked traffic on my static site (a few days when I had been linked by Reddit) and compare it with the period when Drupal had linked to my Drupal-powered site from its front page. This comparison revealed the following information:

- while bandwidth consumption on the Drupal-powered site was higher on average (about 15-20K per page) it was never in danger of reaching the bandwidth cap -- not even close

- hits per page on the Drupal site were actually lower than hits per page on the static site

From this I determined that my site traffic and resource usage were well within the advertised limits of my account. However, the limits were based on a pre-database model for measuring site traffic and resource consumption, and a shared hosting account can have hundreds of separate hosting accounts on a single server. There was another piece of the picture -- database activity -- that wasn't factored in at all.
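To put rough numbers on how far the site was from the bandwidth cap, here's a quick back-of-the-envelope sketch using the figures from this post (roughly 47,000 page views on the busiest day, and the upper end of the ~15-20K-per-page transfer I observed). This is a sanity check, not a metering tool:

```python
# Rough bandwidth estimate for the peak day, using figures from the post.
page_views = 47_000   # page views on the busiest day
kb_per_page = 20      # upper end of the observed per-page transfer

daily_gb = page_views * kb_per_page / 1_000_000   # KB -> GB (decimal units)
monthly_gb = daily_gb * 30                        # as if EVERY day were a peak day

print(f"{daily_gb:.2f} GB/day, {monthly_gb:.1f} GB/month vs. a 200 GB cap")
```

Even under the absurd assumption that every single day was a Reddit/Drupal-link day, the total comes in somewhere around 28GB for the month -- a small fraction of the 200GB cap.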

Database Activity: the 40-ton feather that broke the camel's back

Most host providers give you "unlimited MySQL accounts" but don't actually give you any tools you can use to measure database activity. My host provider was no different in this case, so I was very curious how a site that seemed well within the limits of all advertised resource caps -- disk and bandwidth usage -- could be using 4% of the system resources of a shared hosting server. Obviously, if you're on a shared hosting plan with hundreds of other accounts on the same server, 4% is a ridiculously large slice of the server to be keeping all to yourself. But the only units of measurement I had on hand were Webalizer statistics and monthly bandwidth usage.

My webcomic averages about 8.5K unique visits on Monday through Friday (when I update) and roughly 20K-30K page views (for the readers who stroll through my comic archives). When I was linked by Reddit (on the static site), this spiked up to about 16K unique visits on a single day with 42K page views, and when I was linked by Drupal (on the drupal site), the unique visits and page views spiked in a similar manner.

The bandwidth consumption, as mentioned earlier, was a bit higher, but not high enough to exceed my 200GB/month cap, and the hits per page were actually lower, so in theory Apache was doing less work to serve each page...

... but there was a database to take into account, and this is what killed my site.

My host provider had no tools I could use to monitor my database activity, but some Drupal users suggested I install the Devel module, which allows admins to view statistics about what the database is doing and how long it takes for those things to be done. Specifically, Devel allows you to see how many database queries are performed on each page, what those queries are, and how long each query takes.

On my test bed (an exact replica of the live site, using the same versions of PHP, Apache, and MySQL) I installed the module and started browsing pages. The Devel statistics were interesting:

- the front page required anywhere from 90-150 database queries
- individual comic pages in my comic archives averaged 70-80 database queries
- my archive table of contents for each comic (lists of my comics that could be browsed by year) used anywhere from 100-300 queries, depending on the filters used

I have no idea what the average is for database queries on a Drupal site. I do know that my site uses views quite heavily -- publishing a comic on Drupal is a bit more complicated than your standard blogroll, and publishing three comics plus general site news posts required a fair amount of tinkering on my part. I can surmise, however, that there is a significant difference in site resource consumption between a single person viewing a static page consisting of twelve hits in total, and a single person viewing a Drupal page consisting of five hits in total plus 70 to 300 individual database queries. Multiply that by 16,000 unique visitors (or, more accurately, 47,000 individual page views over the course of one day) and you get a LOT more activity going on in the background with a Drupal site than you do with a static site. And this was with both MySQL caching and Drupal site caching enabled (though it was not set to aggressive caching).

That, I believe, is what killed my site.

Moving on: Making Drupal Work

So the question now becomes "how can I make Drupal work?" I was very pleased with the functionality I had put into the Drupal-powered website -- it had navigation and content searching features that static sites simply can't provide, and the taxonomy features of Drupal are head and shoulders above any other CMS I've played with... and perfectly suited for making images easier to search and navigate in a text medium. The problem, apparently, is that this functionality requires more power on the back end, and I need to figure out:

- whether I need to move to a more robust (i.e., more dedicated) hosting solution
- whether I need to optimize the Drupal site to minimize database usage
- whether I need to do some combination of the two above options.

I suspect the third option is correct, but at this point I'm not sure which is more important. My hosting provider said that based on the 4% resource consumption I'd probably do well on a semi-dedicated hosting plan... but I'm not sure if that 4% is a natural result of the site traffic or a result of bloat I introduced in my site when I made all those views. The thought of 47,000 page views generating anywhere from 4,700,000 to 14,100,000 database queries in a single day is alarming, but is it alarming because it's ridiculously high or is it alarming because I'm not familiar with database-driven sites? Is it normal for a single page to make 100 database queries? I don't think that's normal. On the other hand, I don't have enough experience to know if my instinct is on target.
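The multiplication above can be written out explicitly, and it's also worth translating those totals into a sustained per-second rate, since that's closer to what the MySQL server actually experiences. This is just the post's own arithmetic, spread over a 24-hour day:

```python
# Estimated total query volume for the peak day, using the per-page counts above.
page_views = 47_000
queries_low, queries_high = 100, 300   # per-page query counts observed with Devel

total_low = page_views * queries_low    # 4,700,000 queries
total_high = page_views * queries_high  # 14,100,000 queries

# Averaged over 24 hours, that's the sustained rate MySQL has to absorb:
per_second_low = total_low / 86_400     # roughly 54 queries/sec
per_second_high = total_high / 86_400   # roughly 163 queries/sec
```

And since real traffic clusters into peak hours rather than spreading evenly across the day, the burst rate would be several times those averages -- on a server shared with hundreds of other accounts.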

Would I be well served to work on paring back all the features in my site in order to reduce the database load? Would doing that reduce the functionality of the site to the extent that it's just not particularly useful to my readers? The last question is particularly important to me -- I'm sure I could create an alternative Drupal design that reduced the number of database queries per page, but I'm not sure that the design would make my content more accessible and easier to navigate for my readers -- which was one of the main draws for moving over to a database-driven site.

The End

That's about all I have to say in terms of analysis. I hope it was useful to the rest of you... I spent a lot of time trying to figure out what to say. An earlier version of this post was considerably longer, a lot more confusing, and generally useless. I'm half-afraid that this version has gone in the other direction and doesn't give enough background information.

At any rate, I'm very interested in what those of you who run high-traffic sites think of this post and my conclusions in it. All comments welcome and encouraged...

Comments

jjkd’s picture

Nice posting! Well written, and I think you've captured the essence of what is going on. Unfortunately, I don't have the Drupal-specific skills to help you, but I think this would make an excellent case study and/or white paper on how to optimize a Drupal site while trading off resource requirements against functionality.

Hopefully someone with the necessary background will take this on, I would be very interested in seeing the results, and I think that it would be a benefit to others as well. I believe that I've seen others post some good work in regard to specific parts of the solution process, which may fit in well as references.

--
Joe Kyle
--jjkd--

JohnForsythe’s picture

It's interesting, Ubuntu just had a similar issue. They recently upgraded to Drupal, but had to temporarily revert back to static html to handle the high demand for their latest release. Maybe they will post some analysis?

--
John Forsythe
Need reliable Drupal hosting?

rszrama’s picture

(Subscribing... I'd love to hear about query caching if anyone has used it with large Drupal sites. We started caching queries on our MySQL server for a large osCommerce site and saw incredible results.)

MacRonin’s picture

Subscribing ... This is a problem I hope to have as a project I'm working on grows toward the end of the year.

BTW have you considered investigating the new Block level caching??
-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world

ubersoft’s picture

I use it quite extensively... almost every view block I create uses the cached version of the block instead of the block itself.

Unfortunately, that does NOT allow you to cache pages that are not views... for example, individual nodes for each comic, or views that are entire pages (i.e. the table of contents archives I've created, which are the most database-intensive parts of the site).

In a new site I'm building I have a few other things I've added to hopefully minimize some of the overhead (like xcache for the php) but I would really like the ability to cache entire pages regardless of whether the reader is anonymous or has an account.

MacRonin’s picture

Yes, I was thinking of the block-cache module. I hadn't realized that you had already implemented it for most of the blocks. But working on the core page content when that content is data-driven (such as your views) would be a great enhancement.

-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world

sepeck’s picture

Well, I was the one who promoted your announcement to the front page the first time and this time as well. Oops. :D

The reason I did so the first time was because of what you stated here. You took a static site and migrated it and 10 years of content, leveraging Drupal and its abilities very well. It was a good example of why designing and building sites can be complicated, and you had a very nice write-up about your objectives, the benefits you received in using Drupal for your site, and some of the learning curve as well.

This write-up is also good. We have a variety of levels of users, from the folks who work on the high end designing and building enterprise-level sites (MTV, Sony, Warner) to those converting over from static HTML who know how to look up the term web server but don't necessarily understand the relationship with the database, much less the performance impact of the database on RAID1 or RAID5 or even a virtual server.

Performance documentation is something we don't have a lot of in the community. It's something that is not hard to do, in theory, but it does take time, research, and practice, practice, practice. With the recent Lullabot performance seminar a few weeks ago, this type of conversation is starting to occur in a wider audience. Over time I hope that white papers can get added to our collection of documentation on the issue. Drupal is only one part of the equation; as you learned, the database is another important part as well.

As a note, if you look in the development archives, performance is always an ongoing concern for the active developers. For instance, for Drupal 6, one of the large issues with the menu module has been addressed. This doesn't really help you now, but be assured folks are always trying to keep performance considerations in mind while going forward with each version of Drupal.

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain


ubersoft’s picture

Well I didn't mind the link :), and honestly in the end I don't mind that the site went under -- it's a little embarrassing but if I ever want my site to handle high volumes of traffic it's better to learn that right off the bat, and I'm hoping it makes the next attempt more successful.

What I need to figure out right now is if there are any guidelines/best practices concerning how many database queries should be generated by a single page view. Am I right in assuming that 90-150 queries is excessive? Or is that not unusual? I'm afraid I don't have much to compare it with.

misty3’s picture

While most hosts advertise HUGE amounts of space and bandwidth, the CPU resource allocation is something which is tucked away elsewhere under t.o.c, a.u.p, etc. Practically speaking, the amount of bandwidth they promise cannot be achieved at the minimal CPU resource percentage allocated.
The 'top' record, which is the actual documentary evidence of which processes have actually consumed resources, stays with the webhost admins and is never shown to the client directly, live and online. When asked about transparency, almost all hosts say that we have to "trust" them.

To run sites based on Drupal or a similar PHP-MySQL platform where there is a moderate amount of traffic, most shared hosting providers will "fail" unless it's a fairly new host whose disc has not already been "overpacked". Some hosts actually provide v.p.s or semi-dedicated servers at a very fair price, so it may be worth going with them provided the budget permits, in my humble opinion.

Best regards

ubersoft’s picture

I actually have a 512 mb slice set up at slicehost.com... but I wanted to see if it would be worth my time trying to reduce all those db calls as well. Not that I'd necessarily know where to start...

catch’s picture

I noticed you said you were using front_page module + views for your front page.

If you used a panel for your front page, you could use views blocks inside that, which then allows you to use cached versions of your blocks - it's a very handy way to cache views in the content region of your page and allow for block placement at the same time. We do that for all the main indexes at http://libcom.org (still unstable at the moment). There's also a contrib caching module which allows you to cache entire panels, I've not tried that yet though.

Assuming a lot of the visits to your site land on the front page it could make a big difference.

Fwiw, we get similar traffic to you (c.5-7,000 visits, 30-40,000 pages/day on a site with 3,600 users and 14,000 nodes) and we're on a 512mb VPS with blackcatnetworks. We're thinking of moving to a 1gb 1/4 VPS with rimuhosting though to speed things up a bit.

ubersoft’s picture

Everything on my front page is a block. Most of them use blockcache -- there is one exception, because blockcache is apparently incompatible with views bookmarks.

aren cambre’s picture

Just want to echo the need for more performance analysis.

One of my biggest gripes is how slow Drupal runs on shared hosting accounts. I hope a lot of work goes into query optimization with Drupal 6.

canadrian’s picture

Subscribing to this thread because my problems are similar. I have a very low-traffic site, but I keep topping out my CPU usage allotment on my shared hosting account. I would be very interested on how to optimize a Drupal site when storage and bandwidth are not a problem, but CPU usage is.
----------
"Being tired is like playing mind games with some of the pieces missing."
- Canadrian

I am a proud member of the ElectricTeaParty.net online community.

joshk’s picture

Drupal really is an application and as such it takes CPU/RAM to run, much more so than disk space or bandwidth.

Unfortunately, most shared hosts still live on the assumption that their users are (primarily) serving up static page documents, in which case the only application in use is Apache.

------
Personal: Outlandish Josh
Professional: Chapter Three


dmuth’s picture

Personally, I think your webhost was out of line. If they sell "unlimited MySQL accounts" and then go back on that, they are ripping people off. It is no different from Comcast selling "unlimited" Internet connections but shutting off high-bandwidth users. Granted, I'd understand if they shut off a customer due to abuse of the MySQL database, but I would hardly consider Drupal to be "abusive".

I would suggest finding another webhost that shows more clue in these matters. For example, my webhost allows unlimited MySQL usage -- but if you are in the top 10% of MySQL users, they will bill you a surcharge of $0.01 per day. Not only are they up front about it, but they are reasonable about what is charged and their conditions for charging it.

My 2 cents.

-- Doug

dsp1’s picture

The hole, I mean, whole ;) shared web hosting business sells unlimited accounts and bandwidth restrictions without really pointing out CPU usage. We as customers should demand more transparency on CPU usage from web hosts, but until the industry changes, we are stuck if we use shared hosting. If you get a notice from your web host that they will shut you down, that is pretty good by current standards; most users I have read about just got switched off, no notice. I think it is wrong, but we need to ask for better from web hosts.

And ISPs are shutting off users for using too much bandwidth and not telling them. Cox, for example, advertises high-speed internet of 7mbps or 15mbps, and they don't advertise that there is a limit to how much you can actually download or upload in a month; it is included in their los (limitations of service), if you can find it. I only found out after signing up and later reading about it on another web site.

I have noticed on my test server that the query counts are very high: 347 on the main page and around 150 on other pages. I do have devel and many other modules installed for testing.

misty3’s picture

Hi Dmuth,

Thanks for the link to your webhost. It's certainly a very fresh approach to webhosting.

A few questions, if you do not mind :

1) Do you run Drupal there? It seems that Drupal+Gallery won't run because of PHP safe mode etc. (vide their FAQ)
2) What does "a" MySQL process mean, actually? Do they have phpMyAdmin?
3) 24x7 support, good speed and uptime?

Best regards

dmuth’s picture

1) Do you run Drupal there? It seems that Drupal+Gallery won't run because of PHP safe mode etc. (vide their FAQ)
2) What does "a" MySQL process mean, actually? Do they have phpMyAdmin?
3) 24x7 support, good speed and uptime?

1) Yes, I run several Drupal websites there. (Link #1, Link #2, and Link #3) (Edit: I forgot to mention Gallery. I run the acidfree module instead. It integrates with the rest of Drupal nicely.)

2) I'm not super familiar with the latest MySQL terminology, but a process is basically an instance of MySQL. The above-mentioned sites that I run all exist in the same "process", but each of them is under a different MySQL user so as to keep the data separate. Yes, they have phpMyAdmin.

3) Support has regular business hours. However, they do monitoring 24/7. So if something breaks in the middle of the night, they usually know about it before you do. :-) I've been with them for about 4 years, and I can only recall 2 multi-hour outages over the years. Their infrastructure is robust enough to handle the approximately one Slashdotting a week that their sites tend to get. :-)

They also have member forums where you can ask questions of other members once you sign up.

-- Doug

Southpaw’s picture

Just a quick question, but did you turn on caching?

ubersoft’s picture

I had both CSS and site caching turned on, though I did not have aggressive caching turned on -- it was reported as incompatible with a few modules I was using at the time.

But I need to point out that I ultimately hoped to have a large number of subscribers to my site -- some of the nice site features (bookmarking pages, editing user comments, some advanced search functions) were only available to subscribers -- and if I ultimately had a large number of subscribers, Drupal site caching would be useless because it doesn't affect subscribers at all.

I was also using the blockcache module -- unfortunately, blockcache doesn't affect raw nodes or full-page views, so for example someone who was browsing through my site archives wouldn't be able to benefit from the caching.

The provider had caching turned on in MySQL, and I assume that helped some. I did not have any PHP caching set up on my site, which I assume would have helped as well...
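For anyone who does have access to the MySQL configuration (which is usually not the case on shared hosting), the query cache the provider enabled is controlled by a few my.cnf settings. A minimal sketch -- the sizes here are illustrative starting points, not recommendations, and note the query cache only helps repeated identical SELECTs:

```ini
# my.cnf sketch -- MySQL query cache (MySQL 4.x/5.x; removed entirely in MySQL 8.0)
[mysqld]
query_cache_type  = 1      # 1 = cache all cacheable SELECT results
query_cache_size  = 32M    # memory reserved for cached result sets
query_cache_limit = 1M     # don't cache any single result larger than this
```

Any write to a table invalidates all cached results for that table, so a site with frequent comments or node updates sees less benefit than the raw query counts might suggest.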

merlinofchaos’s picture

But I need to point out that I ultimately hoped to have a large number of subscribers to my site -- some of the nice site features (bookmarking pages, editing user comments, some advanced search functions) were only available to subscribers -- and if I ultimately had a large number of subscribers, Drupal site caching would be useless because it doesn't affect subscribers at all.

This is only true if you expect 100% of your hits to be from subscribed users.

If 75% of your hits are from subscribed users, that's still 25% that are getting cached pages at very little performance cost. And what really kills a site, the 'slashdot effect', will have a disproportionately high number of anonymous hits.

Finally, when aggressive caching reports that it is incompatible with some modules, it is being very conservative in reporting that there may be problems. The reason it is 'incompatible' with these modules is that they implement _init or _exit hooks (i.e., they operate on every page), and those calls won't be made for anonymous users. But that's often OK in those situations. It depends upon what the module does with the hook. Most of the modules you're using that are 'incompatible' are just fine with aggressive caching.

-- Merlin

[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]


ubersoft’s picture

Eaton pointed that out (and I think I remember you doing that as well, back when the Drupalized site was live) -- but there's no way to know that just by reading the warning.

firebus’s picture

the real problem here is drupal's continuing lack of static page caching.

it's not that hard to do; there's a promising contrib module for this for 4.7 (http://drupal.org/project/boost), and in my opinion this is one of the most important roadblocks to larger scale adoption of drupal.

imo this should be a high priority for drupal 6, and devs should look to the smarty template system as a guide. smarty has an awesome static page caching system that provides a separate html cache file for each template - implementing something like this for drupal would allow for block level caching. you could have a page with dynamic content, but cached menus and blocks - just like block cache, but with static pages.

there's no reason for a site that receives primarily anonymous traffic to be doing a ton of php compilation and database calls on a page that doesn't change. database caching is the worst idea drupal ever had.

sime’s picture

It's also worth noting
http://drupal.org/project/fastpath_fscache

I notice that boost seems to have a little more activity in its issue queue. It would be great to get some feedback from users of both boost and fastpath_fscache.

mr700’s picture

Static page caching would be a great thing. On our site all stories older than 1 month are locked (no more comments), so they virtually never change (except for the menu). I was thinking I could use mod_rewrite to serve static HTML, creating a static snapshot for the worst scenarios (better static than none). It would be a 'wget -m' thing if all Drupal links ended with a slash (node/nid -> node/nid/)... For now, we bought a better server and moved the database onto it (the database traffic was higher than the web traffic).
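The mod_rewrite idea can be sketched in a few lines of .htaccess: serve a pre-generated HTML file when one exists and the visitor looks anonymous, otherwise fall through to Drupal. This is roughly what boost-style modules do; the cache path and the cookie name here are illustrative, not any module's actual convention:

```apache
# .htaccess sketch: serve cache/<uri>.html to anonymous GET requests if it exists
RewriteEngine On
RewriteCond %{REQUEST_METHOD} =GET
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/cache%{REQUEST_URI}.html -f
RewriteRule .* /cache%{REQUEST_URI}.html [L]
```

A cron job (or the 'wget -m' approach mentioned above) would be responsible for regenerating the files under cache/ and deleting them when content changes.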

budda’s picture

Surely older pages will not suffer from intensive access, so caching them as static pages is pretty worthless?

--
Ixis (UK) providing Drupal consultancy and Drupal theme design.

ubersoft’s picture

My website (the drupalized example is currently down due to Stupid MySQL Admin Activities) is a webcomic, and the "older pages" are archived comics -- which are absolutely CRUCIAL to the site, especially for new and potential readers. I *want* people reading the older pages, and in any situation where I get more traffic than usual (if slashdotted, dugg, linked from Reddit or, ahem, from the front page of Drupal.org) it is actually more likely that people will browse through the older pages than on any normal day. In other words, during periods of high traffic those pages will be accessed more.

Of course, Drupal already takes care of that for the most part, since those new visitors will be mostly anonymous and therefore Drupal's cache will already be working, but my point is that depending on what content you're serving, older pages can be accessed fairly intensively.

bslade’s picture

Also see A file based cache utility for Drupal. It uses Apache URL rewriting to direct a web page request to a static file, if it exists, or to a Drupal URL. A cron job periodically creates an index.html file that's a copy of the Drupal front page, plus other HTML files representing the N most recent postings (e.g. the story node with the various blocks around it that make up a whole page for an anonymous user).

In fact, Drupal.org needs something like this. It seems to run sort of slow.

PublicMailbox@benslade.com

abbynormal’s picture

I agree that a single page shouldn't normally take 100 database queries to display. Everyone who has commented about the *serious* need for improved caching (including for logged-in users) is right and echoes some of my serious concerns with Drupal performance in an environment where users may be logged in.

I am not a database "guru", but I have a fair amount of database performance experience and I've seen a handful of modules and slightly-slow database query times render a site almost unusable for logged-in users.

-an

NT’s picture

Hi

Are you able to use something like Apache's mod_cache to cache the generated web pages to disk or memory?
This should reduce the number of database accesses (Drupal holds its cache in the database).
I've not needed to try it (I've used other caching solutions with JSPs, which will not work with PHP).
The latest (online) documentation is very good.
This module may not be available on a shared server, but for those using a dedicated server, I think it would be worth trying.
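For reference, a minimal sketch of what enabling the disk cache might look like on a dedicated server (Apache 2.2 module and directive names; the cache path and expiry are illustrative):

```apache
# httpd.conf sketch: cache generated responses on disk
LoadModule cache_module      modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

CacheEnable disk /
CacheRoot   /var/cache/apache2/disk_cache
CacheDefaultExpire 3600    # cache responses lacking expiry info for an hour
```

One caveat: mod_cache honors the Cache-Control headers the application sends, so whether Drupal's output is actually cacheable depends on the headers Drupal emits for each page.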

Nick

ubersoft’s picture

... suggests that in order to get any real benefit, you need to modify the bootstrap.inc file in the includes folder to keep Drupal from forcing reloads... and that people who suggest making that change get shouted at by Drupal developers. :)

the_other_mac’s picture

100 queries per page is quite normal for some web apps (osCommerce springs to mind) but quite high for Drupal in my experience. Of course it depends on how you were using it; but looking at your current (static) site, it seems to me that there is little in the way of the sort of interactive content that makes database queries unavoidable. As long as users aren't logged in (and I guess that most of your pageviews would be from non-logged-in visitors), a page could be loaded almost entirely from cache.

If you had something like a "recently-viewed articles" block whose content constantly changes, that would be an example of something you might consider removing to optimise performance.

You said you didn't try aggressive caching, and that's a little strange. Caching is exactly what you need, and the nature of your site suggests it would be very effective. You might look at this thread on additional caching options: http://drupal.org/node/97347 .

As for your hosting provider - a typical hosting situation is to allow 1,000 accounts on a single server, each of which is entitled to (say) 10Gb diskspace and 100Gb monthly transfer. But of course that doesn't mean the server has 10,000 Gb of diskspace, nor do they pay their ISP for 100,000 Gb of monthly bandwidth. They are overselling, safe in the knowledge that statistically most users won't ever use a fraction of their limits. (This applies to all hosting providers - no one could compete by being "honest" about their allowances.) If one of the 1,000 accounts happens to use say, 10Gb of diskspace, this could be 2% of the total available, but it wouldn't be a problem because everyone else is using less. So 4% of CPU usage might be over your limit, but it's not overwhelmingly unreasonable.
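The overselling arithmetic in that example can be written out explicitly. The 500 GB server size isn't stated directly; it's implied by "10 Gb of diskspace ... could be 2% of the total available":

```python
# Overselling ratio implied by the example above.
accounts = 1_000
quota_gb = 10                       # per-account disk quota
promised_gb = accounts * quota_gb   # 10,000 GB promised in total

actual_disk_gb = 500                # implied: 10 GB is about 2% of the server's disk
oversell_ratio = promised_gb / actual_disk_gb   # the server is 20x oversold
```

The business model only works because, statistically, almost nobody approaches their quota -- which is exactly why one account using 4% of the CPU stands out.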

I wonder about your statement that you couldn't afford a virtual private server, given that you run advertising on your site. VPS plans start at under $50 per month. Or, as an in-between, many hosts offer so-called "e-commerce" hosting, which is basically a shared package that allows for fairly high CPU usage.

ubersoft’s picture

Lots of good meat in your post to respond to. Thanks. :)

- The current static site is really stripped down and shouldn't be used as a baseline for what the drupalized site was. The drupalized site hosted three webcomics, and it was possible to tell, just by looking at the front page, which of the comics had been updated (two were displayed as thumbnails on a sidebar via imagecache). There was also a poll that allowed readers to vote on various silliness, a link to an RSS feed to another site I run, and site news printed below the main comic. In order to do this I had to use the front page module and populate the front page entirely with views -- the default setup doesn't really allow you to separate out information on a page.

Also, I used taxonomies extensively -- each comic had taxonomies for the comic name, storyline, and characters appearing in the comic. By clicking on the storyline taxonomy, you could view all the comics in a given storyline and only those comics. Clicking on a character name let you view all the comics that character appeared in. This made it easier for my readers to search for specific comics.

Finally, I created a rather complex archive system (again, using views) that allowed the reader to view a table listing all the comics published for a specific webcomic -- it displayed year, date, comic title, and storyline -- and could also be filtered to only display comics published in a specific year.

Those features seemed to me exactly the sort of thing that webcomics ought to have, especially ones with extensive archives (I don't have the largest archives out there, but at 1400+ comics for Help Desk alone mine are legitimately large), and those are what caused a lot of the database work -- that, plus trying to figure out how to display various pieces of information on the front page and keep them there when they all updated at different times. Unfortunately, the default system that Drupal and every other CMS uses is to put all content into a single stream, and I actually had five or six separate streams -- three comics, site news, a poll, an RSS feed from another site...

- I didn't use aggressive caching because on the performance page the aggressive caching option clearly stated (in red, bolded text!) that it was incompatible with some of the modules I was running. That is the only reason I stuck with the default caching system. Right now the modules content, devel and token are listed as incompatible with aggressive caching. Both token and content are absolutely necessary for the functioning of my site (my comic nodes are created with CCK in order to make data entry easier, and token is used with the custom pagers module which makes navigation between individual comics adhere to the webcomic navigation standard).

That said, I did use site caching. I turned on normal site caching and CSS caching. I had no PHP caching because my host didn't support it at the time. MySQL caching was turned on.

One of the things that I think tripped me up is that I don't consider my site very high-volume -- not compared to really successful webcomics out there. The unofficial benchmark for the low end of successful for a webcomic is around 10K unique visits a day, and I'm not there yet.

- I'm somewhat encouraged by the responses here that my site may not have been as resource hungry as I initially feared. It may be that I simply underestimated what my hosting solution could handle, and that a semi-dedicated, dedicated, or VPS server will be able to take it all in stride. I do think I need to figure out if it's possible to fine-tune the site a little bit first, though.

- My ad revenue is collected quarterly and isn't enough to pay for most of the standard semi-dedicated packages I ran across. Most virtual private servers (like the ones you mention) require that you install and maintain everything yourself, which made them an extremely unattractive option at the time. Now I'm exploring the LAMP route and going through the unpleasant process of learning how it works -- at the time I was more interested in publishing my webcomic, but the experience has convinced me that in order to do that properly I need to learn more about the back end. That is going to be a ridiculously painful process, though, and if I'd been trying to do that AND learn to configure Drupal at the same time I would have given up completely. A human being can only take so much pain at one time... ;)

eaton’s picture

I didn't use aggressive caching because on the performance page the aggressive caching option clearly stated (in red, bolded text!) that it was incompatible with some of the modules I was running.

We really need to look into making that warning more accurate. Technically, what it means is that modules that assume they will ALWAYS run every time a page is viewed will not actually run when an 'aggressively cached' page is viewed. Aggressive Caching scans for any modules that implement the "init" hook, and warns the user about them.

Both content.module and token.module ARE actually compatible with aggressive caching: they implement the init hook, but they work fine even when the super-cached page is served and their init hooks are skipped. Certain module features, like devel.module's query logging and statistics.module's logging of page views, DON'T work with it -- in general, it's not that the cached page will break, but that modules that want to log things don't get a chance to.
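To make that concrete, here's a sketch of what triggers the warning -- a hypothetical Drupal 5 module named "example" implementing the init hook (the module name and body are illustrative, not from any real module):

```php
<?php
// Hypothetical example.module (Drupal 5 sketch). Implementing hook_init()
// is what makes the performance page flag a module as "incompatible":
// with aggressive caching on, a cached page is served before
// module_invoke_all('init') runs, so this function is skipped entirely
// for anonymous cache hits.
function example_init() {
  // Per-page bookkeeping goes here -- e.g. devel's query logging or
  // statistics' page-view counting. If skipping this work is harmless,
  // the module is in practice compatible with aggressive caching.
}
```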

Not sure if that helped at all, but it at least clarifies the somewhat vague-but-scary warning message on that performance screen. I ran into it a couple of times as well and did some investigation.

--
Lullabot! | Eaton's blog | VotingAPI discussion


ubersoft’s picture

That'll teach me to take the documentation at face value. :)

Thanks for letting me know that. I'll be using aggressive caching in the future -- I won't be running devel when the site goes live again, and token and content were the only other modules it flagged.

sime’s picture

Thanks Eaton, very useful tip.

dnewkerk’s picture

I'm not quite a drupal ninja yet, so I can't really advise on that aspect... but what I do have is extensive knowledge of online advertising, and quite a bit about dedicated servers. Anyhow -- you mentioned your site gets some reasonably high traffic (I take it in the ballpark of, if not quite at, 10k per day). With traffic anywhere in that range, or even half that, it should be possible to make more than enough to pay for some higher-end hosting, provided you choose the right ads and set them up in the most ideal way on your site. The niche (and thus the type of advertisers for it) dictates whether your traffic is worth hundreds versus thousands of dollars a month, but at the least hundreds (in that traffic range). At your level of traffic, I would definitely move to the next step above virtual hosting as soon as you can -- staying where you are will only limit your potential success and add to your stress wondering if your site will stay online.

Anyhow, I'll be happy to go into more detail if you're interested -- though just to start you off... consider changing to (or adding) Google AdSense or Yahoo Publisher Network ads. Contextual ads (and text ads in particular) tend to receive a substantially higher click-through rate (and of course, keep your CPM ads too if you can do so tastefully). How and where the ads are placed throughout your site, and what colors you use (always set the links to the same color as your site's links, and usually opt for no border or background color), makes a very significant difference (e.g. just changing placement can help you out by 2-6x). Also consider a few other "less obvious" directions that would fit nicely with your site's audience... for instance, join the Amazon affiliate program and look up your favorite comic-related books, or perhaps some books you could recommend to people interested in drawing their own comics (you get the idea). Add both a general block of these to your site's sidebar and a "Recommended books" type of section to your site, and write up some personal reviews of the books you've chosen. You're an expert in your field, so many of your audience will give greater weight to your recommendations and consider buying what you suggest. You can also suggest that fans of your comic make all their Amazon purchases through your link to support the site. Doing the above should help you make several hundred more per month, at least.

So far as hosting, probably a good match for your needs right now would be grid hosting from MediaTemple or Mosso:
http://www.mediatemple.net/webhosting/gs/
http://www.mosso.com

With these you'd get substantially higher resource allocations than with plain virtual hosting (and you actually get it - not empty promises of unlimited this and that), and would not have to concern yourself whatsoever with the intricacies of managing your own dedicated server for the time being (which is not worth it yet for you, I'd advise). You could also consider a dedicated-virtual server from MediaTemple: http://www.mediatemple.net/webhosting/dv/
Whatever you choose, do a quick search on www.webhostingtalk.com for the company first to get the most recent feedback about them. If you *do* end up going with your own dedicated server eventually, be sure to have it secured properly (e.g. rack911.com is usually highly recommended).

-- Dave

fischermx’s picture

100 queries per page is a bit too much, no matter what site it is.

znerol’s picture

I too had problems with a moderate-traffic Drupal site recently, because of the fact that Drupal logs errors into the database. The site lived on a server at a shared hosting provider which was not very well managed (too many users, bad security, ...). Several times the MySQL connection "went away", and so did the cache table of my Drupal installation once or twice. As a result, thousands of error messages were triggered, which strained the DB additionally. After some minutes the server went down completely.

In some circumstances it would be useful to be able to log errors into a logfile instead of the database. Logging makes things worse if something goes wrong with the DB.

eaton’s picture

Complete "Couldn't contact the database" errors -- like you'd see when the DB server goes down -- are not logged to the database, but if DB contact is *sporadic* throughout the page's lifecycle, you could see the errors you described.

Drupal 6 now has a more flexible error logging system that does just what you're talking about -- I'm not sure if there's a chance it will be backported to Drupal 5, but it's something to look into.

--
Lullabot! | Eaton's blog | VotingAPI discussion


znerol’s picture

To clarify this: the real problem was not that the database went away* and Drupal logged into nirvana; the problem was when the DB came back but was corrupted (the cache table was not accessible anymore). Each and every page hit after the DB came up again generated a lot of log entries**, which (I believe) brought the server to its knees. However, I can't tell if this was really the problem, because the people at the provider were not able/willing to provide detailed information.

* logmessage: "MySQL server has gone away"
** logmessage: "Can't open file: 'cache.MYI' (errno: 145) query: ..."

eaton’s picture

* logmessage: "MySQL server has gone away"
** logmessage: "Can't open file: 'cache.MYI' (errno: 145) query: ..."

Yeah, that's an indication that the server's database is seriously hosed. Drupal probably should be able to deal with that scenario more gracefully, but no matter how you cut it, that situation will render a piece of software unusable until the DB is properly restored.

If that happened, drupal logging its error messages was the least of the server's problems...

--
Lullabot! | Eaton's blog | VotingAPI discussion


vkr11’s picture

Just tracking

Hyper-1’s picture

I don't see why money is an issue. You can get a fully managed VPS from about $50. Try servint.

vivek.puri’s picture

Hi,

This is indeed a very important topic. I am in a similar situation, except that, having a somewhat technical background, I never moved my site to Drupal except for a very short amount of time.

For background: I host a "small" forum based on phpBB. I was initially on shared hosting, but I moved to a dedicated server some time ago. I realized that if I moved to Drupal, my shared hosting account wouldn't be able to handle the load. I have been trying since the days of Drupal 4 but have not made the switch yet.

Reasons why I wanted to move to Drupal: many here might be aware that to add features to phpBB, you have to change the code of many files. What that means is that when a new patch is released you have to carefully redo everything. My last update cost me several days to move to the latest version. As the patches are security related, you can't avoid them. All this can be very painful.

Now comes Drupal, which has a very well designed, elegant API that allows you to extend functionality "very" easily. There comes the catch: as it leaves everything in the hands of module developers, not all modules will be equally well developed, causing uneven performance.

All those who feel caching will solve the problem are only partially right. Part of the blame lies with the design of Drupal itself. I will try to explain, but the core issue is loading by object vs. loading by page. Drupal is designed to load by object. Let's see how that affects performance.

I will take a simple case: let's say there are 10 nodes. To display the main page:
Drupal will load 10 nodes (1 query)
For each node it will read author info (10 queries)
For each node it will read the count of comments (10 queries)

So that makes 21 queries just to load 10 nodes, with only the comment module enabled. Depending on the number of modules you add, you could be looking at another 10 queries per module.

Alternatively, if Drupal could load grouped items, it would look like:
Drupal will load 10 nodes (1 query)
read user info for all 10 nodes (1 query)
read the count of comments for all 10 nodes (1 query)

Now that's 3 queries vs. 21 queries.
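For instance, the grouped comment-count lookup could be a single query (a sketch against Drupal 5's schema; the hard-coded nid list stands in for whatever nids the page loaded):

```php
<?php
// Sketch: one grouped query replaces ten per-node lookups.
// {comments} and its nid/cid columns follow the Drupal 5 schema;
// in real code the IN (...) list would be built from the loaded nodes.
$result = db_query("SELECT nid, COUNT(cid) AS num_comments
  FROM {comments} WHERE nid IN (1,2,3,4,5,6,7,8,9,10) GROUP BY nid");
while ($row = db_fetch_object($result)) {
  $comment_counts[$row->nid] = $row->num_comments;
}
```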

From the development core code in node.module, in function node_page_default -- original code:

while ($node = db_fetch_object($result)) {
      $output .= node_view(node_load($node->nid), 1);
    }

Modified code:

$nodes = db_fetch_objects($result); // hypothetical: fetch all rows into an array of nodes
all_node_load($nodes); // hypothetical: load all node objects in one grouped pass
foreach ($nodes as $node) {
      $output .= node_view($node);
    }

This is just an idea, not something I have experimented with yet -- the functions db_fetch_objects and all_node_load above don't exist; they stand in for the grouped loading I have in mind. I am not an expert at the Drupal API level, but I have spent some time trying to make it work on my test system. It may be a naive idea, but someone more experienced can comment on it.

But I have seen that Drupal loads same objects again and again in same page.

Now, I know many people will say that it has been working for xyz and even for Drupal itself, but they should understand that not everyone will be able to host on multiple servers. Most of us can only afford shared hosting, and unless the number of queries is actively reduced, this problem will not go away. Caching will help, but nowhere near as much as actually reducing the number of queries.

giorgosk’s picture

I am wondering if it's actually possible.
------
GiorgosK
Web development/design blog part of the world experts network


merlinofchaos’s picture

There has been some discussion of doing this with the node hooks, but to my knowledge no one has stepped up to write the patch to do this. This is one place where someone interested in really doing valuable work for the Drupal community could step in and provide some help.

-- Merlin

[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]


johnnybegood’s picture

I've been following this bit of the thread to see if you receive any replies from more experienced Drupal developers, and I wonder if you plan to test what you've found and share the results with the community.

So far nobody has said it's a stupid idea ;)

Cheers

vivek.puri’s picture

I definitely want to work in this area. I can find my way around code, but I still don't know my way around the Drupal community ;)
This change will work best when done in modules too. So my initial thought is to try it against a set of modules (maybe around 10) and compare results. If I can get some input regarding test scenarios, I will try to test them out. This may give some statistical information which can be discussed further.

johnnybegood’s picture

A good starting point for developing for the community would be http://drupal.org/contribute/development

al4711’s picture

Hi,

Nice post, thanks for sharing ;-)

I have looked at the current site and have some questions:

1.) the js was from jquery or in every page or in a/some .js files
2.) I hope the pics are not in the db, isn't it ;-)?
3.) apache can deliver the css/js/png without asking drupal
4.) Have your provider give you some infos about the mysql like:

  • query_cache_*
  • max_user_connections
  • version
  • table_cache
  • key_buffer_size
  • ...

5.) is a slow-query log available?
6.) what php-settings was used e.g.: mysql*, accelerator, ...
7.) Is it possible to see the drupalied site?

I must also say that I'm new with Drupal, but I have done a lot with PHP/Perl/... and some webservers like nginx/Apache/lighty.

Maybe I haven't fully understood your site or Drupal yet, but the new Drupal book is on its way to me ;-)

BR

Aleks

ubersoft’s picture

You're going to need to explain your questions a little before I can answer them...

1.) the js was from jquery or in every page or in a/some .js files

What do you mean by js? Javascript? And how do I find the answer to your question?

2.) I hope the pics are not in the db, isn't it ;-)?

The pictures are in the files directory, which is Drupal's default location. If you mean something else you're going to have to explain your question a little more.

3.) apache can deliver the css/js/png without asking drupal

I publish a webcomic. The primary content on my site is graphics -- if I have to find a way to add the graphics around Drupal instead of using it directly, there's really no point in using it at all.

4.) Have your provider give you some infos about the mysql like:

* query_cache_*
* max_user_connections
* version
* table_cache
* key_buffer_size
* ...

It's all moot since I've moved back to my old host -- I don't have access to the settings for the site when I was running Drupal. What would I do with that information once I got it?

5.) is a slow-query log available?

What is a slow-query log?

6.) what php-settings was used e.g.: mysql*, accelerator, ...

Er... it was running PHP 5. There were no PHP caching tools in use because the host provider didn't have them.

7.) Is it possible to see the drupalied site?

Yeah, but you'd have to stop by the house. :)

al4711’s picture

What do you mean by js? Javascript? And how do I find the answer to your question?

Yes I mean Javascript.
What I mean is: do you have .js files which Apache can deliver directly to the client, or is the Javascript embedded in every page?

The pictures are in the files directory, which is Drupal's default location. If you mean something else you're going to have to explain your question a little more

I publish a webcomic. ...

Most webservers are very good and fast at delivering files from the hard disk to the network ;-).
If it is possible to deliver content (pic, CSS, JS, HTML, ...) directly to the network without asking the backend, then that option should be used.

  1. nginx Nginx, Fastcgi, PHP, rewrite config for Drupal
  2. apache + mod_rewrite with -f/-d as in .htaccess which comes with drupal
  3. lighty + mod_rewrite

Let's say your pic is under ..../images/.../$PIC.png and a visitor makes a request to a Drupal site. Which way does the request for the pic go?

  1. apache (accepts the request) => asks Drupal where the pic is located => Drupal hands back the pic or its location
  2. apache (accepts the request) => sees in its config that the $PIC extension should be served from the hard disk => delivers it
  3. a way I don't know

Or, asked more simply: have you used the .htaccess ;-)?
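For what it's worth, the stock .htaccess that ships with Drupal already implements option 2: a file or directory that exists on disk is served by Apache directly, and only everything else is rewritten to index.php. The relevant lines look like this (paraphrased from Drupal 5's .htaccess -- check your own copy):

```apache
# Rewrite everything that is not an existing file (-f) or
# directory (-d) to Drupal's front controller; real files such
# as images, CSS, and JS are served straight from disk.
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
```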

What would I do with that information once I got it?

what is a slow-query log?

Well, MySQL has some optimization possibilities (see "Tuning Server Parameters" in the MySQL manual). Maybe someone from the hosting provider can help us/Drupal find the top 5 queries that hit MySQL, and/or the ones that take a very long time to answer -- maybe those queries could be optimized?
A slow-query log is written by MySQL, if it is configured; more in "The Slow Query Log" chapter of the manual ;-)
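For reference, enabling it is a couple of lines in my.cnf (the option names below are the MySQL 4.x/5.0 spellings current at the time; the log path is an example and must be writable by mysqld):

```ini
[mysqld]
# Log every query that takes longer than 2 seconds.
log-slow-queries = /var/log/mysql/mysql-slow.log
long_query_time  = 2
# Optionally also log queries that scan without using any index.
log-queries-not-using-indexes
```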

it was running php5

Well, is mysqli used or mysql?!
If you ask me how to find this out -- sorry, I'm new with Drupal; can anybody else answer this? ;-)

From what I have read in your posts, you use a lot of modules, don't you?
Can you please tell us which modules you use?

inforeto’s picture

Migrating a large static site means that many visitors will suddenly have access to all the increased functionality.
Such a large but static site fits nicely in a shared hosting account, but its dynamic equivalent won't.
Conversely, drupal can power many kinds of smaller sites with a small expenditure of resources.
There are middle grounds and plenty of suggested solutions, optimizations, etc., but making the jump still requires careful planning.

To find bottlenecks there's three main points to consider: queries, cache and resources.

Using devel you can find out what queries are being run, not only how many.
It won't matter if there are hundreds if most of them are calls for simple items like cached blocks and menu links.
But there's always a handful of slow queries, besides actual content work like views and categories.
Things like statistics and cache write to the database intensively and can take a toll on speed.

Now, these queries happen per page read, so tuning up the navigation is important.
An archive with a calendar can prompt more browsing than needed.
Galleries, forums, categories, and searchable content add functionality and visitors, but also put the database to work.
Many of these things are cached, but there can be so many items that the cache is outperformed.

The benefit of caching depends on the amount of content, as the cache expires every day.
The number of nodes that can be statically cached is often not significant against the many dynamic pages that are needed for navigation.
Dynamic listings, like archive pages, can make page reads build up quickly and probably won't benefit from any kind of caching either.
The nodes and pages with static content are fine, but caching them won't really help if other pages are slowing the site.
Ideally, sites should be optimized at this point, rather than relying on the measure of queries and cache alone.

This is also behind the reason why pages aren't cached for authenticated users.
As a middle ground, cached pages can still be shown to users if pulled from a multisite mirror.
But the real issues arise with content that can't really be cached, or isn't cached when viewed.
For example, I have a site with 500 nodes, but spread through 300 categories with sortable views.
Every category page has a pager with links to the first ten pages of results, 10 nodes listed per page, and filters by price, location, and brand.
This means there are multiple pages to be cached for the same set of nodes.

That makes the site collect entries in the cache table, and while those build up it behaves as if there were no cache at all:
it always crashes in the hours following a cache clear. In the next few hours it collects about 2000 saved entries.
With luck, before the peak hour it'll have 6000-8000 and no longer crash. By 9000-12000 it runs like a charm.
By the end of the day the cache table has 15000, but the cycle is bound to repeat, because the daily addition of new nodes changes all the pages --
most lists are sorted by date, and it would happen even if there were no expiration on cache entries.

While the cache is being built there's higher RAM usage than in the hours afterwards.
I hit the limit on a VPS, so the choice of shared or non-shared doesn't make the difference.
A shared account with large resources would be nothing more than the cost of management and resources put together.
Both VPS and shared accounts have pooled resources, like burstable RAM, but in practice this is never available.

Where do the resources go? I can't tell, because I haven't compiled my Apache server to report memory usage per request.
But estimates can be made. I see 10 httpd processes holding 24 to 32 MB RAM each, with more during peak hours.
Using "systat", the log shows that the active processes, most of them httpd, go up from the average of 10 to 20-30 during peak hours, before hitting the RAM limit of 512 MB and crashing.
As my visitors double during peak hours, I can see why resource usage doubles as well, but fine-tuning everything still takes some work.

More CPU supposedly makes these processes finish faster, releasing RAM that is currently held longer than necessary.
That's only available on a dedicated server. At that level there's more control over all resources, but it still takes some tuning to plan a site's scalability.
To avoid conflicts, it'd be good to have the database run with its own resources, on a smaller VPS or a similar separate account.
Perhaps also serving images through lighttpd or in a separate account, which can't be done in a shared environment.
In any case, running a drupal site under heavy traffic requires careful planning, but the features are worth it.

harriska2’s picture

I've been with a2hosting for almost 2 years. Over the past month they have gotten to where the server is down several times a day, including email. About a week ago, the mysql server literally lost my database (it was there but the tables were gone) for about 1/2 an hour. Scared me good.

Late last year they said I was causing a problem with their servers using up too much resources. They asked me to state what my websites were used for. I ended up upgrading immediately to patch some security issues that had recently come out. Since then I've been tracking their CPU usage which is always in the red and constantly hovering now around 6 or 7 (2 is green).

One of my sites uses a ton of memory. It has OG, views, and CCK. This site requires over 30MB of memory but there really isn't a ton of content on it.

I agree that the MySQL database connections and bandwidth are not trackable on a2hosting. But for $12 a month, hosting 20 databases and 10 virtual domains, what can I do?

magico’s picture

(subscribing)

AmyStephen’s picture

Database Activity: the 40-ton feather that broke the camel's back

We found the same to be true. When HarryB built Open Source Community for us, we were so thrilled with what Drupal was able to do! It was our first Drupal site and we were so impressed. (Of course, HarryB is nothing short of brilliant and has *years* of experience with PHP-Nuke, PostNuke, Joomla!, and WP, so that helped!)

But, before going live, with only a few of us online, we noticed serious performance issues. Long story short, we were forced to find dedicated hosting or move away from Drupal. The number of database calls, and the latency from querying a remote database (i.e., using Drupal in the kind of setup most inexpensive hosts provide), is a back breaker.

Other than that, Drupal rocks and we love it! Thanks to all who have contributed to its success!

AmyStephen@gmail.com
http://OpenSourceCommunity.org

bs’s picture

Hi,
This is a very important problem we (I mean Drupalers) are facing. Wouldn't it be useful if we rewrote some functions like variable_get, watchdog, accesslog, etc., so they could use other storage methods instead of the database -- for example flat files or XML -- to reduce database load?

ubersoft’s picture

for those of you who are looking for more specific information...

I've set up the drupalized version of the site here: http://208.75.86.76

I have also set up the Devel module so that anonymous viewers can see the statistics in the footer. Have at it -- any and all observations welcome.

eaton’s picture

It's very interesting that the number of queries being generated *isn't* hellaciously large by the standards of many content-intensive sites. A couple of tweaks to optimize the path alias loading could probably get it under 100 queries for the page, and aggressive caching would drop that to just one or two queries for anonymous visitors.

I'm wondering if there were/are subtler issues at play, too, when higher traffic/load hits the site...

In any case, first I'm sorry that your initial experience has been so troublesome, but second I want to thank you for the tremendous service you're giving to the community in troubleshooting and brainstorming on a mid-traffic site. This is just the sort of environment that is trickiest to optimize due to the constraints of shared hosting setups and we can all use all the knowledge we can get!

--
Lullabot! | Eaton's blog | VotingAPI discussion


ubersoft’s picture

This has given me something to chew on.

For the purposes of the devel module I turned off Drupal's caching altogether, since I figured caching would make devel's statistics useless to any of you who came by.

How does one go about optimizing path alias loading? I'm afraid I wasn't even aware that it was an issue.

That said, it's sort of a relief to hear that compared to other content-intensive sites it's not too bad. Unfortunately that suggests that the best solution at the moment is more hardware. The place I've linked to in the above post is on slicehost.com, which is a virtual hosting service, so it'll probably handle the load better than a2hosting did, but if it turns out to be not enough I can't afford anything better at the moment...

If aggressive caching can really cut down on all those queries in that manner (from 100 to 2) then I'll definitely be turning it on when the site goes live again. But I have to ask again why there is so much resistance in the drupal development community to having cached pages for users who are logged in as well. My site isn't set up to allow my registered users to customize the site layout -- at the moment it just lets them edit their own posts and use the views bookmarks feature, though in the future it'll also give them the ability to create metatags of their own (honestly, Drupal's taxonomy system alone makes it worth the trouble I've been having so far). It seems to me that if an anonymous user is going to generate two database calls on the front page and a registered user is going to generate 80-140, then I'm going to want to discourage registered users instead of encouraging them. This seems counter-productive...

If my experiences can help with Drupal's development and improvement then I'm happy to have them, at least on one level... I'm not a programmer so I can't actually code anything useful for you guys but I'm pretty good at describing how things blow up when I touch them...

sepeck’s picture

But I have to ask again why there is so much resistance in the drupal development community to have cached pages for users who are logged in as well?

What resistance? You, or anyone who thinks it's easy, is welcome to implement a contrib module that does this. Just because a random forum poster claims something is possible, or merely asks 'why isn't this done?' a few posts up, doesn't make it easy or make the associated cost any less.

People who say those things don't really understand the issues involved. Your comment alone indicates you are only thinking of your own site's setup. Setting up caching for logged-in users involves building an engine that is scalable across all the variations of user roles. Why cache an individual's personal tracker for all 100,000+ users of drupal.org? How do you deal with queries in pages that check for roles?

Drupal 5 has a default cache setup that is pretty good. It has a pluggable caching architecture into which you can plug different cache methodologies. People are perfectly welcome to build their own custom cache methods that leverage these APIs. Of course, many who say these things only confuse the issue and demonstrate that they do not understand the complexities involved. For something to go into core, it has to be a flexible base API. The entire community would welcome a contributed module or working examples to actually judge performance against.
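As a concrete illustration of that pluggable architecture in Drupal 5, settings.php can point core at a replacement cache implementation; the memcache path below is just an example of where a contributed backend might live:

```php
<?php
// In settings.php (Drupal 5): replace the database-backed cache
// with a contributed implementation. The path is an assumption --
// it depends on where the replacement module is installed.
$conf = array(
  'cache_inc' => './sites/all/modules/memcache/memcache.inc',
);
```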

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain

ubersoft’s picture

Your comment alone indicates you are only thinking from your site's setup.

Guilty as charged. My interest in Drupal focuses on the things I want to do with it. I'm not familiar with any more effective way to evaluate a tool than to judge it by what you want to accomplish with it. And I'm not particularly shy about advocating that it move more in my direction, either.

Take a step back from your familiarity with the Drupal community, do a search on caching threads in the forums -- focus on the newbie forum -- and then see if you can't see this "mythical" resistance I'm talking about.

Anyway, since we all know I'm NOT a developer, why don't you just follow up with a list of questions I have no business asking until I become some kind of fucking php guru and I'll make sure I don't ask them in the future.

sepeck’s picture

What I said in no way called for that level of hostility in your reply. You are free to ask any question you like, but your comment about a level of resistance to caching for logged-in users wasn't accurate, and I was trying to explain the complexities of the issue so you would have a better understanding of them in the future.

When implementing a feature in core Drupal, you have to think in broad terms and use cases, not narrow, specific ones. As Eaton and I have both commented, just because someone thinks there is a magic wand to wave and make it so doesn't mean one really exists. It's complicated. It's hard. It's not easy. NO ONE IS HIDING IT. It's just not as simple as some people would think, want, or believe. If it were simple, it would have been implemented already.

Evidently my attempt to help has irritated you enough to swear at me. I may not know PHP, but I do know the processes in this community, and I try to help people learn them so they can more effectively enact change.

As you have now demonstrated your contempt for my attempts to help you in no uncertain terms, I shall withdraw from commenting in your threads or trying to help you in the future.

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain

ubersoft’s picture

I certainly interpreted your post as an attack... reading over it now I'm not as convinced it was. If no offense was meant, then I apologize. If you feel it necessary to blacklist me, I suppose that's fair.

patrickharris’s picture

He caused your problem in the first place by promoting your site to Drupal's front page. :)

It's sad but true that forums are a limited means of communication; interpretive misunderstandings are just so easy - even between two literate people like yourself and Sepeck.

merlinofchaos’s picture

Knowing sepeck as well as I do, I assure you there was no intended hostility in his comment. He was attempting to correct what he saw as a misperception, and nothing more. You are in no danger whatsoever of being banned, and he wasn't intending to deride you in any way.

-- Merlin

[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]

zoon_unit’s picture

As someone who has come under Steven's wrath, I can say that Steven has a way of really raising people's hackles. It's just the way he words things sometimes that comes across as harsh and condescending. I'm equally convinced that it is not his intention to do so, and that from his viewpoint he's just trying to help.

I too suffer from this malady. I would only ask Steven to think a bit more before putting mouth into gear. Doing so has helped me "temper" my public persona somewhat. I'm still working at it...

Sincerely, best wishes.

ubersoft’s picture

It seems the more time I spend online, the more I tend to assume someone is being hostile when they aren't. Enough people have pointed out (publicly and privately) that you were trying to be helpful, not hostile, that it's clear I was assuming something that wasn't there. So I owe you an apology -- whether you read it or not, it's still owed, so here it is:

Steven, I apologize for being rude earlier. Nobody likes it when a stranger comes into your backyard and pisses on your porch, and that's essentially what I was doing. You were trying to help and wound up getting bit for your trouble. In the future, I hope, I will exercise more self-control.

sepeck’s picture

Accepted. These things happen, and when they do I try to figure out better ways to phrase things in the future. I regard forum threads as a protracted conversation and forget that others don't always parse the threads as I do. This can result in phrasing on my part that is not as clear as I'd like it to be.

--

What I try and do in the community is facilitate effective contribution.

The reason I promoted the thread originally was to get the performance discussion going. Not for the high end folks but the small and mid tier sites.

You had an excellent write-up. Many people know about these issues, but that hasn't translated into much documentation, or into broader knowledge among the part-time and smaller-site community of implementors.

With Drupal you now have to be aware of at least MySQL and Apache configuration. It would also be good to know about OS configuration, hard drive partitioning, PHP accelerators, and where the database sits in relation to the web server. None of it is hard to learn, but it flat out takes time, and you have to know how to research the MySQL docs, the Apache docs, your OS docs... It's up there with the hardest type of documentation to write.

I'd like to see more good case studies develop for people to learn from.

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain

eaton’s picture

...to sepeck's comments, I'll also chime in and say that Dries, too, has said that we need to focus on speeding things up for logged-in users as well as anonymous users (doubtless caching is a part of that). It's not that there is resistance to the concept of caching logged-in users' data; it's just that there hasn't yet been a clear case of how to make a simple and generalized logged-in-user page-caching mechanism that works effectively. Especially not in a way that is applicable in a "flip the switch" sort of way.

Things like block cache, and perhaps ways to cache the main content of a page, are some potential solutions. I think we're all pondering them right now :D

--
Lullabot! | Eaton's blog | VotingAPI discussion

firebus’s picture

That's just it! Page caching is not the answer at all. You need a method that caches bits: cache the blocks, cache the page content, figure out the tabs (can the user edit?) dynamically.

It's not that there's resistance per se on the part of the devs to caching for logged-in users; it's that they've dug themselves into a hole with page caching. There's NO WAY to get caching for logged-in users with page caching. You need a cache system that's smart enough to cache the bits.

Drupal also DESPERATELY NEEDS a way to cache to static files instead of the DB. DB caching provides an incremental improvement; static files would provide an exponential improvement, and would benefit small fry and large installs alike.
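The "cache the bits" idea can be sketched in miniature. This Python toy (names invented; Drupal's real block caching works in PHP against the database) shows why fragment caching helps logged-in users where whole-page caching cannot: the expensive shared fragment is rendered once and reused across users, while only the genuinely per-user bits are rebuilt per request:

```python
fragment_cache = {}
render_count = {'sidebar': 0}

def render_sidebar():
    # An expensive fragment that is identical for every user.
    render_count['sidebar'] += 1
    return '<div id="sidebar">recent posts...</div>'

def get_fragment(key, builder):
    # Render on first request, serve from cache afterwards.
    if key not in fragment_cache:
        fragment_cache[key] = builder()
    return fragment_cache[key]

def build_page(username):
    sidebar = get_fragment('sidebar', render_sidebar)      # shared, cached
    tabs = '<a href="/edit">edit</a>' if username else ''  # per-user, rebuilt
    name = username or 'guest'
    return '<p>Hello ' + name + '</p>' + tabs + sidebar

build_page('alice')
build_page('bob')
# Two different logged-in users, but the sidebar was rendered only once --
# something a whole-page cache keyed on the full output could never share.
print(render_count['sidebar'])  # 1
```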

zoon_unit’s picture

hasn't yet been a clear case of how to make a simple and generalized logged-in-user page-caching mechanism that works effectively

By requiring a solution to be simple and generalized, the Drupal community is severely limiting the possibilities for speedup. There is another approach: let the developer choose which pages and queries to cache aggressively. That way, the developer decides when aggressive caching will work in his particular situation, rather than the software having to determine it.

Add a flag to each node, block, and view that lets the developer tag it for caching. Let the "wetware" figure out when it's feasible. Then the developer can set a caching refresh rate based on time, or on some other criterion such as revisions, added comments, etc.

Since caching matters most for highly trafficked sites, there is a certain correlation with developer skill level: sites with high traffic are usually run by developers with a higher degree of knowledge, so empower them to find ways to increase performance based on their skill sets.

dman’s picture

It's true that when it comes to 'optimizing' a site that's reached these levels, there will be no one-fix-for-all solution.
MySQL recognizes that (with the my-huge and my-small config samples). Drupal acknowledges it with the throttle mechanism.
But if the option were there to say 'cache this and not that (much)', it would certainly be feasible to get a few wins for the big players,
and possibly some out-of-the-box recipes for folk who are struggling on a shared host, too.
So maybe the solution isn't a top-down reconstruction of the cache system, but a package of specific hacks :)

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

styro’s picture

My site isn't set up to allow my registered users to customize the site layout -- at the moment it just lets them edit their own posts and use the views bookmarks feature, though in the future it'll also give them the ability to create metatags of their own

Then there would be enough user-specific content (e.g. each user's links would be different) to make caching pages for logged-in users not very useful, even before considering differing access control levels. I imagine that most times a user reloads a path they've already visited, it's to see new content or after they've made a change of some sort (editing, tagging, bookmarking, etc.), and they would want to see the updated version anyway.

Caching one page that is identical for all anonymous visitors makes sense, whereas caching per-user variations of a page for each logged-in user, variations that might not get loaded again before the page goes stale, isn't as effective.

Creating a smarter, more fine-grained caching system (e.g. specific object caching) is a much harder problem, since the level of uniqueness varies between different objects in unpredictable ways, but there are people who have been working on it. Hopefully there will be solutions in the future.

--
Anton
New to Drupal? | Troubleshooting FAQ
Example knowledge base built with Drupal

al4711’s picture

What do your

http://208.75.86.76/admin/logs/status/sql

and

http://208.75.86.76/admin/logs/status/php

pages show? Also: under Administer > Site configuration > File system, is your download method set to public or private?

ubersoft’s picture

First of all, I'm using the public download method. When I was trying to figure out which one to use, the general consensus seemed to be that the private download method was a lot more resource-intensive.

Here's the mysql information. Copying from a table rarely works well... I've put it in code tags in the hope it makes it a little easier to read.

SQL
Command counters
Variable	Value	Description
Com_select	2	The number of SELECT-statements.
Com_insert	1	The number of INSERT-statements.
Com_update	0	The number of UPDATE-statements.
Com_delete	2	The number of DELETE-statements.
Com_lock_tables	1	The number of table locks.
Com_unlock_tables	1	The number of table unlocks.
Query performance
Variable	Value	Description
Select_full_join	0	The number of joins without an index; should be zero.
Select_range_check	0	The number of joins without an index; should be zero.
Sort_scan	0	The number of sorts done without using an index; should be zero.
Table_locks_immediate	10906	The number of times a lock could be acquired immediately.
Table_locks_waited	31	The number of times the server had to wait for a lock.
Query cache information

The MySQL query cache can improve performance of your site by storing the result of queries. Then, if an identical query is received later, the MySQL server retrieves the result from the query cache rather than parsing and executing the statement again.
Variable	Value	Description
Qcache_queries_in_cache	3015	The number of queries in the query cache.
Qcache_hits	19872	The number of times that MySQL found previous results in the cache.
Qcache_inserts	6546	The number of times that MySQL added a query to the cache (misses).
Qcache_lowmem_prunes	0	The number of times that MySQL had to remove queries from the cache because it ran out of memory. Ideally should be zero.

Here's the php information. Same deal...

System 	Linux ubersoft 2.6.16.29-xen #3 SMP Sun Oct 15 13:15:34 BST 2006 x86_64
Build Date 	Feb 20 2007 19:57:02
Server API 	Apache 2.0 Handler
Virtual Directory Support 	disabled
Configuration File (php.ini) Path 	/etc/php5/apache2/php.ini
Scan this dir for additional .ini files 	/etc/php5/apache2/conf.d
additional .ini files parsed 	/etc/php5/apache2/conf.d/gd.ini, /etc/php5/apache2/conf.d/mysql.ini, /etc/php5/apache2/conf.d/mysqli.ini, /etc/php5/apache2/conf.d/pdo.ini, /etc/php5/apache2/conf.d/pdo_mysql.ini
PHP API 	20041225
PHP Extension 	20060613
Zend Extension 	220060519
Debug Build 	no
Thread Safety 	disabled
Zend Memory Manager 	enabled
IPv6 Support 	enabled
Registered PHP Streams 	zip, php, file, data, http, ftp, compress.bzip2, compress.zlib, https, ftps
Registered Stream Socket Transports 	tcp, udp, unix, udg, ssl, sslv3, sslv2, tls
Registered Stream Filters 	string.rot13, string.toupper, string.tolower, string.strip_tags, convert.*, consumed, convert.iconv.*, bzip2.*, zlib.*

Directive	Local Value	Master Value
allow_call_time_pass_reference	On	On
allow_url_fopen	On	On
allow_url_include	Off	Off
always_populate_raw_post_data	Off	Off
arg_separator.input	&	&
arg_separator.output	&	&
asp_tags	Off	Off
auto_append_file	no value	no value
auto_globals_jit	On	On
auto_prepend_file	no value	no value
browscap	no value	no value
default_charset	no value	no value
default_mimetype	text/html	text/html
define_syslog_variables	Off	Off
disable_classes	no value	no value
disable_functions	no value	no value
display_errors	On	On
display_startup_errors	Off	Off
doc_root	no value	no value
docref_ext	no value	no value
docref_root	no value	no value
enable_dl	On	On
error_append_string	no value	no value
error_log	no value	no value
error_prepend_string	no value	no value
error_reporting	6135	6135
expose_php	On	On
extension_dir	/usr/lib/php5/20060613	/usr/lib/php5/20060613
file_uploads	On	On
highlight.bg	#FFFFFF	#FFFFFF
highlight.comment	#FF8000	#FF8000
highlight.default	#0000BB	#0000BB
highlight.html	#000000	#000000
highlight.keyword	#007700	#007700
highlight.string	#DD0000	#DD0000
html_errors	On	On
ignore_repeated_errors	Off	Off
ignore_repeated_source	Off	Off
ignore_user_abort	Off	Off
implicit_flush	Off	Off
include_path	.:/usr/share/php:/usr/share/pear	.:/usr/share/php:/usr/share/pear
log_errors	Off	Off
log_errors_max_len	1024	1024
magic_quotes_gpc	Off	On
magic_quotes_runtime	Off	Off
magic_quotes_sybase	Off	Off
mail.force_extra_parameters	no value	no value
max_execution_time	30	30
max_input_time	60	60
memory_limit	32M	8M
open_basedir	no value	no value
output_buffering	no value	no value
output_handler	no value	no value
post_max_size	8M	8M
precision	12	12
realpath_cache_size	16K	16K
realpath_cache_ttl	120	120
register_argc_argv	On	On
register_globals	Off	Off
register_long_arrays	On	On
report_memleaks	On	On
report_zend_debug	On	On
safe_mode	Off	Off
safe_mode_exec_dir	no value	no value
safe_mode_gid	Off	Off
safe_mode_include_dir	no value	no value
sendmail_from	no value	no value
sendmail_path	/usr/sbin/sendmail -t -i 	/usr/sbin/sendmail -t -i 
serialize_precision	100	100
short_open_tag	On	On
SMTP	localhost	localhost
smtp_port	25	25
sql.safe_mode	Off	Off
track_errors	Off	Off
unserialize_callback_func	no value	no value
upload_max_filesize	8M	8M
upload_tmp_dir	no value	no value
user_dir	no value	no value
variables_order	EGPCS	EGPCS
xmlrpc_error_number	0	0
xmlrpc_errors	Off	Off
y2k_compliance	On	On
zend.ze1_compatibility_mode	Off

al4711’s picture

Hi,

on your site the functions views_build_view AND _imagefield_file_load are unknown in my Drupal 5.1 tree. Can you tell me which modules provide these functions?

Which MySQL version do you use, and do you use the php mysql or mysqli extension? You can see this in sites/default/settings.php, AFAIK -- I hope that's right.

Table_locks_immediate and Table_locks_waited look very high in my opinion. Is this normal for Drupal? What do the developers say?

Please, if you have the rights, can you do the following:

1.) add LogFormat "%h %l %u %t \"%r\" %>s %b time_taken %D child_pid %P conn_stat %X" timing
2.) add CustomLog PATH_TO_A_LOGFILE timing

This adds the ability to see in the Apache log which requests take the most time, via sort -n -k 12 PATH_TO_A_LOGFILE (the field number, 12, may need adjusting -- I have no Apache here to check against), as well as whether the connection was aborted or not.
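To illustrate what that log then gives you, here is a sketch only -- the log lines below are invented, but follow the LogFormat above, where the %D microsecond timing is the 12th whitespace-separated field:

```python
# Invented sample lines in the "timing" format defined above:
# %h %l %u %t "%r" %>s %b time_taken %D child_pid %P conn_stat %X
sample_log = [
    '1.2.3.4 - - [10/Apr/2007:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 time_taken 843201 child_pid 1234 conn_stat +',
    '1.2.3.4 - - [10/Apr/2007:10:00:01 +0000] "GET /misc/feed.png HTTP/1.1" 200 480 time_taken 912 child_pid 1234 conn_stat +',
    '5.6.7.8 - - [10/Apr/2007:10:00:02 +0000] "GET /node/6 HTTP/1.1" 200 7300 time_taken 1250077 child_pid 1235 conn_stat -',
]

def slowest(lines, n=2):
    """Return the n slowest requests as (microseconds, request) pairs."""
    timed = []
    for line in lines:
        fields = line.split()
        micros = int(fields[11])         # the %D value (field 12, 1-based)
        request = ' '.join(fields[5:8])  # the quoted "%r" part
        timed.append((micros, request))
    timed.sort(reverse=True)
    return timed[:n]

for micros, request in slowest(sample_log):
    print('%.2fs  %s' % (micros / 1e6, request))
```

This is the same ranking `sort -n -k 12` produces, just with the times converted to seconds.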

Which MPM does your Apache use?
Is MaxRequestsPerChild set to 0?

I made a small, quick benchmark with httperf and saw that on a very small site (1 blog, 1 story, 1 book), on an Intel(R) Pentium(R) M processor 1.73GHz behind nginx, I can get:

Request rate: 54.3 req/s (18.4 ms/req)
Session rate [sess/s]: min 0.00 avg 1.23 max 8.40 stddev 2.97 (50/50)

I don't know if this is fast or not, but I know the site is very untypical ;-)

The test was run with this command:

httperf --hog --server 192.168.1.20 --wsesslog=50,5,httperf_sess.txt --rate 10 --timeout 5

httperf_sess.txt:

/
  /misc/favicon.ico
  /modules/book/book.css
  /modules/forum/forum.css
  /modules/node/node.css
  /modules/system/defaults.css
  /modules/system/system.css
  /modules/user/user.css
  /themes/garland/style.css
  /themes/garland/print.css
  /themes/garland/logo.png
  /misc/feed.png
  /themes/garland/images/bg-navigation.png
  /themes/garland/images/body.png
  /themes/garland/images/menu-leaf.gif
  /themes/garland/images/bg-content.png
  /themes/garland/images/bg-content-right.png
  /themes/garland/images/bg-content-left.png
/node/6
/node/1
/blog/1
/node
/forum
/forum/2
  /misc/arrow-asc.png
  /misc/forum-default.png
/node/4
  /misc/favicon.ico
  /modules/book/book.css
  /modules/forum/forum.css
  /modules/node/node.css
  /modules/system/defaults.css
  /modules/system/system.css
  /modules/user/user.css
  /themes/garland/style.css
  /themes/garland/print.css
  /themes/garland/logo.png
  /misc/feed.png
  /themes/garland/images/bg-navigation.png
  /themes/garland/images/body.png
  /themes/garland/images/menu-leaf.gif
  /themes/garland/images/bg-content.png
  /themes/garland/images/bg-content-right.png
  /themes/garland/images/bg-content-left.png

Well, from my point of view, Drupal should look into db_query* to try to use prepared statements, and avoid SELECT * FROM where it is not necessary:

For example, modules/system/system.module:system_region_list() selects everything, but only filename AND description are used.
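To make the SELECT * point concrete, here is a small self-contained illustration using Python's sqlite3 (an invented table, loosely modeled on the example above; Drupal itself runs on MySQL via PHP). Dragging a large unused column through every query inflates the result set for no benefit:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE system (filename TEXT, name TEXT, '
             'description TEXT, info BLOB)')
conn.execute('INSERT INTO system VALUES (?, ?, ?, ?)',
             ('modules/system/system.module', 'system',
              'Handles general site configuration.', b'x' * 10000))

# SELECT * hauls the 10 KB blob along even though the caller
# only needs two small text columns...
row_star = conn.execute('SELECT * FROM system').fetchone()

# ...while naming just those columns keeps the result set small.
row_slim = conn.execute('SELECT filename, description FROM system').fetchone()

print(len(row_star), len(row_slim))  # 4 2
print(row_slim[0])                   # modules/system/system.module
```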

What do the DB maintainers say to this?

What you can do to help is:

1.) find out where views_build_view and _imagefield_file_load come from, i.e. which modules provide them
2.) If you can connect to the DB, type in EXPLAIN SELECT node.nid, node.created AS node_created_created FROM node node LEFT JOIN ... (the whole line is at the bottom of your site ;-)) and please give us the output.

dman’s picture

  • Table_locks_immediate is supposed to be high. The higher it is (vs. Table_locks_waited) the better.
  • For database independence, no 'PREPARE' is used in Drupal (I believe).
  • db_query() is just the database API call that everything goes through. If you want to analyse the SQL, you need to do so several functions above it, i.e., throughout the entire Drupal code.
  • Almost every module will include some of its own data tables. Unsurprisingly, views_build_view is a function within views.module, and _imagefield_file_load is from imagefield (part of CCK). You can tell by the first section of the function name. Always.
  • (Regarding your earlier post:) Drupal does not serve files via database lookups unless downloads are set to 'private'. You will see that requests for the embedded images go to /files/images/etc.jpg or wherever. These requests are serviced directly by Apache and do not result in Drupal load.

I understand that you have some suggestions that may be useful, and a fresh look at the DB from someone outside the Drupal mindset will probably be helpful, but you'll have to get a little more familiar with how things work (what the queries actually do internally in Drupal) before we can make any progress in this direction.

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

al4711’s picture

table_locks*

Hmm, from my point of view a table lock should be avoided as much as possible.
http://dev.mysql.com/doc/refman/4.1/en/lock-tables.html has a nice description of how table locks can be avoided and when they should be used.
Btw.: from my point of view there are at least 3 main points in Drupal which could bring performance gains:

  1. DB pooling, to avoid the (connect, query, quit) cycle
  2. If possible, a prepared-statement pool
  3. A file cache, with Smarty or Cache_Lite or $CACHE-ENGINE

But I'm sure that a lot more people have thought about this issue ;-), jm2c.

prepare

Well, in the driver-specific DB include the PREPARE could be used, because that is the module for the per-driver specifics, as far as I have seen ;-)
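For readers unfamiliar with the term: a prepared (parameterized) statement separates the SQL text, which the engine can parse once and reuse, from the per-execution values. A minimal illustration with Python's sqlite3 (invented table; as dman notes above, Drupal 5's db_query() deliberately avoids PREPARE for database independence):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE node (nid INTEGER, title TEXT)')
conn.executemany('INSERT INTO node VALUES (?, ?)',
                 [(1, 'first post'), (2, 'second post')])

# The '?' placeholders are bound at execution time. The statement text
# never changes between calls, so its parsed form can be reused -- and the
# values are never spliced into the SQL string (no injection risk either).
stmt = 'SELECT title FROM node WHERE nid = ?'
for nid in (1, 2):
    print(conn.execute(stmt, (nid,)).fetchone()[0])
# prints 'first post' then 'second post'
```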

view & imagefield

Sorry, you are right. I think I should take a little more time with Drupal so that these basic questions are avoided.

file delivery

Yep in the meantime I have seen this, thanks.

Thanks for the corrections; I hope the OP's issue will be solved. Good luck ;-).
Nevertheless I will keep reading this thread, as I am very curious about the solution.

dman’s picture

I believe table locks are only used -- intentionally -- where there is a potential race condition and updates/writes are being made to the database. This is their intended use, and it should not affect read performance.

The statistics

Table_locks_immediate 10906 The number of times a lock could be acquired immediately.
Table_locks_waited 31 The number of times the server had to wait for a lock.

describe that in only 31 cases out of almost 11,000 did the use of table locks adversely affect performance, and then only slightly. The other 10,906 were successes, and the high number there is good.

On the other hand,

Qcache_hits 19872 The number of times that MySQL found previous results in the cache.
Qcache_inserts 6546 The number of times that MySQL added a query to the cache (misses).

looks like a very low hit/miss ratio of 3:1.
However, that may just be a factor of how long it has been since the last reset.
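Both back-of-the-envelope ratios can be checked directly against the figures quoted above:

```python
# Table locks: waits as a fraction of all lock acquisitions.
immediate, waited = 10906, 31
print('%.2f%% of locks had to wait' % (100 * waited / (immediate + waited)))
# -> 0.28% of locks had to wait

# Query cache: each insert is a miss, so hits/inserts is the hit/miss ratio.
hits, inserts = 19872, 6546
print('hit/miss ratio %.1f:1, hit rate %.0f%%'
      % (hits / inserts, 100 * hits / (hits + inserts)))
# -> hit/miss ratio 3.0:1, hit rate 75%
```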

The best-case scenario is that the MySQL cache isn't even being hit, because the Drupal cache is answering requests without touching it!

Anyway, no matter what some of my clients seem to believe, I'm not claiming to be a SQL-optimization expert, and some work, possibly on PREPARE-ing lookups, may indeed be very productive. That's not my bag, but I appreciate these methods must exist for a reason.

I too have been asked some serious questions about Drupal and scalability, and I'd like to feel that it was less dangerous than it seems to be.
As mentioned in this thread, progress is being made towards partial-content (i.e. block-level) caching, and that's gotta be good.

Initiatives like the memcache/PECL integration are probably the ones to watch. I love the idea, but am aware that tuning at this level may become a little unstable :-)

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

apage’s picture

I'm not sure exactly what your demands are, or how loyal you are to your hosting company, but I would like to recommend mine. I just switched to them this year and they are outstanding. A lot of useful features, but here is the best part.

The basic hosting plan includes
1.614 TB of monthly transfer
161.4 GB of storage space
2 GB max file size
Hosting Of Unlimited Domains
And of course the usual, PHP, MySql support, and many other features.

This looks to be quite a bit more than what you are used to. The plan, I think is about $130 dollars per year if you pay for a year at a time. I wouldn't know though, because there are a lot of coupon codes in the wiki and I only paid about $28 dollars for my first year. I will gladly pay the full price next year.

Be sure to check them out; it might be a way to get your Drupal on without crashing any servers or paying bandwidth overage fees.

Arthur Malcolm Page
www.synapsemultimedia.org

aren cambre’s picture

There are many complaints of Dreamhost being slow and overloaded.

dwees’s picture

My friend moved his site off of a Dreamhost server and onto a virtual host, because waiting for his virtually trafficless website to load pages made him want to claw his eyes out.

Now pages load in 1/5 the time on the new server.

Dave

aren cambre’s picture

Did he use Dreamhost VPS or some other service?

dwees’s picture

He switched to a different company entirely. Dreamhost wasn't especially helpful when my friend was trying to debug his page loads.

I have 3 test sites running different versions of Drupal on a shared hosting account and no difficulties at all with speed -- no complaints from my host, either. However, I have almost no traffic to most of those sites.

My friend had almost no traffic either, but that was because the pages took frikken forever to load. Then Dreamhost 'upgraded' to PHP 5.2 and his site stopped working completely, until we forced it to run on PHP 4. Apparently his single site, running on PHP 5.2, went over the 100MB limit, and without any warning he was given the blank white screens of death.

Dave

zoon_unit’s picture

I signed up with Dreamhost and will be leaving soon. While they are good at setting up a generic environment, their MySQL performance is abysmal; my sites sometimes take 30 seconds to render the front page!

Meanwhile, one of my customers has moved to Site5, and a webpage that rendered in 5-30 seconds on Dreamhost now renders in 2-4 seconds on Site5!

al4711’s picture

This post was meant to be the 'Cool, thanks ;-)'.
Sorry for the mistake.

hrpr.com’s picture

Dreamhost is not the place for a Drupal site. I had one there... response time was terrible, mainly due to the load Drupal put on the database. Dreamhost runs MySQL on separate servers, so every transaction goes over the LAN, making matters that much worse.

But, I have Wordpress and Joomla! sites there that work just fine....

I've pretty much convinced myself that Drupal needs a dedicated host...

HarryB
http://wwww.opensourcecommunity.org (Drupal @ dedicated server)
http://www.hrpr.com (Joomla! @ Dreamhost)
http://www.halfvast.com (Wordpress @ Dreamhost)

liquidcms’s picture

I run a small Drupal design biz, and I am putting together a pretty large site right now which I demo to my client on my DreamHost shared hosting account.

The DH PROS:

- all the stuff we know: tons of space, unlimited domains, ridiculously low price
- and the special things I need: ffmpeg, since my site does video conversion a la "youtube"; an svn server for every site I design; shell access; etc.

and the CONS:

- ridiculously bad performance and reliability: Drupal pages sometimes won't load, or take up to a minute; my mail accounts fail regularly; most of my sites are pretty much unusable.

I have tried other shared hosting, but none have the features I need (svn, ffmpeg)... and I lose sleep and hair every day I am with DH.

Hopefully from this thread I will pick up a few clues as to where to go for a hosting replacement -- I can easily afford $50-100/month if I can get the features I get with DH... but also with pages that actually load.

=============

On another note (i.e., the one this thread is about)... as much as I swear by the wonders of Drupal's API -- and I'm in a PHP debugger 10 hours a day writing custom modules -- I must agree that the excellent extensible architecture does seem to come at the cost of being very inefficient. So which is better: the development cost of writing efficient functionality from scratch, or simply getting more horsepower to handle Drupal's inefficiencies? I'd still have to pick the latter.

Peter Lindstrom
LiquidCMS - Content Management Solution Experts

vm’s picture

Consider Site5; ffmpeg and shell access only have to be asked for to be assigned to your account.

michelle’s picture

I'm on Site5 and my site is down a lot, and I frequently run into memory issues (AKA WSOD). I don't know that I can fault them, as they probably aren't any better or worse than any other shared host. Their tech support is pretty good, so there's that. But moving to Site5 won't cure the previous commenter's reliability problems.

Michelle

--------------------------------------
My site: http://shellmultimedia.com

zoon_unit’s picture

I've successfully migrated a Drupal 5 site to Site5 with no problems whatsoever, but as you probably know, Drupal 5 is much less memory-intensive than 4.7, hence fewer WSODs.

vm’s picture

I've encountered zero problems on Site5 as well, and find the maro server, at least, very stable. I'm a little concerned that S5 is handing over ownership of their servers to Rackspace, but I can't complain at all about my server being down.

Coupon Code Swap’s picture

I have a Site5 account. I recommend checking the uptime for the server they put you on while you are still within the 60-day trial period. I've been stuck on a lousy server for months now, and they refuse to move the account to another server. Hopefully they have finally resolved the problems. Overall I'm happy with the service, but I've experienced a lot of downtime and aggravation. It is a plus that they even publish server uptime; most shared hosting providers do not.

http://www.site5.com/support/uptime.php

------------------------------------
ThemeBot - Find and Share Web Design Templates

zoon_unit’s picture

Rackspace has an incredible reputation for having the best tech support in the business. I believe they have the expertise to build some servers optimized for Drupal. And you have the benefit of an upgrade path to a dedicated server if needed.

Coupon Code Swap’s picture

They are going to The Planet not Rackspace:

http://www.theplanet.com/

------------------------------------
ThemeBot - Find and Share Web Design Templates

vm’s picture

Yeah, my bad -- I had Rackspace on the brain. I meant The Planet.

I'm only worried to the degree that it's a big jump, and I've been through this type of thing before with Powweb, before it sold its servers to Endurance. I'm leery of the "nothing will change" message; I've heard that stated before, and in the end most things wound up changing with no avenue of recourse.

Trying not to let one bad experience with this type of situation bias me : )

michelle’s picture

I've got one site still on 4.7, but the rest are on 5. How memory-intensive it is depends on how many modules you have installed, and I've got quite a few, so I hit the WSOD a lot. That's not really Site5's fault; it's a limitation of shared hosting in general.

The frequent downtime, though, is aggravating. Especially since it seems to go down during my son's naps when I actually have time to work on the site. Sigh.

I've heard they have better servers now, but they won't move you unless you upgrade your account.

Michelle

--------------------------------------
My site: http://shellmultimedia.com

Coupon Code Swap’s picture

This is an excellent thread and directly addresses one of the major concerns a web developer is likely to have when building a site using Drupal. One of the sites I'm working on will eventually outgrow shared hosting but I can't afford a dedicated server at this time.

I started a discussion for hosting busy Drupal sites without a dedicated server:

http://drupal.org/node/138425

I'm hoping some developers who have accomplished this will read the thread and contribute a few benchmarks. It would be useful for Drupal users to have this info.

------------------------------------
ThemeBot - Find and Share Web Design Templates

quasimodal’s picture

From the "make-drupal-scale" post ...

I do a lot of profiling and debugging of C programs, and my first instinct is always to try and reproduce problems locally before reproducing them in production, a strategy that works for programs in any language. Why?

1. They are easier to measure locally

2. They are easier to fix locally.

3. You aren't affecting users

4. You aren't being affected by users.

5. You can detect environmental problems, as environmental problems usually won't appear locally.

This is an excellent idea. I use either XAMPP for Windows or a combination of Puppy Linux and XAMPP as a server (Puppy and XAMPP will run on any old nothing machine that hasn't seen service since 1998). Using a local Drupal desktop has given me a lot of insight into Drupal performance, particularly the performance of individual modules I'm using.

All installed modules increase the length of the code path to varying degrees and invariably cause a performance hit -- some more, some less. Figure out the minimal set of modules you need to do the task and uninstall everything else. Naming no names, it helped me to distinguish the essential light-weight modules from the nice-to-have heavy hitters that I could do without.

Also, I've done a significant amount of SQL performance work, and I haven't seen anyone mention the factor of the size of the result set as opposed to the number of queries. In my experience, the size of the result set is at least as important to query performance as the number of queries. All the big "blob" images embedded in DB tables may be a problem. It may be better to just insert the images with img tags and let Apache do the dirty work if it can.

Also, get rid of low-level logging if you can. As a rule of thumb, DB inserts are about 4 to 5 times more resource intensive than reads. There were some comments about the number of "locks" being generated, and that can represent a significant performance hit, especially from logging that takes place behind the scenes. The log info may be useful sometimes, but is it essential?
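For example, a cron-time trim keeps the watchdog table from growing without bound. A rough sketch only, assuming the Drupal 5 db_query() API -- the module name is hypothetical:

```php
// Hypothetical example_cron() hook: prune log rows older than a week.
// The inserts are the expensive part; this just keeps the table small
// so those inserts (and their locks) stay cheap.
function example_cron() {
  db_query('DELETE FROM {watchdog} WHERE timestamp < %d', time() - 7 * 24 * 3600);
}
```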

To reiterate other people's comments, this is an incredibly useful thread. A must read for Drupal implementors, whether newbies or old Drupal dinos. Thanks everyone.

dman’s picture

Naming no names, it helped me to distinguish between the essential light-weight modules from the nice-to-have heavy hitters that I could do without.

I think it would be useful to annotate a few of the modules with their potential performance impact. To assist in evaluation and architecture decisions.

I personally get the gut feeling that CCK in general is an order of magnitude heavier than any custom (module-defined) node type. Thus, for industrial-strength sites I avoid it unless totally necessary, although I still use it for prototyping or for shifting requirements.

I may be wrong

... and I'd like to hear anyone's thoughts on this (that's a great thread about flexibility and maintainability, but it doesn't touch on efficiency).
I KNOW as much work as possible has gone into optimizing CCK, but because of its flexibility, it's necessarily multi-faceted. The node object you end up with is a Lego sculpture. It is exactly what you wanted, but the SQL DB has to count the pieces. If you can replace that amalgam with a die-cast toy of the same shape (a custom node type), the DB can just count '1' and be done.
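As a rough illustration of the die-cast approach (module, table and field names here are hypothetical, and I'm assuming the Drupal 5 node API), a flat custom node type costs a single extra query per load, versus one joined table per CCK field:

```php
// One flat table holds all the episode fields, so loading a node
// costs one query -- instead of one per CCK field table.
function comicstrip_node_info() {
  return array(
    'comicstrip' => array(
      'name' => t('Comic strip'),
      'module' => 'comicstrip',
      'description' => t('A single comic episode stored in one flat table.'),
    ),
  );
}

function comicstrip_load($node) {
  return db_fetch_object(db_query('SELECT image_path, episode, storyline FROM {comicstrip} WHERE vid = %d', $node->vid));
}
```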

I think this makes a difference when the DB is getting hammered.

[edit: If it doesn't already, I'm thinking that a node-object-level cache_cck option would be a big win]

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

inforeto’s picture

I believe the value CCK gets from flexibility is greater than the cost in performance.
For content purposes, CCK provides a way to skip building your own SQL tables.
But in a web environment I'd weigh the cost of upgrading RAM against the cost of programming.
You have to actually plan and test your data sets to know if the site would benefit from optimizations.

The choice of using Views would take more thought, but its flexibility also has great value.
Instead of replicating functions, work could be spent on custom code to handle large sets of data.
Perhaps with temporary tables or cached results, etc., depending on the project.
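A cached-result sketch along those lines, assuming Drupal 5's cache API -- note that cache_set()'s argument order changed between releases, so check includes/cache.inc for your version; the cid and the query are made up:

```php
// Serve an expensive aggregate from the cache table, rebuilding it
// at most once per 15 minutes instead of on every page view.
function example_node_counts() {
  $cache = cache_get('example_node_counts', 'cache');
  if ($cache && !empty($cache->data)) {
    return unserialize($cache->data);
  }
  $counts = array();
  $result = db_query('SELECT type, COUNT(*) AS total FROM {node} GROUP BY type');
  while ($row = db_fetch_object($result)) {
    $counts[$row->type] = $row->total;
  }
  cache_set('example_node_counts', 'cache', serialize($counts), time() + 900);
  return $counts;
}
```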

dman’s picture

When flexibility is more important than performance, you are right. There are plenty of good justifications for that approach in the thread I linked. I love the entry-level features of CCK, and the ability for a client to work on it (as if they ever will). But for really really high load sites, with static requirements, it's sub-optimal. Just as Drupal itself would be sub-optimal if you never wanted to change anything :)

it helped me to distinguish between the essential light-weight modules from the nice-to-have heavy hitters that I could do without.

Here it's even beyond that. quasimodal is dealing with a trade-off between functionality and performance. Not just how well module X works, or how flexible it is, but whether it can work at all in his environment/load.

If someone found that CCK was causing them this sort of dilemma, I believe that investigating a custom tuned module may make more sense.

Suggestion 2 - a wizard to auto-create a custom node type (and associated DB table) from a given CCK definition and migrate the values. Even if just useful for a performance metric, I think such a demo would load 10x faster from the database!

I really do agree with your excellent point about throwing RAM at the problem. It occurs to me that for some of the days I've spent diagnosing and optimising certain performance issues, it would have been cheaper for the client to have just bought another server and split the load or something :)
That's what comes of always looking for a software solution to a hardware problem ...

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

Coupon Code Swap’s picture

>>That's what comes of always looking for a software solution to a hardware problem ...

In a lot of ways, the hosting environment hasn't caught up with the types of sites that are being built these days. Now that people have some really good site development tools available through open source content management systems, there are more people building dynamic sites. But how many of these new developers know how to do low-level optimizations and assess performance issues? I for one am learning a lot through working on my own projects and I am feeling a bit limited by the performance issues and cost of hosting required for busy sites. There is a lot of foresight required if you want to run a successful website. The Drupal community, among others, is such an excellent resource for information. Thanks to all!

I think a lot of people just want to add all of the cool functionality they can pack into a site and aren't aware of the strain this will put on the server. There need to be some major technological break-throughs on the hardware / hosting end of things to raise performance and bring the cost of hosting down so that people with less sys admin type of experience can build feature-rich sites using the tools that are now available.

------------------------------------
ThemeBot - Find and Share Web Design Templates

quasimodal’s picture

The subject of application performance is too complicated for a simple answer.

First of all, I have a keen appreciation of how much work Drupal module developers put into these things, so I'm very wary of making potentially hurtful (and probably wrong) statements about their modules. Often, a specific design is based on assumptions about the operating environment that may not always be true, and then the discussion devolves into assertions about how a module is being used in the real world rather than about performance per se.

The following sections contain some very general and mostly unverifiable conclusions about Drupal module performance, for what it's worth ...

The Value of Performance

I think the central question about performance should always be, what's it worth to you ? What performance price are you willing to pay for a function ? If a multi-billion dollar currency trading operation wants sub-second response time for some complex business function, they have the justification and the money (!) to make it happen. It's valuable to them so they'll take the hit.

So I think that understanding the basic requirements for a site is critical to determining the 'efficiency' of heavy-hitter functions. If a certain piece of code does what I need to do and I don't have to take two weeks out of my busy schedule to write code to do it, then the solution is 'efficient', even if the adopted code runs a lot slower than custom code. In my definition of the word, 'efficiency' is roughly the sum of machine response time plus my own response time (more likely to be measured in months than in milliseconds). I think it's really important to look at the "value equation" when considering performance.

Performance Analysis on a Shoestring

The best and most assured way to understand code-level performance is a methodical, scientific quantitative approach, rigorously defining the system under test, generating a synthetic workload, running monitoring software, capturing voluminous data, parameterizing a performance model of some sort, etc. etc. It can be a big task.

My approach is more in the quick and dirty "Performance for Dummies" school. I set up a Puppy Linux / XAMPP server on "NellyBelle", an ancient and venerable 200 MHz Pentium II machine. My GKrellM resource meter shows CPU consumption pegged at 100%, staying there for long periods of time. I measure gross response time with a stopwatch. Performance differences are painfully obvious. This very simple configuration gives me a good sense of stand-alone performance (assuming a fairly high-speed connection). Of course, behavior under load is still an unknown; that requires a full-scale performance experiment.
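For anyone wanting something a little finer-grained than a stopwatch, a throwaway timing wrapper works on any PHP 5 box. A sketch only -- the node id is made up, and this needs a bootstrapped Drupal to actually run:

```php
// Crude profiling: time one suspect call and print the delta.
$start = microtime(TRUE);
$node = node_load(123);  // hypothetical node id
printf("node_load: %0.3f s\n", microtime(TRUE) - $start);
```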

In fact, the actual capacity of my little slice of a Hostmonster server is probably something like a virtual 400 MHz Pentium machine, maybe with twice the CPU/Memory/IO juice of my Puppy server.

Generalizations about Module Performance

So ... I think my ad-hoc and entirely unscientific experiments have led me to a general sense of how to predict the performance of a particular module. I assume that code-level performance is more or less efficient -- this seems to have been pretty well confirmed by what I've seen of maybe a couple dozen Drupal modules. They are often cooperative efforts, and that seems to sharply reduce the number of idiosyncratic designs and oddball algorithms.

Assuming code efficiency, I think two factors stand out as important in predicting resource consumption: the size of the code module and the 'nestiness' of the data structures it uses. Basically, it's code length times the number of times the code is used. Complex, highly nested data structures cause code blocks to be executed many, many times, so the total length of the code path is usually not reflected entirely in the gross amount (the number of Kbytes) of code in the module.

There are a lot of variations on the theme, but overall most Drupal modules I've encountered seem to conform to this rough rule of thumb -- size of the module times a 'nestiness' factor. Generally, smaller modules are faster than bigger modules. Flat structures are faster than complex tree structures.

Your sense that CCK is about an order of magnitude more resource intensive than hand-crafted code is probably correct. Generally, a generic framework will require maybe 8 times more resources than code that is optimized for a specific task. Instead of running in-line code, the framework navigates through complex data structures describing the task to be performed, generating the equivalent of in-line code. In fact, the logic of the situation is roughly equivalent to that of compiled versus interpreted code.

Providing Finer Levels of Functional Granularity

It may be that the overall performance of Drupal sites can be improved with better packaging of functions into light-weight and heavy-weight versions of modules, rather than focusing too much on code-level efficiency. Of course, code-level efficiency is very important (no such thing as too fast), but some functions are just heavy hitters and that's the way it is. If a Drupal user doesn't need a certain function, they should be able to turn it off.

In my view, your idea about a "node-object-level cache_cck option" is exactly the sort of approach that can produce significant performance enhancements at the "user level" in a fairly short cycle without having to rearchitect the whole shebang. It may be more a question of functional granularity.

Inherently Difficult Performance Challenges

One of my big interests these days is the Semantic Web. There are various definitions of the SemWeb, but mine necessitates gigantic tree structures with millions of knowledge 'nodes' distributed over the face of the planet, giving rise to a vast brave new world of knowledge resources shared by billions ... blah blah etc. etc. Consequently, I've been watching the NINA/RDF/OWL projects with great interest. Performance is always a HUGE issue in knowledge-intensive applications, especially when combined with all the gritty issues of statefulness versus statelessness. Actually, if you think about it, statefulness is one of the central issues of caching: in a sense, hanging on to semi-static, partial answers so you don't have to answer them all again the next time round.

A good example of the performance issues surrounding the SemWeb is the Tabulator Plugin for Opera. It runs pretty slowly, not because of inefficient code (as far as I can tell) but because it's doing a lot. It begs the question: does it really need to be doing so much? Maybe the same question could be useful for designing Drupal modules.

Publishing Static Pages

Another performance-related interest has been in using Drupal as a platform to publish static HTML pages. I have tons of text, most of it rehashed boiler-plate documents from consulting assignments in years of yore -- it was static 10 years ago, it might as well be static now. In fact, the built-in "pretty print" function does about half the job, but there are several tricky areas, for instance creating styles consistent with Drupal and automating valid links back into the Drupal environment. Templates may provide most of the solution. It seems to me that there's been a steady trickle of interest in the forums about ways to integrate static pages into Drupal sites.

Drupal Design Patterns ?

One other idea that might be useful is something like Drupal Design Patterns: given a certain usage scenario, what would be the minimal functions, modules and configuration parameters required to make it go? It would ease the process of implementing a Drupal site for non-technical people. I've been looking at Website Patterns for a while and maybe some of them can be adapted to Drupal.

I seem to have written a dissertation on the subject, proving again that brevity is the soul of wit. Hope some of the above is helpful.

- Bill Breitmayer

My personal contribution to proliferating web site pollution.

http://www.billbreitmayer.com/home/

dman’s picture

Um, fantastic and comprehensive thoughts there!

Your 'efficiency' calculation is valid. Especially in a practical business context.

But thinking too much that way probably leads to bloat and bad code piling up upon itself in the future :).
If you can do it the wrong way in 20 minutes vs. the right way in two days, and consistently take the 'most efficient' path to productivity ... well, your performance problems will become exponentially harder to clear up.
Of course, with the life cycle of a website being close to 2 years between redesigns, and Moore's Law covering up your little shortcuts ... we're probably safe ;)

I too have been building SemWeb stuff and trying to incorporate RDF/OWL into a Drupal framework, and even in my sandbox I started seeing DB problems just due to the recursive design patterns involved. Great in theory. Hard on the processor in practice.

I'm also adding a 'static' page cache/archive as output from my import_html module -- more for safety and posterity of content than for performance. I've migrated through too many systems to be happy with the database owning all my pages without a 'dump to portable format' button.
My vision of static caching is pure, semantically tagged XHTML files without any theme/chrome/blocks, so it's not the right solution for live-site performance issues -- but it may be for folk who are supporting backlogs of archival stuff.

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

quasimodal’s picture

Your 'efficiency' calculation is valid. Especially in a practical business context ... But thinking too much that way probably leads to bloat and bad code piling up upon itself in the future :).

Some might call that a future business opportunity for software consultants. :-))

How many times over the last three decades (o lord) have I been confronted by highly annoyed project managers ordering me to stop whining in meetings about maintenance nightmares in the making? The clever managers confounded me with the counter-argument of who did I think could do a better job than myself, given the time and resource constraints. The correct answer is "no one", of course, so Q.E.D. the company is lucky to have me around or the result would be even worse than it is. Appealing to my vanity was always a sure-fire way to get me to shut up about the mess we were making. The more blunt managers would tell me that if I didn't quiet down and go with the plan, I was going to get tossed out on my tookus (direct threats also work; a question of style, I suppose).

But that's the way it is in business development. Time is money, not only the money expended to build the system but also the opportunity cost of not getting the benefit of the system while it's being developed. That opportunity cost can easily be $1 million a month.

One of my industry areas has been health insurance, claims processing and policy configuration applications. To this day, many of their environments are a 1970s-style COBOL jungle. 80-column records, anyone? You may say, "80-column card images on disk in the year 2007? That's impossible!" OK, so don't believe me -- I barely believe it myself, even when I'm looking at it. It's amazing that anything works in many big Fortune 1000 legacy applications, largely as a result of the accumulation of quick fixes over the course of years.

On the other hand, I don't think the Drupal project has that problem to the same degree as 'big' applications. It seems to me that the ratio of code to functionality and the ratio of developers to code are quite favorable. The underlying PHP mods provide a huge amount of very efficient functionality and that keeps the Drupal code base much smaller than it would be otherwise.

I don't know exactly the number of lines of code in Drupal, but if the actual running code base is about 1.5 megabytes [ ed. note: original estimate revised ] and assuming 40 bytes per line, that's about 40,000 lines of code. In a well-funded project, that would imply maybe 5-8 developers for the code base ( in the real world, it could very well be 2 developers, with someone to do QA for you if you are lucky ). I think there are far more Drupal developers than that, maybe 100 people familiar enough with the code to jump in if necessary. Even if most of them are part-timers, that's a lot of attention, debate and thought going into the design and implementation of Drupal.

So maybe it's the other way around: Drupal may need a more customer-centric "IBM marketing" outlook. In the drift toward commercialization (I hope), Drupalers seem to be flirting with the idea of rushing toward a dominant "market share" in open source CMS, and maybe that's a good thing. Of course, if one kisses too many frogs while looking for Prince Charming, one winds up with warts. It's a balancing act.

BTW, IBM's interest in Drupal is a big deal; they're the best, and have been for many years (I'm sure you've seen their Drupal tutorial, http://www-128.ibm.com/developerworks/ibm/osource/implement.html). That was a big factor in my getting on the boat with Drupal last fall. Sometimes I think IBM is promoting Drupal and open source just to get at Bill Gates. They've never forgiven Gates for running away from them with the PC market; their corporate memory is long and grinds exceedingly fine, I'm happy to say. :-)

I too have been building SemWeb stuff and trying to incorporate RDF/OWL into a Drupal framework, and even in my sandbox I started seeing DB problems just due to the recursive design patterns involved.

I must have been vaguely aware of that during my previous response. I visit your site fairly often.

In fact, I have your Relationship Manager installed on my local "Drupal4Windows" installation. Not on NellyBelle ! :-) BTW, excellent API documentation, plaudits ( http://coders.co.nz/drupal_development/?q=api/relationship ). I'm still learning the module, it's inherently complex stuff. RDF Store all by itself is a fairly heavy-weight piece of functionality. OWL tools remind me of the steep learning curve required to use the Protege ontology editor. When one gets down to it, even editing the simple "pizza ontology" requires some deep thinking about how to structure knowledge in an ontology.

I've also followed many of the ARC versus RAP versus who-knows-what debates over the last year or so. One thought I've had is using a more easily optimizable relation-to-object mapping tool such as CakePHP. I have a Drake/CakePHP prototype of something like "Visible Drupal" to explore the database structure. But I'm not sure what it's buying me at this point. Perhaps more controllable and sophisticated caching?

One very generic structure I've used through the years is a pair of Entity-Relation tables for meta-data combined with an Object-Attribute-Value table for instance creation. It's something like RDF, but it includes both data and meta-data (and even meta-meta-data if you can stand it). In fact, it seems to encompass RDF/OWL structures as a subset of its features.

Unfortunately, the EROAV meta-model is pretty much a worst-case scenario performance-wise, no better than and probably worse than RDF/OWL. I think that the only way to make ER meta-models work effectively is to build objects and then keep them persistent, sharing object instances over many different sessions. In other words, serious statefulness.

There may be an easy way to push off caching to the client via plugins of some sort. From the object server standpoint, the problem becomes one of object identity rather than user identity, which is more amenable to CORBA and SOA type of technologies. In other words, pushing object instances off to the client cache via AJAX/JSON might result in better performance and allow the server to deal with a better class of problem than having to manage per-user caching.

Another idea is to restrict who actually uses the Semantic Web "thing" on my site, that is, the Personal KnowledgeWeb Desktop I have in mind. In my view, the Desktop would be connected to perhaps hundreds of other servers run by people who subscribe to a particular KnowledgeWeb. But, from my perspective, the only legitimate user of my Desktop is ME. If people want to access my server, they subscribe to the KnowledgeWeb and access my knowledge nodes via the KnowledgeWeb, not by directly accessing my server via page requests. The HTML functionality of my Desktop server would be private to me alone, always in single-user mode (with maybe some exceptions). The other folks would have their browsers set to their own Desktops. FOAF/RSS seems to provide some of the skeleton functionality for this.

This architecture might have performance advantages by spreading the load around a bit. Still, it would generate a significant load of inter-server traffic, either coming to my knowledge nodes or through-traffic going to other servers -- sort of like a KnowledgeWeb version of IP routers. There would also be a good deal of replication, necessitating a fairly sophisticated RSS-like object versioning mechanism.

Anyway, the Semantic Web is a fascinating subject. For all the misery and problems in the world today, it's good to be alive just to see it happening. Many thanks for your good work.

- Bill

zoon_unit’s picture

I asked about this and got an interesting answer. Apparently, the highly normalized nature of CCK's field set is overcome by caching the resulting node. Therefore, a CCK node takes no more time to render than a "classic" node structure.

This may be a gross simplification, but one thing is true: CCK has MANY developers crunching the code and optimizing, so it has the potential to be one of the better performers in the Drupal module universe.

In contrast, so many modules that try to build custom node structures suffer from poor performance due to the limitations of their programmers.

dman’s picture

It's good to learn my preconceptions may be wrong.
I've investigated the cache table and seen the data it's storing there, and it does (I guess) do the right thing. I wasn't expecting to be the only one who'd thought of it (or was concerned about it).
I'll get friendlier with that side of things now...

It's a good point about the likely differences between mainstream and customized modules, so that's another trade-off to take into account.
It actually bugs me how most of the PHP snippets are the WORST examples of how to do something :). They are always prime examples of write-once code that works but probably isn't the right way to do things on a high-traffic basis :).

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

harriska2’s picture

Organic Groups. My guess is this module can be very intensive in terms of the number of queries, and thus lots of extra memory.

pamphile’s picture

Don't restrict your website to one host. Your database is obviously taking a beating.

I recommended ModWest.com in your first post and I am doing so here again. They have a highly organized database backend. I hosted a popular statistics-tracking service on Modwest for a year before selling it.

http://www.ComputingNews.com
http://www.BusinessLetters.com
http://www.AcmeTutorials.com

Coupon Code Swap’s picture

Another site that was converted to Drupal recently is the1secondfilm.com. I saw them pop up on the Digg front page and decided to check it out. The site kept going off-line and having extremely slow page loads for maybe a day or two after the Digg effect.

The thing is, they were on a dedicated server already and thought they were prepared for some major traffic. Apparently they needed a second dedicated server to handle the load.

http://www.the1secondfilm.com/forums/t1sf_producer_forum/suggestions_sit...

------------------------------------
ThemeBot - Find and Share Web Design Templates

merlinofchaos’s picture

One interesting way to help performance-enhance Views a little bit is to do this:

Create a module. Let's call it 'ubersoft.module'. Using Drupal 5, this means you'll need the .module file and a .info file. Put them both in 'sites/all/modules/ubersoft' (or modules/ubersoft if you like, but sites/all/modules is a better place to keep stuff). Even better, go with sites/all/modules/custom/ubersoft so you can tell it apart from other contrib modules, but that's only needed if you're likely to have a few of these.

The .info file would look like this:

name = Ubersoft
description = Custom code for ubersoft.com

In the .module file, implement the following function:

function ubersoft_views_default_views() {
  $views = array();

  return $views;
}

Yes, it's a bare function -- for the moment.

Go to your customized Views and 'export' each view. Cut and paste that code into that function, before the 'return' line. Do this for every customized view you have. Now, if you've customized views you originally got from some other module, you'll need to do a little extra work; you should skip those for the first pass.

Once you've done this, activate your new module. Go to the views admin page. You should now see all of the views you did this with listed under 'default views', and they should say they've been overridden. Edit one of the original views and delete it. Go make sure that view still works. If everything is good, then go and delete the rest of them. When you're done, all of your custom views will be in code, and a bunch of those queries to load views will be eliminated.

Note that this isn't much of a win: Those are very fast queries. But if the sheer number of queries matters, this will help.

(To do views from other modules that you've overridden, you'll have to do two things: 1) rename the view (i.e., make taxonomy_term into taxonomy_term_custom) prior to the export, and then 2) 'disable' the original default view so that it doesn't get added to the menu structure.)

-- Merlin

[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]


ubersoft’s picture

You know, I actually did something like that with the imagecache thumbnails so I could add links to them. I hadn't considered it would also reduce the number of database calls. I'm going to have to try that out. At this point I don't know if the sheer number of queries matters or not, but that looks like fun.

zach harkey’s picture

that looks like fun.

Step away from the keyboard, Ube.

-zach
--
harkey design

: z

pcs305’s picture

this is a handy hack!

ubersoft’s picture

If you're wondering what the heck happened to the example of my drupalized site that I linked to in a prior post: I'm having a small problem with mysql -- namely, I sort of broke it earlier this morning while trying to fiddle with something completely unrelated to the site itself.

Robardi56’s picture

I've been a drupal fan since version 4.6 ... so I saw 4.7 and 5.x come to life.
I was surprised by the lack of development of REAL solutions for the problem of big resource usage. Come on, drupal is a COMMUNITY tool, so it is supposed to have a lot of users logged in, yet the cache is really effective only for unregistered users.

I think it must be an absolute priority for the next release to focus on effective caching solutions: full static caching for logged-in users, or block-level static caching at the very least. It is time to resolve this problem once and for all.

Drupal must become a CMS reputed for its moderate resource usage, not for being an ogre.

Brakkar

moramata’s picture

When a site gets that kind of traffic, shared hosting is simply not an option; you need a dedicated server. You are sitting on a gold mine and you don't realize it. You need to find ways to monetize your website. You can get a very decent dedicated server for $100-$150. A long time ago, I learned that ISPs don't talk about resources (mostly CPU utilization). They will give you unlimited bandwidth (in your case 200 GB), but to pump that kind of bandwidth the machine needs lots of CPU cycles. My advice: move to a dedicated server, and if you are not making at least 5K a month from your traffic, use a part of your creativity toward monetizing your site.

zach harkey’s picture

Moramata is absolutely right. Shared hosting has a serious ceiling for cpu intensive websites once they start getting traffic. Believe it or not, it's a good problem to have, and easily mitigated with a minimal increase in hardware. But, don't expect your hosting company to provide options in the way of an upgrade path. They'll just shut you down like the unfortunate clog you are.

-zach
--
harkey design

: z

nathanpalmer’s picture

We drupalized http://www.votefortheworst.com at the end of last year. At that time we were on a shared hosting plan, and we have had to scale up to a 2-server cluster since then. We get an average of 250k pageviews per day, and we have had very intense spikes, usually right after Sanjaya's performances.

We're currently using the standard caching that comes with 4.7. We used Boost for a while, but it killed Apache performance and we had to disable it. Other than that, we try not to use very many modules, and we have had to add a couple of indexes here and there. But we still get some downtime during the big spikes (even on the 2-server setup).

I feel like there is more that needs to be done to get our site optimized, but it seems at this point we need to dig into the code and really analyze what everything is doing to make some optimizations. It seems that TeamSugar had to make quite a few optimizations and a custom caching mechanism before their Drupal site worked well.

Nathan

TinTin_Pinguin’s picture

I've been testing php-eAccelerator and php-apc on Mandriva.
Pretty fantastic results: page generation time reduced from 140 ms to 46 ms, and memory from 6 MB to 0.4 MB (with eAccelerator). APC gave me somewhat higher values for memory and time, so eAccelerator is the keeper (despite the fact that APC's admin page is nicer and works better than eAccelerator's).
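For anyone who wants to try the same thing, loading eAccelerator comes down to a few php.ini lines. A minimal sketch; the shared-memory size and cache directory below are illustrative examples, not recommendations, and exact paths vary by distribution:

```ini
; hypothetical php.ini fragment for enabling eAccelerator
extension = "eaccelerator.so"
eaccelerator.enable    = "1"
eaccelerator.shm_size  = "16"                       ; MB of shared memory for the opcode cache
eaccelerator.cache_dir = "/var/cache/eaccelerator"  ; must exist and be writable by the web server
```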

I wonder if Drupal could be made squid reverse-proxy friendly...
I know wikipedia.org is using this architecture with great results.
As far as I know, Drupal does not write certain headers (I presume no-cache headers for logged-in users, or maybe some smart URL rewriting with the username in it), and this is why a squid reverse proxy is not suitable for it.
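For what it's worth, the squid side of such a setup is short; the hard part is the headers Drupal emits. A minimal sketch using squid 2.6-style accelerator syntax (the hostname and ports are made up):

```
# hypothetical squid.conf fragment: squid listens on port 80 and
# forwards cache misses to Apache/Drupal on port 8080
http_port 80 accel defaultsite=www.example.com
cache_peer 127.0.0.1 parent 8080 0 no-query originserver

# for logged-in users to be safe, Drupal would still have to send
# suitable Cache-Control headers and vary on the session cookie
```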

Anybody having some experiences with squid reverse proxy and Drupal?

Thanks,
Tin Tin
Pinguin

tesliana’s picture

Methinks that heavily trafficked database-driven sites (like Drupal sites) need to be optimized in two ways.

While there is code tweaking, there is also site (as well as internal module) design and development that must take database activity into consideration. Too many such sites are being designed with a main page that does not just look like a Swiss Army knife, but more like a warehouse of Swiss Army knives.

Would a site be less effective and popular if its main page was tastefully done but was not much more than a top-level menu with half a dozen items on it, instead of the current tendency of summarizing just about the whole site on that one front page?

There seems to be too much of a tendency to add the various dynamic, data-driven bells and whistles that create beautiful sculptures one sand grain at a time, and then compare them with the old static pages that served whole slabs of etchings in stone. The differences are night and day; not only can the two not be compared, but the overall site design and development must take this fact into consideration.

---
We are all prisoners of our own experiences.

ubersoft’s picture

Would a site be less effective and popular if its main page was tastefully done but was not much more than a top-level menu with half a dozen items on it, instead of the current tendency of summarizing just about the whole site on that one front page?

Honestly? In many cases, yes.

You only have a small window of time before a new visitor decides to go somewhere else, and the more you obscure your content the smaller that window gets. I can think of only one webcomic that gets away with not posting the latest comic on the front page (Penny Arcade) and they're so huge they can get away with things others simply can't.

jashoet’s picture

Hi. Long and interesting thread, this one.
A practical solution I use to reduce at least my front-page queries is to use crontab to 'copy' the front page every 6 hours or so to an index.html file. That is about the change cycle for my site. I have also redirected my breadcrumbs and links to 'HOME' to head back to index.html. Saves a bunch of queries! You obviously need access to the crontab, etc., but here is an idea...

The crontab command looks something like this (others may have a better suggestion):

cd ~/mydrupaldirectory && php4 index.php | sed -n '/<!DOCTYPE/,$p' > index.html
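Put together as a full crontab entry, the idea above might look like the following. The schedule and directory are illustrative, and note that running index.php from the command line may need some environment set up, since Drupal normally expects to run under the web server:

```
# hypothetical crontab entry: regenerate a static front page every 6 hours
0 */6 * * * cd ~/mydrupaldirectory && php4 index.php | sed -n '/<!DOCTYPE/,$p' > index.html
```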

malexandria’s picture

When I moved my site EclipseMagazine.com over to the Drupal platform five months ago, my host provider (not a good one) started turning off my site, seemingly every couple of minutes. I don't know if it was a scheme to try and get me to go with their outrageously expensive VPS option or what. So I promptly switched to a new provider, and the new provider did the same thing. In eight years, with all my technical and growth-pain issues, I had never been hassled over a server limit issue.

Drupal eats up an amazing amount of resources. When investigating my SQL database I noticed the watchdog table had over 1 million rows! The search tables routinely get up into the millions of rows as well, and I have to constantly monitor them and empty them to make sure my site doesn't get turned off. It's really annoying.
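Emptying the log table by hand can be scripted. A hedged sketch: the watchdog table and its timestamp column are standard Drupal, but the retention period below is just an example, and Drupal's own settings page can also discard old log entries on cron:

```sql
-- delete watchdog entries older than one week (604800 seconds)
DELETE FROM watchdog WHERE timestamp < UNIX_TIMESTAMP() - 604800;

-- or simply empty the table outright
TRUNCATE TABLE watchdog;
```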

sadwanmage’s picture

How did it take the community over a week to suggest this?
I'm sure a static cache of the front page is not a magic cure-all, but for ubersoft.net it seems pretty solid.
After all, the vast majority of a webcomic's visitors will just load the front page to see the latest comic, and then leave again.

It doesn't seem you had anything very dynamic (e.g. anonymous comments) on your front page anyway, aside from a user poll, so I don't think you'd suffer much from making it static. Manual refreshes when you post news (that's a quick module in itself), refreshes when a new comic appears, and you'd hardly know the difference.

alpha0’s picture

I might sound crazy but it is just an idea.

Have a local server running Drupal on which all edits happen.
Then you wget this local website; this will convert the entire website into static pages. You can put those static pages on the public server.

Have this done automatically: set up a wget job followed by FTP synchronization, so that the changes get reflected on the server every 12 hours.
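A hedged sketch of what such a cron job could look like. The paths and hosts are made up, and I've used rsync where the post suggests FTP synchronization; any sync tool would do:

```
# hypothetical crontab entry: mirror the local Drupal site to static HTML
# every 12 hours, then push the result to the public server
0 */12 * * * wget --mirror --convert-links -P ~/static-copy http://localhost/ && rsync -a ~/static-copy/localhost/ user@publichost:public_html/
```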

We are also in the process of converting static pages to Drupal. In case it sucks, I am definitely going to try this out.

--Alpha Zero
www.transbittech.com

ubersoft’s picture

You've got static pages, yes, but you lose the ability to search on keywords, have reader comments, etc. I might as well just use the static pages I have now.

alpha0’s picture

This scheme is good if the user is just a 'VIEWER'.
In such cases, you can use Google for searching.

11’s picture

subscribing...

and I'm sick of Drupal query performance and unofficial solutions.

Somebody tell us: how does drupal.org hold up? Is it set up any differently?

ray007’s picture

subscribing

klcthebookworm’s picture

This is fascinating stuff.

headkit’s picture

subscribing...

martin gersbach’s picture

- the front page required anywhere from 90-150 database queries
- individual comic pages in my comic archives averaged 70-80 database queries

ha-ha

My new site has:
- on the home page (whole page ;)): 569 queries
- on an internal page (CCK with 130 fields): 537 queries

;(

Martin GERSBACH,
www.gersbach.net
Paris, FRANCE

moazam’s picture

A lot of folks have complained about the database load that Drupal seems to create, but I haven't seen too much talk about actually tuning MySQL (or Apache).

I recently upgraded my hardware and OS for my Drupal 4.7 site and assumed that it would easily handle an enormous number of hits right out of the box. Unfortunately, when I ran some load-testing tools, the MySQL DB quickly ran out of steam and I started getting the Drupal/MySQL DB error page. My main page generation time for an authenticated user was anywhere from 300 ms to 700 ms with 129 queries (caching turned on). After some extensive MySQL (my.cnf) tuning, and loading up a lot more Apache instances, my page generation time is down to 65-80 ms, but most importantly, I'm NOT getting the dreaded MySQL DB error page (out of connections, server not responding, etc.).
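The right my.cnf values depend entirely on available RAM and workload, but a sketch of the kind of settings typically involved in this sort of tuning might look like this (the numbers below are illustrative, not recommendations):

```ini
# hypothetical my.cnf fragment for a MySQL 4.x/5.0-era Drupal box
[mysqld]
key_buffer_size   = 64M   # index cache for MyISAM tables
query_cache_size  = 32M   # cache for repeated identical SELECTs
query_cache_limit = 1M    # don't cache result sets larger than this
table_cache       = 512   # open table handles kept around
max_connections   = 200   # must match what Apache can throw at it
thread_cache_size = 8     # reuse threads between connections
```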

The next step for me is that when and if my site starts to get a crazy amount of traffic, I will start caching as much of the database as I can via memcached. With a caching layer like memcached in place, I believe I don't really need to care that Drupal does not cache blocks or authenticated-user pages, since the caching of all DB queries will happen in memcached.

Anyways, my point is:

- Cache as much as you can, at both the Drupal layer and the MySQL layer.
- Tune Apache httpd to make sure there are enough free servers available to handle incoming requests.
- Start looking into memcached, even if it means running one memcached instance of 256-512 MB on the same machine as Drupal/MySQL.

-Moazam
www.unixville.com
the house of unix

vivek.puri’s picture

Moazam, are you talking about your site Unixville? I checked that site and it has fewer than 250 nodes. Am I correct? If that's right, then we also have to consider that even with such a low number of nodes there is a need to tune MySQL and Apache, and that doesn't sound right.

I am all in favor of tuning Apache and MySQL, but if you have to tune the whole system for just 250 nodes, there are other problems too.

moazam’s picture

crystalcube, yes, I am talking about Unixville. The node count is around 250, but I'm not sure why this means that the setup should not be tuned. I'd rather start tuning the whole setup upfront than start seeing MySQL errors once I've added more and more nodes and users.

The default installation was not necessarily slow, but in the interest of learning how Drupal works (and speeding up the site of course), tuning the setup has been extremely helpful.

-Moazam
www.unixville.com
the house of unix

vivek.puri’s picture

but I'm not sure why this means that the setup should not be tuned?

I am fully in favor of tuning the setup. What I am trying to say is that for a 250-node site, the performance of out-of-the-box MySQL/Apache should be sufficient.

moazam’s picture

Crystalcube, 'sufficient' for what? For getting slashdotted/dugg? No, no way.

Even if you take Drupal out of the mix and just have a basic Apache httpd/MySQL site with 5-10 pages, the sheer amount of constant hits will throw up error pages.

-Moazam
www.unixville.com
the house of unix

vivek.puri’s picture

Somehow I had a feeling you would say this ;), but "sufficient" for a moderate-traffic site.
Of course Drupal works within the bounds of Apache (or whatever your HTTP server is) and MySQL (or whatever your database is).

moazam’s picture

crystalcube, if you go back to my original post, you'll see that I said,

"and assumed that it would easily handle an enormous amount of hits right out of the box."

The whole point for me moving to a new server and tuning the configuration was to enable digg/slashdot type hits. Node count is irrelevant.

-Moazam
www.unixville.com
the house of unix

vivek.puri’s picture

Node count is irrelevant.

Actually, if you look at performance, the amount of content you have is equally relevant. With 250 nodes you can probably fit easily in a small memcache, but if you have more content you will have to think of an alternate strategy, especially if you are preparing for Slashdot/Digg.

Deviation’s picture

Subscribed

socialnicheguru’s picture

subscribed

http://SocialNicheGuru.com
Delivering inSITE(TM), we empower you to deliver the right product and the right message to the right NICHE at the right time across all product, marketing, and sales channels.

daphisto’s picture

subscribe