The birth, death (and hopefully rebirth) of Ubersoft.net on Drupal 5.1 -- a site performance autopsy

ubersoft - April 20, 2007 - 15:24

When I unveiled the Drupal-powered version of my webcomic to the world, I was proud of what I'd done -- legitimately so, I think. A ten-year-old webcomic, with ten years' worth of comic archives, completely converted over to a database-driven site is nothing to sneeze at.

Unfortunately, this pride was short-lived: after three days up, my site host provider suspended my account because I was getting too much traffic (thanks, in part, to a link from Drupal's front page) and it was killing the server it was on. They moved me to a test server for about a week, where they monitored how much of the server's resources my site was using... after that week they reported that ubersoft.net had been consuming a full 4% of the server's resources, and consequently I'd need to move up, at the very least, to a virtual hosting package.

I had neither the time nor the finances to do this, so I sadly moved back to my static pages. The Drupal version of the site is currently sitting in my test area, while I try to figure out how to redesign it so it can minimize the consumption of site resources.

Since reverting back to the static pages I've been trying to figure out what specifically caused the site to go down. At this point, I think I can point to the following areas:

1. Inaccurate reporting of site hosting capabilities on the part of the provider, and a lack of tools to accurately allow customers to measure site resource usage.

2. Database activity from Drupal on top of standard site resource consumption.

A more complete analysis follows.

Site Hosting: What is Advertised vs. What is Given

First, I don't want this to come off sounding like an attack on my host provider. A2hosting.com was very helpful during this time, their tech support was very responsive, and they allowed me to essentially sit on a dedicated server at shared hosting prices for an entire week while we monitored the site traffic. The problem, in my opinion, is that the industry standard used for bandwidth metering predates the common adoption of database-driven sites, and it is no longer a reliable benchmark.

I purchased a shared hosting plan from them as a jumping-off point. It was their highest-level shared plan, with a bandwidth cap of 200 GB/month. The bandwidth cap is, unfortunately, the only unit of measurement most people have when trying to determine what kind of account to lease from a hosting provider, and 200 GB/month was well above my monthly site traffic. My static site uses anywhere from 30-55 GB/month, so I figured an upper limit of 200 GB would give me some room to grow before moving to a semi-dedicated or dedicated hosting plan.

Unfortunately, the bandwidth limit does not accurately measure every aspect of site resource consumption -- specifically, it doesn't measure how much database activity is going on behind the scenes, and that is far more important to a database-driven site.

I had access to my Webalizer stats from both my Drupal site and my older static site. I was able to find a period of spiked traffic on my static site (a few days when I had been linked by Reddit) and compare it with the period on my Drupal-powered site when Drupal.org had linked to it from its front page. This comparison revealed the following information:

- while bandwidth consumption on the Drupal-powered site was higher on average (about 15-20K per page) it was never in danger of reaching the bandwidth cap -- not even close

- hits per page on the Drupal site were actually lower than hits per page on the static site

From this I determined that my site traffic and resource usage was well within the advertised limits of my account. However, the limits were based on a pre-database model for measuring site traffic and resource consumption, and a shared hosting account can have hundreds of separate hosting accounts on a single server. There was another piece of the picture -- database activity -- that wasn't factored in at all.

Database Activity: the 40-ton feather that broke the camel's back

Most host providers give you "unlimited MySQL accounts" but don't actually give you any tools that you can use to measure database activity. My host provider was no different in this case, so I was very curious how a site that seemed well within the limits of all advertised resource caps -- disk and bandwidth usage -- could be using 4% of the system resources of a shared hosting server. Obviously if you're on a shared hosting plan, with hundreds of other accounts on the same server, 4% is a ridiculously large slice of the server to be keeping all to yourself. But the only units of measurement I had on hand were Webalizer statistics and monthly bandwidth usage.

My webcomic averages about 8.5K unique visits a day, Monday through Friday (when I update), and roughly 20K-30K page views (for the readers who stroll through my comic archives). When I was linked by Reddit (on the static site), this spiked up to about 16K unique visits on a single day with 42K page views, and when I was linked by Drupal.org (on the Drupal site), the unique visits and page views spiked in a similar manner.

The bandwidth consumption, as mentioned earlier, was a bit higher, but not high enough to exceed my 200 GB/month cap, and the hits per page were actually lower, so in theory Apache was doing less work to serve each page...

... but there was a database to take into account, and this is what killed my site.

My host provider had no tools I could use to monitor my database activity, but some Drupal users suggested I install the Devel module, which allows admins to view statistics about what the database is doing and how long it takes for those things to be done. Specifically, Devel allows you to see how many database queries are performed on each page, what those queries are, and how long each query takes.
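To make concrete what Devel is measuring, here is a minimal sketch of the same idea in Python, assuming SQLite stands in for MySQL: wrap every query, record its text and how long it took, then report the count and total time for the page. The `QueryLog` class and the `node` table contents are illustrative assumptions, not Devel's actual code.

```python
import sqlite3
import time


class QueryLog:
    """Illustrative wrapper that records each query's SQL text and duration,
    similar in spirit to the per-page statistics Devel displays."""

    def __init__(self, conn):
        self.conn = conn
        self.entries = []  # list of (sql, seconds) tuples

    def query(self, sql, params=()):
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()
        self.entries.append((sql, time.perf_counter() - start))
        return rows

    def report(self):
        """Return (number of queries, total seconds spent in them)."""
        return len(self.entries), sum(seconds for _, seconds in self.entries)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO node (title) VALUES (?)",
                 [(f"Comic {i}",) for i in range(1, 6)])

log = QueryLog(conn)
# A page that loads five comics one query at a time issues five queries.
for nid in range(1, 6):
    log.query("SELECT title FROM node WHERE nid = ?", (nid,))
count, seconds = log.report()
print(f"{count} queries, {seconds * 1000:.3f} ms total")
```

The per-item loop is the pattern to watch for: a listing that fetches each entry separately grows its query count with every item it shows.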

On my test bed (which is an exact replica of the live site, using the same versions of PHP, Apache and MySQL) I installed the module and started browsing pages. The Devel statistics were interesting:

- the front page required anywhere from 90-150 database queries
- individual comic pages in my comic archives averaged 70-80 database queries
- my archive table of contents for each comic (lists of my comics that could be browsed by year) used anywhere from 100-300 queries, depending on the filters used

I have no idea what the average is for database queries on a Drupal site. I do know that my site uses views quite heavily -- publishing a comic on Drupal is a bit more complicated than your standard blogroll, and publishing three comics plus general site news posts required a fair amount of tinkering on my part. I can surmise, however, that there is a significant difference in site resource consumption between a single person viewing a static page consisting of twelve hits in total, and a single person viewing a Drupal page consisting of five hits in total plus 70 to 300 individual database queries. Multiply that by 16,000 unique visitors (or, more accurately, 47,000 individual page views over the course of one day) and you get a LOT more activity going on in the background with a Drupal site than you do with a static site. And this was with both MySQL caching and Drupal site caching enabled (though Drupal was not set to aggressive caching).
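The "multiply that by" step is plain arithmetic. A rough sketch, assuming the day's page views are spread evenly over 24 hours (real traffic arrives in spikes, so peak rates will be higher than these averages):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_query_load(page_views, queries_per_page):
    """Return (total queries per day, average queries per second)."""
    total = page_views * queries_per_page
    return total, total / SECONDS_PER_DAY


# Queries-per-page figures taken from the Devel measurements above.
for per_page in (70, 100, 300):
    total, per_second = daily_query_load(47_000, per_page)
    print(f"{per_page:>3} queries/page -> {total:>10,} queries/day, "
          f"~{per_second:.0f}/s on average")
```

At 100 and 300 queries per page this gives the 4,700,000 to 14,100,000 daily range quoted later in the post, or roughly 54 to 163 queries every second, around the clock, on a shared server.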

That, I believe, is what killed my site.

Moving on: Making Drupal Work

So the question now becomes "how can I make Drupal work?" I was very pleased with the functionality I had put into the Drupal-powered website -- it had navigation and content searching features that static sites simply can't provide, and the taxonomy features of Drupal are head and shoulders above any other CMS I've played with... and perfectly suited for making images easier to search and navigate in a text medium. The problem, apparently, is that this functionality requires more power on the back end, and I need to figure out:

- whether I need to move to a more robust (i.e., more dedicated) hosting solution
- whether I need to optimize the Drupal site to minimize database usage
- whether I need to do some combination of the two above options.

I suspect the third option is correct, but at this point I'm not sure which is more important. My hosting provider said that based on the 4% resource consumption I'd probably do well on a semi-dedicated hosting plan... but I'm not sure if that 4% is a natural result of the site traffic or a result of bloat I introduced in my site when I made all those views. The thought of 47,000 page views generating anywhere from 4,700,000 to 14,100,000 database queries in a single day (at 100 to 300 queries per page) is alarming, but is it alarming because it's ridiculously high or because I'm not familiar with database-driven sites? Is it normal for a single page to make 100 database queries? I don't think that's normal. On the other hand, I don't have enough experience to know if my instinct is on target.

Would I be well served to work on paring back all the features in my site in order to reduce the database load? Would doing that reduce the functionality of the site to the extent that it's just not particularly useful to my readers? The last question is particularly important to me -- I'm sure I could create an alternative Drupal design that reduced the number of database queries per page, but I'm not sure that the design would make my content more accessible and easier to navigate for my readers -- which was one of the main draws for moving over to a database-driven site.

The End

That's about all I have to say in terms of analysis. I hope it was useful to the rest of you... I spent a lot of time trying to figure out what to say. An earlier version of this post was considerably longer, a lot more confusing, and generally useless. I'm half-afraid that this version has gone in the other direction and doesn't give enough background information.

At any rate, I'm very interested in what those of you who run high-traffic sites think of this post and my conclusions in it. All comments welcome and encouraged...

Nice posting! Well written...

jjkd - April 20, 2007 - 15:51

Nice posting! Well written, and I think you've captured the essence of what is going on. Unfortunately, I don't have the Drupal-specific skills to help you, but I think this would make an excellent case study and/or white paper on how to optimize a Drupal site while trading off resource requirements against functionality.

Hopefully someone with the necessary background will take this on; I would be very interested in seeing the results, and I think that it would be a benefit to others as well. I believe that I've seen others post some good work in regard to specific parts of the solution process, which may fit in well as references.

--
Joe Kyle
--jjkd--

It's interesting, Ubuntu

JohnForsythe - April 21, 2007 - 02:45

It's interesting, Ubuntu just had a similar issue. They recently upgraded to Drupal, but had to temporarily revert back to static html to handle the high demand for their latest release. Maybe they will post some analysis?

--
John Forsythe
Need reliable Drupal hosting?

(Subscribing... I'd love to

rszrama - April 20, 2007 - 15:53

(Subscribing... I'd love to hear about query caching if anyone has used it with large Drupal sites. We started caching queries on our MySQL server for a large osCommerce site and saw incredible results.)
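On the query caching question: MySQL's query cache (present in the MySQL releases of this era, and later removed in MySQL 8.0) stores the result of a SELECT keyed on its exact text and discards it whenever one of the tables it read from is modified. The toy Python class below mimics that behaviour using SQLite and a hypothetical `comments` table; unlike MySQL, it needs the caller to say which tables a query touches.

```python
import sqlite3


class QueryCache:
    """Toy result cache keyed on the exact SQL text plus parameters.
    Any write to a table discards every cached result that read from it."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}   # (sql, params) -> cached rows
        self.by_table = {}  # table name -> set of cache keys that used it
        self.hits = 0
        self.misses = 0

    def select(self, sql, params, tables):
        key = (sql, tuple(params))
        if key in self.results:
            self.hits += 1
            return self.results[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.results[key] = rows
        for table in tables:
            self.by_table.setdefault(table, set()).add(key)
        return rows

    def write(self, sql, params, table):
        self.conn.execute(sql, params)
        # Invalidate every cached SELECT that depended on this table.
        for key in self.by_table.pop(table, set()):
            self.results.pop(key, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER, body TEXT)")
cache = QueryCache(conn)
count_sql = "SELECT COUNT(*) FROM comments WHERE nid = ?"
for _ in range(3):
    cache.select(count_sql, (1,), ["comments"])  # first call misses, the next two hit
cache.write("INSERT INTO comments (nid, body) VALUES (?, ?)", (1, "First!"), "comments")
print(cache.select(count_sql, (1,), ["comments"]), cache.hits, cache.misses)  # [(1,)] 2 2
```

This is why a query cache pays off most on read-heavy tables: a table that is written on nearly every request keeps throwing its cached results away.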

Subscribing + Block level caching??

MacRonin - April 20, 2007 - 16:14

Subscribing ... This is a problem I hope to have as a project I'm working on grows toward the end of the year.

BTW have you considered investigating the new Block level caching??
-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world

Do you mean the block-cache module?

ubersoft - April 20, 2007 - 16:20

I use it quite extensively... almost every view block I create uses the cached version of the block instead of the block itself.

Unfortunately, that does NOT allow you to cache content that isn't a block... for example, individual nodes for each comic, or views that are entire pages (i.e. the table-of-contents archives I've created, which are the most database-intensive parts of the site).

In a new site I'm building I have added a few other things to hopefully minimize some of the overhead (like XCache for PHP), but I would really like the ability to cache entire pages regardless of whether the reader is anonymous or has an account.
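One way to think about caching whole pages for logged-in readers: key the cached HTML on the path plus the reader's role instead of the individual user, and let each entry expire after a fixed time. The Python sketch below is hypothetical (`RenderCache` and `render_archive` are illustrative names, not part of Drupal or blockcache), and it is only safe for output with no per-user content such as bookmarks.

```python
import time


class RenderCache:
    """Hypothetical time-based cache for rendered HTML.
    Keying on (path, role) lets every user with the same role share one
    copy, which is only safe if the output has no per-user content."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # (path, role) -> (expires_at, html)

    def get_or_render(self, path, role, render):
        key = (path, role)
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no views or node loads needed
        html = render(path)  # the expensive part: views, node loads, etc.
        self.store[key] = (now + self.ttl, html)
        return html


renders = []


def render_archive(path):
    renders.append(path)
    return f"<ul>archive for {path}</ul>"


fake_now = [0.0]  # a controllable clock so expiry can be demonstrated
cache = RenderCache(ttl_seconds=300, clock=lambda: fake_now[0])
cache.get_or_render("comics/2006", "authenticated user", render_archive)
cache.get_or_render("comics/2006", "authenticated user", render_archive)  # cached
fake_now[0] = 301.0
cache.get_or_render("comics/2006", "authenticated user", render_archive)  # expired
print(len(renders))  # 2
```

The trade-off is staleness: a subscriber may see an archive page up to five minutes old, which is usually acceptable for comic archives that rarely change.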

Yes, I was thinking of the block-cache module.

MacRonin - April 21, 2007 - 03:53

Yes, I was thinking of the block-cache module. I hadn't realized that you had already implemented it for most of the blocks. But caching the core page content when that content is data-driven (such as your views) would be a great enhancement.

-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world

oops

sepeck - April 20, 2007 - 17:24

Well, I was the one who promoted your announcement to the front page the first time and this time as well. Oops. :D

The reason I did so the first time was because of what you stated here. You took a static site with 10 years of content, migrated it, and leveraged Drupal and its abilities very well. It was a good example of why designing and building sites can be complicated, and you had a very nice write-up about your objectives and the benefits you received in using Drupal for your site, as well as some of the learning curve.

This write-up is also good. We have users at a variety of levels: from the folks designing and building high-end, enterprise-level sites (MTV, Sony, Warner) to those converting over from static HTML, who know how to look up the term "web server" but don't necessarily understand its relationship with the database, much less the performance impact of putting the database on RAID1 or RAID5 or even a virtual server.

Performance documentation is something we don't have a lot of in the community. It's something that is not hard to do, in theory, but it does take time, research and practice, practice, practice. With the Lullabot performance seminar a few weeks ago, this type of conversation is starting to occur in a wider audience. Over time I hope that white papers can get added to our collection of documentation on the issue. Drupal is only one part of the equation; as you learned, the database is another important part.

As a note, if you look in the development archives, performance is always an ongoing concern for the active developers. For instance, for Drupal 6, one of the large issues with the menu module has been addressed. This doesn't really help you now, but be assured folks are always trying to keep performance in mind with each new version of Drupal.

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain

Well I didn't mind the link

ubersoft - April 20, 2007 - 17:41

Well I didn't mind the link :), and honestly in the end I don't mind that the site went under -- it's a little embarrassing but if I ever want my site to handle high volumes of traffic it's better to learn that right off the bat, and I'm hoping it makes the next attempt more successful.

What I need to figure out right now is if there are any guidelines/best practices concerning how many database queries should be generated by a single page view. Am I right in assuming that 90-150 queries is excessive? Or is that not unusual? I'm afraid I don't have much to compare it with.

It is usual, I guess

misty3 - April 20, 2007 - 18:10

While most hosts advertise HUGE amounts of space and bandwidth, the CPU resource allocation is tucked away elsewhere, in the terms and conditions, AUP, etc. In practice, the amount of bandwidth they promise cannot be reached with the small CPU percentage they allocate.
The 'top' records, which are the actual documentary evidence of which processes have consumed resources, stay with the webhost admins and are never shown to the client live and online. When asked about transparency, almost all hosts say that we have to "trust" them.

To run sites based on Drupal or a similar PHP/MySQL platform with a moderate amount of traffic, most shared hosting providers will "fail" unless it's a fairly new host whose servers have not already been "overpacked". Some hosts provide VPS or semi-dedicated servers at a very fair price, so it may be worth going with them if the budget permits, in my humble opinion.

Best regards

Yes, that's my next step.

ubersoft - April 20, 2007 - 18:22

I actually have a 512 MB slice set up at slicehost.com... but I wanted to see if it would be worth my time trying to reduce all those db calls as well. Not that I'd necessarily know where to start...

I noticed you said you were

catch - April 21, 2007 - 12:07

I noticed you said you were using front_page module + views for your front page.

If you used a panel for your front page, you could use views blocks inside that, which then allows you to use cached versions of your blocks - it's a very handy way to cache views in the content region of your page and allow for block placement at the same time. We do that for all the main indexes at http://libcom.org (still unstable at the moment). There's also a contrib caching module which allows you to cache entire panels, I've not tried that yet though.

Assuming a lot of the visits to your site land on the front page it could make a big difference.

FWIW, we get similar traffic to you (c. 5,000-7,000 visits and 30,000-40,000 pages/day on a site with 3,600 users and 14,000 nodes) and we're on a 512 MB VPS with blackcatnetworks. We're thinking of moving to a 1 GB 1/4 VPS with rimuhosting, though, to speed things up a bit.

they already are blocks

ubersoft - April 21, 2007 - 12:18

Everything on my front page is a block. Most of them use blockcache -- there is one exception, because blockcache is apparently incompatible with views bookmarks.

Just want to echo the need

Aren Cambre - April 20, 2007 - 18:27

Just want to echo the need for more performance analysis.

One of my biggest gripes is how slow Drupal runs on shared hosting accounts. I hope a lot of work goes into query optimization with Drupal 6.

I need the same answers

canadrian@elect... - April 20, 2007 - 18:12

Subscribing to this thread because my problems are similar. I have a very low-traffic site, but I keep topping out my CPU usage allotment on my shared hosting account. I would be very interested in how to optimize a Drupal site when storage and bandwidth are not a problem, but CPU usage is.
----------
"Being tired is like playing mind games with some of the pieces missing."
- Canadrian

I am a proud member of the ElectricTeaParty.net online community.

Gonna be hard

joshk - April 24, 2007 - 17:01

Drupal really is an application and as such it takes CPU/RAM to run, much more so than disk space or bandwidth.

Unfortunately, most shared hosts still live on the assumption that their users are (primarily) serving up static page documents, in which case the only application in use is Apache.

------
Personal: Outlandish Josh
Professional: Chapter Three

Personally, I think your

dmuth - April 20, 2007 - 18:32

Personally, I think your webhost was out of line. If they are selling "unlimited MySQL accounts" and then going back on that, they are ripping people off. It is no different than Comcast selling "unlimited" Internet connections but shutting off high-bandwidth users. Granted, I'd understand if they shut off a customer due to abuse of the MySQL database, but I would hardly consider Drupal to be "abusive".

I would suggest finding another webhost that shows more clue in these matters. For example, my webhost allows unlimited MySQL usage -- but, if you are in the top 10% of MySQL users, they will bill you a surcharge of $0.01 per day. Not only are they up front about it, but they are reasonable about what is charged and their conditions for charging it.

My 2 cents.

-- Doug

shared hosting and too many database queries, not a good combo

dsp1 - April 20, 2007 - 20:11

The hole, I mean, whole ;) shared web hosting business sells unlimited accounts and bandwidth restrictions without really pointing out CPU usage. We as customers should demand more transparency on CPU usage from web hosts, but until the industry changes, we are stuck if we use shared hosting. If you get a notice from your web host that they will shut you down, that is pretty good by current standards; most users I have read about just got switched off, with no notice. I think it is wrong, but we need to ask for better from web hosts.

And ISPs are shutting off users for using too much bandwidth without telling them. Cox, for example, advertises high-speed internet at 7 Mbps or 15 Mbps and doesn't advertise that there is a limit to how much you can actually download or upload in a month; it is included in their LOS (limitations of service), if you can find it. I only found out after signing up and later reading about it on another web site.

I have noticed on my test server that the query counts are very high: 347 on the main page and 150 on other pages. I do have Devel and many other modules installed for testing.

Hi Dmuth, Thanks for the

misty3 - April 20, 2007 - 23:26

Hi Dmuth,

Thanks for the link to your webhost. It's certainly a very fresh approach to webhosting.

A few questions, if you do not mind :

1) Do you run Drupal there? It seems that Drupal+Gallery won't run there because of PHP safe mode etc. (see their FAQ)
2) What does "a" MySQL process actually mean? Do they have phpMyAdmin?
3) 24x7 support, good speed and uptime?

Best regards

1) Do you run drupal there ?

dmuth - April 22, 2007 - 22:51

1) Do you run Drupal there? It seems that Drupal+Gallery won't run there because of PHP safe mode etc. (see their FAQ)
2) What does "a" MySQL process actually mean? Do they have phpMyAdmin?
3) 24x7 support, good speed and uptime?

1) Yes, I run several Drupal websites there. (Link #1, Link #2, and Link #3) (Edit: I forgot to mention Gallery. I run the acidfree module instead. It integrates with the rest of Drupal nicely.)

2) I'm not super familiar with the latest MySQL terminology, but a process is basically an instance of MySQL. The above mentioned sites that I run all exist in the same "process", but each of them is under a different MySQL user so as to keep the data separate. Yes, they have phpmyadmin.

3) Support has regular business hours. However, they do monitoring 24/7. So if something breaks in the middle of the night, they usually know about it before you do. :-) I've been with them for about 4 years, and I can only recall 2 multi-hour outages over the years. Their infrastructure is robust enough to handle the approximately one Slashdotting a week that their sites tend to get. :-)

They also have member forums where you can ask questions of other members once you sign up.

-- Doug

Just a quick question, but

Southpaw - April 20, 2007 - 18:32

Just a quick question, but did you turn on caching?

Yes.

ubersoft - April 20, 2007 - 18:46

I had both CSS and site caching turned on, though I did not have aggressive caching turned on -- it reported that it was incompatible with a few modules I was using at the time.

But I need to point out that I ultimately hoped to have a large number of subscribers to my site -- some of the nice site features (bookmarking pages, editing user comments, some advanced search functions) were only available to subscribers -- and if I ultimately had a large number of subscribers, Drupal site caching would be useless because it doesn't affect subscribers at all.

I was also using the blockcache module -- unfortunately, blockcache doesn't affect raw nodes or full-page views, so for example someone who was browsing through my site archives wouldn't be able to benefit from the caching.

The provider had caching turned on in MySQL, and I assume that helped some. I did not have any PHP caching set up on my site, which I assume would have helped as well...

But I need to point out that

merlinofchaos - April 23, 2007 - 18:26

But I need to point out that I ultimately hoped to have a large number of subscribers to my site -- some of the nice site features (bookmarking pages, editing user comments, some advanced search functions) were only available to subscribers -- and if I ultimately had a large number of subscribers, Drupal site caching would be useless because it doesn't affect subscribers at all.

This is only true if you expect 100% of your hits to be from subscribed users.

If 75% of your hits are from subscribed users, that's still 25% that are getting cached pages for very little performance hit. And the 'slashdot effect', which is what really kills a site, brings a disproportionately high number of anonymous hits.
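Merlin's point as arithmetic: the average cost of a page view is a weighted mix of cached anonymous views and uncached logged-in views. In this Python sketch the 100-query uncached cost comes from the Devel figures in the original post, while the 5-query cost of serving a cached page is a hypothetical placeholder, not a measured Drupal number.

```python
def average_queries_per_page(anonymous_share, uncached_cost=100, cached_cost=5):
    """Expected queries per page view when only anonymous visitors get
    cached pages. The default costs are illustrative assumptions."""
    logged_in_share = 1.0 - anonymous_share
    return anonymous_share * cached_cost + logged_in_share * uncached_cost


for share in (0.0, 0.25, 0.9):
    print(f"{share:.0%} anonymous -> "
          f"{average_queries_per_page(share):.2f} queries/page")
```

Even at 25% anonymous traffic the average drops from 100 to about 76 queries per page, and during a link-driven spike where most visitors are anonymous (90% here) it falls to roughly 15.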

Finally, when aggressive caching reports that it is incompatible with some modules, it is being very conservative in reporting that there may be problems. The reason it is 'incompatible' with these modules is that they implement _init or _exit hooks (i.e., they operate on every page), and those calls won't be made when a cached page is served to an anonymous user. But that's often OK in those situations; it depends on what the module does with the hook. Most of the modules you're using that are 'incompatible' are just fine with aggressive caching.
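The difference can be modelled in a few lines. In this simplified Python sketch (based on my reading of how Drupal 5 serves cached pages, not its actual code; the hit-counter module is hypothetical), normal caching still runs each module's init and exit hooks around a cached page, while aggressive caching skips them:

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    hooks: dict = field(default_factory=dict)  # hook name -> callable


def serve_cached_page(modules, cached_html, aggressive):
    """Simplified model of serving a cached page to an anonymous visitor.
    Normal mode runs init/exit hooks around the cached page; aggressive
    mode skips them, which is what its incompatibility warning is about."""
    if not aggressive:
        for module in modules:
            if "init" in module.hooks:
                module.hooks["init"]()
    response = cached_html
    if not aggressive:
        for module in modules:
            if "exit" in module.hooks:
                module.hooks["exit"]()
    return response


# Hypothetical module that counts page hits in its exit hook.
hits = {"count": 0}


def count_hit():
    hits["count"] += 1


counter = Module("hypothetical_hitcounter", {"exit": count_hit})
serve_cached_page([counter], "<html>...</html>", aggressive=False)
serve_cached_page([counter], "<html>...</html>", aggressive=True)
print(hits["count"])  # 1: the aggressive-mode request was never counted
```

Whether the skipped call matters depends on the hook: a hit counter undercounts cached anonymous views, while a hook that only does work for logged-in users loses nothing.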

-- Merlin

[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]

Yeah...

ubersoft - April 23, 2007 - 18:47

Eaton pointed that out (and I think I remember you doing that as well, back when the Drupalized site was live) -- but there's no way to know that just by reading the warning.

static page caching

firebus - April 20, 2007 - 18:33

the real problem here is drupal's continuing lack of static page caching.

it's not that hard to do, there's a promising contrib module for this for 4.7 (http://drupal.org/project/boost), and in my opinion this is one of the most important roadblocks to larger scale adoption of drupal.

imo this should be a high priority for drupal 6, and devs should look to the smarty template system as a guide. smarty has an awesome static page caching system that provides a separate html cache file for each template - implementing something like this for drupal would allow for block level caching. you could have a page with dynamic content, but cached menus and blocks - just like block cache, but with static pages.

there's no reason for a site that receives primarily anonymous traffic to be doing a ton of php compilation and database calls on a page that doesn't change. database caching is the worst idea drupal ever had.
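The core of static page caching fits in a few lines: store each rendered anonymous page as a plain HTML file, and on later requests hand back the file without touching PHP or the database. This Python sketch illustrates the idea only; it is not the boost module's code, `handle_request` and `render_page` are hypothetical names, path sanitising is omitted, and in a real setup a web server rewrite rule would do the "does the file exist" check.

```python
import tempfile
from pathlib import Path


def cache_file_for(cache_root, path):
    """Map a Drupal path like 'node/42' to cache_root/node/42/index.html.
    A directory per path lets 'node/42' and 'node/42/comments' both be
    cached without a file/directory name clash."""
    return Path(cache_root) / path.strip("/") / "index.html"


def handle_request(cache_root, path, render_dynamic):
    """Serve the static copy if present; otherwise render it and save it."""
    target = cache_file_for(cache_root, path)
    if target.is_file():
        return target.read_text(encoding="utf-8"), "static"
    html = render_dynamic(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return html, "dynamic"


def render_page(path):
    # Stand-in for a full Drupal page render (PHP plus database queries).
    return f"<html><body>{path}</body></html>"


with tempfile.TemporaryDirectory() as root:
    print(handle_request(root, "node/42", render_page)[1])  # dynamic
    print(handle_request(root, "node/42", render_page)[1])  # static
```

What this leaves out is the hard part: deleting the file when the node, a comment, or a block on the page changes.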

It's also worth

sime - April 21, 2007 - 13:51

It's also worth noting
http://drupal.org/project/fastpath_fscache

I notice that boost seems to have a little more queue activity. It would be great to get some feedback from users of both boost and fastpath_fscache.

re: static page caching

mr700 - April 23, 2007 - 08:35

Static page caching would be a great thing. On our site all stories older than 1 month are locked (no more comments), so they virtually never change (except for the menu). I was wondering whether I could use mod_rewrite to serve static HTML, and create a static snapshot for worst-case scenarios (better static than nothing). It would be a 'wget -m' job if all Drupal links ended with a slash (node/nid -> node/nid/)... For now, we bought a better server and moved the database onto it (the database traffic was higher than the web traffic).

Surely older pages will not

budda - April 24, 2007 - 13:44

Surely older pages will not suffer from intensive access, so caching them as a static page is pretty worthless?

--
Ixis (UK) providing Drupal consultancy and Drupal theme design.

Not necessarily true.

ubersoft - April 24, 2007 - 13:54

My website (the drupalized example is currently down due to Stupid MySQL Admin Activities) is a webcomic, and the "older pages" are archived comics -- which are absolutely CRUCIAL to the site, especially for new and potential readers. I *want* people reading the older pages, and in any situation where I get more traffic than usual (if slashdotted, dugg, linked from reddit or, ahem from the front page of Drupal.org) it is actually more likely that people will browse through the older pages than in any normal day. In other words, during periods of high traffic those pages will be accessed more.

Of course, Drupal already takes care of that for the most part, since those new visitors will be mostly anonymous and therefore Drupal's cache will already be working, but my point is that depending on what content you're serving, older pages can be accessed fairly intensively.

Re: static page caching

bslade - July 4, 2007 - 04:45

Also see A file based cache utility for Drupal. It uses Apache URL rewriting to send a request to a static file, if one exists, or otherwise to a Drupal URL. A cron job periodically creates an index.html file that's a copy of the Drupal front page, and other HTML files which represent the pages of the N most recent postings (e.g. the story node with the various blocks around it that make up a whole page for an anonymous user).
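The cron half of that approach might look roughly like this Python sketch. It is a hypothetical illustration, not the utility's actual script: `regenerate_snapshots` and `render_page` are assumed names, `recent_paths` is assumed to be sorted newest first, and each file is written under a temporary name and then renamed so the web server never serves a half-written page.

```python
import tempfile
from pathlib import Path


def regenerate_snapshots(cache_root, recent_paths, render_page, n=10):
    """Hypothetical cron task: rewrite the front page and the N most recent
    posts as static files. render_page returns full anonymous-user HTML."""
    root = Path(cache_root)
    targets = [("", root / "index.html")]  # "" stands for the front page
    targets += [(p, root / p.strip("/") / "index.html") for p in recent_paths[:n]]
    written = []
    for path, target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(render_page(path), encoding="utf-8")
        tmp.replace(target)  # swap the finished file into place
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as root:
    newest_first = ["node/103", "node/102", "node/101"]
    written = regenerate_snapshots(
        root, newest_first, lambda p: f"<html>{p or 'front page'}</html>", n=2)
    print(len(written))  # 3: the front page plus the two newest posts
```

Older posts simply fall through the rewrite rule to Drupal, which is why this suits sites where most traffic lands on the newest content.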

In fact, Drupal.org needs something like this. It seems to run sort of slow.

PublicMailbox@benslade.com

Database drubbing

abbynormal - April 20, 2007 - 19:28

I agree that a single page shouldn't normally take 100 database queries to display. Everyone who has commented about the *serious* need for improved caching (including for logged-in users) is right and echoes some of my serious concerns with Drupal performance in an environment where users may be logged in.

I am not a database "guru", but I have a fair amount of database performance experience and I've seen a handful of modules and slightly-slow database query times render a site almost unusable for logged-in users.

-an

mod_cache

NT - April 20, 2007 - 19:36

Hi

Are you able to use something like Apache's mod_cache to cache the generated web pages to disk or memory?
This should reduce the number of database accesses (Drupal keeps its cache in the database).
I've not needed to try it (I've used other caching solutions with JSPs, which will not work with PHP).
The latest (online) documentation is very good.
This module may not be available on a shared server, but for those using a dedicated server, I think it would be worth trying.

Nick

A quick search of "mod_cache" in the Drupal search bar...

ubersoft - April 20, 2007 - 21:06

... suggests that in order to get any real benefit, you need to modify the bootstrap.inc file in the includes folder to keep Drupal from forcing reloads... and that people who suggest making that change get shouted at by Drupal developers. :)
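Why a proxy cache gains little unless those headers change: a shared cache may only reuse a stored page while the response says it is still fresh. The Python sketch below is a simplified freshness check, not mod_cache's real algorithm, and the example headers are an assumption modelled on what Drupal 5 appears to send for dynamic pages (`no-cache` in Cache-Control and an Expires date in 1978), so check your own site's responses before relying on it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def shared_cache_can_reuse(headers, now):
    """Simplified check of whether a shared cache could serve a stored
    response without going back to the origin server."""
    directives = {d.strip().lower()
                  for d in headers.get("Cache-Control", "").split(",")}
    if {"no-cache", "no-store", "private"} & directives:
        return False  # must revalidate, or may not be stored at all
    # s-maxage (for shared caches) takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        for d in directives:
            if d.startswith(name + "="):
                return int(d.split("=", 1)[1]) > 0
    expires = headers.get("Expires")
    if expires is not None:
        return parsedate_to_datetime(expires) > now
    return False  # no freshness information: assume it must revalidate


now = datetime(2007, 4, 20, 21, 6, tzinfo=timezone.utc)
drupal_style = {
    "Expires": "Sun, 19 Nov 1978 05:00:00 GMT",
    "Cache-Control": "store, no-cache, must-revalidate",
}
print(shared_cache_can_reuse(drupal_style, now))                             # False
print(shared_cache_can_reuse({"Cache-Control": "public, max-age=300"}, now))  # True
```

Under these assumptions, putting mod_cache in front of Drupal changes nothing until the application sends cacheable headers, which is exactly the bootstrap.inc change being argued about.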

100 queries per page is

the_other_mac - April 20, 2007 - 19:46

100 queries per page is quite normal for some web-apps (osCommerce springs to mind) but quite high for Drupal in my experience. Of course it depends on how you were using it; but looking at your current (static) site, it seems to me that there is little in the way of the sort of interactive content that makes database queries unavoidable. As long as users aren't logged in (and I guess that most of your pageviews would be from non-logged-in visitors), a page could be loaded almost entirely from cache.

If you had something like a "recently-viewed articles" block whose content constantly changes, that would be an example of something you might consider removing to optimise performance.

You said you didn't try aggressive caching, and that's a little strange. Caching is exactly what you need, and the nature of your site suggests it would be very effective. You might look at this thread on additional caching options: http://drupal.org/node/97347 .

As for your hosting provider - a typical hosting situation is to allow 1,000 accounts on a single server, each of which is entitled to (say) 10 GB of disk space and 100 GB of monthly transfer. But of course that doesn't mean the server has 10,000 GB of disk space, nor does the host pay its ISP for 100,000 GB of monthly bandwidth. They are overselling, safe in the knowledge that statistically most users will never use more than a fraction of their limits. (This applies to all hosting providers - no one could compete by being "honest" about their allowances.) If one of the 1,000 accounts happens to use, say, 10 GB of disk space, this could be 2% of the total available, but it wouldn't be a problem because everyone else is using less. So 4% of CPU usage might be over your limit, but it's not overwhelmingly unreasonable.
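The overselling arithmetic can be made concrete. The 1,000 accounts and the 10 GB / 100 GB allowances come from the post; the 500 GB of physical disk is an assumption back-computed from the post's own "10 GB could be 2%" figure, and the fair-share comparison assumes resources would otherwise be split evenly across accounts.

```python
def oversell(accounts: int, per_account_gb: float, physical_gb: float) -> dict:
    """Compare the capacity a host sells with the capacity it actually has."""
    promised = accounts * per_account_gb
    return {"promised_gb": promised, "oversell_ratio": promised / physical_gb}


def share_of_server(used: float, physical: float) -> float:
    """Percentage of the real server that one account is consuming."""
    return 100.0 * used / physical


def fair_share_multiple(used_pct: float, accounts: int) -> float:
    """How many times an even 1/accounts split a given usage percentage is.

    Assumes an even split is the baseline, which real hosts do not promise.
    """
    return used_pct * accounts / 100.0
```

`oversell(1000, 10, 500)` gives 10,000 GB promised against 500 GB present (a 20x oversell), and `fair_share_multiple(4, 1000)` shows the reported 4% is 40 times an even 1/1000 share, which helps explain why it stood out even though it sounds small.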

I wonder about your statement that you couldn't afford a virtual private server, given that you run advertising on your site: they start at under $50 per month. Or, as an in-between option, many hosts offer so-called "e-commerce" hosting, which is basically a shared package that allows for fairly high CPU usage.

A number of things...

ubersoft - April 20, 2007 - 20:24

Lots of good meat in your post to respond to. Thanks. :)

- The current static site is really stripped down and shouldn't be used as a baseline for what the drupalized site was. The drupalized site hosted three webcomics, and just by looking at the front page you could tell which of the comics had been updated (two were displayed as thumbnails on a sidebar via imagecache). There was also a poll that allowed readers to vote on various silliness, a link to an RSS feed to another site I run, and site news printed below the main comic. In order to do this I had to use the front page module and populate the front page entirely with views -- the default setup doesn't really allow you to separate out information on a page.

Also, I used taxonomies extensively -- each comic had taxonomies for the comic name, storyline, and characters appearing in the comic. By clicking on the storyline taxonomy, you were able to view all the comics in a given storyline and only those comics. Clicking on a character name allowed you to view all the comics that the character appeared in. This made it easier for my readers to search for specific comics.

Finally I created a rather complex archive system (again, using views) that allowed the reader to view a table listing all the comics published for a specific webcomic -- it displayed year, date, comic title, storyline -- which could also be filtered to only display comics published for a specific year.

Those features seemed to me exactly the sort of thing that webcomics ought to have, especially ones with extensive archives (I don't have the largest archives out there, but at 1400+ comics for Help Desk alone mine are legitimately large), and those are what caused a lot of the database work -- that, plus trying to figure out how to display various pieces of information on the front page and keep them there when they all updated at different times. Unfortunately the default approach that Drupal and every other CMS uses is to put all content into a single stream, and I actually had five or six separate streams -- three comics, site news, a poll, an RSS feed to another site...

- I didn't use aggressive caching because on the performance page the aggressive caching option clearly stated (in red, bolded text!) that it was incompatible with some of the modules I was running. That is the only reason I stuck with the default caching system. Right now the modules content, devel and token are listed as incompatible with aggressive caching. Both token and content are absolutely necessary for the functioning of my site (my comic nodes are created with CCK in order to make data entry easier, and token is used with the custom pagers module which makes navigation between individual comics adhere to the webcomic navigation standard).

That said, I did use site caching. I turned on normal site caching and CSS caching. I had no PHP caching because my host didn't support it at the time. MySQL caching was turned on.

One of the things that I think tripped me up is that I don't consider my site very high-volume -- not compared to really successful webcomics out there. The unofficial benchmark for the low end of successful for a webcomic is around 10K unique visits a day, and I'm not there yet.

- I'm somewhat encouraged by the responses here that my site may not have been as resource-hungry as I initially feared. It may be that I simply underestimated what my hosting solution could handle, and that a semi-dedicated, dedicated, or VPS server will be able to take it all in stride. I do think I need to figure out if it's possible to fine-tune the site a little bit first, though.

- My ad revenue is collected quarterly and isn't enough to pay for most of the standard semi-dedicated packages I ran across. Most virtual private servers (like you mention) require that you install and maintain everything yourself, which made it an extremely unattractive option at the time. Now I'm exploring the LAMP route and going through the unpleasant process of learning how it works -- at the time I was more interested in publishing my webcomic, but the experience has convinced me that in order to do that properly I need to learn more on the back end. That is going to be a ridiculously painful process, though, and if I'd been trying to do that AND learn to configure Drupal at the same time I would have given up completely. A human being can only take so much pain at one time... ;)

Darn that big, bold warning...

Eaton - April 20, 2007 - 20:47

I didn't use aggressive caching because on the performance page the aggressive caching option clearly stated (in red, bolded text!) that it was incompatible with some of the modules I was running.

We really need to look into making that warning more accurate. Technically, what it means is that modules that assume they will ALWAYS run every time a page is viewed will not actually run when an 'aggressively cached' page is viewed. Aggressive Caching scans for any modules that implement the "init" hook, and warns the user about them.

Both content.module and token.module ARE actually compatible with aggressive caching, as they implement the init hook but work fine even if the super-cached page is used, and they are ignored. Certain module features, like devel.module's query logging and statistics.module's logging of page-views, DON'T work with it -- in general, it's not that the cached page will break but that modules that want to log stuff, etc don't get a chance to.
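The difference Eaton describes can be modeled in a few lines. This is a simplified Python model, not Drupal's code: in "normal" mode the modules' init hooks still run before the cached page is returned, while in "aggressive" mode the cached page is returned before any module code runs, so a hook that logs page views is silently skipped. The module names and the hook registry are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry: module name -> its init hook (only some modules have one).
InitHooks = Dict[str, Callable[[List[str]], None]]


def modules_to_warn_about(hooks: InitHooks) -> List[str]:
    """Per the post, the warning simply lists every module implementing init."""
    return sorted(hooks)


def serve(url: str, cache: Dict[str, str], hooks: InitHooks, mode: str,
          log: List[str]) -> Optional[str]:
    """Return the cached page, running init hooks unless mode is 'aggressive'."""
    if mode != "aggressive":
        for hook in hooks.values():
            hook(log)  # e.g. a hypothetical page-view logger appends an entry
    # The cached HTML itself is identical in both modes; only the hooks differ.
    return cache.get(url)
```

A hook that only sets things up for later processing (the token/content case in the post) loses nothing when skipped on a cached page, whereas a logging hook simply stops recording those hits.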

Not sure if that helped at all, but it at least clarifies the somewhat vague-but-scary warning message on that performance screen. I ran into it a couple of times as well and did some investigation.

--
Lullabot! | Eaton's blog | VotingAPI discussion

Leave it to the technical writer

ubersoft - April 20, 2007 - 20:50

to take the documentation at face value. :)

Thanks for letting me know that. I'll be using aggressive caching in the future -- I won't be running devel when the site goes live again, and token and content were the only other modules it flagged.

tx

sime - April 21, 2007 - 14:04

Thanks Eaton, very useful tip.

Tips on money and hosting

Keyz - April 20, 2007 - 22:03

I'm not quite a Drupal ninja yet, so I can't really advise on that aspect... but what I do have is extensive knowledge of online advertising, and quite a bit about dedicated servers. Anyhow - you mentioned your site gets reasonably high traffic (I take it in the ballpark of, if not quite at, 10k per day). With traffic anywhere in that range, or even half that, it should be possible to make more than enough to pay for some higher-end hosting, provided you choose the right ads and set them up in the most effective way on your site. The niche (and thus the type of advertisers for it) dictates whether your traffic is worth hundreds or thousands of dollars a month, but at the least hundreds (in that traffic range). At your level of traffic, I would definitely move to the next step above virtual hosting as soon as you can - staying where you are will only limit your potential success and add to your stress wondering if your site will stay online.

Anyhow, I'll be happy to go into more detail if you're interested - though just to start you off... consider changing to (or adding) Google AdSense or Yahoo Publisher Network ads. Contextual ads (and text ads in particular) tend to receive substantially higher click-through rates (and of course, keep your CPM ads too if you can do so tastefully). How and where the ads are placed throughout your site, and what colors you use (always set the links to the same color as your site's links, and usually opt for no border or background color), makes a very significant difference (e.g. just changing placement can improve your results by 2-6x). Also consider a few other "less obvious" directions that would fit nicely with your site's audience... for instance, join the Amazon affiliate program and look up your favorite comic-related books, or perhaps some books you could recommend to people interested in drawing their own comics (you get the idea). Add both a general block of these to your site's sidebar and a "Recommended books" type of section to your site, and write up some personal reviews of the books you've chosen. You're an expert in your field, so many of your readers will give greater weight to your recommendations and consider buying what you suggest. You can also suggest that fans of your comic make all their Amazon purchases through your link to support the site. Doing the above should help you make at least several hundred more per month.

So far as hosting, probably a good match for your needs right now would be grid hosting from MediaTemple or Mosso:
http://www.mediatemple.net/webhosting/gs/
http://www.mosso.com

With these you'd get substantially higher resource allocations than with plain virtual hosting (and you actually get it - not empty promises of unlimited this and that), and would not have to concern yourself whatsoever with the intricacies of managing your own dedicated server for the time being (which is not worth it yet for you, I'd advise). You could also consider a dedicated-virtual server from MediaTemple: http://www.mediatemple.net/webhosting/dv/
Whatever you choose, go do a quick search on www.webhostingtalk.com for the company first to get the most recent feedback about them. If you "do" end up going with your own dedicated server eventually - be sure to have it secured properly (e.g. rack911.com is usually highly recommended, etc).

-- Dave

Isn't it all about caching?

fischermx - April 21, 2007 - 00:14

100 queries per page is a bit too much, no matter what site it is.

logging into the database

znerol - April 21, 2007 - 00:24

I too had problems with a moderate-traffic Drupal site recently, because Drupal logs errors to the database. The site lived on a server at a shared hosting provider which was not very well managed (too many users, bad security, ...). Several times the MySQL connection "went away", and once or twice so did the cache table of my Drupal installation. As a result, thousands of error messages were triggered, which put additional strain on the database. After a few minutes the server went down completely.

In some circumstances it would be useful to be able to log errors to a logfile instead of the database. Logging to the database makes things worse if something goes wrong with the database.
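A minimal sketch of what znerol asks for, using Python's standard `logging` module as a stand-in for Drupal's watchdog (an assumption for illustration, not Drupal's API): errors go to a plain file instead of the database, and a filter suppresses repeats of an identical message within a time window, so a broken cache table cannot turn into thousands of identical writes. The window length, logger name and class name are hypothetical.

```python
import logging
import time
from typing import Callable, Dict


class RepeatFilter(logging.Filter):
    """Drop a message if the identical text was already logged within `window` seconds."""

    def __init__(self, window: float, clock: Callable[[], float] = time.monotonic):
        super().__init__()
        self.window = window
        self.clock = clock
        self._last_seen: Dict[str, float] = {}

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        now = self.clock()
        last = self._last_seen.get(msg)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: suppress it
        self._last_seen[msg] = now
        return True


def file_logger(path: str, window: float = 60.0) -> logging.Logger:
    """Logger that writes to a file, so logging never depends on the database."""
    logger = logging.getLogger("site_errors")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep these records out of any other handlers
    handler = logging.FileHandler(path)
    handler.addFilter(RepeatFilter(window))
    logger.addHandler(handler)
    return logger
```

The de-duplication matters as much as the file target: even a file log can fill a disk if every page hit writes the same "Can't open file" line.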

Not quite

Eaton - April 21, 2007 - 01:51

Complete "Couldn't contact the database" errors -- like you'd see when the DB server goes down -- are not logged to the database, but if DB contact is *sporadic* throughout the page's lifecycle, you could see the errors you described.

Drupal 6 now has a more flexible error logging system that does just what you're talking about -- I'm not sure if there's a chance it will be backported to Drupal 5, but it's something to look into.


To clarify this: the real

znerol - April 21, 2007 - 12:13

To clarify: the real problem was not that the database went away* and Drupal logged into nirvana; the problem was when the database came back but was corrupted (the cache table was no longer accessible). Every page hit after the database came back generated a lot of log entries**, which (I believe) brought the server to its knees. However, I can't tell whether this was really the problem, because the people at the provider were not able/willing to provide detailed information.

* logmessage: "MySQL server has gone away"
** logmessage: "Can't open file: 'cache.MYI' (errno: 145) query: ..."

Oh, man.

Eaton - April 21, 2007 - 18:43

* logmessage: "MySQL server has gone away"
** logmessage: "Can't open file: 'cache.MYI' (errno: 145) query: ..."

Yeah, that's an indication that the server's database is seriously hosed. Drupal probably should be able to deal with the scenario more gracefully, but no matter how you cut it that setup will render a piece of software unusable until the DB is properly restored.

If that happened, drupal logging its error messages was the least of the server's problems...


Adding to track

vkr11 - April 21, 2007 - 02:19

Just tracking

Get a VPS

Hyper - April 21, 2007 - 02:30

I don't see why money is an issue. You can get a fully managed VPS from about $50. Try servint.

Load by page vs load by object

crystalcube - April 21, 2007 - 07:02

Hi,

This is indeed a very important topic. I am in a similar situation, except that, having a somewhat technical background, I only moved my site to Drupal for a very short time.

For background: I host a "small" forum based on phpBB. I was initially on shared hosting, but I moved to a dedicated server some time ago. I realized that if I moved to Drupal my shared hosting account wouldn't be able to handle the load. I have been trying since the days of Drupal 4 but have not made the switch yet.

Reasons why I wanted to move to Drupal: many here might be aware that to add features to phpBB, you have to change the code of many files. That means when a new patch is released you have to carefully redo everything. My last update took me days to move to the latest version. As the patches are security-related you can't avoid them. All this can be very painful.

Then comes Drupal, which has a very well-designed, elegant API that lets you extend functionality "very" easily. Therein lies the catch: since it leaves everything in the hands of module developers, not all modules will be equally well developed, causing uneven performance.

Those who feel caching will solve the problem are only partially right. Part of the blame lies with the design of Drupal itself. I will try to explain, but the core issue is loading by object vs. loading by page. Drupal is designed to load by object. Let's see how that affects performance.

Take a simple case: let's say there are 10 nodes. To display the main page:
Drupal will load 10 nodes (1 query)
For each node it will read the author info (10 queries)
For each node it will read the count of comments (10 queries)

So that makes it 21 queries just to load 10 nodes, with only the comment module enabled. Depending on the number of modules you add, you could be looking at 10 more queries per module.

Alternatively, if Drupal could load items in groups, it would look like this:
Drupal will load 10 nodes (1 query)
Read user info for all 10 nodes (1 query)
Read the count of comments for all 10 nodes (1 query)

Now that's 3 queries vs. 21 queries.
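crystalcube's count can be checked on a toy database. The sketch below uses Python's built-in sqlite3 with simplified stand-ins for the node, users and comments tables (the table and column names are assumptions, not Drupal's real schema) and counts the queries issued by the per-object approach and by the grouped approach, which uses `IN (...)` and `GROUP BY`.

```python
import sqlite3
from typing import List, Sequence, Tuple


def make_db(n: int) -> sqlite3.Connection:
    """Toy schema (hypothetical, simplified) with n nodes, each by its own user."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, title TEXT);
        CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER);
    """)
    for i in range(1, n + 1):
        db.execute("INSERT INTO users VALUES (?, ?)", (i, f"user{i}"))
        db.execute("INSERT INTO node VALUES (?, ?, ?)", (i, i, f"Comic {i}"))
        for _ in range(i % 3):  # 0, 1 or 2 comments per node
            db.execute("INSERT INTO comments (nid) VALUES (?)", (i,))
    return db


class CountingConnection:
    """Wraps a connection and counts every query it runs."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        self.queries = 0

    def q(self, sql: str, params: Sequence = ()) -> List[tuple]:
        self.queries += 1
        return self.db.execute(sql, params).fetchall()


def load_per_object(c: CountingConnection) -> List[Tuple[str, str, int]]:
    """1 query for the list, then 2 more per node: 1 + 10 + 10 = 21."""
    nodes = c.q("SELECT nid, uid, title FROM node ORDER BY nid LIMIT 10")
    out = []
    for nid, uid, title in nodes:
        name = c.q("SELECT name FROM users WHERE uid = ?", (uid,))[0][0]
        count = c.q("SELECT COUNT(*) FROM comments WHERE nid = ?", (nid,))[0][0]
        out.append((title, name, count))
    return out


def load_grouped(c: CountingConnection) -> List[Tuple[str, str, int]]:
    """1 query for the list, 1 for all authors, 1 for all comment counts: 3."""
    nodes = c.q("SELECT nid, uid, title FROM node ORDER BY nid LIMIT 10")
    uids = [uid for _, uid, _ in nodes]
    nids = [nid for nid, _, _ in nodes]
    names = dict(c.q(
        f"SELECT uid, name FROM users WHERE uid IN ({','.join('?' * len(uids))})", uids))
    counts = dict(c.q(
        f"SELECT nid, COUNT(*) FROM comments WHERE nid IN ({','.join('?' * len(nids))}) "
        "GROUP BY nid", nids))
    # Nodes with no comments are absent from the GROUP BY result, hence the default 0.
    return [(title, names[uid], counts.get(nid, 0)) for nid, uid, title in nodes]
```

Both functions return the same rows; only the number of round trips differs, and the gap widens with every module that adds its own per-node query.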

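From the development core code in node.module, function node_page_default():
Original code:

    while ($node = db_fetch_object($result)) {
      // One node_load() per row; each call runs its own queries.
      $output .= node_view(node_load($node->nid), 1);
    }

Modified code (a proposal; all_node_load() is hypothetical and does not exist in core):

    // Gather the nids first (there is no db_fetch_objects() in core).
    $nids = array();
    while ($row = db_fetch_object($result)) {
      $nids[] = $row->nid;
    }
    // Hypothetical: load all node objects in a few grouped queries.
    $nodes = all_node_load($nids);
    foreach ($nodes as $node) {
      $output .= node_view($node, 1); // 1 = teaser, as in the original
    }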

This is just an idea, not something I have experimented with yet. I am not an expert at the Drupal API level, but I have spent some time trying to make it work on my test system. It may be a stupid idea, but someone more experienced can comment on it.

But I have seen Drupal load the same objects again and again on the same page.

Now, I know many people will say that this has been working for xyz and even for drupal.org itself, but they should understand that not everyone can host on multiple servers. Most of us can only afford shared hosting, and unless the number of queries is actively reduced, this problem will not go away. Caching will help, but nowhere near as much as actually reducing the number of queries.
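For the repeated-loading problem, one common mitigation is a per-request static cache: the first load of an object runs the query, and later loads of the same id in the same request return the stored copy. A minimal Python sketch; `fetch_from_db` is a hypothetical stand-in for the real query, and the class is illustrative rather than Drupal's own loader.

```python
from typing import Callable, Dict


class StaticLoader:
    """Per-request object cache: each id hits the database at most once."""

    def __init__(self, fetch_from_db: Callable[[int], dict]):
        self.fetch_from_db = fetch_from_db  # hypothetical function that runs the query
        self._loaded: Dict[int, dict] = {}

    def load(self, obj_id: int) -> dict:
        if obj_id not in self._loaded:
            self._loaded[obj_id] = self.fetch_from_db(obj_id)
        return self._loaded[obj_id]

    def reset(self) -> None:
        """Clear at the end of the request (or after a write) to avoid stale data."""
        self._loaded.clear()
```

Loading ids 1, 2, 1, 1, 2, 3 through one loader runs only three queries; the cost is that a write made mid-request must call `reset()` (or update the cached copy) to stay consistent.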

Now that is a great idea

GiorgosK - April 21, 2007 - 14:51

I am wondering if it's actually possible.
------
GiorgosK
Web development/design blog part of the world experts network

There has been some

merlinofchaos - April 23, 2007 - 18:34

There has been some discussion of doing this with the node hooks, but to my knowledge no one has stepped up to write the patch to do this. This is one place where someone interested in really doing valuable work for the Drupal community could step in and provide some help.

-- Merlin

[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]

Is this something you could propose at a higher level?

johnnybegood - April 23, 2007 - 19:54

I've been following this bit of the thread to see if you receive any replies from more experienced Drupal developers, and I wonder if you plan to test what you've found and share your results with the community.

So far nobody said it's a stupid idea ;)

Cheers

I would Like to Try.

crystalcube - April 24, 2007 - 01:48

I definitely want to work in this area. I can find my way around code, but I still don't know my way around the Drupal community ;)
This change will work best when it is also done in modules. So my initial thought is to try it against a set of modules (maybe around 10) and compare results. If I can get some input regarding test scenarios, I will try to test them out. This may give some statistical information which can be discussed further.

The best place to start...

johnnybegood - April 24, 2007 - 09:32

developing for the community would be http://drupal.org/contribute/development

Some technical questions

al4711 - April 21, 2007 - 09:06

Hi,

Nice post, thanks for sharing ;-)

I have looked into the current site and have some questions:

1.) the js was from jquery or in every page or in a/some .js files
2.) I hope the pics are not in the db, isn't it ;-)?
3.) apache can deliver the css/js/png without asking drupal
4.) Have your provider give you some infos about the mysql like:

  • query_cache_*
  • max_user_connections
  • version
  • table_cache
  • key_buffer_size
  • ...

5.) is a slow-query log available?
6.) what php-settings was used e.g.: mysql*, accelerator, ...
7.) Is it possible to see the drupalied site?

I must also say that I'm new to Drupal, but I have done a lot with PHP/Perl/... and some web servers like nginx/apache/lighty.

Maybe I haven't fully understood your site or Drupal, but the new Drupal book is on its way to me ;-)

BR

Aleks

Er...

ubersoft - April 21, 2007 - 12:11

You're going to need to explain your questions a little before I can answer them...

1.) the js was from jquery or in every page or in a/some .js files

What do you mean by js? Javascript? And how do I find the answer to your question?

2.) I hope the pics are not in the db, isn't it ;-)?

The pictures are in the files directory, which is Drupal's default location. If you mean something else you're going to have to explain your question a little more.

3.) apache can deliver the css/js/png without asking drupal

I publish a webcomic. The primary content on my site is graphics -- if I have to find a way to add the graphics around Drupal instead of using it directly, there's really no point in using it at all.

4.) Have your provider give you some infos about the mysql like:

* query_cache_*
* max_user_connections
* version
* table_cache
* key_buffer_size
* ...

It's all moot since I've moved back to my old host -- I don't have access to the settings for the site when I was running Drupal. What would I do with that information once I got it?

5.) is a slow-query log available?

what is a slow-query log?

6.) what php-settings was used e.g.: mysql*, accelerator, ...

er... it was running php5. there were no php caching tools in use because the host provider didn't have them.

7.) Is it possible to see the drupalied site?

Yeah, but you'd have to stop by the house. :)

Detail explanation (rather long)

al4711 - April 21, 2007 - 13:57

What do you mean by js? Javascript? And how do I find the answer to your question?

Yes I mean Javascript.
What I mean is: Do you have .js files which apache can deliver directly to the customer or are they embedded?

The pictures are in the files directory, which is Drupal's default location. If you mean something else you're going to have to explain your question a little more

I publish a webcomic. ...

Most web servers are very good and fast at delivering files from the hard disk to the network ;-).
If it is possible to deliver $CONTENT (pic, css, js, html, ...) directly to the network without asking the backend, then this option should be used.

  1. nginx (see "Nginx, FastCGI, PHP, rewrite config for Drupal")
  2. apache + mod_rewrite with -f/-d, as in the .htaccess that comes with Drupal
  3. lighty + mod_rewrite

Let's say your pic is under ..../images/.../$PIC.png and the customer makes a request to a Drupal site. Which way does the request for the pic go?

  1. apache (accepts request) => asks Drupal "please tell me where the pic is" => Drupal returns the pic or its location
  2. apache (accepts request) => sees in its config that files with $PIC's extension are on the hard disk => delivers it
  3. A way I don't know

Or, more simply: have you used the .htaccess ;-)?
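The second path in al4711's list is what the -f/-d conditions in Drupal's .htaccess set up: a request is handed to index.php?q=... only if it does not match an existing file or directory. The decision can be sketched in Python; the function is an illustration of the rewrite logic under that assumption, not Apache code, and the example paths are hypothetical.

```python
import os
from urllib.parse import quote


def route(docroot: str, request_path: str) -> str:
    """Mimic the !-f / !-d rewrite conditions: real files bypass Drupal entirely."""
    rel = request_path.lstrip("/")
    candidate = os.path.join(docroot, rel)
    # The empty path (front page) is left to index.php, as Apache's DirectoryIndex would.
    if rel and (os.path.isfile(candidate) or os.path.isdir(candidate)):
        return "static:" + rel  # Apache sends the file; PHP and MySQL are never touched
    return "php:index.php?q=" + quote(rel)  # only non-file paths reach Drupal
```

So a comic image stored under files/ costs Apache a disk read, while a page like comic/1400 costs a full Drupal bootstrap; with the stock .htaccess in place, graphics uploaded through Drupal are already served the cheap way.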

What would I do with that information once I got it?

what is a slow-query log?

Well, MySQL has some optimization possibilities (see "Tuning Server Parameters"). Maybe someone from the hosting provider can help us/Drupal find the top 5 queries that reach MySQL most often or take a very long time to answer; maybe those queries could be optimized?
A slow-query log is written by MySQL if it is configured to do so; more here: "The Slow Query Log" ;-)
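Once a slow-query log (or any list of query timings) is available, finding the "top 5" usually means normalizing queries so that the same statement with different literal values is counted together. A rough Python sketch, assuming the timings have already been extracted into (sql, seconds) pairs; parsing MySQL's actual log format is left out, and the regex-based normalization is a simplification that will not handle every SQL literal.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(sql: str) -> str:
    """Replace literals with ? so queries differing only in values group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)            # bare numbers
    return re.sub(r"\s+", " ", sql).strip().lower()


def top_queries(timings: Iterable[Tuple[str, float]],
                n: int = 5) -> List[Tuple[str, int, float]]:
    """Return (normalized query, count, total seconds), most expensive first."""
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)
    for sql, seconds in timings:
        key = normalize(sql)
        count[key] += 1
        total[key] += seconds
    ranked = sorted(total, key=total.get, reverse=True)
    return [(q, count[q], total[q]) for q in ranked[:n]]
```

Ranking by total time rather than by the single slowest execution surfaces cheap queries that run hundreds of times per page, such as per-link alias lookups.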

it was running php5

Well, is mysqli or mysql used?!
If you ask me how to find this out: sorry, but I'm new to Drupal. Can anybody else answer this? ;-)

From what I've read in your posts, you use a lot of modules, right?
Can you tell us which modules you use?

Keep on, for experience makes the master.

inforeto - April 21, 2007 - 11:21

Migrating large static sites means that many visitors suddenly have access to all the added functionality.
Such a large but static site fits nicely in a shared hosting account, but its dynamic equivalent won't.
Conversely, Drupal can power many kinds of smaller sites with a small expenditure of resources.
And there are middle grounds and plenty of suggested solutions, optimizations, etc., but making the jump still requires careful planning.

To find bottlenecks there are three main points to consider: queries, cache and resources.

Using devel you can see which queries are being run, not only how many.
It won't matter if there are hundreds, if most of them are calls for simple items like cached blocks and menu links.
But there's always a handful of slow queries, besides the actual content work like views and category.
Things like statistics and cache write to the database intensively and can take a toll on speed.

Now, these queries happen on every page read, so tuning the navigation is important.
An archive with a calendar can prompt more browsing than needed.
Galleries, forums, categories or searchable content add functionality and visitors but also put the database to work.
Many of these things are cached, but there can be so many items that the cache can't keep up.

The benefit of the cache depends on the amount of content, as the cache expires every day.
The number of nodes that could be statically cached is often insignificant compared with the many dynamic pages needed for navigation.
Dynamic listings, like archive pages, can make page reads build up quickly and probably won't benefit from any kind of caching either.
Nodes and pages with static content are fine, but caching them won't really help if other pages are slowing the site down.
Ideally, the site must be optimized at this point, rather than relying on query counts and caching alone.

This is also part of the reason why pages aren't cached for authenticated users.
As a middle ground, cached pages can still be shown to users if pulled from a multisite mirror.
But the real issues arise with content that can't really be cached, or isn't cached yet when viewed.
For example, I have a site with 500 nodes, spread across 300 categories with sortable views.
Every category page has a pager with links to the first ten pages of results, 10 nodes listed per page, and filters by price, location and brand.
This means there are multiple pages to be cached for the same set of nodes.
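The multiplication is what thins out the page cache here. Using inforeto's numbers (300 category pages, up to 10 pager pages each) and assuming, purely for illustration, that each of the three filters (price, location, brand) has 4 values plus an "unset" option, the count of distinct cacheable listing URLs is a product:

```python
from math import prod
from typing import Sequence


def distinct_listing_urls(categories: int, pager_pages: int,
                          filter_values: Sequence[int]) -> int:
    """Count the distinct listing URLs a page cache would have to hold.

    Each filter is either unset or set to one of its values, so a filter with
    v values multiplies the count by (v + 1). Sort orders are ignored here,
    and the per-filter value counts are hypothetical inputs.
    """
    return categories * pager_pages * prod(v + 1 for v in filter_values)
```

Without filters that is 3,000 pages; with the assumed 4 values per filter it is 375,000, against a cache table that, per the post, reaches about 15,000 entries by the end of the day. Under those assumptions most listing requests still miss the cache, for only 500 nodes.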

That makes the site accumulate entries in the cache table, and while those build up it behaves as if there were no cache at all:
It always crashes in the hours following a cache clear. In the next few hours it collects about 2000 saved entries.
Hopefully, before the peak hour it gets to 6000-8000 and no longer crashes. By 9000-12000 it runs like a charm.
By the end of the day the cache table has 15000 entries, but the cycle is bound to repeat because the daily addition of new nodes changes all pages:
most lists are sorted by date, so it would happen even if cache entries never expired.

While the cache is being built, RAM usage is higher than in the hours afterwards.
I hit the limit on a VPS, so the choice of shared or non-shared doesn't make the difference.
A shared account with large resources would be nothing more than the cost of management and resources put together.
Both VPS and shared accounts have pooled resources, like burstable RAM, but in practice this is never available.

Where do the resources go? I can't tell, because I haven't compiled my Apache server to report memory usage per query.
But estimates can be made. I see 10 httpd processes holding 24 to 32 MB of RAM each, and more at peak hours.
Using "systat", the log shows that the active processes, most of them httpd, go up from an average of 10 to 20-30 at peak hours, before hitting the RAM limit of 512 MB and crashing.
As my visitors double at peak hours, I can see why resource usage doubles as well, but fine-tuning everything still takes some work.
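These memory figures lead to the usual back-of-the-envelope limit for Apache's MaxClients setting: divide the RAM available to Apache by the size of one httpd process. The 512 MB limit and the 24 to 32 MB per process come from the post; the 128 MB reserved for MySQL and the OS is a hypothetical figure chosen for illustration.

```python
def ram_needed_mb(processes: int, per_process_mb: int) -> int:
    """RAM held by a given number of httpd processes of a given size."""
    return processes * per_process_mb


def max_httpd_processes(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """How many httpd processes fit before memory runs out (rounded down).

    reserved_mb is whatever must stay free for MySQL, the OS, etc.;
    the value used in the example below is an assumption.
    """
    return max(0, (total_ram_mb - reserved_mb) // per_process_mb)
```

With the post's figures, 10 processes need 240 to 320 MB, but 20 to 30 processes need 480 to 960 MB, which is consistent with crashes at a 512 MB limit. With the assumed 128 MB reserve, `max_httpd_processes(512, 128, 32)` returns 12: capping Apache near that number makes extra visitors queue instead of pushing the server out of memory.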

More CPU supposedly makes these processes finish faster, releasing RAM that is currently held longer than necessary.
That's only available on a dedicated server. At that level there's more control over all resources, but it still takes some tuning to plan for scalability.
To avoid conflicts, it'd be good to have the database run with its own resources, on a smaller VPS or a similar separate account.
Perhaps images could be served through lighttpd, or also from a separate account, which can't be done in a shared environment.
In any case, running a Drupal site under heavy traffic requires careful planning, but the features are worth it.

Are you on my server?

harriska2 - April 21, 2007 - 15:08

I've been with a2hosting for almost 2 years. Over the past month they have gotten to where the server is down several times a day, including email. About a week ago, the mysql server literally lost my database (it was there but the tables were gone) for about 1/2 an hour. Scared me good.

Late last year they said I was causing a problem with their servers using up too much resources. They asked me to state what my websites were used for. I ended up upgrading immediately to patch some security issues that had recently come out. Since then I've been tracking their CPU usage which is always in the red and constantly hovering now around 6 or 7 (2 is green).

One of my sites uses a ton of memory. It has OG, views, and CCK. This site requires over 30MB of memory but there really isn't a ton of content on it.

I agree that MySQL database connections and bandwidth aren't trackable on a2hosting. But for $12 a month and hosting 20 databases and 10 virtual domains, what can I do?

(subscribing)

magico - April 21, 2007 - 17:03

(subscribing)

Excellent reflection!

AmyStephen - April 22, 2007 - 12:57

Database Activity: the 40-ton feather that broke the camel's back

We found the same to be true. When HarryB built Open Source Community for us, we were so thrilled with what Drupal was able to do! It was our first Drupal site and we were so impressed. (Of course, HarryB is nothing short of brilliant and has *years* of experience with PHP-Nuke, PostNuke, Joomla!, and WP, so that helped!)

But before going live, with only a few of us online, we noticed serious performance issues. Long story short, we were forced to find dedicated hosting or move away from Drupal. The number of database calls and the latency of querying a remote database (i.e., using Drupal in a setup like most inexpensive hosts provide) is a back breaker.

Other than that, Drupal rocks and we love it! Thanks to all who have contributed to its success!

AmyStephen@gmail.com
http://OpenSourceCommunity.org

Wonderful explanation

bs - April 22, 2007 - 05:40

Hi,
This is a very important problem we (I mean Drupalers) are facing. Wouldn't it be useful to rewrite some functions like variable_get, watchdog, the access log, etc. so they can use other storage methods instead of the database, for example files, an XML schema, etc., to reduce database load?
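bs's suggestion amounts to putting storage behind an interface so the backend can be swapped without touching callers. A minimal Python sketch of that idea with a hypothetical JSON-file backend for settings; none of these names exist in Drupal, and this does not reproduce how Drupal's variable_get() actually works.

```python
import json
import os
from abc import ABC, abstractmethod
from typing import Any, Dict


class VariableStore(ABC):
    """Hypothetical storage interface; callers never see which backend is used."""

    @abstractmethod
    def get(self, name: str, default: Any = None) -> Any: ...

    @abstractmethod
    def set(self, name: str, value: Any) -> None: ...


class JsonFileStore(VariableStore):
    """Keeps settings in a JSON file, read once and held in memory."""

    def __init__(self, path: str):
        self.path = path
        self._data: Dict[str, Any] = {}
        if os.path.exists(path):
            with open(path) as fh:
                self._data = json.load(fh)

    def get(self, name: str, default: Any = None) -> Any:
        return self._data.get(name, default)

    def set(self, name: str, value: Any) -> None:
        self._data[name] = value
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(self._data, fh)
        os.replace(tmp, self.path)  # atomic swap: readers never see a half-written file


def get_setting(store: VariableStore, name: str, default: Any = None) -> Any:
    """Hypothetical helper with the same call shape regardless of backend."""
    return store.get(name, default)
```

A file backend removes database reads but introduces its own concern: several PHP processes writing at once need locking or atomic replacement, which is part of why such a change needs care in a shared codebase.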

OK...

ubersoft - April 22, 2007 - 12:23

for those of you who are looking for more specific information...

I've set up the drupalized version of the site here: http://208.75.86.76

I have also set up the Devel module so that anonymous viewers can see the statistics in the footer. Have at it -- any and all observations welcome.

Some thoughts...

Eaton - April 22, 2007 - 13:46

It's very interesting that the number of queries being generated *isn't* hellaciously large by the standards of many content-intensive sites. A couple of tweaks to optimize the path alias loading could probably get it under 100 queries for the page, and aggressive caching would drop that to just one or two queries for anonymous visitors.
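One way to picture optimizing path alias loading: instead of one alias lookup per link on the page, fetch the aliases for all of the page's paths in a single query and answer later lookups from memory. A Python/sqlite3 sketch, assuming a simplified url_alias(src, dst) table; it illustrates the batching idea and is not Drupal's drupal_lookup_path().

```python
import sqlite3
from typing import Dict, Iterable


class AliasLookup:
    """Resolve system paths (e.g. node/12) to aliases using batched queries."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        self.cache: Dict[str, str] = {}
        self.queries = 0

    def preload(self, paths: Iterable[str]) -> None:
        """Fetch every not-yet-known path in one query."""
        wanted = [p for p in set(paths) if p not in self.cache]
        if not wanted:
            return
        marks = ",".join("?" * len(wanted))
        self.queries += 1
        rows = self.db.execute(
            f"SELECT src, dst FROM url_alias WHERE src IN ({marks})", wanted).fetchall()
        found = dict(rows)
        for p in wanted:
            self.cache[p] = found.get(p, p)  # no alias: keep the system path

    def lookup(self, path: str) -> str:
        if path not in self.cache:
            self.preload([path])  # fallback: single-path query for anything missed
        return self.cache[path]
```

If the page knows its links up front (menu items, pager links, storyline and character terms), `preload` turns what could be dozens of identical-shape queries into one.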

I'm wondering if there were/are subtler issues at play, too, when higher traffic/load hits the site...

In any case, first I'm sorry that your initial experience has been so troublesome, but second I want to thank you for the tremendous service you're giving to the community in troubleshooting and brainstorming on a mid-traffic site. This is just the sort of environment that is trickiest to optimize due to the constraints of shared hosting setups and we can all use all the knowledge we can get!


Interesting...

ubersoft - April 22, 2007 - 15:34

This has given me something to chew on.

For the purpose of the devel module I turned off Drupal's caching altogether, since I figured that would make devel's statistics useless to any of you who came by.

How does one go about optimizing path alias loading? I'm afraid I wasn't even aware that it was an issue.

That said, it's sort of a relief to hear that compared to other content-intensive sites it's not too bad. Unfortunately that suggests that the best solution at the moment is more hardware. The place I've linked to in the above post is on slicehost.com, which is a virtual hosting service, so it'll probably handle the load better than a2hosting did, but if it turns out to be not enough I can't afford anything better at the moment...

If aggressive caching can really cut down on all those queries in that manner (from 100 to 2), then I'll definitely be turning it on when the site goes live again. But I have to ask again why there is so much resistance in the Drupal development community to have cached pages for users who are logged in as well? My site isn't set up to let registered users customize the site layout. At the moment it just lets them edit their own posts and use the Views bookmarks feature, though in the future it will also let them create metatags of their own (honestly, Drupal's taxonomy system alone makes it worth the trouble I've been having so far). It seems to me that if an anonymous user generates two database calls on the front page and a registered user generates 80-140, then I'm going to want to discourage registered users instead of encouraging them. That seems counterproductive...

If my experiences can help with Drupal's development and improvement, then I'm happy to have had them, at least on one level... I'm not a programmer, so I can't actually code anything useful for you guys, but I'm pretty good at describing how things blow up when I touch them...

...

sepeck - April 22, 2007 - 22:56

"But I have to ask again why there is so much resistance in the Drupal development community to have cached pages for users who are logged in as well?"

What resistance? You, or someone who thinks it's easy, are welcome to implement a contrib module that does this. Just because a random forum poster claims something is possible, or merely asks "why isn't this done?" a few posts up, doesn't make it easy or reduce the associated cost.

People who say those things don't really understand the issues involved. Your comment alone indicates you are only thinking from your site's setup. Setting up caching for logged-in users involves building an engine that scales across all the variations of user roles. Why cache an individual's personal tracker for all 100,000+ users of drupal.org? How do you deal with queries in pages that check for roles?

Drupal 5 has a default cache setup that is pretty good. It has a pluggable caching architecture into which you can plug different caching methods. People are perfectly welcome to build their own custom cache methods that leverage these APIs. Of course, many of those who say these things only confuse the issue and demonstrate that they do not understand the complexities involved. For something to go into core, it has to be a flexible base API. The entire community would welcome a contributed module or working examples to actually judge performance against.

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain
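A rough sketch of why caching for logged-in users is harder than it looks: a shared cache has to key each page on everything that can change its output. The hypothetical `RoleKeyedPageCache` below (Python, for illustration only; not a Drupal API) keys on the path plus the visitor's set of roles, which is only safe for pages that vary by role alone. Per-user content such as a personal tracker would need the user ID in the key, which means no sharing at all.

```python
class RoleKeyedPageCache:
    """Hypothetical cache that shares rendered pages among users with the same roles.

    Only safe when a page's output depends on roles alone; anything
    user-specific would be served to the wrong people.
    """

    def __init__(self, render):
        self._render = render  # callable(path, roles) -> rendered page
        self._store = {}
        self.misses = 0

    def get(self, path, roles):
        key = (path, frozenset(roles))  # the order of roles must not matter
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._render(path, roles)
        return self._store[key]


def worst_case_variants(pages, roles):
    """Each page may need one cached copy per subset of roles."""
    return pages * 2 ** roles


if __name__ == "__main__":
    cache = RoleKeyedPageCache(lambda path, roles: f"{path} for {sorted(roles)}")
    cache.get("node/1", {"authenticated user"})
    cache.get("node/1", {"authenticated user"})            # served from cache
    cache.get("node/1", {"editor", "authenticated user"})  # a new variant
    print(cache.misses)                  # 2
    print(worst_case_variants(1000, 5))  # 32000
```

Even with only five roles, a thousand pages could in the worst case need 32,000 cached copies, which is one way to see why a general solution for core is not a simple switch.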

Well, you're right, Steven. I am.

ubersoft - April 23, 2007 - 00:16

"Your comment alone indicates you are only thinking from your site's setup."

Guilty as charged. My interest in Drupal focuses on the things I want to do with it. I don't know of a more effective way to evaluate a tool than to judge it by what you want to accomplish with it. And I'm not particularly shy about advocating that it move more in my direction, either.

Take a step back from your familiarity with the Drupal community, do a search on caching threads in the forums -- focus on the newbie forum -- and then see if you can't see this "mythical" resistance I'm talking about.

Anyway, since we all know I'm NOT a developer, why don't you just follow up with a list of questions I have no business asking until I become some kind of fucking php guru and I'll make sure I don't ask them in the future.

hey now

sepeck - April 23, 2007 - 00:21

What I said in no way called for that level of hostility in your reply. You are free to ask any question you like, but your comment about a level of resistance to caching for logged in users wasn't accurate and I was trying to help explain the complexities of the issue so you would have a better understanding of them in future.

When implementing a feature in core Drupal, you have to think in broad terms and use cases, not narrow, specific ones. As both Eaton and I have commented, just because someone thinks there is a magic wand to wave and make it so doesn't mean it really exists. It's complicated. It's hard. It's not easy. NO ONE IS HIDING it. It's just not as simple as some random people would think, want, or believe. If it were simple, it would have been implemented.

Evidently my attempt to help you irritated you enough to swear at me. I may not know PHP, but I do know the community's processes, and I try to help people learn them so they can enact change more effectively.

As you have now made your opinion of, and contempt for, my attempts to help you clear in no uncertain and hostile terms, I shall withdraw from commenting in your threads or trying to help you in future.

-Steven Peck

You may be right.

ubersoft - April 23, 2007 - 02:14

I certainly interpreted your post as an attack... reading over it now I'm not as convinced it was. If no offense was meant, then I apologize. If you feel it necessary to blacklist me, I suppose that's fair.

Don't forget,

patrickharris - April 23, 2007 - 08:48

he caused your problem in the first place by promoting your site to Drupal front page. :)

It's sad but true that forums are a limited means of communication; misinterpretations happen so easily, even between two literate people like yourself and Sepeck.

Knowing sepeck as well as I

merlinofchaos - April 23, 2007 - 19:00

Knowing sepeck as well as I do, I assure you there was no intended hostility in his comment. He was attempting to correct what he saw as a misperception, and nothing more. You are in no danger whatsoever of being banned, and he wasn't intending to deride you in any way.

-- Merlin

[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]

Steven has a way...

zoon_unit - May 1, 2007 - 04:22

As someone who has come under Steven's wrath, I can say that Steven has a way of really raising the hackles of people. It's just the way he words things sometimes that comes across as harsh and condescending. I'm also equally convinced that it is not his intention to do so, and from his viewpoint, he's just trying to help.

I, too, suffer from this malady. I would only ask Steven to think a bit more before putting his mouth into gear. Doing that has helped me "temper" my public persona somewhat. I'm still working at it...

Sincerely, best wishes.

Steven

ubersoft - April 23, 2007 - 19:22

It seems the more time I spend online, the more I tend to assume someone is being hostile when they aren't. Enough people have pointed out (publicly and privately) that you were trying to be helpful, not hostile, that it's clear I was assuming something that wasn't there. So I owe you an apology -- whether you read it or not, it's still owed, so here it is:

Steven, I apologize for being rude earlier. Nobody likes it when a stranger comes into your backyard and pisses on your porch, and that's essentially what I was doing. You were trying to help and wound up getting bit for your trouble. In the future, I hope, I will exercise more self-control.

Christopher

sepeck - April 23, 2007 - 20:19

Accepted. These things happen, and when they do, I try to figure out better ways to phrase things in the future. I regard forum threads as a protracted conversation and forget that sometimes others don't parse the threads as I do. This can result in phrasing on my part that is perhaps not as clear as I'd like it to be.

--

What I try and do in the community is facilitate effective contribution.

The reason I promoted the thread originally was to get the performance discussion going, not for the high-end folks but for small and mid-tier sites.

You had an excellent write-up. Many people know about these issues, but that knowledge hasn't translated into much documentation, or into broader awareness among part-time implementors and those running smaller sites.

With Drupal you now have to be aware of at least MySQL and Apache configuration. It also helps to understand OS configuration and hard drive partitioning, PHP accelerators, and where the database sits in relation to the web server. None of it is hard to learn, but it flat out takes time, and you have to know how to research the MySQL docs, the Apache docs, your OS docs... It's up there with the hardest type of documentation to write.

I'd like to see more good case studies develop for people to learn from.

-Steven Peck

In addition..

Eaton - April 23, 2007 - 00:06

...to sepeck's comments, I'll also chime in and say that Dries, too, has said that we need to focus on speeding things up (doubtless caching is a part of that) for logged-in users as well as anonymous users. It's not that there is resistance to the concept of caching logged-in users' data, just that there hasn't yet been a clear case of how to make a simple and generalized logged-in-user page-caching mechanism that works effectively, especially not one that can be turned on in a "flip the switch" sort of way.

Things like block-cache, perhaps ways to cache the main content of a page, and so on are some potential solutions. I think we're all pondering them right now :D

page caching is not the answer

firebus - April 25, 2007 - 06:01

That's just it! Page caching is not the answer at all. You need a method that caches bits: cache the blocks, cache the page content, and figure out the tabs (can the user edit?) dynamically.

It's not that there's resistance per se on the part of the devs to caching for logged-in users; it's that they've dug themselves into a hole with page caching. There's NO WAY to get caching for logged-in users with page caching. You need a cache system that's smart enough to cache the bits.
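The "cache the bits" idea (and the block cache Eaton mentions) might look roughly like this minimal Python sketch. `FragmentCache` and `render_page` are hypothetical names, not Drupal APIs. Pieces that are the same for every visitor, such as the node body and a sidebar block, are built once and reused, while the per-user tabs are recomputed on every request.

```python
class FragmentCache:
    """Hypothetical cache for pieces of a page rather than whole pages."""

    def __init__(self):
        self._store = {}
        self.renders = 0  # how many fragments actually had to be built

    def get(self, key, build):
        if key not in self._store:
            self.renders += 1
            self._store[key] = build()
        return self._store[key]


def render_page(cache, node_id, owner_uid, user_uid):
    # Same for every visitor, so it is built once per node.
    body = cache.get(("content", node_id), lambda: f"<div>comic {node_id}</div>")
    # Same for every visitor, so it is built once for the whole site.
    sidebar = cache.get(("block", "recent_comics"), lambda: "<ul>recent comics</ul>")
    # Depends on who is asking ("can the user edit?"), so it is never cached.
    tabs = "[view] [edit]" if user_uid == owner_uid else "[view]"
    return "\n".join([tabs, body, sidebar])


if __name__ == "__main__":
    cache = FragmentCache()
    for uid in (7, 8, 9):  # three different logged-in users view node 5
        render_page(cache, node_id=5, owner_uid=7, user_uid=uid)
    print(cache.renders)  # 2
```

Three logged-in page views cost only two fragment builds here; the cheap per-user check is the only part that runs every time.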

Drupal ALSO DESPERATELY NEEDS a way to cache to static files instead of the DB. The DB provides an incremental improvement; static files would provide an exponential improvement and would benefit small fry and large installs alike.
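A minimal sketch of caching rendered pages as static files, in Python for illustration. `StaticFileCache` is hypothetical and assumes one `index.html` per URL path under a cache directory. In a real deployment the web server could serve those files directly, skipping PHP and the database, but that server configuration is not shown here.

```python
import os
import tempfile


class StaticFileCache:
    """Hypothetical cache that stores each rendered page as a plain HTML file."""

    def __init__(self, root):
        self.root = os.path.abspath(root)
        self.renders = 0  # how many times the (expensive) renderer ran

    def _file_for(self, path):
        # "comics/2007/04" -> <root>/comics/2007/04/index.html
        target = os.path.normpath(os.path.join(self.root, path.strip("/"), "index.html"))
        # Refuse paths such as "../etc" that would escape the cache directory.
        if not target.startswith(self.root + os.sep):
            raise ValueError(f"path escapes cache directory: {path!r}")
        return target

    def get(self, path, render):
        target = self._file_for(path)
        if os.path.exists(target):  # cache hit: no rendering, no database
            with open(target, encoding="utf-8") as f:
                return f.read()
        self.renders += 1
        html = render(path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(html)
        return html

    def invalidate(self, path):
        # Delete the stale file so the next request re-renders the page.
        target = self._file_for(path)
        if os.path.exists(target):
            os.remove(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        cache = StaticFileCache(cache_dir)
        page = lambda p: f"<h1>{p}</h1>"
        cache.get("comics/1", page)
        cache.get("comics/1", page)  # read straight from the file
        print(cache.renders)  # 1
```

The hard part, which this sketch glosses over, is invalidation: every file that shows a piece of content has to be deleted or rebuilt when that content changes.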

This is the problem

zoon_unit - May 1, 2007 - 04:38

hasn't yet been a clear case of ho