If I go by Dreamhosts stats for me, yesterday my site, Philly Future (http://www.phillyfuture.org) tallied
7246 database connects, doing 607,236 queries for 5,017 unique users who visited 4,412 pages.
I have caching turned on. I am throttling many of the modules on the site.
600,000 queries for 5000 visitors. That sounds very wrong to me.
In addition to our web traffic, we pull 300 RSS feeds from around our regional community into our aggregator. Each feed is individually configured to run every hour. And cron is scheduled for every 15 minutes. I suspect the aggregator has much to do with the database stats.
My traffic should not require me to go to a dedicated host, but our CPU and database utilization says we should.
Can anyone here help me?
Thanks in advance,
Karl
Comments
Missing one statistic...
You had more unique users than you had pageviews?! Or are those 'pages' possibly requested more than once?
You need to compare total requests with the database queries.
I think you should be expecting between 5 & 15 queries per request, depending on the number of modules and the number of authenticated users. Others who have run big sites (I haven't) may have their own numbers.
Editing actions also take quite a few more queries than just reading, and I hear acidfree does a whole heap of lookups.
Yes, it sure looks like a lot of queries, it is, but there are probably some things you can tune.
Have you looked at the devel.module for query logging?
.dan.
http://www.coders.co.nz/
.dan. is the New Zealand Drupal Developer working on Government Web Standards
That day we had 13,350 total
That day we had 13,350 total requests, which equals around 45 queries per request.
Looking at the devel module, that seems about right for an average node page. Here is one that produced 51 queries:
4.89 1 SELECT * FROM url_alias
4.99 1 SELECT name, filename, throttle, bootstrap FROM system WHERE type = 'module' AND status = 1
2.16 1 SELECT locale, name, formula, enabled FROM locales_meta ORDER BY isdefault DESC, enabled DESC, name ASC
58.32 1 SELECT data, created, headers FROM cache WHERE cid = 'menu:1:en-US'
4.31 1 SELECT COUNT(*) FROM node_access WHERE nid = 0 AND CONCAT(realm, gid) IN ('all0') AND grant_view = 1
3.83 1 SELECT n.nid FROM node n WHERE n.nid = 2354 AND n.type != 'book'
32.53 1 SELECT data, created, headers FROM cache WHERE cid = 'locale:en-US'
2.87 1 SELECT n.*, u.uid, u.name, u.picture, u.data FROM node n INNER JOIN users u ON u.uid = n.uid WHERE n.nid = '2354'
4.37 1 SELECT last_comment_timestamp, last_comment_name, comment_count FROM node_comment_statistics WHERE nid = 2354
4.71 1 SELECT uid, vote FROM queue WHERE nid = 2354
5.03 1 SELECT * FROM trackback_received WHERE nid = 2354 AND status = 1 ORDER BY created DESC
3.42 1 SELECT * FROM trackback_node WHERE nid=2354
11.59 1 SELECT * FROM files WHERE nid = 2354
4.05 1 SELECT COUNT(*) FROM privatemsg WHERE recipient = '1' AND newmsg = 1 AND recipient_del = 0
3.13 1 SELECT data, created, headers FROM cache WHERE cid = 'filter:1:ae8025162dc3dad21c2d2758c205e1c2'
4.14 1 SELECT * FROM book WHERE nid = 2354
4.14 1 SELECT * FROM system WHERE type = 'theme' ORDER BY name
4.15 1 SELECT * FROM variable WHERE name like 'event_nodeapi_%'
4.24 1 SELECT t.* FROM term_data t, term_node r WHERE r.tid = t.tid AND r.nid = 2354 ORDER BY weight, name
4.22 1 SELECT * FROM flexinode_type
4.7 1 SELECT minimum FROM moderation_filters WHERE fid = 0
5.4 1 SELECT COUNT(*) FROM comments WHERE status = 0 AND nid = 2354
2.07 1 SELECT c.cid as cid, c.pid, c.nid, c.subject, c.comment, c.format, c.timestamp, c.name , c.mail, c.homepage, u.uid, u.name AS registered_name, u.picture, u.data, c.score, c.users, c.thread FROM comments c INNER JOIN users u ON c.uid = u.uid WHERE c.nid = 2354 AND c.status = 0 GROUP BY c.cid, c.pid, c.nid, c.subject, c.comment, c.format, c.timestamp, c.name, c.mail, u.picture, c.homepage, u.uid, u.name, u.picture, u.data, c.score, c.users, c.thread ORDER BY SUBSTRING(c.thread, 1, (LENGTH(c.thread) - 1)) LIMIT 0, 50
3.52 1 SELECT fid, filter FROM moderation_filters
39.61 1 SELECT s.lid, t.translation FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid WHERE s.source = 'Select your preferred way to display the comments and click \"Save settings\" to activate your changes.' AND t.locale = 'en-US'
1.3 1 SELECT comment FROM node WHERE nid = 2354
0.76 1 SELECT v.mid, v.vote, MAX(v.weight) AS weight, MAX(r.value) AS value FROM moderation_votes v INNER JOIN moderation_roles r ON r.mid = v.mid WHERE r.rid IN (3) GROUP BY v.mid, v.vote ORDER BY weight
1.64 1 SELECT timestamp FROM history WHERE uid = '1' AND nid = 2354
2.61 1 SELECT data, created, headers FROM cache WHERE cid = 'filter:1:13d8d48d51ef686b9029aa44e832ff96'
2.44 1 SELECT s.name FROM tinymce_settings s INNER JOIN tinymce_role r ON r.name = s.name WHERE r.rid IN (3)
0.92 1 SELECT * FROM role ORDER BY name
0.99 1 SELECT * FROM tinymce_settings
1.29 1 SELECT * FROM filter_formats
1.26 1 SELECT * FROM filter_formats WHERE format = 1
1.16 1 SELECT * FROM filters WHERE format = 1 ORDER BY weight ASC
2.39 1 SELECT * FROM filter_formats WHERE format = 2
1.16 1 SELECT * FROM filters WHERE format = 2 ORDER BY weight ASC
3.87 1 SELECT * FROM filter_formats WHERE format = 3
4.27 1 SELECT * FROM filters WHERE format = 3 ORDER BY weight ASC
5.39 1 SELECT * FROM filter_formats WHERE format = 4
4.93 1 SELECT * FROM filters WHERE format = 4 ORDER BY weight ASC
5.85 1 UPDATE history SET timestamp = 1135983561 WHERE uid = 1 AND nid = 2354
2.38 1 SELECT * FROM blocks WHERE status = 1 AND region IN ('0') ORDER BY weight, module
2.96 1 SELECT * FROM boxes WHERE bid = 29
3.85 1 SELECT data, created, headers FROM cache WHERE cid = 'filter:2:61596bda6d535065ed9fee6e1502bd07'
3.52 1 SELECT cache FROM filter_formats WHERE format = 2
7.62 1 SELECT t.* FROM term_data t, term_node r WHERE t.tid = r.tid AND t.vid = 1 AND r.nid = 2354 ORDER BY weight
7.84 1 SELECT n.nid, n.title, b.parent FROM node n INNER JOIN book b ON n.nid = b.nid WHERE n.nid = 2354
5.34 1 SELECT * FROM blocks WHERE status = 1 AND region IN ('1') ORDER BY weight, module
5.24 1 SELECT * FROM boxes WHERE bid = 7
1.42 1 SELECT data, created, headers FROM cache WHERE cid = 'filter:3:08c61f41a70f9337e8c716c54cf84dec'
Of course, I'm logged in. But if these figures hold, then non-authenticated users are getting pages un-cached.... I think.
But how can that be?
Apparently my cache is working
I modified bootstrap.inc to log saves and retrievals from cache:
Looking at the admin log shows me that indeed, pages are being cached and later retrieved when requested.
Help! Please :)
Aha... pages are getting cached with PHPSESSID in their URL
Rendering the cache pretty much worthless I think. I will report back if this is behind my troubles (I bet it is).
good spotting
That would be a bit of a killer.
I haven't seen mention of this before, but it sounds likely.
http://www.coders.co.nz/
.dan. is the New Zealand Drupal Developer working on Government Web Standards
Is my earlier post
A security risk? The query post that is. None of that data like dangerous to share - is it? If so, how do I get rights to edit that post?
As a reference.. I've applied this patch (I am running 4.6): http://drupal.org/node/4109#comment-38607 and PHPSESSIDs on Dreamhost are gone.
no actual passwords
No worries. There are no actual passwords or anything useful there.
MySQL security and Drupal Security are different beasts. The worst case scenario would you be posting the queries from a page resulting in you changing the password for the admin user (where the encrypted form of the admin password would be visible) but even that would be diffused by just changing your admin password again.
Don't worry about that listing. It's (sorta) useful.
I can't offer much more wisdom on it however.
.dan.
http://www.coders.co.nz/
.dan. is the New Zealand Drupal Developer working on Government Web Standards
Looks like that patch didn't entirely do it
I need to take more extreme measures at Dreamhost it seems - compiling my own PHP executable. There are a few threads here on the subject, this post being the best reference. I've emailed the author to ask for advice.
Thanks for the help.
Some thoughts
Hi
I am the author of that link on compiling PHP on Dreamhost. Just saw this thread now, and even though we discussed it by email, I will summarize here for everyone else reading this.
I did the above last summer for a client who was hosting on Dreamhost. They were using shared hosting at the time.
It started by him trying to get rid of PHPSESSIDs (like you are now). Even though it is a matter of applying the patch you mentioned on 4.6, and then overriding several php.ini parameters, this did not work in his case. The reason is that some parameters were not overridable in Dreamhost's setup. They advised us to compile PHP, and override what we want.
This is what I did, which got rid of the PHPSESSIDs alright.
However, the site was a bit slower due to CGI overhead. Moreover, Dreamhost starting flagging the hosting account as being CPU excessive. My client tried to argue that this was because the Apache PHP cannot be configured to eliminate the PHPSESSIDs.
Since my client's site was high traffic anyways, it was time to upgrade to dedicated hosting. This was done and all the troubles were over, and the site is faster too.
So, perhaps it is a similar situation for you as well.
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com
--
Drupal performance tuning and optimization, hosting, development, and consulting: 2bits.com, Inc. and Twitter at: @2bits
Personal blog: Ba
I believe you're right, in addition however...
I did find a way to improve the cache-hit efficiency of my site in another major way....
I aggregate a large number of blogs (over 300) and cron runs every 15 minutes to keep aggregator updates small and managable.
I found some interesting cache information in this thread.
The aggregator calls cache_clear_all() after feed refrehes. The problem with that is that *all* pages are cleared from cache since they are assigned the CACHE_TEMPORARY constant.
For my site, inner pages are safe to cache for very long period's of time, until a user edits that node for the most part. The home page, on the other hand, is far more dynamic and is fitting the CACHE_TEMPORARY designation.
What I've done, is modify bootstrap.inc - all inner pages are cached for for six hours and the home page keeps the CACHE_TEMPORARY designation.
In addition, I modified the aggregator to call clear cache on pages with "aggregator" in their cid since those pages, in addition to the home page, should be cleared upon an update.
updates
since that thread was posted, we have gotten rid of session ids in urls for just about everyone. this is still a bit environment specific AFAIK. also, we implemented "loose caching" in 4.7 which is the b) option in steven's post. both of these are in 4.7
That's terrific news :)
That's terrific news :) Looking foward to getting it up and running at Philly Future.
Small update
Even though I am looking forward to installing 4.7, just want to share the progress made with the changes....
From 7246 database connects, doing 607,236 queries on 13,350 total requests, which equals around 45 queries per request.
To 7377 database connects, doing 541191 queries on 14,219 total requests, which equals around 38 queries per request.
Looking forward to getting rid, entirely, of those PHPSESSIDs.
Loose caching
Maybe you can increase the refresh intervals of the RSS feeds you poll to reduce the number of queries / cache flushes?
Not sure if your changes can be made more generic such that other Drupal sites could benefit from them. If not, Drupal 4.7's loose caching should help when configured properly. I vaguely remember we made a number of changes to the aggregator module to make it scale better. It might be worthwhile to help test and debug Drupal 4.7.