Closed (won't fix)
Project:
Project
Version:
5.x-1.x-dev
Component:
Releases
Priority:
Minor
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
26 Dec 2006 at 00:08 UTC
Updated:
23 Apr 2012 at 00:01 UTC
Comments
Comment #1
kbahey commented: Start with logging the queries (via the devel module?) and then run EXPLAIN against them to see what is going on.
Posting the info here will get many people looking at the slow queries and figuring out where the bottleneck is.
Another approach is to fire up mtop when the script is running. That will tell you what is slow on the spot, and how long it takes.
It may be as simple as one or two indexes in the right place.
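To make the EXPLAIN suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL (the table and column names are illustrative, not the real d.o schema). sqlite's EXPLAIN QUERY PLAN plays the same role as MySQL's EXPLAIN: it shows whether a query does a full table scan or can use an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_release (nid INTEGER, version TEXT, rebuild INTEGER)")
conn.executemany("INSERT INTO project_release VALUES (?, ?, ?)",
                 [(i, "5.x-1.%d" % i, i % 2) for i in range(100)])

def plan(sql):
    # EXPLAIN QUERY PLAN is sqlite's rough analogue of MySQL's EXPLAIN;
    # the last column of each row is a human-readable plan step.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT nid, version FROM project_release WHERE rebuild = 1 ORDER BY version"
before = plan(query)   # a full table scan, plus a temp sort for the ORDER BY

conn.execute("CREATE INDEX idx_rebuild ON project_release (rebuild)")
after = plan(query)    # the WHERE clause can now be answered via the index

print(before)
print(after)
```

As comment #1 says, it may be as simple as one or two indexes in the right place; the before/after plans make that visible.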
Comment #2
dww commented: there is only one real query, which differs depending on whether we're packaging official releases or development snapshots:
official release:
dev snapshots:
so, it's obviously not ideal that it's doing a filesort here. but a little old 931-row table shouldn't be causing *that* much grief.
other than that, all it's doing are a few tiny updates once each release is packaged:
those can't possibly be a problem, since both {node} and {project_release_nodes} use nid as the primary key.
however, thinking about this more closely, perhaps the real problem is in the organization of the code itself. the basic pseudocode for the packaging script is:
so, we've got that big nasty query still "open" while we're doing all these other expensive operations, including these 2 updates to change {node} and {project_release_nodes}. i was always assuming that the db abstraction layer would be pre-fetching the results from the DB, and that db_fetch_object() just grabs the next object out of the same in-RAM data structure, but perhaps that's all false. maybe the real problem is that the db connection/query is still in progress, so a bunch of tables are locked, etc, while we're in the outer loop deciding what to do. compounding the pain is that the work inside the loop is queuing up a bunch more locking UPDATEs. :(
am i full of crap, or might that be a big problem? ;) is the solution (aside from avoiding the filesort above) just grabbing all the releases out of the huge query into RAM, and then looping over those once the query completes?
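The "grab everything into RAM first" idea can be sketched as follows, with Python's sqlite3 standing in for MySQL and Drupal's db layer (the table name and columns are simplified stand-ins, and package_release() is hypothetical): drain the whole result set before doing any writes, so the big SELECT is not held open while the expensive per-release work and UPDATEs run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_release_nodes (nid INTEGER PRIMARY KEY, rebuild INTEGER)")
conn.executemany("INSERT INTO project_release_nodes VALUES (?, ?)",
                 [(i, 1) for i in range(5)])

# Instead of fetching one row at a time inside the packaging loop while the
# big SELECT is still open, drain the whole result set into RAM first ...
releases = conn.execute(
    "SELECT nid FROM project_release_nodes WHERE rebuild = 1").fetchall()

# ... then loop over the in-RAM list; each UPDATE now runs with no open
# read cursor competing for the same tables.
for (nid,) in releases:
    # package_release(nid) would do the expensive tarball work here (hypothetical)
    conn.execute("UPDATE project_release_nodes SET rebuild = 0 WHERE nid = ?", (nid,))
conn.commit()
```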
Comment #3
killes@www.drop.org commented: I have not seen any SQL-related spike at around midnight.
How about simply doing a little sleep() between calls as a quick fix (if you think you need one)?
I can't comment on the last paragraph, but it would be strange if that would be a problem.
Comment #4
kbahey commented: If the problem is some locking, then adding sleep() is only going to compound the problem.
I think caching the result of the big query, then iterating through the array and doing your stuff, should eliminate that, provided there are no concurrency issues (what happens if something is updated after you cache? it may not be a problem, but ...).
Before going that route, just run the script from the command line, run mtop in another window, and top in yet another. Watch the running queries and how long they take in the former, and the us% and wa% in the latter.
Comment #5
merlinofchaos commented: Subscribing. I would like to maybe help on this this weekend.
Comment #6
merlinofchaos commented: Here's a patch that should show whether having the query open is the problem.
Comment #7
dww commented: i just applied that patch to the copy running on d.o. we'll have to see if the packaging run in 3.25 hours is any faster as a result. ;)
Comment #8
Steven commented: If the scripts run on drupal2, shouldn't moving them to drupal3 also help make the site more responsive? Having mail and cvs be a bit tardy is not as bad as slowing down our main web site.
Comment #9
dww commented: yeah, i thought about that a while ago. here are the problems:
1) the packaging scripts do a CLI drupal_bootstrap(), so they need access to the official drupal installation on d.o. we could use another copy of the source code with the same DB settings, etc, but i'd be worried about things getting out of sync. i suppose we could set up another automatic rsync mirror like we do for drupal1...
2) the packaging scripts need to write to the files/projects directory, and drupal3 doesn't see that filesystem. well, they used to, at least. now that we're actually serving everything from ftp.osuosl, i guess this isn't technically true anymore, but we'd have to change how osuosl does the rsync to pull the tarballs. there's also the problem that the project_release.module code does filesize() on the tarballs for display purposes (so, even though the download links point to ftp.osuosl, the file sizes we display come directly from files/projects/*). if we move things completely away from files/projects, we'd also have to compute the filesize in the packaging script and store that in the DB. we currently store the file date and md5 hash in the DB, so storing the size would certainly make sense, and would actually solve some other problems, too.
also, i'm still primarily concerned that it's the db server getting slammed, not necessarily drupal2, that's causing the trouble. i could be wrong, and i suppose moving everything to drupal3 would be a way to test this theory, but the above 2 points involve a lot of work, and i'd like a little more confidence they'd actually make things better before going down this road...
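The "compute the size in the packaging script and store it in the DB alongside the file date and md5 hash" idea can be sketched like this in Python (the helper name and the demo file name are hypothetical, and a scratch file stands in for a real tarball):

```python
import hashlib
import os
import tempfile

def release_file_info(path):
    """Collect the metadata the packaging script would store in the DB:
    size, modification time, and md5 hash of the tarball."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # hash in chunks so large tarballs don't have to fit in RAM
        for chunk in iter(lambda: fh.read(8192), b""):
            md5.update(chunk)
    st = os.stat(path)
    return {"filesize": st.st_size,
            "filedate": int(st.st_mtime),
            "filehash": md5.hexdigest()}

# demo with a scratch file standing in for a real tarball
demo = os.path.join(tempfile.gettempdir(), "demo-5.x-1.0.tar.gz")  # hypothetical name
with open(demo, "wb") as fh:
    fh.write(b"hello")
info = release_file_info(demo)
```

With all three values in the DB, the display code no longer needs to call filesize() on a local copy of the tarball at all.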
Comment #10
dww commented: today, Bdragon and I decided to look at this again. he asked "wtf are you doing all those watchdog() calls for?", which is a very good question. ;) we were doing 2 watchdog() calls for *every* release. in the early days, this was useful to figure out what was going on and ensure proper behavior, but now it's just excessive... and expensive.
so, i commented out those 2 calls, and this evening's run (~ 2 hours ago as of this writing) was *much* better:
so, this is clearly a big win, and fairly damning evidence about the cost of doing lots of watchdog() calls in a row on a busy site like d.o... since this was obviously a good move, i committed it to HEAD and installed it for real on d.o: http://drupal.org/cvs?commit=64865
so, we could probably still optimize the initial query to be less of a killer, which is why i'm not just marking this issue "fixed", but this watchdog() change is a *huge* improvement for very little effort. ;)
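The general pattern behind this fix is to gate per-item logging behind a verbosity flag and emit one summary line per run instead of two log writes per release. A minimal sketch in Python's logging module (not the actual Drupal code; the logger name, releases list, and ListHandler are illustrative):

```python
import logging

records = []

class ListHandler(logging.Handler):
    """Capture log messages in a list so we can see what actually got logged."""
    def emit(self, record):
        records.append(record.getMessage())

log = logging.getLogger("package-releases")
log.setLevel(logging.INFO)
log.addHandler(ListHandler())

VERBOSE = False  # the per-release chatter was useful early on; now it's just expensive

releases = ["5.x-1.0", "5.x-1.1", "5.x-1.2"]
for version in releases:
    # ... package the release here ...
    if VERBOSE:
        # one cheap conditional instead of an unconditional write per release
        log.info("packaged release %s", version)

# a single summary line replaces the per-release log entries
log.info("packaged %d releases", len(releases))
```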
Comment #11
kbahey commented: This is good news. It confirms that there are performance gains to be had from the new hook_watchdog and the syslog module.
Perhaps in Drupal 6, d.o can switch to using syslog entirely, and skip the watchdog table write altogether.
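In Python terms, routing log messages to syslog instead of a database table amounts to swapping the handler. A hedged sketch (the logger name is illustrative, and the syslog socket path varies by system, hence the fallback):

```python
import logging
import logging.handlers

log = logging.getLogger("drupal")
log.setLevel(logging.INFO)

try:
    # /dev/log is the usual local syslog socket on Linux
    handler = logging.handlers.SysLogHandler(address="/dev/log")
except OSError:
    # no local syslog socket available; fall back to stderr for this sketch
    handler = logging.StreamHandler()

log.addHandler(handler)
log.info("watchdog entries can go to syslog instead of the {watchdog} table")
```

The application code doesn't change at all; only the destination does, which is exactly the point of pluggable logging backends like hook_watchdog.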
Comment #12
senpai commented: This hasn't been a problem in six years, so I'm closing this issue.