I extended the current timer_start() function in bootstrap.inc and added two new function, timer_read() and timer_stop(). I've used these to profile drupal.org. (A small change will be needed to devel.module.)
I've also extended the accesslog-table to keep track of page generation times. The information is shown on the "top pages" page (admin/logs/pages). It shows each page's average page generation time, and the total page generation time. This lets one identify (i) slow pages and (ii) resource heavy pages. This is the most essential and most useful information when tuning your Drupal site.
Please review, or I'll commit it as is.
Comments
Comment #1
dries commentedScreenshot.
Comment #2
dries commentedSetting to patch. Take a look at the patch + screenshot please.
Comment #3
dries commentedComment #4
moshe weitzman commentedlooks ok ... note that there are some unrelated (i think) variable_set() changes in the patch.
Comment #5
Amazon commentedIt's great to have this level of logging granularity.
I am concerned that all the logging would add over head when having multiple crawlers crawling the site during testing. Would it be possible to support batching the log entries and submitting them to the database as batch query at set increments instead of for every page.
For example I routinely run several wget crawls for different roles when testing an upgrade. If I ran 5 crawlers and each of 10, 000+ pages were being crawled then I am concerned the performance would be impacted.
I understand this isn't the intent of this patch but non intrusive logging is important in generating accurate performance measurements.
Thank-you for your consideration.
Kieran
Comment #6
moshe weitzman commentedin the pdf, i think you are swapping 'total' and 'average' column headers
Comment #7
Bèr Kessels commentedI like the idea. a lot. But from the pdf it is not clear that timer deals with the generation tilme o a page.
* Are we not boing too geeky here (again)? devel.module does farily granual timing already. Is Joe User really waiting for timers on how long it took to make a page. I think not.
* If we really want this in core, it should at least be made very clear that this is the time it took to make a page. I presented the screenshot to a client, as a small test. And just like i thought, his reaction was " Wow, this is handy, I can now see how long a user spent at a certain page. This will tell me what posts are actually read, and which are only opened in a browser only to be closed right away". I almost made the same mistake, but the geek in me told me to look at the code and see that it is page generation times. Joe user will not ever look at the code.
* What is the penalty. accesslog already is a heavy load table. Do we really want aditional data in that table.
* We have access hooks (called after page is loaded) to do aditional statistics magic. IT is not userd by anything else then the stats in core. Why cant this go into an advanced stats module?
Comment #8
dries commentedMoshe: the column names are not swapped. What makes you think they are? What can be done to make it more clear?
Moshe: the variable_get() was removed intentionally. The 'page' timer is always started. It's cheap.
Ber: I'm aware of the possible confusion and I agree that we need to clarify this. Unfortunately, I couldn't think of better column names. The problem is that these names should be short (or the table will look ugly). Suggestions?
Ber: the devel.module does not provide this functionality. It doesn't track page generation times for other users, it doesn't compute averages, nor does it compute the total time spent generating a particular page. The devel.module doesn't have an overview. This is important information for anyone who has to tune his or her Drupal site. Without this patch, you simply don't know which pages are eating your resources.
Ber, Kieran: the performance impact is small. The timer values are stored as integers and fit in 2 bytes. So if there are 200.000 entries in your accesslog-table like on drupal.org, the database overhead is 400kb (a fraction of the table's total size). No additional SQL queries are introduced. As such, performance remains comparable with the current implementation. I can benchmark this if you want but I'm confident it is OK.
I could easily compile a list of the users/IPs that eat most resources. This can help us track evil crawlers. I'm not sure yet how or where to integrate this UI-wise so I haven't (yet). One step at a time.
Making it easy to inspect the performance characteristics of your Drupal site is a good thing. It creates awareness which in turn will encourage people to tune and tweak Drupal's performance. It also lets you identify problems -- like a module performing badly. Compare it with your operating system's "activity/process monitor" (or "top"). You don't really need it, but you can't without it either.
Comment #9
jeremy commentedI love the idea very much, and think this will prove extremely useful. What about making an option that enables/disables these columns? The only reason I suggest this is because they are inherently confusing without using a lot of verbage. "Average time" needs to be something like "Average time for Drupal engine to generate this page", and "Total time" needs to be something like "Total time to generate". Or, it could be fully explained at the configurable that turns the display on and off...?
Yes, do it! :) That would be very useful.
Where would the data be stored until it was committed to the table?
It would be possible to do something similar to what the banner module does, writing to multiple cache files. But this isn't friendly for distributed web servers.
Do writes to the accesslog slow down measurably when the acceslog table grows? If so, then there could be an accesslog_cache table to which data is first written, and which is flushed to the main accesslog every n minutes, or every time it reaches a certain size, etc. This would only be useful if writing to a very large table is measurably slower than writing to a very small table.
Comment #10
kbahey commentedUseful info for sure.
I am with Jeremy that this needs to be parameterized.
Also, are these figures real? Does the average time to generate the forum page is 30 seconds? I find this hard to swallow.
Comment #11
moshe weitzman commentedNow I realize that the Total and Average columns are using different units. Thats not ideal, but understandable in this case. I can't think of a better solution.
Comment #12
dries commentedCommitted to HEAD.
Comment #13
(not verified) commented