Now that we're capturing usage data it'd be nice to make it visible to maintainers and users.

I'm not certain that this is the best way to do it but it's a simple way to start. I think we should probably add some permissions to allow control over who can see what.

CommentFileSizeAuthor
#173 165380_project_usage_display.173.patch24.06 KBdww
#172 165380_project_usage_display.172.patch24.07 KBdww
#167 165380_project_usage_display.167.patch23.99 KBdww
#164 165380_project_usage_display.164.patch23.95 KBdww
#143 project_usage_display_143.patch23.3 KBdww
#141 project_usage_display_141.patch23.64 KBaclight
#140 project_usage_display_140.patch23.68 KBaclight
#138 project_usage_display.patch23.18 KBhunmonk
#135 project_usage_display_135.patch22.28 KBaclight
#133 project_usage_display_133.patch21.51 KBaclight
#132 project_usage_display.patch19.93 KBhunmonk
#117 project_usage_display.patch19 KBhunmonk
#97 project_usage_165380-97.patch18.83 KBpwolanin
#96 project_usage_165380-96.patch18.78 KBpwolanin
#95 project_usage_165380-95.patch18.75 KBpwolanin
#93 project_usage_165380-93.patch18.72 KBpwolanin
#92 project_usage_165380-92.patch17.57 KBpwolanin
#91 project_165380.patch18.14 KBdrewish
#75 project_165380_20.patch18.62 KBdww
#74 project_165380_19.patch18.33 KBdrewish
#73 project_165380_18.patch17.27 KBdrewish
#72 project_usage_css_woes.png130.17 KBdww
#71 project_usage_165380_2.patch17.72 KBdww
#70 project_usage_165380_1.patch16.78 KBdrewish
#68 project_usage_165380_0.patch18.04 KBdrewish
#59 project_usage_165380.patch16.29 KBdrewish
#53 project_165380_17.patch18.96 KBdrewish
#49 project_165380_16.patch18.96 KBdrewish
#43 project_165380_15.patch21.86 KBdrewish
#42 project_165380_14.patch21.54 KBdrewish
#39 project_165380_13.patch18.74 KBdrewish
#38 project_165380_12.patch16.47 KBdrewish
#37 project_165380_11.patch16.44 KBdrewish
#36 project_165380_10.patch14.61 KBdrewish
#35 project_165380_9.patch14.4 KBdrewish
#34 project_165380_8.patch14.53 KBdrewish
#32 project_165380_7.patch14.91 KBdrewish
#31 project_165380_6.patch13.75 KBdrewish
#30 project_165380_5.patch13.97 KBdrewish
#24 project_165380_4.patch7.93 KBdrewish
#23 project_165380_3.patch8.25 KBdrewish
#22 project_165380_2.patch6.29 KBdrewish
#20 Project_usage.pdf41.86 KBdrewish
#19 project_165380_1.patch5.58 KBdrewish
#12 project_usage_project.PNG21.27 KBdrewish
#11 project_usage_overview.PNG16.71 KBdrewish
#10 project_165380_0.patch4.43 KBdrewish
#8 overall_0.jpg48.38 KBwebchick
#6 overall.jpg65.13 KBwebchick
#2 project_165380.patch2.8 KBdrewish
project_9.patch2.37 KBdrewish
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drewish’s picture

Status: Needs work » Needs review

Added a 'view project usage' permission.

Fixed some bugs in the hook_menu() where I'd used = instead of ==.

drewish’s picture

FileSize
2.8 KB

making sure it's attached this time...

webchick’s picture

Subscribing.

Maybe two permissions? View all project usage data, and view own project usage data?

It also might be nice to make the $node argument optional, so that those with the proper permission (or even 'view all project usage data', but that's getting a little nuts :P) have an aggregate view of it across the entire site.

drewish’s picture

on IRC webchick pointed to http://drupal.org/project/issues/statistics as a rough idea of how to represent this data.

hass’s picture

subscribe

webchick’s picture

FileSize
65.13 KB

Drewish understandably wasn't quite sure what I meant, so here is a more detailed mock of the 'grand overview' page.

dww’s picture

@webchick -- looks cool. however, we don't really have per-day data like that. some sites will configure update_status to only check 1/week, in which case we only get 1 record/week for them. all of the existing code in project_usage.module is doing weekly summaries, not daily (we still store the raw per-day data, but we don't process it into per-day totals like this). we all agreed that weekly grainularity for these metrics would be plenty of detail. so, i think the only change is that i'd tweak the granularity of your report so that each column is a week, and next/prev jumps months or something (with titles like "Project usage statistics: August 2007"), instead of each column as a day, and next/prev jumps weeks... make sense?

webchick’s picture

FileSize
48.38 KB

Cool. Like this?

dww’s picture

yep, more or less. ;) the only other possible complication is that i'm not exactly sure how the weekly summaries are stored, but i think it's sunday to saturday weeks, not necessarily july 1st - 7th. e.g. So, it might be more like this:

Project usage statistics: August 2007

Project  | 07-29 to 08-04 | 08-05 to 08-11 | 08-12 to 08-18 | ...
------------------------------------------------------------------
views    | 235235         | 236332         | 237012
...

Also, while in the wish-list brainstorm phase, it'd be nice to see a single yearly summary, for example, with the totals for the last week in each month.

Project usage statistics: 2007 summary

Project  | January | February | ...
-------------------------------------
views    | 205235  | 213332   | 
...
drewish’s picture

FileSize
4.43 KB

here's a rough overview page. it does X weeks showing total usage for all API versions.

drewish’s picture

FileSize
16.71 KB

here's a screen shot

drewish’s picture

FileSize
21.27 KB

here's the per-project page.

hass’s picture

not sure if this becomes an additional patch or should be committed together with this patch. The images above looks nice, but i'm missing a tab for "http://drupal.org/project/Modules/usage" and the ability to sort in different variants like "show only modules installed on more then X server" and some other filter options for the end users... the current patch looks only for module developers, isn't it?

hass’s picture

I'd like to see per month statistics, too. Not only per starting week... per week shouldn't be the default view... this displays toooo much details we may never need. I'm sure the stats are very constant in real and not flapping like the example screen shots. :-)

hass’s picture

I'd like to ask if not using "date_first_day" is a bug of this patch or the usage module already committed. I cannot see this core variable is used anywhere and i'd ask me how you are able to know if the week starts on Monday or Sunday. Maybe this feature it's already implemented and i've overseen this!?

drewish’s picture

hass, in regards to #13 i think the browse by tab is a different issue. this is just aimed at making it visible to developers.

in regards to #15 it was a bit of a conscious oversight PHP uses Sunday as 0 and it was the easiest way to do it.

hass’s picture

Hm... i think we should be able to change the first day of week for different countries per user and site!?

drewish’s picture

hass, feel free to open a new issue and roll a patch.

drewish’s picture

FileSize
5.58 KB

i was running into an issue when there were periods with missing data. the cells would be moving to the left into the wrong column. i realized i need to build a full, blank table and then fill in the data.

at webchick's suggestion, this also make the cells default to 0. i thought the blanks made it easier to see that there was no data but the 0 is probably the correct value.

drewish’s picture

FileSize
41.86 KB

killes got me an updated dump of hte project_usage tables on d.o which i imported and used to generate the attached overview. it's missing any projects created after may when the rest of d.o dump was created. it'd be really nice if you could sort by week but i haven't figured out an efficient way to build that table.

drewish’s picture

webchick encouraged me to make the overview table sortable. i think i've done so in a way that won't freak out the DB too badly. in fact i think it's a bit cleaner way of doing it.

i also moved the usage menu items to all be under /project_usage. i think this makes more sense than putting them under /node/*.

i want to modify the project's usage page to provide a list of the releases with the current week's usage info and link to each individual release node.

drewish’s picture

FileSize
6.29 KB

patch didn't quite make it...

drewish’s picture

FileSize
8.25 KB

it's pretty close now. selecting a project from the overview takes you to the project usage page which shows a table of weekly data by branch and then the current week's data by release. each of the release links takes you to the release usage page which shows weekly data for the release. i'll attach some screen shots.

drewish’s picture

FileSize
7.93 KB

i posted some screen shots over at: http://groups.drupal.org/node/5608

i realized that the previous patch only showed releases that had usage information for the previous week. i reworked the query to list all releases so there were links to each release's usage page.

i also added two constants PROJECT_USAGE_DATE_SHORT and PROJECT_USAGE_DATE_LONG to control the formatting of dates. i wouldn't mind making it a variable, but a constant seemed like a good compromise for now.

webchick’s picture

The screenshots look awesome. Can't wait to see this info on drupal.org. :D

re:
PROJECT_USAGE_DATE_SHORT and PROJECT_USAGE_DATE_LONG ... Drupal already actually stores a preference for short/medium/long dates at admin/settings/date-time. So I'd just swap these with variable_get('date_format_short') and variable_get('date_format_long'), respectively.

dww’s picture

Status: Needs review » Needs work

Agreed, the screenshots are looking great!

However, the LEFT JOINs are quite worrisome. Luckily, you can still cache these results, even with the pager query. You just have to make sure your cache id includes the page #. Given that you're using tablesort_sql(), too, you'll need to include the sort and order URL args, too. Getting it all exactly right is left as an excercise for the interested reader. ;) That said, caching these pages is an obvious choice, since this data only changes once/week anyway, and will probably be viewed frequently. In fact, I'd probably just add a whole new cache table for this, {cache_project_usage}.

And, I agree with webchick you could just reuse core's date formatting variables for this.

Thanks!
-Derek

drewish’s picture

the problem with variable_get('date_format_short') and variable_get('date_format_long') is that they include hours and minutes. since the timestamps are all rounded to midnight sunday GMT, so that time info is of no use and when it's added to the header row it eats up a bunch of space.

dww’s picture

oh, sorry, i didn't even look closely enough to catch that. good point. i'm fully in support of having separate formatting, then, since i agree that including hours/minutes would be a huge waste. that said, my preference would be a PHP constant with the default, and a variable_get() for the call sites. there doesn't have to be a UI to set it, even, folks can put it in settings.php if they really want, but it's nicer to do it that way than to force people to hack the module if they want to change it.

dww’s picture

Oh, and it's not clear why you're moving everything to /project_usage/[foo]/[nid]. That'll make it impossible to have these show up as tabs on the release and project nodes, which is where I'd probably expect to find them (although maybe that's just because it's how I imagined all of this working months ago). If not directly off the release + project nodes, how do you expect people to find this? New tabs seems like just the thing, since it doesn't involve changing the project or project_release code to conditionally add links, etc.

However, if there's a good reason not to use node/[nid]/usage or something, please fix the patch to add links somehow via the project and release nodes, and/or at least provide some screenshots of how you think it should look/work.

And, of course, the overview page(s) need to live somewhere other than directly off node/[nid], so /project_usage seems fine. Although, core moved to use - in URLs instead of _, so perhaps /project-usage would be better.

Thanks,
-Derek

drewish’s picture

Status: Needs work » Needs review
FileSize
13.97 KB

I added a little code to make the date formatting strings configurable. Rather than making the formatting calls even longer by adding a variable_get() into the mix I just put the variable_get() in side the constant declaration:

define('PROJECT_USAGE_DATE_LONG', variable_get('project_usage_date_long', 'F jS'));
define('PROJECT_USAGE_DATE_SHORT', variable_get('project_usage_date_short', 'M j'));

Same effect and it seemed a bit cleaner to me. I also added a comment explaining why we had our own formatting strings.

I switched the paths from project_usage to project-usage. The thing I didn't like about putting them under tabs was that it made for a messy navigation. I guess I can switch it back. I'll look at that next.

The page caching is in. I tested it with some drupal_set_message() calls to make sure pages were being loaded from the cache.

drewish’s picture

FileSize
13.75 KB

finished moving everything back under node/*/usage. got the breadcrumbs working... that was fun.

in my mind this is RTBC but i'd love some nitpicking reviews.

drewish’s picture

FileSize
14.91 KB

on IRC webchick suggested that i document the motivation for the query builder in the overview page. also fixed a bug i'd introduced into that function in the last patch.

drewish’s picture

Status: Needs review » Reviewed & tested by the community

since the IRC reviews didn't turn up anything other than some bad docs, i'm going to mark this RTBC. perhaps we can get it up on scratch.d.o for a little final testing? or i suppose we could put it on d.o and just not grant the view usage permission to anyone other than admins...

drewish’s picture

FileSize
14.53 KB

killed indicated that the line endings on the last patch might have been bad so here's a re-roll.

drewish’s picture

FileSize
14.4 KB

killes put this on scratch.d.o for me so i could test it out.

fixed a bug when viewing a project node's usage an no records were available. it would display empty rows years back to 1970.

after seeing this in use killes and i came to the conclusion that the pager on the overview page was just a nuisance so i've removed it.

drewish’s picture

FileSize
14.61 KB

viewing real data on s.d.o has been helpful. i realized that my overview page was showing duplicate rows for each API version of a module. threw some SQL at to sum the rows and only list them once.

drewish’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
16.44 KB

okay so after looking at the numbers generated by the turns out that something with the left joins means i need to call SUM(DISTINCT ...) though i'm not quite sure why... it just seems to work.

for debugging purposes i'm adding an implementation of the devel module's hook_devel_caches(). i'm not sure that dww will think it's kosher so i'm moving this back down to CNR.

drewish’s picture

FileSize
16.47 KB

mikejoconnor: I seem to be having some trouble with the stats, I can sort in decending order for last week, but when I try to sort desc for this week, it sorts in desc alphabetical order instead
mikejoconnor: the date also changes from aug 12 to aug 11

the problem was that format_date() was being used to generate the column headers and the data was being cached. so depending on what timezone the first user to view the page was in the date might change. passing a default timezone now.

also, a small change had snuck into the previous patch, it was only showing the last three weeks on the usage overview page. i've changed it back to 6 weeks.

drewish’s picture

FileSize
18.74 KB

looking at the project usage pages on s.d.o i got to thinking that it'd be much more useful if you could see the multiple weeks of usage for the revisions. so, here's the patch.

drewish’s picture

killes put #39 on s.d.o for me and i noticed that there seems to be a pretty major discrepancy between the project and release tallies. the update_status module is a good example, the project's 5.x branch shows a total of 2689 sites but summing the release usage shows only 9.

i'm not sure yet where the problem originates. i'm imagining that it's one of the following:

  • project-release-serve-history.php script isn't writing a valid version string to the {project_usage_raw} table.
  • project_usage_process_daily() is not setting the release node id correctly.
  • project_usage_process_weekly() is computing the total release usage for the week incorrectly.

though it could be something totally outside my control:

  • clients are submitting invalid version strings.
  • people are using project_usage that's checked our from CVS and doesn't have a version string
drewish’s picture

on IRC merlinofchaos pointed out that the the majority of the usage was probably in the update_status's 5.x-2.0-rc2 and 5.x-2.0 releases. the {nodes} table on scratch.d.o so is a week-or-so old and the {project_usage_*} tables are just a couple of days old so it would make sense that some stuff wouldn't match up.

drewish’s picture

FileSize
21.54 KB

adopting merlinofchaos' suggestion to right align the usage numbers and number_format() them so they're easier to read.

refactored my repeated query building into _project_usage_query().

drewish’s picture

FileSize
21.86 KB

whoops that last patch omitted my CSS file.

hass’s picture

there seems like some bugs. See http://scratch.drupal.org/node/12977/usage

August 12th sum is 25, but "Previous week's release usage" says 5.

hass’s picture

i'm not the friend of hard coding something, but shouldn't we remove all 4.7.x from the lists? there is no update_status module for 4.7.x... why collecting something we don't have data for? maybe not of big interests after D6 is out and D47 is no more displayed. not sure what is the best decision.

drewish’s picture

hass, read my comments on #40 and #41. those actually don't look that far off.

drewish’s picture

hass, it displays a list of all a project's active releases. it's the simplest and most uniform way to do it. i don't want to write a special case for pre-5 API versions.

hunmonk’s picture

Status: Needs review » Needs work

patch no longer applies, most likely b/c of the recent whitespace cleanup. should be pretty easy to fix up the conflicts. this is at the top of my list now, just waiting on a patch that applies :)

drewish’s picture

Status: Needs work » Needs review
FileSize
18.96 KB

here's a re-roll. i apologize in advance for the weird way that tortoise cvs handles file additions.

hunmonk’s picture

Status: Needs review » Needs work

this is just a code review at this point. overall, the code is looking pretty good. some questions/problems are listed below:

  • what's w/ the BINARY in the table create statement? core cache tables don't use that.
  • not sure about the path for the overview page -- seems like it should live inside the 'project' path, 'project/usage' or 'project/project-usage'
  • not sure, but is 'Usage' a bit vague for the usage tab? it's probably fine, i'm just feeling like there might be a better name for it.
  • admin/modules#description? i think that's long since dead.
  • drupal_get_path('module', 'project_usage') -- we should put this in a define, too
  • instead of the WHERE IN stuff, would it be possible to INNER JOIN on {project_usage_week_project} -- not sure if that's workable/desirable or not, just thinking out loud.
  • i'm not understanding why we have to build a dummy array in project_usage_project_page() -- can you clarify that?
  • // Strip out the keys so that the odd/even row themeing works correctly.
    $project_rows = array_values($project_rows);
      

    i *thought* the problems w/ row striping for non-numeric keys was fixed in D5, no? if not, we at least need to add a note to rip this out on the port to D6

  • are we going to run into problems with the caching on the per-project pages because there are two tables there?
drewish’s picture

* what's w/ the BINARY in the table create statement? core cache tables don't use that.

The definition for the cache_ table is copied from project_release.install so you'll have to ask dww.

* not sure about the path for the overview page -- seems like it should live inside the 'project' path, 'project/usage' or 'project/project-usage'

i didn't want to add onto the list of 'reserved' project names. it seemed simpler to just put it there.

* not sure, but is 'Usage' a bit vague for the usage tab? it's probably fine, i'm just feeling like there might be a better name for it.

i'm open to suggestions but it seemed simple...

* admin/modules#description? i think that's long since dead.

ah... i thought that'd been removed... webchick had pointed that out at one point. the hook_example() hadn't been updated. i'd fixed that but i guess didn't make it back to fixing the patch.

* drupal_get_path('module', 'project_usage') -- we should put this in a define, too

is there a standard for that or is it a project*.module convention?

* instead of the WHERE IN stuff, would it be possible to INNER JOIN on {project_usage_week_project} -- not sure if that's workable/desirable or not, just thinking out loud.

i did it using the IN because it allowed me to make use of the project modules functions (and their visibility access checking) and since we're aggressively caching all the pages it only gets run once per project per week.

* i'm not understanding why we have to build a dummy array in project_usage_project_page() -- can you clarify that?

it's all related to the way that we may or may not have data for a week or for an API version. i tried doing it with bunch of left joins but it was really easier to just build an empty array and then loop through the data and fill in what we've got. i though the comment was pretty explanatory but i'll see what can be done about it.

*
// Strip out the keys so that the odd/even row themeing works correctly.
$project_rows = array_values($project_rows);
i *thought* the problems w/ row striping for non-numeric keys was fixed in D5, no? if not, we at least need to add a note to rip this out on the port to D6

good point, i'll do that in the next re-roll.

* are we going to run into problems with the caching on the per-project pages because there are two tables there?

no because they're both updated at the same time on the weekly cron run.

hunmonk’s picture

i didn't want to add onto the list of 'reserved' project names. it seemed simpler to just put it there.

there's thousands of decent project names. with our setup, the problem of reserved names is something we have to deal with -- i'd rather have an intelligent path structure here.

is there a standard for that or is it a project*.module convention?

not that i know of, but it works, and it's more efficient, not to mention easier on the eyes :)

drewish’s picture

Status: Needs work » Needs review
FileSize
18.96 KB

added a PROJECT_USAGE_PATH constant.
removed the description help item.
added a note to the array_values() work around.
added a comment to project_usage_project_page().

i'll hold off on changing the project-usage menu path. i prefer project-usage and dww seemed all right it. all the usage menu items are /node/*/usage tab based rather than integrated under /project/* menu items so i'm not sure it makes sense to force the overview page under there.

hunmonk’s picture

i'm not going to commit this with that path structure, so i guess we'll have to wait for dww to get back to hash the rest of this patch out.

drewish’s picture

Status: Needs review » Needs work

on IRC wwwebernet mentioned the following:

drewish: "Only sites that have opted to allow usage information to be tracked are included." -- that's no longer optional
drewish: http://drupal.org/node/165312 -- Removed the setting to opt out of the usage reporting. No one who's willing to run this module in the first place could possibly complain about the random string appended as a query arg on the fetch URL, and the extra setting needlessly complicates the settings page.

so i'll need to update the help text.

also i should be using project_build_query() rather than adding yet another query builder.

dww’s picture

Just catching up on issues and emails since I was gone....

I don't feel _that_ strongly about the menu paths, but I think I lean towards the separate area instead of stuffing this overview under /project/*. All the namespace collisions under /project/* are bad enough as it is, and I don't really buy the "so what's one more?" argument. But, I'm torn, so if someone really feels strongly, please provide a more convincing argument one way or the other. ;) Thanks!

hunmonk’s picture

we're splintering the overall project uri tree because of the weird way we handle the path structure, blindly giving up deeper paths to project names. i've never seen another module do this, nor have the kinds of problems project has in this area.

IMO the simple answer is to be more strict. project takes the path structures it needs, and explicitly forbids those from becoming project names. a define with an array of forbidden project names would be trivial to implement, and would allow us to reserve certain names to be used in our path structure.

dww’s picture

a) we already have such a blacklist of project names we disallow. look in project.inc and search for $reserved_names.

b) project module itself never used to give up project/[name] to individual projects. that was an evil practice started manually by d.o admins circa 3+ years ago just using path.module. however, 100s of links were already set in stone like that, so it was going to involve a huge amount of link-rot to change it. moreover, it became so much work to maintain that i decided to automate the path aliases. it all sucks, and i'd never do it like this in the first place, but alas, not everything is up to me. ;)

c) http://drupal.org/project/usage is currently available (at least on d.o). we could make "usage" another reserved word and put the overview there. perhaps some other tabs/path might be available under that for various overview sorting options, etc. the per-project and per-release usage pages should still be node/[nid]/usage, either as a link or a tab. of course, some other project* using sites might already have a project named "usage" which this would screw over. however, those sites probably aren't going to be using update_status.module and project_usage.module, so i'm not sure i care. project/usage seems nicer and easier to deal with than project/project-usage for end users...

so, i guess putting the overview at project/usage and making "usage" another reserved word (if project_usage.module is enabled) is the best, long term. seems hunmonk feels strongly about keeping everything in the same tree. aside from the evil past practices that are making this troublesome in general, there's no real reason we should avoid it here...

p.s. maybe we need a project-specific hook for populating $reserved_words correctly depending on which project* modules are enabled on your site. ;)

drewish’s picture

Status: Needs work » Needs review
FileSize
16.29 KB

i'd prefer to have as few reserved words as possible for all the other sites out there... what if there's a usage jquery plugin? that'd suck for them. but i'll save my arguing for things i really care about like improving the query builder (#171253).

so this version of the patch use project/usage as the overview page path but it depends on the query builder changes in #171253 and assumes that #173397 will be committed (it avoids it's update numbers).

dww’s picture

Status: Needs review » Needs work

A) Hopefully http://drupal.org/node/173397 will only chew up 1 DB update, so this one will probably have to be renumbered.

B) If the BINARY in project_release.install is also bogus, that, too, should be fixed. I don't remember why it's like that -- probably I was just copying core or something. And lo, system.install in D5 uses BINARY for {cache_page}, which is probably what I copied. The other {cache_*} tables don't use it. Seems like it's potentially just schema mismatch + confusion. I don't really know what the right answer is, but probably BINARY isn't needed in any of these cases, and should be removed.

C) I believe phpdoc is smart about /** and /// style comments in front of define() calls, so please convert those comments so we get the goods when api.d.o is also displaying info about contrib.

D) I didn't think it was a good idea to put code like variable_get() completely outside of any functions, just in the global php space where define() lives. Aren't there cases where we'd hit that code before things are properly bootstrapped, and variable_get() could end up failing? I don't remember specifics, but I think I've been nailed by this in the past. I'd rather see the constants be the defaults, and the call-sites should use variable_get() with the constant for the default.

Same problem with drupal_get_path() for the PROJECT_USAGE_PATH define() that hunmonk requested. In principle the define() is a nice idea, but in practice, I don't like that we're calling drupal_get_path() at unknown bootstrap levels.

E)

function project_usage_project_page($node) {
  drupal_add_css(PROJECT_USAGE_PATH .'/project_usage.css');
  project_project_set_breadcrumb($node, $breadcrumb);

$breadcrumb is used without being defined here.

...

Sorry, I'm running out of steam on this review, and I've got other stuff I need to be doing. Hopefully this is enough to keep you busy for a little while. ;)

hunmonk’s picture

Same problem with drupal_get_path() for the PROJECT_USAGE_PATH define() that hunmonk requested. In principle the define() is a nice idea, but in practice, I don't like that we're calling drupal_get_path() at unknown bootstrap levels.

i'd sure like to see some evidence where this is actually a problem. i've used this approach before and not ever run into trouble with it.

dww’s picture

http://drupal.org/node/138401

drupal_get_path() is never included for anonymous users when caching is enabled. So, if you put code outside of any functions, such that it's loaded when trying to serve a cached page, you get fatal php errors...

Vacilando’s picture

Category: task » support
Status: Needs work » Active

http://drupal.org/project/usage mentioned above ( at http://drupal.org/node/165380#comment-300375 ) is not available (redirects to groups). Is there any other place we can see how the current version of the code works with live data?

drewish’s picture

Category: support » task
Status: Active » Needs work

please, don't hijack threads.

Vacilando’s picture

With all due respect, drewish, you have overreacted. That was a sincere question. See more in a personal message I've just written for you.

hunmonk’s picture

@tjfulopp: it doesn't matter if it was a sincere question, it belongs in a new issue.

@dww: so it looks like a more general restriction about putting calls to drupal functions outside of function calls in the module...

dww’s picture

@hunmonk: indeed. i could have sworn this was already listed in the coding standards in fact, but i just skimmed it and didn't see anything about it. :(

@tjfulopp: the status of this issue is "patch (code needs work)". therefore, this code isn't even committed to CVS, much less installed on drupal.org yet. ;) if you read it carefully, my comment #58 above was talking about name collisions and if a given URL path was "available" (as in, not currently in use), not that the usage overview was already visible there. sorry you were confused by that, but please understand that it's a bad idea to "take over", "hijack", "sidetrack" (whatever you want to call it) an issue with your own personal questions by altering the status and category, etc. thanks.

drewish’s picture

Status: Needs work » Needs review
FileSize
18.04 KB

this should address all the issues dww raised. i also added 'usage' to project.inc's list of reserved names.

#171253 was committed but this is still expecting that #173397 will be committed first.

dww’s picture

Status: Needs review » Needs work
% patch -p0 < project_usage_165380_0.patch 
patching file project.inc
The next patch would create the file usage/project_usage.css,
which already exists!  Assume -R? [n] 
Apply anyway? [n] y
patching file usage/project_usage.css
Patch attempted to create file usage/project_usage.css, which already exists.
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file usage/project_usage.css.rej
patching file usage/project_usage.install
Hunk #3 FAILED at 71.
1 out of 3 hunks FAILED -- saving rejects to file usage/project_usage.install.rej
patching file usage/project_usage.module
Hunk #3 succeeded at 484 with fuzz 2 (offset -1 lines).
Hunk #4 succeeded at 582 with fuzz 2 (offset 59 lines).
drewish’s picture

Status: Needs work » Needs review
FileSize
16.78 KB

whoops, i guess that CSS file got included in one of the other patches that got committed. okay here's a clean re-roll.

dww’s picture

FileSize
17.72 KB

Here's a new version that restores the change to project.inc to add 'usage' to the blacklist of $reserved_names, and adds a newline at the end of project_usage.module...

dww’s picture

Status: Needs review » Needs work
FileSize
130.17 KB

Something appears to be broken with the .css, at least on s.d.o. The table cells are aligned right, as expected, but the table headers are not. This makes things quite hard to read, IMHO. I don't have the time or energy to debug the CSS, but I disabled the CSS aggregator, cleared both the drupal cache and my browser cache, and I'm still seeing the confusing alignment (see attached screenshot).

drewish’s picture

Status: Needs work » Needs review
FileSize
17.27 KB

how's this?

drewish’s picture

FileSize
18.33 KB

wait that lost the project.inc change...

dww’s picture

Status: Needs review » Needs work
FileSize
18.62 KB

It was the padding that was bothering me, the text alignment itself was working, I guess. Attached patch (now installed on s.d.o) fixes the padding and simplifies the .css (I think -- someone should verify it's still proper).

However, I noticed another problem. :( The breadcrumbs can be inconsistent with what you'd expect. For example, if you start at the overview:

http://scratch.drupal.org/project/usage

and then decide to drill-down into a specific project's detailed usage:

http://scratch.drupal.org/node/3060/usage

you'd expect the breadcrumbs to lead you back to the usage overview page. However, now you're on a tab on the project node for core, and the breadcrumbs send you back to the download pages. :( Obviously, there's no way for the same page to have 2 sets of breadcrumbs and other navigation elements, depending on how you got there.

I see a few options:

A) Put all the per-project pages at /project/usage/[project-name] and have the breadcumbs send you back to the Usage overview. On the project nodes we have a link to usage, but not a tab. On the usage report for a project, there's a link to that project's node. Leave the per-release usage as a tab on the release node, but include a link to "View usage for all releases of [project-name]".

B) Leave the per-project pages as tabs on the project nodes, but add a the link to the usage overview page there. We'd probably still want to add the link on the per-release usage page to that project's usage summary.

C) Move everything to /project/usage/[project-name]/[version]? -- no tabs anywhere, but links to the corresponding nodes all over the place, and links on project + release nodes that point to the usage reports.

None of these options suck, but I don't feel strongly in favor of any of them. There might be an even better solution someone else comes up with...

drewish’s picture

dww, i'm not convinced that's really a problem. that's the way the navigation's worked since the very first versions of the patch.

i'm not really into any of the alternatives you laid out. i think of the overview page as a separate section that's more oriented at users of the modules. i see the the project and release usage pages as targeted at the maintainers who need to see the per version/release break down. maybe the way to resolve it would be to have project/usage link to the project pages rather than the node/*/usage pages? and really what's wrong with using the browser's back button?

hunmonk’s picture

maybe the way to resolve it would be to have project/usage link to the project pages rather than the node/*/usage pages?

that wouldn't really fix the breadcrumb issue, though, would it? plus, it would add a click to get to the project's usage, which seems wasteful...

so, my assessment (as i understand the problem): this seems more like a general drupal limitation (can't have multiple breadcrumb trails) than a problem with the structure of the module.

i'm not really crazy about any of the proposed solutions. what about the option of both node/*/usage and project/usage/* pointing the the same output function? then we can use project/usage/* on the overview pages as the link.

hunmonk’s picture

as painful as it may be, we need to shelve this issue for just a bit while we work out some larger UI issues, at which point we'll know where to take this issue. fortunately, this should be fairly temporary. we'll be working that out this week, hopefully.

drewish’s picture

I think you both were on the cc for Kieran email earlier this week in which he said "Rating modules by quality has become the number one issue for Drupal.org." so I know that I'm not the only one that would like to get this moving. I'd be open to getting a scaled back version of this committed. Perhaps we just add the overview page now (with the links to the project nodes), and save the tab and bread crumb arguments until after you've figured out your UI guidelines.

dww’s picture

drewish: just so you understand, you're *not* the only one who wants to see this moving. i'd love to get it up, and have been spending a fair bit of time with this issue/patch. thing is, the link rot problems will be real if we deploy prematurely, and it seems like the only real solution to the troubles i pointed out are to have a clear picture of how the project node UI *should* be and how this usage reporting fits into it...

this is definitely not a case of "this is unimportant, so we're going to delay it while we work on something more interesting..."

this is a case of "this is important, and will be very high visibility, and lots of people will care about how it works, so let's try to get it right..."

yes, these problems are stemming from my own (not clearly thought-out) suggestions early in the thread about using tabs for everything. ;) apologies again for that. so, please, don't take it personally or feel discouraged. i have every intention of seeing this through to successful completion and deployment, it just might take another few days so we're really clear on the plan.

just want you to understand my own intentions with this issue, and that i appreciate your efforts (and empathize with your frustration).

cheers,
-derek

alpritt’s picture

re. #75

My preference for a path would probably be /project/[project-name]/usage and /project/[project-name]/[release]/usage. That seems like the most logical hierarchy to me, but perhaps there are technical reasons against this? In line with this, my preference for the overview page would be something like /project/all/usage.

I'd avoid using tabs since they tend to be a bit ambiguous if you don't already know where they lead. On the project page, I'd have a link next to a summary of that project's stats for 'more details'. Then I'd provide links on each usage page as in option 'C' of comment #75.

And forget about the breadcrumbs. I don't think they make sense here at all.

drewish (#76) wrote

i think of the overview page as a separate section that's more oriented at users of the modules. i see the the project and release usage pages as targeted at the maintainers who need to see the per version/release break down.

As far as users are concerned, I would say we want to make the summaries useful enough that they rarely feel the need to head into these deeper sets of data. In other words, if this were a book, the pages created with this patch would be like the appendices. How to present the most useful data in the main pages will require some careful thought, but that's another issue.

hunmonk’s picture

My preference for a path would probably be /project/[project-name]/usage and /project/[project-name]/[release]/usage

this path structure is going away, see here:

http://drupal.org/node/180316

hass’s picture

Sorry for asking, but when will users able to see this killer info on d.o? I think this issue should go forward...

hunmonk’s picture

when we've got everything in position as we need it. it's delayed for a good reason, which i believe the last several posts have made quite clear -- please re-read them.

hass’s picture

Well i read them, but i don't see any process on them, too.

hunmonk’s picture

i don't see any process on them

i have no idea what this means.

hass’s picture

Well i read them, but i don't see any process on the linked issues, too.

drewish’s picture

i think it's pretty clear that hass meant to say progress.

hunmonk’s picture

@hass: to put things in perspective, and so that we're all on the same page, please take a minute to review the project roadmap for 6.x: http://groups.drupal.org/node/6180

i think if you'll check the commit logs, you'll see quite a bit of progress -- just because this issue isn't moving at this very moment does not mean that progress isn't being made towards it, or that we're ignoring it, or that we don't consider it important.

the issue that needs consideration before we go forward here has already been mentioned: http://drupal.org/node/180316

outside of my posts, and one other post by drewish, that thread is completely empty. it would be nice if some other members of the community would weigh in there.

hass’s picture

Looks like there is much work to do :-). With "no progress" i simply mean issue #180316.

I thought we get this feature in D5 times and are not made to wait for D6, but with this big To-Do list i understand that we may need to wait for D6... :-(

drewish’s picture

FileSize
18.14 KB

based on our drupalcon conversation about the paths i've updated this patch.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
17.57 KB

minor re-roll with some comment changes

pwolanin’s picture

oops- missing blacklist change

dww’s picture

Status: Needs review » Needs work

The links in the overview table at /project/usage still point to node/N/usage, not project/usage/N. Otherwise, this is looking very close.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
18.75 KB

fixing links

pwolanin’s picture

flip the order of the 2 tables for an individual project.

pwolanin’s picture

doh - need concat

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

ok, this seems to work fine on project.d.o, and making it RTBC with the following caveats:

this should go in after the release.
we need to figure out how and whether to page the main overview.
we may want to define theme functions to avoid hard-coding HTML like $output .= '<h3>'. t("Recent release usage") .'</h3>';

I think these are minor compared to getting this in so we can work on the namespace issues.

hass’s picture

After what release? This should go in before 1.2 get released... i tried to see something on project.d.o, but cannot find anything... is it hidden somewhere?

pwolanin’s picture

@hass - no this was for after 1.2 (which is now released), since dww/hunmonk didn't want any risk of an unstable release.

drewish’s picture

pwolanin, your changes look good to me. thanks for carrying this across the finish line.

drewish’s picture

dww, any thoughts on this? seems like it's been running fine on p.d.o.

hass’s picture

As long as this don't get visible on d.o - are we able to make an update of the database on p.d.o? :-)

Aside, how is it possible that Drupal them self have 17,799 for 6.x on March 2nd, but only 106 in sum on Recent release usage?

pwolanin’s picture

@hass - the DB is a copy from when we were at Druplcon, so probably the entries that were current then are falling out of the "recent" category.

hass’s picture

Not really... only as one example - i released sections 6.x-1.0 on ~10. Feb, googole_analytics on 15.2 and it is not shown as release on the project page.

drewish’s picture

dww, any chance of getting this committed before soc 2008 starts up ;)

hunmonk’s picture

the version on this issue is currently set to 5.x-1.x-dev. is this patch really for the 5.x-1.x-dev branch, or has it been updated to 5.x-2.x-dev? i hope it's been updated, because 5.x-1.x-dev is about a week away from becoming unsupported... :|

drewish’s picture

not sure what you mean... there's no DRUPAL-5--2 branch... heck there's not even a 2.x listed in the version box so i don't know how you could be that close to making it unsupported. but anyway i just checked and it still applies cleanly to HEAD.

aclight’s picture

I think hunmonk was mixing up project and project_issue. Project issue has a 2.x branch, but project does not.

drewish’s picture

ah! that makes sense. i was pretty confused since the project module seems to be very conservative in forcing upgrades.

hunmonk’s picture

oh whoops -- this is against project, right. ok, thanks for clearing that up for me guys.

hass’s picture

And what's about committing the patch? :-)

hunmonk’s picture

@hass: do you get paid to be annoying? at the very least it should be listed as a skillset on your resume...

webchick’s picture

@hass, unless you have feedback related directly to your testing of this patch, then PLEASE refrain from commenting on this issue. We'd all like to see it committed, and if you keep chiming in with unhelpful "Is it done yet?" comments, it's likely to just annoy the "powers that be" and make them put it off longer just to spite you. Knock it off, please.

hass’s picture

Sorry, I've not posted this to annoy someone - I only tried to get the *focus* back on drewish's question in #106 after 6 more or less unrelated comments. Maybe someone could nevertheless answer his question.

hunmonk’s picture

@hass: the posts following #106 were intended to clear up a misunderstanding i had about the state of the patch, and thus were relevant to the issue.

i can tell you with great certainty that none of the project maintainers need your help maintaining focus on any patches. your method is annoying, period -- and i think that it's been made clear to you numerous times in the project issue queue. i'd appreciate it if you could refocus your efforts and find a way to better work with this group of developers.

hunmonk’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
19 KB

attached cleans up this patch some more, fixing an XSS hole and some shaky SQL syntax, and removing some dead code. seems to work well for all views i checked. i'd like somebody else to give this one more lookover, especially with respect to security, and then i think it's ready to go in.

aclight’s picture

I didn't take a thorough look at this, but noticed a few minor things:

1. It might be worth adding a line of code comments near the SQL queries that add unescaped %s placeholders indicating why it is safe (assuming it is) to use these unescaped, since that's a red flag just glancing at the code.

For example, on this line and a few below it.

$sql_elements['fields']['pieces'][] = "SUM(DISTINCT p%s.count) AS week%s";

2. There's a typo in a code comment:

+  // number of distinct terms in use. This *could* be done with LEFT JOINs it
+  // ends up being a more expensive query and harder to read.

needs to be "...with LEFT JOINs but it...."

drewish’s picture

aclight, i'm not going to have the time to get to this any time soon so if you want to make any changes feel free, i'm not territorial ;)

hass’s picture

Status: Needs review » Needs work

MySQL 4.1.10a on page project/usage

user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DISTINCT p0.count) AS week0, SUM(DISTINCT p1.count) AS week1, SUM(DISTINCT p2.co' at line 1 query: SELECT n.nid, n.title, SUM(DISTINCT p0.count) AS week0, SUM(DISTINCT p1.count) AS week1, SUM(DISTINCT p2.count) AS week2, SUM(DISTINCT p3.count) AS week3, SUM(DISTINCT p4.count) AS week4, SUM(DISTINCT p5.count) AS week5 FROM node n LEFT JOIN project_usage_week_project p0 ON n.nid = p0.nid AND p0.timestamp = 1209254400 LEFT JOIN project_usage_week_project p1 ON n.nid = p1.nid AND p1.timestamp = 1208649600 LEFT JOIN project_usage_week_project p2 ON n.nid = p2.nid AND p2.timestamp = 1208044800 LEFT JOIN project_usage_week_project p3 ON n.nid = p3.nid AND p3.timestamp = 1207440000 LEFT JOIN project_usage_week_project p4 ON n.nid = p4.nid AND p4.timestamp = 1206835200 LEFT JOIN project_usage_week_project p5 ON n.nid = p5.nid AND p5.timestamp = 1206230400 WHERE n.nid IN (SEL in /srv/www/webX/html/includes/database.mysql.inc on line 172.

Applying to D5-dev have a hunk. HEAD works well.

hass’s picture

Additional the above SQL statement looks like a performance issue. It select's all nodes of all node types from node table and tries to JOIN for a sum of usage that is never available in project_usage_week_project. I think there should be a WHERE n.nid AND n.type IN ('project_project') to limit the node table resultset.

The same issue seems to be in project_usage_project_page() and should limit to type 'project_release'.

// - The LEFT JOINs mean we have to limit the entries in {node} so that we're
// not including things like forum posts, hence the WHERE IN below.

Looks like someone was aware about this performance issue... :-)

// - Each project may have multiple records in {project_usage_week_project}
// to track usage for API version. We need to SUM() them to get a total
// count forcing us to GROUP BY. Sadly, I can't explain why we need
// SUM(DISTINCT)... it just works(TM).

i would say "it just don't work" on mysql 4.1.10

hass’s picture

Here is how this SQL could looks like as i understand the query... but currently i really don't know what i must change in this $sql_elements for project_build_query to get this resulting sql statement!?

EDIT: deleted... there was something wrong in the statement group by

drewish’s picture

hass, i haven't retested the latest version of the patch i'm guessing it's probably a bug with your version of mysql... they're up to 4.1.24... since you're using a 3-year-old version you might want to upgrade first. i googled for mysql 4.1 subselect and found tons of bug reports.

hass’s picture

http://drupal.org/requirements
# Drupal 6 supports MySQL 4.1 or higher.
# Drupal 5.x and earlier supports MySQL 3.23.17 or higher. MySQL 4.1 or higher is strongly recommended.

Upgrading is difficult in a shared hosting and I'm waiting for my vServer ISP... they plan to upgrade from SuSE 9.3 to a newer version... but until this will happen I'm at this old version... but nevertheless the server meet the Drupal requirements - so i should be fine.

aclight’s picture

@hass: Those requirements are for drupal core, not for contributed modules. You'll notice that project_update also does not even support pgsql, which according to http://drupal.org/requirements is supported. I'm not about to install a 3 year old version of server software on a test site to see if what we're writing is compatible. There are lots of ISPs running updated versions of server software, so I don't feel that we need to spend extra effort to support legacy software, especially given that the number of sites running project* is relatively small.

hass’s picture

I'm sure that all mysql 4.x are broken and this module shouldn't be limited to mysql 5.x :-(. Aside it's the best ISP in germany with the highest number of user_beancounters and most memory - and this is very very important

aclight’s picture

I see no problem at all limiting it to 5.x

hass’s picture

For D7 this is ok, but for D5 i think not - as we should support 3.x in general and I like the core requirements. I think a module should follow this requirements or people start thinking drupal is bad and not working... however i'm not testing in 3.x, too. Aside we do not have a .info tag for minimum mysql versions - isn't it? And such a limitation is not yet documented.

However if we are able to fix this, we should fix it. The limiting to node types is nevertheless a bug in all versions.

webchick’s picture

Let's get this IN, please, and someone can provide a patch later to provide backwards-compatibility.

PLEASE. Stop. Derailing. This. Issue.

hass’s picture

@webchick: there are important performance issues that should be fixed first - see#121

drewish’s picture

Status: Needs work » Needs review

hass, #121 is not a blocking problem. if you look at the part of the query you're running into problems with you'll see that it is matching n.nid IN () where we pass in a list of node ids. it's not like it's going over every record in the db, i'm not certain that supplying both type and nid would be helpful since there's not an index for both nid and type. but since i don't have a copy of the d.o db dump to benchmark the queries, i'm just speculating.

in either event that isn't a huge deal because the resulting pages are cached for a week so it's not the end of the world if the query isn't perfect.

hunmonk’s picture

FileSize
19.93 KB
  • the naked %s stuff is not a security issue, and i've noted it as such in the code. using those is certainly less ominuous the just firing varibles in there.
  • fixed the grammer in the code comment
  • i played with using AND n.type = 'project_project' instead of the nid list, but that pulls in projects that have no entries in the usage table at all. the current approach seems better IMO, and since the query is cached, i'm not all that worried about a slight performance issue (if it's even an issue).
  • this entry in the mysql manual seems to confirm that SUM(DISTINCT expr) doesn't work on < 5.x:
    Here's a workaround for the lack of SUM(distinct expr) in MySQL v4.x:
    
    SUM(distinct expr) should sum unique values of a group.
    Since version 4 do not allow it, what you can do is divide a regular SUM(expr) by all count(distinct expr2) of other tables in the join which initially casued a cartezian result set.
    
    for example, the following query
    
    select a.a, sum(distinct b.b), sum(distinct c.c)
    from a join b on a.id = b.aid
    join c on a.id = c.id
    where a.id = 1
    
    can be implemented as :
    
    select a.a, sum(b.b)/count(distinct c.id), sum(c.c)/count(distinct b.id)
    from a join b on a.id = b.aid
    join c on a.id = c.id
    where a.id = 1
    

    hass, you're welcome to submit a patch to try this suggestion, _after_ this patch lands, if you really want things to work on < 5.x -- and i'll review and commit if it solves the issue. this is not an issue which should hold up this patch any longer, as it is of primary importance to get this working on drupal.org. this will be a wonderful opportunity for you to put your time or money where your mouth is :)

i'd really like at least one other person to give this a decent review before i commit. any takers?

aclight’s picture

Status: Needs review » Needs work
FileSize
21.51 KB

1.

+function project_usage_project_page($node) {
....
+  $sql_elements['wheres']['pieces'] = array('n.nid IN ('. implode(', ', array_fill(0, count($releases), '%d')) .')');

gives these errors

warning: array_fill() [function.array-fill]: Number of elements must be positive in /Applications/MAMP/htdocs/drupal.org/sites/all/modules/project/usage/project_usage.module on line 225.
warning: implode() [function.implode]: Invalid arguments passed in /Applications/MAMP/htdocs/drupal.org/sites/all/modules/project/usage/project_usage.module on line 225.
user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') ORDER BY n.title DESC' at line 1 query: project_usage_project_page SELECT n.nid, p0.count AS week0, p1.count AS week1, p2.count AS week2, p3.count AS week3, p4.count AS week4, p5.count AS week5 FROM node n LEFT JOIN project_usage_week_release p0 ON n.nid = p0.nid AND p0.timestamp = 1211068800 LEFT JOIN project_usage_week_release p1 ON n.nid = p1.nid AND p1.timestamp = 1210464000 LEFT JOIN project_usage_week_release p2 ON n.nid = p2.nid AND p2.timestamp = 1209859200 LEFT JOIN project_usage_week_release p3 ON n.nid = p3.nid AND p3.timestamp = 1209254400 LEFT JOIN project_usage_week_release p4 ON n.nid = p4.nid AND p4.timestamp = 1208649600 LEFT JOIN project_usage_week_release p5 ON n.nid = p5.nid AND p5.timestamp = 1208044800 WHERE n.nid IN () ORDER BY n.title DESC in /Applications/MAMP/htdocs/drupal.org/includes/database.mysql.inc on line 172.

if you go to project/usage/N where N is a project without any release nodes. The PHP help for array_fill() specifically states that an E_WARNING is thrown if num is < 1. The attached patch fixes these errors by returning early on if there are no releases for a project.

In addition, the attached patch adds two theme functions that project_usage_project_page() calls instead of hard coding in HTML.

The attached patch also adds "WHERE n.status = 1" to the query in project_usage_project_page() to keep any unpublished project_release nodes from appearing in the tables.

One concern I have with the code (and the reason I'm leaving it to CNW) is that we're caching the usage tables, however administrators and regular users will see the same cached version of these tables. If there are unpublished projects or releases and it's an admin that sets the cache, regular users may also see those releases/projects. We might also want to add a "WHERE n.status = 1" in project_usage_overview() as well, but I haven't added that in my patch.

2. Along these same lines, in project_usage_project_page() and project_usage_release_page() we're displaying the title of the project/release node and displaying usage data without first checking to make sure that the user has the proper permission to view the node in the first place. I realize that on d.o this shouldn't be a problem, since there are no access restrictions on viewing projects and releases, but we probably should call node_access('view') from project_usage_dispatch() before handing off to _release_page() or _project_page().

3. Instead of displaying an empty usage table for a release node without any usage information, the attached patch adds another theme function and in the case where there is no usage information a text string stating this is displayed instead of a table.

I think that the above issues need to be given a bit more thought, as they can lead to revealing information I don't think we should reveal.

The following issues are minor. The changes I made in the patch don't address any of these issues. I don't think any of these should hold up this patch, as they can be done in separate issues. However, I thought I'd bring them up here in case others deem them to be more important than I do.

4. It'd be nice to be able to go to project/usage/project_issue, for example, instead of project/usage/nid. This is possible with the project_issue module, and I don't see how it would be a problem here.

5. We're attempting to display usage information for versions of modules prior to D5. Since the update_status module didn't exist previous to D5, we won't expect any release nodes there to have any usage numbers. Should we even display such release nodes on the project pages? It's probably more trouble than it's worth to exclude them, but just a thought.

6. The overview page at project/usage which displays usage of all projects won't have projects listed that are younger than up to 1 week. In other words, since that page is only regenerated once a week, new projects will not be listed until the page is regenerated.

7. It might be nice to make the project or release title that's displayed at project/usage/nid actually link to the project/release node. This should be easy to do, but maybe it's not desirable? If not, I can't think of a good reason.

dww’s picture

@hunmonk: thanks for your patch and efforts on this.

@aclight, thanks for the thorough review and new patch, too.

@all: Although I haven't looked at the latest patches, this is clearly getting close. Seems like these are the only things holding up the patch from #133:

- The last part of 1. about what version is cached. Just like the project browsing pages, we'll have to cache 2 copies -- one for admins, and another for everyone else.

- 2. (which isn't fixed in the patch, right aclight?)

I can see all of 4-7 generating support requests once this goes live, so I'd love to see those fixed, too. But I agree that separate issues seems good for those.

I'd like to review the final patch before this is committed. Once someone fixes these 2 last bits, we should be ready to get this in. Yay!

aclight’s picture

The attached patch fixes the following issues I brought up in comment #133:

2. node_access() needs to be called from project_usage_dispatch().
4. project/usage/[uri] is now a valid path.
7. project and release titles are now links back to the respective project or project_release node.

I also discovered a new minor bug that's fixed in this patch:
8. I changed project_usage_dispatch() so that if $key == 0, it returns drupal_not_found() instead of the overview page. The reason is that project_usage_help() does not fire if the URL is project/usage/0, and I thought it would be best to return a 404 error in this case.

Still to be fixed are the following issues from above:
1. Caching of the various pages created here could reveal information to users who otherwise would not be able to see that project/release node exists.

@dww: In comment #134, you state "Just like the project browsing pages, we'll have to cache 2 copies -- one for admins, and another for everyone else.". As far as I can tell, the project browsing pages *aren't* cached, at least not in the public version. Maybe d.o is running a patched version, however. If you were referring to the download tables, that's a bit of a different situation since there we're just caching one version without "edit" links and one with "edit" links for the release nodes.

In this situation, we're caching the names of projects or releases. We really would need to cache per-user, because node access control modules would also provide the possibility of per-user access to nodes. This is a similar situation to what we had with the project_issue filter that links [#XXXXX] to the project_issue node. In that issue we decided that if any node access control module was in place, then we wouldn't cache anything. We could do the same here, and at least on most sites (including d.o) caching would be effective. One minor concern is that if a project node were unpublished, the cache wouldn't be automatically be reset, unless we implement hook_nodeapi() and reset the appropriate cache any time a project node or release node is modified.

5. We might not want to display usage info for releases with version prior to D5, since the update_status module didn't exist then. Fixing this will likely require a setting in the project_usage configuration page where an admin can select which API compatibility terms should be included in the project_usage queries when generating a list of release nodes for a project. This should go in a new issue as nothing is broken right now.

6. New projects won't be listed on the overview page. I think fixing this would require implementing hook_nodeapi() and resetting the appropriate cache when $op is insert and possibly update. Unless we decide to also do this to fix #1, this should also be a separate issue since it's not trivial to fix now.

So, I think the only issue that needs to be fixed before this is ready is #1.

hass’s picture

aclight’s picture

@hass: Clearly, we are not going to hard code anything into the module, because that would make it useless for sites other than d.o. Granted, it's not too likely to be used by other sites anyway, but there's a chance it would be. Drewish may not have wanted to add support for eliminating certain versions from the tables, and I'm also not advocating that we do so in this patch.

hunmonk’s picture

Status: Needs work » Needs review
FileSize
23.18 KB

to address the points in #135:

  • #1 -- i agree w/ aclight that we should handle this the same as the project_issue filter case, therefore i have created a function in the project module that performs some simple tests to make sure that caching can be done in this case. the code went in project module so that other modules may leverage it in the case where they may need to make the same kind of caching decision. if somebody has a spiffier name than project_can_cache(), i'm all ears...
  • #5 -- i believe solving this is out of scope for this patch, so if somebody could create a separate issue for it, that would be great... :)
  • #6 -- all these edge cases are most easily solved by expiring the cached page at a reasonable interval. i've chosen 24 hours as the default, but made it a variable that can be overridden in settings.php if somebody really wants to tweak it.
hass’s picture

Follow up for MySQL 4.x support http://drupal.org/node/263407

aclight’s picture

Should we add a note to the README.txt file mentioning the conditions under which the usage data won't be cached?

Something else I noticed that's been in for a while. In theme_project_usage_project_page(), we should be using single quotes instead of double quotes.

  $output .= '<h3>'. t("Recent release usage") .'</h3>';

I've fixed that in the attached patch (but that's all I've changed).

All of the code looks good to me. I still haven't tested this most recent patch (#138/#140) to make sure it works, however.

aclight’s picture

A few more fixes:

1. We weren't including n.status = 1 in the WHERE clause of project_usage_overview().
2. I think we need to call project_can_cache() before where we check to see if a given page has a cache set in the first place. This is in both project_usage_overview() and project_usage_project_page().

Attached patch does both of the above.

davidwhthomas’s picture

subscribing

dww’s picture

Status: Needs review » Needs work
FileSize
23.3 KB

I was hoping to get this in tonight, but I'm running out of time and steam. :( I've updated the patch to fix the DB update number (earlier tonight I committed #168009: Prevent/limit project_usage abuse which already defined project_usage_update_5001()), and correct some trivial code style problems and comment typos.

Mostly, I think this is ready, but my eyes started bugging out of their sockets when I was reviewing the SQL in project_usage_overview() and project_usage_project_page(). ;) It's not obvious (yet) just from reading the code and comments, why we need a separate LEFT JOIN for every week in the table we're displaying. That just seems evil and wrong. But, I haven't thought about it much (and not at all with a well-rested brain), so maybe it'll make more sense when I look at this again...

@drewish, if you want to explain this part in English, I'll try to work it into the code comments once it all makes sense to me. I'm only setting to CNW for better comments about this. If there's a way to do it via more self-evident SQL, all the better, but I'm not going to hold up the patch for that.

Semi-related, I'm not 100% sure what we gain from using the crazy project query builder code here. That code is (historically speaking) on its way out. I'd rather we weren't making more use of it here, if we don't need it. If we're iteratively adding LEFT JOINs and such, I guess it makes sense, but if there's a better way to express that, perhaps we can just build up the query directly and comment it, instead of using the somewhat obfuscated query builder. Again, not a deal breaker, but I thought I'd mention it.

hass’s picture

Cool that i'm not the only who don't like this "crazy query builder code" very much. I would also be happy if this get's changed to a normal query as this would also allows me to provide a patch for #263407: Make usage statistics mysql 4.x friendly.

drewish’s picture

if you read back up the issue originally i'd had my own query builder but pulled it out to try to reduce the duplication.

i don't know what else i can write about those queries. i spent quite a bit of time drafting the comments in my initial patch to explain my reasoning. a year later they still seem like good explanations and any further thoughts i had at that time have been forgotten.

as for all the left joins, i'd be happy to see something better. the two problems (which i describe in the comments of the last patch i'd posted) are missing data and sortability. when you've got data with no gaps inner joins would be much better but when a new project or release appears you won't have a complete data set and inner joins end up ignoring the data you do have. if you want the sql server to do the sorting rather than PHP you've got go through some contortions to get the numbers summed up correctly. i spent at least a week trying to come up with a way to get that data out in manner that allowed sortable tables and that was the best i could come up with.

drewish’s picture

oh, i'll throw in a nit pick too. the docs for project_can_cache() don't follow the convention of having a single line describing the purpose followed by a further explanation.

dww’s picture

Assigned: Unassigned » dww

@drewish: Yes, I know, I suck that this has dragged on for so long... Again, my apologies about that.

Re: LEFT JOIN: I understand the need for LEFT JOIN in cases where a project has no data for a given week. What I don't understand is the separate LEFT JOIN for each week. Is that for the "sortability" problem? As in, you can sort projects by usage for any given week, and the only way to do that is to have each week's data in a separate field in the query, and the only way to do that is to JOIN the same table over and over? Is that really necessary? ;) /me ducks. ;)

Re: the query builder: Yes, I'd much rather use the existing query builder than write a new one. I was just balking at the fact that we were using the query builder predominantly to iteratively add all the LEFT JOINs. Thinking we might not need that, I was questioning the need for a query builder at all. Assuming we need it, I'm happy to leave this as is.

Assuming I understand all of this correctly, I'll update the comments and get this in. Please let me know if I'm going astray with any of this explanation.

Thanks,
-Derek

drewish’s picture

you know at this point i'm having a hard time making heads or tails of the queries. there's been several people making changes and in some places it not clear why. i.e. why did something as simple as AND n.status = 1 get changed to AND n.status = %d with the value one getting set way down below?:

+  $args = array_merge($fields_args, $joins_args);
+  $args[] = 1;   // n.status == 1

and why do we need separate arrays for the field and join args? couldn't we just store those into $sql_elements['fields']['values'] and $sql_elements['joins']['values']?

i'd try to clean some of this up but i don't have a d.o dump anymore.

drewish’s picture

realized i didn't answer the join question. yes, your description sounds pretty accurate to me. sorting requires separate columns and you've got to do a pivot (with the sum/group by) to get the data into the columns.

aclight’s picture

i.e. why did something as simple as AND n.status = 1 get changed to AND n.status = %d with the value one getting set way down below?

I did it this way because for D6, all values, regardless of whether or not they are really variables or not, need to be provided as arguments to db_query and use placeholders. This isn't required for D5, but I figured we might as well do it right now instead of needing to revise later.

drewish’s picture

aclight, i hadn't heard that. got a link to the documentation?

aclight’s picture

@drewish: http://drupal.org/node/114774#db-query-standard

It seems like this is a requirement for pretty edgy cases which I didn't even realize were supported, but it's at least documented.

drewish’s picture

acligh, ugh... don't know how i missed that one. well i guess i'll be writing some patches to update some SQL.

hunmonk’s picture

given that we're caching the query results, i think it's fine to exchange a monstrously ugly query for more user flexibility. i also suppose it's fine to yank out the sorting, too -- it's not like it's a critical usability feature. my feeling is to leave it in, since it's already done and working

dww’s picture

Yeah, I'm fine leaving it all in, so long as a) I understand it and b) the comments in the code are clear enough. ;) I think the only reason this is in CNW is for me to revise those comments. After that, I've just got a little bit more of the patch to actually review, then we're (hopefully!) done.

Pasqualle’s picture

@#150-153

need to be provided as arguments to db_query and use placeholders

that document is little flawed. it is just a one man crusade as I see it..
http://drupal.org/node/114774/revisions/view/189503/190765
http://drupal.org/node/172541#comment-796736

aclight’s picture

@Pasqualle: I figured that was the case but considering it's been part of the "official" list of changes that need to be made for upgrading to D6 for almost a year now and nobody else has removed that blurb, it's probably best that we try to comply. If I hadn't already made all of the changes necessary in the D6 port of project than maybe we'd just decide not to bother, but I have and so we might as well try to be consistent. Obviously for D5 this is even less of a consideration, but I was just trying to save myself a bit of work that I'd need to do later.

Pasqualle’s picture

@157
off topic:
lets see how long will it stay..
#271142: constant values in db_query

markus_petrux’s picture

subscribing

Edited: sorry if this has been described somewhere else, and sorry if this is not the place for this, but I have a couple of questions and don't know which door to knock about it. I think many other contributors would love to see this feature in d.o because you'll get an idea of how many people is using your module, or which versions, so you can focus on new features and/or keep working in versions that have a significant user base.

- Will this feature be used in the version of project module running here at d.o?
- Is there anything that you guys need help on to push this feature in?

webchick’s picture

a) Yes. That's pretty much the entire point. :)
b) This patch needs to be reviewed, and brought up to a RTBC state. If you're interested in helping, the http://drupal.org/project/drupalorg_testing profile will get you up and running with Project* and friends.

aclight’s picture

@markus_petrux: Please don't edit issue comments except for typographical mistakes. When you edit a comment, those of us who follow issues via email do not get sent the updated comment, and so are unlikely to (ever?) see your edited comments. Instead, post a new followup.

drewish’s picture

Chatted with dww on IRC today about this patch. Seems like it's waiting on some enhancements to the PHPDocs and pulling out some of the changes made based on hswong3i's bogus D5->D6 update instructions.

SeeSchloss’s picture

Just subscribing, any vague idea of when and if it will be available ?

dww’s picture

Status: Needs work » Needs review
FileSize
23.95 KB

Finally made some time for this issue again, since I want to include it before the project 5.x-1.3 release.

A) Clarified a bunch of comments

B) Ripped out some of the weird query code using %s and such instead of {$i}. That was a misunderstanding, even of hswong3i's bogus porting instructions. These weren't integer literals in the query, they're suffixes for field and table aliases to ensure uniqueness. So, they need to be substituted properly via PHP string concatenation long before they ever make it to the db_query() level. This simplifies a lot of the query code, too.

C) Fixed some minor code style issues (whitespace, etc)

I copied over the most recent data from the d.o DB for the {project_usage_*} tables, and deployed this on project.d.o:

http://project.drupal.org/project/usage

http://project.drupal.org/project/usage?sort=desc&order=Oct+12 is pretty useful. ;)

At this point, I think this could go in, but it's after 2am here, so I'd like to get some more eyes on the patch and some more testers on the test site before I do.

Note to testers: I didn't do a complete sync of the d.o database with project.d.o. So, things will look a little weird as you drill down, since there are a lot of releases that don't exist in project.d.o's copy of {node} etc. So, a lot of tables are missing rows for releases that should be there. For example:

http://project.drupal.org/project/usage/3060

The upper table only has up through 6.0-beta2 and 5.3. However, the lower table (per week) should be pretty accurate, including all of the 6.x releases through 6.6. Hope that makes sense. It's more of a pain in the ass to sync the entire d.o DB over to p.d.o, so folks should just test this as-is, and realize that some of the weirdness you might see will just be caused by the fact that not every release (or even project) exists on p.d.o...

scor’s picture

thanks dww for installing the patch on project.d.o, it's much easier to try ;) I've played around with project.d.o and could not find any bug. Skimmed through the patch ok. I'm not an expert with project* so I'll let someone else set this to RTBC.

sun’s picture

Status: Needs review » Needs work

I clicked a bit around, and all of this looks stunning!

The weekly project usage values somehow never match the recent release usage values (in total), but I guess that's what you meant with incomplete data. Anyway, for example:
http://project.drupal.org/project/usage/29351 (Signup)
http://project.drupal.org/project/usage/181465 (Wysiwyg)

We might want to consider to span http://project.drupal.org/project/usage over multiple pages, maybe having 200 rows on each page.

I also had a look on http://project.drupal.org/project/issues/statistics to ensure that everything still looks correctly there.

On the patch: only minor issues AFAICS:

- Wrong table name in PHPdoc:

+/**
+ * Add a cache table {cache_project_release}.
+ */

- Just return array() here:

+function project_usage_perm() {
+  $perms = array(
+    'view project usage',
+  );
+  return $perms;
+}

- Table alias for project_usage_week_project is sometimes 'puwp', and sometimes 'p'.

After reviewing the code, I see the culprit now. So, maybe defer pager_query() for later, if at all.

dww’s picture

Status: Needs work » Needs review
FileSize
23.99 KB

Right, a pager_query() is going to be a huge pain. Definitely for a followup issue, if ever.

If we're only querying {project_usage_week_project}, I don't care if the alias is only p, especially since those table names are already a little hard to read with the {$i} stuff.

Fixed the other two minor code style and comment problems you found, thanks.

dww’s picture

Oh, yeah, re: "The weekly project usage values somehow never match the recent release usage values (in total), but I guess that's what you meant with incomplete data."

Yes, that's exactly what I meant. The per-release stuff is incomplete since there are a ton of releases in the usage data (just copied from d.o) that don't actually exist in the p.d.o DB. If I could easily sync the entire {node} table and everything else, I would, but that's a big pain (at least for now). So, p.d.o is showing totals based on the real live data, but only showing per-release rows for all the release nodes at actually exist in the DB.

hass’s picture

But without a full sync we cannot be sure that all counters/sum's are correct, isn't it?

aclight’s picture

I briefly clicked around and don't see problems.

One very minor point. I think it would be better if, instead of sorting in 'asc' order by default in project_usage_overview, we sorted by 'desc' by default.

Presumably, that would be:

-  $header = array(array('field' => 'n.title', 'data' => t('Project'), 'sort' => 'asc'));
+  $header = array(array('field' => 'n.title', 'data' => t('Project'), 'sort' => 'desc'));
dww’s picture

Sadly, that's only for the default sort (on project titles), which we don't want desc, we want those alphabetical. I don't know any way to force tablesort_sql() so that when you click on another column it defaults to desc. :( I think that's just a limitation of the tablesort_sql() API itself. I could be wrong, but I'm 95% sure changing what you mention above wouldn't do it...

However, arguably, the people visiting this page are going to want it to default to sorting by the most recent date, in desc order of usage. As in:

http://project.drupal.org/project/usage?sort=desc&order=Oct+12

That much we could do, but not with the change you propose above -- we'd have to change it so that the first time we append another entry to the $header array after the title column, we added the 'sort' => 'desc'. Then, the page would default to the sort I linked above...

dww’s picture

Like so:
http://project.drupal.org/project/usage
(see attached patch, now deployed on p.d.o)

dww’s picture

Better variable names. The array of things is called "headers" and each entry is now just "header", instead of the "header_row" misnomer (which should have been "column" or "header_column" or something...)

markus_petrux’s picture

Could it be an alternative approach using some kind of jQuery table plugin to sort that on the client?

Edited just to post an example:
http://docs.jquery.com/Plugins/Tablesorter

Pasqualle’s picture

Would it be possible to show the actual rank in the first column?

I like these statistics:
http://sourceforge.net/softwaremap/trove_list.php?form_cat=18

scor’s picture

@markus_petrux that'd fit in a second issue once this has been committed.

dww’s picture

@#174 and #175: Maybe, but only in follow-up patches in other issues.

aclight’s picture

re #171: Yeah, I see what you mean. Eventually this stuff will probably be done with Views, and we'll have much better control there. But for the moment it's most important to get this done rather than make it as pretty as we can possibly make it.

It's looking like my weekend is going to be project*-a-rific, so if this hasn't been committed by then, I'll review the patch itself.

dww’s picture

Status: Needs review » Fixed

Committed to HEAD and DRUPAL-5. Deployed on d.o:
http://drupal.org/project/usage
And there was much rejoicing... ;)

jcnventura’s picture

REJOICE!!

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

Alan D.’s picture

This was so cool for D6, but D7 stats are lost. #1248922: Project usage overview page: Filter by version was the only issue that I found. Any takers?