On cron, Drupal collects statistics on enabled contrib modules and reports this information back to Drupal.org. This gives people an indication of how many sites use a given module. It should do the same for core modules, to give developers an idea of which core modules are enabled and/or used most often.
For the Drupal.org / Project issue corresponding to this update.module issue, see also #1274766: Collect stats on enabled sub-modules, not just projects.
Patches have already been committed to D7 and D8; the final task here is to review and RTBC the D6 backport in #72. This is necessary to get a full picture of the current installed base as we move to release D8. The data will be very valuable to core and contrib module maintainers. We also still need Drupal.org to listen for and record the data being sent back.
Comment | File | Size | Author |
---|---|---|---|
#72 | drupal-1036780-72.patch | 1.35 KB | cafuego |
#50 | drupal-1036780-50-tests.patch | 1.61 KB | tim.plunkett |
#50 | drupal-1036780-50-combined.patch | 2.96 KB | tim.plunkett |
#44 | drupal-1036780-44.patch | 3 KB | tim.plunkett |
#42 | 1036780-42.patch | 3.01 KB | Mike Wacker |
Comments
Comment #1
jerdavis
+1 subscribe
Comment #2
emmajane
subscribe +1 too.
Comment #3
jerdavis
First time looking in depth through update status, of course as I expected this is going to be a bit tricky. Anyone have some thoughts on the path this should take? Right now of course all of the core modules are being lumped in as includes for the "drupal" project, and as such both don't have their own update status info (desired) but also don't get their usage reported (less desired).
This task becomes important as we can use this to gather data about what modules are actually being used for site building so we can make more informed decisions on what should be refactored or removed from a feature perspective.
Adding this to the Framework tag for tracking for this reason.
Comment #4
sun
Not sure about this. I think we have a pretty good understanding of what is being used from core modules and what's not ourselves.
Comment #5
moshe weitzman
The usage tracking system only cares about projects, not modules or themes. I doubt it will be elegant to convince it otherwise.
Comment #6
David_Rothstein
I don't think this is minor. There are a whole lot of Drupal sites out there that "we" aren't involved with :) And there is no substitute for having actual hard data.
On the Update module side of things, this doesn't seem that hard. At admin/reports/updates it already displays the enabled modules/themes per project (and even if it didn't that data is not hard to get) so it seems like it could easily be made to send that data back to Drupal.org. As for the Drupal.org side of things, though, that might be harder.
Comment #7
cafuego
Over the course of DrupalCon I've heard people claim that various modules can be removed from core because nobody uses them. Since there is currently no way of backing that up with any factual data whatsoever*, I suggest that creating a way of having such data might be fairly important.
Important decisions are being made based on zero data. I'd put this issue on critical if I didn't think it'd be changed back immediately.
* "I don't think..." and "I don't use..." isn't factual data.
Comment #8
sun
My stance on this is pretty clear: If we want numbers for particular uncertain modules, then those modules should be separated from core and re-referenced when packaging the Standard installation profile.
That not only solves missing usage stats, but also many other issues. And it requires close to zero work, neither in core, nor on drupal.org.
Comment #9
catch
Usage stats for modules would be useful for contrib too. Less useful than core but still handy to have.
Comment #10
Bojhan
I agree, this is something we should investigate more - not just core, but many other projects that package 'tiny product feature modules' could use these statistics.
@sun I don't get why you are against it, it makes no sense that you don't want data to help us make more informed decisions. Also I don't like that you consider issues "wont fix" because of a direction, that is far from decided upon.
Comment #11
karschsp
subscribe
Comment #12
sun
Comment #13
cafuego
What is the module on the d.o end that the reports get posted to and that does the aggregation?
Comment #14
webchick
That's the Project module, specifically this file: http://drupalcode.org/project/project.git/blob/HEAD:/release/project-rel...
And +1 for this. I'm sure a module like Drupal Commerce for example would love to know that only 5% of its users use the "Shipping" component or whatever, when deciding where to prioritize development.
Comment #15
webchick
And if anyone's keen to hack on Project module stuff, there's an install profile that gets you a good chunk of the way there at http://drupal.org/project/drupalorg_testing
Comment #16
webchick
This would be a more accurate title. Fixing this for core will also fix it for Drupal Commerce and the like.
Comment #17
Michelle
I think this is a great idea, not only for core but contrib as well. I don't currently have sub-modules but I might with Artesian and I would love to see how many people are using the various pieces. I also am very curious how many people have Forum enabled compared to how many use Advanced Forum. :)
Michelle
Comment #18
bryancasler
subscribe
Comment #19
Michelle
Fixing title since the change is inaccurate.
Michelle
Comment #20
bryancasler
Not my intention. Must have something to do with leaving a tab open for a while, refreshing the tab to see new comments, but the form values stayed the same as when I had first opened the tab (i.e. before webchick's title change).
Comment #21
Michelle
It's ok. It's easily fixed. :)
Michelle
Comment #22
cafuego
Added feature request #1274766: Collect stats on enabled sub-modules, not just projects on project.module to match this one and quick spelling fix in issue title.
Comment #23
dww
Currently everything we record and track for the usage data is in the GET URL that update status requests. drupal.org just analyzes the web cache access logs to figure out the usage from all the various update manager/status clients out there. So, all you'd really need to do here is to bloat that URL when requesting release history for a given project to include all the enabled sub-modules/components. For example, instead of this:
http://updates.drupal.org/release-history/drupal/7.x?site_key=[something...
We'd have something like this:
http://updates.drupal.org/release-history/drupal/7.x?site_key=[something...,...
More or less. The rest of the update manager wouldn't care at all. _update_build_fetch_url() would just have to inspect $project['includes'] (which it should already have passed in as part of the $project associative array) and stuff all that data into the URLs it generates. Assuming people didn't consider that a privacy violation of some kind...
On the d.o side, we'd have to add smarts to parse this &modules=whatever query in the URL and include that in the processing (all the raw stats are stuffed into mongo and then summaries are computed from there and stored in the DB for display, and then stuffed into Solr for things like sorting the project browsing pages).
Probably the trickiest part is going to be over at #1274766: Collect stats on enabled sub-modules, not just projects to store and display this data in some kind of meaningful way. But yeah, the update manager part of this whole proposal (i.e. this issue) might be about a 3 line patch to _update_build_fetch_url()...
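As a rough illustration of the approach described in #23, the sketch below appends a project's enabled components to the release-history URL. It is written in Python rather than Drupal's PHP. The function name `build_fetch_url`, the `modules` query parameter (taken from the "&modules=whatever" wording above), and the shape of the `project` dictionary are assumptions made for illustration; this is not the code that was committed to update.module.

```python
from urllib.parse import quote


def build_fetch_url(project, site_key="",
                    base="http://updates.drupal.org/release-history",
                    core="7.x"):
    """Build a release-history URL, optionally carrying usage data.

    `project` is assumed to look like the $project array mentioned in the
    thread: a "name" plus an "includes" mapping keyed by enabled component.
    """
    url = f"{base}/{project['name']}/{core}"
    # Only append usage information when the site has a key to report with.
    if site_key:
        url += "?site_key=" + quote(site_key, safe="")
        includes = project.get("includes", {})
        if includes:
            # Hypothetical parameter name; one comma-separated list per project.
            url += "&modules=" + quote(",".join(includes), safe="")
    return url


if __name__ == "__main__":
    core_project = {"name": "drupal", "includes": {"node": "Node", "user": "User"}}
    print(build_fetch_url(core_project, site_key="abc"))
```

Because the list is sent once per project request, the rest of the update check is unaffected; a site that has no site key sends the plain URL as before.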
Comment #24
Michelle
How much can that URL handle? I have nearly 100 projects on one site and well over that if you count each module separately. And some have long names.
Michelle
Comment #25
webchick
This is apparently not a straightforward question, according to http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-.... Looks like ~2000 characters is a baseline limit.
Comment #26
dww
We're still doing separate queries for each *project*. So, we're just talking the maximum number of *modules* included in a single *project*. Core is probably the worst case for that, with close to 40 separate modules in a single project.
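To put numbers on the URL-length question, here is a small Python sketch that estimates how long a single per-project request would get. The ~2000 character figure is the rough baseline quoted earlier in the thread, not a hard limit; the `modules` parameter name and the helper names are assumptions made for illustration.

```python
from urllib.parse import quote

# Rough baseline mentioned in the thread; real limits vary by proxy and server.
BASELINE_URL_LIMIT = 2000


def fetch_url_length(base_url, module_names, param="modules"):
    """Length of base_url once the encoded, comma-separated list is appended."""
    query = "&" + param + "=" + quote(",".join(module_names), safe="")
    return len(base_url) + len(query)


def fits_in_url(base_url, module_names, limit=BASELINE_URL_LIMIT):
    """True if the request stays within the assumed length limit."""
    return fetch_url_length(base_url, module_names) <= limit


if __name__ == "__main__":
    # Hypothetical worst case: 40 modules with 20-character machine names.
    base = ("http://updates.drupal.org/release-history/drupal/7.x"
            "?site_key=" + "x" * 32)
    names = ["m" * 20 for _ in range(40)]
    print(fetch_url_length(base, names), fits_in_url(base, names))
```

Under these assumptions, each module costs its name length plus three characters for an encoded comma, so even a project the size of core stays well below the baseline.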
Comment #27
Bojhan
@dww So there is a limit to a solution like this? Core is a worst case, but I wonder about other distributions.
Comment #28
cafuego
I wouldn't worry about IE, as that doesn't need to use such a generated URL anyway. It'll depend entirely on the proxies the drupal.org site uses and whether suhosin is installed and configured with a lower limit than the default (which is 100 variables per URL).
With that in mind, 40 modules for a project doesn't seem too outlandish, so ought to be fine. A quick test on my own site, which runs varnish and suhosin, shows that a 2500+ character variable name with a 2500+ character value works fine.
If all else fails, could data be added to the request via custom HTTP headers? For example:
X-Drupal-Statistics: project=core; modules=blog,node,user,system
Comment #28.0
juan_g
Adding reference to the corresponding d.o issue.
Comment #29
juan_g
Issue summary: adding reference to the corresponding d.o / project issue.
On the title, it wasn't clear -at least for me- that this issue included usage stats for core modules. Adding three words for complete clarity.
Like others here, I also think the proposal of this issue would be a really useful feature, as one of the many factors to consider for decision making.
dww wrote:
Even when this is a feature and not a bug/security fix, perhaps those few code lines could be later backported for D7 and D6 sites, since they are intended just for drupal.org use (stats)...
Comment #30
klonos
This would greatly help get a more objective idea of which features are used mostly in core. In other words it would also help decisions in issues like: #1273344: Establish heuristics for core feature evaluation
If we made it possible to also collect anonymous stats on various other settings, we wouldn't "fight" over what should be the default out of the box in things like radio boxes, checkboxes, enabled/disabled modules and so many more. I'm not talking only core here. Module maintainers should also be able to implement a way to collect usage stats for new features so that they know if they should be on/off by default when something goes stable.
A great example of where this could be useful is the core Overlay module introduced in D7 core, for which various opinions are heard here and there about its usefulness. If we had actual stats of its usage (how many sites keep it enabled I mean), then deciding if it should be taken out of core for D8 or not wouldn't require much thinking/debating. If stats showed that the number of sites using it is small, but not so small as to justify moving it out of core, then we would keep shipping core with it, but we would at least have it disabled by default ...so we could spare the majority of users the extra clicks it takes to disable it ;)
Comment #31
longwave
+1, I would love to see this data for Ubercart so we know which modules are actually used; at present we have no real idea whether some smaller modules are bug free or just nobody uses them.
Comment #32
Jono
Seems to me a small module can collect a whole bunch of useful metrics. I'm thinking something that uses a hook call that lets other modules submit any metrics it wants -- all that a "metrics" module needs to do is collect the metrics reported by other modules and send them up to a server.
I'll try to flesh something out in the next couple of days ... meanwhile:
http://drupal.org/metrics has some metrics that are tracked about d.o itself.
There's a "cross-site activity" module (http://drupal.org/project/xs_activity) that looks useful. It doesn't look like this has been updated in a long while, but this looks worth a look.
Looks like there's a group on this: http://groups.drupal.org/module-metrics-and-ranking -- no recent posts, though.
Please add ideas here ... a group of us will try to pull something together in time to gather meaningful data for D8 design decisions.
Comment #33
sprice
I just want to add in a quick comment to make sure that one concern that many module users might have won't be overlooked: privacy.
Personally, I'll send all my usage statistics back to d.o. so that the developers can have some useful data to prioritize development, but I can easily imagine a few folks really disliking that d.o. tracks what they use.
There is a solution that seems perfectly elegant: in the config section (along with perhaps a nice message displayed), there should be an obvious check-box that allows the user to opt-out of sending their information back to Drupal. The emphasis here is, displayed by the bold, that it should be obvious and big, and it should have a bit of an explanation (and perhaps just a quick note for why participating will help development).
The option should be obvious and easy to turn off.
Another aspect that is less important is whether or not there should be guidelines (in the form of requests?) to track it on production sites or development sites. I don't have an answer to that specifically, but it might give misleading data to developers if the statistics collected come from temporary testing or development sites.
Again, I fully support the overall feature.
Comment #34
Michelle
I don't think privacy is an issue for 99% of people considering the information is only available to the public in combined stats. The actual report tied to your IP is only available to specific people who need it for troubleshooting. Everyone else just sees X sites use Foo, not the site at IP nn.nn.nn.nnn uses Foo.
I say 99% because there's always going to be someone who says that's not good enough and, well, all I can say is turn off the update status module if it's really a big deal for you.
Comment #35
David_Rothstein
@sprice, that privacy checkbox already exists in the installer (since we already send data back).
This issue would necessarily expand the data that is sent back, though, and we probably want to minimize the additional data here to only what is strictly necessary. It also raises the question of whether we need to actively "get permission" from sites that already have this module turned on, if they still want to send the information. Since the current wording is vague ("Anonymous information about your site is sent to Drupal.org") and that part wouldn't be changing, maybe no need?
Comment #36
dww
Re: #32-#35: Can we please leave the question of additional metrics and stats out of this issue? That's going to ensure this turns into a giant thread with no actionable tasks and it's going to delay the thing we might be able to agree on (see #23).
For adding more statistics and metrics into what we send back from core to d.o, please see #1439316: Provide means for module maintainers to collect heuristics on certain settings of their modules.
Thanks,
-Derek
Comment #37
David_Rothstein
But doesn't this issue (including #23) require that we send a small amount of additional information from the site back to Drupal.org? I think that's why the privacy question, at least, is relevant.
Comment #38
dww
Discussing privacy about sending back info about specific modules enabled, yes. Additional metrics and a hook to collect them, no. ;)
Thanks,
-Derek
Comment #39
Mike Wacker
@dww Re: #36
Good point. For the small amount of information being collected in this issue, there shouldn't be any new privacy issues. Collecting whether a module is used or not is no more an invasion of privacy than it is to collect whether a project is used or not.
However, the privacy issue will become more relevant in #1439316: Provide means for module maintainers to collect heuristics on certain settings of their modules., as we'll be collecting data that is more free-form than whether a module/project is being used.
Comment #40
Mike Wacker
Here's a first draft of the patch. The investigation was trickier than the code changes.
The information returned by update_get_projects() includes a list of enabled modules and themes in $project['includes']. I noticed for the Drupal project this array did not include disabled modules. Thus, I only needed to make a few tweaks to _update_build_fetch_url() to add that information as an extra URL parameter.
Comment #42
Mike Wacker
New patch to fix the broken unit tests. I presume that other tests for this module will ensure that the data returned by update_get_projects() is correct, so updating the unit tests which populate dummy data for the $project variable should suffice for testing.
Comment #43
dww
Thanks for pushing this forward, Mike!
Code looks good, and upon visual inspection does what I proposed in #23. The test bot is happy. RTBC for me. Not sure if it makes sense to commit this before anything happens at #1274766: Collect stats on enabled sub-modules, not just projects but then again, it's nice to know what's in core while we're working on the infra and project* parts of this. A D8 commit for this minor change would also help facilitate a D7 (and possibly even D6?) backport so that we could be tracking this data from the "live" versions of core in the wild (once they upgrade to the next point release, of course).
Cheers,
-Derek
Comment #44
tim.plunkett
Removed trailing whitespace, no commit credit please.
Leaving at RTBC.
Comment #45
klonos
Great! Glad to see we're getting somewhere with this ;)
Comment #46
catch
I'm fine with adding this now and having the Drupal.org infra catch up once it's in, since it's not really much of a change to core. Committed/pushed to 8.x.
Seems reasonable for backport and that information would be very interesting to have, so moving back.
Comment #47
Bojhan
I think it's extremely valuable to have this data for actual running sites, because that means we can use it for the D8 decision making process.
With the danger of opening a can of worms, should we inform users about this for the D7 patch - because we are going to be tracking new data, that was previously not tracked? Not necessarily on new installs, but on updates.
Comment #48
David_Rothstein
That can of worms was already opened above :) See #35, #39, etc.
Comment #49
cafuego
I think this patch would finally send the information that people think is already being sent. I.e., people think module information is sent currently, but the code in fact only sends project information. I expect it's a non-issue, so a note in README will probably suffice.
Comment #50
tim.plunkett
Rerolled, and split up just to be sure.
Comment #51
dww
#50 is RTBC for 7.x. Identical to the 8.x version, and does what it needs to.
Thanks,
-Derek
Comment #52
David_Rothstein
#50: drupal-1036780-50-combined.patch queued for re-testing.
Comment #53
David_Rothstein
Any further thoughts on the best timing for committing this to D6/D7 specifically?
One complication is that I think when this does get released in D6/D7, we're going to need to mention it in the release notes (and probably the release announcement too). Just to make sure no one accuses Drupal of doing some extra spying on their site without warning them :) So, it would be nice if we added it to D6/D7 right around the time #1274766: Collect stats on enabled sub-modules, not just projects was ready to go (or shortly before that), because then the announcement could have something useful to point to.
But if we need to add it to D7 earlier for whatever reason, I think that would be fine too.
Comment #54
dww
Nearly all of my Drupal attention is on the d.o D7 upgrade at this point, along with most of the rest of the infra/d.o teams. While some other d.o features are being rolled out now, I don't have the bandwidth to deal with this. bdragon might, and he's the main one maintaining the d.o infra for usage stats. He's really the main person to ask. I'm happy to help out in small ways to unstick things whenever I can, but I can't be the one driving this forward.
The good news is that our usage infra works via parsing logs, so even if we don't upgrade the parsing jobs and actually aggregate and display the data right away, as soon as this hits an official release, d.o will be collecting stats. bdragon and/or nnewton would need to say how long we keep historical logs that we could go back to reparse, but it's at least feasible that we could gather data for months before displaying it. Of course, if we announce that we're gathering this new data (and I agree we should mention it in the release notes), people will probably expect that they can see the data. But from a technical standpoint, we don't need any new plumbing in place on d.o right away to still make this worth doing.
Hope that helps...
Thanks,
-Derek
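The log-parsing model described in #54 can be sketched roughly as follows, in Python. This is an assumption-laden illustration, not d.o's actual pipeline: it takes already-extracted request URLs rather than raw access-log lines, uses the hypothetical `modules` parameter name from #23, and counts each component once per `site_key`.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_module_usage(request_urls, param="modules"):
    """Count distinct sites per (project, module) from release-history URLs."""
    seen = set()
    counts = Counter()
    for url in request_urls:
        parts = urlsplit(url)
        query = parse_qs(parts.query)
        site_key = query.get("site_key", [None])[0]
        if site_key is None:
            continue  # no site key means no usage information to record
        # Expected path shape: /release-history/<project>/<core version>
        segments = parts.path.strip("/").split("/")
        if len(segments) < 3:
            continue
        project = segments[-2]
        for module in query.get(param, [""])[0].split(","):
            key = (site_key, project, module)
            if module and key not in seen:
                seen.add(key)
                counts[(project, module)] += 1
    return counts


if __name__ == "__main__":
    base = "http://updates.drupal.org/release-history/drupal/7.x"
    urls = [
        base + "?site_key=a&modules=node%2Cuser",
        base + "?site_key=a&modules=node%2Cuser",  # repeat check, same site
        base + "?site_key=b&modules=node",
        base,  # request without usage data
    ]
    print(count_module_usage(urls))
```

Since the input is just logged URLs, historical logs could in principle be re-parsed later, which is the point made above about collecting data before any display work is finished.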
Comment #55
David_Rothstein
That's useful to know, thanks, and seems like a good reason to get this in sooner rather than later.
Sounds like checking with @bdragon or @nnewton would be a good next step for someone to do.
In the meantime, we're not quite under thresholds (although we're close) so I probably shouldn't commit it to D7 right this moment anyway.
Comment #56
Dries
Tagging.
Comment #57
Bojhan
...
Comment #58
David_Rothstein
Another issue here is that we should probably decide beforehand if we're going to try getting this patch into D6 also. Because if we are, we should really release them to D6 and D7 at the same time (to make the announcement simpler).
Not sure if it's worth the effort to bother with D6 or not, but the possibility was mentioned above.
Comment #59
cafuego
I think it's worthwhile porting the patch to 6.x and including it on there as well.
Comment #60
dww
FYI: I got bdragon to reply at #1274766: Collect stats on enabled sub-modules, not just projects so it's on his radar and he's got the issue assigned to himself now.
See also #1627676: Display stats on enabled components (e.g. modules included in a project) to discuss where/how to display this data once we're collecting and storing it on d.o.
Cheers,
-Derek
Comment #61
webchick
While we're now under thresholds, David has requested help on resolving release blockers, so I don't feel comfortable committing 7.x feature patches until that happens.
Comment #62
David_Rothstein
So we're in a weird situation here where we ideally want to release this in Drupal 6 and Drupal 7 at the same time, but don't want to wait around for Drupal 6 forever either.
I think what I should do is commit this very soon after the Drupal 7.15 release. That will give it quite a while before Drupal 7.16 comes out, and plenty of time to try backporting this to Drupal 6 in the meantime. If it doesn't happen by then, we'll just go with Drupal 7 only for now.
Temporarily assigning this to Gábor to see if he has any feedback on this plan, or on the feasibility of doing this patch at all in Drupal 6. (Any Drupal 6 version of this patch will definitely require some manual testing of its own...)
Comment #63
Gábor Hojtsy
Not sure why we need to coordinate the release of this change; the server would need to handle old clients forever anyway, so it needs to be ready for clients not updated. I'm fine with the plan to commit after the next D7 release and then try to coordinate with D6 if we want to.
Comment #64
Bojhan
Back to David to commit it after 7.15
Comment #65
kattekrab
7.15 is out now ... right?
Comment #66
Bojhan
Yes, I am sure he will get to it.
Comment #67
David_Rothstein
Yup, I took a break for a while :) Technically, I shouldn't commit it now since we're a tiny bit over thresholds, but this has been RTBC for a while and only was on hold for non-technical reasons so I think it's OK to make an exception...
Committed to 7.x via http://drupalcode.org/project/drupal.git/commit/c6200e8 - thanks everyone!
Moving down to Drupal 6 for possible backport. I think we still have a fair amount of time before the next release so hopefully we can release it in both versions at the same time.
This will definitely go in the release notes - not sure if we want a change notification also (do Drupal 7 site builders actually look at those)?
Comment #68
David_Rothstein
Inventing a new tag as well... since I think this will require a special call-out in the release announcement (not just the technical release notes).
Comment #69
David_Rothstein
Drupal 7.16 was a security release only, so this issue is now scheduled for Drupal 7.17 instead.
Fixing tags accordingly.
Comment #70
Bojhan
I assume this is in already?
Comment #71
David_Rothstein
Yup, see the mention of it at http://drupal.org/drupal-7.17
Comment #72
cafuego
Backported the non-test code from the patch in #50 to Drupal 6. Apart from line numbers, it's identical.
Comment #73
hass
Are there any plans to collect feature usage? I'm highly interested in how much specific features are used, to decide whether a feature needs to be kept for future versions. This must not depend on modules.
Comment #74
juan_g
I think features are usually included in drupal.org downloads as contrib modules, therefore with usage stats.
Part of them in the Features Package category, but not all (e.g., Commons features are not in that category currently, just as normal modules).
Comment #75
cweagans
I don't think that's what he was asking. I think he was asking about features that are included in modules, not Feature modules that are distributed from Drupal.org.
hass, afaik, there aren't any such plans, but I could be wrong.
Comment #76
hass
I mean specific settings used in a module that allow me as a maintainer to identify how my modules are used and what functionality, not the feature modules. It's not about every setting; I need to be able to add a hook or so that will be called and that update module logs the used feature.
Comment #77
klonos
I believe that #1439316: Provide means for module maintainers to collect heuristics on certain settings of their modules. is what you're looking for. I filed that one a little over a year ago ;)
The respective issue for core is #1273344: Establish heuristics for core feature evaluation
Comment #78
kattekrab
If this was rolled back into D6 - that would mean we'd get a huge amount of data from the installed base out there right? That would be helpful for maintainers wondering whether to abandon old modules, or port them to D8.
Comment #79
longwave
As far as I can tell, D7 sites are sending the data, but nothing is collecting it at the other end until #1274766: Collect stats on enabled sub-modules, not just projects is implemented and deployed to drupal.org infrastructure. We then need #1627676: Display stats on enabled components (e.g. modules included in a project) for anyone other than the infrastructure team to be able to access the results.
Comment #79.0
longwave
Minor clarification (Project module issue, used by Drupal.org).
Comment #80
kattekrab
It would be great if we could revisit this issue and start gathering this data on core module use. A lot has changed in 2 years though, so the patch will need a re-roll.
And the associated
[edit] I don't know what happened there... I blame DrupalCon and an addled brain.
Going back through the thread, it looks as though patches were committed for Drupal 8 and 7, but I'm not at all sure about whether they made it in to D6, I think not.
I think the main remaining problem here is that Drupal.org still isn't equipped to actually collect this info. Right? Wrong?
Comment #81
kattekrab
Added some related issues, and now I see there is a D6 backport patch at #72
Whilst D6 is nearing sunset, I still think the data we could gather from the installed base could be very valuable.
Comment #82
xjm
Comment #84
Alan D.
Old thread, not sure if still relevant, but the corresponding Project issue is set at 7.x, so bumping version so this isn't lost.
Comment #85
kattekrab
It's a real shame this never got committed to D6.
That data would have been really useful for future roadmap work, but now we'll never know.
Comment #87
David_Rothstein
This already happened in both Drupal 7 and 8. #1627676: Display stats on enabled components (e.g. modules included in a project) is the followup issue for actually doing something with the data...
Comment #88
xjm
Closed #2297143: Move module usage statistics out of Update module as a duplicate.