On cron, Drupal collects statistics on enabled contrib modules and reports this information back to Drupal.org. This gives people an indication of how many sites use a given module. It should do the same for core modules, to give developers an idea of which core modules are enabled and/or used most often.

For the Drupal.org / Project issue corresponding to this update.module issue, see also #1274766: Collect stats on enabled sub-modules, not just projects.

Files:
Comment | File | Size | Author | Test result (SimpleTest, MySQL)
#72 | drupal-1036780-72.patch | 1.35 KB | cafuego | PASSED: 190 pass(es)
#50 | drupal-1036780-50-tests.patch | 1.61 KB | tim.plunkett | FAILED: 39,001 pass(es), 2 fail(s), and 0 exception(s)
#50 | drupal-1036780-50-combined.patch | 2.96 KB | tim.plunkett | PASSED: 39,116 pass(es)
#44 | drupal-1036780-44.patch | 3 KB | tim.plunkett | PASSED: 35,070 pass(es)
#42 | 1036780-42.patch | 3.01 KB | Mike Wacker | PASSED: 35,061 pass(es)
#40 | 1036780-40.patch | 1.36 KB | Mike Wacker | FAILED: 35,063 pass(es), 2 fail(s), and 6 exception(s)

Comments

+1 subscribe

subscribe +1 too.

This is my first time looking in depth through update status, and as I expected this is going to be a bit tricky. Anyone have some thoughts on the path this should take? Right now all of the core modules are lumped in as includes for the "drupal" project, so they don't have their own update status info (desired) but also don't get their usage reported (less desired).

This task becomes important as we can use this to gather data about what modules are actually being used for site building so we can make more informed decisions on what should be refactored or removed from a feature perspective.

Adding this to the Framework tag for tracking for this reason.

Component:base system» update.module
Priority:Normal» Minor
Issue tags:+Framework Initiative

Not sure about this. I think we have a pretty good understanding of what is being used from core modules and what's not ourselves.

The usage tracking system only cares about projects, not modules or themes. I doubt it will be elegant to convince it otherwise.

Priority:Minor» Normal

I don't think this is minor. There are a whole lot of Drupal sites out there that "we" aren't involved with :) And there is no substitute for having actual hard data.

On the Update module side of things, this doesn't seem that hard. At admin/reports/updates it already displays the enabled modules/themes per project (and even if it didn't that data is not hard to get) so it seems like it could easily be made to send that data back to Drupal.org. As for the Drupal.org side of things, though, that might be harder.

Over the course of DrupalCon I've heard people claim that various modules can be removed from core because nobody uses them. Since there is currently no way of backing that up with any factual data whatsoever*, I suggest that creating a way of having such data might be fairly important.

Important decisions are being made based on zero data. I'd put this issue on critical if I didn't think it'd be changed back immediately.

* "I don't think..." and "I don't use..." isn't factual data.

Issue tags:+Platform Initiative

My stance on this is pretty clear: If we want numbers for particular uncertain modules, then those modules should be separated from core and re-referenced when packaging the Standard installation profile.

That not only solves missing usage stats, but also many other issues. And it requires close to zero work, neither in core, nor on drupal.org.

Usage stats for modules would be useful for contrib too. Less useful than core but still handy to have.

I agree, this is something we should investigate more - not just core, but many other projects that package 'tiny product feature modules' could use these statistics.

@sun I don't get why you are against it; it makes no sense that you don't want data to help us make more informed decisions. Also, I don't like that you consider issues "won't fix" because of a direction that is far from decided upon.

subscribe

Issue tags:-Framework Initiative

What is the module on the d.o end that the reports get posted to and that does the aggregation?

That's the Project module, specifically this file: http://drupalcode.org/project/project.git/blob/HEAD:/release/project-rel...

And +1 for this. I'm sure a module like Drupal Commerce for example would love to know that only 5% of its users use the "Shipping" component or whatever, when deciding where to prioritize development.

And if anyone's keen to hack on Project module stuff, there's an install profile that gets you a good chunk of the way there at http://drupal.org/project/drupalorg_testing

Title:Drupal.org should collects stats on enabled core modules» Drupal.org should collects stats on enabled sub-modules

This would be a more accurate title. Fixing this for core will also fix it for Drupal Commerce and the like.

I think this is a great idea, not only for core but contrib as well. I don't currently have sub-modules but I might with Artesian and I would love to see how many people are using the various pieces. I also am very curious how many people have Forum enabled compared to how many use Advanced Forum. :)

Michelle

Title:Drupal.org should collects stats on enabled sub-modules» Drupal.org should collects stats on enabled core modules

subscribe

Title:Drupal.org should collects stats on enabled core modules» Drupal.org should collects stats on enabled sub-modules

Fixing title since the change is inaccurate.

Michelle

Not my intention. It must have something to do with leaving a tab open for a while: I refreshed the tab to see new comments, but the form values stayed the same as when I had first opened the tab (i.e. before webchick's title change).

It's ok. It's easily fixed. :)

Michelle

Title:Drupal.org should collects stats on enabled sub-modules» Drupal.org should collect stats on enabled sub-modules

Added feature request #1274766: Collect stats on enabled sub-modules, not just projects on project.module to match this one and quick spelling fix in issue title.

Currently everything we record and track for the usage data is in the GET URL that update status requests. drupal.org just analyzes the web cache access logs to figure out the usage from all the various update manager/status clients out there. So, all you'd really need to do here is to bloat that URL when requesting release history for a given project to include all the enabled sub-modules/components. For example, instead of this:

http://updates.drupal.org/release-history/drupal/7.x?site_key=[something...

We'd have something like this:

http://updates.drupal.org/release-history/drupal/7.x?site_key=[something...,...

More or less. The rest of the update manager wouldn't care at all. _update_build_fetch_url() would just have to inspect $project['includes'] (which should already be passed in as part of the $project associative array) and stuff all that data into the URLs it generates. Assuming people didn't consider that a privacy violation of some kind...
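To make the idea concrete, here is a minimal sketch in Python rather than Drupal's PHP. It is not the actual _update_build_fetch_url() code: the function name, the dictionary layout (loosely mirroring $project['includes']), and the "modules" parameter name (taken from the "&modules=whatever" wording in this comment) are all assumptions for illustration.

```python
from urllib.parse import urlencode

# Base URL as it appears in the example request above.
BASE_URL = "http://updates.drupal.org/release-history"


def build_fetch_url(project, site_key, core="7.x"):
    """Build a release-history URL that also lists enabled components.

    `project` is a hypothetical dict with a "name" and an optional
    "includes" mapping of machine name -> human-readable name.
    """
    params = {"site_key": site_key}
    includes = project.get("includes") or {}
    if includes:
        # Only the machine names are sent; sorting keeps the URL stable.
        # "modules" is an assumed parameter name, not a confirmed one.
        params["modules"] = ",".join(sorted(includes))
    return f"{BASE_URL}/{project['name']}/{core}?{urlencode(params)}"
```

A project without any enabled components produces the same URL as today, so a server that ignores the extra parameter would see no difference for such projects.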

On the d.o side, we'd have to add smarts to parse this &modules=whatever query in the URL and include that in the processing (all the raw stats are stuffed into mongo and then summaries are computed from there and stored in the DB for display, and then stuffed into Solr for things like sorting the project browsing pages).
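As a rough sketch of that parsing step (in Python, not the actual Project module or d.o infrastructure code): the function below takes request URLs that have already been extracted from the access logs and counts distinct site keys per project and module. The log format, the "modules" parameter name, and the choice to count distinct site_key values are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def count_module_usage(request_urls):
    """Return {(project, module): number of distinct site keys}."""
    sites = defaultdict(set)
    for url in request_urls:
        parts = urlsplit(url)
        segments = parts.path.strip("/").split("/")
        # Expect paths shaped like /release-history/<project>/<core>.
        if len(segments) < 3 or segments[0] != "release-history":
            continue
        project = segments[1]
        query = parse_qs(parts.query)
        site_key = query.get("site_key", [None])[0]
        if site_key is None:
            continue
        # "modules" is the assumed name of the new query parameter.
        for module in query.get("modules", [""])[0].split(","):
            if module:
                sites[(project, module)].add(site_key)
    return {key: len(keys) for key, keys in sites.items()}
```

Requests from older clients that send no module list are simply skipped for the per-module counts, which is why the server side can be rolled out independently of the clients.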

Probably the trickiest part is going to be over at #1274766: Collect stats on enabled sub-modules, not just projects to store and display this data in some kind of meaningful way. But yeah, the update manager part of this whole proposal (i.e. this issue) might be about a 3 line patch to _update_build_fetch_url()...

How much can that URL handle? I have nearly 100 projects on one site and well over that if you count each module separately. And some have long names.

Michelle

This is apparently not a straightforward question, according to http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-.... Looks like ~2000 characters is a baseline limit.

We're still doing separate queries for each *project*. So, we're just talking the maximum number of *modules* included in a single *project*. Core is probably the worst case for that, with close to 40 separate modules in a single project.
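For a back-of-the-envelope check, here is a small Python sketch that estimates the length of one such per-project URL. The "modules" parameter name, the assumption that commas get percent-encoded as %2C, and the sample numbers are all illustrative guesses, not measurements from a real site.

```python
def estimated_url_length(base_url, module_names, param="modules"):
    """Estimate the URL length once the module list is appended.

    Assumes machine names need no escaping and that each comma
    separator is percent-encoded as "%2C" (three characters).
    """
    if not module_names:
        return len(base_url)
    names = sum(len(name) for name in module_names)
    separators = 3 * (len(module_names) - 1)
    # "&" + param + "=" precedes the list itself.
    return len(base_url) + len(param) + 2 + names + separators
```

With an assumed 100-character base URL and 40 modules whose names are 20 characters each, the estimate comes to 1026 characters, which is well under the ~2000 character baseline mentioned above.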

@dww So there is a limit to a solution like this? Core is the worst case, but I wonder about other distributions.

Looks like ~2000 characters is a baseline limit.

I wouldn't worry about IE, as that doesn't need to use such a generated URL anyway. It'll depend entirely on the proxies the drupal.org site uses and whether suhosin is installed and configured with a lower limit than the default (which is 100 variables per URL).

With that in mind, 40 modules for a project doesn't seem too outlandish, so ought to be fine. A quick test on my own site, which runs varnish and suhosin, shows that a 2500+ character variable name with a 2500+ character value works fine.

If all else fails, could data be added to the request via custom HTTP headers like say X-Drupal-Statistics: project=core; modules=blog,node,user,system?
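A minimal Python sketch of what formatting and parsing such a header value could look like. The header name and the "project=...; modules=..." layout come from the example in this comment; nothing like this exists in core, and the function names are made up for illustration.

```python
def format_stats_header(project, modules):
    """Format the value for the hypothetical X-Drupal-Statistics header."""
    return f"project={project}; modules={','.join(modules)}"


def parse_stats_header(value):
    """Parse the value back into (project, list of modules)."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:  # Ignore fragments without a key=value shape.
            fields[key] = val
    modules = [name for name in fields.get("modules", "").split(",") if name]
    return fields.get("project"), modules
```

One trade-off to keep in mind: the usage stats are currently derived from access logs of the request URL, so a header-based approach would only work if the logging setup also records that header.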

Issue summary:View changes

Adding reference to the corresponding d.o issue.

Title:Drupal.org should collect stats on enabled sub-modules» Drupal.org should collect stats on enabled sub-modules and core modules

Issue summary: adding reference to the corresponding d.o / project issue.

On the title, it wasn't clear, at least to me, that this issue included usage stats for core modules. Adding three words for complete clarity.

Like others here, I also think the proposal of this issue would be a really useful feature, as one of the many factors to consider for decision making.

dww wrote:

the update manager part of this whole proposal (i.e. this issue) might be about a 3 line patch to _update_build_fetch_url()...

Even when this is a feature and not a bug/security fix, perhaps those few code lines could be later backported for D7 and D6 sites, since they are intended just for drupal.org use (stats)...

This would greatly help get a more objective idea of which features are used mostly in core. In other words it would also help decisions in issues like: #1273344: Establish heuristics for core feature evaluation

If we made it possible to also collect anonymous stats on various other settings, we wouldn't "fight" over what should be the default out of the box for things like radio buttons, checkboxes, enabled/disabled modules and so many more. I'm not talking only about core here. Module maintainers should also be able to implement a way to collect usage stats for new features so that they know if they should be on/off by default when something goes stable.

A great example of where this could be useful is the core Overlay module introduced in D7, about whose usefulness various opinions are heard here and there. If we had actual stats on its usage (I mean how many sites keep it enabled), then deciding whether it should be taken out of core for D8 wouldn't require much thinking/debating. If stats showed that the number of sites using it is small, but not so small as to justify moving it out of core, then we would keep shipping core with it but at least have it disabled by default ...so we could spare the majority of users the extra clicks it takes to disable it ;)

+1, I would love to see this data for Ubercart so we know which modules are actually used; at present we have no real idea whether some smaller modules are bug free or just nobody uses them.

Seems to me a small module can collect a whole bunch of useful metrics. I'm thinking something that uses a hook call that lets other modules submit any metrics it wants -- all that a "metrics" module needs to do is collect the metrics reported by other modules and send them up to a server.

I'll try to flesh something out in the next couple of days ... meanwhile:

  • What metrics should be collected?
  • Presumably, these would be gathered in a cron run ... and reported up to a companion server module. Anyone with expertise in the server/sending side of this problem?

http://drupal.org/metrics has some metrics that are tracked about d.o itself.

There's a "cross-site activity" module (http://drupal.org/project/xs_activity) that looks useful. It doesn't look like this has been updated in a long while, but this looks worth a look.

Looks like there's a group on this: http://groups.drupal.org/module-metrics-and-ranking -- no recent posts, though.

Please add ideas here ... a group of us will try to pull something together in time to gather meaningful data for D8 design decisions.

I just want to add in a quick comment to make sure that one concern that many module users might have won't be overlooked: privacy.

Personally, I'll send all my usage statistics back to d.o. so that the developers can have some useful data to prioritize development, but I can easily imagine a few folks really disliking that d.o. tracks what they use.

There is a solution that seems perfectly elegant: in the config section (perhaps along with a nice message), there should be an obvious check-box that allows the user to opt out of sending their information back to Drupal. The emphasis is that it should be obvious and big, and it should have a bit of an explanation (and perhaps a quick note on why participating will help development).

The option should be obvious and easy to turn off.

Another, less important aspect is whether there should be guidelines (in the form of requests?) on whether to track production sites or development sites. I don't have an answer to that specifically, but it might give misleading data to developers if the statistics collected come from temporary testing or development sites.

Again, I fully support the overall feature.

I don't think privacy is an issue for 99% of people considering the information is only available to the public in combined stats. The actual report tied to your IP is only available to specific people who need it for troubleshooting. Everyone else just sees X sites use Foo, not the site at IP nn.nn.nn.nnn uses Foo.

I say 99% because there's always going to be someone who says that's not good enough and, well, all I can say is turn off the update status module if it's really a big deal for you.

@sprice, that privacy checkbox already exists in the installer (since we already send data back).

This issue would necessarily expand the data that is sent back, though, and we probably want to minimize the additional data here to only what is strictly necessary. It also raises the question of whether we need to actively "get permission" from sites that already have this module turned on, if they still want to send the information. Since the current wording is vague ("Anonymous information about your site is sent to Drupal.org") and that part wouldn't be changing, maybe no need?

Re: #32-#35: Can we please leave the question of additional metrics and stats out of this issue? That's going to ensure this turns into a giant thread with no actionable tasks and it's going to delay the thing we might be able to agree on (see #23).

For adding more statistics and metrics into what we send back from core to d.o, please see #1439316: Provide means for module maintainers to collect heuristics on certain settings of their modules.

Thanks,
-Derek

But doesn't this issue (including #23) require that we send a small amount of additional information from the site back to Drupal.org? I think that's why the privacy question, at least, is relevant.

Discussing privacy about sending back info about specific modules enabled, yes. Additional metrics and a hook to collect them, no. ;)

Thanks,
-Derek

@dww Re: #36

Good point. For the small amount of information being collected in this issue, there shouldn't be any new privacy issues. Collecting whether a module is used or not is no more an invasion of privacy than it is to collect whether a project is used or not.

However, the privacy issue will become more relevant in #1439316: Provide means for module maintainers to collect heuristics on certain settings of their modules., as we'll be collecting data that is more free-form than whether a module/project is being used.

Status:Active» Needs review
New file: 1.36 KB. FAILED: [SimpleTest]: [MySQL] 35,063 pass(es), 2 fail(s), and 6 exception(s).

Here's a first draft of the patch. The investigation was trickier than the code changes.

The information returned by update_get_projects() includes a list of enabled modules and themes in $project['includes']. I noticed for the Drupal project this array did not include disabled modules.

Thus, I only needed to make a few tweaks to _update_build_fetch_url() to add that information as an extra URL parameter.

Status:Needs review» Needs work

The last submitted patch, 1036780-40.patch, failed testing.

Status:Needs work» Needs review
New file: 3.01 KB. PASSED: [SimpleTest]: [MySQL] 35,061 pass(es).

New patch to fix the broken unit tests. I presume that other tests for this module will ensure that the data returned by update_get_projects() is correct, so updating the unit tests which populate dummy data for the $project variable should suffice for testing.

Status:Needs review» Reviewed & tested by the community

Thanks for pushing this forward, Mike!

Code looks good, and upon visual inspection does what I proposed in #23. The test bot is happy. RTBC for me. Not sure if it makes sense to commit this before anything happens at #1274766: Collect stats on enabled sub-modules, not just projects but then again, it's nice to know what's in core while we're working on the infra and project* parts of this. A D8 commit for this minor change would also help facilitate a D7 (and possibly even D6?) backport so that we could be tracking this data from the "live" versions of core in the wild (once they upgrade to the next point release, of course).

Cheers,
-Derek

New file: 3 KB. PASSED: [SimpleTest]: [MySQL] 35,070 pass(es).

Removed trailing whitespace, no commit credit please.
Leaving at RTBC.

Great! Glad to see we're getting somewhere with this ;)

Version:8.x-dev» 7.x-dev
Status:Reviewed & tested by the community» Patch (to be ported)

I'm fine with adding this now and having the Drupal.org infra catch up once it's in, since it's not really much of a change to core. Committed/pushed to 8.x.

Seems reasonable for backport and that information would be very interesting to have, so moving back.

I think it's extremely valuable to have this data for actual running sites, because that means we can use it for the D8 decision-making process.

With the danger of opening a can of worms, should we inform users about this for the D7 patch - because we are going to be tracking new data, that was previously not tracked? Not necessarily on new installs, but on updates.

That can of worms was already opened above :) See #35, #39, etc.

I think this patch would finally send the information that people think is already being sent. I.e., people think module information is sent currently, but the code in fact only sends project information. I expect it's a non-issue, so a note in README will probably suffice.

Status:Patch (to be ported)» Needs review
Issue tags:+needs backport to D7
New file: 2.96 KB. PASSED: [SimpleTest]: [MySQL] 39,116 pass(es).
New file: 1.61 KB. FAILED: [SimpleTest]: [MySQL] 39,001 pass(es), 2 fail(s), and 0 exception(s).

Rerolled, and split up just to be sure.

Status:Needs review» Reviewed & tested by the community

#50 is RTBC for 7.x. Identical to the 8.x version, and does what it needs to.

Thanks,
-Derek

#50: drupal-1036780-50-combined.patch queued for re-testing.

Not sure if it makes sense to commit this before anything happens at #1274766: Collect stats on enabled sub-modules, not just projects but then again, it's nice to know what's in core while we're working on the infra and project* parts of this. A D8 commit for this minor change would also help facilitate a D7 (and possibly even D6?) backport so that we could be tracking this data from the "live" versions of core in the wild (once they upgrade to the next point release, of course).

Any further thoughts on the best timing for committing this to D6/D7 specifically?

One complication is that I think when this does get released in D6/D7, we're going to need to mention it in the release notes (and probably the release announcement too). Just to make sure no one accuses Drupal of doing some extra spying on their site without warning them :) So, it would be nice if we added it to D6/D7 right around the time #1274766: Collect stats on enabled sub-modules, not just projects was ready to go (or shortly before that), because then the announcement could have something useful to point to.

But if we need to add it to D7 earlier for whatever reason, I think that would be fine too.

Nearly all of my Drupal attention is on the d.o D7 upgrade at this point, along with most of the rest of the infra/d.o teams. While some other d.o features are being rolled out now, I don't have the bandwidth to deal with this. bdragon might, and he's the main one maintaining the d.o infra for usage stats. He's really the main person to ask. I'm happy to help out in small ways to unstick things whenever I can, but I can't be the one driving this forward.

The good news is that our usage infra works via parsing logs, so even if we don't upgrade the parsing jobs and actually aggregate and display the data right away, as soon as this hits an official release, d.o will be collecting stats. bdragon and/or nnewton would need to say how long we keep historical logs that we could go back to reparse, but it's at least feasible that we could gather data for months before displaying it. Of course, if we announce that we're gathering this new data (and I agree we should mention it in the release notes), people will probably expect that they can see the data. But from a technical standpoint, we don't need any new plumbing in place on d.o right away to still make this worth doing.

Hope that helps...

Thanks,
-Derek

The good news is that our usage infra works via parsing logs, so even if we don't upgrade the parsing jobs and actually aggregate and display the data right away, as soon as this hits an official release, d.o will be collecting stats.

That's useful to know, thanks, and seems like a good reason to get this in sooner rather than later.

Sounds like checking with @bdragon or @nnewton would be a good next step for someone to do.

In the meantime, we're not quite under thresholds (although we're close) so I probably shouldn't commit it to D7 right this moment anyway.

Issue tags:+Favorite-of-Dries

Tagging.

Another issue here is that we should probably decide beforehand if we're going to try getting this patch into D6 also. Because if we are, we should really release them to D6 and D7 at the same time (to make the announcement simpler).

Not sure if it's worth the effort to bother with D6 or not, but the possibility was mentioned above.

I think it's worthwhile porting the patch to 6.x and including it on there as well.

FYI: I got bdragon to reply at #1274766: Collect stats on enabled sub-modules, not just projects so it's on his radar and he's got the issue assigned to himself now.

See also #1627676: Display stats on enabled components (e.g. modules included in a project) to discuss where/how to display this data once we're collecting and storing it on d.o.

Cheers,
-Derek

While we're now under thresholds, David has requested help on resolving release blockers, so I don't feel comfortable committing 7.x feature patches until that happens.

Assigned:Unassigned» Gábor Hojtsy
Issue tags:+needs backport to D6

So we're in a weird situation here where we ideally want to release this in Drupal 6 and Drupal 7 at the same time, but don't want to wait around for Drupal 6 forever either.

I think what I should do is commit this very soon after the Drupal 7.15 release. That will give it quite a while before Drupal 7.16 comes out, and plenty of time to try backporting this to Drupal 6 in the meantime. If it doesn't happen by then, we'll just go with Drupal 7 only for now.

Temporarily assigning this to Gábor to see if he has any feedback on this plan, or on the feasibility of doing this patch at all in Drupal 6. (Any Drupal 6 version of this patch will definitely require some manual testing of its own...)

Not sure why we need to coordinate the release of this change. The server would need to handle old clients forever anyway, so it needs to be ready for clients that have not updated. I'm fine with the plan to commit after the next D7 release and then try to coordinate with D6 if we want to.

Back to David to commit it after 7.15

7.15 is out now ... right?

Yes, I am sure he will get to it.

Version:7.x-dev» 6.x-dev
Assigned:David_Rothstein» Unassigned
Status:Reviewed & tested by the community» Patch (to be ported)

Yup, I took a break for a while :) Technically, I shouldn't commit it now since we're a tiny bit over thresholds, but this has been RTBC for a while and only was on hold for non-technical reasons so I think it's OK to make an exception...

Committed to 7.x via http://drupalcode.org/project/drupal.git/commit/c6200e8 - thanks everyone!

Moving down to Drupal 6 for possible backport. I think we still have a fair amount of time before the next release so hopefully we can release it in both versions at the same time.

This will definitely go in the release notes - not sure if we want a change notification also (do Drupal 7 site builders actually look at those)?

Inventing a new tag as well... since I think this will require a special call-out in the release announcement (not just the technical release notes).

Drupal 7.16 was a security release only, so this issue is now scheduled for Drupal 7.17 instead.

Fixing tags accordingly.

I assume this is in already?

Yup, see the mention of it at http://drupal.org/drupal-7.17

Status:Patch (to be ported)» Needs review
New file: 1.35 KB. PASSED: [SimpleTest]: [MySQL] 190 pass(es).

Backported the non-test code from the patch in #50 to Drupal 6. Apart from line numbers, it's identical.

Are there any plans to collect feature usage? I'm highly interested in how much specific features are used, to decide whether a feature needs to be kept for future versions. This must not depend on modules.

I think features are usually included in drupal.org downloads as contrib modules, therefore with usage stats.

Part of them in the Features Package category, but not all (e.g., Commons features are not in that category currently, just as normal modules).

I don't think that's what he was asking. I think he was asking about features that are included in modules, not Feature modules that are distributed from Drupal.org.

hass, afaik, there aren't any such plans, but I could be wrong.

I mean specific settings used in a module that allow me as a maintainer to identify how my modules are used and what functionality is used, not the feature modules. It's not about every setting; I need to be able to add a hook or similar that will be called so that the update module logs the used feature.

I believe that #1439316: Provide means for module maintainers to collect heuristics on certain settings of their modules. is what you're looking for. I filed that one a little over a year ago ;)

The respective issue for core is #1273344: Establish heuristics for core feature evaluation

If this was rolled back into D6 - that would mean we'd get a huge amount of data from the installed base out there right? That would be helpful for maintainers wondering whether to abandon old modules, or port them to D8.

As far as I can tell, D7 sites are sending the data, but nothing is collecting it at the other end until #1274766: Collect stats on enabled sub-modules, not just projects is implemented and deployed to drupal.org infrastructure. We then need #1627676: Display stats on enabled components (e.g. modules included in a project) for anyone other than the infrastructure team to be able to access the results.

Issue summary:View changes

Minor clarification (Project module issue, used by Drupal.org).