Closed (fixed)
Project:
Drupal.org infrastructure
Component:
Other
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
3 Nov 2007 at 16:39 UTC
Updated:
11 Oct 2008 at 09:53 UTC
Hello, I'd like to have a running tally of Drupal downloads, including versions of core, and contributed projects.
AWStats is currently behind HTTP auth. Morbus indicates we can do the following:
"so you either want to pass http auth headers to drupal_http_request, or tell your apache/.htaccess to Allow from localhost (presuming your scraper is running on the same box as d.o) and Satisfy any."
We then need to gather data and post to Drupal.org somehow.
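As a rough illustration of the first option (passing HTTP auth headers with the request), here is a minimal Python sketch. It is not Drupal code and does not use drupal_http_request; the function names, URL, and credentials are placeholders invented for the example. It only shows how an HTTP Basic Authorization header is built and attached to a request.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url, user, password):
    """Return a urllib Request carrying Basic auth credentials.

    The URL and credentials are supplied by the caller; nothing here is
    specific to the real awstats host.
    """
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


if __name__ == "__main__":
    # Hypothetical URL and credentials, for illustration only.
    req = build_request("http://example.com/awstats/", "user", "pass")
    print(req.get_header("Authorization"))
```

Passing the result of build_request to urllib.request.urlopen would perform the actual fetch; that step is left out so the sketch runs without network access.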
Comments
Comment #1
morbus iff
An example of the .htaccess (or httpd.conf) would be something like:
<Location "/awstats">
AuthType Basic
AuthName "awstats super sekrit"
AuthUserFile /usr/local/apache2.2/user.passwd
Require valid-user
# This is the IP of the box that runs the scraper.
Allow from 111.11.11.1
# Sometimes you JUST need this line; depends on network config.
Allow from localhost
# Allow those who match; deny everyone else.
Order allow,deny
# Satisfy EITHER the HTTP auth OR the Allow/Deny rules.
Satisfy any
</Location>
Also: "amazon: sure. i don't have any examples of http auth with drupal_http_request, but ... there is scraping code in bot_project.module (that scrapes project.module from d.o)... line 121 through 143ish here."
Comment #2
greggles
Drupal 5's drupal_http_request cannot do HTTP auth. In Drupal 6 it can: http://drupal.org/node/182410
Also, note that there is a lot of good discussion on this topic: http://lists.drupal.org/pipermail/development/2006-November/020841.html
As far as I know, awstats doesn't include all modules. It stops keeping track of files at some point (the last time I made a report on this it was around 150 downloads a month). Especially now that we've got more projects I'm not sure that awstats can do what we need.
Comment #3
emsearcy commented
I appreciate the suggested Apache config, but as usual things are more complicated here because we have a more complex setup. awstats runs on a separate box that pulls logs from the webnodes and pushes the reports to an accessible web server. That server is behind Squid, so I would have to use X-Forwarded-For environment variables instead of Allow/Deny, etc. We can't pull from Apache anyhow (keep reading), so it doesn't matter.
Also, I don't think running this 'scraper' from within Drupal (presumably via a Drupal cron job) is a good idea; it's more robust for me to pull via rsync in a system cron job. Basically, I can drop these statistics into the Drupal web NFS space, and we should be able to have a module parse the data already on disk. This is similar to how we have a system cron job updating /var/www/api.drupal.org/src/, which is parsed by the site in /var/www/api.drupal.org/htdocs.
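The "cron drops files, module parses what is already on disk" split can be sketched as follows. This is only an illustration in Python, not the Drupal module itself; the function name and the awstats*.txt filename pattern are assumptions, since the thread does not say how the dropped files are named.

```python
from pathlib import Path


def newest_data_file(directory, pattern="awstats*.txt"):
    """Return the most recently modified file matching pattern, or None.

    The default pattern is a guess at how the rsync'd data files might be
    named; adjust it to whatever the system cron job actually writes.
    """
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # The reader never talks to the stats box; it only looks at local disk.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The point of the design is that the web side has no network dependency: if the rsync job fails, the parser simply keeps seeing the previous file.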
Depends. It's true that the reports do not show all the modules; however, the compiled awstats data files that are used to build the reports do have this data. It's more efficient to parse these data files than the HTML reports anyhow.
Comment #4
Amazon commented
Eric, could we get a sample awstats data file?
Comment #5
emsearcy commented
Sample is the month of October (auth required). The section you'd want is the last one, between BEGIN_SIDER and END_SIDER (and you can use the POS_SIDER byte offset in your script to seek directly to this section).
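A minimal Python sketch of that approach: read the POS_SIDER offset, seek to it, and collect the lines up to END_SIDER. Only the BEGIN_SIDER, END_SIDER, and POS_SIDER markers come from the description above. The assumptions are that each row starts with a URL followed by a hit count, that POS_SIDER sits on a line of the form "POS_SIDER <offset>" before an END_MAP line, and that downloads live under a /files/projects/ prefix. Check all of these against a real data file.

```python
import io


def read_sider_rows(stream):
    """Return the lines between BEGIN_SIDER and END_SIDER.

    stream must be a seekable binary file object.
    """
    offset = None
    for raw in stream:
        line = raw.decode("utf-8", "replace").strip()
        if line.startswith("POS_SIDER "):
            offset = int(line.split()[1])
            break
        if line.startswith("END_MAP"):
            break
    if offset is None:
        raise ValueError("no POS_SIDER entry found")

    # Jump straight to the section instead of scanning the whole file.
    stream.seek(offset)
    header = stream.readline().decode("utf-8", "replace").strip()
    if not header.startswith("BEGIN_SIDER"):
        raise ValueError("POS_SIDER does not point at BEGIN_SIDER")

    rows = []
    for raw in stream:
        line = raw.decode("utf-8", "replace").strip()
        if line.startswith("END_SIDER"):
            break
        if line:
            rows.append(line)
    return rows


def tally_downloads(rows, prefix="/files/projects/"):
    """Sum hit counts per file; assumes 'URL hits ...' columns (unverified)."""
    totals = {}
    for row in rows:
        fields = row.split()
        if len(fields) < 2 or not fields[0].startswith(prefix):
            continue
        try:
            hits = int(fields[1])
        except ValueError:
            continue
        name = fields[0][len(prefix):]
        totals[name] = totals.get(name, 0) + hits
    return totals
```

With a real file this would be called as read_sider_rows(open(path, "rb")), and the resulting rows passed to tally_downloads.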
Comment #6
Amazon commented
http://sourceforge.net/projects/phpawstats/
This script might come in handy for someone who wants to get a parser running so we can dynamically show the latest downloads.
Comment #7
gerhard killesreiter commented
Amazon now has access to awstats.
Comment #8
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.