http://drupal.org/project/usage is currently showing Oct 19th data (as in, the week of Oct 19 - Oct 25), but there's no data yet, so the entire column is 0. Seems like some kind of fence-post bug with how soon we decide to switch over to the latest data, vs. how soon the latest data has actually been processed by cron. Perhaps due to timezone weirdness? See attached screenshot of what I'm currently seeing (2008-10-25 19:13 GMT-700). I don't have time now to carefully dig in to the code and try to debug it...

Comments

aclight’s picture

It seems to me that this is due to the code in project_usage_cron(). It appears that we're processing the weekly usage data once per week, but not necessairly at the beginning of a new week/end of the last week. But since the weekly data stored in the database always starts at the beginning of the week (midnight on Sunday), and since we probably last caused the weekly tables to be regenerated sometime on Thursday or Friday, whenever we unveiled project_usage on d.o, that's why we have the offset.

  // Figure out if it's been a week since we did weekly stats. If the weekly
  // data has never been processed go back only as far as there is daily data.
  $default = $now - variable_get('project_usage_life_daily', 4 * PROJECT_USAGE_WEEK);
  $last_weekly = variable_get('project_usage_last_weekly', $default);
  if ($last_weekly <= ($now - PROJECT_USAGE_WEEK)) {
    project_usage_process_weekly($last_weekly);
    variable_set('project_usage_last_weekly', $now);
  }

So, to solve the problem temporarily, we just force project_usage_cron() to recalculate the week data by blasting away the project_usage_last_weekly variable.

For a more long term solution, I would say that we'd want to do something like this in project_usage_cron(). Note that I have not tested this and haven't carefully made sure that this gets us what we want :)

  // Figure out if a new week has started since we did weekly stats. If the
  // weekly data has never been processed go back only as far as there is
  // daily data.
  $default = $now - variable_get('project_usage_life_daily', 4 * PROJECT_USAGE_WEEK);
  $last_weekly = variable_get('project_usage_last_weekly', $default);
  if ($last_weekly <= (project_usage_weekly_timestamp($now, 0))) {
    project_usage_process_weekly($last_weekly);
    variable_set('project_usage_last_weekly', $now);
  }

Basically, the idea is that we want to process weekly data not if if's been >= 1 week since we last did so, but if a new week has started since we last did so.

hunmonk’s picture

Status: Active » Postponed (maintainer needs more info)

ok, i set project_usage_last_weekly to 0 and cleared the variable_cache, so that should trigger another run of the weekly processing. if that doesn't fix the problem, then at least we know it's not w/ this code.

hunmonk’s picture

ok, looks like the weekly usage processing code ran, and we still have all the zeros :(

is this an issue w/ the page cache for the summary page??

hunmonk’s picture

Status: Postponed (maintainer needs more info) » Active

hrm. now it seems fine. it may have been a temporary caching issue somewhere, but that bump seems to have gotten the weekly usage data processed. so i guess the next step is to try something along the lines of aclight's proposed fix above.

dww’s picture

Nice detective work, aclight!

The usage pages are cached independently of the page cache (so that we cache them for auth users, too). It's {cache_project_usage} and I had just cleared that manually. Indeed, now things are fine.

Indeed, next step is to consider aclight's proposed solution or look into other ways to make sure that we've got the data when we think we do. I've gotta run right now, but I'll be back later today to work on this and other things...

Thanks, folks.

dww’s picture

Assigned: Unassigned » dww
Status: Active » Needs work
StatusFileSize
new2.69 KB

aclight and I just had a very fruitful conversation in IRC about this. Here's what we came up with:

A) The root of this bug is that the project_usage code can't make up its mind about if it cares about now() or the last time a "usage week" started.

B) At first, I wanted to fix (A) by not storing the timestamp when we last processed the weekly data, but rather the timestamp of the beginning of the last usage week that we already processed. Then, the code in cron() would be:

  $current_week = project_usage_weekly_timestamp($now, 0);
  $last_week_processed = variable_get('project_usage_last_week_processed', 0);
  if ($current_week > $last_week_processed) {
    project_usage_process_weekly($current_week);
    variable_set('project_usage_last_week_processed', $current_week);
  }

However, aclight pointed out that we'd then loose the data about when the last time we actually processed the data was, which is sometimes useful for debugging, especially on d.o where cron often dies prematurely. So, something like aclight's pseudo patch above is probably what we'll want in the end.

C) The processing code is totally whack, and regularly eats data. hunmonk and I are debugging more closely. Meanwhile, I had to restore the weekly data from before I deployed #165380: Make usage statistics visible on d.o. I also temporarily disabled weekly processing until we can get to the bottom of it.

D) Even if the weekly processing was right, there's still a race condition between when the usage pages start thinking a new period has started that it should display, and when we actually process that weekly data and have it in the DB. The fact that the window of this race was huge is why we saw the bug so obviously, but even if we fix everything else, it'd still be there. Attached patch fixes the display pages to query the DB for the timestamps we actually have, instead of doing all the date math again to figure out the timestamps we think we should have.

This patch is only the start to fix this, but it's going to temporarily stop the bleeding while we figure out everything else...

dww’s picture

StatusFileSize
new4.39 KB
mysql> DESCRIBE SELECT DISTINCT(timestamp) FROM project_usage_week_release order by timestamp DESC LIMIT 6;
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table                      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | project_usage_week_release | index | NULL          | PRIMARY | 8       | NULL | 116270 | Using index; Using temporary; Using filesort | 
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
1 row in set (0.01 sec)

Eeek, "Using filesort"? That's evil. And, it's happening on every page load, since we end up calling that to help us even figure out if we have the resulting page cached. :(

Also, the right values to use only change once per week when we process another week's data, so it seems stupid to re-do this expensive query on every page load.

Instead, why don't we just generate the array of currently active timestamps (which should be the same in both {project_usage_week_project} and {project_usage_week_release}) when we do the weekly processing, and just stash that array into a setting? Then, each page load just grabs something out of the setting cache. That means we'll have to retrieve and unserialize this array of 6 integers on every page load on d.o, but that seems better than such an evil query on every usage statistics page view. Alternatively, we could just add a new table, {project_usage_active_weeks} or something, that just contains a row for each timestamp we think is currently active. That'd be an additional (incredibly cheap) query on each usage stat page view, but it'd save the unserialize() on every d.o page load... Certainly just using variable_(set|get) would be easier in the code than a whole new table.

I put this patch on d.o earlier tonight, while the DB was melting down. Things seem to be better now, so I presume this is an improvement. ;)

drewish’s picture

StatusFileSize
new4.78 KB

Storing the recent weeks in a variable makes a ton of sense. I didn't appreciate how expensive that query was.

It looks like project_usage_get_active_weeks(TRUE) is never being called to update the list of weeks. I added in a call to it in project_usage_cron()'s weekly case.

I also noticed that project_usage_get_active_weeks() was calling db_query() with a LIMIT rather than using db_query_range(). I'm assuming this was just an oversight so I switched it to db_query_range().

I'm also having some doubts about the wisdom of resetting the cache on every cron run. I think that should really be done on the weekly basis when we process new data. I'll hold off on changing that right now in case there's another reason I'm missing.

dww’s picture

Right, I wasn't calling project_usage_get_active_weeks(TRUE) since on d.o we've basically disabled the usage cron stuff for now. It was indeed the plan to add that back in once we resolve everything else in here, but it's fine to include it in the patch now.

db_query_range() -- cool, yeah, just an oversight.

I agree we should only invalidate {cache_project_usage} if anything has changed. I don't see any reason to clear it every day if the data only changes every week. Seems like we could pretty safely just TRUNCATE {cache_project_usage} at the end of the weekly processing, and otherwise, set those items to never expire, no? ;)

That said, I don't want to turn on this cron code again so long as the DELETE queries are in there. I'm sorry I failed to put up a fight about this when I first committed the code -- I don't understand why those are necessary, and they seem like an awfully easy way to destroy a ton of useful data on your site.

drewish’s picture

StatusFileSize
new6.99 KB

what do you think about this? it makes an effort to do the insert and then only if there are errors (because of primary key conflicts from a previous failed run) it deletes the data and re-does the insert.

i didn't make any changes to the caching since that seems like it could be done in a subsequent patch.

cburschka’s picture

Status: Needs work » Needs review

Setting to CNR since obviously the latest patch has not yet been reviewed.

I would really like this to get fixed quickly, as it is extremely valuable to know how many people are using which release of contrib modules.

hunmonk’s picture

StatusFileSize
new7.24 KB

i don't think we should be trying to defensively code for a database error here. PHP should be able to trust that the system is working as expected, and we should never have an insert error if everything is working correctly. if there is an insert error, i'd rather see us throwing an error message.

attached adds watchdog error messages in the case of a failed insert query.

webchick’s picture

Seems like it might be useful to have the exact query that failed as part of the watchdog message. So adding a %sql variable in there, since it's already available as a variable.

greggles’s picture

I've got about zero context on this specific issue, but...

The standard "do what core does" answer is to run an update query and, if no rows were affected, do an insert. It's less code and more likely to work.

See #271288: Duplicate inserts for another example of this recently fixed in project*.

dww’s picture

Status: Needs review » Needs work

That's not the problem. There should never be existing data in these tables, except when cron dies half-way through. It should be an error if we're processing weekly data and are trying to update rows that already exist. However, cron dies pretty regularly on d.o, so we need to be smarter about this.

drewish’s picture

dww, so on IRC we'd discussed moving this code to it's own script. can you suggest an existing script that i could use as a template for having an externally called cron script? i'd love to get this fixed.

drewish’s picture

Status: Needs work » Needs review
StatusFileSize
new19.06 KB

here's a patch that takes hunmonk's changes from #12 and moves them into a new project-usage-process.php script based off of project-release-create-history.php.

totally untested but looks good.

hanoii’s picture

Subscribing

dww’s picture

Status: Needs review » Fixed

Reviewed closely, tested on p.d.o, backed up {project_usage_*} on d.o, deployed, and ran. All is good. Now I just have to get the OSUOSL to setup a cron job to invoke the new script regularly. Committed to HEAD and DRUPAL-5.

hanoii’s picture

Not sure if this is the place to ask about this, but why is now only displaying until week of Nov 23 and new weeks are not appearing?

Thanks,
a.=

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

moshe weitzman’s picture

Thanks dww. Seems like we can say with some certainty that long running jobs need to either use batch API or run using command line php. CLI is a nice option since we have this nifty drush framework. job_queue could ship with a drush command and we'd be in very good shape IMO.