I doubt it matters much which release this is reported against, because the real problem is the d.o (drupal.org) site.

In some cases, US (update_status) never completes its Cron run because drupal.org responds so slowly. When this happens, later Cron runs fail with an "already running" error. And yes, I know for a fact that the "culprit" is US.

Is there some way to limit how long US may spend updating the list? Then at least it could tell Cron it's done, even if the update failed.
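A minimal sketch of the kind of limit being asked for, written in Python rather than PHP purely for illustration. The function name, the per-project `fetch` callback, and the "skipped" bookkeeping are hypothetical and are not update_status's actual code. The loop stops issuing new requests once a time budget is spent and reports what it skipped, so the Cron run can still finish.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_with_budget(
    projects: Iterable[str],
    fetch: Callable[[str], str],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch release data per project, giving up once the time budget is spent.

    Hypothetical helper: returns (fetched, skipped) so the caller can report a
    partial failure and still let Cron finish normally.
    """
    deadline = clock() + budget_seconds
    fetched: Dict[str, str] = {}
    skipped: List[str] = []
    for name in projects:
        if clock() >= deadline:
            # Out of time: record the remaining projects instead of blocking Cron.
            skipped.append(name)
            continue
        try:
            fetched[name] = fetch(name)
        except OSError:
            # A failed or timed-out request is recorded as skipped, not fatal.
            skipped.append(name)
    return fetched, skipped
```

The budget check only runs between requests, so it cannot interrupt a request that is already hanging. In real use, `fetch` would also need a per-request timeout (for example `urllib.request.urlopen(url, timeout=30)`, whose timeout and URL errors are `OSError` subclasses and are therefore caught above).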

Comment | File | Size | Author
#7 | update_status_direct_xml.patch.txt | 3.45 KB | dww

Comments

NancyDru

Assigned: Unassigned » NancyDru
Status: Active » Fixed
dww

I don't understand what the sitedoc module has to do with this problem. ;) Can you please elaborate? Thanks.

NancyDru

It detects the stuck-Cron problem and can optionally delete the Cron variables. I marked this as fixed.

Anonymous

Status: Fixed » Closed (fixed)
mhutch

Assigned: NancyDru » mhutch
Category: feature » bug
Priority: Normal » Critical
Status: Closed (fixed) » Active

This is happening on both of my sites on completely different servers, and hanging cron prevents other jobs from running. Installing sitedoc is a workaround, not a fix, so IMO this bug should remain open.

AFAICT this only happens with 5.x-2.0*/dev (post 5.x-1.2) versions.

NancyDru

Fine with me. Just out of curiosity, how often do you have Cron set up to run?

To be a bit fairer about this, it's not actually US that's causing the sites to hang; it's the slowness of drupal.org. I don't know whether US can do much about it, unless it could set a timer that terminates US after, say, 15 minutes and cleans up the 'cron_semaphore' variable. But that could affect other modules, so it could be dangerous.
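A minimal sketch of the stale-semaphore idea, in Python for illustration only. It assumes the semaphore stores the Unix timestamp at which the Cron run started (the thread only names the 'cron_semaphore' variable), uses a plain dict as a hypothetical stand-in for Drupal's variable table, and the function names are made up. The 15-minute limit is the example value from the comment.

```python
import time
from typing import Dict, Optional

# Hypothetical stand-in for Drupal's variable table: name -> value.
Variables = Dict[str, float]

SEMAPHORE = "cron_semaphore"


def try_start_cron(variables: Variables, max_age: float = 15 * 60,
                   now: Optional[float] = None) -> bool:
    """Return True if this run may proceed, taking over a stale lock if needed."""
    now = time.time() if now is None else now
    started = variables.get(SEMAPHORE)
    if started is not None and now - started < max_age:
        # Another run started recently and may still be working.
        return False
    # No lock, or the lock is older than max_age and assumed dead: take it.
    variables[SEMAPHORE] = now
    return True


def finish_cron(variables: Variables) -> None:
    """Clear the semaphore so the next run is not reported as 'already running'."""
    variables.pop(SEMAPHORE, None)
```

The danger NancyDru mentions shows up directly here: if the earlier run is merely slow rather than dead, taking over its lock lets two Cron runs overlap.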

It would probably be good if US could somehow ask Drupal to run its Cron hook last.

dww

Assigned: mhutch » dww
Status: Active » Needs review
File: update_status_direct_xml.patch.txt (3.45 KB)

On IRC, merlin, killes, drewish, and I discussed this problem at length. Despite my earlier hopes, it seems like a bad idea to rely on a project_release.module menu path to serve the .xml files and to use the menu callback as the place to record usage stats (see http://drupal.org/node/128827 for background). d.o is just too damn slow at times, and update_status.module doesn't help by routing every request through a full Drupal menu path, hitting the DB, etc.

Luckily, there's an alternative: we can fetch the .xml files straight from Apache, no Drupal required. The client-side settings for allowing the usage stats (http://drupal.org/node/153741) will still work. All we have to do is write something that scrapes the Apache logs and collects the usage stats that way. We can process the logs periodically and stuff the results into the DB, sort of like what happens for the CVS commit info. Then all the project quality stuff we want can still just query the DB for the last known stats and do whatever it wants with them.
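A rough sketch of the log-scraping side, in Python. Everything specific here is an assumption for illustration, not something the thread specifies: the Apache combined log format, a request path of the form /release-history/<project>/<core>, and a site_key query parameter identifying each site. It counts distinct site keys per project from successful requests; loading those counts into the DB would be a separate step.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set
from urllib.parse import parse_qs, urlsplit

# Assumed Apache "combined" log format; the request line is the first quoted field.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
)


def collect_usage(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map project name -> set of distinct (hypothetical) site keys that fetched it."""
    usage: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        # Only count successful GETs; skip unparseable lines and errors.
        if not m or m.group("method") != "GET" or m.group("status") != "200":
            continue
        parts = urlsplit(m.group("target"))
        segments = parts.path.strip("/").split("/")
        # Hypothetical layout: /release-history/<project>/<core>
        if len(segments) != 3 or segments[0] != "release-history":
            continue
        keys = parse_qs(parts.query).get("site_key")
        if keys:
            # Requests without a site key (usage reporting disabled) are not counted.
            usage[segments[1]].add(keys[0])
    return dict(usage)
```

Because the regex is tied to one log format, a server configured with a different LogFormat, or IIS as raised later in the thread, would need its own parser.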

So I'm convinced the attached patch should dramatically speed up fetching the data, since we go directly to the .xml files instead of going through Drupal. In the future, we could potentially speed this up even more by pointing updates.drupal.org at ftp.osuosl.org (much like we do for the binary downloads), although we'd need to make sure we can still scrape the logs from there...

Anyway, I'd *love* to see some benchmarking results on this, in terms of how long it takes to manually check for updates before and after the patch.

hunmonk

Code looks OK, and the patch applies cleanly. Some performance results:

w/o patch
8056.89 -- initial load
960.64
738.28
994.28
882.6
809.69

w/ patch
1181.44 -- initial load
775.46
888.58
764.64
784.54
743.32

dww

Status: Needs review » Fixed

The difference in the initial load is significant, IMHO. I bet the remaining loads are just hitting the Squid cache in both cases. From hook_cron(), you're more likely to hit a stale cache and pay for the expensive version. Note that with the patch there is only a relatively small difference between the initial and subsequent loads.
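To put rough numbers on that observation, a small Python snippet that takes hunmonk's posted timings as-is (the thread does not state their units) and compares each initial load with the mean of the subsequent ones. Treating the first sample as the uncached case follows the interpretation in the comment; the measurements alone do not establish it.

```python
from statistics import mean

# Timings copied from the results posted in this thread (units not stated there).
without_patch = [8056.89, 960.64, 738.28, 994.28, 882.6, 809.69]
with_patch = [1181.44, 775.46, 888.58, 764.64, 784.54, 743.32]


def summarize(samples):
    """Split a series into the initial (likely uncached) load and the mean of the rest."""
    initial, rest = samples[0], samples[1:]
    return initial, mean(rest)


for label, samples in (("w/o patch", without_patch), ("w/ patch", with_patch)):
    initial, later = summarize(samples)
    print(f"{label}: initial {initial:.2f}, later mean {later:.2f}, ratio {initial / later:.2f}")
```

With these numbers the initial load is about 9.2 times the later mean without the patch and about 1.5 times with it, and the unpatched initial load is roughly 6.8 times the patched one.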

Anyway, this is clearly a win, and it works, so I committed it to HEAD.

hass

I think I understand the current performance problems, and simply hosting a static file is a good idea, but it will create some other tricky problems...

Don't forget that collecting these statistics may not work with IIS, or will work totally differently. If you have different Apache log file formats, you need different configs for collecting the stats, too. I'm interested in how these stats will be collected from the web server logs... Is anything ready?

hass

As an aside, what will happen to the "anonymous usage report" feature in this case? I should have a look at the code to see how the fake setting works...

dww

Please see http://drupal.org/node/155281 for the current reality.

Anonymous

Status: Fixed » Closed (fixed)