I have a little problem with update.module, after the Drupal-core update from 6.4 to 6.5 (yesterday).
At every cron (15 minutes on my config) my site fetches update informations from drupal.org and mail me to inform of new releases. This happens always, whatever is setting on reports/update/settings.
I think this problem is due to a cache problem... but I can't find why.
On my site I configured a normal cache type, refreshed every 15 minutes, with css and js compression.
Someone can suggest a solution or a way to find the cause of the problem?
Regards,
Saxx

Comments

danmurf’s picture

Version: 6.5 » 6.7

I've also been experiencing this issue with 6.6 and 6.7.

After I removed page compression, set the minimum cache lifetime to none and cleared the cache the problem seems to have stopped. I hope this helps.

Many thanks.
Dan

dww’s picture

Status: Active » Closed (duplicate)
korvus’s picture

Version: 6.7 » 6.10
Status: Closed (duplicate) » Active

I hate to bring this old problem back to life, but I read #222073 and it doesn't actually address the problem. Looking at the CVS logs on drupal.org, I believe I see what the ACTUAL problem is -- and it corresponds with danmurf's "solution".

I noticed it sending me an email about available updates every time cron ran recently. I originally figured it was a result of MySQL freaking out after a DDoS attack, but I've now realized I was wrong. The reason it's happening is because I implemented a minimum cache lifetime in order to (hopefully) make my site more resilient to these attacks in the future. The reason danmurf saw this disappear was his removal of the minimum cache lifetime.

Look at this change in the update.module CVS. Someone added a cache_get() call because we wanted to get update status if we didn't have anything cached, regardless of the time period we were supposed to be checking. Makes sense. The problem is if you look at the D6 cache_get() function (you may want to follow along).

After the $cache_flush check, we grab the data from the cache table. After that, we hit this if statement: if ($cache->expire == CACHE_PERMANENT || !variable_get('cache_lifetime', 0)) {. Ok, expire isn't set to CACHE_PERMANENT, and my cache_lifetime is now set to 5 mins (so 300). So we skip that block. Note that we wouldn't be skipping that block if I didn't set a minimum cache lifetime.

In the "else" clause, we then do this check: if ($user->cache > $cache->created) {

If this is true, we return 0 (which causes update.module to check for updates again). Unfortunately, I have no idea what $user->cache is when we are called from cron (sounds like it is a session variable, but there is no pre-existing session when I'm running wget on cron.php). I can only assume that $user->cache is set to the current time (or something along those lines...perhaps the current time minus cache_lifetime). This causes update data to be downloaded every time cron runs (which, for me, means an email every 30 minutes). I do not believe this was the desired intent of adding the cache_get() call. In fact, since the update system has it's own way of determining if the cached update data has expired, it shouldn't be using the cache_lifetime variable at all.

Feel free to tell me that I'm looking at this wrong. The issue here may be in cache_get(), as I don't see how that code implements the functionality described in the admin/settings/performance screen (which is: "On high-traffic sites, it may be necessary to enforce a minimum cache lifetime. The minimum cache lifetime is the minimum amount of time that will elapse before the cache is emptied and recreated, and is applied to both page and block caches. A larger minimum cache lifetime offers better performance, but users will not see new content for a longer period of time."). It would seem to FORCE expiration if the cache_lifetime is anything but none, which would seem to imply it is a MAXIMUM, and that it would incur a performance PENALTY instead of better performance. But someone who knows Drupal core better than myself could chime in on that. In my world, something should be AT LEAST looking at $cache->expire instead of looking at $user->cache and $cache->created, but I could be mistaken.

But I DO believe the cache_get() call in update.module has unintended consequences, and either cache_get() needs to be fixed or update.module needs to be fixed. There is no reason that update checks should run every time cron runs (on some sites, with 5-minute cron runs, that could be brutal).

deverman’s picture

We are also experiencing the same bug. I only want the module to check once per week but it seems to be checking every cron and sending me and email telling me there is updates. For now we are going to disable the module and activate it once per week.

dww’s picture

Priority: Normal » Critical

Yes, this is definitely a critical bug. This is causing a punishing load on updates.drupal.org from a handful of sites that check for updates *way* too often. :( I think I'd call this a bug in cache_get(), but that's going to be a harder change to fix in D6, so we'll probably have to work-around it in update.module itself.

killes@www.drop.org’s picture

I think we should rework update.module to not use the usual caching layer. In fact, I believe that the update status cache is different from all other caches: The update status cache relies on external data that can not be generated on site. All the other caches only hold local data.

if you for example set up memcache for you site and not add an extra bin for cache_update, you will put the update cache in the default cache bin and that can ge cleared quite often (depending on which other items are also in there).

dww’s picture

Yeah, given the experience with {cache_form} right after the drupal.org D6 upgrade, I tend to agree. The huge question is how far Dries/Gabor are going to let us go changing schemas and APIs here to fix a critical bug in a stable release series.

dww’s picture

Assigned: Unassigned » dww

Basic proposal for review from the D6 core maintainers (I'll be pinging Gabor and Dries via private email about this):

A) Drop the {cache_update} table.

B) Add a new {update_data_cache} table.

C) Change the cache_set() and cache_get() calls in update status to use private functions named something like _update_cache_set() and _update_cache_get(). In the style of the rest of update status, all of this will be heavily PHPDoc'ed. ;)

Yes, we normally avoid schema and API changes in a stable series, but for the sake of the d.o infrastructure and sites trying to run update status that use "advanced" caching configurations, we desperately need an exception. The above changes should be very safe -- we're not going to be changing any existing APIs, just adding a few (internal) functions for update status itself. The schema change is only touching a table that is regularly truncated and regenerated anyway, so there's no worry about losing site data during a migration.

Once a D6 core committer approves this proposal (and removes the "Needs architectural review" tag), I'll roll the patch. I'll also give another close look at the update status issue queues and see if there are any other fixes that need to happen to ensure we never have problems like this again.

chx’s picture

Assigned: dww » Unassigned

The problem is IMO that user->cache is not per table. It's set in cache_clear_all btw.

chx’s picture

However, solving that is way beyond D6. Moving the update status cache into a separate table is a relatively small change. Noone builds on that table / API however if we change the cache API (and subsequently sessions table, OMG) then all hell breaks lose. Let's have that in D7.

dww’s picture

Assigned: Unassigned » dww

Right. That's my point -- there are tons of ways we could fix this in D7. But, we need this fixed now in D6. Hence, I'm asking for two exceptions: 1) to the schema/API freeze in D6, and 2) to the tendency to fix bugs in HEAD first and then backport. The key question is a strategy where we can fix this that is viable to land in D6 core. Then, we can gather around the crack pipe and dream about the best way to fix this in D7 before the code freeze...

david strauss’s picture

We need to extend the caching API to allow using non-volatile caches. We also need to extend the caching API to allow for inconsistent storage and retrieval, though this is not necessary to solve this particular issue.

david strauss’s picture

And, yes, I mean this the volatility issue needs to be fixed in D6.

Cybergarou’s picture

Since I detect knowledge of the caching system here, I have a question. When cron is run under an anonymous user, is $user->cache supposed to be a timestamp? I was under the impression it would be 0 as it is set in drupal_anonymous_user(). That would at least explain why update information is retrieved with every cron run, even if there is no possible way the minimum cache lifetime has elapsed yet.

dww’s picture

@all: Keep in mind that the data in {cache_update} is shared across all users on the site. The (presumably) anon user that cron runs as, and the admins viewing the available updates report. So, if any of them invalidate the cache, they're all in trouble. Someone has proposed trying to set $user->cache to 0 before update status calls cache_get(), but that's going to potentially mess with admin users clicking around anywhere in admin/*. Update status currently checks the status of the site on admin/* to report an error message if the site is missing a security update. Normally, it's just pulling this info out of the cache, but if the cache is invalidated at any stage, we fetch data again (which is a related, but separate bug -- that's ultimately what #222073: Update Status causing Drupal to run very slowly is about, I believe).

I think it's far more risky in the D6 stable series to change $user->cache (either into an array as chx proposes, or forcing it to 0 as some have proposed) inside update status than to just treat the update status cache as a special case, move it into a separate table, and don't use any of the existing cache API. The chances that we might accidentally break something else by changing $user->cache are quite high, especially since we have no test suite for D6. The chances we might accidentally break something by moving the update status cache table are basically 0%.

gábor hojtsy’s picture

I don't have the resources now to look into this problem and consequences of alternatives like setting $user->cache to 0. If it turns out that introducing a separate table for update_status is the viable solution, I'd agree to do that for Drupal 6 unless Dries has something against it. Keep in mind that the update code for Drupal 7 will be more complex with two possible schemas coming from Drupal 6, however, since these are cached data, keeping it is not required on updates either.

dries’s picture

In update_cron(), can't we just change if (!cache_get('update_info', 'cache_update') || ((time() - variable_get('update_last_check', 0)) > $interval)) { to remove the first clause?

Cybergarou’s picture

That's the way the function worked originally. This is the reasoning for the change from the revision log.

#267724 by cpugeniusmv, dww, maartenvg: Update module was not checking for new data when the cache was cleared, until the set time was elapsed. Now checks on cron after the cache was cleared.

I would say removing that clause would certainly be better than leaving it as is until another solution is implemented. In fact it sounds to me that the only reason for the change was to save somebody from having to run a manual check if they cleared the cache. If that's the case, remove the clause and don't look back. We can run a manual check if we need to see the available updates that soon after clearing the cache.

dww’s picture

@Dries #17: repeating myself from #15:

@all: Keep in mind that the data in {cache_update} is shared across all users on the site. The (presumably) anon user that cron runs as, and the admins viewing the available updates report. So, if any of them invalidate the cache, they're all in trouble.

This is not just about cron. It's about using the cache at all, and the stupid things core does with the cache if you either a) set the [sic] "minimum" cache lifetime or b) install memcache.

trogie’s picture

subscribing. I'm having this problem since 6.10...

Anonymous’s picture

subscribing

dww’s picture

Status: Active » Closed (duplicate)

There are a lot of duplicate bugs stemming from the same root problem. I'm consolidating everything into the earliest issue about this particular bug:

#220592: Core cache API breaks update.module: fetches data way too often, kills site performance, etc

See also #359171: Avoid cases where update data is refreshed on admin pageloads and ensure update notifications are always sent as configured for a related bug (the pair of them is brutal)...