I just updated to the latest version and my website welcomed me with a 500 error. I looked at my VPS process manager and PHP was going nuts, spawning child processes like crazy and allocating all available resources. That led to Apache just dying. This is what I get in the Apache logs:
[Thu Jan 26 08:02:21 2012] [error] [client local.ip.address] Premature end of script headers: index.php

I rolled back to previous version and everything is back to normal.


Comments

Wim Leers’s picture

Title: CDN causes PHP to allocate all available CPU and Memory » CDN + Far Future expiration in some cases: PHP allocates all available CPU and memory
Component: Module » Origin Pull mode
Assigned: Unassigned » Wim Leers
Priority: Critical » Normal
Status: Active » Postponed (maintainer needs more info)

1) Are you using a CDN or a static file server? If you're using a CDN, this should not happen.
2) Does your site have *a lot* of resources per page? Then I could see this happen.
3) This would also be addressed by #1413156: .htaccess rules for Far Future expiration: make it possible to use the Far Future feature directly in Apache, avoiding PHP. If you're using Apache, could you please try that patch?
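(For context, far-future rules of that kind look roughly like the following. This is a minimal sketch only, assuming mod_expires is enabled; the actual rules added by that patch may differ.)

<IfModule mod_expires.c>
  # Serve static assets with a far-future Expires header straight from
  # Apache, so these requests never have to bootstrap PHP/Drupal.
  <FilesMatch "\.(css|js|gif|jpe?g|png)$">
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
  </FilesMatch>
</IfModule>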

graytoby’s picture

1) Subdomain, so I guess it falls under "static file server".
2) Yes and no. Small CSS images are embedded in the CSS files as data URIs, and small images are embedded in the HTML. What I see now is that the CDN is serving 5-7 images per page (I only serve images via the CDN). All images are processed by Apache mod_pagespeed, and are served as WebP to Chrome and Opera users.
3) I tried that patch; it worked until I flushed the caches, so I guess it didn't really work.

Wim Leers’s picture

I suspect this is due to mod_pagespeed. Can you try without mod_pagespeed?

graytoby’s picture

Nope, no change at all. The end result is that all available CPU and memory are allocated to PHP, and in return Apache gets killed.

ebrueggeman’s picture

+1

broncomania’s picture

I got the same problem with D6. Since I went back to 2.1 the problem is gone. I also tried the dev release, with the same results.

Wim Leers’s picture

I wish I could reproduce this.

#2: did you just apply the patch, or did you follow the instructions that the patch adds? The patch actually modifies the README, providing instructions on how to modify your .htaccess file.

#5 & #6: can you try the same as I said in #2, with the addendum above? Alternatively, contact me and I'll help you. I really want to see this solved ASAP.

mesh’s picture

Version: 7.x-2.2 » 6.x-2.2

I too can confirm that the 500 error occurs after updating to 6.x-2.2. Disabling 'Far future expiration' under the settings doesn't solve the problem either. I cannot enable the CDN module at all.

Wim Leers’s picture

#8: Are you 100% sure that disabling that setting doesn't solve the problem either? That's hard to believe. But if it's true, it's a good thing to know; it will help us narrow down the problem.
Please also try the same as in #2, with the addendum in #7.

graytoby’s picture

Version: 6.x-2.2 » 7.x-2.2

Sorry for the delay.

Yes, I modified .htaccess too.

Wim Leers’s picture

#10:
1) If you upgrade to the development snapshot, does it still happen?
2) If you do a vanilla Drupal install on the same server, then install the CDN module, can you reproduce it?

graytoby’s picture

Wim,
I've updated to 7.x-2.3 and disabled far future expiration before clearing any caches. It works, but if I enable far future expiration, it kills Apache.

mr.j’s picture

Priority: Normal » Critical

This has just killed my server after upgrading on the 6.x branch!

I don't even want the far future or CSS aggregation functionality. We already set far-future headers for known image file types using .htaccess.
We also use the bundlecache module, which is similar to AdvAgg, so we don't use core CSS aggregation.

We are using an origin pull CDN, hence basic mode.

I have managed to comment out the CSS aggregation code, and I had to disable the far future feature by manually setting the 'cdn_farfuture_status' variable in the database.
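(For reference: rather than editing the database by hand, the same thing can be done with Drush, assuming the variable is a simple on/off flag:

# Disable the CDN module's far future expiration feature.
drush vset cdn_farfuture_status 0
# Clear caches so the change takes effect everywhere.
drush cc all

)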

I suggest you release an updated version of the module with these things disabled by default instead of enabled, because they are potential site killers.

Wim Leers’s picture

Fair enough. I'll change the default.

However, that still doesn't solve the problem at hand.

andreiashu’s picture

I think I might have stumbled on this issue as well.
Just to make sure that it is the same: can anyone with the D7 version comment out lines 152 to 163 and see whether you experience the same memory leak? Please note that commenting that out will make imagecache and other imagecache-like functionality stop working, but this is just to see whether those lines are the cause of the problem.

For the D6 version, you need to comment out from line 141.

andreiashu’s picture

Version: 7.x-2.2 » 6.x-2.3
Status: Postponed (maintainer needs more info) » Needs review
FileSize
1.21 KB

I have attached an example patch for the time being. It is very basic: the CDN module will fire an HTTP request only if the path is an imagecache one.
This not only saves Drupal from going into an infinite loop, but also saves resources when a website has a lot of nonexistent images on a page.
Again, this is just an example patch showing how we could approach the problem. Ideally we would want a more generic solution that supports any imagecache-like path.

The patch is for the D6 version for the time being.
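In spirit, the patch does something like this (a hypothetical sketch, not the literal diff; the path prefix and variable names are assumptions):

<?php
// Only fire the nested HTTP request for paths that ImageCache can
// actually generate; for anything else a missing file is just a 404.
if (strpos($path, 'files/imagecache/') === 0) {
  // Request the derivative so ImageCache generates it on the server.
  $response = drupal_http_request($generate_url);
}
?>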

Wim Leers’s picture

Wim Leers’s picture

As per #14, Far Future expiration is now disabled by default.

D7: http://drupalcode.org/project/cdn.git/commit/c370a4c
D6: http://drupalcode.org/project/cdn.git/commit/7def668

Wim Leers’s picture

Version: 6.x-2.3 » 7.x-2.x-dev
FileSize
4.01 KB
3.75 KB

The patch in #16 is much appreciated, but it doesn't solve the cause of the problem. The attached patches fix that (AFAICT it was only a problem in Drupal 6 in the first place, because I was using drupal_not_found() and drupal_access_denied()).

Additionally, the attached patches add watchdog logging, so that it actually becomes possible to easily track down problematic generated files.

interestingaftermath’s picture

+1 - having this same issue. Not able to try the patch right now but will give it a shot ASAP. Thanks!

andreiashu’s picture

Status: Needs review » Needs work

Thanks for the patch in #19.
I tested it on D6 and it doesn't fix the problem: Apache keeps creating new httpd processes, filling up the available RAM...

We need a flag to let the underlying new process know that it should quit early if the file does not exist. I did this with an additional HTTP header that I put in the request. I can reroll your patch if you think there is no other way to do this.
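Roughly what I have in mind (a sketch only; the header name is made up):

<?php
// Parent request: tag the nested request so the child can recognize it
// (D6 drupal_http_request() takes the headers as its second argument).
$response = drupal_http_request($url, array('X-CDN-Nested-Request' => '1'));

// Child request, early in the page callback: if we were spawned by the
// CDN module and the file (still) does not exist, bail out immediately
// instead of triggering another full Drupal 404 page.
if (!empty($_SERVER['HTTP_X_CDN_NESTED_REQUEST']) && !file_exists($file_path)) {
  header('HTTP/1.1 404 Not Found');
  exit;
}
?>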

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
5.49 KB
5.23 KB

#21: I don't understand. If a file cannot be generated, it immediately stops; I don't see how it can go into "recursive mode".

I don't see any special HTTP header being output by your code, just a simple 404, and the code in #19 does that too?

Furthermore, I don't entirely understand how it's possible to generate so many HTTP requests. AFAICT, it's impossible that these are *truly* recursive requests. Suppose example.com/foo references /imagecache/bar.jpg and /imagecache/baz.jpg. Both will hit Drupal again, but they will *not* reference the file they're generating again, so I don't see how your cdn_file_url_alter() change could help?

However, I believe there *was* a "sort of recursive request" in the D6 version prior to the above patch, whenever drupal_not_found() or drupal_access_denied() would be called (as I already mentioned in #19).

Hence, I believe that the current problem only occurs on pages where *a lot* of yet to be generated files (most likely images) are referenced. The only other case I can think of is where a referenced file does not exist and yet we try to generate it.

Note the "believing": I can't reproduce this myself, so that's why I haven't been able to fix this in the first place.

In theory, we don't even need to generate the files, we can just detect they're missing and link to them, and wait for the file to be generated by the user's browser's requests. But we want as much to be served from the CDN as soon as possible, plus, if a non-CDN link is stored in the page cache, many pages/people could then be suffering from this lack of CDN'd files.

So, in the new version of the patch, there are two changes (sketched below):
1) only perform a nested HTTP request if there's actually a menu item at this location
2) define a limit on the number of nested HTTP requests that can be performed, to prevent overwhelming the web server (the default limit is 2)
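In rough code, the two changes look like this (a sketch, not the literal patch; the variable name is an assumption):

<?php
// 1) Only fire a nested request if Drupal actually has a handler
//    (menu item) registered for this path.
if (menu_get_item($path)) {
  // 2) Cap the number of nested requests per page load.
  static $nested_requests = 0;
  if ($nested_requests < variable_get('cdn_nested_request_limit', 2)) {
    $nested_requests++;
    drupal_http_request($url);
  }
}
?>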

andreiashu’s picture

Hi Wim,

I'll try to replicate this behaviour on a vanilla install so that I can describe the steps needed to reproduce it.

andreiashu’s picture

Nice, this latest patch worked great, both on a clean D6 (Pressflow) install and on one of our sites whose homepage has a lot of assets.

The way to replicate the bug on a clean install was to delete any of the images from the Garland theme. That makes the additional background request fire up another httpd process, which bootstraps Drupal, gets into the CDN module's hook_url_alter(), and fires yet another request (this is where the infinite loop happens :)
Just to make sure you can easily replicate it: delete the logo image from Garland and some of the other images in the theme, then refresh the page with CDN enabled and far future on (default values). I attached a screenshot of my watchdog table showing a bunch of 404s generated for /themes/garland/logo.png. All those logs were produced by a single page refresh. When testing I need to have a "killall -9 httpd" terminal window on hold, because the httpd processes eat up all the available memory pretty fast, and once the machine starts swapping it is hard to get it back.

Anyway, long story short: I can confirm this is working for the D6 version. Is that enough for RTBC? Can anyone else test this patch on D7?

andreiashu’s picture

Status: Needs review » Reviewed & tested by the community

I'll RTBC anyway, unless someone disagrees.

Wim Leers’s picture

Thanks for the review!

Also reproduced, but only on D6. This is really just because of the menu item check in D6: when a file was missing and we'd try to generate it, but there was *no* menu item for it, that would trigger a Drupal 404! That, in turn, was the cause of the endless recursion!

That also explains why I couldn't reproduce this: it would only happen if you were linking to files that did not exist AND could not be generated, i.e. bad links. It also means this only happens if the file that's missing and cannot be generated is actually *part* of the 404 page that's served in that case.

So, for good measure, I'm including all of #22 *and* am adding part of what you did in #16. There was a bug in your code: it didn't account for sites not installed at the root of a domain.

Commits following soon.

Wim Leers’s picture

Status: Needs work » Fixed

One more addendum to #26 was made: the request timeout for the nested HTTP request has been reduced from 30 seconds to 2. That should also help :)

I'm very glad this one is finally behind us!

D7: http://drupalcode.org/project/cdn.git/commit/474b454
D6: http://drupalcode.org/project/cdn.git/commit/5fe41c6
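The timeout change amounts to roughly this (a sketch; 'timeout' is a real drupal_http_request() option in D7, but the surrounding code is assumed):

<?php
// Fail fast: give the nested generation request 2 seconds instead of
// the default 30, so a hung child cannot pile up httpd processes.
$response = drupal_http_request($url, array('timeout' => 2));
?>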

andreiashu’s picture

Status: Reviewed & tested by the community » Needs work

Hi Wim,

Thanks for committing this.
I'd like to discuss the new functionality related to the maximum number of missing files that can be generated during a request. Correct me if I'm wrong, but I think the current stable version of the CDN module will try to generate ALL missing files during a request. If that's correct, then the new default (2 per request) will confuse people who already have the CDN module installed, and it can potentially cause problems. Consider that most instances with the CDN module enabled will have a caching layer (Varnish) in front of Drupal. This means that, by default with the new version, only 2 imagecache images on a page will be served from the CDN, and the rest from the main site, until we clear the caches on Varnish.

So, assuming the above is correct: I think we should keep the current behaviour of generating all missing images during a request, but keep in place the variable that gives us the ability to override that behaviour. Same for the timeout?

Also, related to caching layers and Varnish: when the CDN module is not able to generate all the missing files (because of the CDN_BASIC_FARFUTURE_GENERATE_MISSING_FILES_DEFAULT limit or the timeout limit), shouldn't it invoke a hook to let other modules know about this, so that we can set custom HTTP headers telling Varnish not to cache the current page?
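Something along these lines, perhaps (the hook name is made up, and drupal_add_http_header() is the D7 API; D6 would use drupal_set_header()):

<?php
// In the CDN module, when generation gives up on one or more files:
module_invoke_all('cdn_generation_incomplete', $failed_files);

// A site-specific module could then keep Varnish from caching the page:
function mymodule_cdn_generation_incomplete(array $failed_files) {
  drupal_add_http_header('Cache-Control', 'no-cache, must-revalidate');
}
?>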

I tend to think that this additional functionality for the timeout and the number of generated missing files should be split off into a separate issue.

Wim Leers’s picture

I completely agree with your POV and this is also why I really doubted this aspect of the patch. Defaults matter. At the same time, it's true that advanced users *know* how to change these things. So I actually feel it might be better to ship with this relatively aggressive, yet safe default. OTOH, we're now probably safe, with the many anti-recursion measures that are now built-in.

The timeout, however, I think is useful. It'll prevent excessively long 404 page generation and loading; generating an image will probably never take 2 seconds. Your suggestion of sending the right HTTP headers to let reverse proxy caches know that the current page should not be cached makes perfect sense. I don't know why I didn't think of that. Shouldn't we just do the right thing directly in the CDN module?

Also: we could mark any page that contains some "unCDN'd" links as uncacheable, thereby triggering frequent retries and preventing even Drupal's page cache from spoiling things. The big downside of this: what if it's a very expensive, high-traffic page? We'd then increase the server load massively.

So, check out this idea: what about queueing every file that could not be generated? See e.g. this article. We could even remember the pages on which this happened, and thus we could invalidate them automatically from the page cache! (Sadly, this would add another module dependency in D6.)
All of this is of course over-engineering if it's not necessary for a site. So we'd probably just want to call a hook with the "ungeneratable" (I'm sure something's wrong with that word :P) files. We could then write a stand-alone cdn_queued_generation module for people who need this, if they ever turn out to need it.
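In D7 terms that could look roughly like this (a sketch; the queue name and item shape are assumptions):

<?php
// Remember each file that could not be generated, plus the page it was
// on, so a cron worker can retry later and flush that page from cache.
$queue = DrupalQueue::get('cdn_ungenerated_files');
$queue->createItem(array(
  'file' => $file_path,
  'page' => $_GET['q'],
));
?>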

Relevant links: http://drupal.stackexchange.com/a/21959, http://drupal.org/project/imagecache_batch.

If we're honest, only ImageCache/Image Styles are truly going to be important here. So maybe it's not even necessary to build this deep integration (with queued generation); maybe it's okay to simply force sites of that magnitude to use queued image generation anyway.

andreiashu’s picture

First of all, let me point out that the code you put in place so that the CDN module doesn't cause infinite subrequests already fixes this issue. The timeout and number-of-subrequests settings shouldn't be committed until we actually need them; at the very least they should be moved to a separate feature request. All the things we are talking about in #28 and #29 are nice, but they will mean more code, more maintenance, and most importantly more potential for bugs.

"Defaults matter. At the same time, it's true that advanced users *know* how to change these things."

I imagine most users (advanced and novice alike) won't even notice this change in behaviour until they look at their web server traffic graphs and notice that their imagecache files are not being served from their CDN.

A good quality module shouldn't cause unnecessary additional work for any user or sysadmin.

andreiashu’s picture

I forgot to mention: ultimately, as the module creator/maintainer, you have the final say in this.
But I just wanted to put myself in a user's/sysadmin's shoes. :)

Wim Leers’s picture

I wanted to prevent the problem case of a page containing perhaps two dozen images that were yet to be generated. But I agree: this may not even be necessary, and thus I was adding needless complexity. I got rid of the timeout and nested request limit settings.

So, I guess we're all set then :) We've discussed how I went too far in some areas (I wrote the nested requests limitation thingy in the first place because I couldn't reproduce the problem). We've discussed potential additions. So comments #28-#32 now form a nice overview of what can still be done if it turns out to be really needed :)

Thanks for all your input! :)

Rollback commits:
D7: http://drupalcode.org/project/cdn.git/commit/d372488
D6: http://drupalcode.org/project/cdn.git/commit/0c7b357

andreiashu’s picture

sweet! thanks :)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

JeremyFrench’s picture

Status: Closed (fixed) » Active

I think we are getting this issue, or something very similar, in D7 with 7.x-2.5.

We have turned off far future and it seems fine.

The issue was that, when generating a page, for every

<a href="/node/123"><img ....></a>

on the page, it would do a request to /node/123 on the web server. So to load a page after a cache clear it would have to load every link on that page, which brought down the site on a number of occasions.

We would also get "Nested HTTP request to generate..." notices in the page.

mstrelan’s picture

Wim Leers’s picture

Status: Active » Closed (fixed)

Back to closed as per #36. 2.6 release (which contains the fix) is imminent as per #1915662: [meta] 2.6 release (bugfixes only).