On my site, I'm noticing a 502 Error when running on Nginx. I can make the problem appear and disappear by enabling and disabling the File Entity module.

Further information: this appears only when files are being rendered, for instance on pages where an image has been inserted into the WYSIWYG and Media markup is being converted to HTML. I can make this problem go away on pages where I convert the Media markup, manually, to an <img> tag, which of course isn't ideal.

A 502 Bad Gateway error can happen when a server stops responding. In this case, it's taking over 9 seconds to render the page, at which point PHP crashes and Nginx returns the 502.

An example of a page where this is happening is http://activerx.dev.dal.minimalmedia.net/franchise/news.

Comments

visuaLatte’s picture

Issue summary: View changes
visuaLatte’s picture

Title: File Entity causing 502 error on Nginx » File Entity causing 502 Bad Gateway Error on Nginx
Issue summary: View changes
visuaLatte’s picture

Updated: I found the function that is triggering the 502 Bad Gateway error. By removing (and then re-adding) the code within this function on my dev site, I can make the problem disappear and then reappear. Function is file_entity_file_load(), which is called around line 184 of file_entity.file.inc.

function file_entity_file_load($files) {
  $alt = variable_get('file_entity_alt', '[file:field_file_image_alt_text]');
  $title = variable_get('file_entity_title', '[file:field_file_image_title_text]');

  $replace_options = array(
    'clear' => TRUE,
    'sanitize' => FALSE,
  );

  foreach ($files as $file) {
    $file->metadata = array();

    // Load alt and title text from fields.
    if (!empty($alt)) {
      $file->alt = token_replace($alt, array('file' => $file), $replace_options);
    }
    if (!empty($title)) {
      $file->title = token_replace($title, array('file' => $file), $replace_options);
    }
  }

  // Load and unserialize metadata.
  $results = db_query("SELECT * FROM {file_metadata} WHERE fid IN (:fids)", array(':fids' => array_keys($files)));
  foreach ($results as $result) {
    $files[$result->fid]->metadata[$result->name] = unserialize($result->value);
  }
}
visuaLatte’s picture

Further update: it appears that the 2 lines calling token_replace() specifically cause the problem. Again, I can turn the problem off and on by commenting and un-commenting these lines.

swap20’s picture

I dont know if you manage to solve it

Your nginx vhost conf needs to serve static html as below

location ~* ^.+\.(?:css|js|html|xml|png|jpg|jpeg|gif|ico|swf|htm)$ {
   access_log        off;
    expires           30d;
    tcp_nodelay off;
    open_file_cache max=3000 inactive=120s;
    open_file_cache_valid 45s;
    open_file_cache_min_uses 2;
    open_file_cache_errors off;
    ## ETag support. This requires an Nginx version >= 1.3.3.
    etag on;    
}
ebsen’s picture

If you're finding this problem on a site hosted on Pantheon, as we are, you might be better off looking into another implementation. Pantheon does not allow access to the nginx.conf file.

As explained on Stack Overflow, Pantheon chose to support max-age over etags, which means that we aren't able to use the shortcode-like tags that Media employs on a Pantheon site.

MichaelSt’s picture

Drupal 7.34 on Pantheon server.

I've been having a really difficult time with my Drupal site the last couple of weeks. I had to renew an SSL certificate and once I did, I started getting Bad Gateway 502 errors everywhere.

It's a pretty robust site as it runs our whole business.

There were no new modules or updates for a couple of weeks - everything was working great, it just started giving these errors after the new SSl certificate was updated.

Basically when I try to update and save a View, it goes to a 502 Error Nginx screen. I can't refresh or go to another page - I have to restore the site from the server to a previous backup.

When I refresh the cache, enable disable a module, edit a registration, add comments in a forum - I also get a 502 Bad Gateway message but refreshing the browser will allow the site to reappear continue working but not very stable.

The server logs don't show any errors (that I can tell). Basically everything looks to just stop (all logs) after the 502 happens.

I've built quite a few Drupal sites but am not familiar with php or server side issues so any suggestions would be very helpful.

We do use the Media Module and am not sure if this is the problem. I've tried to disable a bunch of modules but can't isolate and can't turn off Media since it is in use and part of core now.

This is totally stumping me. We're really limping by since so much of the site is untouchable without locking everything up.

Any ideas would be greatly appreciated.

faite’s picture

I'm having the same exact thing happening with the 502s at all the same points. Not sure if its a coincidence, but it happened around the time I started using cloudflare and their SSL certs. Turning off cloud-flare doesn't seem to stop the issue however. I am on Pantheon and they switched servers for me which seemed to help for a couple of days, but the problem keeps coming back.

MichaelSt’s picture

That's good that I'm not the only one... I thought I was going crazy. I've tried disabling every module with no luck.

My SSL certs were and are with GoDaddy... I just had one expire and had to get a new one. Hopefully someone will have some suggestions on this. Pantheon said they were doing tests on the nginx.

They said something is stampeding the server and think it's a module that needs a patch. But I can't figure out which one or why it would be happening all the sudden after switching certs... Everything was great until then.

kopeboy’s picture

Version: 7.x-2.0-alpha3 » 7.x-2.x-dev

Pantheon & CloudFlare here too.
I experience the problem every now and then too, but it don't think its related to file entities exclusively.
I don't use any SSL cert.

The pages where I can see the bad gateway more frequently are:

  • after saving a content, especially when the node edit form as been open for long
  • after saving a translation
dgtlmoon’s picture

Category: Bug report » Support request

Sounds like a support request to me

Can anyone provide some concrete debug information - I would like to see what if any PHP errors are being thrown?

Are they always images that are having trouble?

MichaelSt’s picture

I've included the php error log and nginx-error logs from my Pantheon server below.

The following makes the website lock up into 502 errors...
1.) Saving a view (either new or edited)
2.) Turning on/off modules
3.) Posting in the advanced forum (Now disabled)
4.) Creative a New Content Type

I hope someone can figure this out or point me in the right direction. It's been months of limping by with nothing solid to build on.

php error log:
[27-Feb-2015 14:45:41 America/Los_Angeles] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 2006 MySQL server has gone away' in /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/database/database.inc:2171
Stack trace:
#0 /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/database/database.inc(2171): PDOStatement->execute(Array)
#1 /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/database/database.inc(683): DatabaseStatementBase->execute(Array, Array)
#2 /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/database/database.inc(2350): DatabaseConnection->query('SELECT expire, ...', Array, Array)
#3 /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/lock.inc(167): db_query('SELECT expire, ...', Array)
#4 /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/lock.inc(146): lock_may_be_available('schema:runtime:...')
#5 /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/bootstrap.inc(433): lock_acquire('schema:runtime:...')
#6 /srv/bindings/19ea83cb8cb54582b in /srv/bindings/19ea83cb8cb54582b31f790d3275b209/code/includes/database/database.inc on line 2171

nginx-error log:
2015/03/07 22:50:30 [error] 97008#0: *1 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.223.177.45, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/srv/bindings/19ea83cb8cb54582b31f790d3275b209/run/php-fpm.sock:", host: "dev-cma-cms-test.gotpantheon.com"
2015/03/07 22:51:18 [error] 97008#0: *1 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.223.177.45, server: , request: "POST /node?destination=node HTTP/1.1", upstream: "fastcgi://unix:/srv/bindings/19ea83cb8cb54582b31f790d3275b209/run/php-fpm.sock:", host: "dev-cma-cms-test.gotpantheon.com", referrer: "http://dev-cma-cms-test.gotpantheon.com/"
2015/03/07 22:51:49 [error] 97008#0: *1 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.223.177.45, server: , request: "GET /node HTTP/1.1", upstream: "fastcgi://unix:/srv/bindings/19ea83cb8cb54582b31f790d3275b209/run/php-fpm.sock:", host: "dev-cma-cms-test.gotpantheon.com", referrer: "http://dev-cma-cms-test.gotpantheon.com/"
2015/03/07 22:57:50 [error] 101325#0: *8 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.223.193.24, server: , request: "GET /update.php?op=selection&token=spjmtWxJdAqH60SE9IkWlo6pOA4zoJ2cIyklSj5kI0Y&id=244&op=finished HTTP/1.1", upstream: "fastcgi://unix:/srv/bindings/19ea83cb8cb54582b31f790d3275b209/run/php-fpm.sock:", host: "dev-cma-cms-test.gotpantheon.com", referrer: "http://dev-cma-cms-test.gotpantheon.com/update.php?op=selection&token=sp..."
2015/03/08 00:43:03 [error] 99005#0: *23 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.223.177.45, server: , request: "GET /admin_menu/flush-cache?token=1fnp0JJ9fnQOmvlBnnBZkJwfVfb8kUMP9DRhIoEO3i8&destination=node HTTP/1.1", upstream: "fastcgi://unix:/srv/bindings/19ea83cb8cb54582b31f790d3275b209/run/php-fpm.sock:", host: "dev-cma-cms-test.gotpantheon.com", referrer: "http://dev-cma-cms-test.gotpantheon.com/"
2015/03/20 01:42:05 [error] 118763#0: *40 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.223.193.24, server: , request: "GET /update.php?op=selection&token=spjmtWxJdAqH60SE9IkWlo6pOA4zoJ2cIyklSj5kI0Y&id=247&op=finished HTTP/1.1", upstream: "fastcgi://unix:/srv/bindings/19ea83cb8cb54582b31f790d3275b209/run/php-fpm.sock:", host: "dev-cma-cms-test.gotpantheon.com", referrer: "http://dev-cma-cms-test.gotpantheon.com/update.php?op=selection&token=sp..."

MichaelSt’s picture

Here is the php-fpm-error.log after getting the 502 error...

[root@endpointa277e005 logs]# tail -100 php-fpm-error.log
[26-Feb-2015 15:02:56] WARNING: [pool www] child 55014 exited on signal 11 (SIGSEGV) after 3119.973335 seconds from start
[26-Feb-2015 15:02:56] NOTICE: [pool www] child 36691 started
[26-Feb-2015 15:06:47] WARNING: [pool www] child 36691, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (6.373544 sec), logging
[26-Feb-2015 15:06:47] NOTICE: child 36691 stopped for tracing
[26-Feb-2015 15:06:47] NOTICE: about to trace 36691
[26-Feb-2015 15:06:47] NOTICE: finished trace of 36691
[26-Feb-2015 15:06:48] WARNING: [pool www] child 36691 exited on signal 11 (SIGSEGV) after 231.843262 seconds from start
[26-Feb-2015 15:06:48] NOTICE: [pool www] child 45615 started
[26-Feb-2015 15:06:54] WARNING: [pool www] child 80447, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.763884 sec), logging
[26-Feb-2015 15:06:54] NOTICE: child 80447 stopped for tracing
[26-Feb-2015 15:06:54] NOTICE: about to trace 80447
[26-Feb-2015 15:06:54] NOTICE: finished trace of 80447
[26-Feb-2015 16:00:01] WARNING: [pool www] child 15357, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "POST /index.php") executing too slow (5.275733 sec), logging
[26-Feb-2015 16:00:01] NOTICE: child 15357 stopped for tracing
[26-Feb-2015 16:00:01] NOTICE: about to trace 15357
[26-Feb-2015 16:00:01] NOTICE: finished trace of 15357
[26-Feb-2015 16:00:01] WARNING: [pool www] child 15357 exited on signal 11 (SIGSEGV) after 3968.278739 seconds from start
[26-Feb-2015 16:00:01] NOTICE: [pool www] child 28609 started
[26-Feb-2015 16:00:16] WARNING: [pool www] child 45615, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "POST /index.php") executing too slow (5.385728 sec), logging
[26-Feb-2015 16:00:16] NOTICE: child 45615 stopped for tracing
[26-Feb-2015 16:00:16] NOTICE: about to trace 45615
[26-Feb-2015 16:00:16] NOTICE: finished trace of 45615
[26-Feb-2015 16:00:33] WARNING: [pool www] child 80447, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (6.133935 sec), logging
[26-Feb-2015 16:00:33] NOTICE: child 80447 stopped for tracing
[26-Feb-2015 16:00:33] NOTICE: about to trace 80447
[26-Feb-2015 16:00:33] NOTICE: finished trace of 80447
[26-Feb-2015 16:00:33] WARNING: [pool www] child 80447 exited on signal 11 (SIGSEGV) after 5557.465245 seconds from start
[26-Feb-2015 16:00:33] NOTICE: [pool www] child 30334 started
[26-Feb-2015 16:00:41] WARNING: [pool www] child 28609, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.024162 sec), logging
[26-Feb-2015 16:00:41] NOTICE: child 28609 stopped for tracing
[26-Feb-2015 16:00:41] NOTICE: about to trace 28609
[26-Feb-2015 16:00:41] NOTICE: finished trace of 28609
[26-Feb-2015 16:00:45] WARNING: [pool www] child 45615, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.227489 sec), logging
[26-Feb-2015 16:00:45] NOTICE: child 45615 stopped for tracing
[26-Feb-2015 16:00:45] NOTICE: about to trace 45615
[26-Feb-2015 16:00:45] NOTICE: finished trace of 45615
[26-Feb-2015 16:00:48] WARNING: [pool www] child 31627, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.537915 sec), logging
[26-Feb-2015 16:00:48] WARNING: [pool www] child 30334, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (6.598346 sec), logging
[26-Feb-2015 16:00:48] NOTICE: child 30334 stopped for tracing
[26-Feb-2015 16:00:48] NOTICE: about to trace 30334
[26-Feb-2015 16:00:48] NOTICE: finished trace of 30334
[26-Feb-2015 16:00:48] NOTICE: child 31627 stopped for tracing
[26-Feb-2015 16:00:48] NOTICE: about to trace 31627
[26-Feb-2015 16:00:48] NOTICE: finished trace of 31627
[26-Feb-2015 16:00:50] WARNING: [pool www] child 31928, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.859879 sec), logging
[26-Feb-2015 16:00:50] NOTICE: child 31928 stopped for tracing
[26-Feb-2015 16:00:50] NOTICE: about to trace 31928
[26-Feb-2015 16:00:50] NOTICE: finished trace of 31928
[26-Feb-2015 16:01:03] WARNING: [pool www] child 45615 exited on signal 11 (SIGSEGV) after 3254.986519 seconds from start
[26-Feb-2015 16:01:03] NOTICE: [pool www] child 32890 started
[26-Feb-2015 16:01:33] WARNING: [pool www] child 31928 exited on signal 11 (SIGSEGV) after 49.534893 seconds from start
[26-Feb-2015 16:01:33] NOTICE: [pool www] child 33925 started
[26-Feb-2015 16:01:38] WARNING: [pool www] child 28609 exited on signal 11 (SIGSEGV) after 96.704569 seconds from start
[26-Feb-2015 16:01:38] NOTICE: [pool www] child 33993 started
[26-Feb-2015 16:01:43] WARNING: [pool www] child 30334 exited on signal 11 (SIGSEGV) after 70.455703 seconds from start
[26-Feb-2015 16:01:43] NOTICE: [pool www] child 34073 started
[26-Feb-2015 16:01:49] WARNING: [pool www] child 31627 exited on signal 11 (SIGSEGV) after 67.591144 seconds from start
[26-Feb-2015 16:01:49] NOTICE: [pool www] child 34188 started
[26-Feb-2015 16:05:21] WARNING: [pool www] child 34188, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (6.074793 sec), logging
[26-Feb-2015 16:05:21] NOTICE: child 34188 stopped for tracing
[26-Feb-2015 16:05:21] NOTICE: about to trace 34188
[26-Feb-2015 16:05:21] NOTICE: finished trace of 34188
[26-Feb-2015 16:05:25] WARNING: [pool www] child 34188 exited on signal 11 (SIGSEGV) after 215.991351 seconds from start
[26-Feb-2015 16:05:25] NOTICE: [pool www] child 41067 started
[26-Feb-2015 16:05:36] WARNING: [pool www] child 33993, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.295656 sec), logging
[26-Feb-2015 16:05:36] NOTICE: child 33993 stopped for tracing
[26-Feb-2015 16:05:36] NOTICE: about to trace 33993
[26-Feb-2015 16:05:36] NOTICE: finished trace of 33993
[26-Feb-2015 16:05:53] WARNING: [pool www] child 33993 exited on signal 11 (SIGSEGV) after 254.489033 seconds from start
[26-Feb-2015 16:05:53] NOTICE: [pool www] child 41776 started
[26-Feb-2015 16:07:31] WARNING: [pool www] child 41067, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (6.216343 sec), logging
[26-Feb-2015 16:07:31] NOTICE: child 41067 stopped for tracing
[26-Feb-2015 16:07:31] NOTICE: about to trace 41067
[26-Feb-2015 16:07:31] NOTICE: finished trace of 41067
[26-Feb-2015 16:07:31] WARNING: [pool www] child 41067 exited on signal 11 (SIGSEGV) after 126.551535 seconds from start
[26-Feb-2015 16:07:31] NOTICE: [pool www] child 45128 started
[26-Feb-2015 16:11:10] WARNING: [pool www] child 34073, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.289173 sec), logging
[26-Feb-2015 16:11:10] NOTICE: child 34073 stopped for tracing
[26-Feb-2015 16:11:10] NOTICE: about to trace 34073
[26-Feb-2015 16:11:10] NOTICE: finished trace of 34073
[26-Feb-2015 16:11:10] WARNING: [pool www] child 34073 exited on signal 11 (SIGSEGV) after 566.494732 seconds from start
[26-Feb-2015 16:11:10] NOTICE: [pool www] child 50832 started
[26-Feb-2015 16:17:46] WARNING: [pool www] child 41776, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.701115 sec), logging
[26-Feb-2015 16:17:46] NOTICE: child 41776 stopped for tracing
[26-Feb-2015 16:17:46] NOTICE: about to trace 41776
[26-Feb-2015 16:17:46] NOTICE: finished trace of 41776
[26-Feb-2015 16:17:48] WARNING: [pool www] child 41776 exited on signal 11 (SIGSEGV) after 715.853168 seconds from start
[26-Feb-2015 16:17:48] NOTICE: [pool www] child 69195 started
[26-Feb-2015 16:17:50] WARNING: [pool www] child 45128, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (6.134834 sec), logging
[26-Feb-2015 16:17:50] NOTICE: child 45128 stopped for tracing
[26-Feb-2015 16:17:50] NOTICE: about to trace 45128
[26-Feb-2015 16:17:50] NOTICE: finished trace of 45128
[26-Feb-2015 16:18:34] WARNING: [pool www] child 45128 exited on signal 11 (SIGSEGV) after 662.683262 seconds from start
[26-Feb-2015 16:18:34] NOTICE: [pool www] child 72543 started
[26-Feb-2015 16:18:52] WARNING: [pool www] child 69195 exited on signal 11 (SIGSEGV) after 63.927376 seconds from start
[26-Feb-2015 16:18:52] NOTICE: [pool www] child 73834 started
[26-Feb-2015 16:18:58] WARNING: [pool www] child 72543, script '/srv/bindings/0ff4d67fc35a45cc9f80eae6f594660c/code//index.php' (request: "GET /index.php") executing too slow (5.343471 sec), logging
[26-Feb-2015 16:18:58] NOTICE: child 72543 stopped for tracing
[26-Feb-2015 16:18:58] NOTICE: about to trace 72543
[26-Feb-2015 16:18:58] NOTICE: finished trace of 72543

dgtlmoon’s picture

that says to me that the time you spent uploading the file (or the time processing) is greater than the time mysql is allowed to stay connected to the PHP

need to rule out basic things first - is the server local to you?

MichaelSt’s picture

dgtlmoon - I'm not totally sure what you mean. The site is not on my computer or anywhere on site. It's a Pantheon server - I'm not sure if they have our folder in Root or not.

The php-fpm-error.log above is after the site crashes. The people at Pantheon said the site is stampeding when showing a 502. But the culprits are mentioned above and the other logs show what was happening when the site went down (as far as I know - this is a bit above my head).

MichaelSt’s picture

This is where this stands. I sent Pantheon a list of things that crashed the site and they tried things on their end. This was the response...

We we able to replicate it on Test, via HTTP so it’s not Cloudflare or HTTPS. We even put in a new server but saving a view returns an segfault and running db updates as before every time:

[I pasted the gist below]

However, this does not fail on a new install on Drupal and we have also checked with some of the larger sites and they dont have the same issue. If it was configuration, those should be affected too.

There is something about the config or perhaps some database updates that need to be run. Based on the gist above [below] it is getting stuck on menu_rebuild:

https://www.drupal.org/node/2425259

Unfortunately, there isn’t any nginx configuration left to change or limit to increase and we have sifted through all the available logs. All the containers have the same profile but testing a number of site worked 100% of the time. If there was, we would have no issues trying to help but we have exhausted all options.

New relic should, step number 1 before anything else. That could help show you this in action, however, the next step would be to see why menu_rebuild appears in the slow log and trace it back from there as it is consistent when the site fails.

gist***********************
==> php-fpm-error.log <==
[29-Apr-2015 19:10:05] ERROR: failed to ptrace(ATTACH) child 21345: Operation not permitted (1)
[29-Apr-2015 19:10:05] WARNING: [pool www] child 21345, script '/srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code//index.php' (request: "GET /index.php") executing too slow (5.069348 sec), logging

==> nginx-error.log <==
2015/04/29 19:10:05 [error] 21525#0: *9 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.223.224.238, server: , request: "GET /admin/structure/views/view/all_lesson_payment_options/edit HTTP/1.1", upstream: "fastcgi://unix:/srv/bindings/21e355ab740a4b7f9da5a579ef1be008/run/php-fpm.sock:", host: "test-cma-cms-test.gotpantheon.com", referrer: "http://test-cma-cms-test.gotpantheon.com/admin/structure/views/view/all_..."

==> php-fpm-error.log <==
[29-Apr-2015 19:10:05] WARNING: [pool www] child 21345 exited on signal 11 (SIGSEGV) after 148.406309 seconds from start
[29-Apr-2015 19:10:05] NOTICE: [pool www] child 16915 started
[29-Apr-2015 19:10:12] WARNING: [pool www] child 21911, script '/srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code//index.php' (request: "GET /index.php") executing too slow (6.120900 sec), logging
[29-Apr-2015 19:10:12] NOTICE: child 21911 stopped for tracing
[29-Apr-2015 19:10:12] NOTICE: about to trace 21911
[29-Apr-2015 19:10:12] NOTICE: finished trace of 21911

==> php-slow.log <==

[29-Apr-2015 19:10:12] [pool www] pid 21911
script_filename = /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code//index.php
[0x0000000002172640] usleep() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/includes/lock.inc:225
[0x0000000002172198] lock_wait() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/includes/menu.inc:2732
[0x0000000002171750] menu_rebuild() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/includes/menu.inc:460
[0x0000000002171160] menu_get_item() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/includes/menu.inc:1757
[0x0000000002171090] menu_get_custom_theme() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/includes/menu.inc:1772
[0x0000000002170480] menu_set_custom_theme() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/includes/common.inc:5260
[0x000000000216fd68] _drupal_bootstrap_full() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/includes/bootstrap.inc:2287
[0x000000000216fb80] drupal_bootstrap() /srv/bindings/21e355ab740a4b7f9da5a579ef1be008/code/index.php:20

dgtlmoon’s picture

yeah so php5 is timing out after some considerable time (as it should), or nginx is dropping the connection after a considerable time (as it should).. perhaps it's getting stuck in a redirect loop of sorts? can you tail the logs whilst it's happening and see if it's repeatedly requesting something from the server whilst it seems to be 'waiting' on your end?

MichaelSt’s picture

This just came back from support at Pantheon. I'm sorry to say I barely understand any of it...

"The gist we sent you earlier (listed in #16 above) is actually from a _tail -f *_ of all the logs, as you can see before the segfault the request takes 148s before it terminates. Other than the time it takes, the traces are identical."

They suggest the following two things could help sort this out. Not exactly how to proceed with either...

"*1) Update.php is not running correctly and resulting in a timeout*

There is a good chance the database updates are not being correctly run on the site that can lead to problems. For example, this is the only site reporting a segfault and this is a separate servers and happens reliably:

bc.. $ drush updb -n
sh: line 1: 115205 Segmentation fault (core dumped) /srv/bindings/bffdcef3812f4c33af54a585e15da63a/php/php --php-ini /srv/bindings/bffdcef3812f4c33af54a585e15da63a/.drush/php.ini -d magic_quotes_gpc=Off -d magic_quotes_runtime=Off -d magic_quotes_sybase=Off /opt/drush5/drush.php --php=/srv/bindings/bffdcef3812f4c33af54a585e15da63a/php/php --php-options='--php-ini /srv/bindings/bffdcef3812f4c33af54a585e15da63a/.drush/php.ini -d magic_quotes_gpc=Off -d magic_quotes_runtime=Off -d magic_quotes_sybase=Off' --backend=2 --no --config=../drushrc.php --root=/srv/bindings/bffdcef3812f4c33af54a585e15da63a/code --uri=http://cse-cma-cms-test.gotpantheon.com cache-clear all 2>&1
Finished performing updates.

p. Getting that identified cleared up should give you better performance and results.

*2) Timeout issues*

You should look into the stack traces we sent over, they have some timeouts on menu_rebuild which is a long time problem with Drupal sites, especially if the caches or cron is not running as data can accumulate and cause bottlenecks preventing operations like cron, rules, cache clearing and updates to fail.

We see issues like this quite frequently with Drupal which is often resolved by a patch that becomes part of a dependent module or theme and everything is fine after that. In some instances a cron run that is incomplete or simple things like encoding of a variable or data if some service or hook does not handle that type of input results in the errors."

jtjones23’s picture

Version: 7.x-2.x-dev » 7.x-2.0-beta1
jtjones23’s picture

Hi,

I'm checking to see if anyone was able to solve this problem. I'm having the same exact issue happening with the 502s when trying to browse files to upload on a live Pantheon site (dev and test are working fine). Not sure if its a coincidence, but it happened around the time I installed a SSL certs too. I've run cron, Cleared Caches and run update.php with no luck.

dgtlmoon’s picture

@michaelst his issue was resolved through a different module that was causing a PHP error

dgtlmoon’s picture

Status: Active » Postponed (maintainer needs more info)

@jtjones23 can you get the PHP error log? this is the most important place to start

jtjones23’s picture

I checked the nginx-error.log, php-error.log and php-fpm-error.log files and none of them have any entries during the last week.

dgtlmoon’s picture

there has to be a reason for it, can you ask pantheon to keep an eye on it and tell you if it's a timeout? or a PHP error? in both cases it should be logged, 502 is substantial enough for a record to be made somewhere, is your watchdog logs working also?

jtjones23’s picture

Watchdog logs are also on and return no errors regarding the 502 Bad gateway.

jtjones23’s picture

Turns out I was looking at the wrong log files. Looks like the probem is actually the Media module, specifcally with the Media browser view. Still not sure why this is only happening on the live site.

nginx-error.log:

2015/06/18 23:58:49 [error] 104905#0: *3535 recv() failed (104: Connection reset by peer) while reading response header from upstream,
client: 10.223.243.68, server: , request: "GET /media/browser?render=media-popup&types%5B%5D=image&schemes%5B%5D=public&schemes%5B%5D=p
rivate&file_directory=&file_extensions=png+gif+jpg+jpeg&max_filesize=&uri_scheme=public&plugins= HTTP/1.1", upstream: "fastcgi://unix:/
srv/bindings/5a193d69d0354455903b177f96e5f4ff/run/php-fpm.sock:", host: "www.example.com", referrer: "http://www.example.com/community/example"
2015/06/18 23:59:40 [error] 104904#0: *3545 recv() failed (104: Connection reset by peer) while reading response header from upstream,
client: 10.223.224.9, server: , request: "POST /admin/structure/views/view/media_default/preview/media_browser_1/ajax HTTP/1.1", upstre
am: "fastcgi://unix:/srv/bindings/5a193d69d0354455903b177f96e5f4ff/run/php-fpm.sock:", host: "www.example.com"www.example.com
, referrer: "http://www.example.com/admin/structure/views/view/media_default/edit"
2015/06/19 00:00:09 [error] 104904#0: *3532 recv() failed (104: Connection reset by peer) while reading response header from upstream,
client: 10.223.224.9, server: , request: "POST /admin/structure/views/view/media_default/preview/media_browser_1/ajax HTTP/1.1", upstre
am: "fastcgi://unix:/srv/bindings/5a193d69d0354455903b177f96e5f4ff/run/php-fpm.sock:", host: "www.example.com"
, referrer: "http://www.example.com/admin/structure/views/view/media_default/edit"
2015/06/19 15:19:43 [error] 104903#0: *5391 recv() failed (104: Connection reset by peer) while reading response header from upstream,
client: 10.223.243.68, server: , request: "GET /media/browser?render=media-popup&types%5B%5D=image&schemes%5B%5D=public&schemes%5B%5D=p
rivate&file_directory=&file_extensions=png+gif+jpg+jpeg&max_filesize=&uri_scheme=public&plugins= HTTP/1.1", upstream: "fastcgi://unix:/
srv/bindings/5a193d69d0354455903b177f96e5f4ff/run/php-fpm.sock:", host: "www.example.com", referrer: "www.example.com"
dgtlmoon’s picture

host: "www.example.com", referrer: "www.example.com" ? that looks interesting

jtjones23’s picture

I just removed the real URL, site isn't live yet

jtjones23’s picture

Fixed! Customer was loading images larger than 2MB to the Media browser (media_default view) which was causing the 502. We have modified the the view to filter out any files that have a size larger than 2MB and the error has gone away. We have also set up the fields on our file types to limit the uploads to less than 2MB to avoid having any further issues. We will never forget this issue.

dgtlmoon’s picture

Status: Postponed (maintainer needs more info) » Closed (cannot reproduce)
jamesrward’s picture

I know this issue is closed but I had a similar issue on Acquia Cloud where the media browser was returning 50x errors and sigsegv on the php-fpm logs but only for some logged-in users. Disabling oembed in the Media Browser for the related fields solved the issue for me.

inversed’s picture

If anyone comes across this or something similar (I saw it with the Entity Browser in Drupal 8.x), my fix was to limit the number of results returned to the Media Browser when using the Views Infinite Scroll pagination.

Matt B’s picture

Getting this issue on Drupal & & nginx (not Pantheon). I've tried the update in #5, but it still timesout. Uploading large Mp4 files ;-)