Problem/Motivation

When running a command that includes a call to drush_invoke_process CPU usage becomes crazy high, and basically consumes an entire CPU core until the command completes.

Run a simple drush make command and watch CPU usage for a simple demonstration, or set up a more artificial, but more useful, test as follows:

I've created a few Drush commands that show this issue very clearly:

Install my sandbox commands: http://drupal.org/sandbox/darthsteven/1553722

This will give you four new commands for Drush 5:

  • drush-process-test-sleep - sleeps for the given amount of time in seconds, using PHP's sleep function..
  • drush-process-test-sleep-wrapper - calls the above command with drush_invoke_process
  • drush-process-test-pi - computes pi using a given number of iterations, CPU heavy Drush command!
  • drush-process-test-pi-wrapper - calls the above command with drush_invoke_process

So, some empirical testing, on a mid 2011 macbook pro with 2.7Ghz Core i7:

$ time drush drush-process-test-sleep 15                             
drush drush-process-test-sleep 15  0.18s user 0.11s system 1% cpu 15.558 total

This is what we expect, Drush basically consumes no real resources, and takes 15 seconds to run.

So now, the wrapper Drush command:

$ time drush drush-process-test-sleep-wrapper 15
drush drush-process-test-sleep-wrapper 15  14.85s user 0.88s system 100% cpu 15.606 total

So we took 1/10 second longer to complete, but averaged 100% CPU (out of 400%). Ouch.

With the CPU intensive task:

$ time drush drush-process-test-pi 200          
drush drush-process-test-pi 200  14.50s user 0.10s system 99% cpu 14.660 total

and then that command, wrapped:

$ time drush drush-process-test-pi-wrapper 200
drush drush-process-test-pi-wrapper 200  33.11s user 0.93s system 195% cpu 17.388 total

So we actually took three seconds longer (wall time), just by being in a subcommand, and actually used twice as much CPU time!

We're saved here, because I'm running these tests on a multi-core machine, if one ran them in a VM with a single core say, I'd expect them to be awful. Either way, what we see here, is that the parent process is stealing resources from the children. By default, Drush will invoke four children processes to get through the invoked commands, with means that for my system for example, there will be more threads than CPU cores, and one of those cores with be being maxed out at 100% needlessly.

So, why is this happening?

_drush_backend_proc_open is the key function here, it is responsible for opening the child processes and polling them for changes, to see if they have written output data, or indeed, completed. It does the polling in a loop, but that loop just executes as often as possible. This does make the parent process as responsive as possible, but doesn't really reflect what the child process is doing. If the child process is going to sit and output nothing for 15 seconds, the parent doesn't need to check for output thousands of times a second, people aren't going to notice it it takes 15 seconds or 15.1 seconds.

Proposed resolution

Use the PHP stream_select function to block until there is data to be read, this won't work on Windows, so as a workaround introduce the notion of 'nap time' to _drush_backend_proc_open which will mean that it won't poll the sub-processes as fast as possible, but will sensibly back off its polling when it hasn't received any updates from them in its run loop.

Basically at the start of the function, we set $nap_time to 0, then at the start of each iteration of the run loop we increase this by 1. During the course of the loop, if there is any output from any of the children, or if one of them finishes etc. then we reset $nap_time back to 0. At the end of the run loop we sleep for a period of time determined by the $nap_time variable. Slightly arbitrarily, I've suggested we sleep for increments of half a millisecond, up to a maximum of 1/10 of a second. So, basically, if you have lots of child processes that exit really quickly, then it should still be responsive, but if you have a process that doesn't really do much for a long time, it should get to polling it every 0.1 seconds relatively quickly. On Unix like systems it should basically poll the streams as often as it needs to automatically, by using stream_select.

Testing of the proposed fix in #7

So, using the same tests as above, here's how the timings look:

$ time drush drush-process-test-sleep 15        
drush drush-process-test-sleep 15  0.17s user 0.04s system 1% cpu 15.295 total

And now the wrapper:

$ time drush drush-process-test-sleep-wrapper 15
drush drush-process-test-sleep-wrapper 15  0.41s user 0.08s system 3% cpu 15.648 total

And the pi test:

$ time drush drush-process-test-pi 200          
drush drush-process-test-pi 200  14.66s user 0.07s system 98% cpu 14.928 total

And the wrapper of that:

$ time drush drush-process-test-pi-wrapper 200  
drush drush-process-test-pi-wrapper 200  14.65s user 0.10s system 98% cpu 15.010 total

So you can see that the wrapper hasn't added a crazy amount of overhead this time.

Remaining tasks

None.

User interface changes

In rare circumstances Drush commands may execute more slowly, but in general they should be much, much faster.

API changes

None!

Comments

steven jones’s picture

Status: Active » Needs review
StatusFileSize
new2.47 KB

Here's the patch.

steven jones’s picture

Title: Decrease the carbon footprint of drush_invoke_process » drush_invoke_process uses 100% CPU

Here's a test with make:

before:

time drush make ~/some-makefile.make --test --md5
Build hash: 304f9bd28015ffca4cb21174ca39e7d0
drush make ~/some-makefile.make --test --md5  100.42s user 21.06s system 183% cpu 1:06.21 total

after:

time drush make ~/some-makefile.make --test --md5
Build hash: 304f9bd28015ffca4cb21174ca39e7d0
drush make ~/some-makefile.make --test --md5  38.64s user 17.18s system 86% cpu 1:04.91 total
anarcat’s picture

This patch looks sound and I think it should be committed as is as a workaround for a quite faulty behavior.

However, I believe the proper way of doing this is through the select() system call, which seems to be exploded in zillions of small different functions in PHP. The one appropriate here *may* be stream_select(), but I am uncertain.

Because this may require a rearchitecture of that loop, I suggest that Steven's patch be committed as is an we keep this issue opened until we learn about using select() properly. :)

greg.1.anderson’s picture

Status: Needs review » Needs work

What we need is a mechanism to block until any one of N streams has data available on it. stream_select() is a way to block until any one of N streams has data available on it. I think we could drop in a stream_select-based solution without rearchitecting the loop at all.

I do feel bad that this was missed before Drush 5 stable came out, but I think that we should work out the best solution, and not commit a workaround. I appreciate the analysis done on this issue so far.

greg.1.anderson’s picture

I see that the documentation on stream_select says:

Note:

Windows compatibility: stream_select() used on a pipe returned from proc_open() may cause data loss under Windows 98.
Use of stream_select() on file descriptors returned by proc_open() will fail and return FALSE under Windows.

Perhaps we need to institute 'nap time' as a fallback strategy for Windows compatibility.

greg.1.anderson’s picture

Status: Needs work » Needs review
StatusFileSize
new3.71 KB

I cannot adequately test now, but here is a patch with stream_select and $nap_time working happily together. Backend tests pass, but I did no test Windows, confirm that stream_select is actually sleeping, or take any performance measurements.

steven jones’s picture

StatusFileSize
new3.46 KB

Tested this, there was a warning about not actually specifying any streams when calling stream_select, so added a test for that. Also your call to stream_select had a timeout of 200,000 seconds, not 2!
Also tested on Windows, and found that the actual call to stream_select took a long time to complete, even though it doesn't work! So although OS sniffing is really nasty, and we really should be able to feature-detect, in this case the performance suffered badly, so have added a check for not Windows.
Other than that, the tests look great, seems like stream_select does exactly what we want. I'll update the issue summary in a bit with some results.

greg.1.anderson’s picture

Thanks for fixing this up some more, and for catching my tv_sec / tv_usec gaffe. The code looks good, and I agree with the improvements related to OS-checking and error checking.

This block of code that I put in is actually unnecessary:

+        // If stream_select returns a error, then fallback to using $nap_time.
+        if ($ss_result !== FALSE) {
+          $nap_time = 0;
+        }

Any circumstance under which we would set $nap_time to zero here, we will also set it to zero when we handle the available input in the code that follows. It's not likely to hurt anything to leave it in; I guess I'm indifferent about it.

steven jones’s picture

It will be used if there's an error using steam_select, which is probably a good fallback, but yes, we could probably get rid of it.

What's stopping us getting this RTBC'd? What else needs reviewing?

greg.1.anderson’s picture

In #7 you offered to post some results.

greg.1.anderson’s picture

Issue summary: View changes

Updated issue summary.

steven jones’s picture

Ah yes! Updated the issue summary accordingly.

moshe weitzman’s picture

Assigned: Unassigned » greg.1.anderson
greg.1.anderson’s picture

Status: Needs review » Fixed

Committed; thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.