Problem
Every minute, cron runs /var/xdrago/second.sh. The script loops every 10 seconds, checks the load levels, and kills tasks if the set thresholds are breached. With the default settings, a load of 18.88 causes PHP and Drush tasks to be killed, and a load of 14.44 causes the web server to be killed (this is my and chrisc's reading of the script).
Since these values are based around a server with 4 CPU cores, the default load limits do not work well on servers with more than 4 CPUs.
According to the "UNIX-style load calculations" section of the load page at Wikipedia, load has to be read relative to the number of CPUs, so on 4 cores a load of 14.44 represents 14.44 / 4 = 3.61 per core before the web server is killed.
However, on a server with, say, 14 cores, the calculation does not work so well: 14.44 / 14 = 1.03. This means even a light load can kill PHP, Nginx services, or Drush tasks.
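The arithmetic above can be reproduced with a small shell helper. This is only an illustrative sketch: per_core_load is a hypothetical function name and is not part of second.sh. It uses awk because shell arithmetic is integer-only.

```shell
#!/bin/bash
# Hypothetical helper (not part of second.sh): divide a load average by the
# number of cores and print the result to two decimal places.
per_core_load() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

per_core_load 14.44 4    # 4-core server, prints 3.61
per_core_load 14.44 14   # 14-core server, prints 1.03
```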
Example
The BOA virtual server running www.transitionnetwork.org has 14 CPUs, and we had to establish the best values for us by a process of elimination. This also means hacking second.sh with our preferred values, something that we have to redo after every BOA update.
Background issue: https://tech.transitionnetwork.org/trac/ticket/555
FYI our current values that work for the server with 14 cores are:
CTL_ONEX_SPIDER_LOAD=2716
CTL_FIVX_SPIDER_LOAD=2716
CTL_ONEX_LOAD=10108
CTL_FIVX_LOAD=6216
CTL_ONEX_LOAD_CRIT=13216
CTL_FIVX_LOAD_CRIT=10885
Proposed solution
There are two parts/alternatives for the proposed fix:
1. Allow the hard-coded variables to be overridden by those taken from /root/.barracuda.cnf
2. And/or do some calculations based on the number of CPU cores, either during second.sh runtime or during BOA updates, to set the values in second.sh.
Allowing overrides (1, above) would mean second.sh was altered like this:
...
load_limits()
{
  # Use the overrides file if it exists, otherwise fall back to the defaults.
  if [ -e "/root/.barracuda-overrides.cnf" ] ; then
    source /root/.barracuda-overrides.cnf
  else
    CTL_ONEX_SPIDER_LOAD=388
    CTL_FIVX_SPIDER_LOAD=388
    CTL_ONEX_LOAD=1444
    CTL_FIVX_LOAD=888
    CTL_ONEX_LOAD_CRIT=1888
    CTL_FIVX_LOAD_CRIT=1555
  fi
}
...
... near bottom ...
load_limits
control
sleep 10
... rest of file ...
...
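A possible variation on the override idea, sketched under assumptions: set the defaults first and then source the overrides file, so that a file defining only some of the variables still leaves the rest at their defaults. The optional path argument is a hypothetical addition to make the function easy to try out; the file name is the one used in the example above.

```shell
#!/bin/bash
# Sketch only, not the actual second.sh code.
load_limits() {
  # Hypothetical optional argument; defaults to the path used in this issue.
  local overrides="${1:-/root/.barracuda-overrides.cnf}"

  # Default values as quoted in this issue.
  CTL_ONEX_SPIDER_LOAD=388
  CTL_FIVX_SPIDER_LOAD=388
  CTL_ONEX_LOAD=1444
  CTL_FIVX_LOAD=888
  CTL_ONEX_LOAD_CRIT=1888
  CTL_FIVX_LOAD_CRIT=1555

  # Any variable set in the overrides file replaces its default.
  if [ -r "$overrides" ]; then
    . "$overrides"
  fi
}
```

With this ordering, an overrides file containing only CTL_ONEX_LOAD=10108 would change that one threshold and leave the other five untouched.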
Doing a calculation (2) would require the CTL_*_LOAD variables to be calculated using per-core multipliers, e.g.:
CTL_ONEX_SPIDER_LOAD=_CPU_CORES * 0.9
CTL_FIVX_SPIDER_LOAD=_CPU_CORES * 0.9
CTL_ONEX_LOAD=_CPU_CORES * 3.6
CTL_FIVX_LOAD=_CPU_CORES * 2.2
CTL_ONEX_LOAD_CRIT=_CPU_CORES * 4.8
CTL_FIVX_LOAD_CRIT=_CPU_CORES * 4
... where _CPU_CORES is set using your preferred method of getting the core count.
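A minimal sketch of how the calculation could look in shell, under two assumptions: the thresholds are stored as load x 100 (1444 corresponding to the 14.44 mentioned above), so the multipliers are written as hundredths to stay within integer arithmetic; and nproc (GNU coreutils), with getconf as a fallback, is used to count cores. The function name set_load_limits_for_cores is hypothetical.

```shell
#!/bin/bash
# Sketch only: scale the thresholds by core count using integer arithmetic.
# Assumption: thresholds are load x 100, so a multiplier of 3.6 becomes 360.
set_load_limits_for_cores() {
  local cores="$1"
  CTL_ONEX_SPIDER_LOAD=$(( cores * 90 ))
  CTL_FIVX_SPIDER_LOAD=$(( cores * 90 ))
  CTL_ONEX_LOAD=$(( cores * 360 ))
  CTL_FIVX_LOAD=$(( cores * 220 ))
  CTL_ONEX_LOAD_CRIT=$(( cores * 480 ))
  CTL_FIVX_LOAD_CRIT=$(( cores * 400 ))
}

# One possible way to count cores; other methods would work equally well.
_CPU_CORES="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
set_load_limits_for_cores "$_CPU_CORES"
echo "CTL_ONEX_LOAD=$CTL_ONEX_LOAD CTL_ONEX_LOAD_CRIT=$CTL_ONEX_LOAD_CRIT"
```

For 4 cores these multipliers give values close to, but not identical to, the current defaults (for example 1440 rather than 1444), because the multipliers in the proposal are rounded.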
Comments
Comment #1
Jim Kirkpatrick commented
Comment #2
omega8cc commented
This commit should do the trick, I think: http://drupalcode.org/project/barracuda.git/commit/5c9e954
Thanks for bringing this to our attention.
Comment #3
Jim Kirkpatrick commented
Lovely, thanks!