I woke up this morning to a server down, and nginx won't restart. The nginx include .conf files are becoming corrupted: the server's IPv6 and IPv4 addresses are merged into one value, with the IPv6 letters and colons stripped out, so nginx fails configtest and doesn't restart after the latest overnight cron run (this is after I ran the fix-it-hot-upgrade script 24 hours ago).
Command line errors:
server1:~# /etc/init.d/php53-fpm restart
Gracefully shutting down php53-fpm... done
Starting php53-fpm.. done
server1:~# service nginx status
nginx is NOT running.
server1:~# service nginx status
nginx is NOT running.
server1:~# service nginx start
[....] Starting Nginx Server...:nginx: [emerg] invalid parameter "20014102122.757.19.108" in /data/disk/fast1/config/includes/nginx_modern_include.conf:124
. ok
server1:~# service nginx status
nginx is NOT running.
server1:~# service nginx restart
[FAIL] Stopping Nginx Server...: failed!
[....] Starting Nginx Server...:nginx: [emerg] invalid parameter "20014102122.757.19.108" in /data/disk/fast1/config/includes/nginx_modern_include.conf:124
. ok
server1:~#
The "20014102122.757.19.108" in the above error is the server's correct IPv6 address (but just the digits, with the letters and colons left out) run straight into the correct IPv4 address. I have changed the addresses for privacy.
This is on a dedicated Debian 7 (wheezy) server running stable BOA 2.1.3, with the fix-it-hot update script run on Nov 28.
.cnf files and logs are forthcoming. I am testing the fix.
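For anyone checking their own include files for the same corruption, here is a minimal sketch of a scan. It is not part of BOA: `check_allow_lines` is a hypothetical helper name, and the rules it applies (accept `all`, `unix:`, a dotted quad with octets up to 255, or a colon-containing hex string treated loosely as IPv6) are my assumptions about what a sane `allow` value looks like, not nginx's own parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOA): list suspicious "allow" values.
# Prints file:line:value for every allow directive whose value is not
# "all", "unix:", a dotted-quad IPv4 with octets <= 255, or a hex string
# containing ":" (accepted loosely as IPv6). A /prefix suffix is ignored.
check_allow_lines() {
  awk '
    function is_v4(v,    n, o, i) {
      if (v !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) return 0
      n = split(v, o, ".")
      for (i = 1; i <= n; i++)
        if (length(o[i]) > 3 || o[i] + 0 > 255) return 0
      return 1
    }
    $1 == "allow" {
      v = $2
      sub(/;.*$/, "", v)        # drop the trailing semicolon
      sub(/\/[0-9]+$/, "", v)   # drop an optional CIDR prefix
      if (v == "all" || v == "unix:" || is_v4(v)) next
      if (v ~ /:/ && v ~ /^[0-9A-Fa-f:.]+$/) next
      print FILENAME ":" FNR ":" v
    }
  ' "$@"
}
```

It could be run as `check_allow_lines /data/disk/fast1/config/includes/*.conf /var/aegir/config/includes/*.conf`; any line it prints is worth a look before running `nginx -t` and attempting a restart.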
| Comment | File | Size | Author |
|---|---|---|---|
| #16 | local.IP-list.txt | 72 bytes | Anonymous (not verified) |
| #13 | fast3.octopus-cnf.txt | 1.01 KB | Anonymous (not verified) |
| #13 | barracuda-cnf.txt | 1.79 KB | Anonymous (not verified) |
| #13 | octopus_log.txt | 110 bytes | Anonymous (not verified) |
| #13 | barracuda_log.txt | 1.84 KB | Anonymous (not verified) |
Comments
Comment #1
Anonymous (not verified) commented
I replaced the "IPv6-without-letters-and-colons-plus-IPv4" value with the complete IPv6 address only, in 2 places in /data/disk/o1/config/includes/nginx_modern_include.conf and in the same 2 places in /var/aegir/config/includes/nginx_modern_include.conf, and nginx now passes the config test and restarts fine.
I tried to include both the IPv4 and IPv6 addresses in the allow directive in those files, but I couldn't get the syntax right, and I'm not sure 2 allow directives are even allowed in that file. Would an array work within the outer array?
Comment #2
Anonymous (not verified) commented
I imagine that this fix must be made to all the other nginx .conf files in both the o1 and the /var/aegir locations, unless you are going to put the fix in the hot-fix script?
Comment #3
Anonymous (not verified) commented
Comment #4
Anonymous (not verified) commented
This problem MUST have been caused by the hot-fix upgrade script, since the nightly cron run was the first time an nginx restart was attempted since I ran the hot-fix update script yesterday. I have confirmed this problem on a second server: the digits ONLY of the IPv6 address run straight into the IPv4 address, and nginx can't restart. I believe the hot-fix update script I ran was dated Nov 28 or 29, even though I ran it on Nov 30. I'll manually correct the nginx_include.conf files in both locations for now, and wait to hear from you whether a new version of the hot-fix script needs to be run.
Comment #5
omega8cc commented
Please post anonymized output from commands:
$ hostname
$ hostname -i
$ hostname -I
$ cat /etc/hosts
$ ifconfig
$ ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'
$ echo $(getent ahostsv4 `hostname`) | cut -d: -f2 | awk '{ print $1}' > /tmp/local-ip
$ cat /tmp/local-ip
$ sipcalc $(cat /tmp/local-ip)
Comment #6
Anonymous (not verified) commented
Comment #7
Anonymous (not verified) commented
I have posted the anonymized output from the above commands, using 33.333.33.333 in place of the correct IPv4 address.
Comment #8
omega8cc commented
Thanks, it is very helpful. This has been fixed in the boa-fix-upgrade script (but it will not fix already broken include files), but we still need to investigate IPv6 (in)compatibility in BOA to avoid similar issues in general.
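To illustrate the general idea of an IPv4-strict check, here is a minimal sketch in POSIX sh. It is not the code used in boa-fix-upgrade: `is_ipv4` and `first_ipv4` are hypothetical names, and the approach (reject anything that is not exactly four decimal octets of 0-255, then take the first candidate that passes) is only one assumed way to keep an IPv6 address from ending up in an allow line.

```shell
#!/bin/sh
# Hypothetical sketch, not the BOA implementation.

# is_ipv4 succeeds only for a dotted quad whose four octets are 0-255.
# The body runs in a subshell so the IFS change and "set --" stay local.
is_ipv4() (
  case "$1" in
    "" | *[!0-9.]* | .* | *. | *..*) exit 1 ;;
  esac
  IFS=.
  set -- $1                 # split the address into its octets
  [ "$#" -eq 4 ] || exit 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || exit 1
  done
)

# first_ipv4 prints the first argument that passes is_ipv4 and
# returns non-zero if none of them does.
first_ipv4() {
  for candidate in "$@"; do
    if is_ipv4 "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

On a host with both address families, something like `first_ipv4 $(hostname -I)` would skip the IPv6 entries and print only a valid IPv4 address, and a merged value such as 20014102122.757.19.108 would be rejected outright.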
Comment #9
Anonymous (not verified) commented
I fixed my octopus by adding ONLY the full IPv6 IP address to the include files, and it seems to work fine. I have just seen your latest commit (Use IPv4-strict hostname and IP checks only.) and I want to make sure I am not setting myself up for trouble by putting the IPv6 IP address in those include files. Could this be a potential problem with a future boa up-stable? Do you recommend only listing the IPv4 address in those include files? Thanks for the clarification.
Comment #10
Anonymous (not verified) commented
I woke up today, Sunday, to find the server down: no sites, no ssh. nginx hadn't restarted at some point early Sunday AM (the Saturday night cron run?). Is there something special that happens once a week at this time?
I looked at the same config include files and found that the "allow" had been changed to 127.0.0.1: (around line 128 for advanced and modern_include):
at /data/disk/fast1/config/includes/nginx_modern_include.conf
###
### Allow local access to support wget method in Aegir settings
### for running sites cron.
###
location = /cron.php {
    tcp_nopush off;
    keepalive_requests 0;
    access_log off;
    allow 127.0.0.1;
    deny all;
    try_files $uri =404;
    fastcgi_pass 127.0.0.1:9090;
}
###
### Allow local access to support wget method in Aegir settings
### for running sites cron in Drupal 8.
###
location = /core/cron.php {
    tcp_nopush off;
    keepalive_requests 0;
    access_log off;
    allow 127.0.0.1;
    deny all;
    try_files $uri =404;
    fastcgi_pass 127.0.0.1:9090;
}
Is this how it is supposed to be, or should the server's IPv4 address follow allow in those two places?
I have also seen that the similarly named files in /var/aegir/config/includes/ (such as nginx_modern_include.conf) also have the localhost IP 127.0.0.1 instead of the IPv4 address. Can you confirm that this is intended, please?
Thank you for your response.
Comment #11
Anonymous (not verified) commented
Comment #12
Anonymous (not verified) commented
I have not run the fix-upgrade script again. I did run a boa up-stable to install a second octopus instance, but that was 4 or 5 days ago, and the server only went down this morning, Sunday morning.
Comment #13
Anonymous (not verified) commented
Comment #14
omega8cc commented
1. This is a support request and not a bug report.
2. BOA is not designed to work with IPv6 (yet).
3. You shouldn't ever hardcode hostname/IP pairs in /etc/hosts (remove them from this file).
4. Once you start editing config files manually, you are on your own, so don't expect any support from us.
We are trying to make BOA ignore incompatible system settings, but it *doesn't* support IPv6 at all (yet).
We can't reproduce this issue anywhere and no, there is nothing on the BOA system side which could break a working config on Sunday.
It should be safe to run barracuda and octopus upgrade to stable and then run the fix scripts.
Comment #15
omega8cc commented
Please attach the contents of your /root/.local.IP.list file.
Comment #16
Anonymous (not verified) commented
Comment #17
Anonymous (not verified) commented
You never answered my #9, and it was your bad script that got me into this mess in the first place. I have done everything you have asked of me to help you improve your script(s). Would you please answer this for me:
Should the config include files in either
a) o1/config/includes, or
b) var/aegir/config/includes
have 127.0.0.1 for the "allow" for location = /cron.php and location = /core/cron.php? If so, which files (only the ones in var/aegir, or all of them, including those in the o1/config/includes location)?
- or are all of those values (in BOTH the o1 and the var locations) supposed to be the IPv4 address? Please answer this. Thank you.
Comment #18
omega8cc commented
I think I have answered #9 in #14, and it should be pretty obvious that once you start messing with files like this, anything can happen.
You should run:
barracuda up-stable
octopus up-stable all aegir
and then:
cd;rm -f boa-fix-upgrade.sh.txt*
wget -q -U iCab http://files.aegir.cc/update/boa-fix-upgrade.sh.txt
bash boa-fix-upgrade.sh.txt
That should be it.
Don't mess with config by hand, ever.
Comment #19
omega8cc commented
We do appreciate your feedback and willingness to help us improve BOA. Just make sure to follow the hints and don't experiment with configs you don't understand, or you will create so much confusion that nobody will be able to help further. We don't want to see this happen, obviously, so we may sometimes sound a bit harsh when referencing best practices you should follow. It is nothing personal, of course.
Comment #20
Anonymous (not verified) commented
OK, I followed your instructions and there were two errors.
After "syncing provision backend db_passwd":
ERROR 1045 (28000) Access denied for user 'root'@'localhost' (using password = yes)
and after "var/aegir/.drush/provision/platform/migrate.provision.inc":
ERROR 1045 (28000) Access denied for user 'aegir_root'@'localhost' (using password = yes)
This happened during the Aegir Master Instance upgrade in barracuda up-stable, and it happened again for the 'root' and 'o1' MySQL users during octopus up-stable all aegir.
I then ran the boa-fix-upgrade.sh script, and since then I notice that in the octopus o1 instance "localhost" did not verify, and I cannot delete a few clone platforms (unable-to-connect db errors). Because I forced the MySQL rebuild during the barracuda up-stable, I am going to run it all again to see if that solves this access denied problem. I use strong passwords.
I received the exact same errors during the second up-stable. Do you recommend using the sync passwords script that I had to use once before?
Comment #21
omega8cc commented
You have a totally broken system and it is not related to the boa-fix-upgrade.sh.txt script at all. Honestly, I have no idea what happened there.
You have to reset mysql root password first to match the value in /root/.my.cnf and once mysql root access works again, reset passwords for aegir users with:
syncpass fix aegir
syncpass fix o1
You should be able to find the how-to on mysql root password reset in the queue.
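Before running syncpass, it may help to confirm that mysql root access really works with the credentials stored in /root/.my.cnf. The following is a minimal sketch, not a BOA tool: `mysql_root_ok` is a hypothetical function name and `MYSQL_BIN` is an assumed override for the client binary. It relies only on the standard `--defaults-file` and `-e` options of the mysql client.

```shell
#!/bin/sh
# Hypothetical check, not part of BOA. MYSQL_BIN is an assumed override
# for the client binary and defaults to "mysql".
# Succeeds when a trivial query runs with the credentials in the given
# option file (default: /root/.my.cnf), and fails otherwise.
mysql_root_ok() {
  cnf="${1:-/root/.my.cnf}"
  "${MYSQL_BIN:-mysql}" --defaults-file="$cnf" -e 'SELECT 1' >/dev/null 2>&1
}
```

If `mysql_root_ok` returns non-zero, the root password still does not match the option file and needs to be reset first.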
Comment #22
Anonymous (not verified) commented
I know how to deal with MySQL. This whole problem began because I didn't know that the upgrade script would not work if I added the server's IPv6 address to /etc/hosts. That was the problem, it seems. Thank you for the syncpass commands, that did the trick.
Comment #23
omega8cc commented
Yeah, we have improved the boa-fix-upgrade.sh.txt script so it now properly deals with both IPv6 and IPv4. IPv6 is still not supported in the Aegir variant shipped with BOA (and hence in BOA itself), but at least the script can now properly manage the allow IPs for /cron.php access on all systems, even if you have IPv6 active.
Comment #24
omega8cc commented
Feel free to re-open if there is any follow up on this issue.