I woke up this morning to find the server down, and nginx won't restart. The nginx include.conf files are becoming corrupted: the IPv6 and IPv4 addresses are being merged, with the IPv6 letters and colons stripped out, so nginx fails the config test and doesn't restart after the latest overnight cron run (this is after I ran the fix-it-hot-upgrade script 24 hours ago).

Command line errors:

server1:~# /etc/init.d/php53-fpm restart
Gracefully shutting down php53-fpm... done
Starting php53-fpm.. done
server1:~# service nginx status
nginx is NOT running.
server1:~# service nginx status
nginx is NOT running.
server1:~# service nginx start
[....] Starting Nginx Server...:nginx: [emerg] invalid parameter "20014102122.757.19.108" in /data/disk/fast1/config/includes/nginx_modern_include.conf:124
. ok
server1:~# service nginx status
nginx is NOT running.
server1:~# service nginx restart
[FAIL] Stopping Nginx Server...: failed!
[....] Starting Nginx Server...:nginx: [emerg] invalid parameter "20014102122.757.19.108" in /data/disk/fast1/config/includes/nginx_modern_include.conf:124
. ok
server1:~#

The "20014102122.757.19.108" in the error above is the server's correct IPv6 address (only its digits; the letters and colons are left out) run together with the correct IPv4 address. I have changed the addresses for privacy.
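One plausible way to end up with a token like this (an assumption only; the thread never shows the script's actual code) is to take a command that prints every local address, such as `hostname -I`, and strip everything except digits and dots. The sketch below uses documentation addresses rather than real ones:

```shell
#!/bin/sh
# Hypothetical reproduction: NOT the actual BOA script code.
# Simulated output of a command that lists all addresses (IPv6 first, then IPv4).
addrs="2001:db8:41::2 192.0.2.10"

# Keeping only digits and dots drops the IPv6 letters and colons *and* the
# separating space, so both addresses collapse into one invalid token.
mangled=$(printf '%s' "$addrs" | tr -cd '0-9.')
echo "$mangled"   # prints: 20018412192.0.2.10
```

nginx rejects such a value in an `allow` directive as an invalid parameter, which matches the `[emerg]` error above.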

This is on a dedicated Debian 7 (Wheezy) server running stable BOA 2.1.3, with the fix-it-hot update script run on Nov 28.

.cnf files and logs are forthcoming. I am testing the fix.

Comment | File | Size | Author
#16 | local.IP-list.txt | 72 bytes | Anonymous (not verified)
#13 | fast3.octopus-cnf.txt | 1.01 KB | Anonymous (not verified)
#13 | barracuda-cnf.txt | 1.79 KB | Anonymous (not verified)
#13 | octopus_log.txt | 110 bytes | Anonymous (not verified)
#13 | barracuda_log.txt | 1.84 KB | Anonymous (not verified)
#6 | terminal-output-anon.txt | 2.69 KB | Anonymous (not verified)
#3 | barracuda-cnf.txt | 1.79 KB | Anonymous (not verified)
#3 | barracuda_log.txt | 1.64 KB | Anonymous (not verified)

Comments

Anonymous

I replaced the "IPv6-without-letters-and-colons-plus-IPv4" string with the complete IPv6 address alone, in 2 places in /data/disk/o1/config/includes/nginx_modern_include.conf and in the same 2 places in /var/aegir/config/includes/nginx_modern_include.conf. nginx now passes the config test and restarts fine.

I tried to include both the IPv4 and IPv6 addresses in the allow directive in those files, but I couldn't get the syntax right, and I'm not sure two allow directives are even permitted in that file. Would an array work within the outer array?
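For reference, nginx's access module does accept several `allow` directives in one location. Each takes a single address or CIDR range (IPv4 or IPv6), and the rules are checked in order until the first match, so no array syntax is needed. A minimal sketch that prints an abbreviated block of that shape; the `print_cron_location` helper and the addresses are hypothetical placeholders, and this says nothing about whether BOA's own scripts cope with an IPv6 entry:

```shell
#!/bin/sh
# Hypothetical helper: print an abbreviated /cron.php location block with
# one "allow" line per address given, followed by "deny all".
# Placeholder addresses only; nginx accepts IPv4, IPv6 and CIDR values here.
print_cron_location() {
  echo "location = /cron.php {"
  for addr in "$@"; do
    echo "  allow $addr;"
  done
  echo "  deny all;"
  echo "  try_files \$uri =404;"
  echo "  fastcgi_pass 127.0.0.1:9090;"
  echo "}"
}

print_cron_location 127.0.0.1 192.0.2.10 2001:db8::10
```

Because the first matching rule wins, `deny all;` has to stay after the `allow` lines.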

Anonymous

I imagine this fix must also be made to all the other nginx .conf files in both the o1 and the /var/aegir locations, unless you are going to put the fix in the hot-fix script?
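To find every include file affected, rather than checking each by hand, a read-only sketch like the one below could list `allow` lines whose argument does not look like an address. The `find_suspect_allow` helper is hypothetical, and its patterns are deliberately loose (for example, it does not reject an octet above 255):

```shell
#!/bin/sh
# Hypothetical helper: report "allow" lines whose argument does not look like
# an IPv4 address/CIDR, an IPv6 address (anything containing ':'), or "all".
# It only reports; it does not edit anything. Exit status is 0 if anything
# suspect was printed.
find_suspect_allow() {
  grep -rnE '^[[:space:]]*allow[[:space:]]+' "$@" 2>/dev/null |
    grep -vE 'allow[[:space:]]+(all|[0-9]{1,3}(\.[0-9]{1,3}){3}(/[0-9]{1,2})?|[0-9A-Fa-f.]*:[0-9A-Fa-f:.]*(/[0-9]{1,3})?)[[:space:]]*;'
}

# Example, using the paths from this thread (adjust to your instance names):
#   find_suspect_allow /data/disk/o1/config/includes /var/aegir/config/includes
```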

Anonymous

Status | File | Size
new | barracuda_log.txt | 1.64 KB
new | barracuda-cnf.txt | 1.79 KB
Anonymous

This problem MUST have been caused by the hot-fix upgrade script, since the nightly cron run was the first time an nginx restart was attempted since I ran the hot-fix update script yesterday. I have confirmed this problem on a second server: only the digits of the IPv6 address run straight into the IPv4 address, and nginx can't restart. I believe the hot-fix update script I ran was dated Nov 28 or 29, even though I ran it on Nov 30. I'll manually correct the nginx_include.conf files in both locations for now, and wait to hear from you whether a new version of the hot-fix script needs to be run.

omega8cc

Title: nginx include.conf files becoming corrupted stopping nginx » boa-fix-upgrade script is not compatible with IPv6 and breaks nginx config
Status: Active » Postponed (maintainer needs more info)

Please post anonymized output from commands:

$ hostname

$ hostname -i

$ hostname -I

$ cat /etc/hosts

$ ifconfig

$ ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'

$ echo $(getent ahostsv4 `hostname`) | cut -d: -f2 | awk '{ print $1}' > /tmp/local-ip
$ cat /tmp/local-ip
$ sipcalc $(cat /tmp/local-ip)

Anonymous

Status | File | Size
new | terminal-output-anon.txt | 2.69 KB
Anonymous

I have posted the anonymized output of the above commands; I used 33.333.33.333 in place of the real IPv4 address.

omega8cc

Version: 6.x-2.0-rc13 » 6.x-2.x-dev
Category: Bug report » Feature request
Priority: Critical » Normal
Status: Postponed (maintainer needs more info) » Active

Thanks, that is very helpful. This has been fixed in the boa-fix-upgrade script (though the fix will not repair include files that are already broken), but we still need to investigate IPv6 (in)compatibility in BOA to avoid similar issues in general.
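The actual fix is not shown in this thread. As a rough sketch of what IPv4-strict address handling can look like (an assumption, not BOA's code), the hypothetical `first_ipv4` function below takes a space-separated address list of the kind `hostname -I` prints and returns only the first token that is a valid dotted-quad IPv4 address, so IPv6 entries are skipped instead of being mangled:

```shell
#!/bin/sh
# Sketch (not BOA's code): print the first valid IPv4 address from a
# whitespace-separated list, ignoring IPv6 and anything malformed.
# Returns 1 if no IPv4 address is found.
first_ipv4() {
  for tok in $1; do
    case $tok in
      *[!0-9.]*|*..*|.*|*.) continue ;;   # only digits and single inner dots
    esac
    # Require exactly four octets, each in the range 0-255.
    old_ifs=$IFS; IFS=.
    set -- $tok
    IFS=$old_ifs
    [ $# -eq 4 ] || continue
    ok=1
    for octet in "$@"; do
      [ "$octet" -le 255 ] || { ok=0; break; }
    done
    if [ "$ok" -eq 1 ]; then
      echo "$tok"
      return 0
    fi
  done
  return 1
}

# Typical use on a real host:  first_ipv4 "$(hostname -I)"
```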

Anonymous

I fixed my Octopus instance by adding ONLY the full IPv6 address to the include files, and it seems to work fine. I have just seen your latest commit (Use IPv4-strict hostname and IP checks only.) and want to make sure I am not setting myself up for trouble by putting the IPv6 address in those include files. Could this cause problems with a future boa up-stable? Do you recommend listing only the IPv4 address in those include files? Thanks for the clarification.

Anonymous

I woke up today (Sunday) to find the server down: no sites, no SSH. nginx had failed to restart at some point early Sunday morning (during the Saturday night cron run?). Is there something special that happens once a week at this time?

I looked at the same config include files and found that the "allow" address had been changed to 127.0.0.1 (around line 128 in both the advanced and the modern include files).

In /data/disk/fast1/config/includes/nginx_modern_include.conf:

###
### Allow local access to support wget method in Aegir settings
### for running sites cron.
###
location = /cron.php {
  tcp_nopush off;
  keepalive_requests 0;
  access_log off;
  allow 127.0.0.1;
  deny all;
  try_files $uri =404;
  fastcgi_pass 127.0.0.1:9090;
}

###
### Allow local access to support wget method in Aegir settings
### for running sites cron in Drupal 8.
###
location = /core/cron.php {
  tcp_nopush off;
  keepalive_requests 0;
  access_log off;
  allow 127.0.0.1;
  deny all;
  try_files $uri =404;
  fastcgi_pass 127.0.0.1:9090;
}

Is this how it is supposed to be, or should the server's IPv4 address follow "allow" in those two places?

I have also noticed that the similarly named files in /var/aegir/config/includes/ (such as nginx_modern_include.conf) also have the localhost IP 127.0.0.1 instead of the IPv4 address. Can you confirm that this is intended, please?

Thank you for your response.

Anonymous

Category: Feature request » Bug report
Priority: Normal » Critical
Anonymous

I have not run the fix-upgrade script again. I did run boa up-stable to install a second Octopus instance, but that was 4 or 5 days ago, and the server only went down this morning (Sunday).

Anonymous

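Status | File | Size
new | barracuda_log.txt | 1.84 KB
new | octopus_log.txt | 110 bytes
new | barracuda-cnf.txt | 1.79 KB
new | fast3.octopus-cnf.txt | 1.01 KB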
omega8cc

Component: Nginx Server » Miscellaneous
Category: Bug report » Support request
Priority: Critical » Normal

1. This is a support request, not a bug report.
2. BOA is not designed to work with IPv6 (yet).
3. You should never hardcode hostname/IP pairs in /etc/hosts (remove them from this file).
4. Once you start editing config files manually, you are on your own; don't expect any support from us.
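Before removing anything from /etc/hosts as point 3 suggests, it helps to see exactly which lines pin the server's hostname, including any IPv6 entry added by hand. A minimal read-only sketch; the `hosts_entries_for` helper is hypothetical, and it only matches the exact name passed in:

```shell
#!/bin/sh
# Sketch: list non-comment lines in a hosts file that map the given name,
# so hardcoded hostname/IP pairs (IPv4 or IPv6) are easy to spot.
# It only prints; review the output and edit the file yourself.
hosts_entries_for() {
  hosts_file=$1
  name=$2
  awk -v h="$name" '
    /^[ \t]*#/ { next }                       # skip comment lines
    { for (i = 2; i <= NF; i++) if ($i == h) { print; next } }
  ' "$hosts_file"
}

# Usage on a real server:  hosts_entries_for /etc/hosts "$(hostname)"
```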

We are trying to make BOA ignore incompatible system settings, but it *doesn't* support IPv6 at all (yet).

We can't reproduce this issue anywhere, and no, there is nothing on the BOA system side that could break a working config on Sunday.

It should be safe to run barracuda and octopus upgrade to stable and then run the fix scripts.

omega8cc

Please attach the contents of your /root/.local.IP.list file.

Anonymous

Status | File | Size
new | local.IP-list.txt | 72 bytes
Anonymous

You never answered my #9, and it was your bad script that got me into this mess in the first place. I have done everything you asked to help you improve your script(s). Would you please answer this for me:

Should the config include files in either

a) o1/config/includes, or

b) var/aegir/config/includes

have 127.0.0.1 as the "allow" address for location = /cron.php and location = /core/cron.php? If so, which files (the ones in var/aegir ... only, or all of them, including those in the o1/config/includes location)?

Or are all of those values (in BOTH the o1 and the var locations) supposed to be the IPv4 address? Please answer this. Thank you.

omega8cc

I think I answered #9 in #14, and it should be pretty obvious that once you start messing with files like this, anything can happen.

You should run:

barracuda up-stable
octopus up-stable all aegir

and then:

cd;rm -f boa-fix-upgrade.sh.txt*
wget -q -U iCab http://files.aegir.cc/update/boa-fix-upgrade.sh.txt
bash boa-fix-upgrade.sh.txt

That should be it.

Don't mess with config by hand, ever.
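The failure that started this thread was a restart attempted on a config that no longer passed `nginx -t`. A small guard can avoid touching nginx in that case. This is only a sketch; `NGINX_BIN` and `NGINX_RESTART` are hypothetical override variables introduced so it can be dry-run without a real nginx:

```shell
#!/bin/sh
# Sketch: restart nginx only if its configuration test passes.
# NGINX_BIN and NGINX_RESTART are hypothetical overrides for dry runs.
safe_nginx_restart() {
  nginx_bin=${NGINX_BIN:-nginx}
  if "$nginx_bin" -t; then
    # Left unquoted on purpose so "service nginx restart" splits into words.
    ${NGINX_RESTART:-service nginx restart}
  else
    echo "nginx -t failed; not restarting" >&2
    return 1
  fi
}

# Usage on a real server:  safe_nginx_restart
```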

omega8cc

We do appreciate your feedback and willingness to help us improve BOA. Just make sure to follow the hints and don't experiment with configs you don't understand; otherwise you will create so much confusion that nobody will be able to help further. We obviously don't want that to happen, so we may sometimes sound a bit harsh when referencing the best practices you should follow. It is nothing personal, of course.

Anonymous

OK, I followed your instructions and there were two errors -
After "syncing provision backend db_passwd" :
ERROR 1045 (28000) Access denied for user 'root'@'localhost' (using password = yes), and

and after "var/aegir/.drush/provision/platform/migrate.provision.inc":
ERROR 1045 (28000) Access denied for user 'aegir_root'@'localhost' (using password = yes),

This happened during the Aegir Master Instance upgrade in barracuda up-stable, and it happened again for the 'root' and 'o1' MySQL users during octopus up-stable all aegir.

I then ran the boa-fix-upgrade.sh script, and since then I have noticed that in the Octopus o1 instance "localhost" did not verify, and I cannot delete a few clone platforms (unable to connect to the database errors). Because I forced the MySQL rebuild during barracuda up-stable, I am going to run it all again to see if that solves this access denied problem. I use strong passwords.

I received the exact same errors during the second up-stable. Do you recommend using the sync passwords script that I had to use once before?

omega8cc

You have a totally broken system, and it is not related to the boa-fix-upgrade.sh.txt script at all. Honestly, I have no idea what happened there.

You have to reset the MySQL root password first so that it matches the value in /root/.my.cnf, and once MySQL root access works again, reset the passwords for the Aegir users with:

syncpass fix aegir
syncpass fix o1

You should be able to find the how-to on mysql root password reset in the queue.
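To know which value the root password has to be reset to, you can read it from /root/.my.cnf. A minimal sketch, assuming the usual `[client]` section with a plain `password=value` line (no quoting, no inline comments); the `mycnf_password` helper is hypothetical, and it prints a secret to the terminal, so use it with care:

```shell
#!/bin/sh
# Sketch: print the password= value from the [client] section of a MySQL
# option file such as /root/.my.cnf. Assumes plain "password=value" lines.
# Only the first "=" is treated as the separator, so "=" inside the
# password is kept.
mycnf_password() {
  awk -F= '
    /^[ \t]*\[/ { in_client = ($0 ~ /^[ \t]*\[client\]/); next }
    in_client && $1 ~ /^[ \t]*password[ \t]*$/ {
      sub(/^[^=]*=[ \t]*/, "")
      print
      exit
    }
  ' "$1"
}

# Usage:  mycnf_password /root/.my.cnf
```

The mysql client reads ~/.my.cnf by default, so once the server's root password matches that value, `mysql -e 'SELECT 1'` run as root should succeed without a prompt; that is a quick check before running the syncpass commands above.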

Anonymous

I know how to deal with MySQL. This whole problem began because I didn't know that the upgrade script would not work if I added the server's IPv6 address to /etc/hosts; that was the problem, it seems. Thank you for the syncpass commands, which did the trick.

omega8cc

Yeah, we have improved the boa-fix-upgrade.sh.txt script so it now properly deals with IPv6 and IPv4, even though IPv6 is not yet supported in the Aegir variant shipped with BOA (and hence in BOA itself). At least it can now properly manage the allow IPs for /cron.php access on all systems, even if you have IPv6 active.

omega8cc

Status: Active » Closed (cannot reproduce)

Feel free to re-open if there is any follow up on this issue.