Okay, this is a hairy one, and hard to reproduce, but I was able to do it while trying to reproduce #993944: ssl rollback failure.

I now have the following sites:

aegir@marcos:~/config/server_master/apache/vhost.d$ grep '<VirtualHost' * | grep 443
test2.orangeseeds.org:  <VirtualHost 192.168.0.3:443>
test3.orangeseeds.org:  <VirtualHost 192.168.0.3:443>
test4.orangeseeds.org:  <VirtualHost 127.0.0.2:443>
test5.orangeseeds.org:  <VirtualHost 127.0.0.3:443>

test2 and test3 were created in the same queue dispatch, and they have the same IP. That IP is listed only *once* in the server though, so that shouldn't be possible. Here are the certificates used:

test2.orangeseeds.org:    SSLCertificateFile /var/aegir/config/server_master/ssl.d/orangeseeds.org/openssl.crt
test3.orangeseeds.org:    SSLCertificateFile /var/aegir/config/server_master/ssl.d/test3.orangeseeds.org/openssl.crt
test4.orangeseeds.org:    SSLCertificateFile /var/aegir/config/server_master/ssl.d/test4.orangeseeds.org/openssl.crt
test5.orangeseeds.org:    SSLCertificateFile /var/aegir/config/server_master/ssl.d/test5.orangeseeds.org/openssl.crt
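
For what it's worth, the same check can be scripted instead of eyeballing grep output. This is only a rough sketch: the function name `duplicate_ssl_ips` and the idea of passing vhost contents in as a dict are my own assumptions for illustration, not anything that exists in Provision.

```python
import re
from collections import defaultdict

# Matches lines such as "<VirtualHost 192.168.0.3:443>" and captures the IP.
VHOST_RE = re.compile(r"<VirtualHost\s+([^\s>]+):443>")


def duplicate_ssl_ips(vhosts):
    """Return {ip: [sites]} for every IP bound on port 443 by more than one site.

    `vhosts` maps a site name to the text of its vhost file (hypothetical input
    shape, chosen to keep the sketch self-contained).
    """
    sites_by_ip = defaultdict(list)
    for site, config in vhosts.items():
        for ip in VHOST_RE.findall(config):
            sites_by_ip[ip].append(site)
    return {ip: sites for ip, sites in sites_by_ip.items() if len(sites) > 1}


if __name__ == "__main__":
    vhosts = {
        "test2.orangeseeds.org": "<VirtualHost 192.168.0.3:443>",
        "test3.orangeseeds.org": "<VirtualHost 192.168.0.3:443>",
        "test4.orangeseeds.org": "<VirtualHost 127.0.0.2:443>",
        "test5.orangeseeds.org": "<VirtualHost 127.0.0.3:443>",
    }
    print(duplicate_ssl_ips(vhosts))
```

With the four vhosts listed above, only 192.168.0.3 is reported, shared by test2 and test3.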

The frontend agrees that both sites have the same IP. To quote the issue I came from:

The other problem is that the way the IP is taken is non-atomic - we glob for the pattern (using get_certificate_ip()) and then if it's "okay", we touch a file (in get_ip_certificate(), to avoid any confusion ;). Those two functions should at least be merged so that the lock file would be created in exclusive mode (which fails if the file exists).

So this is really about making that IP allocation atomic, I believe.
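
To make the "exclusive mode" idea concrete, here is a minimal sketch of an atomic claim. It is written in Python rather than the backend's own code, and the names `claim_ip` and `allocate_ip`, as well as the one-lock-file-per-IP layout, are assumptions for illustration only. The point is that the existence check and the file creation happen in a single system call, so two parallel tasks cannot both win.

```python
import os
import tempfile


def claim_ip(lock_dir, ip, owner):
    """Try to claim `ip` for `owner`; return True only if we created the lock."""
    path = os.path.join(lock_dir, ip)
    try:
        # O_CREAT | O_EXCL makes "check and create" one atomic step:
        # the call fails if the lock file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)  # record who holds the IP
    return True


def allocate_ip(lock_dir, candidates, owner):
    """Return the first IP from `candidates` we manage to claim, or None."""
    for ip in candidates:
        if claim_ip(lock_dir, ip, owner):
            return ip
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        ips = ["192.168.0.3", "127.0.0.2"]
        print(allocate_ip(lock_dir, ips, "test2.orangeseeds.org"))
        print(allocate_ip(lock_dir, ips, "test3.orangeseeds.org"))
        print(allocate_ip(lock_dir, ips, "test4.orangeseeds.org"))
```

Compare this with the glob-then-touch sequence, where a second task can run its glob between the first task's glob and its touch, and both conclude the IP is free.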

(What's odd about all this is that I don't think we are still running the queue in parallel - are we?)

Comments

omega8cc’s picture

Side note about the task queue: I have even seen two duplicate Import tasks created after running Clone, which in turn created duplicate site nodes. It was on a VMware-based VPS and I was not able to reproduce it, yet it happened for some reason.

anarcat’s picture

Okay, this is just too hairy for me to take care of now - it would require rewriting a bunch of functions to no end, and I'm *not* going to do that *again* in the middle of the freeze.

So I think we'll go with the refactoring described in #1126640: move the SSL IP allocation to the frontend instead - unless somebody comes up with an elegant solution here.

(And we *do* process the queue in parallel. Thank you very much, race conditions!)

pearcec’s picture

I think I ran into this problem, but I am not 100% sure. I couldn't find any decent documentation that explains how to associate an IP address with a certificate. Does anyone have a link, or could someone describe how to do it? Or point me at a file with the code that I could read through?

anarcat’s picture

This is done in the backend. In the original post, I link to two functions that do most of that work. See also the comments in #993944: ssl rollback failure.

blueprint’s picture

I'm for #2 and am in fact working on it. We (my company, I mean) need to be able to assign the IP when creating the site, and as such are moving the IP address assignment to the site context.

anarcat’s picture

Note that care has been taken to make sure IP allocation is atomic in #1126640: move the SSL IP allocation to the frontend.

ergonlogic’s picture

Status: Active » Fixed

In Provision commits f0980e0..9b86038, we fixed a number of SSL bugs. Among them was the deletion of SSL dirs, and incorrect IP addresses being pulled into vhost config files, resulting in the symptoms described above. There was also an IP allocation issue in the front end, documented in: #2023621: Unable to allocate IP address for certificate, disabling SSL.

Please test against the latest 6.x-2.x, and re-open if the problems remain.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.