When saving a new cluster server ("Create server"), the form rejects the submission with a database password error, even though no database (MySQL) option was selected.
Steps to reproduce:
1) enable "Web Clusters" in /admin/hosting/features
2) create new server (type cluster) at /node/add/server
3) Fill out "Create server" page:
3a) Server hostname: webcluster.blah.local
3b) IP addresses: 203.1.2.3
3c) Database: None
3d) Web: cluster
3e) Servers: web01.blah.local, web02.blah.local
4) Save
Response page:
1) error: "The specified passwords do not match."
2) database options are now shown (port, username, password) even though "None" is selected.
3) web options are now shown (port, restart command, port, restart command, ssl port), even though "cluster" is selected.
Not sure if this is related:
---
Log message
Task starts processing
Running: /usr/share/drush/drush.php provision-save '@server_webCLUSTERlocal' --backend
The command could not be executed successfully (returned: Segmentation fault , code: 139)
Running: /usr/share/drush/drush.php @server_webCLUSTERlocal provision-verify --backend
Drush bootstrap phase : _drush_bootstrap_drush()
Load alias @server_webCLUSTERlocal
Found command: provision-verify (commandfile=provision)
Initializing drush commandfile: drush_make
Initializing drush commandfile: drush_make_d_o
Initializing drush commandfile: provision
Load alias @server_master
Loading apache driver for the http service
Including /var/aegir/.drush/provision/dns/verify.provision.inc
Including /var/aegir/.drush/provision/platform/backupmigrate/verify.provision.inc
Including /var/aegir/.drush/provision/platform/verify.provision.inc
Provision configuration root path /var/aegir/config exists.
Provision configuration root ownership of /var/aegir/config has been changed to aegir.
Provision configuration root permissions of /var/aegir/config have been changed to 711.
Provision configuration root path /var/aegir/config is writable.
Provision configuration path /var/aegir/config/server_webCLUSTERlocal exists.
Provision configuration ownership of /var/aegir/config/server_webCLUSTERlocal has been changed to aegir.
Provision configuration permissions of /var/aegir/config/server_webCLUSTERlocal have been changed to 711.
Provision configuration path /var/aegir/config/server_webCLUSTERlocal is writable.
Command dispatch complete
Peak memory usage was 7.96 MB
An error occurred at function : drush_hosting_task
Command dispatch complete
Peak memory usage was 21.21 MB
---
| Comment | File | Size | Author |
|---|---|---|---|
| #32 | 1016890-provision-cluster-save-32.patch | 651 bytes | joestewart |
| #31 | 1016890-provision-cluster-save-31.patch | 518 bytes | joestewart |
| #28 | 1016890-web_servers-28.patch | 505 bytes | joestewart |
| #25 | 1016890_cluster_debug.patch | 1.38 KB | anarcat |
| #24 | 1016890_cluster_debug.patch | 1.09 KB | anarcat |
Comments
Comment #1
Anonymous (not verified) commented
I've seen this before and it's when your browser has saved your username/password and injected that into the Database user/password fields (which are hidden because you have not selected a DB service).
You get a 'passwords do not match' because it only injects it into the first password field and not the 'confirm password' field, typical browser stuff.
You can select MySQL in the form to erase the auto-injected credentials and then set it to None again (or just don't save passwords in the browser).
Sure that's not the case for you too?
Don't know about the task debug. A segmentation fault possibly needs an strace or something to see what's going on. Please confirm whether you hit the above browser issue and whether fixing that fixes the segfault.
Comment #2
dsobon commented
I can confirm that removing the "saved password" in the browser solved the problem.
Now to figure out why drush then segfaults.
Comment #3
Anonymous (not verified) commented
An issue should only be marked 'for review' if there's a patch to review. Just for filters.
The segfault should really be a separate support request, but I can't be bothered.
Comment #4
Anonymous (not verified) commented
Can you test this patch and let me know if it makes clusters work?
Comment #5
Anonymous (not verified) commented
Cafuego confirmed the fix in passing in #1047174: Reverse proxy support, which matches my experience. I'll commit this.
Comment #6
Anonymous (not verified) commented
Comment #7
omega8cc commented
This commit makes Provision work only with Drush 3.3 and breaks everything for any attempt (both Aegir install and upgrade) with Drush 5/HEAD.
Comment #8
Anonymous (not verified) commented
Ok, I'll revert.
Can we get a bit more info than 'it breaks everything' though?
Comment #9
omega8cc commented
With this commit it is simply impossible to provision or upgrade the Hostmaster site (using Drush HEAD): its Drupal root is not created. Here is an example upgrade log with debugging enabled: https://gist.github.com/812930
BTW: I discovered it while trying to reproduce my previous issue with paths not being rewritten in the files table.
Comment #10
Anonymous (not verified) commented
OK, reverted.
Someone is going to have to think of a fix that works in Drush 4. (How I wish there was never a Drush 4)
Comment #11
Anonymous (not verified) commented
I actually re-applied this after I realised what the over-arching issue was in #1047922: Drush 4.x and 5.0-dev/HEAD breaks Aegir in some (hidden) ways - paths in files not rewritten on clone/migrate/rename
The patch above had not been made against HEAD, and it blew away some changes we had already made to incorporate the new 'cli' context in Drush 4 that replaces the 'options' context.
So that's why it would've broken for you under Drush 4 on install: it would have been checking for an 'options' context that no longer exists.
I merged in this patch with the cli changes so it should work now.
Comment #12
Anonymous (not verified) commented
I am reopening this after we discovered this patch breaks the upgrade path and Migrate, and possibly other stuff too. So I have reverted this patch as it clearly needs more work.
See #1056864: Current upgrade path from 0.4-beta2 is broken for some other discussion, but I've marked it as a duplicate of this.
Comment #13
anarcat commented
I think this should be marked critical because it breaks the upgrade path, migrate and so on... Is that right? Or was this commit reverted, so the upgrade path is okay?
Comment #14
omega8cc commented
This commit has been reverted: http://git.aegirproject.org/?p=provision.git;a=commit;h=6fa9ea1f63af5be7... so now only the cluster stuff is broken again. Changing priority back to major (I hope that is correct).
Comment #15
Anonymous (not verified) commented
Yep, I reverted it since it simply needs work before it can be reapplied.
The upgrade path is ok right now as a result (just tested).
Might switch HEAD to use Drush Make 2.0 too (I hope it doesn't break anything, I'll test it as best as I can)
Comment #16
Anonymous (not verified) commented
We can't use Drush Make 2.0 yet, I've identified a bug: #1059238: No core project specified
Good thing I tested :)
Anyway, I'll stop hijacking this ticket.
Comment #17
jgabor commented
Do I understand correctly that the primary thing that's "probably broken" with clusters at the moment is that the hidden fields get auto-filled by the web browser? So if I fix that with Firebug, everything should work as intended?
Comment #18
anarcat commented
Indeed, I am not sure how a patch to provision would fix issues in the frontend. I think this should have been filed in the frontend in the first place...
mig5 - what is the provision patch supposed to fix? Should we throw it at this issue instead? #946606: provision-save core dumps on loops
Comment #19
jgabor commented
Ok, I've been trying to get clusters working for a couple of days... And even though the RC3 release notes said nothing about any work having been done on the multi-server feature, I crossed my fingers and hoped for the best. :)
But unfortunately, I ran into the same segfault as dsobon:
We're trying to get a "production-ready" environment up and running for some of our corporate sites, and I'd love to get this working. So if there's anything I can help out with (except coding), just let me know.
(Keep up the awesome work, guys!)
Comment #20
anarcat commented
It looks like you're a victim of #946606: provision-save core dumps on loops. Can you rerun with the php5-xdebug package enabled?
Please do run the command manually with --debug also so we get more information about what it's trying to save in that context...
Comment #21
anarcat commented
Joe pasted the goods here:
https://gist.github.com/58181fc3aaab74391fef
It looks like something is failing in the provision-save command:
When provision-save fails, it's usually because a referenced alias doesn't exist. Can you check if those aliases exist: @server_vs72mcnutilitycom,@server_vs73mcnutilitycom ? And try again with only one alias, or no alias at all...
Comment #22
joestewart commented
They both existed. I had the same error when removing one. Removing "--cluster_web_servers='@server_vs72mcnutilitycom,@server_vs73mcnutilitycom'" allowed the command to run. Output added as a comment to https://gist.github.com/58181fc3aaab74391fef
Comment #23
anarcat commented
Please try the attached patch to enable more debugging.
Comment #24
anarcat commented
More debugging again.
Comment #25
anarcat commented
Again.
Comment #26
joestewart commented
After applying the patch in #25
Comment #27
joestewart commented
Success though.
Adding "$this->server->cluster_web_servers = explode(',', $this->server->cluster_web_servers);" makes it work. But the check whether it is an array fails.
patch coming up.
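For readers following along, the normalization described above can be sketched as follows. This is a minimal illustration and not the attached patch: the function name `normalize_cluster_web_servers()` is hypothetical and does not exist in Provision. It only assumes that the value may arrive either as a comma-separated string of aliases (as passed via `--cluster_web_servers`) or already as an array.

```php
<?php
// Hypothetical helper, NOT part of Provision: accept a server list that is
// either a comma-separated string or an array, and always return an array.
function normalize_cluster_web_servers($value) {
  if (is_array($value)) {
    $names = $value;
  }
  elseif (is_string($value) && $value !== '') {
    // Same split as the explode() call quoted in the comment above.
    $names = explode(',', $value);
  }
  else {
    // Empty or missing value: treat as "no servers" rather than warn later.
    $names = array();
  }
  // Trim whitespace around each alias and drop empty entries.
  $names = array_map('trim', $names);
  return array_values(array_filter($names, 'strlen'));
}
```

Calling it on `'@server_vs72mcnutilitycom,@server_vs73mcnutilitycom'` yields a two-element array, and calling it on an array returns the same aliases unchanged, so the conversion is safe to run more than once.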
Comment #28
joestewart commented
Still troubleshooting, but uploading a patch that shows what was required to avoid the recursion problem. This code really shouldn't always need to be run. There are still warnings from using $this->server->cluster_web_servers when it is either empty or not an array.
UPDATE: There is more going on. Still can't get a site to install properly. This patch may just highlight a symptom.
Comment #29
Anonymous (not verified) commented
FWIW,
Unconed's original explanation of the recursion issue: http://community.aegirproject.org/node/267
The patch in #4 was his attempted fix (but it broke Aegir installs).
Comment #30
joestewart commented
FWIW this problem was first released with alpha15. It might be obvious to some, but it took me a while to catch up. This is when the context changes were first released. So it seems that it wasn't entirely some bug introduced later, but that the cluster feature wasn't completely working at that time.
Upgrades from alpha14 with the cluster already defined still work in alpha15 through beta2, and possibly later; I didn't test further.
Comment #31
joestewart commented
Patch to provision attached for review. Worked for initial testing from cluster creation to site migrations. Will test on a different install next.
Comment #32
joestewart commented
Another patch for review.
Comment #33
anarcat commented
You know, I thought of doing exactly that too. Since the cluster feature is mostly broken for everybody I hear of trying to make it work, and that patch works for you, why not just get that in?
Can you confirm this works in your tests? If so please mark RTBC.
Comment #34
joestewart commented
I tested creating a cluster, then adding a platform and a site, on another test environment. RTBC for me.
Comment #35
Anonymous (not verified) commented
I pushed this fix and tested that it didn't break anything else obvious.
I didn't test the cluster feature itself yet.
Thanks for the patch!
Comment #36
joestewart commented
Just a note for future discussion...
Further testing still hasn't revealed any errors. I do see some things that look more like missing features.
1. Clusters seem to be HTTP only. Nothing stops SSL sites from being created, but their vhost and certificate aren't set up correctly.
2. Adding a server to a cluster - platforms on the cluster are verified. But the existing sites need a verify as well to have their configs sent to the new server.
3. Removing a server from a cluster - The existing vhosts that were part of the cluster are not removed.
Comment #37
Anonymous (not verified) commented
We would probably need individual tickets for those.
1. sounds like a feature request
2. sounds like an FAQ thing: there are other cases in Aegir where auto-verifying sites after verifying platforms would mitigate issues like that, but we decided not to go down that route
3. sounds like a bug report
Comment #39
mrfelton commented
Sorry to post in this old issue, but were tickets ever opened for 1, 2 and 3 as per #36/7? I can't find anything relating to the issue of nginx SSL vhosts not being created for clusters.