Postponed (maintainer needs more info)
Project: Provision
Version: 7.x-3.x-dev
Component: Code
Priority: Normal
Category: Bug report
Assigned: Unassigned
Issue tags:
Reporter:
Created: 11 Feb 2011 at 13:24 UTC
Updated: 9 Jul 2013 at 15:38 UTC
Comments
Comment #1
eugenmayer commented: Side note: the database is removed correctly.
In the clone log, "Changes for drush_http_post_provision_deploy module have been rolled back." is logged, and all of those deletes are shown in green. However, the files and configs are still there.
Comment #2
eugenmayer commented: RC1:
Easy to reproduce:
- add a drush_set_error somewhere in platform/migrate.provision.inc
- run migrate
Migrating will fail, unsurprisingly. The odd thing is that the newly created, no longer needed DB is not removed by the rollback methods.
Still to check: I am pretty sure the "new" site folder is not deleted on the Aegir master either, but I have not verified this yet.
Marking this low priority, as it's not mission-critical: you just end up with ghost DBs, but everything else stays functional.
Comment #3
memtkmcc commented: It is even worse, because not only is a zombie database left behind, but also a dbuser with a random password. Any attempt to create or migrate the site again will then fail, because you end up with two grants for the same dbuser with two different passwords. I have seen that many times already.
Changing to major, as it is a pretty serious bug related to not-clean rollback procedure.
Comment #4
anarcat commented: If I understand this correctly, rollback is not working, but only on multi-server setups; clarifying the title.
Note that #1057736: [SSL] Cloning SSL sites will lead to a non SSL site .. but certificate is still copied should be fixed in rc1, so I'd be curious to see why you still have a failure in the migrate. Can you post the full debug log?
Comment #5
eugenmayer commented: I have no failure there; I provoke it to reproduce the bug (that's why I add drush_set_error). But there are other reasons why this can happen, and then you need a proper rollback. So #1057736: [SSL] Cloning SSL sites will lead to a non SSL site .. but certificate is still copied works fine.
Comment #6
anarcat commented: Note that this may be a regression introduced while fixing #976300: web server migration results in content removal when site is verified...
Comment #7
anarcat commented: So, I understand there's an issue here, but I'd like to reproduce it. Please state exactly where you added the drush_set_error() call to trigger the bug: file name and line number, from our current HEAD or RC1.
From what I understand, the underlying bug that was triggering this one was fixed, so this is less of a priority now. Downgrading priority to normal, unless we can find a case where this bug is triggered by forces outside of provision that we can't fix another way.
Comment #8
eugenmayer commented: Add it after http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p...
I am not sure lowering the priority is the right call, but maybe just try to reproduce it and make up your mind about the issue. I think it's still critical.
I am not sure about #976300: web server migration results in content removal when site is verified, but I know the data flow pretty well:
1. The pre-migrate step backs up the database and files; the tgz lands in the backup: http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p...
2. That backup gets deployed in http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p...
a) This is basically untarring the file on the Aegir master into the platform
b) syncing the files with the remote server
c) importing the DB: http://git.aegirproject.org/?p=provision.git;a=blob;f=db/deploy.provisio...
Step 2.c is our issue: db/deploy.migrate.inc has a rollback (http://git.aegirproject.org/?p=provision.git;a=blob;f=db/deploy.provisio...), but rollbacks are not called recursively.
So if, for example, provision is later called in http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p... and fails, or the verify in http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p... fails (which happens often), the database is never rolled back, because the rollback in db/deploy.migrate.inc is never called.
This is something we really have to think about more deeply, as it comes down to:
If a task calls other tasks that complete, but the overall task fails and is rolled back, then all of those subtasks should be rolled back as well.
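To make the failure mode described above concrete, here is a minimal model in Python. It is not Provision's PHP code. The names `Task`, `create`, `rollback`, `run_backend` and `migrate` are hypothetical, and the strings stand in for the site directory, database and DB user. The model assumes, as the comment describes, that each backend call runs in its own process. Under that assumption, the undo actions a subtask registers disappear once it exits successfully, so only the parent's own rollback runs when a later step fails.

```python
# Hypothetical model, not Provision's API: a backend subtask's undo actions
# are lost when it exits successfully, so the parent cannot run them later.

class Task:
    """A task that records undo actions for resources it creates."""

    def __init__(self, name, state):
        self.name = name
        self.state = state      # shared "server" state: set of existing resources
        self.rollbacks = []     # undo actions registered by THIS task only

    def create(self, resource):
        self.state.add(resource)
        self.rollbacks.append(lambda: self.state.discard(resource))

    def rollback(self):
        for undo in reversed(self.rollbacks):
            undo()


def run_backend(name, state, resources):
    """Model of a backend fork: it succeeds, and its undo list dies with it."""
    sub = Task(name, state)
    for resource in resources:
        sub.create(resource)
    # The process exits; nobody keeps a reference to sub.rollbacks.


def migrate(state, fail_at_verify):
    parent = Task("migrate", state)
    parent.create("site-dir")
    run_backend("deploy", state, ["database", "db-user"])  # step 2.c succeeds
    if fail_at_verify:
        parent.rollback()  # only the parent's own undo actions run
        return False
    return True


server = set()
migrate(server, fail_at_verify=True)
print(sorted(server))  # the DB and DB user are left behind
```

The leftover `database` and `db-user` correspond to the zombie database and dbuser described in comment #3.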
Comment #9
anarcat commented: That is correct.
So I guess in this case, it would simply be making sure that provision-verify provokes a rollback if it fails.
I suspect this is the case right now.
Can you please provide a clear patch of what you did to trigger the (non-)rollback? Just adding "drush_set_error()" is not enough to raise an error, I believe.
Please provide a task backlog too to help in debugging.
Comment #10
eugenmayer commented: Just to give this one more push and some more information:
If the master process (clone/migrate) fails after triggering four backend forks that all succeeded, and it decides to roll back, only its own drush rollback hook (clone_rollback) is called, but NOT the rollback hooks of those backend forks:
- deploy rollback (db, ...)
because they succeeded. So what we actually need is to pass the rollback down to all subtasks in the call-stack tree.
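The proposal above, passing the rollback down the call-stack tree, can be sketched as follows. This is again a hypothetical Python model, not Provision's API. Each `Task` registers itself with its parent when it starts. On failure, the parent undoes its own steps and recursively rolls back every subtask, in reverse order. A subtask that has already rolled itself back has an empty step list, so rolling it back a second time does nothing.

```python
# Hypothetical model, not Provision's API: a task tree where a failure in the
# parent also rolls back every subtask that already completed.

class Task:
    def __init__(self, name, state, parent=None):
        self.name = name
        self.state = state      # shared "server" state: set of existing resources
        self.steps = []         # undo actions and child tasks, in execution order
        if parent is not None:
            parent.steps.append(self)  # register with the parent when started

    def create(self, resource):
        self.state.add(resource)
        self.steps.append(lambda: self.state.discard(resource))

    def rollback(self):
        # Undo in reverse order; child tasks are rolled back recursively.
        for step in reversed(self.steps):
            if isinstance(step, Task):
                step.rollback()
            else:
                step()
        self.steps.clear()  # makes a repeated rollback a no-op


def migrate(state, fail_at_verify):
    parent = Task("migrate", state)
    parent.create("site-dir")
    deploy = Task("deploy", state, parent=parent)  # subtask, e.g. step 2.c
    deploy.create("database")
    deploy.create("db-user")
    if fail_at_verify:
        parent.rollback()  # also undoes everything deploy created
        return False
    return True


server = set()
migrate(server, fail_at_verify=True)
print(sorted(server))  # → []
```

In Provision the subtasks run as separate drush backend processes, so an in-memory list like this would not survive them. The parent would still need some way to trigger each completed subtask's rollback after the fact. How to do that is the open design question in this issue.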
Comment #11
ergonlogic commented: Tagging.