I just upgraded my BOA system with barracuda up-stable which, apart from MySQL not restarting itself, seemed to go OK.

I moved on to updating my o1 instance with octopus up-stable o1, and all seemed to be going well until:

[...]
Octopus [Sat May 19 16:54:59 BST 2012] ==> UPGRADE B: Running hostmaster-migrate, please wait...
Octopus [Sat May 19 16:55:15 BST 2012] ==> UPGRADE B: Hostmaster STATUS: upgrade completed
Octopus [Sat May 19 16:55:15 BST 2012] ==> UPGRADE B: Simple check if Aegir upgrade is successful
Octopus [Sat May 19 16:55:17 BST 2012] ==> UPGRADE B: FATAL ERROR: Required file /data/disk/o1/aegir/distro/005/sites/o1.XXXXXXX/settings.php does not exist
Octopus [Sat May 19 16:55:17 BST 2012] ==> UPGRADE B: FATAL ERROR: Aborting AegirSetupB installer NOW!
Octopus [Sat May 19 16:55:17 BST 2012] ==> UPGRADE A: FATAL ERROR: AegirSetupB installer failed
Octopus [Sat May 19 16:55:17 BST 2012] ==> UPGRADE A: FATAL ERROR: Aborting AegirSetupA installer NOW!
Octopus [Sat May 19 16:55:17 BST 2012] ==> FATAL ERROR: AegirSetupA installer failed
Octopus [Sat May 19 16:55:17 BST 2012] ==> FATAL ERROR: Aborting Octopus installer NOW!
Done for /data/disk/o1
OCTOPUS upgrade completed
Bye

So in /data/disk/o1/aegir/distro/005/sites I symlinked default/settings.php to o1.XXXXXXX/settings.php (domain redacted) and ran the update again (after removing the control file).

This time it got further until...

[...]
Octopus [Sat May 19 17:14:03 BST 2012] ==> UPGRADE B: Enhancing Aegir UI, please wait...
Octopus [Sat May 19 17:14:47 BST 2012] ==> UPGRADE A: FATAL ERROR: AegirSetupB installer failed
Octopus [Sat May 19 17:14:47 BST 2012] ==> UPGRADE A: FATAL ERROR: Aborting AegirSetupA installer NOW!
Octopus [Sat May 19 17:14:47 BST 2012] ==> FATAL ERROR: AegirSetupA installer failed
Octopus [Sat May 19 17:14:47 BST 2012] ==> FATAL ERROR: Aborting Octopus installer NOW!
Done for /data/disk/o1

So I've given up here... The o1 instance seems to work, am testing it now. Terminal output log (with redacted domains) attached.

I've not done much to my instances except move a few sites around in my own custom platforms... All other server customisations have been outside octopus/barracuda...

Any thoughts as to why it did this? How can I bring it back to normal?

Comments

Priority: Normal » Major

Further to my last, cron is not running any longer for o1... going to start looking through the octopus scripts.

Status: Postponed (maintainer needs more info) » Active

Please never try tricks like symlinking the settings.php file! It will only make things worse!

Please set _DEBUG_MODE=YES in your Octopus cnf file and run the upgrade again, then attach the output as a file.
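As a rough illustration of the step above, here is a minimal sketch that sets _DEBUG_MODE=YES in a cnf file passed as an argument. The function name enable_debug_mode is hypothetical, and the /root/ location and log file name in the usage comment are assumptions; only the file name .o1.octopus.cnf and the variable itself come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: make sure _DEBUG_MODE=YES is set in an Octopus cnf file.
# The file path is an argument; nothing here is part of the BOA scripts.
enable_debug_mode() {
  cnf="$1"
  [ -f "$cnf" ] || { echo "No such file: $cnf" >&2; return 1; }
  if grep -q '^_DEBUG_MODE=' "$cnf"; then
    # Rewrite the existing line through a temp file (portable, no sed -i),
    # then copy the content back so ownership and mode stay unchanged.
    sed 's/^_DEBUG_MODE=.*/_DEBUG_MODE=YES/' "$cnf" > "$cnf.tmp" \
      && cat "$cnf.tmp" > "$cnf" \
      && rm -f "$cnf.tmp"
  else
    # No such line yet: append one.
    echo '_DEBUG_MODE=YES' >> "$cnf"
  fi
}

# Usage (paths are assumptions, adjust to your system):
#   enable_debug_mode /root/.o1.octopus.cnf
#   octopus up-stable o1 2>&1 | tee /root/octopus-o1-debug.log
```

Piping through tee keeps the output on screen while also saving it to a file that can be attached to the issue.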

The most common cause is broken permissions, for example files added or modified as root, so that Aegir fails to properly complete its work.
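One way to look for the kind of permission problem described above is to list files owned by an unexpected user. This is only a sketch: the function name list_owned_by is hypothetical, and the defaults (the /data/disk/o1 tree, owner root) are assumptions based on the paths in this thread. Some root-owned files may be legitimate, so treat the output as a list to review, not to fix blindly.

```shell
#!/bin/sh
# Hypothetical helper: list everything under a directory owned by a given user.
# Defaults are assumptions for illustration: tree /data/disk/o1, owner root.
list_owned_by() {
  dir="${1:-/data/disk/o1}"
  owner="${2:-root}"
  # -user accepts a user name or a numeric uid
  find "$dir" -user "$owner" -print 2>/dev/null
}

# Usage:
#   list_owned_by /data/disk/o1 root
```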

Priority: Major » Normal
Status: Active » Postponed (maintainer needs more info)

Please revert everything you did trying to fix this and then run the octopus upgrade again with debugging enabled.

Also, please check where your hostmaster site really is now.

If it is still in /data/disk/o1/aegir/distro/004/sites while /data/disk/o1/aegir/distro/005/sites is empty because of the failed upgrade, delete /data/disk/o1/aegir/distro/005 completely before running the upgrade again.

Again, check where your site exists and delete any newer /data/disk/o1/aegir/distro/00x directories before running the upgrade again.
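The check described above can be sketched in shell. Both function names are hypothetical. The sketch assumes that the live hostmaster site is the one with a settings.php in a sites/ subdirectory other than sites/default, and that the distro directories use three-digit names so that glob order matches numeric order. It only prints candidates and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the highest-numbered distro directory that
# contains sites/<something>/settings.php (sites/default is skipped).
current_distro() {
  base="$1"
  current=""
  for f in "$base"/[0-9][0-9][0-9]/sites/*/settings.php; do
    [ -f "$f" ] || continue
    case "$f" in */sites/default/settings.php) continue ;; esac
    # Strip "/sites/<site>/settings.php" to get the distro directory
    current="${f%/sites/*}"
  done
  [ -n "$current" ] && echo "$current"
}

# Hypothetical helper: print distro directories numbered higher than the
# current one, i.e. the leftovers of failed upgrades to review and delete.
newer_distros() {
  base="$1"
  cur=$(current_distro "$base") || return 1
  seen=0
  for d in "$base"/[0-9][0-9][0-9]; do
    [ -d "$d" ] || continue
    if [ "$seen" = 1 ]; then echo "$d"; fi
    if [ "$d" = "$cur" ]; then seen=1; fi
  done
}

# Usage:
#   current_distro /data/disk/o1/aegir/distro
#   newer_distros /data/disk/o1/aegir/distro
```

Printing instead of deleting is deliberate: the list should be checked by eye before anything is removed.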

Component: Code » Miscellaneous
Category: bug » support

Thanks for the very speedy response Grace, much appreciated! Will do the above and get back to you.

Have removed symlink... That's all the tweaks I tried.

I tried adding a custom logo in my o1 instance to see where it appeared, and it went into the distro/006 directory... That's the newest, so I'll leave all folders as is for now.

Will re-run now that _DEBUG_MODE=YES is set.

Have you read what I wrote in #5 above?

You must delete all /data/disk/o1/aegir/distro/00x directories with a number higher than the one where your site exists now.

Status: Postponed (maintainer needs more info) » Active
Attachment (new): 185.14 KB

Yes, I read #5, which is why I did what's in #8 to try to establish what directory the installation is using... There are no newer/higher folders than 006, and that's where the logo file I uploaded to my o1 Drupal instance got saved to... Hence I didn't remove any of them since it was the highest -- Or is there a better/safer way to deduce the current Aegir distro directory?

Anyway, I've set debug mode on in both .o1.octopus.cnf and .octopus.cnf and run the update again... Lots of output, but strangely the debugging appears to stop for Upgrade B. Unless there's a way to ensure debugging during AegirSetupB?

I've attached all terminal output anyway... hope it helps flag up the cause.

So it looks like the migration to 007 went OK, but the upgrade did not proceed with adding new platforms only because the control file /opt/tmp/status-AegirSetupB-FAIL still exists (probably left over from a previous failure), so it keeps aborting at a point where there is no reason to fail.

Please check whether that file really exists, delete it and any similar files matching /opt/tmp/status-*FAIL, and then run the upgrade again.
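The cleanup described above could look roughly like this. The function name clear_fail_flags and its "delete" argument are hypothetical; the directory /opt/tmp and the pattern status-*FAIL are the ones named in this comment. By default the sketch only lists matching files.

```shell
#!/bin/sh
# Hypothetical helper: list leftover control files matching status-*FAIL.
# With "delete" as the second argument, remove each file after printing it.
clear_fail_flags() {
  dir="${1:-/opt/tmp}"
  action="${2:-list}"
  for f in "$dir"/status-*FAIL; do
    # Skip the unexpanded pattern when nothing matches
    [ -e "$f" ] || continue
    echo "$f"
    if [ "$action" = "delete" ]; then rm -f -- "$f"; fi
  done
}

# Usage:
#   clear_fail_flags /opt/tmp          # list only
#   clear_fail_flags /opt/tmp delete   # list and remove
```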

Component: Miscellaneous » Code
Category: support » bug
Priority: Normal » Major
Status: Active » Fixed

This should be fixed in this commit: http://drupalcode.org/project/octopus.git/commit/8b7805d

Thanks for the report!

Re: commit in #13 -- There was a file in /opt/tmp/ called "status-Octopus[something]", so you may want to double-check there's an appropriate rm for that in your scripts, too.

So I removed the offending files and re-ran the update, and it's all good now, thank you very much.

One last question: If I'm now on aegir/distro/008, can I remove the old/lower aegir/distro/00x directories?

Thanks!

Status: Fixed » Needs work

Re-opening to ensure my point in #15 about other files in /opt/tmp was seen... and actioned or ignored!

Status: Needs work » Fixed

Have you read the patch? This *has been fixed* already in the commit linked in #13 above. Thanks!

Hello, I am experiencing the same bug, but I have a question: in the file you say not to use the HEAD version, yet the fix for the bug described here is a HEAD version of Octopus. When I run it, I get this message:
Your system has to be configured/upgraded by BARRACUDA version BOA-2.0.4-dev first

You always have to upgrade Barracuda first, before you will be able to upgrade Octopus - this is expected and by design.

Yes, I understood that, but it is a dev version, and when I open the octopus file proposed here I see the line:

_AEGIR_VERSION=HEAD

So maybe I did not follow everything, but from what I understand, I need to install a HEAD version to solve the issue?

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Status: Closed (fixed) » Active

I am getting the same UPGRADE B: FATAL ERROR: Required file /data/disk/o1/aegir/distro/005/sites/o1.XXXXXXX/settings.php does not exist when running octopus up-stable after upgrading to BOA-2.0.6 stable.

I have tried removing the 00x directories and clearing /opt/tmp/ without any luck, and the logs do not seem to show anything wrong.

It has affected two separate servers after upgrading from 2.0.5 to 2.0.6 so it does not seem to be just a local problem.

I have the same problem. I noticed that on a properly upgraded "o2" octopus instance under the same barracuda, the symlink /data/disk/o2/www.o2.server1.domain.com has ownership 114:100, while on the non-upgraded o1 the symlink /data/disk/o1/www.o1.server1.domain.com has ownership 111:100. Changing it to 114:100, removing all folders and files from the failed upgrade attempt, and trying octopus up-stable o1 again failed with the exact same error message.

YES UPGRADED:
/data/disk/o2/u/o2.server1.domain.com is 114:100 755; there is no www.o2.server1.domain.com symlink at all here.
/data/disk/o2/aegir/distro/002/sites/ has BOTH an o2.server1.domain.com folder AND a www.o2.server1.domain.com symlink, BOTH set to 114:100 775

while
NOT-UPGRADED:
/data/disk/fast1/u/o1.server1.domain.com - a corrupt symlink
/data/disk/o1/aegir/distro/005 is 111:100 (this is where the current o1 site is)
/data/disk/o1/aegir/distro/005/sites/o1.server1.domain.com is set to 111:100 (this after setting it to 114:100 775 before attempting the last upgrade)
and the symlink /data/disk/o1/aegir/distro/005/sites/www.o1.server1.domain.com is set to 111:100 (this after setting it to 114:100 775 before attempting the last upgrade)

while in the failed o1 octopus instance upgrade to /006/ I find:

/data/disk/o1/aegir/distro/006/sites/www.o1.server1.domain.com is a corrupt symlink
and there is no folder : /data/disk/o1/aegir/distro/006/sites/o1.server1.domain.com at all.
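For anyone wanting to repeat the comparison above on their own instances, here is a small sketch. Both function names are hypothetical. It shows numeric ownership for the entries of a directory and lists symlinks whose target is missing; it changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: long listing with numeric uid:gid (e.g. 111 100)
# for every entry in a directory, including dotfiles.
show_owners() {
  ls -lnA -- "$1"
}

# Hypothetical helper: list symlinks under a directory whose target
# does not exist ("corrupt" symlinks in the wording above).
broken_symlinks() {
  # test -e follows the link, so it fails for dangling symlinks
  find "$1" -type l ! -exec test -e {} \; -print
}

# Usage:
#   show_owners /data/disk/o1/aegir/distro/005/sites
#   broken_symlinks /data/disk/o1/aegir/distro/006/sites
```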

So I'm going to try to set everything under /data/disk/o1/aegir/* in my o1 instance, which failed to upgrade, to 114:100 (instead of 111:100)
and I'll add to this report.

The above did NOT help at all: the o1 folders I had changed to 114:100 have been returned to 111:100, and this failed upgrade attempt did not even create another distro/006 as it always did in the past! So this is not the right approach.

Status: Active » Closed (fixed)

Please open a proper, separate ticket with all required information and complete logs.