This happened to us twice in the last 2 days with two d6 sites on different servers. No other errors and no visible reason why that happens with just vanilla d6 platform/site. Each of those sites has about 350 modules, but it doesn't seem to be an issue, since raising PHP limits to very high values doesn't help.
We tried it also with latest drush head, just to make sure it is not drush related, but it didn't made any difference.
I even tried to create a dummy site in the imported platform and then just overwrite its database to avoid importing it in Aegir but then it failed on verify with the same error, which is strange to me, because when running the import command from command line, it failed with this error on provision-save while provision-verify worked without any error.
I have found the same error in this log: http://pastebin.com/sexe8vXU
Comments
Comment #1
Anonymous (not verified) commentedIf you've time to debug, start sticking some dlm() into provision.context.inc
In the function services_invoke(), stick a 'dlm($services);' after $services = $this->get_services();
I guess this is returning a string or a null array or something like that.
Normally these are just a warning - in your pastebin, lines 73/74 seem more likely to be causing that error I would've thought
Comment #2
omega8cc commentedThanks.
It is, indeed, an empty array http://gist.github.com/618148, but why?
Comment #3
omega8cc commentedI'm now looking for similarities between those 2 sites, as I suspect it can be related to the already known single namespace nature of Drupal. Let's see what it can be.
Comment #4
omega8cc commentedNow that is even more strange. I tried to import the site using Aegir alpha14 instead and it fails with app error still, but this time there is nothing about "Undefined variable: results provision.context.inc:361".
[EDIT] alpha14 doesn't have provision.context.* files, so this is why it doesn't display that error, however the import fails with final "An error occurred at function : drush_provision_drupal_post_provision_import" and no other debug info is available.
Comment #5
omega8cc commented[EDIT] Both site are using in fact only 115-125 contrib modules, so Aegir reports false data about over 350 modules/themes used (probably using system table?), like:
Not sure if that has any impact here.
Comment #6
omega8cc commentedAnd now to make it even more complicated: on the same Aegir instance I was able to import without any errors another d6 site for the same client, but its import/verify tasks also have this strange "Undefined variable: results provision.context.inc:361" notice, however it doesn't break anything!
And let's make it even more complicated: the site which failed to import/verify in the front-end, after I tried manually verify it on the command line, generated also full-size drushrc.php file. The web server config was already created even after failed import (!)
Effectively Aegir created all required files, and the site is available online, and works without any issues!
The only issue is - it fails to verify in the front-end for some unknown reason so we can't manage it in the front-end.
I'm pretty sure it is a bug in Aegir, and not something related to the particular site, since I have seen exactly the same issue two times with completely different imports and also I have found similar log report posted/linked above.
Comment #7
omega8cc commentedFinally I decided to comment out the internal verify task
in the import, to pass the import step and start debugging
that failing verify task.
Then I started removing modules one by one and running verify
after each module was removed/restored.
The first site where I have seen this problem now is verified
without any errors after removing admin_menu 6.x-3.0-alpha4.
No idea why that helped, but after "results" with the second
site (details below) I'm not sure if restoring admin_menu
and then removing it again would cause another failed verify.
Now the second site - I started removing modules and finally
got the first succesfull verify after removing many of them:
The list of not-removed modules:
#
The list of removed modules
#
Next steps shows clearly that something is seriously wrong with Aegir now:
It looks completely random.
Hostmaster doesn't use any caching.
Restarting database server doesn't help.
What helps at some step (removing some random module)
fails after the module is restored, and still fails
after the module is removed again!
Never seen anything like that before.
Any ideas?
Comment #8
Anonymous (not verified) commentedI saw Adrian commit a fix for the 'results' issue, see
http://git.aegirproject.org/?p=provision.git;a=commitdiff;h=9dcf8b888228...
but i'm no longer sure if that's even where your problem is or whether it's circumstantial.
It could be a problem with our package logic if the success/fail is manipulated by enabling/disabling of modules.
Comment #9
omega8cc commentedThis notice *seems* to be irrelevant, as I mentioned in this comment: http://drupal.org/node/936390#comment-3555146 everything worked fine with another site import/verify on the same Aegir instance, and this notice (and/or the reasons of this notice) didn't have any impact. This obviously means we have hit something different.
I had a hope that removing modules will reveal the module causing this (because of single name space etc.) but was a bit shocked how it "works" and Aegir can't handle this situation.
This is why I believe the source of this issue is hidden somewhere inside the Aegir logic. The problem is, I can't find a way to make Aegir more verbose at this point. We know where it fails, but we probably need an enhanced debug mode there to reveal that hidden reason.
Comment #10
ergonlogicClosing due to lack of activity. Feel free to re-open if anyone sees this behaviour recur. Better yet, start a new issue referencing this one.