This happened to us twice in the last 2 days with two d6 sites on different servers. No other errors and no visible reason why that happens with just vanilla d6 platform/site. Each of those sites has about 350 modules, but it doesn't seem to be an issue, since raising PHP limits to very high values doesn't help.

We tried it also with latest drush head, just to make sure it is not drush related, but it didn't made any difference.

I even tried to create a dummy site in the imported platform and then just overwrite its database to avoid importing it in Aegir but then it failed on verify with the same error, which is strange to me, because when running the import command from command line, it failed with this error on provision-save while provision-verify worked without any error.

I have found the same error in this log: http://pastebin.com/sexe8vXU

Comments

Anonymous’s picture

If you've time to debug, start sticking some dlm() into provision.context.inc

In the function services_invoke(), stick a 'dlm($services);' after $services = $this->get_services();

I guess this is returning a string or a null array or something like that.

Normally these are just a warning - in your pastebin, lines 73/74 seem more likely to be causing that error I would've thought

omega8cc’s picture

Thanks.
It is, indeed, an empty array http://gist.github.com/618148, but why?

Found command: provision-save (commandfile=provision) [0.07 sec, 7.26 MB]                       [bootstrap]
Initializing drush commandfile: drush_make [0.07 sec, 7.27 MB]                                  [bootstrap]
Initializing drush commandfile: drush_make_d_o [0.07 sec, 7.27 MB]                              [bootstrap]
Initializing drush commandfile: provision [0.07 sec, 7.28 MB]                                   [bootstrap]
Load alias @self [0.07 sec, 7.28 MB]                                                               [notice]
Array                                                                                              [notice]
(
)
 [0.07 sec, 7.29 MB]
Undefined variable: results provision.context.inc:361 [0.07 sec, 7.29 MB]                          [notice]
Load alias @server_master [0.07 sec, 7.3 MB]                                                       [notice]
Array                                                                                              [notice]
(
)
 [0.07 sec, 7.32 MB]
Undefined variable: results provision.context.inc:361 [0.07 sec, 7.32 MB]                          [notice]
Loading mysql driver for the db service [0.07 sec, 7.33 MB]                                        [notice]
Loading nginx driver for the http service [0.07 sec, 7.34 MB]                                      [notice]
Array                                                                                              [notice]
(
    [db] => @server_master
    [dns] => @server_master
    [example] => @server_master
    [http] => @server_master
)
omega8cc’s picture

I'm now looking for similarities between those 2 sites, as I suspect it can be related to the already known single namespace nature of Drupal. Let's see what it can be.

omega8cc’s picture

Now that is even more strange. I tried to import the site using Aegir alpha14 instead and it fails with app error still, but this time there is nothing about "Undefined variable: results provision.context.inc:361".

[EDIT] alpha14 doesn't have provision.context.* files, so this is why it doesn't display that error, however the import fails with final "An error occurred at function : drush_provision_drupal_post_provision_import" and no other debug info is available.

omega8cc’s picture

[EDIT] Both site are using in fact only 115-125 contrib modules, so Aegir reports false data about over 350 modules/themes used (probably using system table?), like:

Found 289 modules
Found 16 themes

Not sure if that has any impact here.

omega8cc’s picture

And now to make it even more complicated: on the same Aegir instance I was able to import without any errors another d6 site for the same client, but its import/verify tasks also have this strange "Undefined variable: results provision.context.inc:361" notice, however it doesn't break anything!

And let's make it even more complicated: the site which failed to import/verify in the front-end, after I tried manually verify it on the command line, generated also full-size drushrc.php file. The web server config was already created even after failed import (!)

Effectively Aegir created all required files, and the site is available online, and works without any issues!

The only issue is - it fails to verify in the front-end for some unknown reason so we can't manage it in the front-end.

I'm pretty sure it is a bug in Aegir, and not something related to the particular site, since I have seen exactly the same issue two times with completely different imports and also I have found similar log report posted/linked above.

omega8cc’s picture

Finally I decided to comment out the internal verify task
in the import, to pass the import step and start debugging
that failing verify task.

Then I started removing modules one by one and running verify
after each module was removed/restored.

The first site where I have seen this problem now is verified
without any errors after removing admin_menu 6.x-3.0-alpha4.

No idea why that helped, but after "results" with the second
site (details below) I'm not sure if restoring admin_menu
and then removing it again would cause another failed verify.

Now the second site - I started removing modules and finally
got the first succesfull verify after removing many of them:

The list of not-removed modules:
#

cck               feedapi_mapper    imagecache_profiles  messaging      parser_ical    skinr               viewfield
ctools            filefield         imagefield           mobify         pathauto       smsframework        views
date              fivestar          image_resize_filter  mollom         path_redirect  syndication         views_accordion
ddblock           follow            insert               nd             private        taxonomy_access     views_bonus
devel             geo               invite               nd_contrib     privatemsg     taxonomy_title      views_slideshow
devel_themer      globalredirect    jplayer              nicemap        profile_role   thickbox            views_slideshow_ddblock
drupalforfirebug  gmap              jquery_ui            nice_menus     reroute_email  token               votingapi
ds                gmap_addons       jquery_update        nodewords      rules          transliteration     webform
ed_readmore       google_analytics  lightbox2            notifications  scheduler      twitter             webform_validation
email             googlenews        link                 oauth          schema         ubercart            wysiwyg
embed_quicktime   heartbeat         location             og             seo_checklist  ud                  xmlsitemap
emfield           htmlpurifier      menu_attributes      og_reg_codes   services       urllist
faq               imageapi          menu_block           page_title     site_map       user_relationships
feedapi           imagecache        menutrails           panels         site_verify    video_filter

The list of removed modules
#

admin_menu     advanced_profile  auto_nodetitle  calendar  chatroom           content_access   db_maintenance
adminrole      author_pane       backup_migrate  captcha   checkbox_validate  content_profile
advanced_help  autoassignrole    boost           cd        contact            custom_pagers

Next steps shows clearly that something is seriously wrong with Aegir now:

1. verify OK   after removing custom_pagers (why?)
2. verify FAIL after restoring admin_menu (?)
3. verify FAIL after removing admin_menu (!!!)
4. verify OK   after removing devel_themer, devel (why?)
5. verify FAIL after restoring adminrole (??)
6. verify OK   after removing heartbeat (why?)
7. verify FAIL after restoring heartbeat (oh!)
8. verify FAIL after removing heartbeat (!!!)
9. verify OK   after removing google_analytics (oh, again? why?)

It looks completely random.

Hostmaster doesn't use any caching.
Restarting database server doesn't help.

What helps at some step (removing some random module)
fails after the module is restored, and still fails
after the module is removed again!

Never seen anything like that before.

Any ideas?

Anonymous’s picture

I saw Adrian commit a fix for the 'results' issue, see

http://git.aegirproject.org/?p=provision.git;a=commitdiff;h=9dcf8b888228...

but i'm no longer sure if that's even where your problem is or whether it's circumstantial.

It could be a problem with our package logic if the success/fail is manipulated by enabling/disabling of modules.

omega8cc’s picture

This notice *seems* to be irrelevant, as I mentioned in this comment: http://drupal.org/node/936390#comment-3555146 everything worked fine with another site import/verify on the same Aegir instance, and this notice (and/or the reasons of this notice) didn't have any impact. This obviously means we have hit something different.

I had a hope that removing modules will reveal the module causing this (because of single name space etc.) but was a bit shocked how it "works" and Aegir can't handle this situation.

This is why I believe the source of this issue is hidden somewhere inside the Aegir logic. The problem is, I can't find a way to make Aegir more verbose at this point. We know where it fails, but we probably need an enhanced debug mode there to reveal that hidden reason.

ergonlogic’s picture

Version: » 6.x-1.x-dev
Status: Active » Closed (won't fix)

Closing due to lack of activity. Feel free to re-open if anyone sees this behaviour recur. Better yet, start a new issue referencing this one.