Very strange problem here, and unfortunately I can't test it exhaustively because a production client site is running on it.

First, here are my server files: https://gist.github.com/2258564

Here are the steps I took, once last week and again today:

  1. Backed up the server, cloned the live site to dev.site.com on the same 6.22 platform, and tested the migration from 6.22 to 6.24. Success! Works fine.
  2. Migrated the live site from 6.22 to 6.24. Green check: success?
  3. Tested the live site. I'm still logged in and everything seems to work, but opening a different browser as an anonymous user shows "site offline for maintenance"??? Other computers show this too.
  4. Started trying to log in by visiting "/user"... the page loads! Clicked Home, and the site now loads fine! (I assumed the cache had been purged?) Thought everything was OK again. In the meantime I had cleared all caches, rebooted the Linode, and tried to reset maintenance mode.
  5. Went to test on a completely different computer: STILL OFFLINE ON FIRST VISIT!
  6. Freaked out and restored the Linode from backup.

Any thoughts on what could be causing this? It looks like a server-side caching issue to me, especially since the "dev." clone seemed fine. Plus, I've had other problems before with backend pages being cached aggressively and confusingly...

Please help :( Thanks so much

Comments

omega8cc

Component: Code » Documentation

This can happen when Aegir fails to delete the old grant for the same dbuser/dbname. Sometimes, for older installs, for sites not verified in a long time, or as a result of some old failed or incomplete task, there may be duplicate grants: an old one with a different (old) password and an IP address, and a new one with the new password and the hostname. Aegir is then not aware that the old grant should be deleted, because it is for access by IP and so looks like a different grant. This causes a really weird WTF issue: the site works when the new grant (with the correct password and hostname) is used and fails, seemingly at random, when the old one is used. You may need to review the grants on your database server and carefully delete the old duplicates.
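To make "review grants and look for duplicates" concrete, here is a minimal Python sketch that groups database-level grants by (user, database) and flags any pair granted from more than one host, which is the old-IP-plus-new-hostname pattern described above. It assumes you have already exported rows of (User, Host, Db), for example from a query such as `SELECT User, Host, Db FROM mysql.db;`. The function name `find_duplicate_grants` and the IP `192.0.2.10` are hypothetical, for illustration only.

```python
from collections import defaultdict


def find_duplicate_grants(rows):
    """Return {(user, db): [hosts]} for grants that exist for more than one host.

    rows: iterable of (user, host, db) tuples, e.g. exported from mysql.db.
    """
    hosts_by_grant = defaultdict(set)
    for user, host, db in rows:
        hosts_by_grant[(user, db)].add(host)
    # Keep only (user, db) pairs reachable from several hosts: one of them
    # may be a stale grant with an old password.
    return {
        key: sorted(hosts)
        for key, hosts in hosts_by_grant.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample rows; 192.0.2.10 stands in for an old IP-based grant.
    rows = [
        ("problemsitecom", "li200-212.members.linode.com", "problemsitecom"),
        ("problemsitecom", "192.0.2.10", "problemsitecom"),
        ("othersitecom", "li200-212.members.linode.com", "othersitecom"),
    ]
    for (user, db), hosts in find_duplicate_grants(rows).items():
        print(f"{user} on {db}: {', '.join(hosts)}")
```

This only points at candidates. Each flagged user@host still needs a manual check before anything is removed, since a second host can be legitimate.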

leevh

Thank you for the feedback. I checked the grants (I think I checked them correctly) and they look OK compared to my other working sites:

Problem site (anonymized):

    Grants for problemsitecom@li200-212.members.linode.com
    GRANT USAGE ON *.* TO 'problemsitecom'@'li200-212.members.linode.com' IDENTIFIED BY PASSWORD '*5A5CBmypassword8783A0842'
    GRANT ALL PRIVILEGES ON `problemsitecom`.* TO 'problemsitecom'@'li200-212.members.linode.com'

The other working sites have the same grants. Am I looking at this correctly? The users table in Chive also looks the same.

leevh

Another error I've had while trying to clone this site to 6.24 as "test.problemsite.com" is the following:

Drush command terminated abnormally due to an unrecoverable error. Error: require_once(): Failed opening required './sites/www.problemsite.com/modules/htmlmail/htmlmail.mail.inc' (include_path='.:/usr/local/lib/php') in /data/disk/host/distro/002/pressflow-6.24.1-prod/sites/test.problemsite.com/modules/autoload/autoload.module, line 201

I've re-verified the problem site and both platforms involved... I am at my wit's end trying to upgrade this site :( I also can't tell whether this is a Drush, Aegir, or Octopus problem.

omega8cc

Project: Barracuda » Octopus

Using modules in the site space is a recipe for disaster. Avoid that, or be prepared for more issues like this one. Registry Rebuild may help, but it is better to follow good practices and use either the platform-specific space in sites/all/modules or the install-profile-specific space in profiles/name/modules.
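The Drush error above shows the clone (test.problemsite.com) still requiring a file under ./sites/www.problemsite.com/modules/, which is exactly what breaks when modules live in the site space. A minimal Python sketch for finding such modules on a platform, so they can be moved to sites/all/modules, might look like this. It assumes the usual Drupal 6 layout of `<platform>/sites/<site>/modules/<module>`. The function name `find_site_space_modules` is hypothetical, and the sketch only reports; it does not move anything or rebuild the registry.

```python
import sys
from pathlib import Path


def find_site_space_modules(platform_root):
    """Return {site_dir: [module dirs]} for modules kept in sites/<site>/modules.

    sites/all is skipped because sites/all/modules is the platform-wide space.
    """
    sites = Path(platform_root) / "sites"
    if not sites.is_dir():
        return {}
    found = {}
    for site in sorted(p for p in sites.iterdir() if p.is_dir()):
        if site.name == "all":
            continue
        modules = site / "modules"
        if modules.is_dir():
            names = sorted(m.name for m in modules.iterdir() if m.is_dir())
            if names:
                found[site.name] = names
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_site_modules.py /path/to/platform
    for site, names in find_site_space_modules(sys.argv[1]).items():
        print(f"{site}: {', '.join(names)}")
```

Run against a platform root, any site listed in the output carries modules whose paths are tied to that site's directory name and may fail in the same way after a clone or rename.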

Your original issue may be related to known issues with Speed Booster caching in previous BOA versions. We have fixed them all (I hope) in head.

You may want to purge Speed Booster cache manually: rm -f -r /var/lib/nginx/speed/*

omega8cc

Status: Active » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous

Issue summary:

added a bit