Problem/Motivation

Files created on the remote webservers (the "spokes") can be overwritten during verify or other aegir tasks. This is because right now the hostmaster (the "hub") is considered authoritative for everything (see #1079274: Sites data get lost on migrate or clone / dont "spoke" the site for an example of that discussion).

In other words: the multiserver support is broken right now as we can have data loss in the files/ directory.

Proposed resolution

Change the sync default to exclude the 'files/' and 'private/' from sync operations. Only while deploying a whole site do we need to push these directories to the slave.

Remaining tasks

The patch has some issues, namely the use of --update for some obscure reason, which should be removed or explained.

We are also considering adding --delete to the files rsync calls to allow spoke webservers to delete files in the directory.

The core team needs to approve the API exception. ( most have agreed )

Finally, documentation needs to be updated.

TESTS

We have defined a few test steps (#56 and #57):

1. add a module on the master and check that after a verify task it's present on the slave
2. remove a module on the master and check that after a verify task it gone on the slave

3. add a file to the files folder on the slave and check that after a backup task it's also on the master and included in the backup.
4. remove a file from the files folder on the slave and check that after a backup task it's also removed from the master and no longer in the new backup.

5. Update a file on the spoke (update files via the css_injector or js_injector modules or delete a file and upload a new / changed file with the same name). Verify the site and see if changes in the file are still intact.

User interface changes

No change expected here.

API changes

A significant change is that the _provision_drupal_create_settings_file() function will *not* be syncing the site's files anymore. This seems more with the actual function name, but may break third party applications that rely on this function (even though it is prefixed with an underscore and so assumed to be a private function).

Original report by EugenMayer

For my undestanding, nearly anything in site/$site is user-generated content. I think its clear for files and e.g. the db, but it should be the same for themes, libraries and modules in there. This is just the "user sandbox" which should never get overriden by the "master" at all.

Currently we would even sync files form the master during verify, which is completly nonsense.

The attached patch does exclude themes, libraries, modules and files during verifcation, as those simply are more up-to-date on the remote then on the aegir master.
During a migrate / clone, the folders must be synced, as the master holds the current most recent data ( as they got synced to the master before a clone / migrate starts, see provision-backup). As also that "well" named method create_site_setting is used there, i added a switch to skip syncing for verify..

This also speeds up the verification process.

I consider this as a bug, as e.g. if you have a theme which has been modified by the user on the remote, those changes get lost on verify.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

EugenMayer’s picture

FileSize
1.85 KB

well the patch

EugenMayer’s picture

Status: Active » Needs review
anarcat’s picture

Status: Needs review » Needs work

I think this is wrong:

-function _provision_drupal_create_settings_file() {
+function _provision_drupal_create_settings_file($sync = TRUE) {
   $config = new provisionConfig_drupal_settings(d()->name, drush_get_context('site'));
   $config->write();
-  provision_drupal_sync_site();
+  if($sync) {
+    provision_drupal_sync_site();
+  }
 }

I think the settigns file should be sync'd anyways, no?

Further down:

+    if (d()->type === 'site') {

... I would think that this is always true and therefore not necessary, unless I'm missing something.

Otherwise this seems on par with #1079274: Sites data get lost on migrate or clone / dont "spoke" the site and should therefore go in, logically.

Anonymous’s picture

Would this patch allow people to add modules to the remote server without adding it to the aegir master server first, and that module *not* being blown away on verify as a result?

That sounds like partially the intention, and it worries me, because then if you go to Migrate a site, Aegir doesn't know about that new module so won't check that it is present on any valid target platforms when it loads the Migrate form and does the platform/package comparison stuff.

Then you could migrate it to a target platform that doesn't have that module, and it would get disabled during the task, potentially breaking the site. (this is the specific problem that made us decide to go for the spoke model instead)

I may be wrong in my understanding of the patch here, but that's my concern. It might 'solve' annoying things about the spoke model, but it could also introduce bigger problems like that.

Anonymous’s picture

It looks like it only ignores the modules/themes etc inside the site specific dir, and that dir would always be tarballed up and moved to the target platform, so maybe it's not an issue here.

anarcat’s picture

I think you are right miguel - the files *would* be ignored ...

EugenMayer: how are those files supposed to propagate back to the spoke when we migrate or restore?

Anonymous’s picture

They'd be missed because provision-backup only fetches back the files dir. It operates on the assumption that that's the only thing being changed on the passive remote server.

Furthermore, I don't know why you'd want to actually put modules/themes in the site specific dir since you then can never invoke upgrades of them, they'd get carried across to each new target platform (if they were in fact fetched back from the remote, which they aren't.)

So we either need to sync back more than just files in provision/platform/backup.provision.inc if we were to do this.

pearcec’s picture

We have custom themes per domain/themes directory. Each client wants a uniquely built theme. If we blow this away during a verify it gets lost. We don't include this at part of a platform build, we have a standard platform, we don't want a unique makefile for each site we build.

Sure we can update it on the hub and then run a verify to sync to the spoke, but we want to be able to offer clients the ability to upload and modify their own theme as well. I think anything in the domain/* directory as far as upgrades and verification is buyers beware at that point. Further I think this is what the expectation is when those directories are created.

crea’s picture

Version: » 6.x-1.1

Subscribing.
Definitely there's a grey area here. What is a site and what is the platform and how do we manage it all.
See also http://community.aegirproject.org/discuss/sites-and-platforms-missing-pi...

jmcclelland’s picture

Currently we would even sync files form the master during verify, which is completly nonsense.

I agree - it's also error prone.

However, syncing the rest of the data seems important.

I think distinguishing between content controlled by the web server (which will be owned by www-data/nobody/whatever and should be limited to the files directory) and content added via other methods is a good idea.

I would suggest that verify, when rsyncing from the master to the remote:

  • excludes the files directory
  • rsyncs everything else with either --ignore-existing or --update (and without --delete) to ensure newer files on the remote don't get clobbered.

Then - a backup would first do a verify, and then run an rsync from the remote server to the master with --delete to ensure the two directories are in sync.

jmcclelland’s picture

FileSize
2.12 KB

Here's an alternate patch. It only excludes the files directory and it passes the update argument to rsync to ensure we don't overwrite any newer files on the target.

And - I couldn't get it to work without modifying drush_provision_drupal_provision_verify_validate as well. I'm still new to provision and am not entirely sure if this will have un-desirable repercussions when migrating.

anarcat’s picture

I think you're on the right track here, and I agree with your analysis.

The problems I outlined above still remain though - how do we migrate a site between servers if we do not touch the files/ directory? Do we just admit this doesn't work and is a current limitation of the model?

I have a few comments regarding the patch itself:

+++ b/platform/provision_drupal.drush.incundefined
@@ -115,10 +115,12 @@ function provision_drupal_sync_site() {
-function _provision_drupal_create_settings_file() {
+function _provision_drupal_create_settings_file($sync = TRUE) {
   $config = new provisionConfig_drupal_settings(d()->name, drush_get_context('site'));
   $config->write();
-  provision_drupal_sync_site();
+  if($sync) {
+    provision_drupal_sync_site();
+  }

it seems to me that this function should either sync or not. that way we *know* just by reading the code what it does. my guess is that it should just not sync the "site" (especially sinc sync_site() actually syncs the *whole platform*) and instead let the user decide what and when to sync. so this would just be:

+++ b/platform/provision_drupal.drush.incundefined
@@ -115,10 +115,12 @@ function provision_drupal_sync_site() {
-function _provision_drupal_create_settings_file() {
+function _provision_drupal_create_settings_file($sync = TRUE) {
   $config = new provisionConfig_drupal_settings(d()->name, drush_get_context('site'));
   $config->write();
-  provision_drupal_sync_site();

Another example:

+++ b/platform/verify.provision.incundefined
@@ -77,7 +77,19 @@ function drush_provision_drupal_pre_provision_verify() {
-   _provision_drupal_create_settings_file();
+
+    _provision_drupal_create_settings_file(FALSE);
+    d()->service('http')->sync(d()->root, array('exclude-sites' => TRUE));
+    // we dont clone or migrate, so no need to force the user-generated content
+    // as they are always more up-to-date on the remote
+    if (d()->type === 'site') {
+      // Sync all filesystem changes to the remote server.
+      d()->service('http')->sync(d()->site_path, array(
+       'no-delete' => TRUE,
+       'exclude' => 'files/*',
+       'update' => TRUE, )
+      );
+    }

this is a good example: this should just create the settings file and then sync. it should be as simple as:

_provision_drupal_create_settings_file();
provision_drupal_sync_site();

In other words, what we want to patch here is drupal_sync_site() and all instances of create_settings_file() to make sure they sync the site when necessary.

That is, if we're ready to ditch inter server migration, which I am still a little confused about.

Powered by Dreditor.

jmcclelland’s picture

Hi anarcat - I agree with your suggestions about removing the sync arg for the _provision_drupal_create_settings_file() function.

And - I only posted a diff of half of what I think we should do because I'm still learning my way around the code :).

What do you think about applying the proposed patch for verify and create an additional patch that changes the behavior of the backup code so that it syncs, with --delete, the files directory from the target to the host (and nothing else).

In other words, from a human perspective, we say:

* the target server always has the most recent version of the files directory
* the hostmaster always has the most recent version of every other directory

Then it's simple for developers: everything that the web server creates should be placed in the files directory and that should be the only place writable by www-data.

anarcat’s picture

I like your thinking.

I'd like to hear what you think about inter-server migrations, but in the meantime, please do provide a new patch!

jmcclelland’s picture

Is there any documentation on the new "private" directory that seems to be created for D7? Perhaps that's another directory that should be treated like files?

As for inter server migration... seems like the same should apply? It does seem wasteful to copy things back to the hostmaster, and then rsync them to the same target server they originally came from, but that would make things simpler. Again... I'm still not familiar enough with the code to say.

We'll be planning a big migration in the next two weeks, so I'll be doing a lot of migration testing and will come up with a patch soon...

jamie

jmcclelland’s picture

Ug. I just ran into a new corner case that throws my original suggestion into doubt.

cron jobs are triggered from the hostmaster.

We have a mail cron job (via civicrm) that relies on a file uploaded via the web by a user and placed in the files directory.

The file doesn't exist on the hostmaster - just on the target server.

I very much prefer having cron jobs run on the hostmaster - that's where all our aegir/provision drush code is (I don't want to maintain that code on all the web servers).

However - this problem is a hard one to get around....

jasonlttl’s picture

When a site verifies, it overwrites the files directory. Aka, files/test.txt on master is copied over files/test.txt on the slave (overwriting it).

This all seems very, very, very, very, very bad. For example, when using imce, our editors delete a file and then upload a new file with the same name (which is overwritten on verify). Alternatively, we've used modules like slideshowcs which show all images within a directory (images deleted on the slave come back after verify).

Given that it's a data loss issue... why isn't it critical? Am I missing something?

jmcclelland’s picture

FileSize
5.71 KB

I agree, this is a serious problem.

I've attached a patch that I think addresses both of anarcat's concerns:

  • Preserve ability to migrate sites: With this patch, provision-verify does not sync files/* and private/* when syncing a site. However, when we migrate or restore or install, we still sync all files, including files/* and private/*. When a site migrates, it first does a full copy from the spoke to the hub (including files/* and private/*) and then does a full sync out to the new spoke (including files/* and private/*).
  • I also cleaned up the code. Now, _provision_drupal_create_settings_file does not sync files. provision_drupal_sync_site() is explicitly called everywhere that _provision_drupal_create_settings_file was called.

Feedback welcome! I've described my intentions above, but may have mis-executed them.

jamie

jmcclelland’s picture

FileSize
6.13 KB

Here's a new version of that patch (my last patch doesn't work). drush's rsync code uses a different syntax if you want rsync to exclude one path as opposed to multiple paths. Sigh. This patch is less elegant but at least functional.

jamie

jmcclelland’s picture

FileSize
6.14 KB

Ok... hopefully three's a charm... this new patch fixes another bug in my previous two.

jasonlttl’s picture

Your approach sounds good. I wish I had a remotely easy way to test it.

We ended up solving the data loss portion in our environment by hacking our Aegir instance to always fetch files as the first step of any site sync which is what omega8cc suggested a year ago.
http://drupal.org/node/976300#comment-3855348

I think your solution is a better approach since files syncs are so expensive.

crea’s picture

We could use duplicity for (incremental) backups. This way it would be much faster because only changed files would need to be backed up.

chrisschaub’s picture

Does this bug still exist?

anarcat’s picture

Status: Needs work » Needs review

I assume it does, since the proposed patch have not been merged in. Can somebody get some testing in there?

Otherwise, users that want to better deal with this can try the new "pack" module which enables the master server to simply export (through NFS or similar) /var/aegir/platforms to slave servers, removing completely the need for sync.

jmcclelland’s picture

I've been running the patched code for two months now on an aegir setup with about 50 sites.

It would be great to get feedback from others.

The pack module sounds interesting - is it available in the current stable version of provision?

anarcat’s picture

The pack module is available in 1.x, the dev branch. You can try it out by switching your aegir repository to "testing" - you should then be able to upgrade to the a package with a version number that has "+pack" in it.

This will be shipped in 1.7 and the cluster module will be removed from 2.x.

chrisschaub’s picture

I'll be testing tonight. This is a pretty major bug and might affect ctools files etc -- I've seen some panel plugin path issues and have to clear cache after migrate. I think the migrate is putting back an old version of a ctools cache file from the master which causes the plugins to not be available on the remote since the remote has a different file path after migrate etc. I'll test tonight and let you know about this patch. Just curious, what version is this patch against, 1.6?

chrisschaub’s picture

Status: Needs review » Reviewed & tested by the community

Ok, I tested the patch in #20 and it works well, thank you very much.

I tested on a master and remote server against provision 1.6:

1. Changed non files/* dir file on remote (increased bytes); master replaces file of same name on verify as it should.
2. Files in files/* dir on remote are not affected by verify.
3. Clone syncs files/* from remote to master and then to clone.
4. Migrate syncs files/* from remote to master to new migrated site.
5. Backups contain the latest synced copy of files/*

So, it seems the patch works well. I'd vote for inclusion in the next release or dev since this is a pretty serious bug.

Steven Jones’s picture

Priority: Normal » Major
chrisschaub’s picture

Any chance this will make it into 1.7? I can re-roll the patch against the latest dev if that will help?

chrisschaub’s picture

I've re-rolled the patch against the latest 6.1.x dev becuase the latest patch had some white space issues.

anarcat’s picture

Title: Dont sync $site/themes ..libraries ..files on verify of a site » Dont sync $site/files on verify of a site
Version: 6.x-1.1 » 6.x-2.x-dev
Status: Reviewed & tested by the community » Needs work

Quite frankly, I am puzzled by this patch. I barely understand what it does, or how it does it. Everytime I plunge back into it, it takes me a good 30 minutes to figure out what the heck is going on. So as a note to myself in the future or other readers, read comment #13 to really understand the general idea here. :)

One issue with the patch is that it changes where/when the files are sync'd. currently, this is done in the _provision_drupal_create_settings_file() function. This patch seems to change that to make it so that files are not sync'd when the settings file is created, but at other times. There is no description on when that is exactly, or whether or not it exhaustively covers all current invocations of _provision_drupal_create_settings_file(). (Now I understand, after reading that thread again, that it's what *i* suggested a long time ago, but i'd still like to hear if proper dilligence was done here. ;)

Besides, that is an API change: not only we add parameters to the sync function (which is usually tolerated), but we change the way the _provision_drupal_create_settings_file() function operates significantly, which could impact third party modules.

Therefore, I would like to push this only in 2.x for testing for now and release 1.7 without it. I am sorry but we have lots of other bugfixes to push out and I just don't feel familiar enough with this clusterfuck to commit this with confidence. :)

Note that 2.x nightly deb packages should be built by jenkins shortly after 1.7 is released, so you'll be able to use that code anyways. :)

Now onto a final code review before committing. :)

+++ b/platform/provision_drupal.drush.incundefined
@@ -102,11 +102,24 @@ function drush_provision_drupal_provision_install_backend() {
+function provision_drupal_sync_site( $options = array(), $exclude = NULL ) {

seems to me $exclude should be part of the $options array - no?

i find it odd that we modify a magic drush option here, shouldn't this be handled by sync() instead?

+++ b/platform/verify.provision.incundefined
@@ -11,6 +11,9 @@ function drush_provision_drupal_provision_verify_validate() {
+    $exclude_path = 'files/*' . PATH_SEPARATOR . 'private/*';
+    $options = array( 'update' => TRUE );

do not use needless temporary variables.

Otherwise it seems that this is tested and works, and those comments about code are mostly cosmetic - I think you basically nailed it. I think we should *really* get the documentation in shape otherwise the reasoning behind all this (#13 and #28) will be lost in the mists of time...

I am sorry to review and push this back only right before release, but this is crazy code that no one really understands and i don't want to blow it. :) Let's target 2.x for now and merge down into 1.x once we have testing from (e.g.) the barracuda folks. Hopefully the patch can be easily ported to 2.x...

Thanks for your persistence, we're going to make it eventually! :)

chrisschaub’s picture

Ok, but what can be done about the issue in general -- Aegir overwrites files on verify and users will lose content. Seems pretty serious and should be a blocker for any next release imo. If you upload a file on the spoke and the exact same filename exists on the hub, the hub will overwrite the spoke. This patch prevents that from happening by syncing files from the spoke to the hub first.

ADDED:

Another problem with the patch on this issue is that there are still contrib modules (backup cleanup) that have issues when the files are out of sync. It gets cleaned up after a clone or migrate, but the backup cleanup jobs fail until that happens. Probably because this patch doesn't sync on verify. I actually now am thinking that a simpler, more absolute approach that always syncs the files from spoke to hub is safer. It uses rsync so it won't be that expensive.

Here's approach that is more absolute and should work every time (it was submitted a year ago by the barracuda folks):

http://drupal.org/files/issues/d365687f879a3f346cc44d40e45332e0783b7833....

I'll test this with 1.7 once it's out and submit a new patch if it works.

anarcat’s picture

What I don't understand is why such a file would be present on the hub in the first place...

Also, yes, it does look like that other patch is simpler and more understandable. It also seems to not change the behavior and API, maybe that's what should be shipped with 1.x.

@jmmclelland, what do you think?

chrisschaub’s picture

The workflow is that after a clone, the files are synced back to the hub. Then, if you updated the file on the spoke, same filename, then the next time a verify happens, aegir will replace that file on the spoke with the previous version that was on the hub.

chrisschaub’s picture

Any movement on this? I'll create a patch for 1.7 dev based on the approach in #33 if you want? This needs to get solved because data is getting overwritten.

EugenMayer’s picture

until you did not really change a lot with 1.0-1.6 in the sync code (what i dont know), this patch does only scratch the surface.

How to frankly "unfollow" on d.o? This button seems to not do anything :)

Steven Jones’s picture

Okay, we need to absolutely get better at handling files and not being stupid like we are now.

Thoughts:

  1. Could we just skip the sync of the files directory for everything except backups?
  2. Could we sync the files back every time we sync back out to the spoke.
  3. We should sync the private files in addition to the public files directories.
chrisschaub’s picture

#1 and #3 agreed.

#2 is the trick. "Files" should never be pushed from the hub to an existing spoke -- the hub could always be out of date due to user updates. The only case where a hub should be pushing "files" is in the case of a clone or migrate to a new spoke site. But never overwriting the files of an existing spoke. This is what's broken. Just update a file with the same name on the spoke and the hub will overwrite it on the next verify.

gmania’s picture

The files directory clearly needs to be synced in the case of a clone or a migrate, since files might be moving to another server, and in the case of a clone, there might be default or dummy files which need to be included for the initial version of the site to work correctly.

As I see it, the main issue is that the sync is blowing away locally changed files.

There's actually a MUCH EASIER fix to all this - use the "update" option for rsync, which will prevent rsync from overwriting any files on the spoke which are more recent than those on the hub.

1 line patch against 1.8 attached. Tested with the particular bug we were having (update to theme logo) and it does the trick, but should be tested more widely to make sure there aren't other unintended consequences.

This still doesn't fix the issue with drush cron running against an out of date version of the files directory, but I believe that running the cron on the individual server should get around this (either by a cron-job specific to the server, or via a web trigger of the cron)

chrisschaub’s picture

Oooh, that's a nice patch! This looks promising.

gandhiano’s picture

I applied the patch and it seems not to be overwritting files on the spoke. However, I don't think this quite addresses the issue of having a synced copy of the spoke on the hub (which a previous patch attempted to do). At least I'm not getting my hub sites/ folders synced.

EugenMayer’s picture

Well, and now, delete a file on the satellite....oh..it's getting restored all the the time...It's probably a kind of hotfix, but for sure, the real solution for this code base will look very complicated

chrisschaub’s picture

I'm still thinking the approache in #33

http://drupal.org/files/issues/d365687f879a3f346cc44d40e45332e0783b7833....

is the way to go since it's pretty blunt and should work every time. Especially if you rsync from spoke first and then rsync to remote. I think it gets complicated because of acl issues between hub and spokes.

gandhiano’s picture

I applied the same changes to 1.8 and made a new patch, similar to #33, but which also includes sites/themes in this spoke-hub sync model (to address #1561102: Allow spokes to change their themes and modules, allowing user themes on spokes)

Everything seems to work perfectly, although this is obviously a change in the way that provision deals with hub-spokes until now. If this model is followed (for 2.x?) then it should be useful to also include sites/modules folder, allowing users to directly upload/enable modules (in any case #762138: Design security issue with developer access to sites' modules and themes should be addressed)

gandhiano’s picture

Patch mentioned on #45 (and probably the one from #33, on which it was based) is breaking the installation process of new sites.

Comitted a new patch on #4 #1561102: Allow spokes to change their themes and modules, which also enabled the option update on rsync (patch on #40)

chrisschaub’s picture

i'd suggest we close this issue and move the discussion over to 1561102 then? Since the patch there fixes this issue and more.

anarcat’s picture

Some notes after talking in person with jmcclelland.

1. files/ and private/ should be authoritative on the spoke, in MOST cases: migrate and clone are exceptions where the files need to be synced back before being sent out again
2. themes, libraries and modules are NOT authoritative on the spoke and should be modified on the hostmaster (see #1561102: Allow spokes to change their themes and modules and #1079274: Sites data get lost on migrate or clone / dont "spoke" the site for a followup on that)
3. to #1, we need to change the way files are synced in some tasks, which is going to break the API

With regards to my previous comments on the patch:

* the exclude-path override is the only way to pass multiple --exclude arguments down the throat of the drush stack, so that can stay
* the --update option should go away, I think. it was unclear what it was for and never defined. if it actually is necessary, a comment should be added. For a fun conversation about this, see #1068126: make rsync transfer only changed files

I think we should break the API, in this exceptional case, because without this fix Aegir actually can delete user data, which we want to avoid at all costs. And since multi-server support was a stated (and realized!) goal of the 1.x branch, I think we should allow ourselves this exception to the API rules to fix a major issue.

In addition to the above, we should even consider using --delete in rsync, as right now there is no way to delete a file from a remote spoke if it was there during the last backup/migrate.

We disagree with the positions stated in #1561102: Allow spokes to change their themes and modules about allowing spokes to be authoritative for themes/ and modules/. This is an even more major change and we are not ready to perform this in a minor point release, especially if it doesn't break a critical issue. The assumption right now is that the hostmaster is authoritative in general. With the issue here, we add an exception for the files that are created by the webserver, but we do not want to push this further. Besides, this would be inconsistent with the way we handle platforms now which is that the hub is authoritative for platforms, not the spoke.

I will try to post an issue summary here, in the meantime, I am waiting for jmcclelland to post a new patch that will (basically) remove --update and that will be tested in his production environment. This patch will then be applied onto the 2.x tree for further testing and hopefully migrated into 1.x once the core team has validated the API exception.

anarcat’s picture

Title: Dont sync $site/files on verify of a site » Make the spokes authoritative for files/ and private/ directories
Version: 6.x-2.x-dev » 6.x-1.x-dev

I wrote an issue summary and changed the bug title, hopefully things will be clearer in the future.

anarcat’s picture

Issue summary: View changes

write a first issue summary which should help tremendously in the future reviews of this

jmcclelland’s picture

FileSize
6.14 KB

Attached is a patch that should apply cleanly to the current 6.x-1.x branch that includes changing update => true to delete => true consistent with anarcat's summary.

jmcclelland’s picture

Status: Needs work » Needs review
Anonymous’s picture

Just a quick +1 for the API change in 1.x. File loss should be avoided by whatever means necessary.

Will the pack module be affected in any way? The way it 'syncs' files (or not) is not too clear to me.

anarcat’s picture

I don't think pack will be affected, it doesn't actually sync any site files.

omega8cc’s picture

A quick +1 for the API change in 1.x from me. It is a pretty serious issue so we should allow this change.

Best,
Grace

EugenMayer’s picture

Took 16 months to finally recognize this as major... as i fixed this a long time ago (and it took me a horrible amount of work AFAIR), i really want to give you a truely well-meant advice:
Test this up and down, this will brake aegir processes in on several places for a lot of remote-setups.

helmo’s picture

I've done some basic testing in a mockup aegir master/slave setup.

Test steps:

1. add a module on the master and check that after a verify task it's present on the slave
2. remove a module on the master and check that after a verify task it gone on the slave

3. add a file to the files folder on the slave and check that after a backup task it's also on the master and included in the backup.
4. remove a file from the files folder on the slave and check that after a backup task it's also removed from the master and no longer in the new backup.

1,2,3 PASS
4 FAIL
Our fetch() method does not default to doing delete's. So this either has to be added to the fetch calls in the beginning of drush_provision_drupal_provision_backup() or in the class.

In the current 1.9 version both 2 and 4 FAIL

_vid’s picture

One more test that we've been running is:
5. Update a file on the spoke (update files via the css_injector or js_injector modules or delete a file and upload a new / changed file with the same name). Verify the site and see if changes in the file are still intact.

clemens.tolboom’s picture

@_vid : did your step fail?

With my Aegir install running on 1.9 step 5 (#58) fails

My case is with http://drupal.org/project/fontyourface having to resave the settings to get a correct _sites_/files/fontyourface/font.css

Running a verify overwrites the latest version (on slave) with the one from the master. (does the current rsync check for the timestamp?)

helmo’s picture

Test 5 fails on Aegir 1.9 and passes with the patch from #50 applied.

Apart from some whitespace issues this line drew my attention.

      $old_exclude = drush_get_option('exclude-path',$exclude);

The $exclude value is used as default for drush_get_option which later is restored with drush_set_option.

Here's an updated patch which fixes this and a few whitespace issues and adds phpDoc for the introduced parameters to provision_drupal_sync_site()

After this test 5 starts to FAIL again, so #50 depended on not restoring $old_exclude properly.
I'll look into this a bit more...

edit: typo

helmo’s picture

Status: Needs review » Needs work

provision_save_site_data() also calls provision_drupal_sync_site() but without the exclude being active.

I've gone over the usages of provision_save_site_data() and have the feeling that we could remove the sync functionality from this function.
The patch from #50 already adds a call to provision_drupal_sync_site just above those lines. So if we move it to be just below it....

Then one usage remains in provision_drupal_drush_exit() ...

EugenMayer’s picture

removing sync will not work, as it is needed by several other parts.

helmo’s picture

If we take the title of this issue: Make the spokes authoritative for files/ and private/ directories

Should we change the default for provision_drupal_sync_site() to exclude the files folder.
The tasks that need it could then add a parameter to not exclude...

theMusician’s picture

I like that idea helmo. If I understand the implications correctly that would make the spoke files/ and private/ directories be authoritative unless a provision function overrides the default to make the hub authoritative.

The only time that would need to happen is after the hub has synced with the spoke. The hub would then be up to date and a user could make a clone of that site or migrate it to a new server/platform.

theMusician’s picture

Issue summary: View changes

nitpicking

helmo’s picture

Status: Needs work » Needs review
FileSize
8.4 KB

Heres's a patch that indeed reflects the issue title.

It's dependent on the small refactoring I did in: #1812338: Refactor sync back

All 5 tests pass for me with this patch. I've added the test cases to the issue summary.

gandhiano’s picture

Status: Needs review » Needs work

Patch on #66 fails to apply properly on provision-1.9:

patching file backup.provision.inc
Hunk #1 succeeded at 77 (offset 5 lines).
Hunk #2 succeeded at 105 (offset 5 lines).
patching file delete.provision.inc
patching file deploy.provision.inc
patching file install.provision.inc
patching file migrate.provision.inc
patching file provision_drupal.drush.inc
Hunk #2 FAILED at 148.
Hunk #3 succeeded at 153 (offset -16 lines).
1 out of 3 hunks FAILED -- saving rejects to file provision_drupal.drush.inc.rej
patching file verify.provision.inc
patching file provision.context.server.inc
Hunk #1 FAILED at 210.
1 out of 1 hunk FAILED -- saving rejects to file provision.context.server.inc.rej
anarcat’s picture

It also fails to apply on 1.x.

helmo’s picture

Status: Needs work » Needs review

@gmania: You need the 6.x-1.x branch not the 1.9 release.

@anarcat: Did you have #1812338: Refactor sync back? It applies cleanly to the 6.x-1.x after that small refactoring is applied.

anarcat’s picture

Status: Needs review » Needs work

Patch still fails on 1.x and 2.x, even with the refactor.

I also question this hunk:

+++ b/platform/provision_drupal.drush.incundefined
@@ -117,12 +148,14 @@ function provision_drupal_sync_site() {
  * E.g. before a backup is made.
  */
 function provision_drupal_sync_site_back() {
+
   // synch filesystem changes back from the remote server.
   d()->service('http')->fetch(d()->site_path . '/files/');
   d()->service('http')->fetch(d()->site_path . '/private/');
-  d()->service('http')->fetch(d()->site_path . '/modules/');
-  d()->service('http')->fetch(d()->site_path . '/themes/');
-  d()->service('http')->fetch(d()->site_path . '/libraries/');
+  //d()->service('http')->fetch(d()->site_path . '/modules/');
+  //d()->service('http')->fetch(d()->site_path . '/themes/');
+  //d()->service('http')->fetch(d()->site_path . '/libraries/');
+  // Questionable... who is authoritive?
   d()->service('http')->fetch(d()->site_path . '/local.settings.php');
 }

As I mentionned earlier, and in #1561102: Allow spokes to change their themes and modules, this is a rather major change in the way modules and themes are handled.. See my comment #15 as to why this can't be refactored in 1.x. As I have said there, I would welcome a more general overhaul of this, but only within the context of rethinking the hub and spoke model.

This issue is about finally *not* destroying data uploaded by users in sites managed on spokes by Aegir, and I think this is something that should land in 1.x. Redoing the whole spoke and hub model is work for the 2.x branch, so please do reroll this patch by leaving in the regular _back() hook.

Thanks for the hard work! Nice to see this moving forward...

EugenMayer’s picture

Hey, just stumbled upon, this issue is about to have his 2nd birthday tomorrow, thats kind of mature 2 years. Happy birthday :)

helmo’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev
Status: Needs work » Needs review
FileSize
9.21 KB

Darn, I see that I had a patch here destined to be comment 70, I must have been distracted :(

I've re-rolled it against 6.x-2.x in my provision sandbox.
Now without the commented fetch lines for modules, themes and libraries.

See http://git.drupal.org/sandbox/helmo/1957834.git branch 6.x-2.x-1083366

I've recently upgraded my last 6.x-1.x to 2.x, so I'm upping the version of this issue.

helmo’s picture

Note that without this patch I'm getting an error

rsync: --no-delete: unknown option rsync error: syntax or usage error (code 1) at main.c(1443) [client=3.0.7])
Could not rsync from '/var/aegir/platforms/myplatform/sites/example.com' to 'aegir@aegirvm05.example.com:/'
/var/aegir/platforms/myplatform/sites/example.com could not be synced to remote server aegirvm05.example.com. Changes might not be available until this has been done. (error: rsync: --no-delete: unknown option rsync error: syntax or usage error (code 1) at main.c(1443) [client=3.0.7])
anarcat’s picture

are we good to go on this? it's a 2.x release blocker in our roadmap.

helmo’s picture

I've been using this live for a while now... Looking at the code I had only added one more call to provision_drupal_sync_site(). See http://drupalcode.org/sandbox/helmo/1957834.git/commitdiff/78cd403427402...

EugenMayer’s picture

helmo, good one! But, there are more of those :)

And when you stubled uppon those, provision_drupal_sync_site will needs some parametrisation.

helmo’s picture

helmo’s picture

New full patch to make testing easier.

clemens.tolboom’s picture

+++ b/Provision/Context/server.php
@@ -212,13 +212,25 @@ class Provision_Context_server extends Provision_Context {
+        if (!isset($additional_options['no-delete']) || $additional_options['no-delete'] == FALSE ) {

You should use === FALSE I guess.

helmo’s picture

@clemens.tolboom: I don't think so.

helmo’s picture

Status: Needs review » Fixed

I've committed this to 6.x-2.x (http://drupalcode.org/project/provision.git/commit/7b6dd11)

Let's test this further in alpha2

anarcat’s picture

awesome.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated proposed solution to recent work.
Add test steps

  • Commit 7b6dd11 on dev-drupal-8, 6.x-2.x, dev-ssl-ip-allocation-refactor, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x by helmo:
    Issue #1083366 by jmcclelland, helmo, EugenMayer, cschaub, gmania: Fixed...