The attached patch has some fixes for the archive-dump command. I made the patch against the version in the 4.x branch, but Mark S tells me the file is the same as in the 5.x branch.

Comments

msonnabaum’s picture

Status: Active » Reviewed & tested by the community

Looks pretty sensible to me.

bjaspan’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new5.32 KB

New patch.

I discovered a really surprising (to me) behavior of Linux tar. The following tar file cannot be extracted on Linux by a non-root user:

# tar tvf test.tar 
dr-xr-xr-x bjaspan/staff     0 2011-05-09 20:37 a/
-rw-r--r-- bjaspan/staff     0 2011-05-09 21:00 dummy
dr-xr-xr-x bjaspan/staff     0 2011-05-09 20:37 a/b/
-r--r--r-- bjaspan/staff     0 2011-05-09 20:37 a/b/c

The problem is that the directory "a" is created read-only, and then I guess when tar moves on to the file "dummy" which is not inside "a" it actually sets the mode on "a" to be 555 (instead of 755 which it must have been at first). Then, when it tries to create "a/b", it gets a permission denied error because "a" is read-only.

This bug is triggered on Acquia Cloud because our docroot directories are not writeable (mode 550). The current archive-dump command uses "tar --exclude $docroot/sites/*" which *includes* the sites directory, which is mode 550. Later it runs "tar -rf tarball $docroot/sites/all" which appends the sites/all directory, but this then hits the behavior described above on extract.

The new patch "fixes" the problem by using "--exclude $docroot/sites" so the sites directory is not included empty before it is included with content. I'm sure this is fragile and other similar problems are still possible, but this at least works in the basic case.

bjaspan’s picture

Status: Needs review » Needs work

#2 contains a bug.

bjaspan’s picture

StatusFileSize
new5.68 KB

Take 3. For an uninstalled site, $status['object']['%paths']['%site'] is empty which resulted in $docroot/sites/all being added to the tarball twice, causing problems on extract.

msonnabaum’s picture

Status: Needs work » Reviewed & tested by the community

This all looks good to me. Thanks Barry.

moshe weitzman’s picture

If Mark or anyone else has tested this, then I'm fine with committing it.

msonnabaum’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 5 and backported to 4.

das-peter’s picture

Status: Fixed » Needs work

As far as I can see this patch introduces an issue in the case the destination parameter is used.
Following code sets the destination variable to FALSE if the specified destination doesn't exist:

   }
+  else {
+    $destination = realpath($destination);
+  }

This causes following issues:

  • If you specify a folder as destination no filename is generated.
  • If you specify a file that doesn't exist the destination variable will become empty.

What is the desired behaviour? There are several possibilities to deal with the different input types (folder, file).

das-peter’s picture

Status: Needs work » Needs review
StatusFileSize
new2.92 KB

I've created a patch to fix the issue described in my last comment.
The patch introduces some logic to determine what to do with a passed destination value.

  • If the destination is a valid folder a filename is generated
  • If the destination is an existing file the script stops with a error message
  • If the destination doesn't exist the script assumes that the basename() of the defined destination is the filename and the dirname() is the directory structure. If the directory doesn't exist the script tries to create it - and stops on failure

Besides this changes I added a condition in the code which fetches the db connection information. If this fails the method sitealias_get_databases_from_record() returns NULL what would lead to an "Unsupported operand types" Exception because of this construct $full[$key] = $alias += sitealias_get_databases_from_record($alias);

bjaspan’s picture

I also think there are problems with the way --destination is handled, but I don't agree with the proposed changes in #9. They are too complicated to explain and document. My suggestion for how it should behave:

* --destination should accept only a file path (a path ending in a file name), not a directory path.
* If a destination file path is specified, that is the exact name of the generated archive; accepting --destination=/foo/bar but creating /foo/bar.gz is confusion.
* If the destination is not writable (possibly because the path does not exist), the command fails; no creating directories or changing modes or anything else.

Thoughts?

moshe weitzman’s picture

Barry's suggest sounds reasonable to me.

bjaspan’s picture

StatusFileSize
new6.25 KB

Okay, here we go. This turns out to be absurdly annoying to get right. I also kept the aliases fix from @das-peter.

bjaspan’s picture

Oh, right, I meant to include some manual tests:

[bjaspan] ~$ drush @master archive-dump --destination=dump.tar.gz
Archive saved to /Users/bjaspan/dump.tar.gz                          [ok]
[bjaspan] ~$ drush @master archive-dump --destination=dump.tgz
Archive saved to /Users/bjaspan/dump.tgz                             [ok]
[bjaspan] ~$ drush @master archive-dump --destination=dump.dar
Archive saved to /Users/bjaspan/dump.dar                             [ok]
[bjaspan] ~$ drush @master archive-dump --destination=dump.dar
destination /Users/bjaspan/dump.dar exists; specify --overwrite to   [error]
overwrite.
[bjaspan] ~$ drush @master archive-dump --destination=dump.dar --overwrite
Archive saved to /Users/bjaspan/dump.dar                             [ok]
[bjaspan] ~$ drush @master archive-dump --destination=/tmp/dump.tgz
Archive saved to /private/tmp/dump.tgz                               [ok]
[bjaspan] ~$ drush @master archive-dump --destination=drupal/dump.tgz
Archive saved to /Users/bjaspan/drupal/dump.tgz                      [ok]
[bjaspan] ~$ drush @master archive-dump
Archive saved to                                                     [ok]
/Users/bjaspan/drush-backups/archive-dump/20110528140028/drupal_fields.20110528_020028.tar.gz
[bjaspan] ~$ ls -l dump* /tmp/dump.tgz drupal/dump.tgz /Users/bjaspan/drush-backups/archive-dump/20110528140028/drupal_fields.20110528_020028.tar.gz 
-rw-r--r--  1 bjaspan  staff  4168303 May 28 10:00 /Users/bjaspan/drush-backups/archive-dump/20110528140028/drupal_fields.20110528_020028.tar.gz
-rw-r--r--  1 bjaspan  wheel  4168304 May 28 10:02 /tmp/dump.tgz
-rw-r--r--  1 bjaspan  staff  4168302 May 28 10:03 drupal/dump.tgz
-rw-r--r--  1 bjaspan  staff  4168303 May 28 10:00 dump.dar
-rw-r--r--  1 bjaspan  staff  4168305 May 28 10:00 dump.tar.gz
-rw-r--r--  1 bjaspan  staff  4168303 May 28 10:00 dump.tgz
[bjaspan] ~$ file dump* /tmp/dump.tgz drupal/dump.tgz /Users/bjaspan/drush-backups/archive-dump/20110528140028/drupal_fields.20110528_020028.tar.gzdump.dar:                                                                                      gzip compressed data, from Unix
dump.tar.gz:                                                                                   gzip compressed data, from Unix
dump.tgz:                                                                                      gzip compressed data, from Unix
/tmp/dump.tgz:                                                                                 gzip compressed data, from Unix
drupal/dump.tgz:                                                                               gzip compressed data, from Unix
/Users/bjaspan/drush-backups/archive-dump/20110528140028/drupal_fields.20110528_020028.tar.gz: gzip compressed data, from Unix
[bjaspan] ~$ 
moshe weitzman’s picture

Status: Needs review » Needs work

I know you guys are considering writing tests for this. That would be peachy :)

Lets use return drush_set_error() instead of drush_log('', 'error'); return. That initiates rollback which is appropriate here.

gzip has a --stdout option which does no rename, obviously. Not sure if thats helpful here.

msonnabaum’s picture

StatusFileSize
new8.04 KB

Updated the patch per Moshe's suggestions.

Also added a basic test that tests dumping to a destination and extracting it.

msonnabaum’s picture

StatusFileSize
new7.98 KB

Last patch missed the z in tar xzf.

bjaspan’s picture

Status: Needs work » Reviewed & tested by the community

I interpret Mark writing a test for my code as him RTBC'ing it, and now I'm RTBC'ing his test.

jonhattan’s picture

Status: Reviewed & tested by the community » Needs work

For consistency with other commands, destination should only accept a directory, and to specify a file name we already have --result-file. So --destination=/foo --result-file=bar.tgz

+++ b/commands/core/archive.drush.inc
@@ -36,27 +37,90 @@ function archive_drush_command() {
+    else {
+      drush_set_error(dt('DB connections not found for !alias', array('!alias' => $alias)));
+    }

1st param to drush_set_error() is a string with a error code, 2nd is the message. Several times in the patch.

+++ b/commands/core/archive.drush.inc
@@ -36,27 +37,90 @@ function archive_drush_command() {
+    // This doesn't perform realpath on the basename, but that's okay. This is
+    // not path-based security. We are just checking for permissions.
+    $dest_dir = realpath(dirname($final_destination));
+    $final_destination = $dest_dir . '/' . basename($final_destination);
+    drush_op('chdir', $command_cwd);

We check for path permissions with is_readable() and is_writable() in drush_move_dir(). Can't we do the same here?
Also, although not mandatory I like to do validation in hook_COMMAND_validate() whenever possible. See drush_pm_download_validate() for reference.

+++ b/commands/core/archive.drush.inc
@@ -131,9 +195,14 @@ function drush_archive_dump($sites_subdirs = '@self') {
+    drush_shell_exec("mv {$destination}.gz $final_destination");

I think drush_move_dir() is suitable for this (and cross platform).

bjaspan’s picture

Status: Needs work » Needs review

New patch. I fixed the calls the drush_set_error(). I replaced mv with drush_move_dir(), though I am a bit uncomfortable since it is called "move_dir"; I did read through the code and did not immediately see a problem with it moving files, though.

I did not change the meaning of --destination. I recognize that the pm commands use it for a directory. However, the pm commands are downloading a tarball that they will extract into a directory; thus, specifying a directory is the most natural thing for --destination. archive-dump is creating a file, so the most natural meaning of --destination is a file. If we make --destination specify a directory, we will end up answering a million support requests from users who do not understand why it takes a directory.

I only used --destination because Moshe put it in his first draft of the command. If we can't use it to specify a file, then we can rename it to --output or just print to stdout or something.

msonnabaum’s picture

StatusFileSize
new6.02 KB

Barry - you did not attach a patch :)

I actually worked up a version of what jonhattan suggested a few days ago so I'm attaching that. I also initially disagreed that we should break up the arguments or change what they do, but I think I came around to it after realizing it might be useful to specify a directory only but keep the name that drush auto-generates for you. I can go either way on that really.

moshe weitzman’s picture

FWIW, I really like the filename generation of sql-dump when you pass result-file. We should do same here, as suggested by msonnabaum and jonhattan. I think we can commit this really soon.

bjaspan’s picture

StatusFileSize
new8.21 KB

Here is the patch I meant to attach in #19.

Having --destination specify a directory into which an auto-generated filename will go will put a big burden on anyone that has to support customers using this command (i.e. me) because every time someone who isn't a drush developer tries to use it the first time, they will see "--destination" in the help and run

% drush archive-dump --destination=archive.tar.gz

and then get very confused when they end up with a directory named archive.tar.gz and some weirdly-named file inside it. I think having --destination and --result-file, where one specifies the path and the other the directory, will confuse users even more, because that is not how Unix command line tools work. Again, I think the natural meaning of --destination for this command is "where I want the archive to go", but if we can't use it because --destination is taken for something else, then let's just rename it to --output.

bjaspan’s picture

StatusFileSize
new8.33 KB

One more tweak for the help info for --destination. No matter what, we've agreed that "Specify where the archive should be stored. Include filename but omit the .gz suffix." is wrong.

msonnabaum’s picture

I think having --destination and --result-file, where one specifies the path and the other the directory, will confuse users even more, because that is not how Unix command line tools work.

Examples? If there is precedence in other popular non-drush commands, I think that's a good argument. However, I can't think of any.

msonnabaum’s picture

I think the best compromise might just be to keep Barry's functionality, but rename it to result-file to be consistent with sql-dump and the rest.

If we want to argue that result-file is a bad option name, that's an argument for another thread.

moshe weitzman’s picture

Status: Needs review » Needs work

I even think we can keep the name --destination as Barry has proposed. So, the only needed changes I can see are:

+ 'destination' => 'The full path and filename in which the archive should be stored.',. Can we elaborate on what happens when this omitted.

Instead of /tmp, use drush_tempdir() in master and drush_find_tmp() in 4.x

but gzip renames files and refuses to compress files ending with .gz and .tgz, making our lives difficult.. Neither is true if you use the --stdout option on gzip. So we should just save using redirection and simplify the code.

We should use drush_shell_cd_and_exec() and not manually chdir in the command. We don't want to forget to change back and leave rest of script in bad state.

msonnabaum’s picture

Version: » All-versions-4.x-dev
Status: Needs work » Patch (to be ported)

Committed Barry's last patch with Moshe's suggestions to 5.x. Still need to backport.

moshe weitzman’s picture

Status: Patch (to be ported) » Fixed

backported, and committed the tests. note that our test is mysql specific which is not too good. the test suite passes with postgres or did so recently.

moving on to archive-restore.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.