Currently we gzip all backups, including those made during clone and migrate operations. This step appears to be a major I/O bottleneck: it can cause system load to spike, among other problems. It'd be great to make this step optional.

Comments

ergonlogic

In backups.provision.inc, we have:

function drush_provision_drupal_provision_backup_validate($backup_file = NULL) {
[...]
  drush_set_option('backup_file', $suggested);
[...]

and

function drush_provision_drupal_provision_backup() {
  $backup_file = drush_get_option('backup_file');
[...]
  if (substr($backup_file, -2) == 'gz') {
    // same as above: some do not support -z
    $command = 'tar cpf - . | gzip -c > %s';
  } else {
    $command = 'tar cpf %s .';
  }
  $result = drush_shell_exec($command, $backup_file);
[...]

This seems to indicate that we could strip the '.gz' suffix from the 'backup_file' Drush option in a new drush_provision_drupal_pre_provision_backup() function. If so, we could add settings to control this in the front end, possibly even in a contrib module.
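A standalone sketch of why this would work: the backup command is chosen purely from the file name suffix, so dropping '.gz' switches provision to the plain tar branch. The helper strip_gz_suffix() is hypothetical, the example path is illustrative, and backup_command() only mirrors the branch quoted above. Inside Drush, a pre hook would presumably read the value with drush_get_option('backup_file') and write the stripped name back with drush_set_option().

```php
<?php
// Hypothetical helper: drop a trailing ".gz" so the backup is written as a plain tar.
function strip_gz_suffix(string $backup_file): string {
  if (substr($backup_file, -3) === '.gz') {
    return substr($backup_file, 0, -3);
  }
  return $backup_file;
}

// Mirrors the branch quoted from drush_provision_drupal_provision_backup():
// the command depends only on whether the file name ends in "gz".
function backup_command(string $backup_file): string {
  if (substr($backup_file, -2) == 'gz') {
    return 'tar cpf - . | gzip -c > %s';
  }
  return 'tar cpf %s .';
}

$file = strip_gz_suffix('/var/aegir/backups/testsite1-20140107.231342.tar.gz');
echo $file, "\n";                  // /var/aegir/backups/testsite1-20140107.231342.tar
echo backup_command($file), "\n";  // tar cpf %s .
```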

helmo

I still hope to delegate backups to drush archive-dump someday... but this looks possible.

gboudrias

I've implemented the file name change in a testing module, and I've encountered the following error:

Running: gunzip -c /var/aegir/backups/testsite1-20140107.231342.tar | tar pxf - in /var/aegir/hostmaster-6.x-2.x/sites/testsite1.restore

Failed to extract the contents of /var/aegir/hostmaster-6.x-2.x/sites/testsite1.restore to @target (The file could not be extracted)

It seems the restore step doesn't take the file name into account: it pipes the archive through gunzip even though it is a plain .tar file.

I'm working on a patch.

gboudrias

Sorry, I meant that the backup works as intended (no gzipping), but the restore fails.
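A sketch of the symmetric check the restore step would need, so that an uncompressed archive is not piped through gunzip. This is not the code from the Provision patch referenced in this thread; restore_command() is a hypothetical name, and the gunzip pipeline is copied from the log output above.

```php
<?php
// Hypothetical: choose the extraction command from the archive's suffix,
// mirroring how the backup step chooses between gzip and plain tar.
function restore_command(string $backup_file): string {
  if (substr($backup_file, -2) == 'gz') {
    // Compressed archive: decompress and stream into tar (as in the log above).
    return 'gunzip -c %s | tar pxf -';
  }
  // Uncompressed archive: let tar read the file directly.
  return 'tar pxf %s';
}

echo restore_command('/var/aegir/backups/testsite1-20140107.231342.tar'), "\n";  // tar pxf %s
```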

gboudrias

Status: Active » Needs review

See issue in Provision project: https://drupal.org/node/2169025

ergonlogic

Project: Hostmaster (Aegir) » Hosting Site Backup Manager
Status: Needs review » Needs work

Since most of the development of new backup functionality is happening in hosting_site_backup_manager, this feature request probably belongs here for now. It may even be merged into Aegir 3.x.

Since uncompressed backups can easily be 10x the size, we presumably only want to skip compression occasionally, and we might want to garbage-collect them right away. This would probably mean adding a checkbox to the backup (?), clone, and migrate task forms, and passing a 'no-gzip' flag to the back end for these tasks.
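A rough sketch of the per-task checkbox, written as a plain function over a Form API-style array so it runs outside Drupal. The function name, the task type list, and the assumption that task options live under $form['parameters'] (and end up as task arguments) are illustrative only; they would need checking against the actual Hosting task forms.

```php
<?php
// Hypothetical sketch: add a 'nogzip' checkbox to the forms of tasks that
// create a backup. Assumes task options live under $form['parameters'].
// t() is omitted so the sketch runs outside Drupal.
function hypothetical_nogzip_form_alter(array &$form, string $task_type): void {
  $tasks_with_backups = array('backup', 'clone', 'migrate');
  if (!in_array($task_type, $tasks_with_backups, TRUE)) {
    return;
  }
  $form['parameters']['nogzip'] = array(
    '#type' => 'checkbox',
    '#title' => 'Do not gzip the backup',
    '#description' => 'Faster, but the archive can be much larger.',
    '#default_value' => 0,
  );
}

$form = array('parameters' => array());
hypothetical_nogzip_form_alter($form, 'clone');
echo isset($form['parameters']['nogzip']) ? "added\n" : "skipped\n";  // added
```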

gboudrias

What about the backup queue? Maybe there should be an option for uncompressed backups in the site's edit form, although that wouldn't remove the need for a checkbox on the backup task forms.
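A small sketch of how the two settings could interact: an explicit per-task choice wins, and otherwise a per-site default applies, which is what queued backups would fall back on since they never go through a task form. Both settings and the function name are hypothetical; neither exists yet.

```php
<?php
// Hypothetical: decide whether a backup should skip gzip.
// An explicit task argument overrides the (hypothetical) per-site default.
function hypothetical_resolve_nogzip(array $task_args, bool $site_default_nogzip): bool {
  if (array_key_exists('nogzip', $task_args)) {
    return (bool) $task_args['nogzip'];  // Note: the string '0' casts to false.
  }
  // No per-task choice (e.g. a queued backup): use the site's setting.
  return $site_default_nogzip;
}

var_dump(hypothetical_resolve_nogzip(array(), TRUE));  // bool(true)
```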

gboudrias

Assigned: Unassigned » gboudrias

I'm working on this btw, implementing the solution largely as described in #6. It will be a new feature.

I'm about 60% of the way through.

gboudrias

I'm stuck on getting the backup file name change through to the back end.

// This should have $backup_file = drush_get_option('backup_file'); but doesn't
// Need hook ordering to get this hook to fire last
function drush_hosting_backup_nogzip_pre_hosting_task($task) {
  $task = &drush_get_context('HOSTING_TASK');

  $nogzip = $task->task_args['nogzip'];

  if (isset($nogzip)) {
    $process = &drush_get_context('process');
    drush_set_option('nogzip', $nogzip);

    $process['nogzip'] = $nogzip;

    drush_log('Setting nogzip: ' . $task->task_args['nogzip']);
    drush_log('Settings: ' . implode(' --- ', array_keys($task->task_args)));
  }
}
// This function is probably not needed if we can set the backup_file option in drush_hosting_backup_nogzip_pre_hosting_task
function drush_hosting_backup_nogzip_pre_provision_backup($url = NULL) {
  $backup_file = drush_get_option('backup_file');
  $nogzip = drush_get_option('nogzip'); // This does NOT appear to exist at this point

[...]

helmo


Your drush_hosting_backup_nogzip_pre_provision_backup() hook is not called because it's not in the right context.

While drush_hosting_backup_nogzip_pre_hosting_task() runs in the context of the hostmaster site, the actual backup runs in the context of the site being backed up.
So for the hook to fire, it needs to reside in ~/.drush. That's also why many Aegir contribs have both a provision_ and a hosting_ project on drupal.org.
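A minimal sketch of what the provision-side hook could look like once it lives in a .drush.inc file under ~/.drush (the function prefix has to match that file's name, here hosting_backup_nogzip). It assumes the hosting task passes its 'nogzip' argument through to provision-backup as a Drush option. The in-memory stand-ins for drush_get_option() and drush_set_option() exist only so the sketch runs outside Drush; they are not Drush's implementation.

```php
<?php
// Stand-ins so the sketch runs outside Drush. In a real .drush.inc file,
// Drush provides these functions (with more parameters than shown here).
if (!function_exists('drush_get_option')) {
  $GLOBALS['sketch_options'] = array();
  function drush_get_option($name, $default = NULL) {
    return $GLOBALS['sketch_options'][$name] ?? $default;
  }
  function drush_set_option($name, $value) {
    $GLOBALS['sketch_options'][$name] = $value;
  }
}

// Pre hook: runs in the site's context before provision-backup itself,
// after the validate hook has suggested a backup_file ending in ".gz".
function drush_hosting_backup_nogzip_pre_provision_backup($url = NULL) {
  if (!drush_get_option('nogzip', FALSE)) {
    return;  // Default behaviour: keep the gzipped backup.
  }
  $backup_file = drush_get_option('backup_file');
  if (is_string($backup_file) && substr($backup_file, -3) === '.gz') {
    // Dropping ".gz" makes provision take the plain "tar cpf" branch.
    drush_set_option('backup_file', substr($backup_file, 0, -3));
  }
}
```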

Here's also a patch against your GitHub copy that makes it work once you move the .drush.inc file.

Still needs more cleanup though ;)

gboudrias

Status: Needs work » Needs review

Thanks a bunch!

I've pushed your changes to my fork and condensed them into the attached patch. Note that I've removed the temporary code for future features.

For this feature, we'd need either instructions in the README or a separate hsbm_provision module. The latter seems like overkill for an optional feature. What do you think?

helmo

There already is a https://drupal.org/project/provision_site_backup_manager module :)

You can submit a patch for that module in this same issue.

A few notes:

$nogzip = $task->task_args['nogzip'];

probably needs an isset() check to avoid a PHP notice when the task has no 'nogzip' argument.
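A sketch of that guard, wrapped in a hypothetical helper so it can run on its own: empty() (like isset()) does not raise a notice when the key is missing, and it also treats '0' as "not set".

```php
<?php
// Hypothetical helper: read the 'nogzip' task argument without a notice
// when the argument was never set.
function hypothetical_task_wants_nogzip(stdClass $task): bool {
  return !empty($task->task_args['nogzip']);
}

$task = new stdClass();
$task->task_args = array();
var_dump(hypothetical_task_wants_nogzip($task));  // bool(false), no notice
```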

The comment above drush_hosting_backup_nogzip_pre_provision_backup() is incorrect.

Question:
Does the restore task work? (see #3)

gboudrias


Wow, that's good to know!

I've moved my changes to the provision module, so attached are a patch for the provision module plus a patch for this module (the non-provision part) that no longer includes the .drush.inc file. (I've also pushed the changes to my temporary fork, if you prefer a messy history.)

The restore task works if you use the Provision patch in https://drupal.org/node/2169025 :) There's no way for it to work without that patch.

gboudrias

Turns out the function names are wrong in that last patch. I don't know if it's worth another patch, but you might want to fix that before committing...

ergonlogic

We ran some benchmarks to better understand the effects of skipping gzip during backups. We only tested clone, after which we also deleted the uncompressed backup. We used vmstat to collect stats at 1-second intervals. Here are the results from our (admittedly small) sample of 3 runs each, with and without gzipping:

                  Gzipped    Un-gzipped   Un-gzipped as % of gzipped
Memory max        439679     725705       165.1%
Memory avg        289164     416717       144.1%

User CPU max      85         77           90.6%
User CPU avg      30         24           81.3%

I/O BO max        26023      25957        99.7%
I/O BO avg        7611       3684         48.4%

Execution time    83         37           44.6%
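A quick check of the last column, which is the un-gzipped figure divided by the gzipped one. Recomputing from the rounded values in the table reproduces every row except User CPU avg, which comes out at 80.0% rather than 81.3%; presumably that percentage was computed from the unrounded averages.

```php
<?php
// Recompute the last column: un-gzipped value as a percentage of the gzipped one.
function percent_of(float $ungzipped, float $gzipped): string {
  return sprintf('%.1f%%', 100 * $ungzipped / $gzipped);
}

// Figures copied from the table: array(gzipped, un-gzipped).
$rows = array(
  'Memory max'     => array(439679, 725705),
  'Memory avg'     => array(289164, 416717),
  'User CPU max'   => array(85, 77),
  'User CPU avg'   => array(30, 24),
  'I/O BO max'     => array(26023, 25957),
  'I/O BO avg'     => array(7611, 3684),
  'Execution time' => array(83, 37),
);
foreach ($rows as $label => list($gzipped, $ungzipped)) {
  printf("%-15s %s\n", $label, percent_of($ungzipped, $gzipped));
}
```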

Execution time and average I/O (blocks out) dropped by more than half, while peak I/O was essentially unchanged. CPU usage dropped moderately, and memory usage rose significantly.
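For anyone repeating the measurement, a sketch of how max and average figures could be pulled from `vmstat 1` output. It assumes the procps vmstat column layout and only summarises the CPU user time (us) and blocks-out (bo) columns; the thread doesn't say which memory column was used, so memory is left out. vmstat's first sample reports averages since boot, so it may be worth discarding.

```php
<?php
// Hypothetical helper: max and average of selected columns in `vmstat 1`
// output. Assumes the procps layout, where a header line names the columns
// (r b swpd free buff cache si so bi bo in cs us sy id wa st).
function summarise_vmstat(string $output, array $columns): array {
  $header = NULL;
  $series = array_fill_keys($columns, array());
  foreach (preg_split('/\R/', trim($output)) as $line) {
    $fields = preg_split('/\s+/', trim($line));
    if (in_array('bo', $fields, TRUE)) {
      $header = $fields;  // Column-name line (vmstat repeats it periodically).
      continue;
    }
    if ($header === NULL || count($fields) !== count($header) || !ctype_digit($fields[0])) {
      continue;  // Group-title line, or anything that is not a sample.
    }
    $row = array_combine($header, $fields);
    foreach ($columns as $column) {
      if (isset($row[$column])) {
        $series[$column][] = (int) $row[$column];
      }
    }
  }
  $summary = array();
  foreach ($series as $column => $values) {
    if ($values !== array()) {
      $summary[$column] = array(
        'max' => max($values),
        'avg' => array_sum($values) / count($values),
      );
    }
  }
  return $summary;
}
```

For example, capture `vmstat 1 > clone.log` while the clone task runs, then call summarise_vmstat(file_get_contents('clone.log'), array('us', 'bo')).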

gboudrias

Status: Needs review » Reviewed & tested by the community

I think ergonlogic reviewed this for our benchmarks. Please commit and/or review some more.

helmo

Status: Reviewed & tested by the community » Needs work
+++ b/hosting_backup_nogzip/hosting_backup_nogzip.module
@@ -0,0 +1,24 @@
+    $form['#submit'][] = 'hosting_backup_nogzip_form_submit';

Where is this submit handler?

As you mentioned in #15, this needs a bit of work. Did you mean hosting_backup_nogzip_form_alter?

helmo

Version: 6.x-2.x-dev » 7.x-3.x-dev