Active
Project:
Provision
Version:
7.x-3.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
15 Mar 2011 at 16:31 UTC
Updated:
22 Jun 2017 at 17:55 UTC
Jump to comment: Most recent
Is there a way to exclude tables from backup sets, either through the UI or through a code hook?
There are many reasons why it might be undesirable to have certain tables included in backup sets. Some, such as cache tables, take up disproportionately huge amounts of space and are simply not needed. Others, such as notifications_queue, can actually cause a bit of trouble if backed up and restored (people may start getting duplicate emails from the site).
Comments
Comment #1
danepowell commentedDoes this issue better belong in another Aegir queue, or on the community site? I'd really like some more information on it... even just advice on how I might go about writing a contrib module to deal with it.
Like I said, there are very good reasons to exclude these tables (caches and search index are huge, and can always be rebuilt; and other tables such as the notifications queue can cause real trouble), and I have no idea how else to get around this problem.
Comment #2
steven jones commentedThis probably belongs in the provision queue, so moving over there.
As for how to do this, I suspect that we'd need to send some more commands to MySQL dump to not dump the data for specific tables. However, I don't think the logic is going to be as simple as, here's a list of tables to not dump, because we could break a site very easily like that (by clearing the D7 registry for example) or creating a cache stampede.
I suspect that most of want you want to be able to do this for could be covered by a some kind of 'environment' migrate, so you'd do a migrate to 'DEV' which would know to empty the notifications table after migration, whereas the migrate in LIVE would not.
Comment #3
danepowell commented@Steven Jones - good point, I would want to exclude tables during regular backups and migrations to dev platforms, but NOT exclude tables during migration of live sites.
Comment #4
anarcat commentedSo you can use --ignore-table in mysqldump, but the problem with that is that it *completely* excludes the table from the dump, even the CREATE TABLE statements. Using that flag would result in a broken dump... The trick I have used in the past to workaround this problem is to dump the DB structure first (with -d/--no-data) and then dump the table data only for the relevant tables (--no-create-info, -t and --ignore-tables), but that means calling mysqldump twice and appending to the file on the second pass...
In other words, it's not just adding a flag. :) But I do agree this would be a useful optimization...
Comment #5
danepowell commentedAnother useful and related feature would be to replace all of the email addresses in the user table with user@example.com or null@localhost or something like that when migrating to a Dev platform, so that emails can't be sent from a development site to actual users.
Comment #6
joestewart commented#5 seems to be a different feature, but a little context to a solution for sanitizing email addresses - drush 4.5 should have a command - #1112142: Run sanitize without running full sql-sync
Comment #7
danepowell commented@joestewart- thanks for the tip! Not to hijack this thread too much, but can you point to any advice on how to utilize that with Aegir migrations? i.e., is it possible to implement hook_drush_sql_sync_sanitize() and somehow detect the source and target platforms of a site migration?
Comment #8
jason.fisher commentedIf you have sed installed you can use this workaround to clean data from the database.sql after extraction and prior to gzip. A PHP-native solution would probably use stream filters to avoid memory issues..
YMMV -- if you are using this in production, you better test restores thoroughly. The regexp can certainly be improved.
Paste the following into: /var/aegir/.drush/zbackup_filter.drush.inc -- be careful of excess whitespace.
Comment #9
danepowell commentedNice bit of lateral thinking there jason, sanitizing the database while it's in text form using a simple regex. It's still not quite ideal, since mysql is still going to burn unnecessary cycles dumping the tables that will just get thrown away anyway, but it's a start!
Of course, it's still true that you might not want to sanitize the databases for every backup- for instance, during a simply migration, you might not want to kill the caches and search_index, which would have to get rebuilt if the migration fails. A good alternative would be to add this script to Hosting Backup GC as a way to free up space without losing any valuable data.
Comment #10
jason.fisher commentedYeah, not quite ideal .. but was helpful in making our Cloud Files offsite a bit more manageable.
Here's an update that adds filtering of certain files from the sites/*/files/ directory. I am using it to exclude backup_migrate backup files from aegir backups. Again .. it is another hack. This one moves backup_migrate out of the sites/*/files directory prior to gzip and moves it back in after. If the process fails, those backup_migrate files may be left in /tmp. YMMV.
I would love to see some way to extract tables as individual files, so space could be saved by de-duping those tables that rarely change..
Comment #11
steven jones commentedComment #12
mvcthat's an enormous chunk of text in a single sed regex. at minimum i would recommend searching for the INSERT lines instead.
why don't we just use the --structure-tables-* options for drush sql-dump?
Comment #13
mvcComment #14
ergonlogicNew features need to be implemented in Aegir 3.x, then we can consider back-porting to Aegir 2.x.
Comment #15
anarcat commented@mvc is right, we shold just use --structure-tables-list or --structure-tables-key.
aegir3 has a new local.drushrc.php config file that can be edited by site admins that allows operators to just drop options like this. you can then use this drush config to skip the data on the tables you want to exclude:
then calling:
drush sql-dump --structure-tables-key=common... will do what you'd expect. you could also skip those tables entirely for all commands with:
i guess we could add those as options to provision-backup, provision-migrate and (maybe) provision-clone ...
this is where an eventual patch should go...
Comment #16
helmo commentedWe would have to change the implementation of generate_dump() in e.g. db/Provision/Service/db/mysql.php. It now just calls mysqldump.
There must have been a reason why we did'nt use drush sql-dump, but chances are that Drush does all we need today.
Comment #17
steven jones commented@helmo drush sql-dump wasn't around at first, and then when it was it leaked DB credentials into the system process listing. Anarcat will know more on these topics I suspect.
Comment #18
ergonlogicI believe the db credentials leaks are solved in Drush 7.x. We should consider refactoring as suggested by @helmo in #16. It'd be pretty straight-forward to modify the tables in a backup in any number of ways at that point.
Comment #19
colanYes, those leaks appear to be plugged as per mysql credentials leak in drush sqlc.
So we can basically just replace this line in generate_dump()
$cmd = sprintf("mysqldump --defaults-file=/dev/fd/3 --single-transaction --quick --no-autocommit %s", escapeshellcmd(drush_get_option('db_name')));...with
drush sql-dump --structure-tables...?Not sure what that
--defaults-fileis about.Looks like we can simplify importing as well, dropping safe_shell_exec(), but let's do that in a follow-up issue.
Comment #20
helmo commentedThe 'defaults-file' was to pass in credentials.
Big +1 to rewrite this, hosting_sync already uses sql-sync.