Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Currently drush sql dump (via mysqldump) outputs the data of each table in one single line insert. This makes it really hard or impossible to use a version control system to backup drupal sites. This patch adds the following options to the mysqldump command for having one insert line per record and keep them ordered by primary key
--skip-extended-insert --order-by-primary
Note that this also makes the dumps more human-friendly to read/debug. While the dump files might be larger, git and friends will be able to only store the changes and therefore save space in the long run compared to one-liner dumps.
Comment | File | Size | Author |
---|---|---|---|
#12 | drush_dump_vcs3.patch | 1.99 KB | scor |
#9 | drush_dump_vcs2.patch | 1.92 KB | scor |
#6 | 638282_backup_friendly.patch | 1.91 KB | scor |
drush_dump_vcs.patch | 863 bytes | scor |
Comments
Comment #1
scor CreditAttribution: scor commentedComment #2
moshe weitzman CreditAttribution: moshe weitzman commentedmy understanding is that sometimes order-by-primary can slow down the export significantly. anyone confirm or deny that? otherwise, i like this proposal.
Comment #3
greg.1.anderson CreditAttribution: greg.1.anderson commentedI have wanted this feature too; the only thing that held me back is that it is not supported by postgres (which shouldn't keep drush from supporting it in mysql, of course; it was just a question of priority, since I use postgres).
However, note that this feature would likely have a negative impact on the performance of sql sync. Order by primary will slow down the dump, and the larger dump file will take longer to rsync. However, there is another factor that might make sql sync faster; order by primary will help rsync optimizations in cases where new inserts cause the unordered dump to scramble the output file, which is possible by the specification. I haven't experimented with this extensively, but in postgres at least, unordered dumps tend to be fairly stable in their order. Overall, then, I would expect a decrease in performance, but it would only affect people with very large databases.
I recommend changing this patch so that it is controlled by a flag, similar to how $data_only is set. This would allow people to control the setting, so that they could have the higher-performance sql sync and a more ordered sql dump for backups, if they desired.
I don't much care if the default is ordered or unordered.
Please check the flag via
drush_get_option(array('source-ordered-dump', 'ordered-dump'))
, for the benefit of site aliases.I'd make the patch myself, but I'm too busy to test it right now -- sorry.
Comment #4
scor CreditAttribution: scor commentedYes, as the mysqldump documentation mentions, the dump will take longer, but I think this is balanced by the quality of the dumps you get. I believe committing this to a repository will be faster since there will less diff chunks to transfer. Many parts of the dump will not change (like the older node revisions etc.) if you keep a record per line and in order.
I ran a benchmark with a Drupal 6 db of 427 M (size in the mysql folder) and dump size is 306M. Times for a mysqldump were pretty variable so I did 10 dumps with each of the 3 following configuration for both MyISAM and InnoDB:
These are the average times I have found (MySQL 5.1 on Mac OS 10.5):
MyISAM
default: 9.2s
case 1 : 9.7s (+ 5% / default)
case 2 : 11.9s (+ 29% / default)
InnoDB
default: 12.8s
case 1 : 13.4s (+ 4% / default)
case 2 : 13.8s (+ 8% / default)
I also ran some benchmark for the loading and found the
--skip-extended-insert
options slows down the loading by at least 2 which is quite a lot if you just want to do some db sync, but not a problem if you've had a bad day, your server crashed and you're happy to have a backup of your db :). I think it's best not to activate these options by default but make them easily available for people like me who would like to use drush for backup purposes. AFAIK there is no way to add options directly to mysqldump via drushrc.php (correct me if I'm wrong) but I think it would be really useful.Comment #5
greg.1.anderson CreditAttribution: greg.1.anderson commentedThank you for the benchmarks. The 2x slowdown for the import function would be a big hit to take on sql sync, so I do think it is best to parametrize this and make the default 'off', so you don't pay the penalty unless you want to.
In order to add options to mysqldump via drushrc.php, you would have to modify your patch as I suggest above; this might take some inspection of other parts of the drush souce code to understand, but it's not too bad. If you don't know how to do it, assign it to me and I'll do it for you when I have time -- late Dec., early Jan., probably.
Comment #6
scor CreditAttribution: scor commentedpatch à la
data-only
. useEDIT: sorry for the noise in the patch, see #638324: Strip annoying trailing spaces
Comment #7
greg.1.anderson CreditAttribution: greg.1.anderson commentedLooks very nice to me. I have just two suggestions.
My opinions only.
Comment #8
moshe weitzman CreditAttribution: moshe weitzman commentedAgree with greg's remarks. Afterwards, is rtbc
Comment #9
scor CreditAttribution: scor commented@greg.1.anderson: agreed. I rerolled the patch with your comments.
Comment #10
scor CreditAttribution: scor commentedI'm not sure if there is a way to display these useful options via the command line. Does anyone know how to do it? or is this missing? I've left the @todo in drupal_sql_dump() for that reason.
Comment #11
scor CreditAttribution: scor commentedplanning to present this as part of a session on "backing up your Drupal site with git and Drush" at the Boston meetup tonight. Hopefully this will be committed soon ;) I'll let someone else RTBC this.
Comment #12
scor CreditAttribution: scor commentedcorrect a typo in the documentation.
Comment #13
greg.1.anderson CreditAttribution: greg.1.anderson commentedI don't know if perhaps moshe might be waiting for an external reviewer. I'm happy to test this, but probably won't have time until later this month -- I'm kind of slammed right now.
Comment #14
moshe weitzman CreditAttribution: moshe weitzman commentedcommitted. thanks.
Comment #16
nnewton CreditAttribution: nnewton commentedHi All,
We just ran into an issue with extremely long load times using drush sql-sync and were shocked to see skip-extended-insert as a default paramater. This really needs to not be a default option, there is a reason its not the default on mysqldump. It dramatically increases load time, in our case by over an hour. Why does this need to be the default?
-N
Comment #17
greg.1.anderson CreditAttribution: greg.1.anderson commentedClosing this issue and re-opening #1011008: Re-visit automatically setting --ordered-dump option in sql-sync.