Currently drush sql dump (via mysqldump) outputs the data of each table in one single line insert. This makes it really hard or impossible to use a version control system to backup drupal sites. This patch adds the following options to the mysqldump command for having one insert line per record and keep them ordered by primary key

--skip-extended-insert --order-by-primary

Note that this also makes the dumps more human-friendly to read/debug. While the dump files might be larger, git and friends will be able to only store the changes and therefore save space in the long run compared to one-liner dumps.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

scor’s picture

Status: Active » Needs review
moshe weitzman’s picture

my understanding is that sometimes order-by-primary can slow down the export significantly. anyone confirm or deny that? otherwise, i like this proposal.

greg.1.anderson’s picture

Status: Needs review » Needs work

I have wanted this feature too; the only thing that held me back is that it is not supported by postgres (which shouldn't keep drush from supporting it in mysql, of course; it was just a question of priority, since I use postgres).

However, note that this feature would likely have a negative impact on the performance of sql sync. Order by primary will slow down the dump, and the larger dump file will take longer to rsync. However, there is another factor that might make sql sync faster; order by primary will help rsync optimizations in cases where new inserts cause the unordered dump to scramble the output file, which is possible by the specification. I haven't experimented with this extensively, but in postgres at least, unordered dumps tend to be fairly stable in their order. Overall, then, I would expect a decrease in performance, but it would only affect people with very large databases.

I recommend changing this patch so that it is controlled by a flag, similar to how $data_only is set. This would allow people to control the setting, so that they could have the higher-performance sql sync and a more ordered sql dump for backups, if they desired.

I don't much care if the default is ordered or unordered.

Please check the flag via drush_get_option(array('source-ordered-dump', 'ordered-dump')), for the benefit of site aliases.

I'd make the patch myself, but I'm too busy to test it right now -- sorry.

scor’s picture

Yes, as the mysqldump documentation mentions, the dump will take longer, but I think this is balanced by the quality of the dumps you get. I believe committing this to a repository will be faster since there will less diff chunks to transfer. Many parts of the dump will not change (like the older node revisions etc.) if you keep a record per line and in order.

I ran a benchmark with a Drupal 6 db of 427 M (size in the mysql folder) and dump size is 306M. Times for a mysqldump were pretty variable so I did 10 dumps with each of the 3 following configuration for both MyISAM and InnoDB:

 mysqldump -uroot -proot $db > dump1.sql
 mysqldump -uroot -proot $db --skip-extended-insert > dump2.sql
 mysqldump -uroot -proot $db --skip-extended-insert --order-by-primary > dump3.sql

These are the average times I have found (MySQL 5.1 on Mac OS 10.5):

MyISAM
default: 9.2s
case 1 : 9.7s (+ 5% / default)
case 2 : 11.9s (+ 29% / default)

InnoDB
default: 12.8s
case 1 : 13.4s (+ 4% / default)
case 2 : 13.8s (+ 8% / default)

I also ran some benchmark for the loading and found the --skip-extended-insert options slows down the loading by at least 2 which is quite a lot if you just want to do some db sync, but not a problem if you've had a bad day, your server crashed and you're happy to have a backup of your db :). I think it's best not to activate these options by default but make them easily available for people like me who would like to use drush for backup purposes. AFAIK there is no way to add options directly to mysqldump via drushrc.php (correct me if I'm wrong) but I think it would be really useful.

greg.1.anderson’s picture

Thank you for the benchmarks. The 2x slowdown for the import function would be a big hit to take on sql sync, so I do think it is best to parametrize this and make the default 'off', so you don't pay the penalty unless you want to.

In order to add options to mysqldump via drushrc.php, you would have to modify your patch as I suggest above; this might take some inspection of other parts of the drush souce code to understand, but it's not too bad. If you don't know how to do it, assign it to me and I'll do it for you when I have time -- late Dec., early Jan., probably.

scor’s picture

Status: Needs work » Needs review
FileSize
1.91 KB

patch à la data-only. use

drush sql dump --backup-friendly

EDIT: sorry for the noise in the patch, see #638324: Strip annoying trailing spaces

greg.1.anderson’s picture

Looks very nice to me. I have just two suggestions.

  1. backup-friendly probably isn't the best flag name; really what we're being friendly to is committing the dump to a version control system. I'd just call it 'ordered-dump' or something like that.
  2. it would be nice to have the option documented in the sql help function.

My opinions only.

moshe weitzman’s picture

Status: Needs review » Needs work

Agree with greg's remarks. Afterwards, is rtbc

scor’s picture

Status: Needs work » Needs review
FileSize
1.92 KB

@greg.1.anderson: agreed. I rerolled the patch with your comments.

scor’s picture

+++ commands/sql/sql.drush.inc	21 Nov 2009 14:00:16 -0000
@@ -58,6 +58,7 @@ function sql_drush_command() {
       '--result-file' => 'Save to a file. The file should be relative to Drupal root. Recommended.',
       '--skip-tables-key' => 'A key in the $skip_tables array. @see example.drushrc.php. Optional.',
       '--structure-tables-key' => 'A key in the $structure_tables array. @see example.drushrc.php. Optional.',
+      '--ordered-dump' => 'Use this option to output ordered INSERT statements in the dump. Useful when backups are managed in a Version Control System. Optional.',

I'm not sure if there is a way to display these useful options via the command line. Does anyone know how to do it? or is this missing? I've left the @todo in drupal_sql_dump() for that reason.

scor’s picture

planning to present this as part of a session on "backing up your Drupal site with git and Drush" at the Boston meetup tonight. Hopefully this will be committed soon ;) I'll let someone else RTBC this.

scor’s picture

FileSize
1.99 KB

correct a typo in the documentation.

greg.1.anderson’s picture

I don't know if perhaps moshe might be waiting for an external reviewer. I'm happy to test this, but probably won't have time until later this month -- I'm kind of slammed right now.

moshe weitzman’s picture

Status: Needs review » Fixed

committed. thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

nnewton’s picture

Version: » 7.x-5.8
Component: Code » Base system (internal API)
Status: Closed (fixed) » Needs work

Hi All,

We just ran into an issue with extremely long load times using drush sql-sync and were shocked to see skip-extended-insert as a default paramater. This really needs to not be a default option, there is a reason its not the default on mysqldump. It dramatically increases load time, in our case by over an hour. Why does this need to be the default?

-N

greg.1.anderson’s picture

Status: Needs work » Closed (fixed)