(1) Tired of having to manually add every new cache_* table to your drushrc.php file? What if drush allowed you to just have cache_* in the structure-tables array to include all the cache tables?

(2) drush sql-dump will throw warnings if --structure-tables-key contains some inexisting table names. For instance, if you use the default --structure-tables-key=common and haven't install the search module, you will get warnings for the 'search_dataset', 'search_index' and 'search_total' tables (same for the statistics module).

This patch fixes both issues above by adding support for (1) table name expansion with the * wildcard and (2) removing non existing tables from _drush_sql_get_table_list(), ensuring that all the table names used by sql-dump and sql-sync actually exist.

The immediate benefit is that we could safely bundle drush with a default set of tables for --structure-tables-key=common without the user having to always hack drushrc.php.

First draft patch attached.

CommentFileSizeAuthor
#96 interdiff.patch24.42 KBmoshe weitzman
#95 wildcard.patch28.86 KBmoshe weitzman
#95 interdiff.patch6.31 KBmoshe weitzman
#93 drush-exclude-tables-wildcard-93.patch26.09 KBposulliv
#91 drush-exclude-tables-wildcard-91.patch27.23 KBposulliv
#89 drush-exclude-tables-wildcard-89.patch26.65 KBposulliv
#87 drush-exclude-tables-wildcard-87.patch26.65 KBposulliv
#84 drush-exclude-tables-wildcard-84.patch24.96 KBposulliv
#82 drush-exclude-tables-wildcard-82.patch23.99 KBposulliv
#79 drush-exclude-tables-wildcard-79.patch24.02 KBposulliv
#77 drush-exclude-tables-wildcard-77.patch22.54 KBposulliv
#75 drush-exclude-tables-wildcard-75.patch15.09 KBposulliv
#72 drush-exclude-tables-wildcard-72.patch14.55 KBjuampynr
#71 drush-exclude-tables-wildcard-71.patch14.56 KBjuampynr
#69 drush-exclude-tables-wildcard-69.patch13.48 KBposulliv
#67 drush-exclude-tables-wildcard-67.patch12.25 KBposulliv
#65 drush-exclude-tables-wildcard-65.patch12.27 KBposulliv
#50 drush-exclude-tables-wildcard-698264-50.patch1.9 KBjuampynr
#47 drush-exclude-tables-wildcard-698264-47.patch1.75 KBjuampynr
#39 698264-sql_wildcards-39.patch12.59 KBgreg.1.anderson
#34 698264-sql_wildcards-34.patch11.76 KBWim Leers
#32 698264-sql_wildcards-32.patch11.5 KBWim Leers
#26 better_structure_tables_no_sync_support-698264-26.patch3.78 KBdalin
#11 698264_better_tables_key_11.patch5.46 KBscor
#9 698264_better_tables_key_9.patch4.92 KBscor
#7 698264_better_tables_key_7.patch4.58 KBscor
#2 698264_better_tables_key_2.patch5.05 KBscor
drush_better_db_tables.patch2.85 KBscor
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

moshe weitzman’s picture

Assigned: Unassigned » greg.1.anderson

Useful stuff. Assigning to Greg for review.

scor’s picture

Status: Needs work » Needs review
FileSize
5.05 KB

and now including some hunks to leverage the existing example.drushrc.php as default setting for structure-tables common. If people rename/move this file somewhere else, it should still work and pick up their custom file.

greg.1.anderson’s picture

Status: Needs review » Needs work

This is good stuff, but it does have some problems. I'll start with the first one. sql-dump is called by sql-sync. You changed the bootstrap of sql-dump to DRUSH_BOOTSTRAP_DRUPAL_DATABASE, but left sql-sync at DRUSH_BOOTSTRAP_DRUPAL_CONFIGURATION. This means that _drush_sql_get_db_table_list() will fail when called indirectly from sql-sync. You can't get around this by changing the bootstrap level of sql-sync, though, because _drush_sql_get_db_table_list() operates on the bootstrapped database, but sql-sync wants the information to come from the databases associated with its source and destination parameters, so there's really no good way to to use DRUSH_BOOTSTRAP_DRUPAL_DATABASE with sql-dump.

This means that you should just call drush_sql_query("show tables;") to get the table list, and just deal with the fact that you get back strings instead of php objects.

However, even at this point you are not yet done, because the databases that sql-sync is operating on might be remote. Currently, sql-sync will use sql-dump to build shell commands that it sends to a remote server via ssh for execution. sql-sync currently does not require drush to be installed on the remote machine, a facility that would have to be duplicated in sql-query in order for this to work with remote sites.

That would be a bit of work, but at the moment I can't think of a better way.

greg.1.anderson’s picture

Two more notes:

  1. The patch in #2 loads the "example.drushrc.php" preference file. I think this is a bad idea; users should copy the settings they want from the example file to their drushrc.php file.
  2. Wildcards are already supported in postgres. Try this: drush sql-dump --tables-list="cache_*". Also, according to Lullabot, you can use the % character as a wildcard in mysqldump statements. I haven't tried this myself; you'd have to rewrite the drush sql-dump code for mysql a bit to use --ignore-table if you wanted this to work, I think.
scor’s picture

# The patch in #2 loads the "example.drushrc.php" preference file. I think this is a bad idea; users should copy the settings they want from the example file to their drushrc.php file.

this idea is that they don't need to do that to get the default settings of example.drushrc.php (but this might be better to be implemented as a default in the actual drush code).

moshe weitzman’s picture

I definately don't want to read in example.drushrc.org. Could go either way on whether we automatically use the common key in structure-tables if no other is supplied. seems like a good idea.

scor’s picture

Status: Needs work » Needs review
FileSize
4.58 KB

I've refactored _drush_sql_get_db_table_list() so we don't require a bootstrapped database, but instead only use a show table; query via the command line. Maybe we could refactor _drush_sql_query() so it accepts a file instead of outputting the result in the console. The problem it has right now is that it expects a database connection to support prefixes. I'd be interested to know how this plays wrt to #3.

Also, according to Lullabot, you can use the % character as a wildcard in mysqldump statements.

I've tried this on 2 different machines, and none of them accepted the wildcard. (MySQL dump 10.11, MySQL server 5.0.51a-24+lenny2 and MySQL dump 10.13, MySQL server 5.1.37-log). We should not rely on it, especially since this is non documented feature of mysqldump.

We're not reading example.drushrc.php anymore.

greg.1.anderson’s picture

Status: Needs review » Needs work

A little closer, but...

In this part of your code:

function _drush_sql_get_db_table_list() {
  static $db_tables = array();

  if (!empty($db_tables)) {
    return $db_tables;
  }

  $scheme = _drush_sql_get_scheme($db_spec);

$db_spec is not passed in, so it is null, so it will refer to the current database. This leads to the same problem as in #2.

Currently, drush_sql_dump does not pass $db_spec into _drush_sql_get_table_list, so _drush_sql_get_db_table_list does not have the information it needs to tell which database to use. Even if this were fixed, sql-sync does not pass the information that sql-dump would need to determine if the $db_spec refers to a database on a remote machine.

I am planning on refactoring sql-sync so that the remote machine handling capabilities of the dump phase are moved to sql-dump. This would help your patch. I'm not sure when I'll get to it, though, so feel free to attempt it if you wish.

The refactoring you suggest for sql-query would also help this patch, I think.

scor’s picture

Status: Needs work » Needs review
FileSize
4.92 KB

Let's simply support this for the sql-dump command only. I've added this condition in _drush_sql_get_table_list(), so the behavior of the sql-sync is not changed, and table name expansion and validation only happens on the sql-dump command.

greg.1.anderson’s picture

Status: Needs review » Needs work

I'm ambivalent about leaving sql-sync out.

I don't like if ($command['command'] == 'sql-dump'). If this is going to be special-cased, it needs to be a little cleaner (i.e. by passing args rather than checking command strings).

scor’s picture

Status: Needs work » Needs review
FileSize
5.46 KB

Changed the condition to allow an argument instead: drush sql-dump --structure-tables-key=common --friendly-table-names

greg.1.anderson’s picture

Status: Needs review » Needs work

Well, that is a little cleaner than the patch before, but I actually meant function arguments, not command line arguments. Sorry for the confusion. The problem with the patch in #11 is that you can still pass --friendly-table-names to sql-sync, and things will break. You could perhaps set --friendly-table-names in drush_sql_dump_execute, and make sure that it is not set in sql-sync, but in that instance you'd still cause things to break if you used wildcards in your structure tables and then accidentally used the --structure-tables-key with sql-sync.

I don't think we can in good conscience leave sql-sync out of this patch; I think it's going to need a full refactoring.

scor’s picture

in that instance you'd still cause things to break if you used wildcards in your structure tables and then accidentally used the --structure-tables-key with sql-sync.

yes, the same way you break today if you use the common key and don't have a table accesslog in your database. In the end you have to know what your doing when you're customizing drushrc.php, depending on what functionnality of drush you're using. I could well define a --structure-tables-key key for sql-sync and another for sql-dump. In my case, actually, I only use sql-dump so I don't have that problem.

greg.1.anderson’s picture

True, but if someone started off with using only sql-dump and didn't understand the limitations, switching to remote sql syncs would be a bit surprising.

How about drush sql-dump --structure-tables-list=`mytables`, with 'mytables' replaced with a script (maybe drush script ...?) that processes your wildcards? That might get you by until sql-sync is refactored in a way that would allow your patch to work universally.

scor’s picture

Status: Needs work » Postponed (maintainer needs more info)

I'm giving up. Please re-open when Drush is ready for this...

greg.1.anderson’s picture

Status: Postponed (maintainer needs more info) » Closed (works as designed)
scor’s picture

in the meantime I've created a drush command dgb which (sadly) includes some forked functions from drush. I'm happy to align dgb with Drush whenever the issues above have been resolved.

greg.1.anderson’s picture

Status: Closed (works as designed) » Postponed

Setting to 'postponed' for later consideration.

lionstone’s picture

+1 for this feature. I'd like wildcards, too.

scor - Your patches fail for me, unfortunately - I'm running this in the base drush directory, is that right?

With the last patch doesn't work for me:

$ ~/drush$ patch -p0 <  698264_better_tables_key_11.patch 
patching file example.drushrc.php
Hunk #1 FAILED at 67.
1 out of 1 hunk FAILED -- saving rejects to file example.drushrc.php.rej
patching file commands/sql/sql.drush.inc
Hunk #1 succeeded at 68 with fuzz 1 (offset 1 line).
Hunk #2 FAILED at 312.
1 out of 2 hunks FAILED -- saving rejects to file commands/sql/sql.drush.inc.rej
scor’s picture

Component: Code » SQL
Status: Postponed » Active

Reopening to reconsider this feature for Drush 4.x. @Greg, I'll be happy to work on a patch for Drush HEAD and make this work for sql-sync, unless you think the situation has not changed since Drush 3?

greg.1.anderson’s picture

#716412: add a --delete-all-tables-first option for sql-sync ? is on my to-do list in the near future. sql-dump, sql-drop, sql-create and sql-import (the last two being new functions) will be rewritten to operate on a site alias provided as the first parameter. The site alias can be local or remote. @self is the default, so dump and drop will work as they do today. This refactor is likely to affect code here.

Also note that sqlite support is in-progress for drush in #698078: SQLite support; a lot of this code has already been committed.

I would welcome enhancements to structure-tables & c. with wildcards, but the patch would have to work with three databases and on local and remote targets (or at least not break horribly if used in an unsupported configuration).

My opinions; other maintainers would also have to agree to commit, as usual.

greg.1.anderson’s picture

#21 was a case of not quite meaning what I said. It's important to realize that there are multiple databases, but I think we'd be in danger of not making any progress if a patch had to work on all three databases before being committed.

It should work locally and remotely, though.

greg.1.anderson’s picture

@scor: I won't get to my sql-sync factor for a while. If you'd like, please feel free to work on this with the sql-sync code as-is. Just the mysql local and remote case is sufficient. Postgres already supports pattern matching in table names, so perhaps nothing is necessary for the pgsql case. (Extra credit: make the same syntax work in both mysql and pgsql.)

patcon’s picture

Was just playing around with this myself. Wildcards in structure-tables seems really useful. Hopefully it doesn't die in the queues :)

Anyone know the sql rewrite well enough to gauge how entry-level this patch rework would be?

greg.1.anderson’s picture

The main issue is if you use wildcards in a context where the database is remote (e.g. sql-sync), things don't work. Not too hard to fix if you know how to use drush_invoke_sitealias_args to call the sqlq command with a 'show tables;' command. Patches welcome.

dalin’s picture

I needed this to work, but I've never used sql-sync and I wouldn't know how to write code for it. This patch is basically a re-roll of #2 for Drush 4. I'm not even marking this as "Needs Review" since this is mostly a shim.

Wim Leers’s picture

Title: Better handling of structure-tables and skip-tables options » Better handling of structure-tables and skip-tables options (including cache_* support!)

I'm surprised there's so little interest in this neat feature! Appending to the title to hopefully get more attention.

greg.1.anderson’s picture

Assigned: greg.1.anderson » Unassigned

I'm unlikely to work on this in the near future.

greg.1.anderson’s picture

If anyone picks this up again in the future, it would be good to also cover the needs of #1446454: drush sql-dump : "mysqldump: Couldn't find table" if the structure-tables list is too inclusive, which are very similar to what is needed for wildcard handling.

scor’s picture

FYI, re #29, this feature is also supported in dgb.

Wim Leers’s picture

For people with the same concerns as me, i.e. for automated DB back-ups that grow unnecessarily large: you can alternatively use my forked version of AutoMySQLBackup, which supports wildcard table exclusion (e.g. exclude "cache%" to exclude "cache", "cache_filter", etc.): https://github.com/wimleers/AutoMySQLBackup.

Wim Leers’s picture

Assigned: Unassigned » Wim Leers
Status: Active » Needs review
FileSize
11.5 KB

Major reroll, against master (currently 6.0). It also cleanly applies to 5.1.

- Based on #11.
- Kept scor's _drush_sql_get_db_table_list(), but completely reimplemented it. Well, not quite; most of the logic is a verbatim copy from _drush_sql_drop(), with added logic to deal with remote DBs (for this, I struggled for a long time, but then stumbled upon drush_invoke_process($site_record, 'sqlq', array($query)), which makes it trivial!).
- Kept most of scor's drush_sql_expand_wildcard_tables(). However, I split out the validating of tables to a separate function, so that it can be reused elsewhere.
- I've added drush_sql_expand_table_selection(), which can be fed directly the output of drush_sql_get_table_selection() (which remains unchanged). This will then expand (which includes validation) the "skip-tables" and "structure-tables" table selections, and validate the "tables" tables. In fact, I only had to change one line of code in sql/sync.sql.inc (the one where the table selections happens)!
- This also finishes the TODO in _drush_sql_drop()!

It's now a *bliss* to use sql-dump and sql-sync. Finally it's no longer necessary to specify 'structure tables' on a per-site basis. No, now you can specify your exclusions *once*, for all sites, in a base-level drushrc.php or base.aliases.drushrc.php file and never have to deal with this again!

I spent the better part of my day on this. I've tested it thoroughly (local dump, sync remote to local) and it works flawlessly. This is my first Drush patch, so I'm probably not aware of all the conventions in the code, but I tried to find as many of the conventions as possible and follow them. I think the check for a missing site record may need to be handled differently, but other than that, it should mostly be good to go.

Let's get this done. It's a very useful feature that's been missing for too long!

greg.1.anderson’s picture

Status: Needs review » Needs work

This looks pretty good. I think it's reasonable to require Drush on the remote machine to get this feature. I shouldn't have said otherwise in #3. It looks like you took care of my other concerns vis-a-vis sql-sync not behaving in a surprising way when using this feature with remote databases, which is the most important thing.

In _drush_sql_get_db_table_list, you should use:

$result = drush_invoke_process($site_record, 'sqlq', array($query), array(), array('override-simulated' => TRUE));

If you do this, you will not need to save and restore the 'SIMULATED' context. Note that if the command is not remote, you can still use drush_invoke_process if you pass '@self' for the first parameter.

Speaking of contexts, it would be better to use drush_get_context / drush_set_context instead of static $db_tables.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
11.76 KB

I tried implementing your array('override-simulated' => TRUE) suggestion, but stumbled upon a problem: this prevents local queries from being executed. So I replaced the code I was using for local queries (which I got from scor's patch) with a similar line for local queries, replacing $site_record with '@self'.

This works fine as long as we're not running in SIMULATION context. If we do, we'll just get sql-query: SHOW TABLES;… even *with* array('override-simulated' => TRUE). I also tried array('integrate' => FALSE, 'override-simulated' => TRUE) because that's what everything else seems to be using, but without the intended effect, unfortunately.

So, in the above, I'll either need some help or we'll have to do it the way I did it right now.

The rerolled patch *does* use drush_(g|s)et_context() successfully.

greg.1.anderson’s picture

Assigned: Wim Leers » greg.1.anderson

Okay, this is looking really close; I'll see if I can carve out some time to fix override-simulated for you.

Wim Leers’s picture

Great :)

Wim Leers’s picture

Any news? :)

moshe weitzman’s picture

Status: Needs review » Needs work

Hopefully Greg can get to this as it will be quite useful.

greg.1.anderson’s picture

Assigned: greg.1.anderson » Unassigned
FileSize
12.59 KB

Here's a fix for override-simulated; seems that the simulated global option did not have the 'never-propagate' flag set on it, so it was being passed through by backend invoke.

Backend invoke is now returning the table list just fine; however, the rest of the patch doesn't seem to be working for me. Adding different table skip or table add options does not seem to affect the sql-dump command that is generated.

Wim Leers’s picture

Assigned: Unassigned » Wim Leers

Okay, so #39 includes a Drush core bugfix if I understand it correctly?

Assigning back to me, will work on this as soon as I find the time, which may not be very soon as I'm starting at Acquia on Monday :)

greg.1.anderson’s picture

Yes, the core fix is here:

-  $options['simulate']           = array('short-form' => 's', 'context' => 'DRUSH_SIMULATE', 'description' => "Simulate all relevant actions (don't actually change the system).");
+  $options['simulate']           = array('short-form' => 's', 'context' => 'DRUSH_SIMULATE', 'never-propagate' => TRUE, 'description' => "Simulate all relevant actions (don't actually change the system).");

Congratulations on your new job. Hope this change brings you to the Drush queue some more. :)

Thithi32’s picture

Hi,
I'm looking for this feature in order to exclude 'field_data_*' and 'field_revision_*' .
Is the patch ok to use? The syntax will be with * or with % for the wildcard?
Hope it'll work.
Thanks!
Thierry

greg.1.anderson’s picture

#39 does not yet work correctly; if you want to use this feature, you will need to either wait, or help to fix the patch.

Thithi32’s picture

thanks Greg for the answer... no time nor enough experience to deal with this kind of issue, so I'll wait and keep watching the issue

amontero’s picture

juampynr’s picture

Version: » 8.x-6.x-dev
Status: Needs work » Needs review

Here is an alternative patch where I am altering sql sync through drush_hook_pre_sql_sync() to add a list of tables form the source database. It could be more generic but it may serve as a starting point.

juampynr’s picture

Doh! Here it is.

greg.1.anderson’s picture

Status: Needs review » Needs work

I think I like this approach better. That example won't run on postgres, though. If you make your query "show tables;" and then filter with php, it would be more portable. Better yet, use _drush_sql_get_db_table_list() from the patch in #39.

juampynr’s picture

I have got a question regarding the _drush_sql_get_db_table_list() function at #39 in order to use it at #47:

+++ b/commands/sql/sql.drush.incundefined
@@ -527,6 +604,96 @@ function _drush_sql_get_table_list($option_name) {
+function _drush_sql_get_db_table_list($db_spec, $site_record = NULL) {

How do I obtain the $db_spec information of the $source site alias when I am at drush_hook_pre_sql_sync()?

juampynr’s picture

Status: Needs work » Needs review
FileSize
1.9 KB

As an alternative, here is another patch using SHOW tables and filtering with PHP.

Wim Leers’s picture

But this approach requires code instead of simple, declarative configuration?

moshe weitzman’s picture

Assigned: Wim Leers » greg.1.anderson

Seems like Greg is reviewing patches here. Thanks!

moshe weitzman’s picture

Anyone want to move this along?

moshe weitzman’s picture

Status: Needs review » Needs work
juampynr’s picture

@greg.1.anderson, can you provide more feedback on whether this could stay as an example like the patch at #50 or should it be something configurable such as #39?

greg.1.anderson’s picture

#50 is less than ideal, as it uses --extra to pass a mysql-specfic flag, and hardcodes "/^foo_/" as the pattern. I think it would be fine to commit an example first, and make something configurable later, but even the example should be database-independent, and should have some way to pass paramters. See examples/sync_enable.drush.inc, which uses hook_drush_help_alter to add commandline options to sql-sync. Leave off the -N parameter, and just skip the first line of the result containing the column name.

juampynr’s picture

#50 is an example under examples directory. I will see how can I make it database agnostic based on your suggestions.

greg.1.anderson’s picture

Sorry, I garbled what I was trying to say in #56. What I meant was, it would be fine to make an example under the examples directory if it was configurable. Integrating it into the core sql-sync code is optional, although I'm not sure why we wouldn't want to do that.

Wim Leers’s picture

As I said in #51:

But this approach requires code instead of simple, declarative configuration?

This is why I believe the approach I took (see #34 & #39) is superior: much simpler, less worries.

greg.1.anderson’s picture

Continuations from #39 would still be welcome.

Dave Reid’s picture

Having a solution in Drush by default like #39 would be really great. This is something I've always been meaning to file an issue for, and glad that it has been filed already.

moshe weitzman’s picture

Assigned: greg.1.anderson » Unassigned

Anyone is welcome to move this along

Dave Reid’s picture

So it sounds like we want to continue #39. There is an open question in #49 for those more familiar with the internals, and I also ran into the same issue when implementing this.

greg.1.anderson’s picture

Hm, not sure why #49 never got answered. The code may have changed since it was asked, but here is how it currently works.

In drush_sql_sync_init(), the following is executed:

  sitealias_get_databases_from_record($source_settings);
  sitealias_get_databases_from_record($destination_settings);

This will look up the $db_spec record from the source and destination alias records. This info is used later by sql-sync; example usage is as follows:

  // recover the source settings
  $source_settings = drush_sitealias_overlay_options(drush_sitealias_get_record($source), 'source-');
  $source_db_url = drush_sitealias_get_db_spec($source_settings, FALSE, 'source-');

You could run these same two lines in the pre-sql-sync hook to get the $db_spec. Note the extreme evil-ness going on here: the code is putting a $db_spec object into a variable that is called $xxx_db_url. Sorry -- I don't know how that happened. It would be very helpful to rename all of these _db_url variables to _db_spec; that would make things much easier to understand.

posulliv’s picture

Attached is a re-roll of the patch from #39. When testing this with the following in drushrc.php:

$options['structure-tables']['common'] = array('cache', 'cache_*', 'history', 'search_*', 'sessions', 'watchdog');

And a sql-dump command of:

drush sql-dump --result-file=/tmp/dump.sql --structure-tables-key=common 

The mysqldump command generated is:

mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q  drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.history --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_total --ignore-table=drupal.sessions --ignore-table=drupal.watchdog && mysqldump --no-data  --no-autocommit --single-transaction --opt -Q  drupal --host=localhost --user=drupal --password=drupal cache cache_block cache_bootstrap cache_field cache_filter cache_form cache_image cache_menu cache_page cache_path cache_update history search_dataset search_index search_node_links search_total sessions watchdog >> /tmp/dump.sqlCalling system(mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q  drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.history --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_total --ignore-table=drupal.sessions --ignore-table=drupal.watchdog && mysqldump --no-data  --no-autocommit --single-transaction --opt -Q  drupal --host=localhost --user=drupal --password=drupal cache cache_block cache_bootstrap cache_field cache_filter cache_form cache_image cache_menu cache_page cache_path cache_update history search_dataset search_index search_node_links search_total sessions watchdog >> /tmp/dump.sql

Testing the same thing when skipping tables. Assume the following is in drushrc.php:

$options['skip-tables']['common'] = array('cache*', 'search_*');

Running sql-dump like:

drush sql-dump --result-file=/tmp/dump.sql --skip-tables-key=common 

Results in a mysqldump command like:

mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q  drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_totalCalling system(mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q  drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_total

The patch seems to be working for me. Is there any specific things in particular that did not work per your comment in #39?

juampynr’s picture

@posulliv, please remove trailing white spaces from your patch at #65.

posulliv’s picture

Sorry about that. My IDE looks to have inserted trailing white spaces on new lines.

Attached is an updated version.

juampynr’s picture

+++ b/commands/sql/sql.drush.incundefined
@@ -355,6 +355,83 @@ function drush_sql_get_table_selection() {
+function drush_sql_expand_table_selection($table_selection, $db_spec, $site_record = NULL) {

Looks like this needs a blank line in between plus a docblock that explains what the function does.

+++ b/commands/sql/sql.drush.incundefined
@@ -355,6 +355,83 @@ function drush_sql_get_table_selection() {
+function drush_sql_get_expanded_table_selection($db_spec, $site_record = NULL) {

Docblock please.

+++ b/commands/sql/sql.drush.incundefined
@@ -594,6 +670,95 @@ function _drush_sql_get_table_list($option_name) {
+      exit;

Since this is an error, we may want to do exit(1);

posulliv’s picture

Thanks for the review comments @juampy

Latest patch addresses the issues brought up in #68.

moshe weitzman’s picture

Assigned: Unassigned » greg.1.anderson

I doubt that exit(1) is really the right thing to do there. At minimum, we should be calling drush_set_error() instead of drush_log(). Perhaps Greg or juampy have an idea about that and can give this a final review.

juampynr’s picture

Re-rolling using drush_set_error() instead of exit(1).

juampynr’s picture

Here is my review:

  • Works fine for MySQL and PostgreSQL.
  • Support for SQLite has not been implemented (see drush_sql_build_dump_command()).
  • I have not tested it with SQL Server nor Oracle, but I asked the community on Twitter claiming help.
  • I have changed the suggested structure tables list so it uses cache* instead of cache, cache_*.
moshe weitzman’s picture

Issue tags: +Needs tests

Seems like we should write tests for this.

moshe weitzman’s picture

Do we really need the "caching" we do with drush_set_context('DRUSH_SQL_DB_TABLES', $db_tables);?

There seems to be some intermingling of the expanding of table names and the filtering of unknown table names. It would be good if those were sequential function calls. The mingling is happenning in drush_sql_expand_wildcard_tables(). I would prefer we expand, and then filter. As you can see, I think we should use the word 'filter' instead of 'validate'. This is minor stuff so if this refactoring gets too complex, we can skip it.

I'm OK with proceeding without tests for this.

posulliv’s picture

I personally don't see the need for the caching. The extra overhead with the caching removed is:

* 1 call to _drush_sql_get_scheme()
* 1 SQL query to retrieve list of tables
* 1 call to drush_invoke_process()
* N calls to preg_match_all where N is the number of tables in the database (only for sqlite)

Seems like most of the time for this operation would be spent performing the SQL dump. In this updated patch, I removed the caching.

Attached is an updated patch with an attempt at some refactoring as mentioned in #74.

I'd be very happy to add tests for this if you could give some pointers. Is there any existing tests for similar functionality that I could have a look at? Or do you have any suggestions?

greg.1.anderson’s picture

+1 for no caching.

For tests, look at tests/sqlSyncTest.php

posulliv’s picture

Latest patch has the re-factoring as mentioned in #75.

I added new tests for the sql-dump command in the sqlDumpTest class. I added a few basic tests of the sql-dump functionality along with 2 tests that specifically test the wildcard functionality.

This is my first time writing drush tests so apologies if I didn't write them in the normal way. I also wasn't sure what group these tests should go in. Any feedback would be very useful and happy to update them as needed.

Running these sql-dump tests locally, the results I get are:

Time: 01:15, Memory: 6.75Mb

OK (6 tests, 37 assertions)

It should also be noted that these tests are not exactly database independent, some of SQL syntax used is MySQL specific. If that is an issue, let me know and I can work on making it more database independent if possible.

greg.1.anderson’s picture

Tests should be database-independent; when this is not possible, then the tests should check the database being used, and skip it if it is incompatible with the code used in the test.

Use something like $this->markTestSkipped('SQL Sync does not apply to SQLite.'); to mark a test as skipped. That example is from sqlSyncTest.php.

posulliv’s picture

Latest patch updates the tests to be database independent.

juampynr’s picture

+++ b/tests/sqlDumpTest.phpundefined
@@ -0,0 +1,254 @@
+    $this->assertEquals(0, $new_row_count, 'watchdog table should be empty');

What is this for?

+++ b/tests/sqlDumpTest.phpundefined
@@ -0,0 +1,254 @@
+    print_r(UNISH_DB_URL);

Do we need this?

+++ b/tests/sqlDumpTest.phpundefined
@@ -0,0 +1,254 @@
+  protected function getTableList($wildcard = NULL) {

Don't we have a function for this already? Can we reuse it instead of copying it here?

greg.1.anderson’s picture

Status: Needs work » Needs review

The Drush functional tests do not have access to functions defined inside Drush. The print_r in sqlDumpTest is innocuous, but should probably be remove anyway. This is basically looking pretty good right now, but I have not had a chance to test it. If no one else has any objections, I'm okay if with committing it once it's RTBC.

posulliv’s picture

My bad about the print_r. Attached patch does not include it.

The assertion:

$this->assertEquals(0, $new_row_count, 'watchdog table should be empty');

Is testing the contents of the watchdog table after loading a dump file where the watchdog table was skipped. So basically, that test performs the following:

1) create dump file and skip the watchdog table
2) truncate the watchdog table
3) load the dump file created in step(1)
4) verify that the watchdog still contains no rows since it should not have been included in the dump created in step (1)

Wim Leers’s picture

Amazing steps forward in terms of test coverage! Great work :)

I'm sure everything works at least equally well as it did when I was working on this last year, so I didn't test the patch, nor debug the code. I just read the code and the comments. The tests look sane and cover the crucial scenarios.

Consequently, I *only* have nitpicks :)


+++ b/commands/sql/sql.drush.incundefined
@@ -353,6 +353,19 @@ function drush_sql_dump_execute() {
+ * The keys of the array are:
+ *   skip      - tables to be skipped completed in the dump
+ *   structure - tables to only have their structure i.e. DDL dumped
+ *   tables    - tables to have structure and data dumped

Nitpick: This is unlike all other such comments. Typically we do it like this:

The keys of the array are:
- 'skip': tables to…
- 'structure': tables to…
- 'tables': tables to…
+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+ * Given the table names in the input array that may contain the `*` wildcard,

s/`*` wildcard/wildcards (`*`)/

For consistency with other comments. Alternatively, don't change this, but change it elsewhere.

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+ * expand the table names so that the array returned only contains valid table

s/valid table names/valid table names, i.e. actual tables that exist rather than table names containing a wildcard./

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+ *   An array of table names where the table names can contain the

s/can/may/

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+ * @param $site_record

Doc missing.

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+function drush_sql_get_expanded_table_selection($db_spec, $site_record = NULL) {

No docs?

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+  // table name expansion based on `*` wildcard.

s/table/Table/

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+    // only deal with table names containing a wildcard.

s/only/Only/

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+ * Filters tables

Missing trailing period.

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,92 @@ function drush_sql_get_table_selection() {
+  // Get the existing table names contained in the current database.

"contained in the current database" sounds weird?

What about "Get all the table names in the current database."?

+++ b/commands/sql/sql.drush.incundefined
@@ -603,9 +724,90 @@ function _drush_sql_get_table_list($option_name) {
+  // Prepare the query to obtain the list of tables depending on the database type.

Beyond 80 col?

+++ b/tests/sqlDumpTest.phpundefined
@@ -0,0 +1,253 @@
+ * @file
+ *   Tests for sql-dump commands.

These should be on the same line?

+++ b/tests/sqlDumpTest.phpundefined
@@ -0,0 +1,253 @@
+class sqlDumpTest extends Drush_CommandTestCase {

Class name that beings with lower case, shouldn't this be upper case?

+++ b/tests/sqlDumpTest.phpundefined
@@ -0,0 +1,253 @@
+    // retrieve list of tables in the database

Capitalization and punctuation, here and elsewhere in the tests.

+++ b/tests/sqlDumpTest.phpundefined
@@ -0,0 +1,253 @@
+   * This is a MySQL specific test.

s/MySQL specific/MySQL-specific/

posulliv’s picture

Many thanks for the comments! Attached is an updated patch.

I applied everything mentioned except for the comment related to the @file docblock.

I was following the docblock standards so placed the description for the file on a new line. Happy to modify though if drush has different standards related to this.

Wim Leers’s picture

Awesome, thanks!

And no, clearly that document is wiser than I am; I was wrong on that :)

greg.1.anderson’s picture

Status: Needs review » Needs work

This is looking really good to me; good work here. Just a couple of questions.

-  if (is_null($db_spec)) {
-    $db_spec = _drush_sql_get_db_spec();
-  }

Why was this removed? Looks potentially dangerous to me. If this does go away, then you need to remove the = NULL from the $db_spec in the parameter list and add error checking. Seems best to just put it back.

+  if (isset($db_spec['remote-host'])) {
+    if (is_null($site_record)) {
+      return drush_set_error('DRUSH_SQL_NO_SITE_ALIAS', dt('Retrieving the table list for a remote DB, but the site record is missing.'));
+    }
+    $target_site = $site_record;
+  }

This seems less than ideal to me. The only reason that 'remote-host' exists in $db_spec is that sql-sync caches it there for convenience. Relying on $db_spec['remote-host'] for the target machine, while using $site_record for everything else is less than ideal. Seems to me that you should either cache the site record in $db_spec at the same time that 'remote-host' is being inserted, or go the other way and remove the $db_spec['remote-host'] caching, and go with parameter passing only.

Does this work with sql-sync? I don't see how the filtering is done in the remote case, since sql-sync calls drush_sql_get_table_selection() to get the list of tables to include or exclude (no site record passed here), and then uses drush_sql_build_dump_command() to build the sql dump command -- again, no site record passed in. If you went the route of caching the site record in $db_spec, seems like you'd be most of the way there vis-a-vis sql-sync support is concerned.

posulliv’s picture

Attached is an updated patch. This patch now also includes a test of the wildcard functionality when calling sql-sync. This is contained within the testWithSqlSync test.

greg.1.anderson’s picture

Updates look good.

Sorry, but I missed this in the last review:

+  // Shift off the "Calling system(…);" line at the top of the output.
+  array_shift($tables_raw);

If I'm not mistaken, it's just the comment that is wrong here -- that is, you're shifting off the column name ("tablename" or "Tables_in_xxx", depending on your database) -- there isn't a "Calling system(...)" in the invoke process output, is there? Shifting off the column header is fine, but if there is debugging lines in the output, that should be addressed.

posulliv’s picture

Nice catch! Right you are, the comment is in-correct. Attached is an updated patch with correct comment.

moshe weitzman’s picture

I reviewed a patch from a couple days ago, but didn't get a chance to post this review. Hope it still helps.

I don't see that #74 has been addressed. Let me know if I should be more clear.

+++ b/commands/sql/sql.drush.incundefined
@@ -365,6 +378,109 @@ function drush_sql_get_table_selection() {
+function drush_sql_get_expanded_table_selection($db_spec, $site_record = NULL) {

this function name sounds a lot like drush_sql_expand_table_selection(). I wonder if we could make them more different?

Not applicable anymore?: // Save $query to a tmp file if needed. We will redirect it in.

Lets capitalize Oracle also: + // Is this an oracle query?

-# $options['structure-tables']['common'] = array('cache', 'cache_filter', 'cache_menu', 'cache_page', 'history', 'sessions', 'watchdog');
+# $options['structure-tables']['common'] = array('cache*', 'history', 'search_*', 'sessions', 'watchdog');

I think it is safer if we default to cache and cache_* instead of a single cache* in drushrc.php

The tests need work.

  1. Lets try to setup only one Drupal site, for the whole class, and not one per method. This is quite a bit faster, but less modular. I'm OK with less modular here
  2. The call to setUpDrupal() should not list 'standard' or any other profile. In that case, setUpDrupal() will use faster profiles.
  3. Having said all that, I'd be even happy with a unit test (Drush_UnitTestCase) for the new wildcard expansion functions and just leave archive-dump and sql-sync testing to archiveDumpCase and sqlSyncTest (for example).
posulliv’s picture

Thanks for the comments! Attached is an updated patch with some updates based on comments. Some points I did not address yet are discussed below.

I thought I had addressed your comments in #74 but perhaps I misunderstood. There is now 2 separate functions: 1 for expanding wildcards (drush_sql_expand_wildcard_tables) and another for filtering tables (drush_sql_filter_tables). These functions are called sequentially from the drush_sql_expand_and_filter_tables function. The only reason I created the drush_sql_expand_and_filter_tables function was to avoid repeating the same code pattern 3 different times in the drush_sql_expand_table_selection function.

this function name sounds a lot like drush_sql_expand_table_selection(). I wonder if we could make them more different?

Perhaps drush_sql_expand_selection() is not even needed? I don't really see any benefit to having that functionality in a separate function.

Lets try to setup only one Drupal site, for the whole class, and not one per method. This is quite a bit faster, but less modular. I'm OK with less modular here

I could not figure out how to do this. Ideally, I want to use setUpBeforeClass() to create a site once for the entire class but since this is a static function, its not possible to call setUpDrupal() from it. What I have in this updated patch is the sites created in the setUp() method which is called before every test is run.

Not ideal but was the best I could do without more input into how I could create a site just the once for a class and reuse it. I looked at other tests (such as coreTest.php) and it calls setUpDrupal at the start of every test as well.

Having said all that, I'd be even happy with a unit test (Drush_UnitTestCase) for the new wildcard expansion functions and just leave archive-dump and sql-sync testing to archiveDumpCase and sqlSyncTest (for example).

I added 2 simple unit tests for this as you specified.

moshe weitzman’s picture

I think the only way setup less drupal sites is to consolidate into fewer methods, and just use code comments to describe whats being tested. i personally prefer speed over modularity in this case.

I think you can move any interesting bits from testWithSqlSync to existing sqlSyncTest class. Maybe integrate with an existing method if possible.

Sure, lets remove the drush_sql_expand_selection() function.

I prefer this for sql-dump test: $dump_dest = UNISH_SANDBOX . DIRECTORY_SEPARATOR . 'createddumpfile.sql';

posulliv’s picture

Thanks for the clarifications. Updated patch based on your comments is attached.

There is now only 1 test method with the various assertions testing different functionality delimited by comments.

I didn't move the 1 test of sql-dump with sql-sync to the sql sync tests because they use some helper functions defined in the sqlDumpTest class.

moshe weitzman’s picture

FileSize
6.31 KB
28.86 KB

I ran this against the test suite and it is failing. I put some time into improving the patch but I just can't continue with it. You can see my improvements in the attached patch and interdiff.

For help on running the Drush test suite, called Unish, see /tests/README.txt.

moshe weitzman’s picture

Status: Needs work » Fixed
FileSize
24.42 KB

OK, I just had to keep working on this and committed it to 8.x-6.x. The interdiff is attached here, against #93 again. I made the following changes since #93

  1. Added hidden db-spec option to sql-query command and am passing that in our new drush_invoke_process which gets the table list from a DB. The drush_invoke_process() was failing during a site-install of a D7 site since the $databases array was not yet written to settings.php and thus sql-query had no idea what database to operate against. This was the hardest thing to debug.
  2. Renamed _drush_sql_get_table_list() to _drush_sql_get_raw_table_list()
  3. I factored out $db_tables = _drush_sql_get_db_table_list($db_spec, $site_record); from _drush_sql_expand_and_filter_tables() to the caller which passes it as a param. This lets call drush_invoke_process() only once instead of up to 3 times.
  4. I was removed an extra array_shift($tables_raw); when getting table list.
  5. Simplified sqlDumpTest.php a lot. I started to move sql-sync wildcard test to the existing sqlSyncTest.php but left it as a @todo.
  6. We lost the sql-dump --gzip test but no big deal.
  7. Misc doc improvements that I found along the way.
Wim Leers’s picture

Issue tags: -Needs tests

Yayyy! :)

posulliv’s picture

oh wow, thanks a lot Moshe! I was working on this today and just noticed your comment.

Thanks for all the review comments everyone in helping me here. Learned a lot!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

mrP’s picture

this is just awesome. thanks to all!