| Project: | Drush |
| Version: | 8.x-6.x-dev |
| Component: | SQL |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
Issue Summary
(1) Tired of having to manually add every new cache_* table to your drushrc.php file? What if drush allowed you to just have cache_* in the structure-tables array to include all the cache tables?
(2) drush sql-dump will throw warnings if --structure-tables-key contains some inexisting table names. For instance, if you use the default --structure-tables-key=common and haven't install the search module, you will get warnings for the 'search_dataset', 'search_index' and 'search_total' tables (same for the statistics module).
This patch fixes both issues above by adding support for (1) table name expansion with the * wildcard and (2) removing non existing tables from _drush_sql_get_table_list(), ensuring that all the table names used by sql-dump and sql-sync actually exist.
The immediate benefit is that we could safely bundle drush with a default set of tables for --structure-tables-key=common without the user having to always hack drushrc.php.
First draft patch attached.
| Attachment | Size |
|---|---|
| drush_better_db_tables.patch | 2.85 KB |
Comments
#1
Useful stuff. Assigning to Greg for review.
#2
and now including some hunks to leverage the existing example.drushrc.php as default setting for structure-tables common. If people rename/move this file somewhere else, it should still work and pick up their custom file.
#3
This is good stuff, but it does have some problems. I'll start with the first one. sql-dump is called by sql-sync. You changed the bootstrap of sql-dump to DRUSH_BOOTSTRAP_DRUPAL_DATABASE, but left sql-sync at DRUSH_BOOTSTRAP_DRUPAL_CONFIGURATION. This means that
_drush_sql_get_db_table_list()will fail when called indirectly from sql-sync. You can't get around this by changing the bootstrap level of sql-sync, though, because_drush_sql_get_db_table_list()operates on the bootstrapped database, but sql-sync wants the information to come from the databases associated with its source and destination parameters, so there's really no good way to to use DRUSH_BOOTSTRAP_DRUPAL_DATABASE with sql-dump.This means that you should just call drush_sql_query("show tables;") to get the table list, and just deal with the fact that you get back strings instead of php objects.
However, even at this point you are not yet done, because the databases that sql-sync is operating on might be remote. Currently, sql-sync will use sql-dump to build shell commands that it sends to a remote server via ssh for execution. sql-sync currently does not require drush to be installed on the remote machine, a facility that would have to be duplicated in sql-query in order for this to work with remote sites.
That would be a bit of work, but at the moment I can't think of a better way.
#4
Two more notes:
drush sql-dump --tables-list="cache_*". Also, according to Lullabot, you can use the % character as a wildcard in mysqldump statements. I haven't tried this myself; you'd have to rewrite the drush sql-dump code for mysql a bit to use --ignore-table if you wanted this to work, I think.#5
this idea is that they don't need to do that to get the default settings of example.drushrc.php (but this might be better to be implemented as a default in the actual drush code).
#6
I definately don't want to read in example.drushrc.org. Could go either way on whether we automatically use the common key in structure-tables if no other is supplied. seems like a good idea.
#7
I've refactored _drush_sql_get_db_table_list() so we don't require a bootstrapped database, but instead only use a show table; query via the command line. Maybe we could refactor _drush_sql_query() so it accepts a file instead of outputting the result in the console. The problem it has right now is that it expects a database connection to support prefixes. I'd be interested to know how this plays wrt to #3.
I've tried this on 2 different machines, and none of them accepted the wildcard. (MySQL dump 10.11, MySQL server 5.0.51a-24+lenny2 and MySQL dump 10.13, MySQL server 5.1.37-log). We should not rely on it, especially since this is non documented feature of mysqldump.
We're not reading example.drushrc.php anymore.
#8
A little closer, but...
In this part of your code:
function _drush_sql_get_db_table_list() {
static $db_tables = array();
if (!empty($db_tables)) {
return $db_tables;
}
$scheme = _drush_sql_get_scheme($db_spec);
$db_spec is not passed in, so it is null, so it will refer to the current database. This leads to the same problem as in #2.
Currently, drush_sql_dump does not pass $db_spec into _drush_sql_get_table_list, so _drush_sql_get_db_table_list does not have the information it needs to tell which database to use. Even if this were fixed, sql-sync does not pass the information that sql-dump would need to determine if the $db_spec refers to a database on a remote machine.
I am planning on refactoring sql-sync so that the remote machine handling capabilities of the dump phase are moved to sql-dump. This would help your patch. I'm not sure when I'll get to it, though, so feel free to attempt it if you wish.
The refactoring you suggest for sql-query would also help this patch, I think.
#9
Let's simply support this for the sql-dump command only. I've added this condition in _drush_sql_get_table_list(), so the behavior of the sql-sync is not changed, and table name expansion and validation only happens on the sql-dump command.
#10
I'm ambivalent about leaving sql-sync out.
I don't like
if ($command['command'] == 'sql-dump'). If this is going to be special-cased, it needs to be a little cleaner (i.e. by passing args rather than checking command strings).#11
Changed the condition to allow an argument instead:
drush sql-dump --structure-tables-key=common --friendly-table-names#12
Well, that is a little cleaner than the patch before, but I actually meant function arguments, not command line arguments. Sorry for the confusion. The problem with the patch in #11 is that you can still pass --friendly-table-names to sql-sync, and things will break. You could perhaps set --friendly-table-names in drush_sql_dump_execute, and make sure that it is not set in sql-sync, but in that instance you'd still cause things to break if you used wildcards in your structure tables and then accidentally used the --structure-tables-key with sql-sync.
I don't think we can in good conscience leave sql-sync out of this patch; I think it's going to need a full refactoring.
#13
yes, the same way you break today if you use the common key and don't have a table accesslog in your database. In the end you have to know what your doing when you're customizing drushrc.php, depending on what functionnality of drush you're using. I could well define a --structure-tables-key key for sql-sync and another for sql-dump. In my case, actually, I only use sql-dump so I don't have that problem.
#14
True, but if someone started off with using only sql-dump and didn't understand the limitations, switching to remote sql syncs would be a bit surprising.
How about
drush sql-dump --structure-tables-list=`mytables`, with 'mytables' replaced with a script (maybedrush script ...?) that processes your wildcards? That might get you by until sql-sync is refactored in a way that would allow your patch to work universally.#15
I'm giving up. Please re-open when Drush is ready for this...
#16
#17
in the meantime I've created a drush command dgb which (sadly) includes some forked functions from drush. I'm happy to align dgb with Drush whenever the issues above have been resolved.
#18
Setting to 'postponed' for later consideration.
#19
+1 for this feature. I'd like wildcards, too.
scor - Your patches fail for me, unfortunately - I'm running this in the base drush directory, is that right?
With the last patch doesn't work for me:
$ ~/drush$ patch -p0 < 698264_better_tables_key_11.patchpatching file example.drushrc.php
Hunk #1 FAILED at 67.
1 out of 1 hunk FAILED -- saving rejects to file example.drushrc.php.rej
patching file commands/sql/sql.drush.inc
Hunk #1 succeeded at 68 with fuzz 1 (offset 1 line).
Hunk #2 FAILED at 312.
1 out of 2 hunks FAILED -- saving rejects to file commands/sql/sql.drush.inc.rej
#20
Reopening to reconsider this feature for Drush 4.x. @Greg, I'll be happy to work on a patch for Drush HEAD and make this work for sql-sync, unless you think the situation has not changed since Drush 3?
#21
#716412: add a --delete-all-tables-first option for sql-sync ? is on my to-do list in the near future. sql-dump, sql-drop, sql-create and sql-import (the last two being new functions) will be rewritten to operate on a site alias provided as the first parameter. The site alias can be local or remote. @self is the default, so dump and drop will work as they do today. This refactor is likely to affect code here.
Also note that sqlite support is in-progress for drush in #698078: SQLite support; a lot of this code has already been committed.
I would welcome enhancements to structure-tables & c. with wildcards, but the patch would have to work with three databases and on local and remote targets (or at least not break horribly if used in an unsupported configuration).
My opinions; other maintainers would also have to agree to commit, as usual.
#22
#21 was a case of not quite meaning what I said. It's important to realize that there are multiple databases, but I think we'd be in danger of not making any progress if a patch had to work on all three databases before being committed.
It should work locally and remotely, though.
#23
@scor: I won't get to my sql-sync factor for a while. If you'd like, please feel free to work on this with the sql-sync code as-is. Just the mysql local and remote case is sufficient. Postgres already supports pattern matching in table names, so perhaps nothing is necessary for the pgsql case. (Extra credit: make the same syntax work in both mysql and pgsql.)
#24
Was just playing around with this myself. Wildcards in structure-tables seems really useful. Hopefully it doesn't die in the queues :)
Anyone know the sql rewrite well enough to gauge how entry-level this patch rework would be?
#25
The main issue is if you use wildcards in a context where the database is remote (e.g. sql-sync), things don't work. Not too hard to fix if you know how to use drush_invoke_sitealias_args to call the sqlq command with a 'show tables;' command. Patches welcome.
#26
I needed this to work, but I've never used sql-sync and I wouldn't know how to write code for it. This patch is basically a re-roll of #2 for Drush 4. I'm not even marking this as "Needs Review" since this is mostly a shim.
#27
I'm surprised there's so little interest in this neat feature! Appending to the title to hopefully get more attention.
#28
I'm unlikely to work on this in the near future.
#29
If anyone picks this up again in the future, it would be good to also cover the needs of #1446454: drush sql-dump : "mysqldump: Couldn't find table" if the structure-tables list is too inclusive, which are very similar to what is needed for wildcard handling.
#30
FYI, re #29, this feature is also supported in dgb.
#31
For people with the same concerns as me, i.e. for automated DB back-ups that grow unnecessarily large: you can alternatively use my forked version of AutoMySQLBackup, which supports wildcard table exclusion (e.g. exclude "cache%" to exclude "cache", "cache_filter", etc.): https://github.com/wimleers/AutoMySQLBackup.
#32
Major reroll, against
master(currently 6.0). It also cleanly applies to 5.1.- Based on #11.
- Kept scor's
_drush_sql_get_db_table_list(), but completely reimplemented it. Well, not quite; most of the logic is a verbatim copy from_drush_sql_drop(), with added logic to deal with remote DBs (for this, I struggled for a long time, but then stumbled upondrush_invoke_process($site_record, 'sqlq', array($query)), which makes it trivial!).- Kept most of scor's
drush_sql_expand_wildcard_tables(). However, I split out the validating of tables to a separate function, so that it can be reused elsewhere.- I've added
drush_sql_expand_table_selection(), which can be fed directly the output ofdrush_sql_get_table_selection()(which remains unchanged). This will then expand (which includes validation) the "skip-tables" and "structure-tables" table selections, and validate the "tables" tables. In fact, I only had to change one line of code insql/sync.sql.inc(the one where the table selections happens)!- This also finishes the TODO in
_drush_sql_drop()!It's now a *bliss* to use
sql-dumpandsql-sync. Finally it's no longer necessary to specify 'structure tables' on a per-site basis. No, now you can specify your exclusions *once*, for all sites, in a base-leveldrushrc.phporbase.aliases.drushrc.phpfile and never have to deal with this again!I spent the better part of my day on this. I've tested it thoroughly (local dump, sync remote to local) and it works flawlessly. This is my first Drush patch, so I'm probably not aware of all the conventions in the code, but I tried to find as many of the conventions as possible and follow them. I think the check for a missing site record may need to be handled differently, but other than that, it should mostly be good to go.
Let's get this done. It's a very useful feature that's been missing for too long!
#33
This looks pretty good. I think it's reasonable to require Drush on the remote machine to get this feature. I shouldn't have said otherwise in #3. It looks like you took care of my other concerns vis-a-vis sql-sync not behaving in a surprising way when using this feature with remote databases, which is the most important thing.
In
_drush_sql_get_db_table_list, you should use:$result = drush_invoke_process($site_record, 'sqlq', array($query), array(), array('override-simulated' => TRUE));If you do this, you will not need to save and restore the 'SIMULATED' context. Note that if the command is not remote, you can still use drush_invoke_process if you pass '@self' for the first parameter.
Speaking of contexts, it would be better to use drush_get_context / drush_set_context instead of
static $db_tables.#34
I tried implementing your
array('override-simulated' => TRUE)suggestion, but stumbled upon a problem: this prevents local queries from being executed. So I replaced the code I was using for local queries (which I got from scor's patch) with a similar line for local queries, replacing$site_recordwith'@self'.This works fine as long as we're not running in SIMULATION context. If we do, we'll just get
sql-query: SHOW TABLES;… even *with*array('override-simulated' => TRUE). I also triedarray('integrate' => FALSE, 'override-simulated' => TRUE)because that's what everything else seems to be using, but without the intended effect, unfortunately.So, in the above, I'll either need some help or we'll have to do it the way I did it right now.
The rerolled patch *does* use
drush_(g|s)et_context()successfully.#35
Okay, this is looking really close; I'll see if I can carve out some time to fix override-simulated for you.
#36
Great :)
#37
Any news? :)
#38
Hopefully Greg can get to this as it will be quite useful.
#39
Here's a fix for override-simulated; seems that the simulated global option did not have the 'never-propagate' flag set on it, so it was being passed through by backend invoke.
Backend invoke is now returning the table list just fine; however, the rest of the patch doesn't seem to be working for me. Adding different table skip or table add options does not seem to affect the sql-dump command that is generated.
#40
Okay, so #39 includes a Drush core bugfix if I understand it correctly?
Assigning back to me, will work on this as soon as I find the time, which may not be very soon as I'm starting at Acquia on Monday :)
#41
Yes, the core fix is here:
- $options['simulate'] = array('short-form' => 's', 'context' => 'DRUSH_SIMULATE', 'description' => "Simulate all relevant actions (don't actually change the system).");+ $options['simulate'] = array('short-form' => 's', 'context' => 'DRUSH_SIMULATE', 'never-propagate' => TRUE, 'description' => "Simulate all relevant actions (don't actually change the system).");
Congratulations on your new job. Hope this change brings you to the Drush queue some more. :)
#42
Hi,
I'm looking for this feature in order to exclude 'field_data_*' and 'field_revision_*' .
Is the patch ok to use? The syntax will be with * or with % for the wildcard?
Hope it'll work.
Thanks!
Thierry
#43
#39 does not yet work correctly; if you want to use this feature, you will need to either wait, or help to fix the patch.
#44
thanks Greg for the answer... no time nor enough experience to deal with this kind of issue, so I'll wait and keep watching the issue
#45
I would be thankful to get feedback on this alternative/complemetary approach:
#1811234: Add option to automatically find all Drupal cache tables for skip/structure table lists (via hook_flush_caches)
#46
Here is an alternative patch where I am altering sql sync through drush_hook_pre_sql_sync() to add a list of tables form the source database. It could be more generic but it may serve as a starting point.
#47
Doh! Here it is.
#48
I think I like this approach better. That example won't run on postgres, though. If you make your query "show tables;" and then filter with php, it would be more portable. Better yet, use _drush_sql_get_db_table_list() from the patch in #39.
#49
I have got a question regarding the _drush_sql_get_db_table_list() function at #39 in order to use it at #47:
+++ b/commands/sql/sql.drush.incundefined@@ -527,6 +604,96 @@ function _drush_sql_get_table_list($option_name) {
+function _drush_sql_get_db_table_list($db_spec, $site_record = NULL) {
How do I obtain the $db_spec information of the $source site alias when I am at drush_hook_pre_sql_sync()?
#50
As an alternative, here is another patch using SHOW tables and filtering with PHP.
#51
But this approach requires code instead of simple, declarative configuration?
#52
Seems like Greg is reviewing patches here. Thanks!
#53
Anyone want to move this along?
#54
#55
@greg.1.anderson, can you provide more feedback on whether this could stay as an example like the patch at #50 or should it be something configurable such as #39?
#56
#50 is less than ideal, as it uses --extra to pass a mysql-specfic flag, and hardcodes "/^foo_/" as the pattern. I think it would be fine to commit an example first, and make something configurable later, but even the example should be database-independent, and should have some way to pass paramters. See examples/sync_enable.drush.inc, which uses hook_drush_help_alter to add commandline options to sql-sync. Leave off the -N parameter, and just skip the first line of the result containing the column name.
#57
#50 is an example under examples directory. I will see how can I make it database agnostic based on your suggestions.
#58
Sorry, I garbled what I was trying to say in #56. What I meant was, it would be fine to make an example under the examples directory if it was configurable. Integrating it into the core sql-sync code is optional, although I'm not sure why we wouldn't want to do that.
#59
As I said in #51:
This is why I believe the approach I took (see #34 & #39) is superior: much simpler, less worries.
#60
Continuations from #39 would still be welcome.
#61
Having a solution in Drush by default like #39 would be really great. This is something I've always been meaning to file an issue for, and glad that it has been filed already.
#62
Anyone is welcome to move this along
#63
So it sounds like we want to continue #39. There is an open question in #49 for those more familiar with the internals, and I also ran into the same issue when implementing this.
#64
Hm, not sure why #49 never got answered. The code may have changed since it was asked, but here is how it currently works.
In drush_sql_sync_init(), the following is executed:
sitealias_get_databases_from_record($source_settings);sitealias_get_databases_from_record($destination_settings);
This will look up the $db_spec record from the source and destination alias records. This info is used later by sql-sync; example usage is as follows:
// recover the source settings$source_settings = drush_sitealias_overlay_options(drush_sitealias_get_record($source), 'source-');
$source_db_url = drush_sitealias_get_db_spec($source_settings, FALSE, 'source-');
You could run these same two lines in the pre-sql-sync hook to get the $db_spec. Note the extreme evil-ness going on here: the code is putting a $db_spec object into a variable that is called $xxx_db_url. Sorry -- I don't know how that happened. It would be very helpful to rename all of these _db_url variables to _db_spec; that would make things much easier to understand.
#65
Attached is a re-roll of the patch from #39. When testing this with the following in
drushrc.php:$options['structure-tables']['common'] = array('cache', 'cache_*', 'history', 'search_*', 'sessions', 'watchdog');And a
sql-dumpcommand of:drush sql-dump --result-file=/tmp/dump.sql --structure-tables-key=commonThe
mysqldumpcommand generated is:mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.history --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_total --ignore-table=drupal.sessions --ignore-table=drupal.watchdog && mysqldump --no-data --no-autocommit --single-transaction --opt -Q drupal --host=localhost --user=drupal --password=drupal cache cache_block cache_bootstrap cache_field cache_filter cache_form cache_image cache_menu cache_page cache_path cache_update history search_dataset search_index search_node_links search_total sessions watchdog >> /tmp/dump.sqlCalling system(mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.history --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_total --ignore-table=drupal.sessions --ignore-table=drupal.watchdog && mysqldump --no-data --no-autocommit --single-transaction --opt -Q drupal --host=localhost --user=drupal --password=drupal cache cache_block cache_bootstrap cache_field cache_filter cache_form cache_image cache_menu cache_page cache_path cache_update history search_dataset search_index search_node_links search_total sessions watchdog >> /tmp/dump.sqlTesting the same thing when skipping tables. Assume the following is in
drushrc.php:$options['skip-tables']['common'] = array('cache*', 'search_*');Running
sql-dumplike:drush sql-dump --result-file=/tmp/dump.sql --skip-tables-key=commonResults in a
mysqldumpcommand like:mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_totalCalling system(mysqldump --result-file /tmp/dump.sql --no-autocommit --single-transaction --opt -Q drupal --host=localhost --user=drupal --password=drupal --ignore-table=drupal.cache --ignore-table=drupal.cache_block --ignore-table=drupal.cache_bootstrap --ignore-table=drupal.cache_field --ignore-table=drupal.cache_filter --ignore-table=drupal.cache_form --ignore-table=drupal.cache_image --ignore-table=drupal.cache_menu --ignore-table=drupal.cache_page --ignore-table=drupal.cache_path --ignore-table=drupal.cache_update --ignore-table=drupal.search_dataset --ignore-table=drupal.search_index --ignore-table=drupal.search_node_links --ignore-table=drupal.search_totalThe patch seems to be working for me. Is there any specific things in particular that did not work per your comment in #39?
#66
@posulliv, please remove trailing white spaces from your patch at #65.
#67
Sorry about that. My IDE looks to have inserted trailing white spaces on new lines.
Attached is an updated version.
#68
+++ b/commands/sql/sql.drush.incundefined@@ -355,6 +355,83 @@ function drush_sql_get_table_selection() {
+function drush_sql_expand_table_selection($table_selection, $db_spec, $site_record = NULL) {
Looks like this needs a blank line in between plus a docblock that explains what the function does.
+++ b/commands/sql/sql.drush.incundefined@@ -355,6 +355,83 @@ function drush_sql_get_table_selection() {
+function drush_sql_get_expanded_table_selection($db_spec, $site_record = NULL) {
Docblock please.
+++ b/commands/sql/sql.drush.incundefined@@ -594,6 +670,95 @@ function _drush_sql_get_table_list($option_name) {
+ exit;
Since this is an error, we may want to do exit(1);
#69
Thanks for the review comments @juampy
Latest patch addresses the issues brought up in #68.