Run sanitize without running full sql-sync [#1112142]

Comment	File	Size	Author
#15	1112142-no-prefix.patch	644 bytes	djdevin
#2	sanitize-command.patch	2.13 KB	msonnabaum

Comment #1

greg.1.anderson commented 31 March 2011 at 22:25

Assigned:	Unassigned	» greg.1.anderson
Status:	Active	» Closed (won't fix)

I must admit that I was, and still am, rather ambivalent about the ability to run sanitize without sql-sync. I don't know that adding this capability would actually help your particular use case much, because a run of sanitize on an already-sanitized db is a different thing than a run of sanitize on an unsanitized db.

I think that you would do better to make yourself a small empty Drupal site, and then copy some records from your large db, or generate some random sample records. Then use your small site to test sanitize, targeting another test db. You know on each run that you'll be starting from the same unsanitized data, which is a plus. I can't imagine that you would actually be more happy re-running sql-sanitize on a huge already sanitized database; it's not as good of a test case, and it's got to take some time to run sanitize on the big db even without the sql-sync op thrown in.

I am also really concerned that if 'sanitize' operated on any database that it might cause someone a lot of regret if they accidentally ran it on the wrong database. sql-sync will not destroy just any database; you have to name the target site as a parameter to the sql-sync command. If there was an sql-sanitize command, then cd /my/production/docroot && drush sql-sanitize -y would be a very bad thing to do. I know, it's not helpful to go too far in protecting the user from themselves, but in this case I don't think that the usefulness of the proposed feature is great enough to justify the added risk.

Log in or register to post comments

Comment #2

msonnabaum commented 25 May 2011 at 19:19

Version:	7.x-4.4	»
Status:	Closed (won't fix)	» Needs review

Status	File	Size
new	sanitize-command.patch	2.13 KB

I regularly find myself wanting this command, so here's a first stab at it.

I think it's a good illustration of the fact that we should probably have it abstracted from sql-sync either way.

Log in or register to post comments

Comment #3

moshe weitzman commented 6 June 2011 at 15:38

I'm on the fence about danger versus utility here. I suspect enough people will want it that we should provide it.

The confirm message should be more explicit about the database it is operating on. See the confirm for sql-sync for an example.

Log in or register to post comments

Comment #4

greg.1.anderson commented 7 June 2011 at 22:29

Status:

Needs review

» Needs work

I suppose I can go along with this, reluctantly, as long as we make it a hidden command and give it a really big warning message.

I still cannot think of any good use case that does not involve running sql-sync first. The best I can come up with is an instance where you forget to run sanitize and later want to sanitize a dev site. In that instance, though, I still think you'd be better off making yourself a workflow that automatically sanitizes your prod -> dev syncs, and not trust yourself to hand-sanitize at the right time. I don't know, I guess people trust themselves not to sync dev -> prod, so maybe we can trust them with a manual sanitize command too.

Maybe sanitize should write a marker to the database (variables table?), and sql-sync should print an even scarier warning when you sync from a source db that is already sanitized. I suppose this feature is completely unrelated to having a separate sanitize command, though.

Log in or register to post comments

Comment #5

msonnabaum commented 8 June 2011 at 03:43

Here's your scenario: I have drush sql-dump making backups on the hour on my server and I often need to just pull one of those dumps down and import it locally. It may not be the latest dump, so sql-sync doesn't help me. Also, if I need to pull down a db from prod, I don't necessarily want to ever sql-sync with prod, so I prefer to just pull down one of the backups. Call me paranoid, but it just doesn't buy me all that much in that scenario.

In general, I don't think we should get too hung up on defensible workflows. sql-sync is an awesome command, but I honestly never used it until Moshe showed it to me on a project. I think it's pretty natural for people to be scared off by the more "full service" drush commands, so I don't see any harm in having a few more simple commands.

Really, we probably should have implemented sql-sanitize on its own from the beginning whether we exposed it or not and just had sql-sync use it. It certainly wouldn't hurt us to decouple some functionality from our giant-commands.

Log in or register to post comments

Comment #6

greg.1.anderson commented 8 June 2011 at 16:39

You actually can use sql-sync in your scenario above:

$ drush sql-sync @live @dev --no-dump --source-dump=/data/backups/mydb-2011-06-01.sql -s
Calling system(rsync -e 'ssh ' -rlptz --exclude=".bzr" --exclude=".bzrignore" --exclude=".bzrtags" --exclude=".svn" wwwadmin@myserver.com:/data/backups/mydb-2011-06-01.sql /tmp/mydb.sql.O9E4jA);
Calling system(psql --dbname=mydb --host=localhost --port=5432 --username=www-data -q < /tmp/mydb.sql.O9E4jA);

This is probably more convenient than your current workflow in terms of number of characters to type, but I admit it is probably easier to remember the individual steps of rsync, import, then sanitize.

I agree that it is a good idea to break commands up into their component parts; just as pm-update = pm-updatecode + updatedb, it would be better architecturally if sql-sync = sql-dump + sql-import + sql-sanitize. I have wanted to do the sql-dump / sql-import factor for a while, rewriting sql-dump to accept a site alias parameter, handling the remote dump + rsync step if the site is remote, and also implement sql-import the same way, handling an rsync + remote import via ssh if necessary.

I would still prefer to see sanitize be a hidden command, but I agree with the premise of factorization in #5.

Log in or register to post comments

Comment #7

moshe weitzman commented 8 June 2011 at 16:47

I'd be OK with a hidden command.

We should add an option to `help` command such like --show-hidden or somesuch. We'll tuck some less prominent commands there. Seems very analogous to git command families: plumbing versus porcelain. (I love that bathroom imagery).

Log in or register to post comments

Comment #8

msonnabaum commented 8 June 2011 at 20:03

I have no objections to a hidden command. I just want to stop maintaining my patch :)

I agree that we should work towards the git model of plumbing vs porcelain commands. It'll give the power users what they need while also hiding potentially site-destroying commands from new folks.

Now that we're in agreement that sql-sync needs to be broken up, I could see the argument for just doing the simplest possible refactor now rather than committing my duct-tape version above. I can go either way on that. Maybe we just start by implementing sql-import and sql-sanitize first and then start moving bits of sql-sync out into those?

Log in or register to post comments

Comment #9

greg.1.anderson commented 8 June 2011 at 20:25

Yeah, I'm willing, I just need to get some other work out of the way first -- various Windows issues (blocked on #1058874: drush_backend_invoke buffers output till end of command. Show progress.), etc.

Log in or register to post comments

Comment #10

moshe weitzman commented 8 June 2011 at 20:30

Status:

Needs work

» Reviewed & tested by the community

I'm fine with Mark committing the patch here (add the HIDDEN) and then we'll strive to get all that refactoring done soon. Meanwhile, mark can stop maintaining a local patch :)

Log in or register to post comments

Comment #11

msonnabaum commented 1 July 2011 at 02:41

Version:		» All-versions-4.x-dev
Assigned:	greg.1.anderson	» msonnabaum
Status:	Reviewed & tested by the community	» Patch (to be ported)

Committed as a hidden command in 5.x.

Marking for backport since I think its pretty safe.

Log in or register to post comments

Comment #12

msonnabaum commented 16 July 2011 at 20:00

Status:

Patch (to be ported)

» Fixed

Now in 4.x.

Log in or register to post comments

Comment #13

30 July 2011 at 20:01

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Log in or register to post comments

Comment #14

greggles

he/him

English

Denver, Colorado, USA

commented 30 September 2011 at 20:39

Are there any examples of mymodule_drush_sql_sync_sanitize hook in contrib?

I see sql_drush_sql_sync_sanitize but it looks pretty complex. I guess my ideal is that the function could just return sql statements, maybe in an array keyed with weights on them so they can run in different orders.

Log in or register to post comments

Comment #15

djdevin

he/him

English

Philadelphia

commented 25 October 2011 at 20:10

Version:	All-versions-4.x-dev	»
Category:	feature	» bug
Status:	Closed (fixed)	» Needs review

Status	File	Size
new	1112142-no-prefix.patch	644 bytes

Doesn't work with prefixed databases, patch attached. Didn't think the problem was big enough to warrant another issue.

Log in or register to post comments

Comment #16

greg.1.anderson commented 25 October 2011 at 20:26

#15 looks good.

@greggles: See drush.api.php; in particular, hook_drush_sql_sync_sanitize. If your sanitization operation runs invariantly (whenever --sanitize is specified for sql-sync, or whenever sql-sanitize is executed), then this is all you need. If you add any options to control how the sanitization works, then you will also need to declare them via hook_drush_help_alter, which is documented in the same file.

The sanitize ops do not support weights; they run in arbitrary order. If you need to order them, it would be a reasonable feature request to add them.

It was always sort of my vision that contrib modules that maintained sensitive information in the tables they add would also define a sanitize hook. If you spin me around three times and push me in the right direction (i.e., let me know where you want it), I'll write some documentation for it. There must be a place in the module developer's guide that we could drop it.

Log in or register to post comments

Comment #17

moshe weitzman commented 5 November 2011 at 02:32

Status:

Needs review

» Fixed

committed to 5 and 4. back to fixed.

Log in or register to post comments

Comment #18

greg.1.anderson commented 10 November 2011 at 08:06

Status:

Fixed

» Active

Backed out #15 from master and 7.x-4.x; it does not work.

Log in or register to post comments

Comment #19

djdevin

he/him

English

Philadelphia

commented 10 November 2011 at 16:09

Why doesn't it? Seems to be the only thing that does work when running on a prefixed DB.

Log in or register to post comments

Comment #20

greg.1.anderson commented 10 November 2011 at 16:36

This does not work on master due to #1219850: sqlq removes all curly braces from sql, including those that are in the text values.. I rolled it out of master because it was causing the sql-sync functional tests to fail. I rolled it out of 7.x-4.x because we must correctly support new features in master before we backport them to drush-4.x.

We can put this back in after the above-mentioned issue is resolved.

Log in or register to post comments

Comment #21

osopolar

German

🇩🇪 GER 🌐

commented 5 December 2011 at 23:08

As already suggested by Moshe Weitzman in #3:

The confirm message should be more explicit about the database it is operating on. See the confirm for sql-sync for an example.

Current contains to less information:

Do you really want to sanitize the current database?

At least it should contain the database name like

Do you really want to sanitize the current database $db_name?

Log in or register to post comments

Comment #22

moshe weitzman commented 11 February 2012 at 14:38

Category:

bug

» feature

Log in or register to post comments

Comment #23

moshe weitzman commented 11 February 2012 at 16:44

Status:

Active

» Fixed

This was committed to master and 4.x.

Log in or register to post comments

Comment #24

25 February 2012 at 16:50

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Log in or register to post comments

Run sanitize without running full sql-sync

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

Comment #17

Comment #18

Comment #19

Comment #20

Comment #21

Comment #22

Comment #23

Comment #24

News items

Our community

Documentation

Drupal code base

Governance of community