Not sure if this is too "featurey" for Drush core, but it's something that Drupal.org needs as well for its DB exports, and comes up on most client projects.
If you take a copy of the production database into dev/staging, you might want to test some e-mail related functionality without spamming actual users. You can set Devel module to log emails, but that doesn't actually test that SMTP is working. So it'd be great to be able to run a command to reset all e-mail addresses to something safe, like replacing all domains with @$base_url or similar.
Better still is this trick I learned from Fen at CivicActions, where you can give it a gmail address, and it can encode all e-mail addresses as $gmail_account . '+' . $account->name . '@gmail.com';
. This way, any mail sent from your server could be captured in a dummy account, and you could clearly see which users were affected by a bulk mailing, for example.
Comment | File | Size | Author |
---|---|---|---|
#51 | sanitize.sh_.txt | 1.93 KB | fen |
#43 | sanitize-bootstrap-2.patch | 26.54 KB | greg.1.anderson |
#38 | sanitize-bootstrap.patch | 7.15 KB | greg.1.anderson |
#33 | sanitize-4.patch | 21.75 KB | greg.1.anderson |
#31 | sanitize-3.patch | 20.93 KB | greg.1.anderson |
Comments
Comment #1
moshe weitzman CreditAttribution: moshe weitzman commentedWe discussed this while designing sql-sync but postponed the feature at that time. First mentioned at the epic issue #460924-10: Remote site aliases for Drush (Proposal and implementation). Especially see point 3 in #98 where Greg suggests doing a sql-sync to a backup DB and manipulating there. I think this request comes down to documentation. We need to document some best practices with drush since they are non obvious. I've got a plan - stay tuned!
Comment #2
greg.1.anderson CreditAttribution: greg.1.anderson commentedRelated to this, I wanted to factor sql-sync into sql-import and sql-dump (create the former, and move the remote-db features from sql-sync into sql-dump). Then, sql-sync just does setup and fires off an sql-dump followed by an sql-import. This would make it easier to do extensions via the drush hook mechanism.
Comment #3
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere is a start:
Postgres:
drush @gklive sqlq "select 'myaddr+' || replace(mail, '@', '_') || '@gmail.com' from users where uid <> 0;"
Mysql:
drush @gklive sqlq "select concat('myaddr+', replace(mail, '@', '_'), '@gmail.com') from users where uid <> 0;"
This could be changed to an 'update' statement with a little more work, and from there, it would be easy enough to make this a post-sql-sync hook, even without the refactor in #2.
I'll continue to post here, as by coincidence, I'm doing a similar operation on one of my sites right now.
Comment #4
moshe weitzman CreditAttribution: moshe weitzman commentedSplitting those two parts (#2) into two commands makes sense, but I slightly worried we will introduce problems like what 'update' command has. update uses --backend to isolate its second step (updatedb) and many people have their php configured oddly and get error on updb step. lets consider this if we find ourselves needing to issue the sql-import command with --backend.
hopefully i have been coherent here.
Comment #5
greg.1.anderson CreditAttribution: greg.1.anderson commentedYes, I understand completely. When I was experimenting with this before, I just called drush_invoke (or one of the calls in the command-dispatch chain) directly and avoided that problem nicely.
Comment #6
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere is some better SQL.
Postgres:
update users set mail = 'myaddr+' || replace(mail, '@', '_') || '@testdomain.org' where uid <> 0 and strpos(mail, '@testdomain.org') = 0;
Mysql:
update users set mail = concat('myaddr+', replace(mail, '@', '_'), '@testdomain.org') where uid <> 0 and instr(mail, '@testdomain.org') = 0;
Note that we skip users who are already using email addresses @testdomain.org. This has the dual effect of making this transform stable -- you can run it multiple times without corrupting / complicating the test email -- and also it allows you to set up test users in the main user table that receive the test emails at their own addresses rather than the unified '+' address, if you want.
The thing I was thinking about here is that if you have >100 users, you wouldn't really want to send hundreds or thousands of messages to gmail just to test notifications. This sort of environment really demands that you have a test SMTP server on your local domain.
Even with a local SMTP server, though, it might be nicer to test notifications with 10 users rather than thousands of users, so I was considering also doing a drop from users where the uid > 10 or something like that. It would also be good if this dropped cross-references to users that were dropped (e.g., also drop all nodes where the uid > 10), just to avoid finding 'bugs' that caused by chopped sql tables. Updating the schema to cascade delete would be another way to do this. The problem here is that there may be a lot of tables with uid references, depending on what modules are installed.
I started turning greenknowe.org into a local-community social networking site about two weeks ago, and so far have << 100 users, so I'm not going to worry about the 'drop' issue for a while. I'll provide a post-sql-sync hook to do the above mail transformation in ~< a week, though.
Note also that none of this addresses the security issues that Moshe in #1 (if you don't want to let your devs to even be able to see your user's email addresses), but it does address the test needs to not spam your users.
Comment #7
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere is an example post-sync hook for sanitizing email addresses. Drop the sanitize.drush.inc file into your .drush folder, and then call it like this:
drush sql-sync @gklive @gkdev --sanitize --test-domain=greenknowe.org --test-addr=myaddr
sql-sync will behave as it currently does unless the --sanitize option is used. Note that you can also add the sanitize option to your site alias, like so:
If you do this, then your email addresses will be sanitized every time that alias is used as the destination target of an sql-sync operation.
Should work with both postgres and mysql, but I have not tried mysql.
Comment #8
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere's the file; rename to
sanitize.drush.inc
and put in$HOME/.drush
.Comment #9
moshe weitzman CreditAttribution: moshe weitzman commentedNice.
Any interest in changing all passwords as part of sanitize or a separate option? You can do it quickly in SQL with MD5('password'). I think it is OK for all passwords to be the same. In Drupal 7, this technique yields unusable passwords (by default) since we use php to generate very secure (and slow to generate) passwords. See http://drupalcode.org/viewvc/drupal/drupal/scripts/password-hash.sh?view... for an example. I think D7 should just do the same md5('password') - its OK in this case that passwords are not usable. One could make them usable with a swappable password.inc if thats really desired.
Comment #10
moshe weitzman CreditAttribution: moshe weitzman commentedI do think sanitize is core-worthy once we scramble passwords.
Comment #11
greg.1.anderson CreditAttribution: greg.1.anderson commentedSure, I'll throw in a password scramble and see if it works on D6/D7 postgres/mysql; easy to code, just need some time to test.
Comment #12
greg.1.anderson CreditAttribution: greg.1.anderson commentedBefore this goes into core, I want to implement seatbelts.
A new hook, called from drush_sql_post_sql_sync, that allows drush modules to sanitize tables and return an array of tables touched might also be a nice addition.
Comment #13
moshe weitzman CreditAttribution: moshe weitzman commentedThose seatbelts feel a bit over the top for me. And without the seatbelts, the new hook doesn't provide much value over the post_sql_sync that drush_invoke offers.
Comment #14
greg.1.anderson CreditAttribution: greg.1.anderson commentedTrue. I'll test and commit as-is for now.
Comment #15
greg.1.anderson CreditAttribution: greg.1.anderson commentedOh, one advantage of including a new hook is that you could have a single drush_confirm that controlled all of the sanitizations in a single switch.
I suppose if I put the built-in --sanitize (with confirm) in the main function, and left post_sql_sync for the modules, that would have the same effect. Each module would just need to test --sanitize by convention.
Comment #16
moshe weitzman CreditAttribution: moshe weitzman commentedDoes sanitize really need a confirm? We are operating on a copy of the data.
Comment #17
greg.1.anderson CreditAttribution: greg.1.anderson commentedYes, I think that's a valid point. I'll just take out the confirm, and move sanitize to drush_sql_post_sql_sync, to serve as a template for other sanitize operations in other modules.
Comment #18
moshe weitzman CreditAttribution: moshe weitzman commentedComment #19
greg.1.anderson CreditAttribution: greg.1.anderson commentedThe thing that's bugging me right now is that in order to sanitize passwords per #9, I need to bootstrap drupal, and sql-sync doesn't do that. Note also that the db to be sanitized could be remote, although I imagine in the typical case it would be local.
There are three ways I could do it.
drush sql-query
to fix up password-modifying queries based on the Drupal version (just as it currently unifies 'show tables' between psql and mysql).drush status
on the target drupal instance and cache the result.The thing I realized recently is that confirmation pre-sanitize is actually very important, because users are going to want that sense of security that they've done their persistent settings correctly if they put sanitize options into their aliases. A confirmation will show that they are not about to sanitize their live site. However, the confirm should come up front, as part of the existing sql-sync confirm (just as the existing confirm tells you if you have excluded any tables from the sync).
Therefore I think that #3 is best.
Just thinking aloud, but feel free to chime in if you have any comments.
Comment #20
moshe weitzman CreditAttribution: moshe weitzman commentedI'll have to re-read this again to fully grok it. Sounds like it got more complicated than I had hoped. I'm not sure confirmation merits it. We could recommend --simulate to folks who really want to see whats gonna happen.
I realize now that you don't need to bootstrap to sanitize passwords.
That last update statement needs to be a db_update() in D7. We could try to wrap UPDATE and INSERT statements for all drupal versions like we did for drush_db_select()
Comment #21
greg.1.anderson CreditAttribution: greg.1.anderson commentedThe description sounds complicated, but simple sanitizations like the one above won't be too complicated in code.
Does drush_drupal_major_version work without bootstraping drupal?
Comment #22
moshe weitzman CreditAttribution: moshe weitzman commentedYes. All it needs is DRUSH_DRUPAL_ROOT. It includes system.module or bootstrap.inc (D7) as needed.
Comment #23
greg.1.anderson CreditAttribution: greg.1.anderson commentedNote to self: there are more sanitization ideas in #636340: Provide generally available, regular exports of already public data from the drupal.org database.
Comment #24
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere is the patch, per #19. Needs a little tweaking before commit, but ready for review. Not too many extra lines to get the confirmation messages -- really worth it, in my opinion, but let me know if you have comments or concerns.
Not sure how many, if any, of the ops from the issue mentioned in #23 should be considered for drush core.
Comment #25
moshe weitzman CreditAttribution: moshe weitzman commentedLooking very nice. Lets make sure to document the new options and add doxygen for new functions.
looks like debug code
trailing whitespace. happens several other times.
a bit verbose. if ($messages) will do
Comment #26
greg.1.anderson CreditAttribution: greg.1.anderson commentedregarding the
if (drush_get_context('DRUSH_SIMULATE') || drush_get_context('DRUSH_VERBOSE'))
, the thing I was noticing was that in simulate mode, sqlq outputs the operation being executed, which contains the name of the temporary file that contains the query, but it does not show the query itself. I wanted to make sure that the query was displayed in simulated mode. Doing it during verbose operation seemed like a good idea at the time; I'll take that part out.I'll submit a new patch with the other changes + more comments shortly.
Two more possible additions: the current version never sanitizes the admin account; an option could be added to include admin in the updates. Also, the current version transforms the email address, so you can tell, for debugging purposes, who the email was supposed to go to. Perhaps another option could obscure the email address, removing it from the database.
Comment #27
moshe weitzman CreditAttribution: moshe weitzman commentedI do think we should sanitize admin account and i do think that wiping email addresses (like we do for passwords) is even more important than transforming. I consider transforming emails to be specialized enough that it ought to be in drush_extras or elsewhere.
Comment #28
greg.1.anderson CreditAttribution: greg.1.anderson commentedSounds good.
Comment #29
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere's a cleaned-up patch with all of the suggestions from #25 and #27.
Comment #30
moshe weitzman CreditAttribution: moshe weitzman commentedLooking good. We might want to mention the sanitize API in the command help.
lets indent the lines below --sanitize
lets state the default value for each option.
the meaning of 'test' in the last two options is not clear from reading the help. anyway, i think those will be moved to contrib is favor of resetting email addresses.
I've since learned about the -e option of mysql command so we could actually stop using a tmp file for sql-query. thats for another issue though.
lets actually use the function name here: drush_sql_register_post_sync_op(). that way it hyperlinks in the api docs.
Comment #31
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere's an updated patch. --test-email and --test-domain are now both gone in favor of --sanitize-email, which takes the form
--sanitize-email="user+%uid@localhost"
. Valid substitutions are %uid, %login and %mail. Note that we shouldn't just move the email option to contrib; it's very important to be able to test email notification operations on a dev site, so I think it should be possible to set the address in core. The new code allows emails to be transliterated or obscured with a minimum of extra options and only a handful of lines of code.I think we should stick with using temporary files with mysql and psql in the sqlq command. It's true that we could try to switch to -e, but then we would need to worry about escaping rules, which may or may not be platform dependent... I really don't want to have to support "sqlq bug with postgres under Windows", and temp files are the cleanest way to stay clear of this possibility.
Other changes made as suggested.
Comment #32
moshe weitzman CreditAttribution: moshe weitzman commentedI think the code would be a bit more readable if we refactor password and email sanitizers into their own functions.
Minor code formatting stuff below. Feel free to commit after those are resolved. I like this new functionality a lot. Maybe Civicactions can finally drop their pushdb stuff :)
a bit awkward text here
typo: concatenated
leading whitespace. a few more of these.
looks like we have tabs instead of spaces here. dreditor is reporting a problem
Comment #33
greg.1.anderson CreditAttribution: greg.1.anderson commentedI did not want to split the email and password sanitizers into separate functions. Two reasons for this. First, separate functions would mean that there would be two sql queries needed, rather than one; that's minor. The more significant issue is that we only have one pre-sql-sync hook available to us in the sql command file; I could have drush_sql_pre_sql_sync call two separate functions that are not drush hook functions, but I don't know if that would fully satisfy the purpose of simplifying the example.
Instead, I decided to put a really simple example hook in drush.api.php. I think this will make it easy for savvy users to start making their own custom sanitization functions. I think that pushdb could now be replaced with sql-sync + a pre-sync hook function.
The code committed is attached here; we can keep discussing this if you'd like to see a different solution to the simplification issue.
Comment #34
greg.1.anderson CreditAttribution: greg.1.anderson commentedI'm re-opening because I realize that sanitize is missing a critical feature. It should be possible for modules such as project-issues to add drush hooks to define sanitization options to clean up the tables they manage. If this were done consistently, a drush project could be easily sanitized without (a) every module user understanding the schema of the module and (b) customizing the sanitize scripts frequently or testing 'if table x exists then...'.
There are four ways this could happen:
1a. sql-sync could call another function sql-sanitize that bootstrapped to the source site. This could still be done in such a way that you got the prompt prior to the sync by calling with --no to prevent it from sanitizing.
1b. sql-sync could call another function sql-sanitize-preview that bootstrapped and reported back what needed to be done. Maybe safer than 1a, which calls 'sanitize' on the site that you do not want to sanitize...
2a. sql-sync could clear the commandfile cache and force a bootstrap to whichever site of src or dest was local. (Would probably work; need to investigate.)
2b. sql-sync could find the drupal root and site folder of src or dest, whichever was local, and force-add all of the *.drush.inc files it could find to the commandfile cache.
1a and 1b are the cleanest, but they create new commands that will appear in drush help.
2b might be a little cleaner operation-wise (but more "hackey" code-wise), but then sanitize would not always work right when the destination is local (most common case) unless the user was careful to rsync the files before the database after upgrading modules whose schema changed.
I think I favor 1b or 2b, but have not decided. Comments on approach welcome.
Comment #35
moshe weitzman CreditAttribution: moshe weitzman commentedNot sure I follow. These modules could use the API you just created. drush_project_issue_pre_sql_sync() could choose to add a new query when --sanitize is true or it could check for a new --sanitize-project-issue option.
I think all thats needed is a way for project_issue to document its new option. We need a hook_command_alter on command definitions. That would also be useful for injecting personal command aliases.
Comment #36
moshe weitzman CreditAttribution: moshe weitzman commentedOh, I get it now. project_issue isn't even loaded. I feel like bootstrap is committing patricide this week.
Yet another option is to require admins to use --config to load the project_issue.sanitize.inc or ask admins to move/symlink that file to ~/drush or /etc/drush. They can't copy the file there since that leads to duplicate function error.
Comment #37
greg.1.anderson CreditAttribution: greg.1.anderson commentedAs far as usability is concerned, I would favor a convention where Drupal modules that add sanitization steps key off of --sanitize, not --sanitize-project-issue, so that you automatically get all of the recommended sanitizations for all of your enabled modules. Moving / symlinking files isn't as good, usability-wise, because it's not automatic enough, and also will affect sites that don't have that module enabled.
I'll try coding up #34.2b and see how it comes out.
I also agree that it would be good to allow commandfiles to alter the command definition of other commands for just this sort of purpose.
Comment #38
greg.1.anderson CreditAttribution: greg.1.anderson commentedThis patch fixes the sql-sync bootstrap problem so that site modules can define pre-sql-sync hooks in their drush commandfiles.
The first thing that this patch does is define drush_hook_init, which is called in drush_invoke prior to the creation of the commandfile list. This is done so that the hook init function can add items to the commandfile list, e.g. through bootstrapping.
n.b. drush already had a hook for commandfile initialization, but that was different, as it was per commandfile, and this hook is per command.
n.n.b. Prior to the introduction of drush_hook_init, drush_bootstrap_max never really worked right. It would in fact bootstrap the site, but it was always called to late for any site thus bootstrapped to participate (via drush commandfiles in included modules) in the drush hook mechanism. This means that without drush_hook_init, drush help could never have a meaningful command-alter hook. If this patch is accepted, I'll go back and move drush_bootstrap_max into the init function for all commands that use it.
The next thing that this patch does is add a drush option 'enabled-commandfiles-only' that, if set, will subvert the loading of commandfiles during the DRUSH_BOOTSTRAP_DRUPAL_SITE phase so that the more reliable process in DRUSH_BOOTSTRAP_DRUPAL_FULL can be used instead.
Finally, a few lines of code were added to call bootstrap_max on a particular site alias record.
With these changes, I created a test file sites/all/modules/project_issue/project_issue.drush.inc:
With this file in place, the project issue sanitization operation was added iff the project issue module is enabled on the local site.
n.b. it might be prudent to call sql-sync twice if the source site is remote, and modules were recently enabled or disabled. Since drush uses the local site to control sanitization rather than the source site, under some circumstances a sanitize option might be missed. Using a remote command to control sanitization (e.g. #34.2a) would be quite a bit more complicated, so overall I think I am quite happy with the attached solution.
Comment #39
moshe weitzman CreditAttribution: moshe weitzman commentedWe're at the point where we need a diagram of the drush bootstrap. This is getting complicated enough that perhaps a phone call is in order.
I like the skipping of commandfiles from disabled modules. Can we generalize that further? It seems like we should skip anytime we know that we are destined for FULL_BOOTSTRAP (e.g. cron, cache-clear, ...)
A different way to fix this is to change drush_invoke() sequence a bit. It assumes that the list of commandfiles is static. If we reran $list = drush_commandfile_list(); after each $variation, we would pickup newly discovered command files. So if a 'pre' hook did a bootstrap level change to FULL, the new commandfiles would be recognized in later $variations. Not sure if this is any better.
If we go with the new hook, it needs to be documented in API file.
this line has trailing whitespace. happens several times (mostly in blank lines). dreditor is very good at highlighting these. just getting back into using it.
I need to research more, but I think commands like updb and Aegir commands would object to this statement. They want to control the bootstrap process and here we essentially tell them that all their work has to be done in this new hook? Note that my suggestion above makes this recommendation unneeded.
drush_drupal_major_version_of_site() is now unused.
Comment #40
greg.1.anderson CreditAttribution: greg.1.anderson commentedI considered the drush_invoke alternative first, but rejected it because the current algorithm just makes a long list of function names, and it's not currently clear where one section stops and the next begins. Based on your comments above, though, I think it's better to go this route anyway. It can be solved just by breaking the current list into a list of lists, or something like that.
I'll re-roll the patch using this technique.
With the current drush bootstrap, we keep stepping through the bootstrap levels until we find a command that matches the bootstrap level, so this is currently impossible. Now that commands can't have spaces, I could rewrite this too; then we would know in advance how far we were bootstrapping in advance, because we could find the command right away. Might be time to just rewrite the whole stack..
In "the new world", then, we would find the command first and call bootstrap_max to the level desired. If a command wanted to bootstrap further, it could just call bootstrap_max again.
Comment #41
greg.1.anderson CreditAttribution: greg.1.anderson commentedComment #42
moshe weitzman CreditAttribution: moshe weitzman commentedI think that the drush_invoke() loop can be refactored a little such that we add a helper function that accepts a commandfile list. The contents of that commandfile will get refetched after each invoke function gets called. that way newly discovered commandfiles are in the game.
Comment #43
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere is the "rewrite" of the drush bootstrap process -- although the required changes were substantially fewer than I initially thought. Note that this patch also fixes #698102: Remove backwards compatibilty for spaces in commands.
I started off this version of the patch by removing the new command init hook, but I ended up putting it back in. It is still valid to bootstrap in drush_hook_command() if you really want to, but if you do that, you'll be denying any of the commandfiles loaded by the bootstrap from participating in the "_validate" and "_pre" stages of the drush command dispatch. If you bootstrap in drush_command_init(), then the new commandfiles can define policies that affect the current command in drush_hook_command_validate(), etc. So, in short, init is the preferred place to bootstrap, but Aegir may continue to bootstrap in command() with no ill effect (and in fact it will work better than it used to).
I did not go back and make 'help', etc. bootstrap in _init(). I did add _init to drush.api.php.
The "shortcut" commandfile searching done in DRUSH_BOOTSTRAP_SITE is now only done for commandfiles in the global context (/etc/drush, $HOME/.drush, etc.) that explicitly request a SITE, CONFIGURATION or DATABASE bootstrap. Global context commands that request a FULL or LOGIN bootstrap, or any commands found in an enabled module of a bootstrapped site, will always get at least a FULL bootstrap, regardless of what they request. Drush will now do a FULL bootstrap before considering any site-wide drush commandfiles, so only commandfiles in enabled modules will ever be seen.
Note that I added a new function drush_bootstrap_to_phase that looks almost exactly like drush_bootstrap_max, except the former returns the result of the last bootstrap operation, and the later returns the highest phase reached. I needed the former result code, but could not be sure that some contrib command might depend on the result code of bootstrap_max. There was no elegant way to factor these two functions, and since they are short, I opted against an inelegant factoring.
Comment #44
moshe weitzman CreditAttribution: moshe weitzman commentedAwesome work.
We can delete the 'spaces in commands' text in example.drushrc.php
This is an inevitable byproduct of this patch. Kind of a bummer that commandfiles in modules have to be slow. I wonder how we can document the fact that the 'bootstrap' element is ignored in module shipped commandfiles. Maybe add a code comment to the bootstrap element in 'make-me-a-sandwich' command.
can we use a constant instead of -1?
Isn't this advice unnecessary now that we re-fetch the commandfile list on every variation?
Comment #45
moshe weitzman CreditAttribution: moshe weitzman commentedOK, that advice is still valid for command callbacks. Its not totally valid for a pre hook for example but we can just leave the text as is.
Comment #46
moshe weitzman CreditAttribution: moshe weitzman commentedFeel free to consider my feedback above and then commit. Thanks.
Comment #47
greg.1.anderson CreditAttribution: greg.1.anderson commentedCommitted with constant for not-yet-bootstrapped case.
Comment #48
webchickMy goodness. I didn't think my wee feature request would result in a re-write of the Drush bootstrap. :) Thanks SO much for your work on this, Greg! Can't wait to try it out!
Comment #49
greg.1.anderson CreditAttribution: greg.1.anderson commented@webchick: You're welcome. The next thing I'm planning on doing is submitting drush pre-sync hooks to the projects containing tables sanitized in #636340: Provide generally available, regular exports of already public data from the drupal.org database, for example, like drush_project_issue_pre_sql_sync(). I'd appreciate your help in evangelising this convention to module writers who cache sensitive information; maybe you could write a short blog entry about sql-sync --sanitize? I'll write something up in the drupal handbook. It's already in drush.api.php; I should find a place on drush.ws for it too.
Comment #51
fen CreditAttribution: fen commentedI'm sorry that I'm coming late to the party (Owen just showed me this thread despite the fact that I may have helped inspire it): I have a couple questions that I'm not sure I see resolved in this:
1) Can there be a list of people/email addresses for which sanitize doesn't work? In the script I use, I have a list of emails to keep untouched (see attached shell script).
2) Can the separator between user and uid be defined or is it set as '+'? Some systems use '-' as the separator...
3) I changed my sanitize method to include a HASH of the original address, so that the final emails results in ${MAIL_PREFIX}SHA1(mail)@$MAIL_SUFFIX
This latter one makes it easy to also run the same function on a CiviCRM database and ensure that two email addresses that are the same in Drupal and CiviCRM before the sanitization are also the same after the sanitization which is essential for proper operation. With CiviCRM, I suppose I can get the $uid from the civicrm_uf_match table if the synchronize process has run, but this is not guaranteed.
I didn't re-open this as I have to admit that I have not yet read through all the comments/patches nor tried the code, so perhaps all my concerns have already been dealt with.
Comment #52
moshe weitzman CreditAttribution: moshe weitzman commentedWell, at least you admit your sloth :)
The sanitize routine is pluggable so you can easily achieve all the customizations you describe.
Comment #53
greg.1.anderson CreditAttribution: greg.1.anderson commentedHm, yes, it is pluggable to easily allow adding more sanitization options, but if you wanted to change the existing sanitization options, it's a bit more complicated.
If you have specific requests on how this should work, please open a new issue. I was considering a helper function for sanitize that would build email address modification SQL, for use by any sanitize op that needed to clean up an email address. If we put enough configuration options on that routine, it would probably meet your needs.
Comment #54
moshe weitzman CreditAttribution: moshe weitzman commentedI would rather make it easier to disable default sanitizer than make it more configurable. these requests are endless.
Comment #55
greg.1.anderson CreditAttribution: greg.1.anderson commentedTrue. Every sanitizer already has an 'id' parameter; I'll add a '--no-xxx' option to sanitize (or '--without-xxx' if you prefer) so that each one can be disabled individually. Patch coming on separate issue.
Comment #56
Chris CharltonThis is something I was dreaming up for Drush to help with PCI compliance.
My vote +1000
Comment #57
greg.1.anderson CreditAttribution: greg.1.anderson commentedFor the benefit of those who may still be following this issue, #898818: drush_bootstrap_max_to_local_sitealias breaks first-time sql-sync operations changed the way sanitize hooks work. You should now put your sanitize hooks in
HOOK_drush_sql_sync_sanitize
instead ofdrush_HOOK_pre_sql_sync
. The implementation is otherwise very similar. See drush.api.php.Comment #58
amonteroSanitize addition/followup:
#1808516: Remove cron_key when performing sanitized dumps