Drupal 6 port [#299888]

Comment	File	Size	Author
#19	merge_dbs.txt	3.65 KB	nick.dap

Comment #1

ceardach commented 25 August 2008 at 23:00

Assigned:	Unassigned	» ceardach
Category:	support	» task
Priority:	Normal	» Critical

Yes, this is definitely important to do. I recently sat down and figured out an estimate, and it would take roughly 80 hours to upgrade it.

There are two other severe issues that also need to be addressed: database size scalability and team size scalability. I estimated those would each take 40 hours. It would be best to do them all lumped together since they all need to be addressed by the fundamental method merging is performed.

In the end... there's a month of work for ya. heh. I'm going to have to sit down and do it one of these days, it's just a matter of when.

Log in or register to post comments

Comment #2

ceardach commented 18 September 2008 at 19:13

We may have found someone who will fund this. I'll let you know when I actually start.

Log in or register to post comments

Comment #3

ryan_courtnage commented 18 September 2008 at 19:19

That's excellent!

Log in or register to post comments

Comment #4

phl3tch commented 9 October 2008 at 17:16

Any movement on this?

ceardach -- I sent you an email asking about funding needs. Never got a response. Is this project dead in the water?

Log in or register to post comments

Comment #5

ceardach commented 9 October 2008 at 18:16

The project is not dead in the water. It still remains critical for my own D6 upgrade path.

I will update this thread when I have begun the upgrade.

Log in or register to post comments

Comment #6

ceardach commented 10 January 2009 at 20:46

Sorry for the extreme delay in getting to this :( Various things have happened in the past few months.

However, I'm back on the job. I've made a port that will support dumping and restoring databases with D6's new structure which I'll upload shortly. I'm now doing research on how to get database merging working with table-based sequences.

Log in or register to post comments

Comment #7

halcyonandon commented 20 January 2009 at 19:12

subscribing

Log in or register to post comments

Comment #8

ceardach commented 20 January 2009 at 20:45

Woo hoo! I'm making progress!

I've now committed to CVS and made release nodes for 6.x-1.x-dev and 6.x-2.x-dev. They should be available by tomorrow or so from the package manager.

1.x version only provides a basic dump and restore, while 2.x changes the whole way dumping and restoring works, and provides a table-based merging.

I have ideas for merging data and have started working on it.

Log in or register to post comments

Comment #9

nick.dap commented 22 January 2009 at 01:53

I'm very interested in this module. Particularly the merging part.

I looked at the code for version 5 and version 6 and I have a question. Please correct me if I am understanding it wrong.

In version 5:

You took the structure of the database from dev.
Then diffed the data of all three.
Then appended the data to the dev structure.

In version 6:

You are taking some tables from dev, and some tables from live, without merging the data.

Is that right?

The question I have is why not do what you did in version 5 to the data in version 6? Assuming that we set up the database to use different auto_increment id ranges for different environments/team members, why does it matter that Drupal 6 uses auto_increment on all tables? In fact, doesn't it make the problem simpler?

Log in or register to post comments

Comment #10

ceardach commented 22 January 2009 at 05:32

The question I have is why not do what you did in version 5 to the data in version 6?

The auto_increment on the table instead of storing it in a sequences table is a huge problem.

With the sequences table, anyone could easily allot a sequence "range" to avoid overlapping. Auto_increment, on the other hand, does not have this freedom. You cannot ever set an auto_increment value lower than the highest value currently existing in the table.

Although I could, in effect, keep "leap frogging" the sequence ranges after each merge, the scripts could never move forward performance-wise. There would still be problems with multiple developers, and there would still be problems with large databases.

The theory I am currently trying to implement is to "rewrite" all the sequences of new content-data from the development branch to be increments above the current production branch. If I can do it for merging branches, then I can also do it for merging with other developers, allowing many developers to actively develop simultaneously without stepping on each other's toes (such as, two developers could create views simultaneously). Theoretically, then, this would allow merging entire databases without limiting you to only merging content data.

And finally, I'm hoping this method will make it easier for me to create an "update patch sql statement" that could be imported to the production database, increasing performance drastically as you wouldn't have to restore the entire database.

Anywho... I think it'll be a lot better. Here's hopin' I can get it to work! I'm about 50% of the way to seeing if the theory will work or not.

Log in or register to post comments

Comment #11

nick.dap commented 12 February 2009 at 22:56

You cannot ever set an auto_increment value lower than the highest value currently existing in the table.

Ah. I think this is why I'm confused a little. I was under the assumption that to use these scripts, at least in version 6, you would have to hack the Drupal core on each environment so they each user their own auto_increment_increment and auto_increment_offset values in db_connect($url) function located in includes/database.mysql.inc (or, depending on your environment, database.mysqli.inc). That sounds awful, but I still think its a cleaner solution. We just apply the correct "hack" as part of our deployment process (moving from dev to stage, stage to live, etc.).

If you rewrite all the sequences (ids), then you break any references to the old ids from other tables, or do you want to rewrite those as well? That sounds like a monumental task. I could be misunderstanding you, since I don't know what you mean by "leap frogging" in this context.

Here is the solution as I saw it:

To make it simpler, lets say we have only dev and live branches and we are working with a clean install in both environments.

* dev hack
* DO NOT USE 0 for offset, the range is 1-65535, this caused me a lot of headache
auto_increment_offset=1
auto_increment_increment=10

* live hack
auto_increment_increment=2
auto_increment_increment=10

* dev ids being created
1
11
21
31
...

* live ids being created
2
12
22
32
...

Then, apply your merge procedure from Drupal 5 scripts

Merged ids
1
2
11
12
21
22
31
32
...

I don't see a problem with that. An additional benefit is that depending on the value of auto_increment_increment that you use, you can create up to 65535 of these environments and merge them with, at least hypothetically, no problems. (http://dev.mysql.com/doc/refman/5.0/en/replication-options-master.html#s...)

Thank you for writing these. I spent a whole lot of time trying to research a more elegant solution to this problem, with limited success.

-Nick

Log in or register to post comments

Comment #12

nick.dap commented 22 January 2009 at 18:08

More details to the hack.

In db_connect($url) located in includes/database.mysql.inc

Add:

mysql_query('SET @@auto_increment_increment=10', $connection);
mysql_query('SET @@auto_increment_offset=1', $connection);

Right after:

mysql_query('SET NAMES "utf8"', $connection);

Note: If you are using database.mysqli.inc note that the arguments in mysqli_query($connection, 'SET @@auto_increment_increment=10'); are reversed. Welcome to PHP...

Log in or register to post comments

Comment #13

ceardach commented 23 January 2009 at 06:22

Pacemkr,

I did research into those methods, but I had gained the impression that it required MySQL 5.1? Which is not a recommended version yet.

Even without the v5.1 requirement, I do not like hacking core (providing a patch for CCK is about my limits), and I do not like effecting all databases for a MySQL instance. I would like to, as best as I can, to be open to even shared hosting environments (as much as can be expected). The information I had found from my research found that I either had to hack core, or support MySQL 5.1 && effect all databases in a MySQL instance.

Even if I could set auto_increment_increment and auto_increment_offset, it does not guarantee that I can set an increment lower than the highest key currently in the table. The purpose of those settings, as I had read, was to prevent overlap with concurrent databases.

The "leap frogging" theory was to always allot a set of production sequences to be a certain increment above the last-merge sequence values, and preserve the in-between for development. I can always fall back on that theory if my current direction doesn't work, though.

My current plan is to rewrite all foreign keys to a given table's primary key, after updating the primary key to be above the production's current auto_increment value. It sounds... horrible. And it is.... horrible (honestly, the entire merging concept is horrible). But I got something that is making progress to working at this moment in my dev environment.

I have taken the Acquia Drupal distribution (including other modules I have added in a client's site I'm working on), and have configured a mapping of all references through those modules -- 111 tables in all. I also created a function that will return all possible references to a given table. I check the possible references against the configure references (which also defines possible false positives, with wildcards), and if it all matches up, then it'll go through and update all the tables with those values.

Of course... it only updates *column* values. If there's a "node/###" reference in an href in some body, it won't update it (erm, speaking of, I need to ensure that url_alias gets updated). I also need to do some testing with views to ensure any references to a term, user, profile, node, etc, will get updated correctly.

This is what I'm hoping will work. If it doesn't, I can fall back on requiring MySQL 5.1 and no shared hosting... or fall back on "leap frogging". I'd like this method to work, though, so I'm going to give it a shot.

Log in or register to post comments

Comment #14

nick.dap commented 23 January 2009 at 22:36

I was afraid that you were going to describe something like this. =)

One correction. It seems that the requirement would be MySQL 5.0.2 since this is when both auto_increment_increment and auto_increment_offset were introduced. I can still see it being an issue for the modules. It is not an issue for my project however.

Hmmm... The only thing I can say is that considering how many issues there are that have to be addressed with the approach you are describing it seems that it will always be playing catchup with the bugs, since its more complex and feels like a very hacky way of doing this. I mean this in the most friend way possible, by the way.

I will get my team to do it the core hack way, since it seems like it would require less work and be, in a way, equally as hacky, just in a different place. I don't like the idea of changing MySQL configuration just because there are multiple databases on the server and not all of them require this functionality. I know there is no innocent way to hack core, but this seems innocent enough to get away with to get the flexibility we need in our project.

I'll make sure to send you the script when we have it. In all likeliness its going to be a bash script, because that works better with our deployment processes, but perhaps you will still find it useful. In any case it is going to be based on your scripts for Drupal 5. Again, I truly appreciate the work you've done. We have very limited resources and it means a lot to us to be able to benefit from your work. Thank you.

P.S. I wonder if Drupal 7 is still going to have these "deployment problems"...

-Nick

Log in or register to post comments

Comment #15

ceardach commented 23 January 2009 at 22:54

Yes, this is hacky... however, this entire concept is hacky. I would much prefer to not "spread" the hacking to other areas, like MySQL and core, though. I would like to keep this as a very external tool.

There are a lot of edge cases to consider, and it has always been that way. I feel confident I'll figure out a solution with reasonable levels of exceptions.

The original iteration of these scripts were in bash (you can still see many of the remnants left). I brought in using PHP for ease of performing more complicated behaviour and looping.

Drupal 7 will likely still have these problems. They have lots of large projects they're already trying to undertake, and at this stage there is no time to take on a new project with a fall release time frame. My goal, though, is to try to be more vocal about implementing a better solution for Drupal 8.

Log in or register to post comments

Comment #16

nick.dap commented 23 January 2009 at 23:10

I agree that is usually better to keep these things decoupled. Unfortunately, the time line and the resources available for my project are forcing me to do it the simpler way even though it is against the best practices.

As for future versions of Drupal, it does seem like we are a minority in this field. It seems that most people do their changes on dev and then replicate them by hand to live (I shiver at the idea) or worse yet work directly on live at least when it comes to making CCK and views changes.

-Nick

Log in or register to post comments

Comment #17

ceardach commented 24 January 2009 at 02:22

As for future versions of Drupal, it does seem like we are a minority in this field. It seems that most people do their changes on dev and then replicate them by hand to live (I shiver at the idea) or worse yet work directly on live at least when it comes to making CCK and views changes.

Yes! I wholeheartedly disagree with that workflow. I hope something better comes along... I'm going to try to become more active and involved so we can have something better for Drupal 8.

Log in or register to post comments

Comment #18

ceardach commented 22 February 2009 at 22:57

Status:

Active

» Fixed

it.... works :) It's up in CVS now and should be packaged into the dev tarball soon.

I managed to be able to raise the AUTO_INCREMENT values on the fly as needed. I'm using "awk" which can read a file as columns, and then I can make modifications to those columns.

To do this, I'm checking for all possible references to a table (such as all tables that use a NID value), and then separately configuring actual references. Whenever getting references, I check the configured references to the possible references so I can warn a user if there is a new table that has references. I've set a base configuration based on somewhere around 100+ tables, so I should save a lot of work for you when configuring your website. The only reason finding possible references is possible is because Drupal frequently uses the same column name, or contains the original column name in the foreign key column name (such as tid1, tid2, field_blah_nid, etc).

I now consider "users" as content data instead of user data. This means you can add new users to the development version and have them remain in the production version. All other changes should be merged. All existing users will have their timestamps reset to the last-merge version in the development version to avoid conflicts when merging and developing.

Please test the 2.x branch, and post any bugs as a separate issue. I will not be releasing it until I can personally test it on a 6.x site (which may be awhile).

Now I need to update documentation...

Log in or register to post comments

Comment #19

nick.dap commented 23 February 2009 at 04:08

Status	File	Size
new	merge_dbs.txt	3.65 KB

To be honest I am still scared of your approach. =P

Attached is the bash function that I've written for the merge process. Naturally its nowhere near stable as we only started testing it out "in the wild", but we've been getting pretty good results. We have used it for a total of five promotes from dev to stage with very minor conflicts all of which were caught and took minutes to fix as part of the deployment process.

As I've mentioned in my previous messages, we are using this in conjunction with the auto_increment hack. I still think that a two line addition to core as part of our deployment process is a small price to pay. We are already changing our settings.php files to point to the correct databases as part of the deployment process, among other things. I think that changing just another file is not such a cardinal sin as we are trained to believe. I do think that ours is a more robust approach with a lesser chance of surprising results (I hate surprises when it comes to data). You are relying on semantics (unenforced naming conventions) to do the merge. For me that is big red flashing lights. To each his own I guess. =)

Once again, thank you for your original code, it really put us on the right track. I hope that you find our expansion on it useful.

Log in or register to post comments

Comment #20

ceardach commented 23 February 2009 at 14:52

It's not *entirely* semantics based. The semantics are for one level of error checking. The actual reference configuration is hard coded. You will have to watch your changes to the database to see what changes are made where.

I took a peak at using schema API but found that it didn't see references to CCK tables. I'll take a closer look some other day.

I took a look at your script. It is based on my Drupal 5 version, and I'd recommend taking a look at the Drupal 6 version as I made a ton of improvements (the part about increments is very small). You may find that diff3 fails on large changes unnecessarily. The users table is frequently going to conflict because of the timestamps. And finally there is a huge performance hit with dumping all the tables of the database -- the filtered tables actually contain a lot of data (I've heard the sessions table alone taking an hour to process).

I could make it a configurable option to ignore my raising increments method so the scripts can be used with alternate methods.

Log in or register to post comments

Comment #21

nick.dap commented 26 February 2009 at 05:35

Oh great to hear. Will definitely take a look since I have noticed that diff3 has been failing in places where it seems it shouldn't.

Also, I forgot to mention that the dumps have to be performed differently.

mysqldump --extended-insert=FALSE --user=pass --password=pass db > db.sql

Otherwise you'll get the whole table dumped into one insert on one really long line. I'm pretty sure that would apply to your method as well. Just in case someone else is reading this.

Log in or register to post comments

Comment #22

ceardach commented 26 February 2009 at 05:43

The dump options I'm using are:

--skip-opt --add-drop-table --add-locks --create-options --quick --lock-tables --set-charset --disable-keys --order-by-primary --skip-comments --comments=FALSE --default-character-set=utf8 --character-sets-dir=$charsets --hex-blob

I did a ton of trial and error to get to that, and don't quite remember all the reasons, but it dumps in the way I want, and preserves special characters.

Log in or register to post comments

Comment #23

ceardach commented 28 February 2009 at 21:20

@pacemkr: It's now a configurable ability to disable raising increments, allowing you to use any method you'd like.

see #381050: Create a configurable option to ignore raising increments when merging

Log in or register to post comments

Comment #24

nick.dap commented 2 March 2009 at 22:01

Thats freaking sweet. I will integrate as soon as I get a chance! I'm a little deep right now, we have an important release schedule for the end of the week.

I just had a revelation though; been hacking away at modules. What if we...

  function really_nasty_module_db_rewrite_sql( $query, $primary_table, $primary_field, $args )
  {
    // Set $nasty_increment and $nasty_offset in your settings.php
    global $active_db, $nasty_increment, $nasty_offset;
  
    mysql_query( 'SET @@auto_increment_increment=' . $nasty_increment, $active_db );
    mysql_query( 'SET @@auto_increment_offset=' . $nasty_offset, $active_db );
  }

I'm sure there is a more appropriate hook as well. This isn't "technically" hacking the core, just exploiting the procedural nature of Drupal. =)
We can package that manure of a code in a custom module, perhaps with a more appropriate hook... and viola you can distribute this as a module versus as something that has "please hack core" instructions on it.

Log in or register to post comments

Comment #25

ceardach commented 2 March 2009 at 22:12

Status:

Fixed

» Closed (fixed)

I'll look closer into your suggestion when my brain is not overwhelmed with Drupalcon issues :)

Lets branch the auto_increment topic to #389028: Explore better options of handling auto_increment IDs

Marking this issue closed.

Log in or register to post comments

Drupal 6 port

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

Comment #17

Comment #18

Comment #19

Comment #20

Comment #21

Comment #22

Comment #23

Comment #24

Comment #25

News items

Our community

Documentation

Drupal code base

Governance of community