3. Database Merging

Last modified: June 1, 2009 - 03:05

Modifications to the database will happen during development, and within the production branch. The merge script is used for merging the databases in the development and production branch, while your version control system can be used for merging changes from different developers.

Handling sequences

Drupal 5 still uses the sequences table - this script will need to be modified to support table-based auto increment sequences in Drupal 6.

Each database version would have its own sequence ID range. ID 1 through 10,000 can be reserved for development, while 10,001 and up will be left for production. When creating the first separation of development and production, you would edit the sequences table and increase all sequences to something really high.

If you find your development is about to enter into the production range, then you can raise your development sequences to above the current production range, and raise the production sequences much higher. Try to raise them high enough so you will not encounter this issue.

During Drupalcon Boston 2008, there were lots of mentions of using the "even / odd" method where you hack either Drupal core or MySQL to issue only even numbers to production and odd numbers to development. Hacking anything can cause problems and general headaches, though. Hacking Drupal's code would mean you would have a code difference between development and production which could cause an extra effort to ensure you maintain those differences. Hacking MySQL will add an extra headache when setting up a new working space environment.

"Even / odd", though, would limit you to only two ranges. This is OK in a small development environment, but will not scale well when adding more developers and having them multitask on the same area (like creating nodes and views). Using the sequence reserve method would allow you to distribute ID range blocks to other developers and ensure that multiple developers will not cause ID conflicts.

One suggestion that I thought could be very useful was mentioned during the MySQL keynote at Drupalcon Boston 2008. That method was to hack MySQL to issue sequences not by two (like "even / odd") but by 10. So production would use 1, 11, 21, 31, etc, while development would use 2, 12, 22, 32, etc. This would scale much better, allowing you to add more developers rather easily. However, this potentially would require a unique MySQL instance for each developer.

Merging changes from another developer

The best practice is to update your working copy as often as possible so you can continually grab your coworker's work. Every time you commit, you should also be updating. Whenever the database is updated, you must restore it to your working copy, or else when you do a dump you will be reverting their changes to the database.

The next best practice is to not work on the same sequences area at the same time. For example, one developer does work that involves creating nodes, and another will do work that involves creating views. Two people working on creating views simultaneously will end up with sequence ID conflicts.

Currently the script only supports two sequence ranges: production and development. For those who need more multitasking, the scripts could be modified to more easily support having a sequence ID range for each developer. Another option would be to have multitasking developers work off of the same database. But, this would eliminate small task-based commits and make finding bugs more difficult (this may not be an issue for some groups).

Commit frequently whenever working with database-based configuration settings. Keeping each step in your process small will mean it would not be such a big deal if there is a conflict because it would be quickly noticed and quick to repeat your settings changes.

In practice, when you make a configuration change and go to commit it, you could discover that another developer had already modified the dump file. Don't panic! Do an update and their changes will be merged with yours through your version control system.

And if the database has conflicts after the update? Don't panic! Take a look at what conflicted and resolve it manually. If you followed the above best practices, there is a good chance the conflicts will be minor. The two most frequent conflicts are the user table (it keeps track of every time you hit the site) and table-based sequences on user data tables. These are very innocuous and quick to resolve and does not matter which version you choose.

If it is too complicated to resolve manually, then you can revert and restore to your coworker's version. If you were keeping along with the practice of committing frequently, then you should only have a small amount of things to repeat. Alternatively, if you were working a long time configuring a content type or view, you could export the settings, restore to your coworker's version and then import it. Then talk to your development team about how to avoid these issues. Communication is important to avoid conflicts.

Merging development and production databases

The general concept is that the production site branches off in its own direction as more content and user data continues to be added, modified and deleted. Development branches off in the opposite direction adding features, bug fixes and also adding, modifying and deleting content data. What is needed then is to merge those two branches together to be unified again.

The merge script requires three database files: development, production and last-merge. Development and production are obvious to their namesake, while last-merge records the last point the two branches were merged at. Since both development and production have the power to create, delete and modify content, the last-merge database serves as a starting point to compare with so the difference between an addition and a subtraction can be accurately tracked.

Whats the catch?

There are limitations to what can be done safely on each version. During a merge, only content and user data is preserved from the production database. Content data is considered nodes, menus, taxonomy and anything else that is directly related to a piece of content. User data is considered anything that tracks users or their actions. That would include the user table, watchdog, comments, etc. In development, though, everything is preserved except for user data. That means content data is actually merged from development and production.

After a production site has been deployed, you would not be able to make any site configuration changes without losing those settings. If you changed your password during development, your password would be changed back to the production version after a merge. You would then have to change your password on both development and production in order to preserve the change.

If a merge gets committed to version control, it must be deployed to production. Otherwise production will become out of sync with HEAD and merging again will not be possible. Test merges can be done, and encouraged - but they should not be saved and pushed out to development.

Updating database schema changes

In order to even do a merge, the schema of all three database versions must be the same. Running update.php will handle the majority of schema changes, but CCK must be handled differently. To record changes made with CCK during development, you would export each content type as it is modified using the content_copy module and save it in a file.

The content_copy module currently only allows importing newly created fields. A small patch is provided along with the Database Scripts project to allow modifying existing fields, deleting fields that are not in the imported file and adding existing fields already available to other content types.

The patch will perform any schema changes needed, but does not change everything for a full and clean import of a modified content type. However, for the purposes of a merge, updating the schema is all we need. The rest of the configuration changes will be brought over during the merge.

Before a merge, last-merge and production databases will be brought to the same structure as the development database by loading both last-merge.sql and production.sql databases into MySQL, accessing the site through a web browser, running update.php, importing the content types and dumping the newly modified version of the database.

Performing a merge

After the schema has been updated on last-merge and production, it is ready to merge! In most cases, running the merge script is all that is needed. It will do its thing, and once complete the new database dump files will be ready to commit to version control. The production version would be loaded in MySQL and ready to be tested.

There of course is the possibility of a conflict. There are two instances of conflicts: a modification of the same content on both versions, and scaling issues.

The best practice would be to not modify the same piece of content on both development and production. However, this may be unavoidable. To reduce the conflict burden, enable revisions on all content types and install the diff module. Enabling revisions would mean conflicts would only be reported on the node table, making conflict resolution quicker - you would essentially only have to pick either the production or development version to be visible by default. Since there are revisions, the production and development versions are both still preserved. You can then use the diff module to see the difference between the two and manually resolve it.

The merge script has some scaling issues. I have been working on the script to reduce this issue, and it may have been solved. Although, it still happens if, for example, you change all entries in a table and production adds an entry to that same table. Resolving it is rather easy, involving only removing the conflict markers.

After a merge has been performed, then it can be committed to version control and deployed to production.

mergedb.php script snippet

The script mergedb.php renumbers the database sequences conveniently after the merge operation, so there is no problem with duplicated IDs between the development and the production databases.

http://drupal.org/node/206066

 
 

Drupal is a registered trademark of Dries Buytaert.