Problem/Motivation

We're massively changing how issues (and comments) work in D7, so we're going to need a data migration path. This will use the Migrate module, not hook_update_N(). Moshe has already agreed to tackle this (YAY!! Thanks!!!), and did some preliminary work at the Project* D7 porting sprint after BADCamp 2011. This issue is for tracking progress and recording potential gotchas.

Gotchas

- Any D6 comment that updates the issue in some way needs to generate a new issue revision, and then populate the nodechanges field for the comment to record the diff.

- Any files attached to a D6 comment via comment_upload need to be attached directly to the issue node and then the nodechanges field on that comment needs to reflect the new file.
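The two gotchas above boil down to comparing each comment's values against the previous issue state. Here is a minimal sketch of that comparison in Python (not Drupal code). The field names, the `apply_comment` helper, and the shape of the `changes` record are all hypothetical; the real nodechanges field storage may look quite different.

```python
# Hypothetical field names; the real project_issue field list may differ.
TRACKED_FIELDS = ("status", "component", "priority", "assigned")


def apply_comment(issue_state, comment_values, comment_files):
    """Return (new_state, changes) for one D6 comment.

    `changes` maps a field name to an (old, new) pair. An empty dict means
    the comment did not touch the issue, so no new revision is needed.
    """
    new_state = dict(issue_state)
    changes = {}
    for name in TRACKED_FIELDS:
        old = issue_state.get(name)
        if name in comment_values and comment_values[name] != old:
            changes[name] = (old, comment_values[name])
            new_state[name] = comment_values[name]
    if comment_files:
        # Files uploaded with the comment move onto the issue node itself.
        old_files = list(issue_state.get("files", []))
        new_state["files"] = old_files + list(comment_files)
        changes["files"] = (old_files, new_state["files"])
    return new_state, changes


state = {"status": "active", "component": "Code", "files": []}
state, changes = apply_comment(
    state, {"status": "needs review", "component": "Code"}, ["fix.patch"]
)
needs_revision = bool(changes)  # True here: status changed and a file was added
```

A "me too" comment with unchanged values and no files yields an empty `changes` dict, which is the case where no new issue revision would be generated.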

Resolutions

  1. The "first" migration will update the original post for each issue based on the D6 serialized array of values. On the destination side, it will update the existing node revision and slam in field values for component, status, etc. This will be slightly dirty on the destination side.
  2. The second migration will SELECT all issue comments and comment_upload fids and create new node revisions for each. nodechanges will in turn update the comment for each, populating the Nodechanges Field. We need #1700530: Ability to update an existing comment in order to preserve comment IDs.
  3. A third migration is needed to "fix up" comment threading. BDragon has the details.

Next steps

This issue is currently postponed pending the Drupal.org D7 Upgrade deployment. After that, general migration for other peoples' sites still needs to be looked into and made available.

Comment   File                            Size        Author
#12       503e5cd7e7a19.migrateTest.txt   421.06 KB   bdragon

Comments

rgristroph’s picture

In D6 project_issue stored custom help text in a variable for the project_issue node type; in D7 we will use the system help text for each of the project_issue node types we create.

In http://drupal.org/node/1571396 on comment #18 there is a patch with some code in an update hook that grabs the old help text and puts it in the system help text field. It would have to be re-examined for use in a migration process (for example, the drupal_set_message and watchdog calls would probably not be used, and it might just overwrite any existing help text instead of appending), but maybe that can help someone as a starting point.

--Rob

senpai’s picture

Title: Provide a data migration path for D6 project_issue and comments to D7 » Provide a D6 to D7 data migration path for project_issue and comments
Assigned: Unassigned » moshe weitzman
Issue tags: +sprint 7

Assigning to Moshe, and tagging for Sprint 7.

moshe weitzman’s picture

I looked into this. Some notes:

We need two migrations. The first migrates the original node (i.e. revision 0). The second migration migrates each subsequent revision. The nodechanges module looks like it will automatically create the corresponding comment, including populating the nodechanges field.

TODO
------
1. Handle comment_upload files.
2. We will want to preserve file ids, comment ids, etc. Core doesn't allow that by default - we will need to slightly hack core I think. I think it is OK to let filenames get renamed differently on occasion. This will break a few links but nothing serious.

bdragon’s picture

Oof, I see what you mean. Interleaving D6 manually created node revisions with the comment-created revisions will throw off the numbering, unless all of the manually-created revisions are tacked onto the end instead of being interleaved.. (not to mention skips in the numbering due to deleted spam, etc.) -- but interleaving makes more sense data wise...

By the way, the comment numbers are going to come from $comment->thread in D7. #1632492: Figure out and port project_issue comment numbering functionality to D7
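For anyone unfamiliar with the thread column: to the best of my recollection, D7 core builds each thread segment with int2vancode(), which is a base-36 number prefixed by a length character so that string sorting matches numeric sorting. The Python below is a rough re-implementation for illustration only. It assumes flat (top-level only) threads, and `to_base36` and `top_level_thread` are names made up for this sketch.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # base-36 alphabet


def to_base36(n):
    """Convert a non-negative integer to a lowercase base-36 string."""
    if n == 0:
        return "0"
    out = ""
    while n > 0:
        n, remainder = divmod(n, 36)
        out = DIGITS[remainder] + out
    return out


def int2vancode(i):
    """Length-prefixed base-36 encoding, modelled on core's int2vancode()."""
    num = to_base36(i)
    return chr(len(num) + ord("0") - 1) + num


def vancode2int(code):
    """Inverse of int2vancode(): drop the length prefix, parse base 36."""
    return int(code[1:], 36)


def top_level_thread(position):
    """Thread value for the Nth top-level comment (assumes no replies)."""
    return int2vancode(position) + "/"


first = top_level_thread(1)    # "01/"
later = top_level_thread(36)   # "110/"
```

With flat threads, decoding the first segment of the thread value gives back the comment's position, which is what makes it usable as a comment number.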

I already went through and manually fixed any glitches in the drupal.org database that would prevent rethreading the comments to match the current "#x" comment_number, but I didn't consider that existing node revisions would have to be shoehorned in as well...

I guess we will have to twiddle the text to fix up as many of the ad-hoc links to comment numbers ("tested patch from #4") as we can, and go with renumbering everything from scratch...

We probably want to preserve comment ids for the existing comments, yeah.

How about this for procedure:
* Bump max cid.
* Import original node revision.
* Interleave importing revisions with importing comments that have differences from the previous state of the issue fields into revisions, and comments NOT differing (me toos, etc) into comments, all ordered by timestamp.
* Move comments that existed previously into their original cid.
* Put together a map of old comment_number to new thread value.
* Rewrite comment bodies to fixup ad-hoc references to comment numbers.
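The last two steps of that procedure can be sketched roughly as follows, in Python rather than Drupal code. It assumes every imported event (D6 comment or manually created D6 node revision) ends up taking one slot in the new numbering, and the dict keys and the `renumber` and `rewrite_refs` helpers are invented for this illustration.

```python
import re


def renumber(events):
    """Order events by timestamp and map old comment numbers to new ones.

    Each event is a dict with 'timestamp' and 'kind' ('comment' or
    'revision'); comments also carry 'old_number'.
    """
    ordered = []
    number_map = {}
    by_time = sorted(events, key=lambda e: e["timestamp"])
    for new_number, event in enumerate(by_time, start=1):
        ordered.append(dict(event, new_number=new_number))
        if event["kind"] == "comment":
            number_map[event["old_number"]] = new_number
    return ordered, number_map


def rewrite_refs(body, number_map):
    """Rewrite ad-hoc '#N' references in one pass using number_map.

    Numbers not in the map (for example issue node IDs) are left alone.
    """
    def replace(match):
        old = int(match.group(1))
        if old in number_map:
            return "#" + str(number_map[old])
        return match.group(0)

    return re.sub(r"#(\d+)\b", replace, body)


events = [
    {"kind": "comment", "old_number": 1, "timestamp": 100},
    {"kind": "revision", "timestamp": 150},
    {"kind": "comment", "old_number": 2, "timestamp": 200},
]
ordered, number_map = renumber(events)
fixed = rewrite_refs("tested patch from #2", number_map)  # "tested patch from #3"
```

Doing the replacement in a single regex pass matters: replacing one number at a time could rewrite a reference twice when the map chains (1 to 2, 2 to 3). A plain regex also cannot tell a comment reference from an issue reference with a small node ID, so real data would need more care than this.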

moshe weitzman’s picture

Issue summary: View changes

Updated issue summary.

drumm’s picture

project_release too, or should that migrate in-place?

dww’s picture

Assigned: moshe weitzman » bdragon

Let's worry about releases elsewhere. Issues and issue comments are complicated enough for this issue. Instead, let's talk here:

#1716028: Data migration for project nodes to D7
#1716030: Data migration for release nodes to D7

Also, seems like moshe hasn't had time to work on this and bdragon is the one driving this home, so giving this a more accurate assignment.

bdragon’s picture

I'm still working on it, but I have cleaned up the current version and committed it.

http://drupalcode.org/project/project_issue.git/commit/b3ab429a68fccb64e...
http://drupalcode.org/project/project_issue.git/commit/c61e1d0b369732b62...
http://drupalcode.org/project/project_issue.git/commit/bdb8f8e025f270d23...
http://drupalcode.org/project/project_issue.git/commit/538efb673c57c4015...

Current run order is:
1) Run updb.
2) ProjectIssueFixInitFiles
3) ProjectIssueRethreadIssueFollowups
4) ProjectIssueTimelinePhaseOne
5) ProjectIssueTimelinePhaseTwo
6) ProjectIssueTimelinePhaseThree
7) ProjectIssuePhaseTwo
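As a rough illustration of that run order, the Python sketch below builds the drush invocations in sequence. It assumes the migrations are registered with the Migrate module under exactly these names and are run through its migrate-import command; the optional limit uses the "N items" form that, as far as I know, Migrate's --limit option accepts. The `build_commands` and `run_all` helpers are made up for this example.

```python
import subprocess

# Migration names copied from the run order above.
MIGRATIONS = [
    "ProjectIssueFixInitFiles",
    "ProjectIssueRethreadIssueFollowups",
    "ProjectIssueTimelinePhaseOne",
    "ProjectIssueTimelinePhaseTwo",
    "ProjectIssueTimelinePhaseThree",
    "ProjectIssuePhaseTwo",
]


def build_commands(limit=None):
    """Return the drush argv lists in the order they should run."""
    commands = [["drush", "updb", "-y"]]
    for name in MIGRATIONS:
        command = ["drush", "migrate-import", name]
        if limit is not None:
            # Assumed syntax: cap each migration at a number of items.
            command.append(f"--limit={limit} items")
        commands.append(command)
    return commands


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        runner(command, check=True)


commands = build_commands(limit=100)
# run_all(commands)  # uncomment on a site where drush is available
```

Stopping at the first failure is deliberate, since each later phase presumably depends on the earlier ones having completed.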

ProjectIssuePhaseTwo runs for a *long* time with the full dump. It has to do something like 2 million node_save()s...

On my hardware, I have not yet successfully done a continuous start-to-finish run on the same code, but I estimate that a full run on said hardware (a rather elderly dual Xeon with software-mirrored WD Raptors) will take about 3.5 days.
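To put those figures in perspective, here is the back-of-the-envelope arithmetic in Python. It treats the whole 3.5 days as if it were spent in the roughly 2 million node_save() calls, which overstates the per-save cost since the other phases take time too; `projected_days` is a helper invented for this sketch.

```python
SAVES = 2_000_000   # "something like 2 million node_save()s"
DAYS = 3.5          # estimated full run

seconds = DAYS * 24 * 60 * 60       # 302400.0 seconds
per_save = seconds / SAVES          # about 0.151 seconds per save (upper bound)
saves_per_second = SAVES / seconds  # about 6.6 saves per second


def projected_days(per_save_seconds, saves=SAVES):
    """Total run time in days for a given average cost per save."""
    return per_save_seconds * saves / 86400


faster = projected_days(0.05)  # about 1.16 days at 50 ms per save
```

At this scale every 10 ms shaved off the average save is worth roughly a quarter of a day of run time.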

moshe weitzman’s picture

Ah yes. Optimization is an important step at the end of development. My general way to proceed here is to install the XHProf PHP extension and configure devel to use it. Then you will get links to XHProf runs at the end of each drush request. Review the run report for long migrations and start beating down the time-sucking functions. It can make sense to disable some non-essential modules during the migration. Show me an XHProf run and I can help optimize.

bdragon’s picture

Actually it will be a matter of manually starting and stopping xhprof, and taking a sample of maybe 100 times through the loop, because the entire run on the reduced data set takes a couple hours....
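The "profile only a sample of iterations" idea looks roughly like this. This is a Python analogue using the standard cProfile module, not XHProf, and `profile_sample`, `report`, and `work` are names invented for the sketch; the actual migration loop would be PHP with xhprof started and stopped by hand.

```python
import cProfile
import io
import pstats


def profile_sample(items, process, start=0, sample_size=100):
    """Run process() on every item, profiling only a window of iterations."""
    profiler = cProfile.Profile()
    for index, item in enumerate(items):
        in_sample = start <= index < start + sample_size
        if in_sample:
            profiler.enable()
        process(item)
        if in_sample:
            profiler.disable()
    return profiler


def report(profiler, top=10):
    """Return the most expensive calls by cumulative time as text."""
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def work(n):
    # Stand-in for one pass through the migration's inner loop.
    return sum(i * i for i in range(n))


profiler = profile_sample([1000] * 500, work, start=200, sample_size=100)
summary = report(profiler)
```

Starting the window a few hundred iterations in, rather than at zero, avoids measuring one-time warm-up costs such as cold caches.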

I'll get a couple captures at some point.

senpai’s picture

Assigned: bdragon » moshe weitzman
Status: Active » Needs work
Issue tags: +sprint 8

It might be extremely valuable at this point to pass the upcoming optimization step(s) to Moshe, now that the entire thing is finally running all the way through as an alpha. In this manner, Bdragon can continue to lead the rest of the Project porting effort and let Moshe bring the optimization stuff home.

moshe weitzman’s picture

One can run migrations with --limit so you don't profile for too long ... Sure, I'm happy to do the profiling and fixing myself if I can get an environment to work in that has a DB ready to upgrade and has XHProf extension.

bdragon’s picture

Status: new
File size: 421.06 KB

Here's a short sample of the inner loop as it is now.

bdragon’s picture

Things that stick out at a quick glance:
* Views content cache
* Tracker (although I thought I was locking it out already...)

moshe weitzman’s picture

nodechanges should provide $comment->original so that we avoid an entity_load_unchanged() in comment_save(). Bdragon also suggested disabling DB transactions.

I'm also wondering if entitycache module could speed things up.

bdragon’s picture

Assigned: moshe weitzman » bdragon
Status: Needs work » Postponed
Issue tags: -project, -drupal.org D7, -sprint 7, -sprint 8

We are so far past this now.... We are on the second iteration of the completed migration path and we have something working. For drupal.org at least.

General migration for other peoples' sites still needs to be looked into.

Therefore, this is currently postponed pending drupal.org d7 deployment.

bdragon’s picture

Issue summary: View changes

cleanup

webchick’s picture

I will say that as long as the migration code for d.o is in Git somewhere for users of Project module to borrow and continue to improve, I am 30000% okay with de-scoping this issue from the d.o D7 migration team's responsibilities. Project module is used on 1,242 sites, out of 764,140 total sites (less than 1% for all you math-lovers). Project Issue has even fewer, with 468 sites.

dww’s picture

Priority: Major » Critical
Status: Postponed » Needs work
Issue tags: +project, +drupal.org D7

This is obviously critical for launch.

Based on a click-through on git7site, there are a few major problems still:

A) The version field on issues is not getting populated during migration.

B) #1818662: Issue node type on git7site has 2 file upload fields

Also, it's not clear why we're trying to node edit during migration for issue comments that don't change the node at all. See #1813438-2: Bug in if() prevents saving comment with only nodechanges_body change.

dww’s picture

p.s. @webchick: Apparently this issue *is* the place to track d.o's data migration for issues and comments. I was sent here by Senpai to document the stuff in #17. I agree that a generic migration solution is out of scope for the d.o D7 upgrade, but for now, this issue is really about d.o...

bdragon’s picture

The #1813438 patch was for the uncommitted version of the "Make a formattable comment body textarea on node edit" that I was working on that would have lived in drupalorg_project. Although it's still a bug, dww implemented the textarea independently directly in nodechanges, thereby making that patch irrelevant for *our* purposes.

B) is an easy fix, I'll toss that in today.

A) is probably something stupid on my part. I'll get it fixed asap.

bdragon’s picture

Status: Needs work » Active

I think I have the version field migration fixed up.

http://drupalcode.org/project/project_issue.git/commit/036ae9abb7397fcec...

I was unable to test it all together but I tested it in parts and it appears to work ok.

senpai’s picture

Priority: Critical » Normal
Issue tags: -project, -drupal.org D7

This issue is no longer affecting the Drupal.org D7 Upgrade initiative.

bdragon’s picture

Tagging having issues?

senpai’s picture

Not anymore, but until you posted that comment #22, this issue was still showing the very tags I had removed in comment #21. Weird.

dww’s picture

Assigned: bdragon » Unassigned
Issue tags: +7.x-2.0 blocker

I assume bdragon isn't actively working on this right now, so unassigning until work resumes here (in case someone *not* working on d.o wants to take this on in the meanwhile).

dww’s picture

Issue summary: View changes
