D5 to D7 migration fails if one source CCK field is a substring of another [#2078913]

Migrate_d2d knows about the CCK naming structure used in D5 and tries to infer field subvalues from column names. However, it gets confused when a node has two CCK fields and the name of one is a substring (here, a prefix) of the other: in our case, field_triprequest_budget and field_triprequest_budgetnotes. d2d then treats field_triprequest_budgetnotes as a subvalue of field_triprequest_budget and generates queries that read field_triprequest_budget_otes_value. The 'n' is lost because d2d assumes that a _ or : separator follows the matched field name and skips over that character.
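To make the failure concrete, here is a minimal PHP sketch of the kind of prefix matching described above. It is a hypothetical reconstruction, not the actual migrate_d2d code: `naive_subvalue()` is an illustrative name, and the sketch assumes the subvalue is taken to start one character after the matched field name.

```php
<?php
// Hypothetical reconstruction of the faulty logic; NOT the actual migrate_d2d code.
// A column counts as part of a field if its name merely starts with the field
// name, and the subvalue is assumed to begin one character after the match
// (where a '_' or ':' separator is expected to be).
function naive_subvalue(string $field_name, string $column_name): ?string {
  if (strncmp($field_name, $column_name, strlen($field_name)) !== 0) {
    return NULL; // Column does not start with the field name.
  }
  // Skip the field name plus one "separator" character, whatever it really is.
  return substr($column_name, strlen($field_name) + 1);
}

$field = 'field_triprequest_budget';
echo naive_subvalue($field, 'field_triprequest_budget_value'), "\n";      // value
echo naive_subvalue($field, 'field_triprequest_budgetnotes_value'), "\n"; // otes_value
// Rebuilding a column name as field name + '_' + subvalue (assumed) yields a
// column that does not exist: field_triprequest_budget_otes_value
echo $field . '_' . naive_subvalue($field, 'field_triprequest_budgetnotes_value'), "\n";
```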

I'm attaching a patch that fixes this problem for me, but I don't believe it's a complete solution, because other field names could still trigger the problem. (I.e., I don't think there's a perfect solution.) If possible, another workaround is to rename the field in the source DB so that its name doesn't contain the name of another field.

File	Size	Author
d5.inc_.patch	556 bytes	darrylri


Comments

Comment #1

EclipseGc commented 22 February 2014 at 19:48

Priority:	Normal	» Major
Status:	Needs work	» Reviewed & tested by the community

Confirmed this issue exists and that the patch solves it, at least in my case. Would love to see this fixed soonish.

Eclipse

Comment #2

mikeryan commented 25 April 2014 at 16:02

Version:	7.x-2.0	» 7.x-2.x-dev
Status:	Reviewed & tested by the community	» Needs work

Just like module hook namespacing issues, eh?

I think making the check simply strncmp($field_name . '_', $column_name, strlen($field_name) + 1) would work just as well for your case, but neither check would work if the longer field name were field_triprequest_budget_notes. In that case, getFieldTypeColumns() has no way, with the information available to it, to know for sure whether the column field_triprequest_budget_notes_value is the 'value' data for field_triprequest_budget_notes or the 'notes_value' data for field_triprequest_budget. In fact, it only knows about one field at a time, so it will always accept any column that looks like it could be a subfield of that field.
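A minimal sketch of that stricter check, wrapped in a hypothetical helper (`stricter_subvalue()`) rather than the real getFieldTypeColumns(). It rejects field_triprequest_budgetnotes_value, but a column named field_triprequest_budget_notes_value is still accepted for both candidate fields, which is exactly the ambiguity described above:

```php
<?php
// Hypothetical helper illustrating the stricter prefix check: the field name
// must be followed by '_' before a column is treated as one of its subvalues.
function stricter_subvalue(string $field_name, string $column_name): ?string {
  if (strncmp($field_name . '_', $column_name, strlen($field_name) + 1) !== 0) {
    return NULL;
  }
  return substr($column_name, strlen($field_name) + 1);
}

$budget = 'field_triprequest_budget';
// The original case is now rejected:
var_dump(stricter_subvalue($budget, 'field_triprequest_budgetnotes_value'));  // NULL
// But if the other field were named field_triprequest_budget_notes,
// the same column would match both fields:
var_dump(stricter_subvalue($budget, 'field_triprequest_budget_notes_value')); // string(11) "notes_value"
var_dump(stricter_subvalue('field_triprequest_budget_notes',
                           'field_triprequest_budget_notes_value'));          // string(5) "value"
```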

So the field detection needs more context, at least for fields in content_type_% tables. Basically: get all the columns from the table once, keep track of which ones have already been accounted for, and check them against the field names in descending order of field-name length (which means it will favor the shorter possible subfield: 'value' over 'notes_value' in the scenario above).
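A rough PHP sketch of that approach, assuming the field names and the table's column list are already available as plain arrays (fetching them from the D5 database is left out). `assign_columns()` is a hypothetical name, not part of migrate_d2d:

```php
<?php
// Hypothetical sketch: assign each column of a content_type_% table to at most
// one field, checking field names from longest to shortest.
function assign_columns(array $field_names, array $columns): array {
  // Longest field names first, so e.g. field_triprequest_budget_notes gets to
  // claim its columns before field_triprequest_budget does.
  usort($field_names, fn(string $a, string $b): int => strlen($b) <=> strlen($a));

  // Columns not yet accounted for by any field.
  $unclaimed = array_fill_keys($columns, TRUE);
  $result = [];
  foreach ($field_names as $field_name) {
    $prefix = $field_name . '_';
    $result[$field_name] = [];
    // array_keys() takes a snapshot, so unsetting inside the loop is safe.
    foreach (array_keys($unclaimed) as $column) {
      if (strncmp($prefix, $column, strlen($prefix)) === 0) {
        // Map subvalue name (e.g. 'value') => full column name.
        $result[$field_name][substr($column, strlen($prefix))] = $column;
        unset($unclaimed[$column]);
      }
    }
  }
  return $result;
}

$columns = [
  'vid',
  'nid',
  'field_triprequest_budget_value',
  'field_triprequest_budget_notes_value',
];
$fields = ['field_triprequest_budget', 'field_triprequest_budget_notes'];
$map = assign_columns($fields, $columns);
print_r($map);
```

Because each column is claimed at most once and the longest field names go first, field_triprequest_budget_notes_value ends up as the 'value' of field_triprequest_budget_notes, and field_triprequest_budget keeps only its own 'value'. It is still a heuristic: if field_triprequest_budget genuinely had a 'notes_value' column, this would misassign it.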