Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Created a patch to add parent term and weight mappers for taxonomy term imports, so that the basic hierarchy and weights can be preserved. Probably needs more testing.
Comments
Comment #1
bblake CreditAttribution: bblake commentedPatch included
Comment #2
bblake CreditAttribution: bblake commentedRe-rolling with -p0 compatability for unpatched drush make.
Comment #3
rickmanelius CreditAttribution: rickmanelius commentedRelated post asking for instructions and how to do this
#1159628: How do you import a hierarchical taxonomy with feeds?
I'm going to test the above and get back to you. Thanks for the patch!
Comment #4
rickmanelius CreditAttribution: rickmanelius commentedHi bblake,
I'm testing using 7.x-2.x-dev with the required patches here #1169986: Fatal error: Class 'FeedsPlugin' not found.
Here is my very basic CSV data
Term,Parent
Product,
Ice Cream,Product
Vanilla,Ice Cream
And my Feed Import goes as follows in source:target:unique
Term:Term name:unique
Parent:Parent Term Name:not unique
The terms will import, but the parent/child relationships still appear flat in the GUI and in the backend database tables. Your patch code looks pretty straightforward, so it's unclear why it's not getting set properly. Am I doing any of the steps above incorrectly? (do I need a GUI? Is taxonomy_get_term_by_name expecting something else?
Comment #5
rickmanelius CreditAttribution: rickmanelius commentedFound the error, $value is a string, not an array, so these lines
Should be
When I run the above, it works.
The weight portion of the patch uses $value instead of $value[0], so that should be correct. Testing this next.
Comment #6
rickmanelius CreditAttribution: rickmanelius commentedFinal confirmation:
Attaching the next version of this patch with the changes made in #5. I can confirm both work. Here is my test CSV.
And my output
As intended, the terms are ranked by the parent/child relationship, and the weight was properly applied between strawberry, vanilla, and chocolate.
Thanks for all your hard work bblake. I really needed this!
Comment #7
rickmanelius CreditAttribution: rickmanelius commentedThis appears to be a duplicate of this issue, also reviewed and tested
#1152940: Feeds term import with hierarchy and weight
Comment #8
g089h515r806 CreditAttribution: g089h515r806 commentedi change the code and add my latest code to feeds_terms-1152940-3.patch,
if parent value is not empty, and it does not exist in database, i create a new term for it, and set $target_node->parent = new created term id instead set it to 0. This method was copy from migrate module. here is my code,it works on our site.:
Comment #9
g089h515r806 CreditAttribution: g089h515r806 commentedThis is correct patch code, previous is wrong.
Comment #10
darrylriI found that parents weren't good enough for my needs; I have a legacy system with a multilevel categorization system where many of the terms in different branches have the same names. That would cause collisions with just the parent mapper.
So I implemented the following patch to support parentGUID as a mapping item. It's a patch on top of the patch given by RickManelius in #6 above.
Comment #11
darrylriOops, somehow I uploaded a zero length file.
Comment #12
iMiksuHere you have #6 and #11 patches combined together against 7.x-2.0-alpha4.
Thanks for you work! Looking forward to get this committed o/
Comment #13
g089h515r806 CreditAttribution: g089h515r806 commentedfor #6, if the order is :
Term,Parent,Weight
,Product,10
Ice Cream,Product,10
Vanilla,Ice Cream,10
Chocolate,Ice Cream,15
Strawberry,Ice Cream,12
#6 patches works very well
if we change the order,for example:
Term,Parent,Weight
Vanilla,Ice Cream,10
Chocolate,Ice Cream,15
Strawberry,Ice Cream,12
,Product,10
Ice Cream,Product,10
#6 patches does not work correctly. We could not always ensure parent term appear before subterms, when we use views output term into csv, parent term may appear after subterms.
my code in #8 solve this problem,
Comment #14
sdrycroft CreditAttribution: sdrycroft commentedThe current patch forces, for no apparent reason, the use of integers for the "Parent GUID". This does not make much sense, so I would recommend removal of line 141 from FeedsTermProcessor.inc.
Comment #15
13rac1 CreditAttribution: 13rac1 commentedMinor update to #14 to remove whitespace. Works correctly. Thanks!
Comment #16
sdrycroft CreditAttribution: sdrycroft commentedThe attached patch reverses the changes made by g089h515r806† and instead puts any term with a parent that isn't known at the root of the taxonomy. I have also added a small change to the process function that forces a term to be updated, even if it looks like the data has not changed since a previous import. This means that badly ordered files can be imported twice, the first time to import all the names and data, and the second to correct any of the mistakes in the hierarchy due to the order of the file.
The following example, ordered alphabetically, best illustrates my reasons for the change:
With patch #15, this would result in:
Note the extra "United Kingdom" term.
With the attached patch, after the first import you would have:
After a second import, this would be corrected to:
This to me makes far more sense than creating additional unwanted terms in a hierarchy, which is exactly what happened to me when trying to import the attached CoreidaeTestData.xls Excel file.
†Can you please satisfy my curiosity and explain that user name, is it failed hanzi?
Comment #17
sdrycroft CreditAttribution: sdrycroft commentedJust noticed I missed out
from my previous patch.
Comment #18
johnbarclay CreditAttribution: johnbarclay commentedPatch #17 worked fine for me against 7 alpha4. It should apply cleanly to dev also. It needs to be tested with an existing vocabulary also.
Comment #19
13rac1 CreditAttribution: 13rac1 commentedApplies to current dev. Attached, again ;), is a patch w/o white space warnings.
Comment #20
benjaminkyta CreditAttribution: benjaminkyta commentedHey. Thanks.
Works well with Drupal 7.7 too. With these extensions: txt csv tsv xml opml
Comment #21
johnbarclay CreditAttribution: johnbarclay commented==== the comment below is incorrect. disregard it ======
#19 works for drupal 7.8, feeds 7.x-2.x-dev. I tested it quite thoroughly because I applied it against alpha4 and it had an issue on the second import. I think this is ready for commit after it has a simpletest. The simpletests should probably have a 3 levels of hierarchy and some terms whose parents map to their children to make sure recursion is dealt with correctly. Other simpletest cases seem pretty well outlined in this thread. If someone wants to come up with the .csv files and some natural language test cases, I can write the simpletest code.
Comment #22
johnbarclay CreditAttribution: johnbarclay commented==== the comment below is incorrect. disregard it ======
there is one caveat I ran across. When
The terms with no parents are not included. I believe the fix for this is simply to validate for only parent term name OR parent guid in the mapping, never both. The field is only for finding the parent, so having both is redundant.
Comment #23
johnbarclay CreditAttribution: johnbarclay commentedI ran across a number of issues with #19 and created the attached patch against sept 30 dev snapshot (http://drupal.org/node/927032).
The patch solves the following:
$parent_tid = $query->condition('guid', $value)->execute()->fetchField();
to
$parent_tid = $query->condition('guid', $value)->condition('id', $this->id)->execute()->fetchField();
takes care of this.
Comment #24
maxilein CreditAttribution: maxilein commentedIn the patch from #23 move the function declaration
function sortbylevels($item1, $item2) { return $item1['level'] - $item2['level'];}
out of the body of the function orderHierarchicalTaxonomies() to make patch work with larger imports.
Thanks for the very useful patch!
Comment #25
Summit CreditAttribution: Summit commentedChanged title. Will this patch also give a GUI Mapper change as that parentterm can be selected of a term? May be one of has a great screenshot to show the working?
greetings, Martijn
Comment #26
maxilein CreditAttribution: maxilein commentedYou can only import 1 level (with a parent) at a time. So if you have three levels you need to configure 3 jobs which must be run sequentially.
Hope that hepls.
Comment #27
Summit CreditAttribution: Summit commentedHi,
Thanks for the quick reply:
1) +1 for supporting parent-parent-term : usecase: country-region-place.
2) Would it also be possible to have this functionality in node-parser, as such that I have 2 fields:
- Field A is parent of term B
- Field B is term B.
Thanks a lot in advance for considering this!
Greetings,
Martijn
Comment #28
maxilein CreditAttribution: maxilein commentedImport of hierarchy:
Lets consider a hierarchy by the order of taxonomy fields:
New mapping field type: "child-to-previous".
If you have
country -> regions -> city -> district
you would add 4 mappers in this order, each of the type "child-to-previous":
country "child-to-previous" or "top-level"
regions "child-to-previous"
city "child-to-previous"
district "child-to-previous"
The function could automatically analyse that this hierarchy consists of 4 fields. Taking the first as top and the last as the lowest (in the order in which the mappers are specified).
BR, Max
Comment #29
Summit CreditAttribution: Summit commentedHi Maxilien, can you place this again in a image please? I do not got it exactly :(
Thanks a lot in advance,
greetings,
Martijn
Comment #30
Dave ReidComment #31
maxilein CreditAttribution: maxilein commentedHi Martjin,
here is an image. Hope it clarifies the idea.
Sorry for posting so late.
Thank you.
BR, Max
Comment #32
Summit CreditAttribution: Summit commented@Maxilein,
Thanks for clarifying! Will investigate this!
Greetings, Martijn
Comment #33
febbraro CreditAttribution: febbraro commented#17 was marked RTBC, but has since been superseded by #23. Setting it back to needs review so folks and look at/test it.
Comment #34
Summit CreditAttribution: Summit commentedHi,
@maxilein #24. where do I have to move the function sortbylevels to get this working for larger imports please?
Could you reply in code please?
Thanks a lot in advance for your reply.
greetings, Martijn
Comment #35
apmsooner CreditAttribution: apmsooner commentedpatch#19 works on latest dev snapshot. Patch #23 i couldn't get to work....
Comment #36
Summit CreditAttribution: Summit commentedHi,
I am looking at the solution of #23.
1) Am I correct I have to build 3 mappers to get hierarchy of 4 in?
1 mapper (mapper 1) for country region and set the mapper to the correct csv fields.
1 mapper (mapper 2) for region and city and set the mapper to the correct csv fields
1 mapper (mapper 3) for city and district and set the mapper to the correct csv fields.
So it becomes the following proces
2) Build the taxonomy term mapper with correct vocabulary for country-region
3) Clone this mapper for a new mapper region-city and change the mapper fields accordingly
4) Clone this mapper for a new mapper city-district and change the mapper fields accordingly
Now have the source-csv file with the 4 columns: country, region, city, district.
Run the 3 mappers in hierarchy: mapper 1, mapper 2 and mapper 3.
5) Will then my hierarchical vocabulary be build?
6) Isn't not possible to have one mapper with 4 hierarchies: parent-parent-termname, parent-termname, termname, child-termname?
I will test this further.
Greetings, Martijn
Comment #37
maxilein CreditAttribution: maxilein commented@Summit #34
Just out of the other function's body. (You created a function definition INSIDE another function definition!)
Make it an independent function like all the other functions in the module.
e.g. Just put the function right after the end of the function you take it out of.
Now you have:
and it shoud be:
R, Max
Comment #38
batje CreditAttribution: batje commentedsorry, my preview got saved
Comment #39
batje CreditAttribution: batje commentedI took #23 and did 2 things:
By all means have another go at the orderHierarchicalTaxonomies function.
Comment #40
batje CreditAttribution: batje commentedI figured out why i didnt get the Parent GUID import working. I am using different datasets to import the hierarchy and on this line, the code presumes you are using 1 importer to import the full hierarchy.
$parent_tid = $query->condition('guid', (string)$value)->condition('id', $this->id)->execute()->fetchField();
changed that to
$parent_tid = $query->condition('guid', $value)->execute()->fetchField();
After all, a GUID is a Global Unique Identifier, so it should be unique in the whole system (and beyond). That is the responsibility of the user.
This patch includes #38 as well.
Comment #41
bjalford CreditAttribution: bjalford commentedI can't apply this patch to the latest dev.
Comment #42
geek-merlinVERY strange,
seems patches up to #23 are applicable, can find blob 0d44e84
but #39 and #40 by @batje refer to a blob a8dd9ad which i cannot find.
EDIT: phuh, there is a dead master branch which the commit is coming from
EDIT: added #1460066: bogus master branch in repo
Comment #43
geek-merlinOK, this is #40 rebased to 7.x-1.x branch
which took quite a bit of dirty git plumbing.
future comitters: beware, do NOT use master branch, it's dead.
Comment #44
bjalford CreditAttribution: bjalford commentedI've tested the patch and it's working using parent guid. It's worked with a 22k term, 10 level deep hiercharcy.
Comment #45
geek-merlinstrange... just noted it does not have any parent guid code anymore as noted in #39 =:-()
EDIT: in fact the sort code is missing, so it depends on the order if parents are found. no good.
Comment #46
bjalford CreditAttribution: bjalford commentedYes, I noticed that but forgot to mention. I'm doing a csv import and the parent has be before the children for it to be imported in the correct structure.
Comment #47
geek-merlinhey, life's a bitch!
i did a complete rewrite of the sort feature to (just when testing) realize the idea is flawed from the beginning:
feeds splits up our feed to chunks of 50 items - which is surely a good thing.
but it means we never have a complete item array to sort (never? let's check that)
just in case we find a way, here is the code - should be quite understandable.
Comment #48
geek-merlinOK, there is hope:
* of course we can always import in 2 passes - not elegant nor fast but working
* we can clear the 50 item limit and hope we have enuff memory
Comment #49
geek-merlinOK, a summary:
* to set perent(s) of a term they need to be imported in advance
* pulling all terms into memory and sorting them does not work in for medium and big imports
* but even if the source is not ordered, we always can do a 2-pass import, and on the second pass all parents should be set ok
problem: 2nd pass won't update because hash says "no new items".
this patch adds an option "force update" which solves this for the second pass.
after testing i made some code cleanup so ther may be small issues with it - please test and report.
TODO: write tests.
Comment #50
philipz CreditAttribution: philipz commentedI get the memory exhousted php error. I've got 200 terms and I set memory limit to 256 MB but that didn't help. I'm trying to find what causes this problem but no luck so far.
What did work:
1. Created importer and imported all terms (parent terms and child terms) WITHOUT parent-child relations, only GUID and term names. GUID and Term name Unique targets.
2. Created importer only for child terms. Force update on. GUID and Term name Unique. This updated the previously imported child terms adding them parent relation.
Comment #51
geek-merlinhmm, thinking loud... TODO:
* log and display peak memory usage
* make numberofitemsprocessedinonestep configurable
which both should be separate issues
would like to see another success story before marking this rtbc
Comment #52
philipz CreditAttribution: philipz commented@axel.rutz are you referring to my story? :)
I expected that this should work in one pass if all parents are imported before children or 2 passes: first parents only then children.
How I did this is none of these and not so obvious either.
Still... thank you very much for the work done!
Comment #53
philipz CreditAttribution: philipz commentedThe way I imported terms described in my post #50 did not work that well after all. The problem is I have duplicate child term names so it can't be unique target.
I'm trying something else right now:
1. First I import all terms (parents and children). Mapped fields: GUID and term names.
2. Then I import only child terms and map. I map only GUID (unique target) and Parent GUID.
I was hoping this approach would work but I get "Term name missing." errors for every term. It's trying to create new terms and doesn't try to update existing ones by their GUID. Debugging GUID and Parent GUID shows they're ok.
Comment #54
philipz CreditAttribution: philipz commentedI added guid as unique target in
existingEntityId
function and it worked.Comment #55
geek-merlin@philipz: thank you for pointing this out, in fact unique target "guid" is urgently needen and was lost somewhere in the battle!
so for anyone to test: try patch #49 and then #54 on top of it
;-)
Comment #56
GaëlGUsed #55 info to patch against dev version, but I ran into the same problem as #50 : memory exhausted.I tried importing 5 lines only but it fails too. There must be a infinite loop or something like that.
"Force update" disabled causes "no new terms" message and vocabulary remains flat. "Force update" enabled causes memory exhausted error and the first item in the csv disappears from the vocabulary.
I guess it's not a problem of csv size, the importer gets stuck at the first line. I'll investigate further.
Edit: my CSV file was wrong, some terms had themselves as parent. To track this kind of error, use this :
This verification may be added with the hierarchy feature.
Comment #57
geek-merlinyes this does completely make sense to me.
so this new patch contains #49, #54 and #56
Comment #58
geek-merlin"force update" did not work due to missing default.
fixed here.
Comment #59
TimelessDomain CreditAttribution: TimelessDomain commented#58 worked well for me. thanks
Comment #60
geek-merlinComment #61
franzWow, a lot of people worked on this, thanks for the hard work and testing. I guess it was about time we get this committed.
Comment #63
pindaman CreditAttribution: pindaman commentedany documentation on how to setup this feature would be rellay nice
thanks
Comment #64
g089h515r806 CreditAttribution: g089h515r806 commentedDoes not work.
I use 7.x-2.0-alpah5, I get following error when import hierarchical terms:
•SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '151-119' for key 'PRIMARY'.
After i remove the mapping of parent term name . it works correctly.
Comment #65
geek-merlin@g089h515r806: you might want to file a new bug for this and link it here.
thank you.
Comment #66
g089h515r806 CreditAttribution: g089h515r806 commentedIn mappings:
I set 2 mappings for parent term:
pcode -------------------Parent GUID
pname ------------------Parent term name
3 mappings for current term:
code -------------------GUID
name ------------------term name
weight------------------Weight
I got "Integrity constraint violation" when i import terms.
After I remove mapping:
pname ------------------Parent term name
It works correctly. Maybe we need write a document.
Comment #67
geek-merlin#66: please DO create a separte bug report and provide a COMPLETE exception so hapy bugfixers won't have to guess.
thank you.
Comment #68
geek-merlin#66: please DO create a separte bug report and provide a COMPLETE exception so hapy bugfixers won't have to guess.
thank you.
Comment #70
twistor CreditAttribution: twistor commentedSeems the GUID behavior was changed in this patch. #1152940: Feeds term import with hierarchy and weight
It's a long issue, can someone explain the reason?
Comment #71
g089h515r806 CreditAttribution: g089h515r806 commentedIt seems that both "Parent GUID" and "Parent term name" could generate a parent term object.
One term have 2 parent terms, both of its parent term have the same tid, same name.
When save the term with 2 same parent terms, I get error:
"
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '151-119' for key 'PRIMARY'.
"
It try to save the same parent term twice. This is the reason.
Comment #72
twistor CreditAttribution: twistor commented@g089h515r806, That should be posted in a new, followup issue.
To give everyone a heads up on this issue, I'm reverting the change made in:
I see in the issue that there was some minor conversation about the behavior of GUID and whether it is a TRUE GUID or not. You cannot simply change the behavior like this without some serious discussion and documentation.
Comment #73
twistor CreditAttribution: twistor commentedSee #1784436: FeedsTermProcessor existingEntityId doesn't honour id column in feeds_item.
Comment #74
twistor CreditAttribution: twistor commentedErr, did not mean to re-open this.
Comment #75
crystaldawn CreditAttribution: crystaldawn commentedIs this patch documented anywhere. Is it even committed to dev or a release? I just spent 2 hours creating what looks the exact same functionality that others had been working on in this issue for over a year (then I posted it here: https://drupal.org/comment/8177547#comment-8177547 and didnt even see this issue until after I had it posted). Some docs on how to import hierarchical terms and tag content with children would have been useful so that all the work put into this issue by others would not have gone to waste for me. https://drupal.org/node/1301604 says that it is not supported and then links this now closed ticket. Is this committed or not? If so, it now needs docs.
Comment #76
casper83 CreditAttribution: casper83 commentedname,guid,parent_name,parent_guid
Primary Energy,32001,,
Transformation,32002,,
Final Energy,32003,,
Electricity,32004,Transformation,32002
Generation efficiency,32005,Electricity,32004
Installed capacity,32006,Electricity,32004
Heat,32007,Transformation,32002
I am getting this error:
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1018-0' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1019-0' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1020-0' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1022-1021' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1023-1021' for key 'PRIMARY'
the headers are named with the meaning of the mapping fields.
is there any thing wrong with structure of the CSV file & values inside it ?
Comment #77
bibo CreditAttribution: bibo at Wunder commentedIf I understood this correctly, it's been in feeds 7.2-branch for almost 5 years, based on this comment #61. I'd still like to see the usage properly documented somewhere?