Created a patch to add parent term and weight mappers for taxonomy term imports, so that the basic hierarchy and weights can be preserved. Probably needs more testing.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

bblake’s picture

FileSize
1.98 KB

Patch included

bblake’s picture

FileSize
1.97 KB

Re-rolling with -p0 compatability for unpatched drush make.

rickmanelius’s picture

Related post asking for instructions and how to do this
#1159628: How do you import a hierarchical taxonomy with feeds?

I'm going to test the above and get back to you. Thanks for the patch!

rickmanelius’s picture

Version: 7.x-2.0-alpha3 » 7.x-2.x-dev
Status: Active » Needs work

Hi bblake,

I'm testing using 7.x-2.x-dev with the required patches here #1169986: Fatal error: Class 'FeedsPlugin' not found.
Here is my very basic CSV data

Term,Parent
Product,
Ice Cream,Product
Vanilla,Ice Cream

And my Feed Import goes as follows in source:target:unique

Term:Term name:unique
Parent:Parent Term Name:not unique

The terms will import, but the parent/child relationships still appear flat in the GUI and in the backend database tables. Your patch code looks pretty straightforward, so it's unclear why it's not getting set properly. Am I doing any of the steps above incorrectly? (do I need a GUI? Is taxonomy_get_term_by_name expecting something else?

rickmanelius’s picture

Found the error, $value is a string, not an array, so these lines

+        if (!empty($value[0])) {
+          $terms = taxonomy_get_term_by_name($value[0]);

Should be

+        if (!empty($value)) {
+          $terms = taxonomy_get_term_by_name($value);

When I run the above, it works.

The weight portion of the patch uses $value instead of $value[0], so that should be correct. Testing this next.

rickmanelius’s picture

Status: Needs work » Needs review
FileSize
1.96 KB

Final confirmation:

Attaching the next version of this patch with the changes made in #5. I can confirm both work. Here is my test CSV.

Term,Parent,Weight
Product,10
Ice Cream,Product,10
Vanilla,Ice Cream,10
Chocolate,Ice Cream,15
Strawberry,Ice Cream,12

And my output

Product
- Ice Cream
  * Vanilla
  * Strawberry
  * Chocolate

As intended, the terms are ranked by the parent/child relationship, and the weight was properly applied between strawberry, vanilla, and chocolate.

Thanks for all your hard work bblake. I really needed this!

rickmanelius’s picture

This appears to be a duplicate of this issue, also reviewed and tested
#1152940: Feeds term import with hierarchy and weight

g089h515r806’s picture

FileSize
2.17 KB

i change the code and add my latest code to feeds_terms-1152940-3.patch,
if parent value is not empty, and it does not exist in database, i create a new term for it, and set $target_node->parent = new created term id instead set it to 0. This method was copy from migrate module. here is my code,it works on our site.:


			case 'parent':      
				if (!empty($value)) {
					$terms = taxonomy_get_term_by_name($value);
					$parent_tid = '';
					foreach ($terms as $term) {
						if ($term->vid == $target_node->vid) {
							$parent_tid = $term->tid;
						}
					}
					
					if (!empty($parent_tid)) {
						$target_node->parent = $parent_tid;
					}
					else {
					  $term = new stdClass();
				    $term->vid = $target_node->vid;
				    $term->name = $value;
			      $status = taxonomy_term_save($term);
				    $tid = $term->tid;
						$target_node->parent = $tid;
					}
				}
				else {
					$target_node->parent = 0;
				}    
g089h515r806’s picture

FileSize
2.18 KB

This is correct patch code, previous is wrong.

darrylri’s picture

FileSize
0 bytes

I found that parents weren't good enough for my needs; I have a legacy system with a multilevel categorization system where many of the terms in different branches have the same names. That would cause collisions with just the parent mapper.

So I implemented the following patch to support parentGUID as a mapping item. It's a patch on top of the patch given by RickManelius in #6 above.

darrylri’s picture

FileSize
1.28 KB

Oops, somehow I uploaded a zero length file.

iMiksu’s picture

Here you have #6 and #11 patches combined together against 7.x-2.0-alpha4.

Thanks for you work! Looking forward to get this committed o/

g089h515r806’s picture

for #6, if the order is :

Term,Parent,Weight
,Product,10
Ice Cream,Product,10
Vanilla,Ice Cream,10
Chocolate,Ice Cream,15
Strawberry,Ice Cream,12

#6 patches works very well

if we change the order,for example:

Term,Parent,Weight
Vanilla,Ice Cream,10
Chocolate,Ice Cream,15
Strawberry,Ice Cream,12
,Product,10
Ice Cream,Product,10

#6 patches does not work correctly. We could not always ensure parent term appear before subterms, when we use views output term into csv, parent term may appear after subterms.

my code in #8 solve this problem,

sdrycroft’s picture

The current patch forces, for no apparent reason, the use of integers for the "Parent GUID". This does not make much sense, so I would recommend removal of line 141 from FeedsTermProcessor.inc.

13rac1’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
2.94 KB

Minor update to #14 to remove whitespace. Works correctly. Thanks!

sdrycroft’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
386.5 KB
6.18 KB

The attached patch reverses the changes made by g089h515r806† and instead puts any term with a parent that isn't known at the root of the taxonomy. I have also added a small change to the process function that forces a term to be updated, even if it looks like the data has not changed since a previous import. This means that badly ordered files can be imported twice, the first time to import all the names and data, and the second to correct any of the mistakes in the hierarchy due to the order of the file.

The following example, ordered alphabetically, best illustrates my reasons for the change:

Name                Parent               
Birmingham          United Kingdom       
Europe              
London              United Kingdom       
Manchester          United Kingdom       
United Kingdom      Europe

With patch #15, this would result in:

United Kingdom
-- Birmingham
-- London
-- Manchester
Europe
-- United Kingdom

Note the extra "United Kingdom" term.

With the attached patch, after the first import you would have:

Birmingham
Europe
-- United Kingdom
London
Manchester

After a second import, this would be corrected to:

Europe
-- United Kingdom
---- Birmingham
---- London
---- Manchester

This to me makes far more sense than creating additional unwanted terms in a hierarchy, which is exactly what happened to me when trying to import the attached CoreidaeTestData.xls Excel file.

†Can you please satisfy my curiosity and explain that user name, is it failed hanzi?

sdrycroft’s picture

Just noticed I missed out

$hash = $this->hash($item);

from my previous patch.

johnbarclay’s picture

Status: Needs review » Reviewed & tested by the community

Patch #17 worked fine for me against 7 alpha4. It should apply cleanly to dev also. It needs to be tested with an existing vocabulary also.

13rac1’s picture

Applies to current dev. Attached, again ;), is a patch w/o white space warnings.

benjaminkyta’s picture

Assigned: Unassigned » benjaminkyta

Hey. Thanks.
Works well with Drupal 7.7 too. With these extensions: txt csv tsv xml opml

johnbarclay’s picture

==== the comment below is incorrect. disregard it ======

#19 works for drupal 7.8, feeds 7.x-2.x-dev. I tested it quite thoroughly because I applied it against alpha4 and it had an issue on the second import. I think this is ready for commit after it has a simpletest. The simpletests should probably have a 3 levels of hierarchy and some terms whose parents map to their children to make sure recursion is dealt with correctly. Other simpletest cases seem pretty well outlined in this thread. If someone wants to come up with the .csv files and some natural language test cases, I can write the simpletest code.

johnbarclay’s picture

==== the comment below is incorrect. disregard it ======
there is one caveat I ran across. When

  • parent guid
  • parent term name
  • parent term name is before parent guid in mapping order

The terms with no parents are not included. I believe the fix for this is simply to validate for only parent term name OR parent guid in the mapping, never both. The field is only for finding the parent, so having both is redundant.

johnbarclay’s picture

I ran across a number of issues with #19 and created the attached patch against sept 30 dev snapshot (http://drupal.org/node/927032).

The patch solves the following:

  1. previously, the query for parent id looks for any matching guid; not just for the current feed processor. This was fixed by changing:
    $parent_tid = $query->condition('guid', $value)->execute()->fetchField();
    to
    $parent_tid = $query->condition('guid', $value)->condition('id', $this->id)->execute()->fetchField();
    takes care of this.
  2. resolves order concern of comment 13 and 14 by sorting terms by level. Terms within a level of the hierarcy are still are created in the same order and fetcher presents them, but level0 terms are created first, then level1, and so on. The removes the need to do multiple imports to get full hierarchy.
  3. The use case where a level0 taxonomy term is designated as having itself as a parent in the fetcher now works. This is one of my use cases
  4. parentguid and parent term name still both work, but when both are mapped parentguid is used to determine hierarchy
maxilein’s picture

Title: Add Parent and Weight mappers to taxonomy term import. » @ patch #23 correction of function declaration

In the patch from #23 move the function declaration
function sortbylevels($item1, $item2) { return $item1['level'] - $item2['level'];}
out of the body of the function orderHierarchicalTaxonomies() to make patch work with larger imports.

Thanks for the very useful patch!

Summit’s picture

Title: @ patch #23 correction of function declaration » patch to add parent term and weight mappers for taxonomy term imports

Changed title. Will this patch also give a GUI Mapper change as that parentterm can be selected of a term? May be one of has a great screenshot to show the working?
greetings, Martijn

maxilein’s picture

FileSize
55.33 KB

You can only import 1 level (with a parent) at a time. So if you have three levels you need to configure 3 jobs which must be run sequentially.
Hope that hepls.

Summit’s picture

Hi,
Thanks for the quick reply:
1) +1 for supporting parent-parent-term : usecase: country-region-place.
2) Would it also be possible to have this functionality in node-parser, as such that I have 2 fields:
- Field A is parent of term B
- Field B is term B.
Thanks a lot in advance for considering this!

Greetings,
Martijn

maxilein’s picture

Title: patch to add parent term and weight mappers for taxonomy term imports » Hierarchy import for top to level-n

Import of hierarchy:

Lets consider a hierarchy by the order of taxonomy fields:

New mapping field type: "child-to-previous".

If you have
country -> regions -> city -> district

you would add 4 mappers in this order, each of the type "child-to-previous":

country "child-to-previous" or "top-level"
regions "child-to-previous"
city "child-to-previous"
district "child-to-previous"

The function could automatically analyse that this hierarchy consists of 4 fields. Taking the first as top and the last as the lowest (in the order in which the mappers are specified).

BR, Max

Summit’s picture

Hi Maxilien, can you place this again in a image please? I do not got it exactly :(
Thanks a lot in advance,
greetings,
Martijn

Dave Reid’s picture

Issue tags: +Needs tests
maxilein’s picture

FileSize
27.48 KB

Hi Martjin,

here is an image. Hope it clarifies the idea.
Sorry for posting so late.

Thank you.
BR, Max

Summit’s picture

@Maxilein,
Thanks for clarifying! Will investigate this!
Greetings, Martijn

febbraro’s picture

Status: Reviewed & tested by the community » Needs review

#17 was marked RTBC, but has since been superseded by #23. Setting it back to needs review so folks and look at/test it.

Summit’s picture

Hi,
@maxilein #24. where do I have to move the function sortbylevels to get this working for larger imports please?
Could you reply in code please?
Thanks a lot in advance for your reply.
greetings, Martijn

apmsooner’s picture

patch#19 works on latest dev snapshot. Patch #23 i couldn't get to work....

Summit’s picture

Hi,
I am looking at the solution of #23.
1) Am I correct I have to build 3 mappers to get hierarchy of 4 in?
1 mapper (mapper 1) for country region and set the mapper to the correct csv fields.
1 mapper (mapper 2) for region and city and set the mapper to the correct csv fields
1 mapper (mapper 3) for city and district and set the mapper to the correct csv fields.
So it becomes the following proces
2) Build the taxonomy term mapper with correct vocabulary for country-region
3) Clone this mapper for a new mapper region-city and change the mapper fields accordingly
4) Clone this mapper for a new mapper city-district and change the mapper fields accordingly

Now have the source-csv file with the 4 columns: country, region, city, district.
Run the 3 mappers in hierarchy: mapper 1, mapper 2 and mapper 3.
5) Will then my hierarchical vocabulary be build?

6) Isn't not possible to have one mapper with 4 hierarchies: parent-parent-termname, parent-termname, termname, child-termname?
I will test this further.
Greetings, Martijn

maxilein’s picture

@Summit #34
Just out of the other function's body. (You created a function definition INSIDE another function definition!)
Make it an independent function like all the other functions in the module.
e.g. Just put the function right after the end of the function you take it out of.

Now you have:

function orderHierarchicalTaxonomies()
{

 ... code

 function sortbylevels()
 {
    ... code
 }

 ... code

}

and it shoud be:

function orderHierarchicalTaxonomies()
{

 ... code

 sortbylevels()

 ... code

}
function sortbylevels()
{
    ... code
}

R, Max

batje’s picture

Issue tags: -Needs tests

sorry, my preview got saved

batje’s picture

I took #23 and did 2 things:

  • Removed the orderHierarchicalTaxonomies function. It just didnt quite work. And in some cases it prevented parents from loading at all.
  • Added [] behind all (4) calls to $target_node->parent[] which enables multi-parent terms., That is quite nice. Just define 2 parent targets and they both get added.

By all means have another go at the orderHierarchicalTaxonomies function.

batje’s picture

I figured out why i didnt get the Parent GUID import working. I am using different datasets to import the hierarchy and on this line, the code presumes you are using 1 importer to import the full hierarchy.

$parent_tid = $query->condition('guid', (string)$value)->condition('id', $this->id)->execute()->fetchField();

changed that to
$parent_tid = $query->condition('guid', $value)->execute()->fetchField();

After all, a GUID is a Global Unique Identifier, so it should be unique in the whole system (and beyond). That is the responsibility of the user.

This patch includes #38 as well.

bjalford’s picture

I can't apply this patch to the latest dev.

geek-merlin’s picture

VERY strange,
seems patches up to #23 are applicable, can find blob 0d44e84
but #39 and #40 by @batje refer to a blob a8dd9ad which i cannot find.

EDIT: phuh, there is a dead master branch which the commit is coming from
EDIT: added #1460066: bogus master branch in repo

geek-merlin’s picture

OK, this is #40 rebased to 7.x-1.x branch

which took quite a bit of dirty git plumbing.
future comitters: beware, do NOT use master branch, it's dead.

bjalford’s picture

I've tested the patch and it's working using parent guid. It's worked with a 22k term, 10 level deep hiercharcy.

geek-merlin’s picture

strange... just noted it does not have any parent guid code anymore as noted in #39 =:-()

EDIT: in fact the sort code is missing, so it depends on the order if parents are found. no good.

bjalford’s picture

Yes, I noticed that but forgot to mention. I'm doing a csv import and the parent has be before the children for it to be imported in the correct structure.

geek-merlin’s picture

hey, life's a bitch!

i did a complete rewrite of the sort feature to (just when testing) realize the idea is flawed from the beginning:
feeds splits up our feed to chunks of 50 items - which is surely a good thing.
but it means we never have a complete item array to sort (never? let's check that)

just in case we find a way, here is the code - should be quite understandable.

geek-merlin’s picture

OK, there is hope:
* of course we can always import in 2 passes - not elegant nor fast but working
* we can clear the 50 item limit and hope we have enuff memory

geek-merlin’s picture

Title: Hierarchy import for top to level-n » Feeds term import with hierarchy and weight
FileSize
4.46 KB

OK, a summary:
* to set perent(s) of a term they need to be imported in advance
* pulling all terms into memory and sorting them does not work in for medium and big imports
* but even if the source is not ordered, we always can do a 2-pass import, and on the second pass all parents should be set ok
problem: 2nd pass won't update because hash says "no new items".
this patch adds an option "force update" which solves this for the second pass.

after testing i made some code cleanup so ther may be small issues with it - please test and report.

TODO: write tests.

philipz’s picture

I get the memory exhousted php error. I've got 200 terms and I set memory limit to 256 MB but that didn't help. I'm trying to find what causes this problem but no luck so far.
What did work:
1. Created importer and imported all terms (parent terms and child terms) WITHOUT parent-child relations, only GUID and term names. GUID and Term name Unique targets.
2. Created importer only for child terms. Force update on. GUID and Term name Unique. This updated the previously imported child terms adding them parent relation.

geek-merlin’s picture

Assigned: benjaminkyta » Unassigned

hmm, thinking loud... TODO:
* log and display peak memory usage
* make numberofitemsprocessedinonestep configurable
which both should be separate issues

would like to see another success story before marking this rtbc

philipz’s picture

would like to see another success story before marking this rtbc

@axel.rutz are you referring to my story? :)
I expected that this should work in one pass if all parents are imported before children or 2 passes: first parents only then children.
How I did this is none of these and not so obvious either.

Still... thank you very much for the work done!

philipz’s picture

The way I imported terms described in my post #50 did not work that well after all. The problem is I have duplicate child term names so it can't be unique target.

I'm trying something else right now:
1. First I import all terms (parents and children). Mapped fields: GUID and term names.
2. Then I import only child terms and map. I map only GUID (unique target) and Parent GUID.

I was hoping this approach would work but I get "Term name missing." errors for every term. It's trying to create new terms and doesn't try to update existing ones by their GUID. Debugging GUID and Parent GUID shows they're ok.

philipz’s picture

I added guid as unique target in existingEntityId function and it worked.

geek-merlin’s picture

@philipz: thank you for pointing this out, in fact unique target "guid" is urgently needen and was lost somewhere in the battle!
so for anyone to test: try patch #49 and then #54 on top of it
;-)

GaëlG’s picture

Used #55 info to patch against dev version, but I ran into the same problem as #50 : memory exhausted.
I tried importing 5 lines only but it fails too. There must be a infinite loop or something like that.
"Force update" disabled causes "no new terms" message and vocabulary remains flat. "Force update" enabled causes memory exhausted error and the first item in the csv disappears from the vocabulary.
I guess it's not a problem of csv size, the importer gets stuck at the first line. I'll investigate further.

Edit: my CSV file was wrong, some terms had themselves as parent. To track this kind of error, use this :

  protected function entitySave($term) {
    if (isset($term->parent)) {
      if (is_array($term->parent) && count($term->parent) == 1) {
        $term->parent = reset($term->parent);
      }
      if (isset($term->tid) && ($term->parent == $term->tid || (is_array($term->parent) && in_array($term->tid, $term->parent)))) {
        throw new FeedsValidationException(t("A term can't be its own child. GUID:@guid", array('@guid' => $term->feeds_item->guid)));
      }
    }
  taxonomy_term_save($term);
  }

This verification may be added with the hierarchy feature.

geek-merlin’s picture

yes this does completely make sense to me.
so this new patch contains #49, #54 and #56

geek-merlin’s picture

"force update" did not work due to missing default.
fixed here.

TimelessDomain’s picture

#58 worked well for me. thanks

geek-merlin’s picture

Status: Needs review » Reviewed & tested by the community
franz’s picture

Status: Reviewed & tested by the community » Fixed

Wow, a lot of people worked on this, thanks for the hard work and testing. I guess it was about time we get this committed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

pindaman’s picture

any documentation on how to setup this feature would be rellay nice
thanks

g089h515r806’s picture

Version: 7.x-2.x-dev » 7.x-2.0-alpha5
Status: Closed (fixed) » Fixed

Does not work.
I use 7.x-2.0-alpah5, I get following error when import hierarchical terms:
•SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '151-119' for key 'PRIMARY'.

After i remove the mapping of parent term name . it works correctly.

geek-merlin’s picture

@g089h515r806: you might want to file a new bug for this and link it here.
thank you.

g089h515r806’s picture

In mappings:
I set 2 mappings for parent term:
pcode -------------------Parent GUID
pname ------------------Parent term name
3 mappings for current term:
code -------------------GUID
name ------------------term name
weight------------------Weight

I got "Integrity constraint violation" when i import terms.

After I remove mapping:
pname ------------------Parent term name

It works correctly. Maybe we need write a document.

geek-merlin’s picture

#66: please DO create a separte bug report and provide a COMPLETE exception so hapy bugfixers won't have to guess.
thank you.

geek-merlin’s picture

#66: please DO create a separte bug report and provide a COMPLETE exception so hapy bugfixers won't have to guess.
thank you.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

twistor’s picture

Seems the GUID behavior was changed in this patch. #1152940: Feeds term import with hierarchy and weight

It's a long issue, can someone explain the reason?

g089h515r806’s picture

It seems that both "Parent GUID" and "Parent term name" could generate a parent term object.
One term have 2 parent terms, both of its parent term have the same tid, same name.
When save the term with 2 same parent terms, I get error:
"
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '151-119' for key 'PRIMARY'.
"

It try to save the same parent term twice. This is the reason.

twistor’s picture

@g089h515r806, That should be posted in a new, followup issue.

To give everyone a heads up on this issue, I'm reverting the change made in:

+      } else if($target == 'guid') {
+        $query = db_select('feeds_item')
+          ->fields('feeds_item', array('entity_id'))
+          ->condition('entity_type', 'taxonomy_term');
+        $tid = $query->condition('guid', $value)->execute()->fetchField();
+        return ($tid) ? $tid : 0;

I see in the issue that there was some minor conversation about the behavior of GUID and whether it is a TRUE GUID or not. You cannot simply change the behavior like this without some serious discussion and documentation.

twistor’s picture

twistor’s picture

Status: Needs work » Closed (fixed)

Err, did not mean to re-open this.

crystaldawn’s picture

Is this patch documented anywhere. Is it even committed to dev or a release? I just spent 2 hours creating what looks the exact same functionality that others had been working on in this issue for over a year (then I posted it here: https://drupal.org/comment/8177547#comment-8177547 and didnt even see this issue until after I had it posted). Some docs on how to import hierarchical terms and tag content with children would have been useful so that all the work put into this issue by others would not have gone to waste for me. https://drupal.org/node/1301604 says that it is not supported and then links this now closed ticket. Is this committed or not? If so, it now needs docs.

casper83’s picture

Issue summary: View changes

name,guid,parent_name,parent_guid
Primary Energy,32001,,
Transformation,32002,,
Final Energy,32003,,
Electricity,32004,Transformation,32002
Generation efficiency,32005,Electricity,32004
Installed capacity,32006,Electricity,32004
Heat,32007,Transformation,32002

I am getting this error:
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1018-0' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1019-0' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1020-0' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1022-1021' for key 'PRIMARY'
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1023-1021' for key 'PRIMARY'

the headers are named with the meaning of the mapping fields.
is there any thing wrong with structure of the CSV file & values inside it ?

bibo’s picture

If I understood this correctly, it's been in feeds 7.2-branch for almost 5 years, based on this comment #61. I'd still like to see the usage properly documented somewhere?