Closed (fixed)
Project:
Taxonomy import/export via XML
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
23 Feb 2009 at 20:22 UTC
Updated:
17 Mar 2009 at 22:00 UTC
I'm trying to write my own custom script for a proprietary RDF format. I am building off tcs_format.inc, but I can't find any documentation on the $terms array that is returned by the main parsing function. Specifically, how do I express synonyms and parents? Also, what is the predicates object? My current code only imports flat files and can't create parent-child relationships. Is there any documentation? Thanks in advance, Christian
Comments
Comment #1
dman commented
Good questions.
The $terms array contains almost-ready-to-save $term objects in the structure used internally by Drupal.
(tid may not be set, that's not your job, and the array key can be anything also)
As you'll see, that doesn't normally define relationships. That's what the predicate list is for.
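To make that concrete, here is a rough sketch of what one entry in $terms might look like after parsing. The name, description and vid fields are the usual Drupal term fields; the keys inside the predicates bag ('broader', 'altLabel') are illustrative assumptions, since they depend entirely on what your source format calls things.

```php
<?php
// Hypothetical example of one parsed term. Not copied from the module.
$term = new stdClass();
$term->name = 'Canada';
$term->description = 'A country in North America';
$term->vid = 1; // Assumed vocabulary id.
// tid is deliberately left unset; it gets assigned when the term is saved.

// Relationships are not stored as $term->parent at this stage.
// They sit in a tagged bag, keyed by whatever the source document called them.
$term->predicates = array(
  'broader' => array('North America'),
  'altLabel' => array('Dominion of Canada'),
);

// The array key can be anything; the term name is used here for convenience.
$terms = array($term->name => $term);
```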
An intermediate stage in the parsing is to extract all relationships and attributes from the source document and turn them into a set of data statements.
Eg in the RDF example,
is interpreted as
"predicates" contains all unknown extra bits of info in a nicely tagged bag. (Actually it contains recognised info as well)
This is in turn translated into statements of relationships.
This is what taxonomy_xml_canonicize_predicates() does.
(see taxonomy_xml_relationship_synonyms() at the end of taxonomy_xml.module for a full list of current relationship-statement-synonyms extracted from real-world taxonomy schema dialects - maybe you now have a few more to add)
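The general idea of that translation can be sketched as follows. This is not the module's actual code: the function names, the synonym table and the canonical names ('parent', 'synonym', 'related') are all assumptions made up for illustration.

```php
<?php
// Hypothetical synonym table: canonical relationship => spellings seen in source dialects.
function example_relationship_synonyms() {
  return array(
    'parent' => array('broader', 'subclassof', 'ischildof'),
    'synonym' => array('altlabel', 'equivalentto'),
    'related' => array('seealso'),
  );
}

// Rewrite the keys of a predicate bag to their canonical names.
// Unknown predicates are kept under their original name.
function example_canonicize_predicates($predicates) {
  // Build a flat alias => canonical lookup once.
  $lookup = array();
  foreach (example_relationship_synonyms() as $canonical => $aliases) {
    $lookup[$canonical] = $canonical;
    foreach ($aliases as $alias) {
      $lookup[$alias] = $canonical;
    }
  }
  $result = array();
  foreach ($predicates as $name => $values) {
    $key = strtolower($name);
    $canonical = isset($lookup[$key]) ? $lookup[$key] : $name;
    // Accept a single value or a list of values.
    foreach ((array) $values as $value) {
      $result[$canonical][] = $value;
    }
  }
  return $result;
}
```

So a bag containing 'broader' => 'North America' would come out with that value filed under 'parent', whatever the source dialect called it.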
A little later, in taxonomy_xml_set_term_relations(), string-matching is performed to hook the string 'Canada' to term-id '17' or whatever.
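A minimal sketch of that string-matching step, assuming each term carries its parent names in $term->predicates['parent'] and that already-saved terms have a tid. The function name and the exact shape of $term->parent are assumptions, not what taxonomy_xml_set_term_relations() literally does.

```php
<?php
// Hypothetical relinking pass: turn parent NAMES into parent TIDS.
function example_link_parents(array $terms) {
  // Index known terms by normalized name.
  $tid_by_name = array();
  foreach ($terms as $term) {
    if (isset($term->tid)) {
      $tid_by_name[strtolower(trim($term->name))] = $term->tid;
    }
  }
  foreach ($terms as $term) {
    $term->parent = array();
    $names = isset($term->predicates['parent']) ? $term->predicates['parent'] : array();
    foreach ($names as $name) {
      $key = strtolower(trim($name));
      if (isset($tid_by_name[$key])) {
        $term->parent[$tid_by_name[$key]] = $tid_by_name[$key];
      }
      // Names with no match are skipped here; a real import would need to
      // queue them until the referenced term has been defined.
    }
  }
  return $terms;
}

// Usage: 'Ontario' says its parent is the string 'Canada'; 'Canada' has tid 17.
$canada = new stdClass();
$canada->name = 'Canada';
$canada->tid = 17;
$ontario = new stdClass();
$ontario->name = 'Ontario';
$ontario->predicates = array('parent' => array('canada '));
$linked = example_link_parents(array($canada, $ontario));
```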
We eventually get the full term object with arrays:
And THEN things can be saved.
There are a few other stages in the middle (like how it handles referencing terms that have not yet been defined) but that's probably OK.
The predicates step in the middle CAN sorta be skipped, if you want to do everything by hand, do all your own relinking etc. and just set $term->parent yourself. The abstract way of using predicates is there so that all the format inc files can re-use the library routines to massage data that is the same shape.
For your job, I'd advise that:
For large data sets, I found it necessary to retain the external key/GUID in my imported data: for completeness, for later synchronizing, and for references internally during import. This is not supported in core Drupal, so we had to use taxonomy_enhancer.module to add a few more values to our term object. This will probably be beneficial to you anyway. With big sets and complex taxonomies, string-matching is no longer reliable.
If you use taxonomy_enhancer and add a field called 'id', then any value you save in $term->id during your parse phase will be retained and also used as a lookup key during the relinking phase.
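Why the key matters can be shown with a small sketch. The helper name and the 'urn:example:' identifiers are invented for illustration; only the $term->id property comes from the setup described above.

```php
<?php
// Hypothetical helper: index parsed terms by their external key instead of by name.
function example_index_by_external_id(array $terms) {
  $index = array();
  foreach ($terms as $term) {
    if (isset($term->id)) {
      $index[$term->id] = $term;
    }
  }
  return $index;
}

// Two different terms share a name, so a name lookup cannot tell them apart.
$a = new stdClass();
$a->name = 'Springfield';
$a->id = 'urn:example:place:1';
$a->description = 'Springfield, Illinois';
$b = new stdClass();
$b->name = 'Springfield';
$b->id = 'urn:example:place:2';
$b->description = 'Springfield, Massachusetts';

$by_id = example_index_by_external_id(array($a, $b));
// A relationship that refers to 'urn:example:place:2' now resolves unambiguously.
$target = $by_id['urn:example:place:2'];
```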
Comment #2
dman commented