This patch does a number of things:
- define a new hook, hook_node_import_prepare;
- add taxonomy support, using this hook;
- add support for story, page and weblink.
Introduction
The main purpose of this patch is to add taxonomy support. If you look at http://drupal.org/node/23644 you'll see that I tried to add similar taxonomy support to node_import before, but now that drumm has changed the way things work (much better btw), I needed to rewrite the patch.
My understanding of "add taxonomy support" is to allow for the import of a CSV-file like:
"title","body","taxonomy"
"Taxonomy test: empty","no body",""
"Taxonomy test: numeric","nobody","1"
"Taxonomy test: term","anybody","An example term"
"Taxonomy test: non-existant term","somebody","No such term"
"Taxonomy test: list of numbers","some body","1|2|3"
"Taxonomy test: list of terms","foo bar","Another term|An example term|No such term"
"Taxonomy test: mixed list","foz baz","An example term|1|2|Another term"
As you can see, there is a field called "taxonomy" which is a list of strings separated by "|". The idea is to let node_import.module look up the "tid" of each of these strings and construct an array of "tids" to pass on to taxonomy.module. This way one can import a large number of nodes that already have terms associated with them, so you don't need to assign the terms manually.
This kind of "taxonomy support" is quite different from the taxonomy_node_form added to hook_node_import_global as I hope you see.
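As a rough illustration of the CSV format described above (this is not the patch's PHP code), the following Python sketch reads such a file and splits the "taxonomy" column on "|". The function name `read_taxonomy_rows` and the shortened sample are made up for this example.

```python
import csv
import io

# A shortened copy of the example CSV file from this post.
SAMPLE = '''"title","body","taxonomy"
"Taxonomy test: empty","no body",""
"Taxonomy test: numeric","nobody","1"
"Taxonomy test: mixed list","foz baz","An example term|1|2|Another term"
'''

def read_taxonomy_rows(text):
    """Return (title, [taxonomy entries]) for every data row of the CSV text."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # An empty field means "no terms", not one empty term.
        entries = [e for e in record["taxonomy"].split("|") if e]
        rows.append((record["title"], entries))
    return rows

for title, entries in read_taxonomy_rows(SAMPLE):
    print(title, entries)
```

Each entry is still a plain string at this point; turning the strings into tids is a separate step.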
hook_node_import_prepare
To allow for this, we need a new hook. We need the chance to take an input like "An example term|1|2|Another term" and convert it into something like array(42, 1, 2, 69). Enter hook_node_import_prepare.
The new hook is run after node_import.module creates a $node with all fields from the CSV-file and the static and global fields added by the different hooks.
The hook's purpose is to convert some input inside $node so that node_save can interpret it.
The way this is done in taxonomy_node_import_prepare is to convert $node->node_import_taxonomy_list (a field "matched" with a field from the CSV file) into an array and save this array to $node->taxonomy.
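To make the conversion concrete, here is a minimal Python sketch of what such a prepare step does. It is not the actual taxonomy_node_import_prepare from the patch: the node is modelled as a plain dict, `term_ids` stands in for a lookup against the vocabulary, and the function name `taxonomy_prepare` is hypothetical. Numeric entries are assumed to already be tids, and unknown names are simply returned to the caller.

```python
def taxonomy_prepare(node, term_ids):
    """Convert node["node_import_taxonomy_list"] into node["taxonomy"] (a list of tids).

    Returns the entries that could not be resolved.
    """
    raw = node.pop("node_import_taxonomy_list", "")
    tids, unknown = [], []
    for entry in raw.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        if entry.isdigit():
            tids.append(int(entry))       # assumed to be a tid already
        elif entry in term_ids:
            tids.append(term_ids[entry])  # known term name
        else:
            unknown.append(entry)         # left for the caller to handle
    node["taxonomy"] = tids
    return unknown

# The tids 42 and 69 are the illustrative values used in the text above.
term_ids = {"An example term": 42, "Another term": 69}
node = {
    "title": "Taxonomy test: mixed list",
    "node_import_taxonomy_list": "An example term|1|2|Another term",
}
unknown = taxonomy_prepare(node, term_ids)
print(node["taxonomy"], unknown)
```

What to do with the unknown entries (ignore them, create new terms, or reject the row) is a policy decision that this sketch leaves open.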
BTW: the same hook could also be used by, for example, import_event.inc. As it is now, the user needs to supply 10 (!) fields for the start and end date (startmonth, startday, startyear, ...). I know this is because node_save expects this format for event.module, but wouldn't it be more user-friendly if the user only had to provide 2 fields, "startdate" and "enddate", e.g. in a format like "yyyy/mm/dd hh:mm am/pm", and a (yet to be written) event_node_import_prepare split these fields up into the needed 10 (!) fields?
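A sketch of what such a date-splitting step might look like, written in Python purely for illustration. Only startmonth, startday and startyear are named above; the "hour" and "minute" field names, the use of a 24-hour value, and the function name `split_event_date` are assumptions, not event.module's actual field list.

```python
from datetime import datetime

def split_event_date(value, prefix):
    """Split a "yyyy/mm/dd hh:mm am/pm" string into separate prefixed fields."""
    parsed = datetime.strptime(value, "%Y/%m/%d %I:%M %p")
    return {
        prefix + "year": parsed.year,
        prefix + "month": parsed.month,
        prefix + "day": parsed.day,
        prefix + "hour": parsed.hour,      # hypothetical field name, 24-hour value
        prefix + "minute": parsed.minute,  # hypothetical field name
    }

# The node is modelled as a plain dict holding the two user-supplied fields.
node = {"startdate": "2005/06/15 02:30 pm", "enddate": "2005/06/15 04:00 pm"}
node.update(split_event_date(node.pop("startdate"), "start"))
node.update(split_event_date(node.pop("enddate"), "end"))
print(node)
```

The user supplies two fields and the prepare step expands them into ten.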
hook_node_import_global
I think the taxonomy_node_form of each node type (e.g. story) should appear in taxonomy_node_import_global and not in story_node_import_global as shown in the example. Setting taxonomy terms should be the responsibility of import_taxonomy.inc only.
hook_node_import_static
Similarly: why does each node type need to define the uid and name as static fields? Shouldn't this be the responsibility of a single import_node.inc? This could then also define common things like title.
So the patch adds this file: it doesn't define a hook_node_import_types because a "node" isn't a separate type. It does define the "title", "uid" and "name" fields.
story, page and weblink support
The patch also adds support for "story", "page" and "weblink" (quite trivial).
Example
- Save the CSV-file above to your computer;
- Create an empty vocabulary (eg called "test");
- Add only the term "An example term" to it;
- Enable eg the story.module and associate it with the "test" vocabulary;
- Go to admin/node/import and import the CSV-file;
- "Match" the "taxonomy" field with "List of taxonomy terms";
- Try out the different options.
I think you will agree, this is GREAT! :-)
Summary
I think this patch adds great taxonomy support to node_import. Of course this is just a first iteration, so there will be bugs, and maybe you don't all agree on whether this new hook is needed or "The Right Thing(tm)".
I didn't apply this to CVS because I think we need to discuss whether this is something we want and whether we agree on the interface.
I'm open to all comments or questions.
Kind regards,
Robrecht
| Comment | File | Size | Author |
|---|---|---|---|
| #10 | node_import.module | 5 KB | ljet |
| #6 | screenshot_9.png | 49.16 KB | drumm |
| #4 | import_taxonomy.inc | 5.15 KB | Robrecht Jacques |
| | node_import-taxonomy.patch | 10.77 KB | Robrecht Jacques |
Comments
Comment #1
drumm
Comment #2
drumm
I committed all of this, with some minor cleanup, except the taxonomy file itself. I moved the module_exist() calls so that they will hopefully work better with chx's splitting code.
The taxonomy file contains a hard-coded table, which should be removed. Some tableless CSS could replace it if needed.
Comment #3
drumm
Comment #4
Robrecht Jacques commented
I understand why one would not like the table. I added it because I don't see how else this should be formatted, as "form_radio" and "form_select" always wrap the control in a "div".
I looked at "node.module" and there they use a "definition list" for an option list. I don't think this is really the best option, but have modified the patch to use this too.
Better?
Another remark: maybe it would be better if the "global option" page were a separate page (after "matching"). This way we could add the "how to handle unknown terms" option only if the user actually selected a "taxonomy" matched field.
I also notice you moved the "title" to the "import_page.inc", "import_story.inc", ... files instead of keeping it in "import_node.inc". Was this by design? I really think that fields such as "title" (and maybe even "changed", ...) should be set by one common "import_node.inc". A "page" node isn't responsible for the "title" field; it is "node.module" itself which handles it.
A matter of taste maybe.
Anyway, I'm glad you accepted the "hook_node_import_prepare".
Comment #5
drumm
Comment #6
drumm
I think this could use a little more UI polish, which I can attempt later in the week. For now, I am attaching a screenshot so that anyone who doesn't want to install this file can still provide input.
Comment #7
Robrecht Jacques commented
I agree it looks overloaded :-)
I propose two things:
Personally I really like the ability to automatically create taxonomy terms. But I am willing to alter the patch to remove these global options.
In any case, node_import will need some "documentation" loving :-)
Comment #8
Robrecht Jacques commented
Added to CVS.
Comment #9
(not verified) commented
Comment #10
ljet commented
Hi
I'm trying to add taxonomy support to the node_import module. I applied all the patches of yours that I could find on the web, but I could not get it to work. I have attached my node_import.module file for you to check. It doesn't work: it did not import anything. I can't work out why I can't import flexinode nodes together with taxonomy. If I use the original node_import module (only for flexinode), I can import the nodes but not the taxonomy.
I prepared my CSV file like this:
"title","name","type","body","taxonomy"
"nodeimp test","admin","flexinode-1","no body","1|11|111"
I have many sub-terms, so I put in the term IDs as shown above, e.g.:
1. A
11. aa
111. aaa
Comment #11
sym commented
I'm not sure if this is the same task as http://drupal.org/node/65001 but it might be a bug that you should know about.
I've looked and I can't understand why the terms aren't added to the node.
Comment #12
Robrecht Jacques commented
4.6 doesn't support the import of taxonomy as explained in this patch.
4.7 (current CVS until it gets branched) does.
Closing this.
Comment #13
TheWhippinpost commented
I'm having a few major pains with this (or tying myself in knots!).
The way the CSV field structure has been documented here, seems very prescriptive - I'm not sure how much easier a burden it is for the user if one has to format the CSV in the way suggested. I'm particularly referring to the piped ("|") delimiter to specify multiple terms that sit in a single field of a record ... Am I right in surmising that this is purely for "re-importing" nodes from one Drupal installation into another (and by extension, I assume, exported from that "old" installation in the format described?)?
If that's the case, then fine (at least I can move on :-p). Otherwise, I'm not sure how many people would realistically spend time fiddling with a large CSV file of hundreds, if not thousands, of records, inputting all the targeted terms in this format.
I'm trying to import a "typical" product feed. It "needs" hierarchical terms:
... where "Name" is the intended file/product name. (There are more fields but they're tied directly to the product (Name) from this point on).
Now this seems to me, a more typically encountered CSV file, no?
I'm attempting to import this data (with the aid of CCK and Pathauto) to at least loosely reflect the implied term hierarchy and path structure in the feed. Given that this is a feed delivered daily (around 5,000 records), I'm struggling to find a way to turn-it-around without too much fiddling of the actual CSV data.
I'm throwing it out there anyway. Unfortunately there isn't a great deal of documentation (although I know you've written quite a lot Robert) and where there is, it doesn't seem to tie-in very tightly with what's presented at the interface level, if you see what I mean. If this is a documentation issue, I apologise (but would dearly welcome a pointer).
If it isn't then I leave it here as an issue (perhaps) of module direction.
Comment #14
Robrecht Jacques commented
Let's try to help.
There is no export format connected to this import format. If you have a better way in a CSV file to specify multiple terms I'm open to suggestions.
You are not supposed to fiddle with large CSV files by hand. If you are exporting the data from somewhere else, let that system export into this format. If that is not possible, you probably need to preprocess the data with a script of your own, or possibly in Excel.
And what does this "Category" look like? How would you indicate hierarchy in that?
Note that | is used for specifying *multiple* terms. So if you only have one term for each product, you don't need it.
Suppose you have some categories for products as follows:
- hardware
--- pc
--- imac
- software
--- pc
--- imac
What would *you* expect to put into the Category column if you have some imac software product to import?
What would *you* expect to put into the Category column if you have a hardware product that works for both imac and pc?
(maybe this particular example could be solved by making 2 vocabularies, one with hardware/software and one with pc/imac, but you get the point I hope)
Yes. I had a support request by email that uses exactly this format too for importing e-commerce products (hence my first implementation of support for the tangible/shippable product this week).
Right.
I'm writing the documentation as I go. Is it not clear enough? Probably there should be some documentation on the wizard pages itself. I'm open to improvements there. If you could indicate the parts of the documentation (basically README.txt) that need improvements or the parts of the GUI that are not clear, let me know.
I'm willing to improve the module and your feedback is certainly appreciated.
Comment #15
TheWhippinpost commented
WOW! I'm so sorry - For some unfathomable reason, I have not seen your reply to this until just now Robert... despite visiting here almost daily!
OK, let's catch-up (Hopefully with a bit more insight now I have some more experience playing around with it)...
I see what you're saying, yes. My reservation, I suppose, really stems from the POV of not ever having received a CSV file formatted like this, for the purposes of designating multiple terms. It's a useful feature, no doubt, but my (rhetorical) question would be; Isn't it more of a "proprietary" feature of Drupal, than a universal "everyday" usage solution?
Yes, well, I'm hoping to avoid pre-processing for obvious reasons, but if needs must, I'll accommodate.
OK, here's what I thought I'd be able to do:
Given row titles and example listings thus:
Product_Type, Category, Name...
Hardware, Hardware Widgets, Super Widget Model
Software, Software Widgets, Super Widget Model
So, having a taxonomy called, "products" - and using a CCK content-type - to be able to map (within import_node) "Product_Type", to be, for instance, a parent term of the next field, "Category", and so on, x levels deep as mapped by the user... and those terms be tagged by the respective "Name" fields (which is the product title (file name)).
So, in essence, resulting in a URL path like this:
www.example.com/hardware/hardware-widgets/super-widget-model.htm
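To pin down the mapping described above, here is a small Python sketch. It is not something node_import does: the function names (`slug`, `term_chain`, `product_path`) are invented for this illustration, the slug rule is a simple assumption rather than Pathauto's actual behaviour, and the ".htm" suffix is just copied from the example URL.

```python
import re

def slug(text):
    """Lower-case the text and replace runs of other characters with "-"."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def term_chain(row, columns):
    """Return (parent, term) pairs, top level first; the first parent is None."""
    names = [row[c] for c in columns]
    return list(zip([None] + names[:-1], names))

def product_path(row, columns, title_column):
    """Build a path from the mapped term columns followed by the product title."""
    parts = [slug(row[c]) for c in columns] + [slug(row[title_column])]
    return "/".join(parts) + ".htm"

row = {
    "Product_Type": "Hardware",
    "Category": "Hardware Widgets",
    "Name": "Super Widget Model",
}
hierarchy = ["Product_Type", "Category"]  # column order chosen by the user
print(term_chain(row, hierarchy))
print(product_path(row, hierarchy, "Name"))
```

Because the parent/child relation comes from the column order chosen at mapping time, the CSV file itself needs no pre-processing.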
I actually posted this request in node_import future directions by suggesting an alternative means of expressing the current multiple terms feature to achieve the above. However, after considering the issue of not being able to rely upon the order in which multiple terms are presented, it clearly wouldn't work using that methodology... which is to our advantage (if do-able) because then we don't need to perform any pre-processing on the CSV file.
I hope that's understandable.
I'm snipping your quote for brevity, but I think I understand what you're saying/asking.
Where our paths differ, if I'm understanding you correctly, is that the product feeds can change. Therefore, there may be scenarios where terms, and/or sub-terms are removed, introduced (or whatever...) etc... Of course, I'm assuming that what you described is predicated upon an already-constructed taxonomy which has terms, and sub-terms pre-built, which I'd like to avoid but if this is how I can achieve it, pls advise.
I don't know if this would be an accurate way of stating all the above, but I suppose what I'm saying is; I was hoping to be able to create terms, with sub-terms from a CSV file - That is where I'm falling-down.
Yes, I think probably the first wizard page would be appropriate to explain any expected CSV fields needed to perform the features the module can perform; for example the multiple terms import feature (piped "|") etc...
I'm more than happy to help in this regard if that would be of use but obviously, I need to be sure first that I'm absolutely clear on how the whole thing works.
Anyway, apologies again for only coming back to this now, and thanks for taking the time.
Comment #16
oliver soell commented
When importing, root terms must be used - synonyms don't work. It would be nice if they did :)
Comment #17
Robrecht Jacques commented
Versions 4.6 and 4.7 are no longer supported. If this bug is still present in the 5.x version, please reopen this issue.
Taxonomy support has been built into the 5.x versions. Synonyms are also supported in 5.x-1.5 or later.
There is still an issue with specifying hierarchical terms. There is a separate issue for this: #162474 : taxonomy hierarchy.
Setting it as "fixed".
Comment #18
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.