CSV Format clues: the simple path

jjalocha - May 22, 2009 - 19:49
Project:Taxonomy import/export via XML
Version:HEAD
Component:CVS format
Category:support request
Priority:normal
Assigned:Unassigned
Status:closed
Description

As a first-time user of taxonomies and taxonomy_xml, I would like to see more information about this import/export module before installing it. That is, don't force a new user to install a module just to find out whether it suits them. Good documentation helps the overall user-friendliness of Drupal.

Once installed, good information is available:

  • sites/default/modules/taxonomy_xml/formats.html
  • admin/help/taxonomy_xml

On your project page I miss a description of the accepted formats, i.e. the content of admin/help/taxonomy_xml.

In the README.txt file, the information is outdated:

This module makes it possible to import and export vocabularies and taxonomy terms via XML. It requires taxonomy.module.

There is no mention of the "CSV, RDF and other formats" listed on the project page. (BTW: what "other formats"?)

The README mentions only one example, but many more are available today. A short description or guide to these examples would be very useful, because ISO 2788 is not an intuitive CSV format (at least for me), and the standard is not freely available anywhere on the web.

#1

dman - May 23, 2009 - 03:41

I'm sorry you couldn't find the /docs folder in the module - in there I thought I'd written too much documentation. Certainly more and better than most other Drupal modules.
But you are right about finding them from the project page, so I've just added some links to the repository browser where these docs are found.

The README is probably out of date, yeah. I put more recent information in the documentation pages proper.
Page updated.

I didn't write the ISO :-) and if you look at some of the references I had to use, it's pretty hard to find anything intuitive - but we had to use the most portable, robust syntaxes out there.
The door is open for folk to support their own notation dialect - like tab-indented one-word tree structures. But I've not found simple systems in use beyond a few dozen terms, at which point you may as well type it in.
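For a small tab-indented outline like that, one option is to convert it into the existing CSV triple format outside Drupal instead of adding a new dialect to the module. The sketch below is a hypothetical helper, not part of taxonomy_xml. It assumes one term per line, exactly one tab per nesting level, and uses 'Narrower Terms' for parent-to-child links (the relation name suggested later in this thread).

```python
import csv
import io


def outline_to_triples(outline, predicate="Narrower Terms"):
    """Convert a tab-indented outline (one term per line) into
    parent,predicate,child CSV rows. Hypothetical helper."""
    stack = []  # stack[d] holds the most recent term seen at depth d
    rows = []
    for line in outline.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        depth = len(line) - len(line.lstrip("\t"))  # count leading tabs
        term = line.strip()
        if depth > len(stack):
            raise ValueError(f"indentation jumps more than one level at {term!r}")
        del stack[depth:]  # close branches deeper than this line
        if stack:
            rows.append((stack[-1], predicate, term))  # link to the parent
        stack.append(term)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

An outline with "Animals" followed by tab-indented "Cats" and "Dogs" yields the rows "Animals,Narrower Terms,Cats" and "Animals,Narrower Terms,Dogs"; top-level terms produce no row of their own.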

#2

dman - May 23, 2009 - 08:26
Status:active» fixed

double-post got stuck in the cache for a few hours...

#3

jjalocha - May 23, 2009 - 13:49

That was really quick, Dan! Now, there's everything I could ask for!

I understand now why I couldn't find the documentation: it's in the development release, but not in 1.3. :S

Excellent information there! I definitely don't think it's too much. I will read all of it, because it looks very interesting, and I need to create a rather complex vocabulary by hand. With the proper documentation, you can make so much more with a piece of software!

I am sorry if it sounded like I was criticizing your choice of ISO 2788; that was not my intention at all. Quite the contrary, I am very glad that you chose a standard! And with your amazing documentation, it should be possible to make full use of it.

Thank you very much for your excellent module. Taxonomies are a critical part of many projects, and with this tool I can "code" them comfortably, review, and share!

Cheers,
Jerzy

#4

boabjohn - June 3, 2009 - 12:17
Title:update & more documentation» CSV Format clues: the simple path
Component:Documentation» CVS format
Category:feature request» support request
Status:fixed» active

Hey guys... add my support and compliments to the chef: this module is simmering some great ideas, especially at the higher end of things. I suppose lots of "semantic" glue strategies will be developed over time (e.g. SPARQL into RDFa triplestores), but being able to work with standards-based taxonomies is fantastic.

My immediate issue is at the very *low end* of Drupal/taxonomy management, and I'm finding the barrier to entry a bit too oblique.

I have about 200 terms that each have a single synonym. They are in a CSV with two columns. Now: what is the shortest way to import them using the CSV powers vested in this module??

Reviewing the lovely samples and docs, I got the idea I might be able to modify my spreadsheet like this:

term, relation, synonym
weather, used for, storms
cats, used for, moggies
..

Testing this, however, made the module spit out a lot of (very polite) error messages, and it did not create any synonyms for me.

So: I guess my take is to encourage you to add at least a passing reference to the existing logic of the *Drupal* Taxonomy module... this is the target of closest interest for people in this forum, yes? At the end of the day, regardless of what ISO says, we have to prepare a file that populates a DRUPAL TAXONOMY... so I'm looking for a CSV template format that covers the relevant settings and options.

Can you suggest what the best way forward is for me now?

I looked for online taxonomy tools, thinking I could pop in my simple csv and get a nicely wrapped load of syntax back, which could then be fed to this here module...but again, no immediate success.

Any mercy for us bottom feeders?

Cheers!

#5

dman - June 3, 2009 - 12:50

#1 - this deserves a new issue.
#2 - if the module displays an error message - don't just tell us about it - tell us what it said!
#3 - OK, so with a little telepathy and testing, I'm guessing your error was

Not quite sure what 'used for' ('used for') in 'weather, used for, storms' means. You may add this term to the translation array in the module code to make it become useful.

... which is saying "I don't know what the relation 'used for' means."

In this case it's a typo at your end. Simple.

Currently, the supported synonyms for the concept of "synonym" (!) are:

    'Used for'        => TAXONOMY_XML_HAS_SYNONYM,
    'AKA'             => TAXONOMY_XML_HAS_SYNONYM,
    'synonym'         => TAXONOMY_XML_HAS_SYNONYM,
    'altLabel'        => TAXONOMY_XML_HAS_SYNONYM, # SKOS
    'equivalentClass' => TAXONOMY_XML_HAS_SYNONYM, # RDFS
    'has synonym'     => TAXONOMY_XML_HAS_SYNONYM, #TCS
    'has vernacular'  => TAXONOMY_XML_HAS_SYNONYM, #TCS

Case sensitive - for a reason.
"used for" != "Used for"

Adding yet another one from an unspecified vernacular is not going to happen unless you can point to a big institution that is publishing its data using those words. Seeing as you just made up your spreadsheet (Excellent work so far in understanding the intent of the system though) I think you can just fix that up at your end and try again.
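Because a case mismatch only shows up as a warning at import time, a quick pre-flight check over the spreadsheet can catch it earlier. This is a hypothetical helper, not part of taxonomy_xml. It assumes three-column subject,predicate,object rows and checks only against the synonym relations listed above; the module accepts many more relation names (see the bottom of taxonomy_xml.module), so a real check would need that full list.

```python
import csv
import io

# Only the synonym relations quoted above (an assumed subset of what the
# module accepts). Matching is case-sensitive, like the module's.
KNOWN_PREDICATES = {
    "Used for", "AKA", "synonym", "altLabel",
    "equivalentClass", "has synonym", "has vernacular",
}


def check_predicates(csv_text, known=KNOWN_PREDICATES):
    """Return (record_no, predicate, hint) for every row whose predicate
    is not an exact match for a known relation name."""
    by_lower = {p.lower(): p for p in known}  # used only to suggest fixes
    problems = []
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    # record_no equals the line number unless a quoted field spans lines.
    for record_no, row in enumerate(reader, start=1):
        if not row:
            continue  # blank line
        if len(row) != 3:
            problems.append((record_no, None, "expected 3 columns"))
            continue
        predicate = row[1].strip()
        if predicate in known:
            continue
        suggestion = by_lower.get(predicate.lower())
        hint = f"did you mean {suggestion!r}?" if suggestion else "unknown relation"
        problems.append((record_no, predicate, hint))
    return problems
```

Run against the original spreadsheet, the lower-case "used for" rows are each reported with the hint to use 'Used for'.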

When I repaired your example and tried a CSV:

weather, Used for, storms
cats, Used for, moggies

All went as expected.
... for me anyway!
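For the original two-column case (about 200 terms, one synonym each), the middle column can be generated instead of typed. A minimal sketch with a hypothetical function name, assuming the spreadsheet is exported as plain CSV with the term in the first column, the synonym in the second, and no header row.

```python
import csv
import io


def pairs_to_triples(csv_text, predicate="Used for"):
    """Turn 'term,synonym' rows into 'term,Used for,synonym' rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected term and synonym, got {row!r}")
        term, synonym = row[0].strip(), row[1].strip()
        writer.writerow([term, predicate, synonym])
    return out.getvalue()
```

The default keeps the capital U in 'Used for', since relation names are matched case-sensitively.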

Does this get you further ahead? I don't think this is a failure of the system to be Drupal-friendly (although feel free to discuss that further, in a new issue), but it's possible that I need to write "case sensitive" in big bold letters somewhere in the docs.

See the very bottom of taxonomy_xml.module for the full list of terminologies I've had to support so far. It's mad.

#6

boabjohn - June 11, 2009 - 02:47

@dman... You're a marvel of engineering. Thanks for the telepathy and the clear explanation/fix...and apologies again for coming in from the side. You might have seen that I did try to make this a new issue, but did not even get that part right (sorry!).

Very happy to stay subscribed and supporting the project: the science and art of semantic integration is not drifting toward anachronism any time soon!

#7

slandry - July 30, 2009 - 16:59

I'm trying like the devil to get a taxonomy loaded into Drupal (going on 3 days). It's quite an exhaustive list of all the US States, Counties and Cities. The CSV looks like this:

Alabama,Henry County,Abbeville
Alabama,Clay County,Abel
Alabama,Tuscaloosa County,Abernant
...

I've been reading through the docs and playing with the samples but when I import I get an error that says:

Not quite sure what 'Henry County' ('Henry County') in 'Alabama,Henry County,Abbeville' means...

The only thing that gets loaded is the States.

I get one of these errors for every line in the CSV. I'm sure I have a syntax problem but can't figure out what the proper format would be. Any help is much appreciated!

#8

dman - July 30, 2009 - 23:00

If you look at the CSV example or the docs, you'll find that the expected CSV columns are in the form [subject,predicate,object],
e.g.:

Architecture,Broader Terms,The arts
Plastic arts Sculpture,Broader Terms,The arts
Drawing & decorative arts,Broader Terms,The arts
Painting & paintings,Broader Terms,The arts

Because each of your lines says two different things (Henry County is inside Alabama, and Abbeville is inside Henry County), you need to split those statements up.
And say what the relationship is: container of, contained by, see also, or synonym.

What you want is:

'Alabama', 'Narrower Terms', 'Henry County'
'Henry County', 'Narrower Terms', 'Abbeville'
'Alabama', 'Narrower Terms', 'Clay County'
'Clay County', 'Narrower Terms', 'Abel'
'Alabama', 'Narrower Terms', 'Tuscaloosa County'
'Tuscaloosa County', 'Narrower Terms', 'Abernant'

Due to the synonym-mashing described in this thread, you can also use 'ParentOf' or 'hasChild' to describe the relationship.
Maybe I'll add 'contains' to that list one day.

Anyway, a little bit of spreadsheet-mashing should get you to a good result pretty quick.
Let us know how you go!
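The spreadsheet-mashing for State,County,City rows can also be scripted. A minimal sketch (a hypothetical helper, not part of the module), assuming every row lists names from the broadest region to the narrowest: each pair of adjacent columns becomes one parent,'Narrower Terms',child row, and pairs already emitted are skipped, since a state/county pair repeats on every city row. It keeps bare names, so it does nothing about places that share a name.

```python
import csv
import io


def hierarchy_to_triples(csv_text, predicate="Narrower Terms"):
    """Split 'State,County,City' rows into parent/child triples,
    emitting each distinct parent/child pair only once."""
    seen = set()
    triples = []
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row if cell.strip()]
        # Adjacent columns form parent/child pairs at any depth.
        for parent, child in zip(cells, cells[1:]):
            if (parent, child) not in seen:
                seen.add((parent, child))
                triples.append((parent, predicate, child))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(triples)
    return out.getvalue()
```

The first two sample lines become "Alabama,Narrower Terms,Henry County", "Henry County,Narrower Terms,Abbeville", "Alabama,Narrower Terms,Clay County" and "Clay County,Narrower Terms,Abel".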

#9

dman - July 30, 2009 - 23:14

Also - important.
Note that because your input data has no unique identifiers, and the CSV method can only do string matching, you will quickly run into trouble when you hit [Dallas, Missouri; Dallas, Alabama; Dallas, Arkansas; Dallas, Iowa;] etc...

As such, the only quick solution is to keep the region name as part of the locality name.

Like this example:

COUNTY_STATE Broad Term STATE
'Bacon, Georgia','Broad Term','Georgia'
'Baker, Georgia','Broad Term','Georgia'
'Baldwin, Georgia','Broad Term','Georgia'
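A sketch of that renaming step, assuming State,County,City input rows: each name is qualified by all of its ancestors, nearest first, and written as a child,'Broad Term',parent row to match the example above. The helper is hypothetical, and the naming style (keeping the word "County", comma-separated ancestors) is an arbitrary choice; the example above drops "County". Python's csv writer wraps names containing commas in double quotes rather than the single quotes shown above, so whether the module's CSV reader accepts that quoting is an assumption worth checking against the bundled samples.

```python
import csv
import io


def qualified_triples(csv_text, predicate="Broad Term"):
    """Turn 'State,County,City' rows into child -> parent rows whose
    names carry their ancestors, e.g. 'Abel, Clay County, Alabama'."""
    seen = set()
    rows_out = []
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row if cell.strip()]
        # names[i] is cells[i] followed by its ancestors, nearest first.
        names = [", ".join(reversed(cells[: i + 1])) for i in range(len(cells))]
        for parent, child in zip(names, names[1:]):
            if (child, parent) not in seen:
                seen.add((child, parent))
                rows_out.append((child, predicate, parent))
    out = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only fields that contain commas.
    csv.writer(out, lineterminator="\n").writerows(rows_out)
    return out.getvalue()
```

Two towns both called Dallas in different counties now end up as distinct terms, because each carries its own county and state in its name.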

Attachment: 03 US Counties-States Heirarchy triples.csv_.txt (147.12 KB)

#10

dman - August 30, 2009 - 11:05
Status:active» closed

issue queue cleanup. No further attention needed here.
