I have read http://drupal.org/node/560326 and installed the current DEV version with all the prerequisites (RDF, Upload). However, when I enter "music.artist", I get this:

------------------------------
Using locally cached copy 5ae054e5f0530d97504762ea8ac5127b
Using locally cached copy 5ae054e5f0530d97504762ea8ac5127b
Parsing RDF
118 data triples (atomic statements) found in the source RDF doc
Found 3 different kinds of resources in the input : http://rdf.freebase.com/ns/type.type, http://rdf.freebase.com/ns/freebase.type_profile, http://rdf.freebase.com/ns/common.topic
Found 0 resources to be used as vocabulary definitions
Found a target vocabulary already in the database, matching by name 'Imported Vocabulary' vid=6 . This will be used, but not updated.
Found 1 resources to be imported as terms into vocabulary 6
Updated 1 term(s) Musical Artist.
Imported vocabulary Imported Vocabulary. You may now need to Review the vocabulary settings or List the terms
Dunno what to do with 'type.type.properties'. Subject 'http://rdf.freebase.com/ns/music.artist' has value(s) =
Array
(
[0] => http://rdf.freebase.com/ns/music.artist.origin
[1] => http://rdf.freebase.com/ns/music.artist.contribution
[2] => http://rdf.freebase.com/ns/music.artist.similar_artist
[3] => http://rdf.freebase.com/ns/music.artist.acquire_webpage
------------------------------
Should Taxonomy XML not import all the terms (instances) and properties in that schema? I selected recurse, so don't know what I am doing wrong. Library of Congress imported fine.

Comments

dman:

Depends entirely on the source data whether it can be used as a taxonomy term. What is your source data?

Currently the importer recognizes about 20 different 'types' of things that can be interpreted as terms. I'm not sure what 'instance' means in this context, but it's probably something that can be added to the list.
Drupal terms do not support many 'properties' apart from 'synonym', 'description' and 'parent' so other things don't get retained - there is nowhere to store them.

Looking at freebase, it seems that 'instance' is not meant to be the same thing as 'concept' or 'category'. In the Drupal content management model, an 'instance' would normally be an actual node (and a Content type would be an rdf:type)
BUT, in other cases it would make sense to import a load of 'instances' and use them as tags.

But it's not been tried yet. Adding a few synonyms to the big list at the bottom of the module may work...

I had a look at http://rdf.freebase.com/rdf/music.artist
... You can bump up the debug level as an 'advanced' option.

It seems that query returns the fact that there are many 'instances' of 'music.artist'.
The question we need to ask is "what resources are of type 'music.artist'?" ... and we need to add that 'music.artist' counts as a type of taxonomy term topic.

deltab:

Well, in a Freebase universe, music.artist would be a vocabulary, and the instances are actually taxonomy terms. If you look at how the Open Calais Drupal module does this, Event Facts becomes a category, and the instances are filled in under it. So here as well, perhaps we need to see how to create new Vocabularies and then put instances as terms under each.

Some terms will have multiple vocabularies as parents, I don't know if Drupal allows this, though.

dman:

I can see where we expect this to go, and there is a chance.
(I just committed the advanced debugging option earlier. I hadn't noticed it wasn't in the public dev version yet; I did it a few weeks ago. Check out today's version.)

OK, It looks like the music.artist instances I can see are also each a common.topic. Taxonomy_xml is currently supporting common.topic as being a taxonomy term ... because that makes heaps of sense.
This means that when you feed it data about a band, eg
http://rdf.freebase.com/rdf/en.simon_and_garfunkel
... it recognises that data as being about one thing that it can load in as a term. And that's working fine for me.
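A minimal sketch of that kind of type check, in illustrative Python rather than taxonomy_xml's actual PHP. The function name, the contents of TERM_TYPES and the sample triples are assumptions made up for the example; the only thing taken from this thread is that common.topic is treated as a term.

```python
# Hypothetical sketch: pick out the RDF subjects that should become
# taxonomy terms by checking their rdf:type against a list of
# "term-like" types. Not the module's real code.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed list. The real importer recognizes many more types.
TERM_TYPES = {
    "http://rdf.freebase.com/ns/common.topic",
}


def find_term_subjects(triples, term_types=TERM_TYPES):
    """Return subjects, in first-seen order, that have a term-like rdf:type."""
    found = []
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in term_types and subject not in found:
            found.append(subject)
    return found


# Illustrative sample data, not a real Freebase response.
triples = [
    ("http://rdf.freebase.com/ns/en.simon_and_garfunkel", RDF_TYPE,
     "http://rdf.freebase.com/ns/common.topic"),
    ("http://rdf.freebase.com/ns/en.simon_and_garfunkel", RDF_TYPE,
     "http://rdf.freebase.com/ns/music.artist"),
    ("http://rdf.freebase.com/ns/music.artist", RDF_TYPE,
     "http://rdf.freebase.com/ns/type.type"),
]

term_subjects = find_term_subjects(triples)
```

Adding another type URI to TERM_TYPES is the sketch's equivalent of adding an entry to the list of recognized types.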

The question is, what URL or datastream are you using to import from? One that lists the music artists by name and type? What is your input data?

From what I can see, I don't know how you can access freebase data on big datasets. The 'recursive' option in taxonomy_xml seems to work with recursive data structures (see musical genres - that works) but it doesn't do 'paging', so I'm only seeing the first few dozen items in a topic.
I'm seeing some good data come in, but I'm having trouble asking the right questions of freebase. Working on it.

deltab:

hi, I am trying this http://rdf.freebase.com/rdf/music.artist which corresponds to this: http://www.freebase.com/type/schema/music/artist - so the way I see it,

Freebase:

Type = music.artist
Instance = Simon and Garfunkel (en) = en.simon_and_garfunkel

So in Drupal, I should see
Vocabulary ID="12", Label: "Music Artist",
Taxonomy Term = "112239", Label: Simon and Garfunkel (en)

However, after this comes the can of worms, (a) how to get big datasets? (b) how to disambiguate between synonyms (en and es), and many other issues. However, I guess the import module can make recursive queries in some way? Also, how do we handle RDF/Directed Graph properties? (Simon and Garfunkel -> music.artist.origin -> United States)

Another strange issue: bulk-import some top-level Library of Congress data, and all imported terms become synonyms of one another (try something very basic like "Architecture").

dman:

Ah, you've hit my under-documented assumption :-)

In my special world I have the missing piece of the puzzle, the URI as a GUID, retained locally.
In Drupal 5 this was handled by taxonomy_enhancer - which sorta adds CCK fields to terms. A D6 version we fixed up WORKS but the project is dead in the water.
In Drupal 7, this will be handled with 'fieldable' terms.
In D6, you need either the unreleased taxonomy_enhancer OR an unreleased extension to RDF.module :-) sorry about that.

Anyway, what these things potentially do is record the remote URI of the data object that is retrieved, so that when spidering continues the terms can be matched up with each other and uniquely identified again - not just by string match (which is the fallback, and worked OK (mostly) for small-to-medium data sets). The taxonomy_xml engine hooks into this, but only if other things are set up just right.
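Roughly, the lookup order can be sketched like this. It is illustrative Python with a made-up TermIndex class standing in for whatever actually stores the URI; it is not the taxonomy_enhancer or RDF.module code.

```python
# Hypothetical sketch of "match by remote URI first, fall back to name".
class TermIndex:
    def __init__(self):
        self.by_uri = {}    # remote URI -> local term id
        self.by_name = {}   # normalized name -> local term id
        self.next_tid = 1

    def find_or_create(self, name, uri=None):
        """Return the local term id for a remote resource."""
        # 1. A recorded URI identifies the term uniquely.
        if uri is not None and uri in self.by_uri:
            return self.by_uri[uri]
        # 2. Fallback: plain string match on the term name.
        key = name.strip().lower()
        tid = self.by_name.get(key)
        if tid is None:
            tid = self.next_tid
            self.next_tid += 1
            self.by_name[key] = tid
        # Remember the URI so later spidering can match on it directly.
        if uri is not None:
            self.by_uri[uri] = tid
        return tid
```

Note that the name fallback in this sketch will merge two distinct resources that happen to share a label, which is exactly why the string match only holds up on smaller data sets.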

ANYWAY.
I had another night playing with freebase (super-cool) and was able to get some joy from importing truly hierarchical 'topic' structures like music.genre. Multi-parented even.
But as I mentioned, I'm not sure what to do about broad flat lists like music.artist unless I actually write a custom implementation of the API just to get paging out of freebase. That's your problem (a)
Which is potentially possible, because it's got lots of hope there, and I've built a plug-in system that will support remote web services. I just haven't had anything more significant than REST interfaces to talk to until now.
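The paging part would look something like the loop below. This is an illustrative Python sketch only: fetch_page is a hypothetical callable standing in for a real service plug-in, and the cursor convention (None means "no more pages") is an assumption, not Freebase's actual API.

```python
# Hypothetical sketch of pulling a broad flat list page by page.
def iter_all_items(fetch_page, page_size=100):
    """Yield every item from a paged service.

    fetch_page(cursor, page_size) must return (items, next_cursor),
    with next_cursor set to None once the last page has been served.
    """
    cursor = None
    while True:
        items, cursor = fetch_page(cursor, page_size)
        yield from items
        if cursor is None:
            break


def make_fake_service(data):
    """In-memory stand-in for a remote service, for trying the loop out."""
    def fetch_page(cursor, page_size):
        start = cursor or 0
        end = start + page_size
        next_cursor = end if end < len(data) else None
        return data[start:end], next_cursor
    return fetch_page


artists = ["artist-%d" % i for i in range(7)]
all_items = list(iter_all_items(make_fake_service(artists), page_size=3))
```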

Problem (b) is well solved using my GUID-recording mechanism ... but needs public release. I'll post my contrib to the rdf project soon I guess.
Simon and Garfunkel -> music.artist.origin -> United States is way too far out of scope for Drupal taxonomy. I tried to do that with my RDF metadata = 'relationship' project a few years ago. I failed when I tried to define the schema in terms of itself. Freebase however has since taken that idea and succeeded where I failed!
To re-represent that sort of structure in Drupal, you need to move away from taxonomy - which is faceted - and into RDF node graphs, which are not just multi-dimensional, but, um, "multimorphic" or something :-B
Freebase has done that. I'm not going to try to replicate that (again) but I do want to be able to at least suck slices of it.

music.genre or location.locality or religion.religious_order are good candidates for classifications. music.artist, film.film or beverage.cocktail, less so. These are usually modeled as the nodes that GET classified. That rule has been shattered by freebase, but I'm not as clever as Stefano Mazzocchi (I realized that back in 2000) so I can't do that with taxonomy_xml alone.

I may have only grabbed sub-trees from the LOC import, but that sounds like it could be a bug. Or just a side-effect of the system expecting a GUID and not finding it (or finding a blank). Which is my fault for not double-checking somewhere.

I didn't want to introduce dependencies on modules when a lot of the time a simple dump is all folk need. But I think I should stick in a road-block that doesn't launch the spider (yes, it is recursive when pointed at interesting datasets like LOC) if your installation isn't set to handle it.
And, of course, find a way to distribute the supplementary tool to do so.
I keep meaning to do so, but every time I get set to start a podcast, testing throws up something new and interesting :-)

deltab:

@dman, I am usually flummoxed by all Drupal talk (not a coder, but seem to know a lot about RDF and Ontologies). Having re-re-re-reread your text, and I guess I understand all of this, it occurs to me ....

(a) It should be possible to dump Drupal Taxonomy and replace it with a pure RDF thing (say RDF/OWL, XML or some type of OWL-DL) so that all types of predicates become possible, you see where this goes, right?

(b) (not a very healthy way) OR import Taxonomy XML as some type of RDF/XML or just store it in a separate table and write a very funny module to make more relations (synonyms, antonyms, homonyms, holonyms, god-help-what-elsenyms) as warranted.

Bulk import from Freebase may not be needed, Calais has some examples already, and we just need to (in a healthy use-case situation) dereference and disambiguate found terms, from say Yahoo! or Calais or Amplify or many more taggers that will come about. So bulk-import I don't think is a major issue, but creating vocabularies, terms (or instances), mapping multiple vocabularies (FOAF:Person, DC:Creator) etc -- I see seeds of this type of an idea coming about here: http://drupal.org/node/257039

I know it is a very different project, code-wise, from what Taxonomy_XML does, but as it seems to me, Taxonomy_XML is the best-match to go in many very powerful directions....

dman:

(a) If you want the terms themselves to be classified by other terms in different vocabs, as in
Simon and Garfunkel -> music.artist.origin -> United States
term hasRelation:origin term
... then Drupal taxonomy can't do that. Just don't.
You need RDF or noderelation.

Normally we'd model S&G -> US as node -tag-> [vocab:location, term:US]
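As a data structure, that conventional model is roughly the following. It is an illustrative Python sketch; the Node and Term classes are made up for the example and are not Drupal's schema.

```python
# Hypothetical sketch: the artist is a node, and the location is a
# term in a vocabulary that the node gets tagged with.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    vocabulary: str
    name: str


@dataclass
class Node:
    title: str
    tags: set = field(default_factory=set)


def nodes_tagged_with(nodes, term):
    """Return the titles of all nodes carrying the given term."""
    return [node.title for node in nodes if term in node.tags]


us = Term(vocabulary="location", name="US")
sg = Node(title="Simon and Garfunkel")
sg.tags.add(us)

tagged = nodes_tagged_with([sg], us)
```

The tag says only "this node is classified under US". It carries no predicate such as music.artist.origin, which is the part taxonomy cannot express.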

(b) Yes, extra relations can sorta be done (I discovered an extra dozen -nyms when working with the biologists at the Encyclopedia of Life. They don't mean the same thing by 'synonym' as other human beings)
But yes, you need another storage mechanism. taxonomy_enhancer OR RDF OR D7 fieldable terms are required.

What you are looking for to emulate the freebase knowledge net is quite simply beyond the word 'taxonomy' and way beyond Drupal's ability to store "terms in a list".
I see convergence with RDF as the only way to go.

deltab:

Looked at Drupal database, what if there is a .....

(for (b))
companion table to term_relations, much rather, a table that can bridge incoming RDF (stored as ARQ RDF in MySQL) and Taxonomy? After this, some templates can generate the relations -- so I am thinking Taxonomy XML ingest, store as ARQ RDF, store predicates as term_relations, and then you get synonyms, holonyms, SKOS:TopConcept or whatever displayed on screen through templates.

I know it is a hack (replication of Data etc) and not a best practice, but does the job.
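A very rough sketch of what I mean by the bridge table, in illustrative Python with SQLite instead of MySQL. The table name term_predicate_bridge, its columns and the helper functions are all made up for the example; nothing here is Drupal's real schema.

```python
# Hypothetical sketch: a companion table that stores which predicate
# links one term to another, so templates can display the relation.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE term_predicate_bridge (
        subject_tid INTEGER NOT NULL,
        predicate   TEXT    NOT NULL,
        object_tid  INTEGER NOT NULL,
        PRIMARY KEY (subject_tid, predicate, object_tid)
    )
    """
)


def store_relation(conn, subject_tid, predicate, object_tid):
    """Record one (term, predicate, term) statement, ignoring duplicates."""
    conn.execute(
        "INSERT OR IGNORE INTO term_predicate_bridge VALUES (?, ?, ?)",
        (subject_tid, predicate, object_tid),
    )


def relations_for(conn, subject_tid):
    """Return (predicate, object_tid) pairs for one term, sorted."""
    rows = conn.execute(
        "SELECT predicate, object_tid FROM term_predicate_bridge "
        "WHERE subject_tid = ? ORDER BY predicate, object_tid",
        (subject_tid,),
    )
    return rows.fetchall()


store_relation(conn, 1, "skos:related", 2)
store_relation(conn, 1, "skos:broader", 3)
store_relation(conn, 1, "skos:related", 2)  # duplicate, ignored
```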

(for (a)) What also becomes compelling, thinking deeper, is an Apache Solr engine type approach: a triplestore WAR file (say Joseki or Sesame) is uploaded, and Taxonomy XML actually uses it to store data and display, so RDF returns to earth as RDF... now this has a possibility of connecting to reasoners and inferencers, and all nine yards of a CRUD operation (I notice the SPARQL engine in Drupal only does C, really)...

Just thinking aloud