Mapping Calais entities to internal Drupal taxonomies
Nigeria - May 10, 2008 - 21:55
| Project: | Calais |
| Version: | 6.x-3.1 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
Would it be possible to have options that make
a) all the calais terms go to a single free tagging taxonomy instead of the ones that the module predefines
b) each of the calais terms map to a particular free tagging taxonomy?
--
Ade Atobatele

#1
I like option B): being able to choose which vocabulary each set of entities maps to.
However, I think A), pushing all tags to a single free tagging vocabulary, actually removes the real benefit of the Calais service, which is knowing the context of the tags you are getting. There are lots of services that will extract keywords from your content, but not many will tell you that "Michael R. Bloomberg" is actually a Person. I feel that if you dump all tags into one vocabulary you lose the contextual gold.
#2
While I agree with your sentiments, the idea was to put the power in the hands of the users to actually make the decision rather than force them to agree with us!
Besides, if you implement B, then all the user has to do is map all the entities to the same vocabulary anyway!
--
Ade Atobatele
#3
I knew I liked B for a reason :-)
This one is a bit more medium term than short term. There are probably a few other pieces of low-hanging fruit that I might look to get into first, but something like this is certainly on the short list. Someone here also brought up that they might like tags associated with CCK fields instead of Taxonomy vocabularies, so that is yet another alternative.
#4
Rethinking the "Johnny Cash" term in my "Artists" vocabulary... (from Blacklist issue).
Optimally, I'd like to map the Calais "Person" vocabulary to my "Artists" vocabulary. However, instead of blacklisting terms, I'd like to whitelist only those terms that I want (from my predefined list of Artists).
If "Tony Blair" and "Bono" are Calais-extracted terms, I don't want "Tony Blair" to be tagged on the node. He's not an artist. I only want those "Person"/"Artists" term tags that I have predefined. "Bono" would be on my list, so that term would be tagged.
Calais with the blacklist will do me good, no doubt. But the above-mentioned use case really would hit a sweet spot for me while providing an innovative use of the service.
#5
Whitelist is an interesting idea, but as a side note for more specific cases: even if some concrete way of handling tags is not in the user interface, that does not mean it cannot be achieved.
Calais implements hook_calais_preprocess(&$node, &$keywords) and hook_calais_postprocess($node, $keywords), which give module developers some interesting flexibility. The "preprocess" hook is invoked after the list of keywords is retrieved from the Calais web service, but before it is saved into Drupal!
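To make the idea concrete, here is a minimal sketch of the kind of filtering a preprocess-style hook could apply for the whitelist case from #4. It is written in Python purely as a language-neutral illustration, not as Drupal code. The function name `whitelist_keywords` and the data shape (a dict of entity type to list of terms) are assumptions made for this example and are not the Calais module's actual structures.

```python
def whitelist_keywords(keywords, whitelist):
    """Keep only the terms that are whitelisted for their entity type.

    keywords:  dict mapping an entity type (e.g. "Person") to a list of
               extracted terms (hypothetical shape, for illustration only).
    whitelist: dict mapping an entity type to the set of accepted terms.
    Entity types with no whitelist entry pass through unchanged.
    """
    filtered = {}
    for entity_type, terms in keywords.items():
        allowed = whitelist.get(entity_type)
        if allowed is None:
            # No whitelist for this type: keep every extracted term.
            filtered[entity_type] = list(terms)
        else:
            # Compare case-insensitively so "bono" still matches "Bono".
            allowed_lower = {name.lower() for name in allowed}
            filtered[entity_type] = [t for t in terms if t.lower() in allowed_lower]
    return filtered


extracted = {"Person": ["Tony Blair", "Bono"], "City": ["Dublin"]}
artists = {"Person": {"Bono", "Johnny Cash"}}
result = whitelist_keywords(extracted, artists)
print(result)  # {'Person': ['Bono'], 'City': ['Dublin']}
```

Because the filter runs before anything is saved, "Tony Blair" would never reach the node, while "Bono" would be tagged as usual.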
#6
I think much/all of this could be done using the existing hooks.
#7
I'm revisiting this. Yes, it probably /can/ be done with the existing hooks, but what's needed is a nice interface that lets less technical users leverage this module without creating a lot of unwanted vocabularies.
#8
After looking at some code, it looks to me like it's going to need modification of calais.module at least, if not the .install file as well.
The ideal is to have a "Mapping" fieldset on the configuration page with the following radio button options:
O - 1 to 1 - Each Calais entity is mapped directly to a Drupal vocabulary
O - All to 1 - All Calais entities are mapped to a single Drupal vocabulary
    Map all Calais entities to [Select Box of vocabularies]
O - Custom Mapping
    Calais Entity | Drupal Vocabulary
    Anniversary | [Select Box of vocabularies + option for none]
    City | [Select Box of vocabularies + option for none]
    Company | [Select Box of vocabularies + option for none]
    ....
    Anniversary | [Select Box of vocabularies + option for none]
    Unknown Entities | [Select Box of vocabularies + option for none + option for Create new Vocabulary]
As the mapping will be chosen after the module is activated, the vocabularies (if 1:1) need to be created just in time, instead of as part of the install.
I would be happy to help with this, but I don't know Calais well enough to get rolling. I have been looking at calais_get_entity_vocabularies(), calais_get_vocabularies() and calais_process_node(), but I am still working out exactly how/where to start hacking (and what I might break as a result).
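The three mapping modes and the just-in-time vocabulary creation described in #8 can be sketched roughly as follows. This is a language-neutral illustration in Python, not Drupal code. `VocabularyRegistry`, `resolve_vocabulary` and the config keys (`mode`, `target`, `mapping`, `unknown`) are all hypothetical names invented for the example; none of them exist in the Calais module.

```python
class VocabularyRegistry:
    """Toy stand-in for vocabulary storage (hypothetical, not a Drupal API)."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def get_or_create(self, name):
        # Create the vocabulary only when it is first needed ("just in time").
        if name not in self._ids:
            self._ids[name] = self._next_id
            self._next_id += 1
        return self._ids[name]


def resolve_vocabulary(entity_type, config, registry):
    """Return the vocabulary id for an entity type, or None to skip it."""
    mode = config["mode"]
    if mode == "one_to_one":
        # Each entity type gets a vocabulary named after it.
        return registry.get_or_create(entity_type)
    if mode == "all_to_one":
        # Every entity type shares the single configured vocabulary.
        return registry.get_or_create(config["target"])
    if mode == "custom":
        # Per-entity mapping; a value of None means "none" (do not tag).
        # Entity types missing from the mapping fall back to "unknown".
        target = config["mapping"].get(entity_type, config.get("unknown"))
        return None if target is None else registry.get_or_create(target)
    raise ValueError("unknown mapping mode: %r" % mode)


registry = VocabularyRegistry()
custom = {
    "mode": "custom",
    "mapping": {"Person": "Artists", "City": None},
    "unknown": "Misc",
}
person_vid = resolve_vocabulary("Person", custom, registry)
city_vid = resolve_vocabulary("City", custom, registry)
company_vid = resolve_vocabulary("Company", custom, registry)
```

In this sketch nothing is created at install time: a vocabulary only comes into existence the first time an entity type resolves to it, so switching modes later does not leave unused vocabularies behind.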
#9
#10
Is there a likely possibility that these features may be implemented?
#11
I'm interested in coding something like this, but not quite. More like Rob's #4. We want to map people to "players" and regions to "leagues." If the incoming term isn't in our list of 'accepted' terms, then it'll either get dropped, or we'll dual-tag it: once with Calais's stuff and once in our own vocabulary. I'm still trying to decide which is the best way to go on that... if anyone has thoughts, let me know.
I'll try and code our custom module in a somewhat generic way and attach it here, but I can't promise anything remotely approaching support for it.
#12
I think there is some good stuff in what people are talking about here. Ideally, the array stored in the variables table under the key calais_vocabulary_names just needs a UI to allow editing of the vocabulary id (vid) for each entity type. I think it will need to be taken one step further, though. It seems that for each entity type we would need to know whether there should be a "parent" term named after the entity type, so that if all terms go to one vocabulary, a Person term would be stored as "General Vocab" => "Person" => "Johnny Cash".
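As a rough illustration of the "parent" term idea, the sketch below (Python, used only as a language-neutral example) places extracted terms into a nested structure, optionally under a parent term named after the entity type. The function `build_term_tree` and the rule keys `vocabulary` and `parent_term` are assumptions made for this example; they do not reflect how calais_vocabulary_names is actually stored.

```python
def build_term_tree(extracted, rules):
    """Arrange extracted terms as vocabulary -> (parent term) -> term.

    extracted: dict of entity type -> list of terms (hypothetical shape).
    rules:     dict of entity type -> {"vocabulary": str, "parent_term": bool}.
    Entity types without a rule are skipped.
    """
    tree = {}
    for entity_type, terms in extracted.items():
        rule = rules.get(entity_type)
        if rule is None:
            continue
        vocabulary = tree.setdefault(rule["vocabulary"], {})
        if rule["parent_term"]:
            # Nest the terms under a parent named after the entity type.
            branch = vocabulary.setdefault(entity_type, {})
        else:
            # Put the terms directly at the top level of the vocabulary.
            branch = vocabulary
        for term in terms:
            branch.setdefault(term, {})
    return tree


extracted = {"Person": ["Johnny Cash"], "City": ["Nashville"]}
rules = {
    "Person": {"vocabulary": "General Vocab", "parent_term": True},
    "City": {"vocabulary": "General Vocab", "parent_term": True},
}
tree = build_term_tree(extracted, rules)
print(tree)
# {'General Vocab': {'Person': {'Johnny Cash': {}}, 'City': {'Nashville': {}}}}
```

With the parent term switched on, the entity type survives as context even when everything lands in one vocabulary, which addresses the concern raised in #1.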
As far as whitelisting goes, I think that needs to be a separate module that makes use of the hooks.
If someone writes any of these, I would be more than happy to integrate it into the modules, as I don't see myself having much time in the near future.
#13
subscribing.
#14
I think a possible way to accomplish this would be to integrate the mapping via Content Taxonomy (http://drupal.org/project/content_taxonomy). That would be a killer feature. If this were achieved, you'd be able to have only one tab for editing your node taxonomy fields, rather than having EDIT and CALAIS as it is now.
Actually, it already works with Content Taxonomy, so "Mapping Calais entities to internal Drupal taxonomies" can already be done. However, it is not elegant because, as I mentioned, you have two separate tabs, which means a lot of clicking back and forth. But where I have a Content Taxonomy CCK field mapped to a vocabulary on the EDIT tab and the vocabulary tagging field on the CALAIS tab, a keyword I enter in the CCK field does appear in the Calais field as well.
So we are actually closer to a solution than you might imagine. It seems to me that tighter integration between the two modules would solve this request. Let me know what you think. Kevin
#15
I have tried for the last 28 hours straight to get this to work with Content Taxonomy along with FeedAPI. I tried everything I could, but could not get Content Taxonomy to automatically populate the custom fields that were created for each Calais/taxonomy vocabulary. I tried all night and thought that, since you "mapped" the Content Taxonomy field to the Calais vocabulary, the custom Content Taxonomy fields would be populated automatically after the information was gathered from Calais... but absolutely no luck. Actually, it's now more like 48 hours with no sleep for nothing. If someone could help me with this I would greatly appreciate it.
#16
Thread seems to have gone cold - webchick did you get anywhere with this?
#17
I would really enjoy seeing "a) all the calais terms go to a single free tagging taxonomy instead of the ones that the module predefines" implemented. In fact, I won't consider using this module before this is implemented.
#18
(a) is a very good option. Among other things, we currently get duplication between very generic Calais categories: I find almost everything from document categories replicated in social tags.
#19
subscribing