The Yahoo Term Extraction service returns tags that have no meaning. This seems to happen with non-English text, as you can see here: http://vorarlblog.at/tagadelic/chunk/5

Has this happened to others, or in other languages?
German words that I'd like to ban from being used as tags: ihr, ein, wenn, als, und

If this problem happens to others, it would probably make sense to create a config field where unwanted terms could be entered as a workaround.
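
A minimal sketch of such a workaround, written in Python for illustration only. It assumes the extracted terms arrive as a plain list of strings; the names `BLACKLIST` and `filter_terms` are hypothetical and not part of any existing module, and the sample terms in the last line are made up.

```python
# Hypothetical blacklist; in a real module this would be read from a config field.
BLACKLIST = {"ihr", "ein", "wenn", "als", "und"}


def filter_terms(terms, blacklist=BLACKLIST):
    """Return the terms that are not blacklisted (case-insensitive match)."""
    blocked = {word.casefold() for word in blacklist}
    return [term for term in terms if term.strip().casefold() not in blocked]


# Made-up sample input: two real keywords mixed with two helper words.
print(filter_terms(["Bregenz", "und", "Ein", "Vorarlberg"]))
# prints ['Bregenz', 'Vorarlberg']
```

Matching is case-insensitive so that "Ein" at the start of a sentence is caught as well as "ein".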

Comments

alex_b’s picture

Apart from helper words like the ones you listed, does the Yahoo Term Extraction service actually return useful keywords for German content?

epe’s picture

The extracted terms are more or less fine otherwise. Feel free to have a look yourself: http://beta.vorarlblog.at/

alex_b’s picture

Title: Yahoo Terms returns meaningless german words. » Create blacklist for yahoo terms

It's great that the extraction service also works in German. I didn't expect it to.

However, we don't really have the time to implement this feature - feel free to go ahead though :)

I've renamed the feature request for the time being; I hope it describes more accurately what you're looking for.

Ian Ward’s picture

Since Yahoo Terms uses a taxonomy vocabulary to store its terms, would it be just as good to allow blacklists to be kept for any free-tagging vocabulary? I know that if you use a static vocabulary for Yahoo Terms, it will only tag if the term exists in that vocabulary, but that does not help in your scenario.
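
The difference between the two modes can be sketched roughly as follows, in Python for illustration only. The function `select_tags` and its parameters are hypothetical, and the sketch assumes a vocabulary is just a list of term names; it does not reflect how the taxonomy module actually stores terms.

```python
def select_tags(candidates, vocab_terms, blacklist=(), static=False):
    """Pick which candidate terms become tags for one vocabulary.

    Blacklisted terms are always dropped. With static=True, a candidate
    is additionally kept only if it already exists in the vocabulary.
    """
    existing = {term.casefold() for term in vocab_terms}
    blocked = {term.casefold() for term in blacklist}
    kept = []
    for term in candidates:
        key = term.casefold()
        if key in blocked:
            continue  # explicitly banned for this vocabulary
        if static and key not in existing:
            continue  # a static vocabulary never gains new terms
        kept.append(term)
    return kept


# Made-up sample terms.
candidates = ["Bregenz", "und", "Skifahren"]
static_tags = select_tags(candidates, vocab_terms=["Bregenz"], static=True)
free_tags = select_tags(candidates, vocab_terms=[], blacklist=["und"])
```

In this sketch the static vocabulary drops "und" only because it is missing from the vocabulary, and it drops the useful new term "Skifahren" for the same reason. The free-tagging vocabulary with a blacklist keeps new terms and removes only the listed words.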

aron novak’s picture

Status: Active » Closed (fixed)

A new, separate project has been started: leech_yahoo_terms has been split out of leech and is now a standalone module. Leech can use this module to provide the same functionality.
http://www.drupal.org/project/yahoo_terms
The blacklist logic is implemented in this module; only the UI part is missing. It can be expected soon. Patches are welcome; I think it's straightforward to create a form element where users can specify the blacklisted words.
The relevant ticket is here: http://drupal.org/node/140362
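
For the missing UI part, the main piece of logic is turning whatever the user types into the form element into a clean word list. A rough sketch in Python, for illustration only: `parse_blacklist` is a hypothetical helper, not code from the yahoo_terms module, and it assumes words are separated by commas, spaces, or line breaks.

```python
import re


def parse_blacklist(raw):
    """Split free-form input on commas and whitespace into a set of lowercase words."""
    words = re.split(r"[,\s]+", raw)
    return {word.casefold() for word in words if word}


print(sorted(parse_blacklist("ihr, ein\nwenn  als,und")))
# prints ['als', 'ein', 'ihr', 'und', 'wenn']
```

Accepting several separators means users do not have to follow one strict input format, and empty input simply yields an empty blacklist.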