Greetings,

We're looking to develop what we call smart tags -- adding document classification to Drupal.

Smart Tags – A proposal for an intelligent tagging mechanism in Drupal. The text with diagrams can be found at:
http://newhelix.com/blog/SmartTagsAProposalForAnIntelligentTaggingMechan...

Would you know of any existing work that we can align ourselves with?

Thanks,

Arman.

TEXT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The tagging of web content is becoming an increasingly important problem. It would be redundant to cite the importance of tagging, as reams have been produced on this subject. Although formal mechanisms such as RDF exist to address the meta-data aspects of resources, the Web 2.0 landscape is filled with ad-hoc tagging schemes, as is evidenced by popular sites such as del.icio.us, youtube.com, etc.

Drupal currently has such tagging and taxonomy functionality, and efforts are underway to extend it. The current approaches focus on making the tagging experience cleaner and on improving tag precision.

We think, however, that the time is right to consider a new angle on this problem. Our proposal is to automate, if only partially, the tagging mechanism so that Drupal can:
1. Learn the tag classifiers – understand when a tag is relevant to a node.
2. Propose the application of a tag to the user if it thinks the tag applies to a page.
3. Continuously refine its understanding of tag applicability.
4. Identify redundant tags and propose merging them to the user.
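
As a rough illustration of points 1 to 3, here is a minimal sketch in Python of a tag suggester that runs locally. Everything in it is an assumption made for illustration, not part of the proposal: the `TagSuggester` class, the tiny stop-word list, the cosine-similarity scoring, and the 0.3 threshold are all hypothetical choices, and a real module would need a proper classifier and Drupal integration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list so common words do not dominate.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two word-count Counters (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TagSuggester:
    """Keeps one word-count profile per tag and scores new text against each."""

    def __init__(self):
        self.profiles = defaultdict(Counter)  # tag -> word counts

    def learn(self, text, tags):
        """Update the profile of every tag a user applied to this text."""
        words = tokenize(text)
        for tag in tags:
            self.profiles[tag].update(words)

    def suggest(self, text, threshold=0.3):
        """Return (tag, score) pairs at or above the threshold, best first."""
        doc = Counter(tokenize(text))
        scored = [(tag, cosine(doc, profile)) for tag, profile in self.profiles.items()]
        kept = [pair for pair in scored if pair[1] >= threshold]
        return sorted(kept, key=lambda pair: pair[1], reverse=True)


suggester = TagSuggester()
suggester.learn("The pitcher threw the ball to the batter", ["baseball"])
suggester.learn("The senate passed the privacy bill", ["politics"])
print(suggester.suggest("A batter hit the ball"))
```

Calling `learn` again whenever a user accepts or corrects a suggestion is what would give the continuous refinement of point 3. Because nothing leaves the site, this kind of approach would also suit organizations that cannot post their text to an outside service.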

Comments

MacRonin’s picture

Would the add-on module leech_yahoo_terms, which allows autotagging of aggregated articles with the Yahoo term extraction web service, be part of what you are thinking? I saw it mentioned as part of the Leech module.

-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world

newhelix.com’s picture

Greetings,

I actually ran into the "Yahoo term extraction web service" at the Washington DC Drupal meet.

The only problem is that for organizations with privacy concerns, posting the text to a Yahoo web service is not an option.

NH

dman’s picture

I'd think you want to add ownership of tags (provenance), which I think is a useful step and shouldn't be too hard to knit into Drupal.

Although decentralized free-tagging has done fascinating things to the new world of 2.0, I feel like we should be getting back towards a canonical restricted vocabulary, like taxonomy server and other initiatives suggest. I like RDF, so I like WordNet URIs as my terms. That's just one possibility, and discussion is welcome. It is sorta the opposite of your 'organic semantic' proposal.

I don't know how successful you'll be with the 'automatic' compiling of similar tags. A photo gallery may come up with a high correlation between images tagged 'baseball' and 'outside' but there's not much of a semantic link there.
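
To make the concern concrete, here is a minimal sketch of one way such correlations could be measured: the Jaccard overlap between the sets of nodes carrying each tag. The function names, the sample node IDs, and the 0.6 threshold are all hypothetical. Note that it flags 'baseball' and 'outside' as a merge candidate purely on co-occurrence, which is exactly why a human would still have to decide.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_candidates(tag_nodes, threshold=0.6):
    """Return (tag_a, tag_b, score) for every tag pair whose node sets overlap heavily."""
    pairs = []
    for (tag_a, nodes_a), (tag_b, nodes_b) in combinations(sorted(tag_nodes.items()), 2):
        score = jaccard(nodes_a, nodes_b)
        if score >= threshold:
            pairs.append((tag_a, tag_b, score))
    return pairs


# Hypothetical gallery: tag -> IDs of the images carrying that tag.
gallery = {
    "baseball": {1, 2, 3, 4},
    "outside": {1, 2, 3, 4, 5},
    "portrait": {6, 7},
}
print(merge_candidates(gallery))
```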

The 'game' aspect (I imagine you mean something like Google Image Labeler) will work in some contexts, but not in others.

Further to that, I'd like to develop a weighting/trust system on top of the ownership of tags, following some ideas I had a while back on a collaborative indexing/metadata system.

My other thoughts for the UI of this sort of scheme were:
- micro-payments of site 'credits' for all useful tags supplied
- - micro-micro credits paid back to users when other folk agree with them
- - which had to then be vetted against other users to avoid false input, and penalize spamming,
- - penalties applied if you class things differently than [a certain number of] other users.
- After a threshold of consensus was reached, agreed tags were removed from the list of potentials, as we're not learning anything new any more.
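
A minimal sketch of how the credit scheme above might be scored for a single item follows. The constants and the `settle` function are hypothetical, and the penalty rule is only one reading of the list: here, once any tag reaches consensus, users who proposed a tag that did not reach it lose a little credit.

```python
from collections import defaultdict

CONSENSUS = 3   # hypothetical: users who must agree before a tag is settled
REWARD = 1.0    # hypothetical credit for backing a tag that reaches consensus
PENALTY = 0.5   # hypothetical deduction for backing a tag that did not


def settle(proposals):
    """Score one item's proposals, given as a list of (user, tag) pairs.

    Returns (settled_tags, credits). Settled tags would be removed from the
    list of potentials, since further votes on them teach nothing new.
    """
    supporters = defaultdict(set)
    for user, tag in proposals:
        supporters[tag].add(user)

    settled = {tag for tag, users in supporters.items() if len(users) >= CONSENSUS}
    if not settled:
        return settled, {}  # no consensus yet, so nobody is paid or penalized

    credits = defaultdict(float)
    for tag, users in supporters.items():
        for user in users:
            credits[user] += REWARD if tag in settled else -PENALTY
    return settled, dict(credits)


votes = [("ann", "baseball"), ("bob", "baseball"), ("cat", "baseball"), ("dan", "cricket")]
print(settle(votes))
```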

I'm aware that this model encourages groupthink, which may not always be appropriate, but I was looking at organically indexing websites and images, not opinions or discussion boards.

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

newhelix.com’s picture

Well, here are my thoughts on some of the points:
1. 'automatic' compiling of similar tags
What I'm thinking here is that the system should have some automated mechanism by which it presents such opportunities to, say, an oncologist. The human would be the final arbiter of the actual refactoring decision.
2. On the game paradigm, I think something like Cycorp's FACTory (http://207.207.9.186/) should do the trick -- I believe it keeps track of how good your answers are and vets them against other users.
3. Yes and Yes on the micro-payments -- or a prize or something like that.

Boris Mann’s picture

See http://drupal.org/project/community_tags -- it starts by adding tag ownership, along with in-line tagging a la 43 Places / Flickr.

--
The future is Bryght.

David Latapie’s picture

Subscribe

djsdjs’s picture

It seems that if I could provide a whitelist to Yahoo terms, I could get some "Smart Tagging" with very little effort. The logic of Yahoo terms could be applied, and then the whitelist could screen the tags returned.

For a little more effort maybe the module could record "screened out" tags for viewing by an administrator to see if any of them merit adding to the whitelist?
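
A minimal sketch of that screening step, assuming the extracted terms are already available as a plain list (the call to the term extraction service is left out on purpose). The `screen_terms` name and the case-insensitive matching are hypothetical choices.

```python
def screen_terms(extracted, whitelist):
    """Split extracted terms into whitelisted tags and screened-out terms.

    Matching ignores case. The screened-out list is returned as well, so it
    can be logged for an administrator to review.
    """
    allowed = {term.lower() for term in whitelist}
    accepted, screened_out = [], []
    for term in extracted:
        if term.lower() in allowed:
            accepted.append(term)
        else:
            screened_out.append(term)
    return accepted, screened_out


tags, rejected = screen_terms(
    ["Privacy", "encryption", "baseball"],  # hypothetical service output
    ["privacy", "encryption"],              # site whitelist
)
print(tags)      # terms to apply as tags
print(rejected)  # terms to show the administrator
```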