Looking at the latest version of Drupal, 4.7b3, I see two relationships among terms: relatedness and synonymy. Since 4.7 appears to support tags and possibly user-defined keywords, I think the notions of relatedness and synonymy should be analyzed very carefully.
First on relatedness, I think this feature is currently implemented "correctly". That term A is related to terms B and C implies terms B and C are related to A. But B is not assumed to be related to C.
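To make the distinction concrete, here is a minimal sketch of a relation that is symmetric but not transitive. The `related` mapping and the `relate` helper are hypothetical names used for illustration only; this is not how Drupal stores relations.

```python
from collections import defaultdict

# Hypothetical in-memory model of term relatedness (not Drupal's actual code).
related = defaultdict(set)

def relate(a, b):
    """Record a symmetric relation between two terms."""
    related[a].add(b)
    related[b].add(a)

relate("A", "B")
relate("A", "C")

print(sorted(related["A"]))  # ['B', 'C']
print("A" in related["B"])   # True: the relation is symmetric
print("C" in related["B"])   # False: B and C are not assumed to be related
```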
On synonymy, I think this notion must be rectified by Drupal before it can be applied effectively to a loosely (user-) defined folksonomy or tagging application. Currently, synonymy is NOT even symmetric. If term A is set to be synonymous with term B, Drupal cannot derive that term B is synonymous with term A.
I suggest that synonymy be implemented as follows:
+ Associated with each term is a set of senses. In each sense, a term can be categorized as synonymous with any number of terms in the vocab.
+ All terms synonymous in a sense must be mutually synonymous with one another.
For example, in the creation of a new term "chair", I can define two senses for this term. In the first sense, I define that the following 4 terms are synonymous to it:
# S: president, chairman, chairwoman, chairperson
And in the second sense, I define that the following 3 terms are also synonymous to it.
# S: electric chair, death chair, hot seat
This characterization of the term "chair" should automatically cause the following actions:
1. a new sense is created for each of the terms president, chairman, chairwoman and chairperson. Further, this relationship must be *mutually* synonymous. That is, for example, the sense entry for the term "president" should consist of chair, chairman, chairwoman and chairperson (even though I didn't explicitly declare a synonymy between the terms president and chairman, or chairwoman, chairperson, respectively).
2. similar actions for each of the terms in the second sense, namely electric chair, death chair, hot seat.
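The two actions above could be sketched roughly as follows. This is only an illustration under my own assumptions: `senses_of`, `add_sense` and `synonyms_of` are hypothetical names, and each sense is modeled as a frozenset shared by all of its members so that synonymy is mutual by construction.

```python
from collections import defaultdict

# Hypothetical structure: term -> list of senses (frozensets) containing it.
senses_of = defaultdict(list)

def add_sense(term, synonyms):
    """Create one sense shared by `term` and every term in `synonyms`."""
    sense = frozenset([term, *synonyms])
    for member in sense:
        if sense not in senses_of[member]:
            senses_of[member].append(sense)
    return sense

def synonyms_of(term):
    """Return, for each sense of `term`, the other terms in that sense."""
    return [sorted(sense - {term}) for sense in senses_of[term]]

add_sense("chair", ["president", "chairman", "chairwoman", "chairperson"])
add_sense("chair", ["electric chair", "death chair", "hot seat"])

# "president" was never declared synonymous with "chairman" directly,
# yet the shared sense makes them mutual synonyms.
print(synonyms_of("president"))
# [['chair', 'chairman', 'chairperson', 'chairwoman']]
print(len(synonyms_of("chair")))  # 2
```

Because every member points at the same sense object, declaring the sense once is enough; no term needs its own copy of the synonym list.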
[I came up with these terms by doing a lookup for "chair" at WordNet: http://wordnet.princeton.edu/perl/webwn]
It would be efficient to have a separate database table for "senses" to avoid duplicates.
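One possible shape for such a table, sketched with Python's built-in sqlite3 module. The table and column names (`sense`, `term_sense`, `sid`) are made up for this example and are not part of Drupal's schema; terms are stored by name only to keep the sketch short.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative, not Drupal's.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sense (sid INTEGER PRIMARY KEY);
    CREATE TABLE term_sense (
        term TEXT NOT NULL,
        sid  INTEGER NOT NULL REFERENCES sense(sid),
        PRIMARY KEY (term, sid)   -- a term joins a given sense at most once
    );
""")

def add_sense(terms):
    """Store one sense row and link every term to it."""
    sid = db.execute("INSERT INTO sense DEFAULT VALUES").lastrowid
    db.executemany("INSERT INTO term_sense (term, sid) VALUES (?, ?)",
                   [(t, sid) for t in terms])
    return sid

def synonyms(term):
    """All terms sharing at least one sense with `term`."""
    rows = db.execute("""
        SELECT DISTINCT b.term FROM term_sense a
        JOIN term_sense b ON a.sid = b.sid
        WHERE a.term = ? AND b.term <> ?
        ORDER BY b.term""", (term, term))
    return [r[0] for r in rows]

add_sense(["chair", "president", "chairman", "chairwoman", "chairperson"])
add_sense(["chair", "electric chair", "death chair", "hot seat"])
print(synonyms("hot seat"))  # ['chair', 'death chair', 'electric chair']
```

Each sense is stored once, and symmetry falls out of the self-join rather than from duplicated rows per term.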
The bottom line is this: currently Drupal can get away with a loose definition of synonymy. The reason is that the current taxonomy is very well controlled by a carefully selected group of "super users" of the website; few synonymous terms exist. But this notion of synonymy (and perhaps other linguistic notions too) must be clarified and supported in order for Drupal to be applicable *effectively* to a community environment, where taxonomies and vocabularies are dynamically defined by normal users, who will inevitably expand the boundary of synonymy.
Comments
Comment #1
gerhard killesreiter commented
Thanks for your analysis of our taxonomy features. It is unlikely, though, that somebody will implement this feature, because synonymy and relatedness aren't even exposed by Drupal's core and need contributed modules to display them. So the interest in them is pretty low. Nevertheless, better implementations would surely be appreciated.
Comment #2
David Lesieur commented
Whether there is a problem depends on how you use taxonomies.
I understand the problem in the context of folksonomies and free-tagging, but under the perspective of controlled vocabularies/thesauruses, I think the current structure is fine.
About the non-symmetry of synonyms: I think this was done on purpose. Synonyms do not need to be full-fledged terms, because in a controlled vocabulary there are preferred terms and non-preferred terms (synonyms). When classifying stuff, we want to deal only with the preferred terms, otherwise we would end up with rather long lists of (mostly redundant) terms.
About the various senses of a word (homonyms): Controlled vocabularies usually have an appropriate hierarchy, in which case the meaning of a term is clear from its context. Following your example, we could have "Object > Furniture > Chair" and "Occupation > Chair". Here, there are two distinct "Chair" terms, each with its own parents, synonyms, and related terms.
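A small sketch of that idea, assuming each term is identified by an ID rather than by its name. The `terms` dictionary and the `path` helper are hypothetical and only illustrate how two "Chair" terms can coexist under different parents.

```python
# Hypothetical term records: identity is the term ID, not the name.
terms = {
    1: {"name": "Object", "parent": None},
    2: {"name": "Furniture", "parent": 1},
    3: {"name": "Chair", "parent": 2},
    4: {"name": "Occupation", "parent": None},
    5: {"name": "Chair", "parent": 4},
}

def path(tid):
    """Walk up the parents and return the full path of a term."""
    names = []
    while tid is not None:
        names.append(terms[tid]["name"])
        tid = terms[tid]["parent"]
    return " > ".join(reversed(names))

print(path(3))  # Object > Furniture > Chair
print(path(5))  # Occupation > Chair
```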
My point is: If anything is to be changed, thesaurus capabilities must not be broken or weakened to accommodate folksonomies. There must be a way to accommodate both. ;-)
Comment #3
vph commented
I respectfully disagree. If there's a difference between what is said about something (no less, a crucial part of a system) and what it actually is, then we have a systematic problem. Synonyms are supposed to be symmetric. If a relation is not symmetric, it can't be synonymy. If there's no need for the notion of synonym, then don't call it that. But calling it something it is not is problematic.
I think that this point you've raised is a technical point, not a conceptual point. Technically, any term in a synset (set of synonyms) can be made the representative of that set. This is what WordNet does.
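As a rough illustration of a synset with a selectable representative (the `Synset` class and `picker` function are hypothetical names of mine, not WordNet's or Drupal's API):

```python
class Synset:
    """Hypothetical synset: a set of synonyms with one chosen representative."""

    def __init__(self, terms, representative):
        self.terms = set(terms)
        self.representative = None
        self.set_representative(representative)

    def set_representative(self, term):
        """Any member of the synset may become its representative."""
        if term not in self.terms:
            raise ValueError(f"{term!r} is not in this synset")
        self.representative = term

office = Synset({"chair", "president", "chairman", "chairwoman", "chairperson"},
                "chairperson")
execution = Synset({"electric chair", "death chair", "hot seat"},
                   "electric chair")

def picker(synsets):
    """List only the representatives, e.g. for a classification form."""
    return sorted(s.representative for s in synsets)

print(picker([office, execution]))  # ['chairperson', 'electric chair']
office.set_representative("chair")
print(picker([office, execution]))  # ['chair', 'electric chair']
```

The membership of the synset stays symmetric; the representative is just a display choice layered on top of it.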
The real question is whether Drupal should be made to support uncontrolled vocabularies. I think it should. Yielding control over vocabulary is what makes the language/taxonomy rich and beautiful. And when you loosen control of vocabularies, you ought to support it. If a user wants to classify a picture/node/poem as "twilight" instead of the preferred synonymous term "afterdark", should you stop him from classifying it that way?
If you don't, then the synonymous relation between these terms as represented by Drupal must be symmetric. It must be convenient for users to infer that "twilight" is a synonym of "afterdark" AND vice versa.
On the other hand, if Drupal is meant to be used exclusively for controlled vocabularies, then there is no need for the existence of synonymy at all.
Which means ... don't do it at all or do it right.
Comment #4
David Lesieur commented
This makes sense. ;-)
If we make synonyms mutually symmetric, then we certainly need this capability (to select a representative). This feature solves my concern. When classifying stuff with controlled vocabularies, it is by showing only the representatives that we can make the lists of terms humanly manageable.
I fully agree. Drupal needs to support both controlled and uncontrolled vocabularies.
Synonyms as they are implemented right now are still useful with controlled vocabularies when indexing terms (and their synonyms) in a keyword search index. But certainly, different data structures could accommodate this as well.
What you are proposing is more powerful than the current architecture. Even with controlled vocabularies, symmetric synonyms would be useful, if only to change a synset's representative.
My conclusion: +1 for symmetric synonymity with synset representative selection.
Comment #5
killes@www.drop.org commented
moving
Comment #6
mlncn commented
Synonyms in Drupal currently are expressed as a Drupal taxonomy term (with term ID) and an optional text list of synonymous terms (that have no existence outside the term they are attached to; that is, they are not links to other Drupal taxonomy terms but simply words or phrases of the same meaning as the representative term).
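A minimal sketch of the structure as described in this comment, not of Drupal's actual tables or functions. The `Term` class and `resolve` helper are hypothetical; the point is only that the synonyms are plain strings hanging off one term that carries the ID.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """Hypothetical model: one real term plus plain-text synonyms."""
    tid: int
    name: str
    synonyms: list = field(default_factory=list)  # strings, not other terms

def resolve(word, terms):
    """Return the term whose name or synonym list contains `word`, if any."""
    for term in terms:
        if word == term.name or word in term.synonyms:
            return term
    return None

terms = [
    Term(1, "chairperson", ["chair", "president", "chairman", "chairwoman"]),
]

print(resolve("president", terms).name)  # chairperson
print(resolve("sofa", terms))            # None
```

Under this reading, the term with the ID plays the role of the synset's representative, and every synonym resolves to it.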
Sounds to me like Drupal has always had "symmetric synonymity with synset representative selection"?
benjamin, Agaric Design Collective
Comment #7
David Lesieur commented
Some features might be missing (for example, "take this [existing] term and make it a synonym of that other term"), but yes, the data structure already fits that quote. ;)