In the latest version of Drupal, 4.7b3, there are two relationships among terms: relatedness and synonymy. Since 4.7 appears to support tags and possibly user-defined keywords, I think the notions of relatedness and synonymy should be analyzed very carefully.

First, on relatedness: I think this feature is currently implemented "correctly". If term A is related to terms B and C, then B and C are each related to A (the relation is symmetric). But B is not assumed to be related to C (the relation is not transitive).
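A minimal sketch of that behaviour in Python, only to make the two properties concrete. The `RelatedTerms` class and its method names are hypothetical and are not Drupal's API.

```python
from collections import defaultdict


class RelatedTerms:
    """Symmetric, non-transitive 'related' relation between terms (illustrative only)."""

    def __init__(self):
        self._related = defaultdict(set)

    def relate(self, a, b):
        # Store the link in both directions so the relation is symmetric.
        self._related[a].add(b)
        self._related[b].add(a)

    def related_to(self, term):
        # Return a copy; no transitive closure is computed.
        return set(self._related.get(term, set()))


rel = RelatedTerms()
rel.relate("A", "B")
rel.relate("A", "C")
```

Here `rel.related_to("B")` contains only "A": B and C are both related to A, but not to each other.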

On synonymy, I think this notion must be rectified by Drupal before it can be applied effectively to a loosely (user-) defined folksonomy or tagging application. Currently, synonymy is NOT even symmetric. Term A can be set to be synonymous with term B, but Drupal cannot derive that term B is synonymous with term A.

I suggest that synonymy be implemented as follows:

+ Associated with each term is a set of senses. In each sense, a term can be categorized as synonymous with any number of terms in the vocabulary.

+ All terms synonymous in a sense must be mutually synonymous with one another.

For example, in the creation of a new term "chair", I can define two senses for this term. In the first sense, I define that the following 4 terms are synonymous to it:

# S: president, chairman, chairwoman, chairperson

And in the second sense, I define that the following 3 terms are also synonymous to it.

# S: electric chair, death chair, hot seat

This characterization of the term "chair" should automatically cause the following actions:

1. a new sense is created for each of the terms president, chairman, chairwoman and chairperson. Further, this relationship must be *mutually* synonymous. That is, for example, the sense entry for the term "president" should consist of chair, chairman, chairwoman and chairperson (even though I didn't explicitly declare a synonymy between president and chairman, chairwoman, or chairperson).

2. similar actions for each of the terms in the second sense, namely electric chair, death chair, hot seat.
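The two rules and the "chair" example above could be sketched roughly as follows. This is an illustration in Python under my own assumptions, not Drupal code: the `Vocabulary` class and its methods are hypothetical, and each sense is simply stored once as a set that includes the term itself.

```python
class Vocabulary:
    """Each sense is one shared set of mutually synonymous terms (illustrative only)."""

    def __init__(self):
        self.senses = []  # list of frozensets, one per sense

    def add_sense(self, term, synonyms):
        # The term and all its synonyms share a single set, so every
        # member automatically sees every other member as a synonym.
        synset = frozenset({term, *synonyms})
        if synset not in self.senses:
            self.senses.append(synset)
        return synset

    def senses_of(self, term):
        # One sorted list of synonyms per sense the term belongs to.
        return [sorted(s - {term}) for s in self.senses if term in s]


vocab = Vocabulary()
vocab.add_sense("chair", ["president", "chairman", "chairwoman", "chairperson"])
vocab.add_sense("chair", ["electric chair", "death chair", "hot seat"])
```

With this, `vocab.senses_of("president")` lists chair, chairman, chairperson and chairwoman, although only the synonyms of "chair" were declared explicitly.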

[I came up with these terms by doing a lookup for "chair" at WordNet: http://wordnet.princeton.edu/perl/webwn]

It would be efficient to have a separate database table for "senses" to avoid duplicates.
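One possible shape for such a table, sketched with SQLite from Python. The table and column names (`term`, `sense`, `term_sense`) are made up for illustration and are not Drupal's actual schema. The point is only that each sense is stored once and shared by all its terms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (tid INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE sense (sid INTEGER PRIMARY KEY);
-- Hypothetical link table: one row per (term, sense) membership.
CREATE TABLE term_sense (
    tid INTEGER NOT NULL REFERENCES term(tid),
    sid INTEGER NOT NULL REFERENCES sense(sid),
    PRIMARY KEY (tid, sid)
);
""")


def add_sense(conn, names):
    """Create one sense row and attach every named term to it."""
    sid = conn.execute("INSERT INTO sense DEFAULT VALUES").lastrowid
    for name in names:
        conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (name,))
        tid = conn.execute(
            "SELECT tid FROM term WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO term_sense (tid, sid) VALUES (?, ?)", (tid, sid))
    return sid


def synonyms(conn, name):
    """All other terms sharing at least one sense with `name`."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.name
        FROM term AS t
        JOIN term_sense AS mine ON mine.tid = t.tid
        JOIN term_sense AS theirs ON theirs.sid = mine.sid
        JOIN term AS other ON other.tid = theirs.tid
        WHERE t.name = ? AND other.name <> ?
        ORDER BY other.name
        """,
        (name, name),
    )
    return [row[0] for row in rows]


add_sense(conn, ["chair", "president", "chairman", "chairwoman", "chairperson"])
add_sense(conn, ["chair", "electric chair", "death chair", "hot seat"])
```

Because membership lives in the link table, the symmetry comes for free: looking up any member of a sense returns all the others.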

The bottom line is this: currently Drupal can get away with a loose definition of synonymy. The reason is that the current taxonomy is very well controlled by a carefully selected group of "super users" of the website; few synonymous terms exist. But this notion of synonymy (and perhaps other linguistic notions too) must be clarified and supported in order for Drupal to be applicable *effectively* to a community environment, where taxonomies and vocabularies are dynamically defined by just normal users, who will inevitably expand the boundary of synonymy.

Comments

gerhard killesreiter

Version: 4.5.7 » 4.7.0-beta3

Thanks for your analysis of our taxonomy features. It is unlikely, though, that somebody will implement this feature, because synonymy and relatedness aren't even exposed by Drupal's core and need contributed modules to display them. So the interest in them is pretty low. Nevertheless, better implementations would surely be appreciated.

David Lesieur

There is a problem only depending on how you use taxonomies.

I understand the problem in the context of folksonomies and free-tagging, but under the perspective of controlled vocabularies/thesauruses, I think the current structure is fine.

About the non-symmetry of synonyms: I think this has been done on purpose. Synonyms do not need to be full-fledged terms, because in a controlled vocabulary there are preferred terms and non-preferred terms (synonyms). When classifying stuff, we want to deal only with the preferred terms, otherwise we would end up with rather long lists of (mostly redundant) terms.

About the various senses of a word (homonyms): Controlled vocabularies usually have an appropriate hierarchy, in which case the meaning of a term is clear from its context. Following your example, we could have "Object > Furniture > Chair" and "Occupation > Chair". Here, there are two distinct "Chair" terms, each with its own parents, synonyms, and related terms.

My point is: If anything is to be changed, thesaurus capabilities must not be broken or weakened to accommodate folksonomies. There must be a way to accommodate both. ;-)

vph

There is a problem only depending on how you use taxonomies.

I understand the problem in the context of folksonomies and free-tagging, but under the perspective of controlled vocabularies/thesauruses, I think the current structure is fine.

I respectfully disagree. If there's a difference between what is said about something (no less, a crucial part of a system) and what it actually is, then we have a systematic problem. Synonyms are supposed to be symmetric. If a relation is not symmetric, it can't be synonymy. If there's no need for the notion of synonym, then don't call it that. But calling it something it is not is problematic.

About the non-symmetry of synonyms: I think this has been done on purpose. Synonyms do not need to be full-fledged terms, because in a controlled vocabulary there are preferred terms and non-preferred terms (synonyms). When classifying stuff, we want to deal only with the preferred terms, otherwise we would end up with rather long lists of (mostly redundant) terms.

I think that this point you've raised is a technical point, not a conceptual point. Technically, any term in a synset (set of synonyms) can be made the representative of that set. This is what WordNet does.

The real question is whether Drupal should be made to support uncontrolled vocabularies. I think it should. Yielding control of vocabulary is what makes the language/taxonomy rich and beautiful. And when you loosen control of vocabularies, you ought to support it. If a user wants to classify a picture/node/poem as "twilight" instead of the preferred synonymous term "afterdark", should you stop him from classifying it that way?

If you don't, then the synonymous relation between these terms as represented by Drupal must be symmetric. It must be convenient for users to infer that "twilight" is a synonym of "afterdark" AND vice versa.

On the other hand, if Drupal is meant to be used exclusively for controlled vocabularies, then there is no need for the existence of synonymy at all.

Which means ... don't do it at all or do it right.

David Lesieur

Synonyms are supposed to be symmetric. If a relation is not symmetric, it can't be synonymy.

This makes sense. ;-)

Technically, any term in a synset (set of synonyms) can be made the representative of that set.

If we make synonyms mutually symmetric, then we certainly need this capability (to select a representative). This feature solves my concern. When classifying stuff with controlled vocabularies, it is by showing only the representatives that we can make the lists of terms humanly manageable.
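As a rough sketch of what representative selection might look like, assuming each synset is a plain set of terms with one member flagged as its representative (the `Synset` class and `classification_choices` function are hypothetical names, not an existing Drupal interface):

```python
class Synset:
    """A set of mutually synonymous terms with one chosen representative."""

    def __init__(self, terms, representative):
        self.terms = set(terms)
        self.set_representative(representative)

    def set_representative(self, term):
        # Any member may represent the set, but only a member.
        if term not in self.terms:
            raise ValueError(f"{term!r} is not a member of this synset")
        self.representative = term


def classification_choices(synsets):
    # Show one entry per synset so the list of terms stays short.
    return sorted(s.representative for s in synsets)


office = Synset(
    {"chair", "president", "chairman", "chairwoman", "chairperson"},
    "chairperson",
)
execution = Synset({"electric chair", "death chair", "hot seat"}, "electric chair")
```

The classification list then has two entries rather than eight, and calling `office.set_representative("president")` changes which term is shown without touching the set itself.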

The real question is whether Drupal should be made to support uncontrolled vocabularies. I think it should.

I fully agree. Drupal needs to support both controlled and uncontrolled vocabularies.

if Drupal is meant to be used exclusively for controlled vocabularies, then there is no need for the existence of synonymy at all.

Synonyms as they are implemented right now are still useful with controlled vocabularies when indexing terms (and their synonyms) in a keyword search index. But certainly, different data structures could accommodate this as well.

What you are proposing is more powerful than the current architecture. Even with controlled vocabularies, symmetric synonyms would be useful, if only to change a synset's representative.

My conclusion: +1 for symmetric synonymity with synset representative selection.

killes@www.drop.org

Version: 4.7.0-beta3 » x.y.z

moving

mlncn

Version: x.y.z » 7.x-dev
Status: Active » Closed (works as designed)

My conclusion: +1 for symmetric synonymity with synset representative selection.

Synonyms in Drupal currently are expressed as a Drupal taxonomy term (with term ID) and an optional text list of synonymous terms. These synonyms have no existence outside the term they are attached to; that is, they are not links to other Drupal taxonomy terms but simply words or phrases with the same meaning as the representative term.
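To illustrate that description only (this is a simplified Python model, not Drupal's code or schema; the `Term` class and `find_term` helper are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    tid: int                       # term ID
    name: str                      # the representative (preferred) term
    synonyms: List[str] = field(default_factory=list)  # plain strings, not terms


def find_term(terms: List[Term], word: str) -> Optional[Term]:
    """Resolve a word to the term it names or is listed as a synonym of."""
    for term in terms:
        if word == term.name or word in term.synonyms:
            return term
    return None


terms = [Term(1, "chairperson", ["chair", "chairman", "chairwoman", "president"])]
```

Any of the listed words resolves to the same term, for example `find_term(terms, "president")` returns the "chairperson" term, while the synonyms themselves carry no term ID of their own.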

Sounds to me like Drupal has always had "symmetric synonymity with synset representative selection"?

benjamin, Agaric Design Collective

David Lesieur

Some features might be missing (for example, "take this [existing] term and make it a synonym of that other term"), but yes, the data structure already fits that quote. ;)