Task title: Integrate the existing taxonomy 'synonyms' support with Drupal search and freetagging.

Task description: Drupal currently provides an interface for term synonyms, but those synonyms are not available or used in the system anywhere.
This task is in two parts, both small and highly related.

1: Synonyms would be a very useful feature in the search.module.
Add taxonomy term synonyms to the search index in taxonomy_node_update_index().

This would mean that a search for a synonym would return the nodes tagged with the main (canonical) term, even though the synonym itself may not appear anywhere in the node content.
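For part 1, here is a minimal sketch of the idea. It is written in Python rather than Drupal PHP. When a node's text is prepared for the search index, the names and synonyms of the terms it is tagged with are appended, so a search for "VA" finds a node tagged "Virginia". `Term`, `node_index_text()` and `search()` are hypothetical stand-ins, not Drupal APIs, and the search is only a naive whole-word match.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical term record: a canonical name plus its synonyms."""
    name: str
    synonyms: list[str] = field(default_factory=list)


def node_index_text(body: str, terms: list[Term]) -> str:
    """Return the text to index for a node: its body, then the name
    and every synonym of each term the node is tagged with."""
    parts = [body]
    for term in terms:
        parts.append(term.name)
        parts.extend(term.synonyms)
    return " ".join(parts)


def search(index: dict[int, str], query: str) -> list[int]:
    """Naive whole-word, case-insensitive search over prepared index text."""
    q = query.lower()
    return sorted(nid for nid, text in index.items() if q in text.lower().split())


virginia = Term("Virginia", ["VA"])
index = {
    1: node_index_text("Trip report from Richmond.", [virginia]),
    2: node_index_text("Nothing about states here.", []),
}
print(search(index, "VA"))  # [1]
```

Node 1 never mentions "VA" in its body. It is found only because the synonym was indexed along with its tag.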

2: When using the 'freetagging' feature of taxonomy, entering a new word or phrase creates a new term entry, even if that word or phrase is a valid synonym for an existing term.
This creates bogus entries and duplication.
The behaviour should be to (optionally) 'collapse' and re-phrase the user-submitted synonym into the canonical term.
2.1: Create an interface setting to enable/disable this behaviour
2.2: When a word that is detected as a viable synonym is submitted in 'freetagging' input, convert that word into its base term.
2.3: Enhance the current AJAX autocomplete behaviour to offer the base term as a suggestion when a synonym is detected.

Task 2.3 may be a little harder than it looks, because the autocomplete term lookup currently fires on all partial matches within words. In a large vocabulary containing many synonyms, this could result in unintuitive false positives. But investigate.
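To illustrate the concern in 2.3, here is a rough Python sketch. It uses hypothetical function names and a made-up four-state vocabulary. It compares matching anywhere inside a word with matching only at the start of a word, with synonyms included in the lookup. Both functions return the canonical term, which is the suggestion 2.3 wants to offer.

```python
def substring_matches(vocab: dict[str, list[str]], typed: str) -> list[str]:
    """Canonical terms whose name or any synonym contains `typed` anywhere
    (roughly the 'all partial matches within words' behaviour)."""
    t = typed.lower()
    return [term for term, syns in vocab.items()
            if any(t in s.lower() for s in [term, *syns])]


def word_prefix_matches(vocab: dict[str, list[str]], typed: str) -> list[str]:
    """Canonical terms where some word of the name or of a synonym
    starts with `typed`."""
    t = typed.lower()

    def hit(s: str) -> bool:
        return any(w.startswith(t) for w in s.lower().split())

    return [term for term, syns in vocab.items()
            if any(hit(s) for s in [term, *syns])]


vocab = {"Virginia": ["VA"], "Nevada": ["NV"],
         "Pennsylvania": ["PA"], "Vermont": ["VT"]}
print(substring_matches(vocab, "va"))    # ['Virginia', 'Nevada', 'Pennsylvania']
print(word_prefix_matches(vocab, "va"))  # ['Virginia']
```

With substring matching, typing "va" also pulls in Nevada and Pennsylvania. Anchoring matches to word starts returns only Virginia, which it reaches through its synonym "VA". Whether a word-start rule suits every vocabulary is one of the things to investigate.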

Example: in a taxonomy of US states, a term "Virginia" may have a synonym "VA".
A user should be able to type "VA" in a taxonomy autocomplete field, submit the node, and have it matched up with "Virginia" in the database.
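Applied to the Virginia example, parts 2.1 and 2.2 could look roughly like the following Python sketch. It assumes a per-vocabulary on/off flag, named `collapse_synonyms` here as a hypothetical setting. It consults only synonyms from the same vocabulary and matches case-insensitively, which is also an assumption. None of these names are Drupal APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """Hypothetical vocabulary record."""
    name: str
    collapse_synonyms: bool = False  # hypothetical per-vocabulary setting (2.1)
    # canonical term name -> list of synonyms
    terms: dict[str, list[str]] = field(default_factory=dict)


def resolve_freetag(vocab: Vocabulary, typed: str) -> tuple[str, bool]:
    """Map one submitted freetag to a term name in `vocab` (2.2).

    Returns (name, is_new). Only synonyms from this same vocabulary are
    considered, and only when the vocabulary has collapsing enabled."""
    key = typed.strip().lower()
    # An existing term name always wins (case-insensitive: an assumption).
    for name in vocab.terms:
        if name.lower() == key:
            return name, False
    # Otherwise collapse a known synonym into its canonical term.
    if vocab.collapse_synonyms:
        for name, synonyms in vocab.terms.items():
            if key in (s.lower() for s in synonyms):
                return name, False
    # No match: behave as freetagging does now and create a new term.
    return typed.strip(), True


states = Vocabulary("US states", collapse_synonyms=True,
                    terms={"Virginia": ["VA"]})
print(resolve_freetag(states, "VA"))       # ('Virginia', False)
print(resolve_freetag(states, "Vermont"))  # ('Vermont', True)
```

With the flag off, "VA" falls through and becomes a new term, which is today's behaviour.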

Deliverables would include a patch against
1: taxonomy.node taxonomy_node_update_index()
2: taxonomy.node taxonomy_autocomplete() (maybe more like autocomplete.js, but that should be avoided)

This will involve skills in search engine behaviour, meaningful metadata (taxonomy) management, and, for fun, AJAX!

Resources: http://drupal.org/node/200751

Primary contact: dman

Comments

dman

Whoops, I've been pointed at synonyms.module, which may already take care of suggestion 1.
I still think it's just 3 lines of code in core ... but maybe we can cross that out.
I'm pretty sure the AJAX task is interesting tho.
A much earlier, more ambitious (but related) patch proposal appears to have been bumped out to D7 (!)
Sorry, am I not supposed to be suggesting anything that would touch the D6 code freeze? I didn't see that in the submission guidelines.

cwgordon7

Status: Active » Needs work

My feedback:

1) As stated in #1, the synonyms module already exists. In order for this to be considered, the description must be rewritten to reflect that.

Further review is possible once my first point is addressed; then you can set this back to patch (code needs review).

-cwgordon7

dman

Status: Needs work » Needs review

Cool. Just like this then?
-----------------------
Task title: Integrate the existing taxonomy 'synonyms' support with Drupal freetagging.

Task description: Drupal currently provides an interface for term synonyms, but those synonyms are not available or used in the system anywhere.

When using the 'freetagging' feature of taxonomy, entering a new word or phrase creates a new term entry, even if that word or phrase is a valid synonym for an existing term.
This creates bogus entries and duplication.

The behavior should be to (optionally) 'collapse' and re-phrase the user-submitted synonym into the canonic term.
1: Create an interface setting to enable/disable this behavior on a per-vocabulary basis
2: When a word that is detected as a viable synonym is submitted in 'freetagging' input, convert that word into its base term.
3: Enhance the current AJAX autocomplete behavior to offer the base term as a suggestion when a synonym is detected.

Task 3 may be a little harder than it looks, as the autocomplete term lookup currently fires on all partial matches within words. In a large vocabulary containing many synonyms this could result in unintuitive false positives. But investigate.

Example
In a taxonomy of US states, a term "Virginia" may have a synonym "VA".
User should be able to type "VA" in a taxonomy autocomplete field and submit the node and have it matched up with "Virginia" in the database.

Deliverables would include a patch against
taxonomy.module [DRUPAL-6] taxonomy_autocomplete() (maybe other places, like autocomplete.js, but that should be avoided)

This will involve skills in meaningful metadata (taxonomy) management, and, for fun, AJAX!

Resources: http://drupal.org/node/200751

Primary contact: dman
---------
(small edit to correct renumbering from OP)

webchick

Title: Integrate the existing taxonomy 'synonyms' support with Drupal search and freetagging. » Integrate the existing taxonomy 'synonyms' support with Drupal freetagging.
Status: Needs work » Needs review

#1: Awesome task write-ups, dman! Thanks so much!
#2: This one is way over my head. ;)
#3: But, could you just check the http://drupal.org/project/taxonomy_manager project and confirm that it does not do what you're proposing in this task?

dman

Yup, I just checked the taxonomy_manager code (interesting). It has the ability to merge terms into the synonym list ... but as ever, never pulls them out again.

aclight

Status: Needs review » Needs work

dman--this is an interesting idea but I can see this opening up a whole can of worms.

It just seems to me like there will be a lot of little "gotchas" that pop up when working on this task and will make it hard to finish in a week.

Are you willing to be a mentor on the task? If so, and you still think it's feasible, here are specific comments about your proposal from #3:
1. Should the user interface/setting to enable/disable this behavior be per vocabulary or global?
2. Please specify that a term entered as a free tag would only be collapsed if a synonym existed for a term within the same vocabulary. I think this is what you are implying, but at first I didn't think of that and got confused about the many ways in which this functionality could be messy.
3. In deliverables, don't you mean taxonomy.module, not taxonomy.node?
4. In deliverables, specify Drupal 6 HEAD as target for patch.

If you're willing to mentor this, and you make the corrections/clarifications above, then I'd say go for this!

dman

Mentor yes, because I think I can grok what needs to be done, plus many of the caveats.

I imagined the interface as just a per-vocab setting, as whether it's appropriate depends on the size and purpose of the vocab itself, less on user preferences. Good clarification.

Yup, that's a typo.

Is there even a choice between Drupal 5 and Drupal 6 tasks? Cool. I don't think a fix to this particular section of the code would be incompatible with patches for either, but if 6 is the target today, that's totally what I expected.

I'll just edit the above edit directly - it's easier...

aclight

All core patches should be for D6 HEAD, but the students sometimes don't know that, so it's good to make it explicit.

dman

Status: Needs review » Needs work

Edit done (heh, x-post). I know there are potential gotchas, which I alluded to, but trying to second-guess what they would be would have made the brief longer than the one page I thought was appropriate.

Leave it as an on/off setting per vocabulary, so an administrator can switch it off if the functionality ceases to be practical. Predictive text can only go so far.
Maybe the match criteria should be tighter before returning synonyms. Maybe we need a threshold or weighting on terms vs synonyms.
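One possible shape for that weighting is the following Python sketch. Matches on a term name outrank matches on a synonym, and synonyms are ignored until the user has typed at least `min_synonym_len` characters. The scores, the threshold, the function name and the small place vocabulary are all made up for illustration.

```python
def rank_suggestions(vocab: dict[str, list[str]], typed: str,
                     min_synonym_len: int = 2, limit: int = 10) -> list[str]:
    """Rank canonical terms for an autocomplete box.

    Hypothetical scores: exact term 4, term prefix 3, exact synonym 2,
    synonym prefix 1. Synonyms are only consulted once `typed` has at
    least `min_synonym_len` characters."""
    t = typed.lower()
    scored = []
    for term, synonyms in vocab.items():
        name = term.lower()
        score = 0
        if name == t:
            score = 4
        elif name.startswith(t):
            score = 3
        elif len(t) >= min_synonym_len:
            syns = [s.lower() for s in synonyms]
            if t in syns:
                score = 2
            elif any(s.startswith(t) for s in syns):
                score = 1
        if score:
            scored.append((-score, term))  # negate so higher scores sort first
    scored.sort()  # by score, then alphabetically for ties
    return [term for _, term in scored[:limit]]


vocab = {"Virginia": ["VA"], "Vermont": ["VT"],
         "Vancouver": [], "West Virginia": ["WV"]}
print(rank_suggestions(vocab, "v"))   # ['Vancouver', 'Vermont', 'Virginia']
print(rank_suggestions(vocab, "va"))  # ['Vancouver', 'Virginia']
```

A one-letter "v" lists only terms whose own names match. Typing "va" adds Virginia through its synonym but ranks it below a real term that starts with "va".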

I say make the functionality inclusive and get it to WORK to begin with, then run some test vocabularies through it (I have some spares handy) and see what we can do about inaccuracies or problems.
With a blog-sized taxonomy with 30 terms * 3 synonyms max, I don't think it'll be any trouble.
With a world-sized taxonomy with Countries * districts * towns * local spellings ... this may break.
... but I think it's a practical function of taxonomy management.

Hm, does that mean I need to get taxonomy_xml ported to 6? Import/export of large sample vocabs will be a time-consuming slog otherwise. And it's only with a large collection of terms that we will start to see issues.

... but here I am again, solving problems before they even start ... Not my job!

aclight

Status: Needs work » Reviewed & tested by the community

This looks good, with one minor correction. You specified DRUPAL-6, which we know means HEAD, but I'm finding that many of the students new to Drupal don't know that. So I'd go with Drupal 6 [HEAD] or something like that.

Otherwise, go ahead and add this to the google task tracker, create a d.o issue for this with a link to the google task tracker, and then you can mark this issue closed since we shouldn't need it any more.

Thanks!

dman

Golly, getting this process rolling is feeling like it will take more effort than just doing the job myself. Still, let's see: step 2 is apparently to add myself (themelonman@gmail.com) to the project ... but there's no 'subscribe' ... I require a sponsor from here before I can volunteer to be a mentor? What fun... Who feels like adding me?

aclight

@dman: I would love to add you (and just did) to the Google Drupal GHOP project. You should now be able to add your task and set all of the tags as well as the status. Feel free to follow up here if you have more problems (or find me on IRC, nick=aclight).

Sorry it's such a hassle to get a task created. It's a lot of work for everyone to keep track of things on both Google and Drupal.org, but the feedback from the community that we get from having actual issues on d.o is great and something we wouldn't get if we just tracked the issues completely on google.

Thanks!

dman

Status: Reviewed & tested by the community » Closed (fixed)

I was going to add :

---------------
Hints:
Several rich taxonomies may have to be built or imported for this function to be tested fully. Useful taxonomy examples would be "geographic" (e.g. states or countries), "academic" (e.g. university departments, disciplines and subjects) or "scientific" (e.g. names within a class of flora or fauna). Anything that is rich in synonyms and intuitive to review will do, rather than an ad-hoc sitemap sort of taxonomy.

Resources:
Taxonomy import tools:
http://drupal.org/project/taxonomy_xml
http://drupal.org/project/taxonomy_csv
http://drupal.org/project/taxonomy_manager
---------------

...But then I realized that would be more confusing than not, as we won't be reviewing someone's working setup, but how the code works on a local taxonomy...

ANYWAY. I think I've crossed all the t's and dotted the i's. I had to keep switching between 5 windows just to do everything everywhere (including reading the instructions several times).
New d.o. issue is http://drupal.org/node/201269#comment-660371
Task on Google is posted as #113

... and so this discussion gets put to bed...

dman

PS. Thanks for the "add" ;-}

elyobo

An option to automatically assume that the plural of something is a synonym might also be handy... I find that, with free tagging enabled, it's common to have two terms, one plural, one not, when I'd prefer to just have the one. Hardly essential though.

dman

No argument; however, without a full language parser there's no dictionary of plurals to do that right. Chopping any 's' off the end is a pretty rough hack.
So the point of synonyms is to put words like that against their parent. In fact, that's one of the most useful applications of synonyms.
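Here is a quick Python illustration of why chopping the trailing 's' is a rough hack, shown next to an explicit synonym list. The words and the `SYNONYMS` table are made-up examples, and `naive_singular()` and `canonical()` are hypothetical helpers.

```python
def naive_singular(word: str) -> str:
    """The rough hack: chop a trailing 's' off any word."""
    return word[:-1] if word.endswith("s") else word


# Explicit synonyms, curated by whoever manages the vocabulary (made up).
SYNONYMS = {"bus": ["buses"], "child": ["children"], "news": []}


def canonical(word: str) -> str:
    """Return the term a word is listed under, or the word itself."""
    w = word.lower()
    for term, syns in SYNONYMS.items():
        if w == term or w in syns:
            return term
    return w


for w in ["tags", "news", "buses", "children"]:
    print(f"{w}: naive={naive_singular(w)!r} synonyms={canonical(w)!r}")
# tags: naive='tag' synonyms='tags'
# news: naive='new' synonyms='news'
# buses: naive='buse' synonyms='bus'
# children: naive='children' synonyms='child'
```

The hack mangles "news" and "buses" and misses "children". The synonym table gets all three right, but it only covers words someone actually listed, so "tags" is left alone.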

PS. Long closed issue. Follow the progress on this initiative elsewhere.