Right now, the "Simple Extraction" library that Extractor ships with does not support synonyms. When it looks for a geo term like "New York City", only the exact string "New York City" will be found. There is no way to specify "NYC", "N. Y. C.", "Gotham" or "Nueva York" as synonyms for the same place.

Simple synonym support could also go a long way toward making Simple Extractor work better for hyperlocal use cases. Think of geotagging intersections, coffee houses, parks, neighborhoods, wards, or police stations in your city.

Battle plan for getting synonym support for Simple Extractor

1 Use term_synonym

We should simply use taxonomy's term_synonym table, as it has the exact schema we are looking for and already has a UI on the taxonomy term edit pages.

The only modification needed here is to point _extractor_simple_lookup() to term_synonym instead of term_data.
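To illustrate the idea, here is a rough Python sketch of a lookup that matches synonyms and resolves them to the same term ID. It is not the actual _extractor_simple_lookup() code. The two-table layout (term_data with tid and name, term_synonym with tsid, tid and name) is an assumption about the taxonomy schema, and build_demo_db() and lookup_terms() are hypothetical names. The sketch reads both tables so the canonical name still matches; whether the canonical name would also be stored as a synonym is left open above.

```python
import re
import sqlite3


def build_demo_db():
    """Create an in-memory database with an assumed, simplified taxonomy schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE term_data (tid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE term_synonym (tsid INTEGER PRIMARY KEY, tid INTEGER, name TEXT);
        INSERT INTO term_data (tid, name) VALUES (1, 'New York City');
        INSERT INTO term_synonym (tid, name)
            VALUES (1, 'NYC'), (1, 'Gotham'), (1, 'Nueva York');
    """)
    return db


def lookup_terms(db, text):
    """Return the set of term IDs whose name or synonym occurs in text."""
    rows = db.execute(
        "SELECT tid, name FROM term_data "
        "UNION SELECT tid, name FROM term_synonym"
    ).fetchall()
    found = set()
    for tid, name in rows:
        # Match whole words only, ignoring case.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(tid)
    return found
```

With this, lookup_terms(build_demo_db(), "Cheap flights to NYC") resolves to the same term ID as a text that mentions "New York City".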

2 Update import schema

Synonyms should be populated by a Feeds import. This will require the following changes to Feeds:

and a slight modification to our CSV format and the importer configuration to support synonyms.
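The CSV change is not specified here, so the following is only one possible shape, sketched in Python: an extra "synonyms" column holding a pipe-delimited list. The column names, the "|" delimiter and parse_terms() are all assumptions made for illustration, not the actual importer format.

```python
import csv
import io

# Hypothetical CSV layout: one row per term, synonyms separated by "|".
SAMPLE_CSV = """name,synonyms
New York City,NYC|N. Y. C.|Gotham|Nueva York
Central Park,
"""


def parse_terms(csv_text, delimiter="|"):
    """Map each term name to its list of synonyms (empty if none given)."""
    terms = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("synonyms") or ""
        synonyms = [s.strip() for s in raw.split(delimiter) if s.strip()]
        terms[row["name"].strip()] = synonyms
    return terms
```

Each synonym parsed this way would then become one row in term_synonym for the imported term.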

Performance considerations

Currently, Simple Extractor supports lookups against up to 2000 terms. Synonym support could quickly push us over this limit. We may be able to raise it (maybe 4000? maybe more?), but at some point we will have to consider different solutions. Ideas would be indexing on the first five characters (looking terms up against an index of their first five characters) or stemming. For a first iteration, I assume we can get away with the limitations of the current system.
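As a rough illustration of the first-five-characters idea, here is a minimal Python sketch. It is not how Simple Extractor works today, and build_prefix_index() and find_terms() are hypothetical names. Terms are grouped by their lowercased first five characters, so each word start in the text is compared only against the few terms sharing that prefix rather than against the full term list.

```python
from collections import defaultdict

PREFIX_LEN = 5


def build_prefix_index(terms):
    """Group terms by their lowercased first PREFIX_LEN characters."""
    index = defaultdict(list)
    for term in terms:
        index[term.lower()[:PREFIX_LEN]].append(term)
    return index


def find_terms(index, text):
    """Return the set of indexed terms that occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for start in range(len(lowered)):
        # Only start matching at the beginning of a word.
        if start > 0 and lowered[start - 1].isalnum():
            continue
        # Terms shorter than PREFIX_LEN are keyed by their full length,
        # so try every key length from 1 up to PREFIX_LEN.
        for n in range(1, PREFIX_LEN + 1):
            for term in index.get(lowered[start:start + n], ()):
                end = start + len(term)
                if not lowered.startswith(term.lower(), start):
                    continue
                # Require a word boundary after the match.
                if end == len(lowered) or not lowered[end].isalnum():
                    found.add(term)
    return found
```

The cost per text position is then a handful of dictionary lookups plus a comparison against the matching bucket, so adding synonyms grows the index but not the per-position scan over all terms.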

Comments

tallsimon

this would be an excellent feature, are you working on this? would need this to usefully use feeds on my sites!

STINGER_LP

Any progress on this?