When working with a Solr stemmer like SnowballPorterFilterFactory, it's a common practice to index both the stemmed and non-stemmed versions of the field. This gives two advantages:
- It ranks words that match the text exactly as written higher than near misses
- It avoids awkward cases where the stemmer reduces a word to a stem that then fails to match the original word (e.g. in English I've found that with SnowballPorterFilterFactory, searches on 'unpublished' and 'unravelling' don't match content containing those words, but searches on the stems, 'unpublish' and 'unravel', do match)
The most common method seems to be to use a <copyField> in the Solr schema.
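For readers unfamiliar with that approach, here is a rough sketch of what such a schema.xml fragment could look like. It is not taken from a working setup. The type name text_unstemmed and the destination field title_unstemmed are hypothetical, and the source name tm_title assumes the "tm_" prefix that Search API Solr typically gives fulltext fields. Since copyField copies the raw input value before any analysis runs, the destination field can apply its own, stemmer-free analysis chain.

```xml
<!-- Hypothetical field type: tokenizes and lowercases, but does no stemming. -->
<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical destination field holding the unstemmed copy. -->
<field name="title_unstemmed" type="text_unstemmed" indexed="true" stored="false" multiValued="true"/>

<!-- Copy the raw value of the stemmed field (assumed name) into the unstemmed one. -->
<copyField source="tm_title" dest="title_unstemmed"/>
```

This only covers the indexing side. Whether queries actually search the copied field depends on how the client builds them.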
This is a problem when using Drupal Search API Solr, since fields are defined dynamically, not hard-coded in the schema.
Imagine a fairly ordinary Solr Search API server indexing nodes, processing as fulltext the fields Title, Body, Teaser, and one custom field named Notes, with SnowballPorterFilterFactory enabled on all fulltext fields.
What would be the most robust, Search API-friendly approach to indexing both the stemmed and unstemmed versions of these fields?
(Question also posted on Drupal Answers with no reply)
Comments
Comment #1
RAWDESK commented
Hi,
A response after 6 years of silence on this topic, but I thought it would be useful to share my use case and my attempts to get phonetic search working in a similar way to the one described above (using copyField in schema.xml).
Here's what I added manually inside schema.xml:
Define fields dedicated to the phonetic fieldType
Copy values from the fields indexed by Search API Solr
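The original snippets did not survive here, so what follows is only a hedged sketch of what those two steps might look like, not the actual configuration used. The destination names cast_name and crew_name are borrowed from the hook code further down. The source names tm_field_cast and tm_field_crew are hypothetical, as are the choice of tokenizer and the BeiderMorseFilterFactory attribute values. The type declares both an index and a query analyzer, which the note below says this filter needs.

```xml
<!-- Phonetic field type with explicit index-time and query-time analyzers. -->
<fieldType name="phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>

<!-- Fields dedicated to the phonetic type. -->
<field name="cast_name" type="phonetic" indexed="true" stored="false" multiValued="true"/>
<field name="crew_name" type="phonetic" indexed="true" stored="false" multiValued="true"/>

<!-- Copy values from the fields indexed by Search API Solr (source names are assumptions). -->
<copyField source="tm_field_cast" dest="cast_name"/>
<copyField source="tm_field_crew" dest="crew_name"/>
```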
Unfortunately, after re-indexing the Solr instance, searches against the phonetic field values do not return any results.
So my first thought was that the Drupal View responsible for executing the Solr search query is not aware of the "copied fields" inside the altered schema.xml.
An attempt to make the query also pick up the copied fields, using the hook_search_api_views_query_alter() implementation below, failed as well.
/**
 * Implements hook_search_api_views_query_alter().
 */
function my_module_search_api_views_query_alter(&$view, SearchApiViewsQuery &$query) {
  // Add the copied schema.xml fields to the fulltext filter's list of searched fields.
  $view->filter['search_api_views_fulltext']->options['fields']['cast_name'] = 'cast_name';
  $view->filter['search_api_views_fulltext']->options['fields']['crew_name'] = 'crew_name';
}
Note: the BeiderMorseFilterFactory phonetic Solr filter requires both an index analyzer and a query analyzer to be configured in schema.xml in order to work correctly.
See pages 68 and 69 in this e-book:
https://books.google.be/books?id=u6GrCQAAQBAJ&pg=PA68&lpg=PA68&dq=Beider...
So my question is:
Is there a way to make Search API Solr aware of the existence of the copied fields inside schema.xml?
Comment #2
drunken monkey
Thanks a lot for posting this, might always help others looking for information!
See the handbook. You probably just want to change the type of the fields in Solr, or use hook_search_api_solr_field_mapping_alter() to change the fields’ mapping to one with the proper prefix.