Quick overview of my use case:

  • The site has both language-specific (French or English) and language-neutral content.
  • Language-specific content must only appear in results when its language is the same as the current language.
  • Language-neutral content must be displayed in results regardless of the current language.
  • The actual language of language-neutral content can be either of the two languages. We cannot assume it will be all French or all English.

And some thoughts:

  • CLIR is of no use in this case, because English content must not appear if the current language is French, and vice versa.
  • Apache Solr Multilingual's mapping options are inadequate in this case. If a mapping language is selected, fulltext fields of language-neutral documents are only indexed (and searchable) for that language. If no mapping is applied at all, fulltext fields are not indexed specifically for any of the site's languages.

So I'm experimenting with the following solution:

  • Added a new mapping option: "All languages". With that option, language-neutral documents get enriched with fulltext fields for each of the languages (for example, a document gets indexed with both i18n_content_fr and i18n_content_en).
  • Search pages are configured with the "Limit search to current language by default" and "Show language-neutral/undefined results by default" options. Queries will only search fields matching the current language, so having duplicate fields for other languages has no impact on search results.
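
The idea behind the two points above can be sketched roughly as follows. This is a Python illustration only, not the module's PHP code: the function names `map_to_all_languages` and `fulltext_field_for`, the dict-based document, and the use of "und" as the language-neutral code are assumptions made for the example. Only the field names `content`, `i18n_content_fr` and `i18n_content_en` come from the description.

```python
LANGUAGE_NEUTRAL = "und"  # assumption: code used for language-neutral content


def map_to_all_languages(document, fulltext_fields, languages):
    """Return a copy of `document` with per-language copies of its fulltext fields.

    Only language-neutral documents are enriched. Language-specific
    documents are left untouched here to keep the sketch short.
    """
    enriched = dict(document)
    if document.get("language") != LANGUAGE_NEUTRAL:
        return enriched
    for field in fulltext_fields:
        if field not in document:
            continue
        for lang in languages:
            # e.g. "content" becomes "i18n_content_fr" and "i18n_content_en"
            enriched[f"i18n_{field}_{lang}"] = document[field]
    return enriched


def fulltext_field_for(field, current_language):
    """Name of the fulltext field a query would search for the current language."""
    return f"i18n_{field}_{current_language}"


doc = {"language": LANGUAGE_NEUTRAL, "content": "Bonjour tout le monde"}
indexed = map_to_all_languages(doc, ["content"], ["fr", "en"])
searched = fulltext_field_for("content", "fr")
```

With this mapping, a search run in French looks at `i18n_content_fr`, which now exists on the language-neutral document. The `i18n_content_en` copy is simply never consulted by that query.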

The attached patch is an attempt at this solution. Opinions on this approach?

Of course, users searching in French will get good search results with language-neutral documents that contain French text (because that text will have gone through a French index analyzer), and so-so results with English text (because it will have gone through the same French analyzer). However, I guess it is not too much of a surprise to have to switch the current language to English for optimal English text search, and to switch it to French for optimal French text search.


Comments

mkalkbrenner

That's a valid feature request. I'll review your patch ...

mvc

FileSize
6.75 KB

Same patch, re-rolled against tag 7.x-1.0-rc1, in -p1 format for drush make.

mkalkbrenner

Status: Needs review » Needs work

The patch already looks good for "common fields". But you missed the callbacks for node, user, entity and term references.

In general, though, I think the original requirements stated in this issue can already be achieved without a patch:

  • Site has language-specific (French or English), and language-neutral content.
  • Language-specific content must only appear in results when its language is the same as the current language.
  • Language-neutral content must be displayed in results regardless of the current language.

Enabling "Limit search to current language by default" and "Show language-neutral/undefined results by default" of the Multilingual Query Settings on the corresponding Search Page Settings should do the job. If not, we have to look for a bug.
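
As a rough illustration of what these two options amount to at query time, here is a small Python sketch that builds a Solr-style filter query. It is not the module's actual code: the function name `language_filter` is hypothetical, and both the language field name `ss_language` and the language-neutral code "und" are assumptions made for the example.

```python
LANGUAGE_NEUTRAL = "und"  # assumption: code used for language-neutral content
LANGUAGE_FIELD = "ss_language"  # assumption: name of the indexed language field


def language_filter(current_language, include_neutral=True):
    """Build a filter query limiting results to the allowed languages.

    `include_neutral` stands in for the "Show language-neutral/undefined
    results by default" option.
    """
    allowed = [current_language]
    if include_neutral:
        allowed.append(LANGUAGE_NEUTRAL)
    return LANGUAGE_FIELD + ":(" + " OR ".join(allowed) + ")"


french_filter = language_filter("fr")
strict_filter = language_filter("en", include_neutral=False)
```

Under these assumptions, a French search keeps French and language-neutral documents and drops English ones, which matches the three requirements listed above.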

David Lesieur

If I remember correctly, the patch ensures that language-neutral content gets indexed (or one could say duplicated) in language-specific fields for all of the languages. That allows the content to get through every language's analyzers (e.g. indexing in both 'i18n_content_fr' and 'i18n_content_en', instead of just 'content'). None of the current mapping options does that.

In my use case, language-neutral content is not consistently in a single default language. It can be in any of the languages enabled on the site. So with the new mapping option, better search results are obtained when the current language matches the actual language of language-neutral content.

mkalkbrenner

"In my use case, language-neutral content is not consistently in a single default language. It can be in any of the languages enabled on the site. So with the new mapping option, better search results are obtained when the current language matches the actual language of language-neutral content."

OK, I got you. For this use case, additionally searching through the language-neutral fields at query time is not sufficient. (But be aware that this approach increases the number of false positives as well.)

"the patch ensures that language-neutral content gets indexed (or one could say duplicated) in language-specific fields for all of the languages."

No, only for fields that are not references. Term references need to be handled because terms might be translated. The other references are simply missing from the "duplicates". This might cause various errors during searches. Therefore the patch needs work.

mkalkbrenner

Status: Needs work » Needs review
FileSize
4.57 KB

I modified (and simplified) the code to handle the callbacks as well.
Can you please test my patch against the latest dev?

mkalkbrenner

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

  • Commit 93a20e9 on 7.x-1.x, 6.x-3.x by mkalkbrenner:
    [#2066405] David Lesieur, mvc, mkalkbrenner: Add an option to map...