The Apache Solr Search Integration module has these flaws with internationalization:

Stop Words
Stemming
Protected Words
Compound Word Splitting
Spell Checking

All addressed by the apachesolr_multilingual module.

What is the status of these points in Search API Solr search?

Comments

drunken monkey’s picture

The status is that there aren't any language-specific settings in the schema. Spell checking is language-independent (if you want to use multiple languages with a single index, I don't really know how to do that, though) and if you want any of the others, you'll have to configure them locally.
You can also set up several Solr servers with configurations for different languages, and put indexes on them indexing only items of a certain language. (Although you'd need a few lines of code for that last step, or wait a few weeks until this gets added.)
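The "few lines of code" for restricting an index to a single language could look roughly like the sketch below. It assumes Search API 7.x's hook_search_api_index_items_alter(); the module name mymodule, the index machine name german_index, and the reliance on a language property on each item are placeholders for illustration and are not taken from this thread.

```php
<?php
/**
 * Implements hook_search_api_index_items_alter().
 *
 * Sketch only: "mymodule" and "german_index" are hypothetical names.
 */
function mymodule_search_api_index_items_alter(array &$items, SearchApiIndex $index) {
  // Only touch the index that lives on the German Solr server.
  if ($index->machine_name != 'german_index') {
    return;
  }
  foreach ($items as $id => $item) {
    // Assumes items are entities (e.g. nodes) exposing a "language" property.
    if (!isset($item->language) || $item->language != 'de') {
      // Items removed here are skipped for this index.
      unset($items[$id]);
    }
  }
}
```

A second index on another server would get the same check with its own language code.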

Fidelix’s picture

I'll wait a few weeks.

Is Internationalization being considered in this module's and Search API's Roadmap?
I'm in the process of planning the architecture of a big project, which will be multilingual, and I'm trying to figure out if I can rely on this project for the future.

Everything seems awesome so far, but multilingual search is very important for this project.

Thank you.

drunken monkey’s picture

In principle, yes, internationalization is considered important, and I also tried to keep it in mind when designing the basic architecture.
However, in Drupal 7 this became even harder to do than before, so I can't really make any promises regarding which use cases will be supported. Generally, I hope the Search API is flexible enough to allow all necessary customizations at least locally, though.

mac_weber’s picture

As you said in [#1] the fastest way to get it working is to use solr multi-core with one language per core. I'd be interested in testing it.
Also, is this module considering entity_translation? I'm using both i18n and ET.

drunken monkey’s picture

Also, is this module considering entity_translation? I'm using both i18n and ET.

No, at the moment it is not. This would be something to be dealt with in the Search API itself, and I'm really unsure of how to best tackle this. (That's one of the things I meant with Drupal 7 being an even harder environment for i18n in searches.) Currently, I think the current site language is used for retrieving the field data, which might actually be a bit random in certain cases.

marcoka’s picture

I am interested in the "workflow, how to do it" for multilingual searches too. For example, translating the page and then, when you switch the language (language switcher), it searches in that language (like a user would expect).

Anonymous’s picture

@e-anima if you use a Search API view, can you choose the language=current language as a pre-set filter? Or you can hook into a query_alter() kind of function, and add the language filter hardcoded in a hook_search_api_query_alter().
(Assuming your entities have a valid language property/field.)

marcoka’s picture

Choose the language with a Views exposed field? Hm. The best usability would be to choose the language with the normal language switcher block provided by internationalization/core.
I am thinking about that as it seems very important.

Anonymous’s picture

Hi, why an exposed field? Why not just an unexposed language = current language filter?

marcoka’s picture

morningtime, good idea, I will try that.

I played around a little and found one possibility to set the language by code, using a query alter hook. This may be a bad solution, I do not know so far. Digging deeper and testing.

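function search_api_tests_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Append a Solr filter query restricting results to German items.
  // Appending with [] avoids overwriting a filter already stored at index 1.
  $call_args['params']['fq'][] = 'ss_search_api_language:"de"';
  // Debug output only; dsm() requires the Devel module.
  dsm($call_args);
}

drunken monkey’s picture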

Doing this with hook_search_api_query_alter() instead of the Solr-specific variant would be both simpler and cleaner. But in the end, morningtime's solution should work equally well without any custom code whatsoever.
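A minimal sketch of the backend-independent variant, assuming Search API 7.x and a hypothetical module named mymodule. It filters on the search_api_language field (the field behind the ss_search_api_language Solr field used above) and takes the value from Drupal 7's global $language, so the results follow the language switcher.

```php
<?php
/**
 * Implements hook_search_api_query_alter().
 *
 * Sketch only: "mymodule" is a hypothetical module name.
 */
function mymodule_search_api_query_alter(SearchApiQueryInterface $query) {
  // Drupal 7 keeps the language of the current page request in this global.
  global $language;
  // "search_api_language" is the language field Search API adds to each item.
  $query->condition('search_api_language', $language->language);
}
```

Unlike the Solr-specific hook, this does not depend on how the backend names its fields, which is why it also works with other Search API backends.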

blackice2999’s picture

Hi @all and merry christmas ;)

All the multi-language approaches I have seen use the same fields and either select/search by a language field value or use different indexes. This works at first, but if you want to use Apache Solr as the backend we run into another problem: you can't choose the tokenizer based on the language. This could be a problem if you want to use the SnowballFilter or PorterStemmer filter with German and English.

So I think it's necessary that we also use the language of the content as part of the document key and the field key. Specifically, in the apachesolr schema the dynamic fields were named:

<dynamicField name="t_*" type="text" termVectors="true" />

but each of them can only use one "fieldType".

I think it's a good idea to add the language key to the dynamic field name so we can use a different fieldType based on the language.

Example:

<dynamicField name="t_*_en" type="text_en" termVectors="true" />
<dynamicField name="t_*_de" type="text_de" termVectors="true" />

Or:


danielnolde’s picture

For anyone interested in the latest effort of getting support for Entity Translation based multi-lingual content search to Search API:

Search API Entity Translation Module:

http://drupal.org/project/search_api_et

At the moment, this module introduces a new fulltext field to Search API which simply concatenates all ET translations of an entity for indexing, so a search keyword can be found in all translations of a piece of content. This works, but, of course, is very crude and blunt and somehow wrong (but works!). For finding and deciding on a better way of supporting ET in Search API, there is a discussion going on in the module's issue queue at http://drupal.org/node/1393058.

How language-specific search server settings can be achieved is a very important and interesting topic, so any discussion about this here can also bring good thoughts into the Search API Entity Translation progress.

Feel free to try and use the module and state your thoughts how to progress in the issue!

Carsten Müller’s picture

Hi,

I agree with #12. In the Apachesolr Multilingual module the languages are set in the schema.xml.
Because of the multi-language problems in D7, Apachesolr Multilingual (http://drupal.org/project/apachesolr_multilingual) has not been ported to D7 yet. But there are plans to start this soon.

The question now is: is it a good idea to port it, or to help Search API support multilingual fields, stemming for different languages (plurals in German are formed differently from plurals in English), different stop words for each language and so on?
I think one really good solution will be better than two separate ones ...

danielnolde’s picture

Carsten, the settings and support for language-specific stemming, stop words etc. are part of the Apache Solr configuration. The apachesolr_multilingual.module only helped you by preparing these Solr config files for you (based on the Solr config needed by apachesolr.module). You can quite easily configure stemming, stop words etc. directly via the Solr config files.

What's missing in Apache Solr, and therefore hard to come up with, is the possibility to index multiple language/translated versions of fields or of a search document within one index, and to have multiple different language-specific config settings within one index.
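To illustrate what configuring this directly in the Solr config files can look like, here is a sketch of a German text type for schema.xml. The type name text_de and the file names stopwords_de.txt and protwords_de.txt are assumptions for the example; the factory classes are standard Solr analysis components.

```xml
<!-- schema.xml fragment (sketch). The type name and the *_de.txt file names are assumptions. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stop words: drop the common German words listed in the file. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt"/>
    <!-- Stemming; words listed in the protected file are left unstemmed. -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords_de.txt"/>
  </analyzer>
</fieldType>
```

A field only gets this analysis if it is declared with type="text_de", so on a single-language server the existing text fields would have to point at this type.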

I think Blackice tries to show us a way of working around both these solr shortcomings by utilizing dynamic solr fields via search_api.
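A sketch of how such per-language dynamic fields might be declared. To my knowledge Solr only accepts the * wildcard at the very start or end of a dynamicField name, so a pattern with the wildcard in the middle would probably have to be rearranged; the t_en_* and t_de_* prefixes below are hypothetical, and the text_en and text_de field types are assumed to be defined elsewhere in the same schema.xml.

```xml
<!-- schema.xml fragment (sketch). Field name prefixes are hypothetical. -->
<!-- Each pattern maps to a fieldType with its own language-specific analyzer. -->
<dynamicField name="t_en_*" type="text_en" termVectors="true" />
<dynamicField name="t_de_*" type="text_de" termVectors="true" />
```

The indexing side would then have to write each translated value into the field carrying the matching language prefix.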

klonos’s picture

note-to-self: ...coming from #1335394: Search API integration

stefan.r’s picture

As of 9 months ago there is a 2.x branch in the Search API Entity Translation module which along with the Search API Entity Translation Solr search module addresses all of these concerns.

See also:
#1393058: Decide on strategy for language aware search
#2147489: Merge with Apache Solr Multilingual?

@drunken monkey, we can probably close this issue at this point?

drunken monkey’s picture

Status: Active » Fixed

@drunken monkey, we can probably close this issue at this point?

I guess, yes. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.