Hi!
Is there a way to index multilingual content while keeping the nodes language-neutral? My nodes are translated with field translation, and so far I haven't found a way to index multilingual content without setting each node to a specific language (the item language can be indexed)...

I'd hate to generate duplicate nodes, and I can already feel the pain if more languages are added in the future :S

Comments

drunken monkey

Title: Indexing field translated nodes » Add support for translated fields
Project: Search API Solr » Search API
Component: Code » Framework
Category: support » feature

No, there currently is no way (that I can see) to do this.
We should definitely add it at some point, but I'm rather uncertain about how to do it. I think these numerous ways of translating content give all search module developers quite a few headaches.

The easiest way to support it would probably be to let users select, on a per-index basis, the language used for accessing translated fields. You'd then have one index for each language: far from perfect, I admit, but at least usable. The Multi-index searches module would then even enable you to search all these indexes at once, or to filter per index.
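A minimal sketch of what a per-index language setting would have to do at indexing time, written in Python purely for illustration (it is not Search API code). It assumes a simplified, Drupal-7-like field structure (field name → language code → list of values, with "und" as the language-neutral key); the function name extract_for_index is hypothetical.

```python
from typing import Dict, List

# Simplified, Drupal-7-like field storage (an assumption for this sketch):
# field name -> language code -> list of values.
Entity = Dict[str, Dict[str, List[str]]]

# Drupal 7 stores language-neutral field values under "und" (LANGUAGE_NONE).
LANGUAGE_NONE = "und"


def extract_for_index(entity: Entity, index_language: str) -> Dict[str, List[str]]:
    """Return the field values a single-language index would store.

    Each field uses its values in the index's language if present,
    otherwise its language-neutral values; fields with neither are skipped.
    """
    document: Dict[str, List[str]] = {}
    for field, translations in entity.items():
        values = translations.get(index_language, translations.get(LANGUAGE_NONE))
        if values:
            document[field] = list(values)
    return document


node: Entity = {
    "title": {"und": ["Shared title"]},
    "body": {"en": ["my english content"], "de": ["mein deutscher inhalt"]},
}

# One index per language, each configured with its own language.
for lang in ("en", "de"):
    print(lang, extract_for_index(node, lang))
```

The fallback to language-neutral values is one possible design choice; an index could just as well skip fields that have no value in its language.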

Allowing users to index the same items separately for each language in the same index would be possible in theory, and probably preferable, but I don't think such a large architectural change is feasible at this point.

iMiksu

Having an index for each language crossed my mind, but then how would I index nodes as a specific language? Is there already a setting for that, or would it be a good idea to write a patch for it?

I think an "Index as language" dropdown setting at admin/config/search/search_api/index/%/edit would be the best fit. What are your thoughts?

drunken monkey

> Having an index for each language crossed my mind, but then how would I index nodes as a specific language? Is there already a setting for that, or would it be a good idea to write a patch for it?

As I said, nothing in this direction is possible right now, so we'd need a patch.

> I think an "Index as language" dropdown setting at admin/config/search/search_api/index/%/edit would be the best fit. What are your thoughts?

That's exactly what I'd have thought, too. So if you want to supply a patch for that, I'd be glad to help with and review it.

danielnolde

Currently, I see three ways (plus one quick fix) to support language-aware search for Field Translation / Entity Translation content via search_api:

0.) The dirty but charming quick fix, which at least gives you _multi_lingual searches (not precisely language-_aware_ searches):
Declare a custom fulltext search item property via hook_entity_property_info_alter, and in it simply concatenate the entity's views/renderings for each of its available translation languages. This works like a charm for searching and finding in all languages, but there _may_ be 'wrong' matches (e.g. "war" in German, meaning "was", != "war" in English), and the search excerpt/highlighting may show excerpts in unwanted languages. Another drawback is that you can't do field-specific multilingual searches (without declaring a shitload of extra search item properties). Bottom line: this works for simple, quick-and-dirty multilingual search, if you are willing to accept a small chance of cross-language word matches.
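To illustrate the quick fix and its 'wrong match' risk, here is a hypothetical Python sketch. It only models the value such a combined property would hold; in Drupal the property itself would be declared via hook_entity_property_info_alter in PHP, and the helper names here (all_languages_text, tokens) are made up for illustration. The tokenizer is deliberately naive.

```python
import re
from typing import Dict, Set


def all_languages_text(renderings: Dict[str, str]) -> str:
    """Join the rendered text of every translation into one fulltext value."""
    # Sort by language code so the combined value is deterministic.
    return "\n".join(renderings[lang] for lang in sorted(renderings))


def tokens(text: str) -> Set[str]:
    """Very naive tokenizer: lowercased runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


# Hypothetical renderings of one node, keyed by language code.
node = {"en": "Once upon a time", "de": "Es war einmal"}
combined = all_languages_text(node)

# A search for "war" finds this node through its German text ("war" = "was"),
# even though the English text never mentions a war.
print("war" in tokens(combined))    # True
print("war" in tokens(node["en"]))  # False
```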

1.) Multiple language-specific indexes (as mentioned by Thomas in #1)
=> Only small changes (= little work) in search_api; on the other hand, very inflexible and not scalable for many or changing languages, and a lot of configuration for the site builder. (Extra benefit, though: the search server could be configured per language for each index, including synonyms, inflections, compound words, stemming, etc.)

2.) Rewiring the inner workings of search_api to use more versatile search item IDs, or to support flexible search item IDs (i.e. not only an entity's ID; by the way, is 'search item' the correct search_api lingo?). Such an extended ID could include a language code (or possibly other meta info, like a revision ID).
=> A lot of work within Search API and possibly other Search API-related modules, but very clean, flexible, with little or no config overhead, and highly extensible for future needs beyond language handling [Thus my favorite, and, I think, Gábor Hojtsy's too; see http://drupal.org/node/1335394#comment-5330368 ;]
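A minimal sketch of such an extended, language-aware item ID, assuming a simple "<entity ID>/<language code>" string format. The format, the separator and the helper names are all hypothetical, not anything Search API defines.

```python
from typing import NamedTuple, Union


class ItemId(NamedTuple):
    entity_id: str
    language: str


# Hypothetical separator; language codes such as "en" or "pt-br" are not
# expected to contain it.
SEPARATOR = "/"


def encode_item_id(entity_id: Union[int, str], language: str) -> str:
    """Build a language-aware search item ID, e.g. (12, "de") -> "12/de"."""
    return f"{entity_id}{SEPARATOR}{language}"


def decode_item_id(item_id: str) -> ItemId:
    """Split a language-aware item ID back into entity ID and language."""
    # rpartition splits at the last separator, so the language is always
    # the final component.
    entity_id, sep, language = item_id.rpartition(SEPARATOR)
    if not sep or not entity_id or not language:
        raise ValueError(f"not a language-aware item ID: {item_id!r}")
    return ItemId(entity_id, language)


item_id = encode_item_id(12, "de")
print(item_id)                  # 12/de
print(decode_item_id(item_id))  # ItemId(entity_id='12', language='de')
```

Carrying other meta info, like a revision ID, would just mean adding another component to this format.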

3.) Using dynamically created custom search item properties (think hook_entity_property_info_alter) of the form key_property[1..n] => value_property[1..n]
=> A medium amount of work, all contained in a contrib module extending Search API; very little configuration if done correctly; clean, and scales with changes in a site's language settings. But is that possible (see my follow-up comment for details)?

danielnolde

This last solution could only work if Search API supports search queries of this type:

key-property[n] => value-property[n]

=> "give me all results where key=ABC and the value with the matching delta matches my '123' fulltext search pattern"

Then it would be easy to tackle flexible and scalable language-aware search in an Entity Translation architecture by defining two custom search_api properties, "langkey" and "langvalue", each with multiple values:

langkey[0] = 'en'
langkey[1] = 'de'
langkey[2] = 'es'

langvalue[0] = 'my english content'
langvalue[1] = 'mein deutscher inhalt'
langvalue[2] = 'mi contenido español'

Searching for a specific language would then be a query like:

Give me all results where property 'langkey' matches 'es' and the 'langvalue' with the corresponding index meets my fulltext search criteria.

Would this kind of dynamic, delta-linked search item property be possible with Search API?
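The difference between the query being asked for and what a backend can answer when it treats each multi-valued field as an independent set of values (an assumption here, though typical for inverted-index backends) can be sketched in Python; the document shape and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical search item with two parallel multi-valued properties.
doc: Dict[str, List[str]] = {
    "langkey": ["en", "de", "es"],
    "langvalue": ["my english content", "mein deutscher inhalt", "mi contenido español"],
}


def flat_match(doc: Dict[str, List[str]], lang: str, term: str) -> bool:
    """Independent conditions: ANY langkey equals lang and ANY langvalue
    contains term. The deltas are not linked."""
    return lang in doc["langkey"] and any(term in v.split() for v in doc["langvalue"])


def delta_linked_match(doc: Dict[str, List[str]], lang: str, term: str) -> bool:
    """The requested semantics: the langvalue at the SAME delta as the
    matching langkey must contain term."""
    return any(
        key == lang and term in value.split()
        for key, value in zip(doc["langkey"], doc["langvalue"])
    )


print(flat_match(doc, "es", "inhalt"))             # True: German word, false positive
print(delta_linked_match(doc, "es", "inhalt"))     # False
print(delta_linked_match(doc, "es", "contenido"))  # True
```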

iMiksu

Thanks @danielnolde! Your option 3 sounds like the best possible solution I can think of.

Since I haven't dived into Search API's source code at all, I can't propose any patches for this, but I'd really like to hear @drunken monkey's comments on this!

drunken monkey

Thanks for your detailed input, Daniel!

> 1.) Multiple language-specific indexes (as mentioned by Thomas in #1)
> => Only small changes (= little work) in search_api; on the other hand, very inflexible and not scalable for many or changing languages, and a lot of configuration for the site builder. (Extra benefit, though: the search server could be configured per language for each index, including synonyms, inflections, compound words, stemming, etc.)

Though not very flexible, as you say, I guess we should implement this at some point anyway, as it adds a nice amount of flexibility for little work.

> 2.) Rewiring the inner workings of search_api to use more versatile search item IDs, or to support flexible search item IDs (i.e. not only an entity's ID; by the way, is 'search item' the correct search_api lingo?). Such an extended ID could include a language code (or possibly other meta info, like a revision ID).
> => A lot of work within Search API and possibly other Search API-related modules, but very clean, flexible, with little or no config overhead, and highly extensible for future needs beyond language handling [Thus my favorite, and, I think, Gábor Hojtsy's too; see http://drupal.org/node/1335394#comment-5330368 ;]

I don't think such a large architectural change is really possible at this point without starting a 2.x branch.
However, since we introduced extensible item types, Search API no longer assumes that IDs are entity IDs or integers. So I guess everything you describe here could already be implemented in a contrib module, by adding additional language-aware item types for each entity type. You'd just have to customize the data source controller (based on SearchApiEntityDataSourceController) for the new item types a bit.
At least I think so; any remaining unforeseen obstacles could be removed when encountered, I'm pretty sure. Those would mostly be unjustified assumptions I still make in the code. (One thing you shouldn't do, though, is use the entity type names unaltered as item type names; that is bound to wreak havoc in the module.)
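A rough Python sketch of the idea of a language-aware item type: one search item per (entity, language) pair, under an item type name that differs from the entity type name. This is not the PHP SearchApiEntityDataSourceController interface; the class, its methods, the "node_translation" item type name and the in-memory storage are all hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory storage: entity ID -> field name -> language -> text.
Storage = Dict[str, Dict[str, Dict[str, str]]]

NODES: Storage = {
    "12": {"body": {"en": "my english content", "de": "mein deutscher inhalt"}},
    "13": {"body": {"es": "mi contenido español"}},
}


class LanguageAwareNodeSource:
    """One search item per (entity, language) pair with translated content."""

    # Deliberately not "node": the entity type name should not be reused
    # unaltered as the item type name.
    item_type = "node_translation"

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def item_ids(self) -> List[str]:
        """List IDs of the form "<entity ID>/<language>"."""
        ids: List[str] = []
        for entity_id, fields in self.storage.items():
            languages = sorted({lang for values in fields.values() for lang in values})
            ids.extend(f"{entity_id}/{lang}" for lang in languages)
        return ids

    def load_items(self, ids: List[str]) -> Dict[str, Dict[str, str]]:
        """Load each item with only the field values in its own language."""
        items: Dict[str, Dict[str, str]] = {}
        for item_id in ids:
            entity_id, _, lang = item_id.rpartition("/")
            item = {"language": lang}
            for name, values in self.storage[entity_id].items():
                if lang in values:
                    item[name] = values[lang]
            items[item_id] = item
        return items


source = LanguageAwareNodeSource(NODES)
print(source.item_ids())             # ['12/de', '12/en', '13/es']
print(source.load_items(["12/de"]))  # {'12/de': {'language': 'de', 'body': 'mein deutscher inhalt'}}
```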

(And yes, "search item" is the correct lingo.)

> 3.) Using dynamically created custom search item properties (think hook_entity_property_info_alter) of the form key_property[1..n] => value_property[1..n]

> This last solution could only work if Search API supports search queries of this type:

> key-property[n] => value-property[n]

No, this is not possible with Search API (nor would it be easily implementable in Solr or the DB). A workaround would be something like this:

langvalue[0] = 'en----my english content'
langvalue[1] = 'de----mein deutscher inhalt'
langvalue[2] = 'es----mi contenido español'

Or, to also work with fulltext fields (but then depending on the tokenizer), maybe rather this:

langvalue[0] = 'search_api_language-en my english content'
langvalue[1] = 'search_api_language-de mein deutscher inhalt'
langvalue[2] = 'search_api_language-es mi contenido español'
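Both encodings can be sketched in Python, together with why the fulltext variant depends on the tokenizer: a tokenizer that splits only on whitespace keeps the marker as one token, while one that splits on non-word characters breaks it apart. The function names and the two toy tokenizers are illustrative assumptions, not the tokenizers Solr or the database backend actually use.

```python
import re
from typing import Dict, List

translations: Dict[str, str] = {
    "en": "my english content",
    "de": "mein deutscher inhalt",
    "es": "mi contenido español",
}


def string_values(translations: Dict[str, str]) -> List[str]:
    """Variant 1: language code glued to the value with a '----' separator."""
    return [f"{lang}----{text}" for lang, text in translations.items()]


def fulltext_values(translations: Dict[str, str]) -> List[str]:
    """Variant 2: a marker word prepended, meant to become its own token."""
    return [f"search_api_language-{lang} {text}" for lang, text in translations.items()]


def whitespace_tokenize(text: str) -> List[str]:
    return text.lower().split()


def word_char_tokenize(text: str) -> List[str]:
    # \w matches letters, digits and "_", but not "-".
    return re.findall(r"\w+", text.lower())


# Variant 1: a prefix check on the stored string identifies the Spanish value.
print([v for v in string_values(translations) if v.startswith("es----")])

# Variant 2: whether the marker survives depends on the tokenizer.
spanish = fulltext_values(translations)[2]
print(whitespace_tokenize(spanish)[0])  # search_api_language-es (one token)
print(word_char_tokenize(spanish)[:2])  # ['search_api_language', 'es'] (split apart)
```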

But, as you say, with this small drawback, this could all be implemented in another contrib module. So if anyone prefers this variant (though I can't really think of a use case for searching/filtering specifically for an English term in one field and a Spanish one in another), they are free to implement it in contrib.

danielnolde

Very interesting feedback... I'll think about both solutions outlined by Thomas...

j0rd

Unfortunately, I've got a Search API project that requires multilingual search.

Aside from the Search API module, is there any other module that can handle this easily at this time?

Please let me know, as I'm weighing my options at the moment and will have to decide shortly whether the project I'm pitching goes through. It would also be nice for those involved in this thread to see how others have implemented it.

Thanks in advance.

danielnolde

Multilingual search using the i18n workflow (one node per translation/language, bundled into translation sets) is supported well and stably by core search, apachesolr.module and search_api, AFAIK.
But there is _no_ search module at the moment that correctly supports multilinguality based on the Entity Translation workflow; it's as simple as that. In that case, the workarounds and approaches mentioned above are the only ways to implement such a thing (again, AFAIK).

danielnolde

Hey guys!

Finally we have a dedicated place to get support for entity translations into Search API:

The "Search API Entity Translation" module already offers very basic support for searching through all translations of an entity via Search API:

http://drupal.org/project/search_api_et

This is a really handy quick win for everyone in desperate need of searching through multilingual content managed via Entity Translation.
More importantly, there is an issue to discuss and decide how to support more refined, truly language-aware searchability for Entity Translation in Search API:

http://drupal.org/node/1393058

You're all invited to participate in this discussion!

drunken monkey

Status: Active » Closed (duplicate)

int_ua

Version: 7.x-1.x-dev » 8.x-1.x-dev
Status: Closed (duplicate) » Active

Do you mind if I reuse this issue for 8.x instead of creating a new one?

int_ua

Version: 8.x-1.x-dev » 7.x-1.x-dev
Status: Active » Closed (duplicate)