In Drupal 7 we missed a great opportunity to have multilingual entity functionality supported in search module. In Drupal 7, multiple languages can be stored under one entity in various fields, but search module will not render all versions of the node for storage and it does not support variations of entities per language for indexing.
Looking at the index storage it looks straightforward to include a langcode where a sid is referenced, so included a patch for that to start the conversation. There are some more changes suggested in the contrib entity_translation issue at #1291388: Add support for multilingual core search, but I did not look at the completeness of those suggestions to the functionality required.
| Comment | File | Size | Author |
|---|---|---|---|
| #53 | search-changelog.patch | 586 bytes | gábor hojtsy |
| #52 | search-changelog.txt | 586 bytes | gábor hojtsy |
| #42 | search-index-langcode-1669876-42.patch | 18.8 KB | berdir |
| #40 | search-index-langcode-1669876-40.patch | 18.8 KB | berdir |
| #40 | search-index-langcode-1669876-40-interdiff.txt | 4.18 KB | berdir |
Comments
Comment #1
gábor hojtsyComment #3
jhodgdonI will just note here that there were earlier (still open) issues regarding language and search in the Drupal Core issue queue (some even with patches). These are all concerns you will need to address if you want to make search work correctly with translated content:
#987210: Language-specific searches should include language neutral content
#511594: hook_search_preprocess needs to be language-specific
Comment #4
gábor hojtsy@jhodgdon: thanks. I don't think either of those relate to indexing nodes that have field translation based translations, so they can be solved independently. The main missing piece in my book is that field translations on entities are not supported, and search module should add a langcode to its identifiers alongside sid where appropriate. I do agree there are other unrelated issues that are relevant for language.
Comment #5
gábor hojtsyThis patch ports the indexing changes from #1291388: Add support for multilingual core search to Drupal 8. Any interest, comments, complaints? :)
Comment #7
gábor hojtsyNow also includes rendering the right language version of a node found.
Comment #8
nick_vhdoes it have to be there? Can't we assume it can be 'UND' or NULL also?
this is not very scalable. Suppose you index 50 items at once, you have 5 languages, so it will index 250 items at once.
Would it be better to have a separate indexing process per language of have the increment of the limit effect this foreach also?
are we sure there is no performance impact here? I haven't delved into the difference between the two. Looks ok though, since we use the node_view in earlier execution.
Comment #9
gábor hojtsyThanks for the review!
1. The langcode always comes for the index, if the index has und, it will pass on UND. We can make it optional documentation-wise, but it is always coming from the index.
2. Right, well, its harder to get from the outside which nodes are available in a certain language, but when you have a node loaded, it is trivial to see which languages are under it. Also, we are indexing per node here, so sounds like it makes sense to branch out to indexing all languages there. I agree it can be added to the counting.
3. Look, this is just the hook docs updated :) I merely copied the actual implementation 1-1. The hook docs were highly outdated and there were lots of changes in the actual implementation, like this node_view() change, that was not reflected in the docs. I just took the current code, and copy-pasted there, so the hook has a good current example. There is no such change in the actual code, since it is pre-existing to this patch.
Comment #10
balintbrewsI went ahead, and fixed a few things, so the tests should pass now -- at least I hope. :)
I don't know what you think about it, but I added a new method:
_node_index_node_language_variant().This indexes a language variant of a node, and it's called by
_node_index_node(). SinceEntity::translations()doesn't return the language which is set in$node->langcode, we need to render the node for the indexing once with this langcode, and then with using the other languages, that's why I thought it's a good idea to have a separated method that indexes a single language variant. What do you think?I also implemented a simple counter that checks when is the maximum number of indexable items being reached, so the indexing throttle can be aware of the language variants. My idea was that a node should be indexed only if all its language variants can be indexed without exceeding the specified limit. Do you agree with this approach? I added a new test class:
SearchMultilingualEntityTest, so far it attempts to test this indexing throttle functionality.Comment #11
balintbrewsFingers crossed for the tests. :)
Comment #12
gábor hojtsyFirst of all, thanks for this great improvement, keep it up! I only have some notes:
The reason I used the $item langcode here is that the resulting rendered node should depend on which language version of the node was found, and the langcode comes from the $item (search result). Your change seems to render the original language version of the node at all times, regardless of which translation matched the query, which sounds like an issue to me(?)
I'd add what you explained in comments that "we stop indexing if all translations would not fit under the limit".
I'd add a little comment saying "We need to index the original language version separately since it is not included with translations()."
Can we also run a simple search for one or two specific randomName()s from the nodes and check if we found the right node?
(Also asked Nick_vh to look at the patch again :)
Comment #13
nick_vhI like this, but perhaps this can be documented a little bit more?
I'm not sure if people that read the source code will understand what this does.
Not a blocker though!
This will be confusing, don't you think?
Within _node_index_node there is a function that is also capable of indexing content called _node_index_node_language_variant? That's confusing :-)
I'm in favor of keeping this in 1 function. You could use a helper function perhaps but at least leave the search_index() calls in the _node_index_node()
Aside of that, very good work with the tests and implementing the throttle capabilities!
Comment #14
gábor hojtsySo looks like todo:
1. merge the node variant indexer as per Nick_vh (I'm fine either way :)
2. make node_search_execute() take the item language into account (again)
3. explain the counter behavior (as per Nick_vh and myself) in a code comment
4. add some tests for running searches and looking at the results
Comment #15
saltednutThis should just be a working reroll of #10 for the latest 8.x
-- Do we need to invoke search_index() both after the original node is built and during the loop of its translations? Would _node_index_node_language_variant need to return the built variations as $node?
Marking needs review to ensure the reroll was successful - passed tests locally.
Comment #16
gábor hojtsyFirst of all thanks for working on this, it is much appreciated!
Nick_vh's proposal was to not introduce this node language specific function but keep it all in the indexer as before. You can just build a list from the translations() method adding the original language as well and then wrap the code in a loop on those I guess, instead of having two calls with the original language and in a loop for all translations. So no helper function but all integrated.
I'm personally fine with either, just trying to help make this get through sooner than later :)
Comment #18
gábor hojtsySeems like the crux of the fail is "Argument 1 passed to comment_node_update_index() must be an instance of Drupal\node\Node, string given", that is the recurring error in all failures.
Comment #19
saltednutHoping this passes.
Includes reroll from #15 with correct implementation of for comment_node_update_index() failures.
should have been:
I added a comment about the counter as: "Determine when the maximum number of indexable items is reached."
-- This may need to be revised to be more clear though...
I also refactored the helper function into _node_index_node() inside the loop.
Comment #20
balintbrewsThanks for the reroll and refactoring. I noticed that the new test class (core/modules/search/lib/Drupal/search/Tests/SearchMultilingualEntityTest.php) had been removed in your patch, I guess that was an accident. :) So I added it back again (I didn't include it in the interdiff).
I have also changed the
$node->languageback to$item->languageinnode_search_execute()as Gábor pointed out that wasn't the correct thing to do in #10, since we need to render the node using the language in which the search has been executed. For some reason the language code doesn't appear in the$item, so I will continue and investigate this. But in the meantime I wanted to do this minor reroll and submit it to let us focus only on what's actually left.Comment #21
balintbrewsI have added the language code to the result of the search query in
node_search_execute(), so now it appears in$item. I also wrote a test for executing searches and checking the results as Gábor asked before. So it seems the todo list in #14 is completed.Comment #23
gábor hojtsy#21: search-index-langcode-21.patch queued for re-testing.
Comment #25
balintbrewsMeanwhile some changes have been committed, so the patch doesn't apply anymore. I'll reroll it soon.
Comment #26
kristen pol@balintk - I'd like to review the patch when it's ready so feel free to send me a direct message via my drupal.org contact page or through my blog contact form: http://www.kristen.org/contact so I can react quickly. I will be around all day today but not working tomorrow (Wed). Thanks!
Comment #27
balintbrewsHere is the rerolled patch.
Sorry, I didn't create an interdiff, because I removed the code hunks which caused the patch failing to apply, then manually added them again, so the interdiff would not have been correct.
Comment #29
balintbrewsApparently we need the comment module to be enabled for some reason.
What I figured out: in
node_search_execute(), the following line invokescomment_node_update_index():Even though comment module is not enabled at this point. (So the
function_exists()call returns TRUE inmodule_hook().) This doesn't happen when I run the tests locally.The
comment_node_update_index()method tries to fetch something from the database, but the comment table doesn't exist, that's why the fatal errors. Maybe this is a caching issue with the d.o. test environment. Honestly, I have no idea, but maybe it's trivial for some of you. :)Anyway, the comment module is enabled now in the test, so it might solve the issue, but I'm not sure about that this is the right solution.
Comment #30
kristen polI looked through the code and found a few things but perhaps none of them require action:
Just checking it is correct to use LANGUAGE_NOT_SPECIFIED when the search type is specifically for Japanese.
The comment says "run the shutdown method" yet I don't know what that means and don't see the word "shutdown" anywhere except in the comment. Is the comment understandable to the regular core contributors?
Same here about "shutdown method."
Just checking that taxonomy is supposed to be removed from here and not updated like the comment code was... ?
Let me know if you want me to apply the patch to the latest D8 dev and, if so, how you'd like me to test. Thanks!
Comment #31
balintbrewsThanks for the review!
In my opinion it's irrelevant what the language is here, so it's okay to use this constant.
Good point. The
search_update_totals()method is normally called on cron shutdown. That's what the comment refers to. For instance inSearchAdvancedSearchFormTest.phpthe code is more commented:Do you think it would make more sense to copy over these lines?
That's the documentation that has been updated, and the example uses the
node_search_execute()method as a samplehook_search_execute()implementation. Fetching the terms must be some legacy code there, but now it's updated according to the actual implementation.Comment #32
kristen polThe comments above regarding the "shutdown function" are clearer to me. Perhaps using those makes sense.
Thanks for clarification on the rest of it.
So... if I were to apply this patch, create some nodes with different languages, and test the search form, should it be obvious if it's working properly?
Comment #33
balintbrewsI added those comments, and also wrote some more. I hope it's clear enough now.
Since the tests cover the functionality I don't think you should do manual testing. Also, there is no UI in core for adding multilingual values to fields (yet).
Comment #34
klonosI don't see a "needs backport" tag in this. That's because it breaks the API. Right? If so, could the changes here be somehow made available to D7 in the form of a contrib module or perhaps implemented in Backports? ...or is #1291388: Add support for multilingual core search supposed to tackle this?
Comment #35
kristen polThanks for updating... one final comment (I hope!):
My only suggestion would be to explicitly state which function has the extra comments in it, because additional functions might be added in the future within the file and then the "previous method" text might no longer be accurate. Not sure the parentheses are needed either... perhaps:
Comment #36
balintbrewsWell, you are right. :) Here it is, with that better comment added.
Comment #37
gábor hojtsy@klonos: yes, this changes several APIs and the internal structure of the search index. It is theoretically possible that Entity translation module would implement an alternate solution for Drupal 7 by indexing the separate nodes in some other way without the need for backwards compatibility breaks, but that would be a parallel system.
Comment #38
nick_vhSeems to progress really well!
I did not see any other todo's in the list and based on all the comments made earlier + the tests, I am going to mark this as RTBC. Change if you are opposed to this.
Comment #39
xjmThanks everyone! This is looking pretty good. I found a few things we need to fix:
These comments (and many others in this patch) need to wrap at 80 characters. Reference: http://drupal.org/node/1354#general
This member variable needs documentation. Reference: http://drupal.org/node/1354#classes
This should be changed for #1380958: Replace $modules list for WebTestBase::setUp() with ::$modules class properties (see #1711070: Convert tests to use ::$modules property instead of parent::setUp($modules)).
Both these test methods need documentation. Reference: http://drupal.org/node/1354#functions
The assertion messages should not use
t(). Reference: http://drupal.org/simpletest-tutorial-drupal7#tI've added this as a task for today's core mentoring. Thanks!
Comment #40
berdirFixed coding style issues.
Comment #42
berdirChanged $modules to public.
Comment #43
gábor hojtsy#42: search-index-langcode-1669876-42.patch queued for re-testing.
Comment #44
gábor hojtsyLooking at the interdiff, seems like the concerns from @xjm were fixed. It was RTBC-ed earlier by search maintainers, so let's get this in then :) Thanks @Berdir for picking this up!
Comment #45
gábor hojtsyComment #46
webchickI did a quick grep, and it looks like in most places in D8 core, $langcode defaults to NULL. Should it here as well? And if not, why not?
Comment #47
gábor hojtsyWell, this is 'not null' => TRUE because the search data is derived from the node data, and nodes are programmatically enforced to have language codes. Maybe they are not enforced on the schema side. In that case, we should fix that in another patch possibly.
Comment #49
gábor hojtsy#42: search-index-langcode-1669876-42.patch queued for re-testing.
Comment #50
gábor hojtsyStill passes fine, please commit.
Comment #51
webchickSettle down, I needed to get a couple of hours of sleep. ;)
I was more trying to suss out if there was a way to avoid developers having to deal with this new parameter in every case by defaulting it to a sensible value. I'm not sure that answers it, but I guess I'll go ahead and commit this and that can be a follow-up if necessary.
Committed and pushed to 8.x. Marking for change notification, since this requires developers implementing these hooks to do work.
Comment #52
gábor hojtsyAdded a thorough change notice at http://drupal.org/node/1737832, this is a simple changelog patch which this does deserve IMHO. Once this is committed, should go to fixed IMHO.
Comment #53
gábor hojtsyUpload as patch, to ensure a txt file change will not break core :)
Comment #54
webchickCommitted and pushed that CHANGELOG.txt entry to 8.x. Thanks! :)
Comment #55
gábor hojtsyThanks!
Comment #56
klonosI know this involves API changes and cannot be backported, but is there a solid solution for this in D7 (in contrib perhaps)? Everything I have tried in order to get language-aware search results so far failed.
Comment #57
gábor hojtsy@klonos: http://drupal.org/project/search_api_et might help.
Comment #58
klonosThanx Gábor!
Comment #59
gábor hojtsyPutting back on sprint just for easier tracking of our work. Will remove later next week.
Comment #60
gábor hojtsyThis both had a changelog and change notice, so complete for Drupal 8. Not backportable to Drupal 7, and off the sprint. Thanks all!
Comment #62
xjmComment #63
ianthomas_ukOn the face of it, throttling based on the number of translations seems to make sense, but it means the counters are inconsistent and introduces an edge case where indexing will be totally jammed.
I propose removing the language counter introduced in #10 and just throttling based on the number of nodes. If you have an opinion either way, please comment on #2178643: Indexing status can show a negative percentage of content as having been indexed