As discussed with Markus, creating queries that use Lucene's sloppiness parameter for phrase proximity searches is not easy in the current implementation of flattenkeys.
For example, a search for chocolate cake on the rendered item and title should produce the following query:
q=(tm_X3b_en_rendered_item:(%2B"chocolate+cake"~100000)^1+tm_X3b_en_title:(%2B"chocolate+cake"~100000)^5)
Today this is not possible. The patch adds a sloppy_phrase parse mode. As for term proximity, we don't see a use case for changing the slop number used for the proximity search. See https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity%20....
This will allow you to create a query that shows results more relevant to your phrase, without showing results where a word from the phrase is not matched. E.g., in the chocolate cake example you don't want to see results that don't have chocolate or don't have cake. You also don't want to see only those matches where both words appear directly next to each other, separated by a single space (i.e., ignoring proximity).
See patch for the new parse mode.
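For illustration only, the query shape described above can be sketched in a few lines of Python. This is not the code from the patch: the function name `sloppy_phrase_query`, its parameters, and the escaping rules are assumptions made for the example; only the field names, boosts, and the slop value of 100000 come from the query shown above.

```python
from urllib.parse import quote_plus


def sloppy_phrase_query(keys, field_boosts, slop=100000):
    """Build a per-field, required sloppy phrase query (hypothetical helper)."""
    # Escape backslashes and double quotes so the keys stay inside one phrase.
    phrase = keys.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [
        f'{field}:(+"{phrase}"~{slop})^{boost}'
        for field, boost in field_boosts.items()
    ]
    return "(" + " ".join(clauses) + ")"


query = sloppy_phrase_query(
    "chocolate cake",
    {"tm_X3b_en_rendered_item": 1, "tm_X3b_en_title": 5},
)
print(query)
# URL-encode for the q parameter: spaces become "+", a literal "+" becomes %2B.
print(quote_plus(query, safe='():"~^'))
```

The second line printed is the URL-encoded form, which should correspond to the `q=` value shown in the example above.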
Comment | File | Size | Author |
---|---|---|---|
#8 | 3090893_8.patch | 8.93 KB | mkalkbrenner |
Comments
Comment #2
Nick_vh
Comment #3
Nick_vh
Comment #4
Nick_vh
Comment #5
mkalkbrenner
Is that the correct position? Should we really execute the terms logic below?
That should become a configurable option. Unfortunately, this isn't doable as an option of the parse mode at the moment, so I suggest offering a setting per index.
Comment #6
mkalkbrenner
Shouldn't that be
+(x:("A B"~10000000)^1 y:("A B"~10000000)^1)
?
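For illustration only, the grouped form can be generated with a small Python sketch. The function name `grouped_sloppy_phrase_query` and its parameters are hypothetical and not part of the module. Assuming the default OR operator, the leading `+` makes the whole group required while each field clause inside it stays optional.

```python
def grouped_sloppy_phrase_query(keys, field_boosts, slop=10000000):
    """Build one required group of per-field sloppy phrases (hypothetical helper)."""
    # Escape backslashes and double quotes so the keys stay inside one phrase.
    phrase = keys.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [
        f'{field}:("{phrase}"~{slop})^{boost}'
        for field, boost in field_boosts.items()
    ]
    # The "+" applies to the group as a whole, not to each field separately.
    return "+(" + " ".join(clauses) + ")"


print(grouped_sloppy_phrase_query("A B", {"x": 1, "y": 1}))
```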
Comment #7
mkalkbrenner
Comment #8
mkalkbrenner
Comment #9
Nick_vh
This looks extremely confusing to me. Maybe some documentation?
Comment #11
mkalkbrenner
The code was "simplified". Adding more comments will happen when adding a config option.
Can you try LTR now?
Comment #12
Nick_vh
Hm, somehow it's not doing what I expected.
chocolate cake ->
json.nl=flat&hl=true&TZ=UTC&fl=,score,features:[features]&hl.requireFieldMatch=false&start=0&hl.fragsize=0&fq=%2Bindex_id:umami_search_index&fq=ss_search_api_language:"en"&rows=11&hl.simple.pre=[HIGHLIGHT]&hl.snippets=3&q=((%2B(tm_X3b_en_rendered_item:"chocolate"^1+tm_X3b_en_title:"chocolate"^5)+%2B(tm_X3b_en_rendered_item:"cake"^1+tm_X3b_en_title:"cake"^5))+tm_X3b_en_rendered_item:(%2B"chocolate"+%2B"cake")^1+tm_X3b_en_title:(%2B"chocolate"+%2B"cake")^5)&hl.mergeContiguous=false&hl.simple.post=[/HIGHLIGHT]&omitHeader=true&hl.fl=&wt=json&rq={!ltr+efi.searchKeys%3Dchocolate+cake+model%3Dlambdamart-2019-10-28-14-30-29+reRankDocs%3D100}
This logic is a bit flawed. I don't want people to have to enter quotes.
I want all of the keys entered in the box to be treated as a phrase,
e.g., I type in chocolate cake, not "chocolate cake".
As soon as I wrap the two words in double quotes, it works as expected. But the intention of this parse mode is that it adds the quotes for me.
Not sure where you think this should be added. Or should it be a parameter in the view to always treat the search term as a phrase?
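To make the expectation concrete, here is a minimal Python sketch of the behaviour described above. The function name `keys_as_phrase` is hypothetical and this is not search_api_solr code; it only shows unquoted input being wrapped as one phrase, while input the user already quoted is left alone.

```python
def keys_as_phrase(user_input):
    """Treat everything typed into the search box as one phrase (hypothetical helper)."""
    keys = user_input.strip()
    # If the user already wrapped the keys in double quotes, keep them as they are.
    if len(keys) >= 2 and keys.startswith('"') and keys.endswith('"'):
        return keys
    # Otherwise escape special characters and add the quotes for the user.
    escaped = keys.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(keys_as_phrase("chocolate cake"))
print(keys_as_phrase('"chocolate cake"'))
```

Both calls should produce the same quoted phrase, which could then be passed on to the sloppy phrase handling.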
Comment #13
mkalkbrenner