Support phrase sloppiness and fix flattenkeys wrong "sloppy" logic [#3090893]

As discussed with Markus, the current implementation of flattenkeys makes it hard to create queries that use Lucene's sloppiness parameter for phrase proximity searches.

For example, a search for "chocolate cake" on the rendered item and title should produce the following query:
q=(tm_X3b_en_rendered_item:(%2B"chocolate+cake"~100000)^1+tm_X3b_en_title:(%2B"chocolate+cake"~100000)^5)

Today this is not possible. The patch adds a sloppy_phrase parse mode. It uses a fixed proximity value, because we don't see a use case for changing the number used for the proximity search. See https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity%20....

This will allow you to create a query that returns results that are more relevant to your phrase, without showing results where a word from the phrase is not matched. E.g., in the chocolate cake example you don't want to see results that lack "chocolate" or lack "cake". You also don't want to be limited to matches where both words appear separated by a single space (i.e., an exact phrase match that ignores proximity).
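
The target query from the example can be sketched in plain PHP. This is a minimal illustration, not the patch itself: `build_sloppy_phrase_query()` and its parameters are hypothetical names, and the result is shown URL-decoded (in the query above, `%2B` is `+` and `+` is a space).

```php
<?php

/**
 * Hypothetical helper (not part of the module): builds a sloppy phrase query.
 *
 * @param string $keys   Raw user input, e.g. 'chocolate cake'.
 * @param array  $fields Solr field name => boost.
 * @param int    $slop   Maximum allowed distance between the phrase terms.
 */
function build_sloppy_phrase_query(string $keys, array $fields, int $slop = 100000): string {
  // Escape quotes and backslashes so the input cannot break out of the phrase.
  $phrase = '"' . addcslashes(trim($keys), '"\\') . '"~' . $slop;
  $parts = [];
  foreach ($fields as $field => $boost) {
    // The leading "+" marks the whole phrase as required for this field.
    $parts[] = $field . ':(+' . $phrase . ')^' . $boost;
  }
  return '(' . implode(' ', $parts) . ')';
}

$q = build_sloppy_phrase_query('chocolate cake', [
  'tm_X3b_en_rendered_item' => 1,
  'tm_X3b_en_title' => 5,
]);
echo $q, PHP_EOL;
```

Because the phrase is required as a whole, documents missing either word are excluded, while the large slop value still matches documents where the two words are far apart.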

See patch for the new parse mode.

Comment	File	Size	Author
#8	3090893_8.patch	8.93 KB	mkalkbrenner
#8	3-8-interdiff.txt	4.28 KB	mkalkbrenner
#7	3090893_7.patch	8.05 KB	mkalkbrenner
#7	4-7-interdiff.txt	5.01 KB	mkalkbrenner
#3	3090893-3-create-sloppy-phrase-parse-mode.patch	7.49 KB	Nick_vh
#2	3090893-create-sloppy-phrase-parse-mode.patch	7.45 KB	Nick_vh


Comments

Comment #1

29 October 2019 at 16:46

Nick_vh created an issue. See original summary.

Comment #2

Nick_vh

he/him

Ghent

CreditAttribution: Nick_vh at Dropsolid commented 29 October 2019 at 16:47

File	Size
3090893-create-sloppy-phrase-parse-mode.patch	7.45 KB

Comment #3

Nick_vh

he/him

Ghent

CreditAttribution: Nick_vh at Dropsolid commented 29 October 2019 at 19:49

File	Size
3090893-3-create-sloppy-phrase-parse-mode.patch	7.49 KB

Comment #4

Nick_vh

he/him

Ghent

CreditAttribution: Nick_vh at Dropsolid commented 29 October 2019 at 19:56

Status:

Active

» Needs review

Comment #5

mkalkbrenner

German

🇩🇪

CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 30 October 2019 at 10:57

Status:

Needs review

» Needs work

+++ b/src/Utility/Utility.php
@@ -744,6 +753,10 @@ class Utility {
+        case "sloppy_phrase":

Is that the correct position? Should we really execute the terms logic below?

```
+++ b/src/Utility/Utility.php
@@ -744,6 +753,10 @@ class Utility {
+          $sloppiness = '~10000000';
```
That should become a configurable option. Unfortunately, this can't be done as an option of the parse mode at the moment, so I suggest offering a setting per index.
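
A per-index setting could look roughly like the following sketch. Everything here is an assumption for illustration: the `phrase_slop` key and the `sloppiness_suffix()` function are hypothetical names, and the settings are modelled as a plain array rather than the module's real configuration storage.

```php
<?php

/**
 * Hypothetical: resolves the slop suffix from a per-index settings array.
 *
 * The 'phrase_slop' key is an assumed name, not an existing setting.
 */
function sloppiness_suffix(array $index_settings, int $default = 10000000): string {
  $slop = $index_settings['phrase_slop'] ?? $default;
  // Slop must be a non-negative integer; otherwise fall back to the default.
  if (!is_int($slop) || $slop < 0) {
    $slop = $default;
  }
  return '~' . $slop;
}

echo sloppiness_suffix(['phrase_slop' => 5]), PHP_EOL;
echo sloppiness_suffix([]), PHP_EOL;
```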

Comment #6

mkalkbrenner

German

🇩🇪

CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 30 October 2019 at 10:58

+++ b/src/Utility/Utility.php
@@ -621,6 +624,10 @@ class Utility {
    * OR           | FALSE     | [x,y]  | phrase         | +(x:(A B)^1 y:(A B)^1)
...
+   * AND          | FALSE     | [x,y]  | sloppy_phrase  | +(x:(+"A"~10000000 +"B"~10000000)^1 y:(+"A"~10000000 +"B"~10000000)^1)

Shouldn't that be
+(x:("A B"~10000000)^1 y:("A B"~10000000)^1)
?

Comment #7

mkalkbrenner

German

🇩🇪

CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 30 October 2019 at 13:46

Status:

Needs work

» Needs review

File	Size
4-7-interdiff.txt	5.01 KB
3090893_7.patch	8.05 KB

2 files were hidden/shown/deleted

File	Size
3090893-create-sloppy-phrase-parse-mode.patch	7.45 KB

3090893-3-create-sloppy-phrase-parse-mode.patch	7.49 KB

Comment #8

mkalkbrenner

German

🇩🇪

CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 30 October 2019 at 13:49

File	Size
3-8-interdiff.txt	4.28 KB
3090893_8.patch	8.93 KB

1 file was hidden/shown/deleted

File	Size
3090893_7.patch	8.05 KB

Comment #9

Nick_vh

he/him

Ghent

CreditAttribution: Nick_vh at Dropsolid commented 30 October 2019 at 14:28

+++ b/src/Utility/Utility.php
@@ -744,44 +755,53 @@ class Utility {
+            $query_parts[] = '(' . $pre . $terms_or_phrase . ((strpos($terms_or_phrase, ' ') && strpos($terms_or_phrase, '"') === 0) ? $sloppiness : '') . ')';

This looks extremely confusing to me. Maybe some documentation?
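
For readers of the thread, the inline ternary can be unpacked into named steps. This is only a sketch of what the expression appears to do: `append_sloppiness()` and `build_query_part()` are hypothetical names, not functions from the patch.

```php
<?php

/**
 * Appends the slop suffix only to a quoted phrase that contains a space.
 */
function append_sloppiness(string $terms_or_phrase, string $sloppiness): string {
  // A phrase starts with a double quote at position 0.
  $is_quoted = strpos($terms_or_phrase, '"') === 0;
  // The original uses a bare strpos() as a truthiness check. When position 0
  // holds the quote, a space can only be found at position 1 or later, so the
  // strict comparison below gives the same result for quoted input.
  $has_space = strpos($terms_or_phrase, ' ') !== FALSE;
  return ($is_quoted && $has_space) ? $terms_or_phrase . $sloppiness : $terms_or_phrase;
}

/**
 * Wraps one term or phrase, with its prefix, in parentheses.
 */
function build_query_part(string $pre, string $terms_or_phrase, string $sloppiness): string {
  return '(' . $pre . append_sloppiness($terms_or_phrase, $sloppiness) . ')';
}

echo build_query_part('+', '"chocolate cake"', '~10000000'), PHP_EOL;
echo build_query_part('+', '"chocolate"', '~10000000'), PHP_EOL;
```

Written this way, it is easier to see that single quoted terms and unquoted input never receive the suffix.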

Comment #10

30 October 2019 at 15:24

mkalkbrenner committed 46e0ff9 on 8.x-3.x

Issue #3090893 by Nick_vh, mkalkbrenner: Support Phrase sloppyiness and...

mkalkbrenner committed 4e7c9b1 on 8.x-3.x

Issue #3090893 by Nick_vh, mkalkbrenner: Support Phrase sloppyiness and...

Comment #11

mkalkbrenner

German

🇩🇪

CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 30 October 2019 at 15:25

Status:

Needs review

» Fixed

The code was "simplified". More comments will be added along with the config option.
Can you try LTR now?

Comment #12

Nick_vh

he/him

Ghent

CreditAttribution: Nick_vh at Dropsolid commented 30 October 2019 at 16:40

Status:

Fixed

» Needs work

Hm, somehow it's not doing what I expected. A search for chocolate cake produces:

json.nl=flat&hl=true&TZ=UTC&fl=,score,features:[features]&hl.requireFieldMatch=false&start=0&hl.fragsize=0&fq=%2Bindex_id:umami_search_index&fq=ss_search_api_language:"en"&rows=11&hl.simple.pre=[HIGHLIGHT]&hl.snippets=3&q=((%2B(tm_X3b_en_rendered_item:"chocolate"^1+tm_X3b_en_title:"chocolate"^5)+%2B(tm_X3b_en_rendered_item:"cake"^1+tm_X3b_en_title:"cake"^5))+tm_X3b_en_rendered_item:(%2B"chocolate"+%2B"cake")^1+tm_X3b_en_title:(%2B"chocolate"+%2B"cake")^5)&hl.mergeContiguous=false&hl.simple.post=[/HIGHLIGHT]&omitHeader=true&hl.fl=&wt=json&rq={!ltr+efi.searchKeys%3Dchocolate+cake+model%3Dlambdamart-2019-10-28-14-30-29+reRankDocs%3D100}

if (strpos($term_or_phrase, ' ') && strpos($term_or_phrase, '"') === 0) {
  $term_or_phrase .= $sloppiness;
}

This logic is a bit flawed. I don't want people to have to enter quotes.
I want all of the keys entered in the box to be treated as a phrase.
E.g., I type in chocolate cake, not "chocolate cake".

As soon as I wrap the two words in double quotes, it works as expected. But the intention of this parse mode is that it adds the quotes for me.
I'm not sure where you think this should be added. Or should it be a parameter in the view to always treat the search term as a phrase?
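
One way to get that behavior would be to normalize the keys into a phrase before the sloppiness is appended. The sketch below is an assumption about how that could work, not the committed code; `keys_to_sloppy_phrase()` is a hypothetical name.

```php
<?php

/**
 * Hypothetical sketch: treats the whole input as one phrase for sloppy_phrase.
 */
function keys_to_sloppy_phrase(string $keys, string $sloppiness = '~10000000'): string {
  $keys = trim($keys);
  // Strip quotes the user may have typed, so quoted and unquoted input
  // end up producing the same phrase.
  if (strlen($keys) >= 2 && $keys[0] === '"' && substr($keys, -1) === '"') {
    $keys = substr($keys, 1, -1);
  }
  $phrase = '"' . addcslashes($keys, '"\\') . '"';
  // Slop only matters when the phrase holds more than one word.
  return preg_match('/\s/', $keys) ? $phrase . $sloppiness : $phrase;
}

echo keys_to_sloppy_phrase('chocolate cake'), PHP_EOL;
echo keys_to_sloppy_phrase('"chocolate cake"'), PHP_EOL;
echo keys_to_sloppy_phrase('cake'), PHP_EOL;
```

With this normalization, typing chocolate cake and typing "chocolate cake" would yield the same sloppy phrase.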