As discussed with Markus, creating queries that use Lucene's sloppiness parameter for phrase proximity searches is not easy with the current implementation of flattenKeys().

For example, a search for chocolate cake on the rendered item and title fields should produce the following query:
q=(tm_X3b_en_rendered_item:(%2B"chocolate+cake"~100000)^1+tm_X3b_en_title:(%2B"chocolate+cake"~100000)^5)

Today this is not possible. The patch adds a sloppy_phrase parse mode. The proximity value itself is fixed in the patch, because we don't see a use case for changing the number used in the proximity search. See https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity%20....

This allows you to create a query that returns results more relevant to your phrase, without returning results where a word from the phrase is not matched. E.g., in the chocolate cake example you don't want to see results that lack chocolate or lack cake. You also don't want to be limited to matches where both words appear right next to each other, separated by a single space (i.e., an exact phrase match that ignores proximity).
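To make the target query concrete, here is a minimal sketch of how the unencoded form of the query above could be assembled from the raw keys. The function name `build_sloppy_phrase_query()` and its parameters are hypothetical and are not part of the patch or of search_api_solr; the output is the query string before URL encoding.

```php
<?php

/**
 * Hypothetical helper, for illustration only (not the patch's code).
 *
 * @param string $keys
 *   Raw search keys as typed by the user, e.g. 'chocolate cake'.
 * @param array $field_boosts
 *   Solr field names mapped to their boost values.
 * @param int $slop
 *   Maximum position distance allowed between the phrase's terms.
 */
function build_sloppy_phrase_query(string $keys, array $field_boosts, int $slop): string {
  // Collapse runs of whitespace so the phrase contains single spaces only.
  $phrase = preg_replace('/\s+/', ' ', trim($keys));
  // Escape backslashes first, then double quotes, so the phrase stays quoted.
  $phrase = str_replace(['\\', '"'], ['\\\\', '\\"'], $phrase);

  $parts = [];
  foreach ($field_boosts as $field => $boost) {
    // One mandatory sloppy phrase per field, followed by the field boost.
    $parts[] = sprintf('%s:(+"%s"~%d)^%s', $field, $phrase, $slop, $boost);
  }
  return '(' . implode(' ', $parts) . ')';
}

echo build_sloppy_phrase_query(
  'chocolate cake',
  ['tm_X3b_en_rendered_item' => 1, 'tm_X3b_en_title' => 5],
  100000
), "\n";
// Prints:
// (tm_X3b_en_rendered_item:(+"chocolate cake"~100000)^1 tm_X3b_en_title:(+"chocolate cake"~100000)^5)
```

The whole input is treated as one phrase per field, so a document has to contain both words to match, while the large slop value still lets the words sit far apart.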

See patch for the new parse mode.

Comments

Nick_vh created an issue. See original summary.

Nick_vh’s picture

Status: Active » Needs review
mkalkbrenner’s picture

Status: Needs review » Needs work
  1. +++ b/src/Utility/Utility.php
    @@ -744,6 +753,10 @@ class Utility {
    +        case "sloppy_phrase":
    

    Is that the correct position? Should we really execute the terms logic below?

  2. +++ b/src/Utility/Utility.php
    @@ -744,6 +753,10 @@ class Utility {
    +          $sloppiness = '~10000000';
    

    That should become a configurable option. Unfortunately, this isn't doable as an option of the parse mode at the moment, so I suggest offering a setting per index.

mkalkbrenner’s picture

+++ b/src/Utility/Utility.php
@@ -621,6 +624,10 @@ class Utility {
    * OR           | FALSE     | [x,y]  | phrase         | +(x:(A B)^1 y:(A B)^1)
...
+   * AND          | FALSE     | [x,y]  | sloppy_phrase  | +(x:(+"A"~10000000 +"B"~10000000)^1 y:(+"A"~10000000 +"B"~10000000)^1)

Shouldn't that be
+(x:("A B"~10000000)^1 y:("A B"~10000000)^1)
?
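The difference between the two forms can be sketched as follows. Slop describes how far apart the terms of a phrase may be, so it only has something to measure when the quoted string contains more than one term. The helper names below are hypothetical and only build the two strings for comparison.

```php
<?php

// Hypothetical helpers for illustration; not part of search_api_solr.

/**
 * Attaches the slop to every term separately, as in the documented table row.
 */
function per_term_slop(array $terms, int $slop): string {
  $parts = [];
  foreach ($terms as $term) {
    $parts[] = sprintf('+"%s"~%d', $term, $slop);
  }
  return implode(' ', $parts);
}

/**
 * Attaches the slop once to the whole phrase, as suggested in the review.
 */
function per_phrase_slop(array $terms, int $slop): string {
  return sprintf('"%s"~%d', implode(' ', $terms), $slop);
}

echo per_term_slop(['A', 'B'], 10000000), "\n";
// Prints: +"A"~10000000 +"B"~10000000
echo per_phrase_slop(['A', 'B'], 10000000), "\n";
// Prints: "A B"~10000000
```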

mkalkbrenner’s picture

Status: Needs work » Needs review
Nick_vh’s picture

+++ b/src/Utility/Utility.php
@@ -744,44 +755,53 @@ class Utility {
+            $query_parts[] = '(' . $pre . $terms_or_phrase . ((strpos($terms_or_phrase, ' ') && strpos($terms_or_phrase, '"') === 0) ? $sloppiness : '') . ')';

This looks extremely confusing to me. Maybe some documentation?
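As a possible starting point for that documentation, the one-line expression from the hunk above could be unrolled like this. This is a sketch only: the function name `build_query_part()` is hypothetical, and `$pre` is assumed to be a prefix such as `+`.

```php
<?php

/**
 * Hypothetical, more verbose equivalent of the one-line expression.
 */
function build_query_part(string $pre, string $terms_or_phrase, string $sloppiness): string {
  // The input counts as a phrase only if it starts with a double quote.
  $is_quoted = strpos($terms_or_phrase, '"') === 0;
  // Slop is only appended when the input contains more than one word.
  $has_space = strpos($terms_or_phrase, ' ') !== FALSE;

  // Append the slop suffix (e.g. '~10000000') only to quoted multi-word input.
  $suffix = ($is_quoted && $has_space) ? $sloppiness : '';

  return '(' . $pre . $terms_or_phrase . $suffix . ')';
}

echo build_query_part('+', '"chocolate cake"', '~10000000'), "\n";
// Prints: (+"chocolate cake"~10000000)
echo build_query_part('+', 'chocolate', '~10000000'), "\n";
// Prints: (+chocolate)
```

The original relies on `strpos($terms_or_phrase, ' ')` being truthy, which treats a space at position 0 the same as no space at all. That cannot happen when the string starts with a double quote, so the explicit `!== FALSE` check gives the same result.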

  • mkalkbrenner committed 46e0ff9 on 8.x-3.x
    Issue #3090893 by Nick_vh, mkalkbrenner: Support Phrase sloppyiness and...
  • mkalkbrenner committed 4e7c9b1 on 8.x-3.x
    Issue #3090893 by Nick_vh, mkalkbrenner: Support Phrase sloppyiness and...
mkalkbrenner’s picture

Status: Needs review » Fixed

The code was "simplified". More comments will be added together with the config option.
Can you try LTR now?

Nick_vh’s picture

Status: Fixed » Needs work

Hm, somehow it's not doing what I expected.
Searching for chocolate cake produces:

json.nl=flat&hl=true&TZ=UTC&fl=,score,features:[features]&hl.requireFieldMatch=false&start=0&hl.fragsize=0&fq=%2Bindex_id:umami_search_index&fq=ss_search_api_language:"en"&rows=11&hl.simple.pre=[HIGHLIGHT]&hl.snippets=3&q=((%2B(tm_X3b_en_rendered_item:"chocolate"^1+tm_X3b_en_title:"chocolate"^5)+%2B(tm_X3b_en_rendered_item:"cake"^1+tm_X3b_en_title:"cake"^5))+tm_X3b_en_rendered_item:(%2B"chocolate"+%2B"cake")^1+tm_X3b_en_title:(%2B"chocolate"+%2B"cake")^5)&hl.mergeContiguous=false&hl.simple.post=[/HIGHLIGHT]&omitHeader=true&hl.fl=&wt=json&rq={!ltr+efi.searchKeys%3Dchocolate+cake+model%3Dlambdamart-2019-10-28-14-30-29+reRankDocs%3D100}

if (strpos($term_or_phrase, ' ') && strpos($term_or_phrase, '"') === 0) {
  $term_or_phrase .= $sloppiness;
}

This logic is a bit flawed. I don't want people to have to enter quotes.
I want all of the keys entered in the box to be treated as a phrase.
E.g., I type in chocolate cake, not "chocolate cake".

As soon as I wrap the two words in double quotes, it works as expected. But the intention of this parse mode is that it adds the quotes for me.
I'm not sure where you think this should be added. Or should it be a parameter in the view to always treat the search term as a phrase?
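The behaviour I have in mind could be sketched roughly like this. The function name `to_sloppy_phrase()` is hypothetical and this is not what the committed code does; it only shows quoted and unquoted input ending up as the same sloppy phrase.

```php
<?php

/**
 * Hypothetical helper: treats all entered keys as one sloppy phrase.
 *
 * Quotes inside the keys are not escaped here; a real implementation
 * would have to handle them.
 */
function to_sloppy_phrase(string $keys, string $sloppiness): string {
  $keys = trim($keys);

  // Strip one pair of surrounding quotes so that quoted and unquoted input
  // are handled the same way.
  if (strlen($keys) >= 2 && $keys[0] === '"' && substr($keys, -1) === '"') {
    $keys = substr($keys, 1, -1);
  }

  // A single word has no proximity to measure, so leave it as a plain term.
  if (strpos($keys, ' ') === FALSE) {
    return $keys;
  }

  // Add the quotes on behalf of the user and append the slop.
  return '"' . $keys . '"' . $sloppiness;
}

echo to_sloppy_phrase('chocolate cake', '~10000000'), "\n";
// Prints: "chocolate cake"~10000000
echo to_sloppy_phrase('"chocolate cake"', '~10000000'), "\n";
// Prints: "chocolate cake"~10000000
echo to_sloppy_phrase('cake', '~10000000'), "\n";
// Prints: cake
```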

mkalkbrenner’s picture

Status: Needs work » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.