I have a node that has a word in the title and the same word in the body. So the same word is full text indexed with 2 scores. When I do a query on this word then the query results in this:

SELECT COUNT(*) AS expression
FROM 
(SELECT t.item_id AS item_id, t.score AS score, CASE WHEN t.word LIKE '%milou%' THEN 1 ELSE 0 END AS w0, 1 AS expression
FROM 
{search_api_db_education_text} t
LEFT OUTER JOIN {search_api_db_education} t_2 ON t.item_id = t_2.item_id
WHERE (t.word LIKE '%milou%' ESCAPE '\\') AND (field_name IN ('rendered_item')) AND (t_2.external = '1')
GROUP BY item_id, score, w0) subquery

So the result has the same node twice because it's grouped by item_id and score.

Now the $view->total_rows var is off by one and this results in various things not working properly. For example the pager, which thinks there is a result more then that there actually is. The view itself loads/shows the correct number of results.

Perhaps the count query should just group by item_id?

CommentFileSizeAuthor
#23 2916534-23--fix_single_field_partial_search.patch3.97 KBdrunken monkey
#23 2916534-23--fix_single_field_partial_search--tests_and_first_fix_only.patch3.27 KBdrunken monkey
#23 2916534-23--fix_single_field_partial_search--tests_only.patch2.45 KBdrunken monkey
#23 2916534-23--fix_single_field_partial_search--interdiff.txt616 bytesdrunken monkey
#19 2916534-19--fix_single_field_partial_search.patch3.99 KBdrunken monkey
#19 2916534-19--fix_single_field_partial_search--tests_and_first_fix_only.patch3.29 KBdrunken monkey
#19 2916534-19--fix_single_field_partial_search--tests_only.patch2.47 KBdrunken monkey
#19 2916534-19--fix_single_field_partial_search--interdiff.txt6.54 KBdrunken monkey
#17 2916534-17--duplicate_result_rows.patch3.75 KBdrunken monkey
#17 2916534-17--duplicate_result_rows--tests_only.patch2.92 KBdrunken monkey
#17 2916534-17--duplicate_result_rows--tests_only--interdiff.txt2.18 KBdrunken monkey
#17 2916534-17--duplicate_result_rows--interdiff.txt844 bytesdrunken monkey
#15 screenshot.png34.95 KBdrunken monkey
#11 2916534-11--duplicate_result_rows--tests_only.patch2.86 KBdrunken monkey
#6 search-api-search-api-db-remove-duplicates-from-count-query-2916534-6-D8.patch745 bytesricovandevin
#2 search_term_occuring_in-2916534-2.patch1.03 KBJohnny vd Laar
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Johnny vd Laar created an issue. See original summary.

Johnny vd Laar’s picture

The attached patch is totally not okay to be included into search_api... but up until now it was the only way to do this because the count query doesn't have a query tag so I can't alter this query... I'm kinda lost because it feels like I am doing something wrong as other people should also encounter this problem...

ricovandevin’s picture

We are running into the same issue. I'll dive into this too.

ricovandevin’s picture

Status: Active » Needs work
ricovandevin’s picture

It seems to me that also for the normal query we should not group by score because also there it will duplicate results in this use case. These duplicates are probably removed somewhere before rendering the results. But leaving the score out of the GROUP BY clause causes a syntax error:

SQLSTATE[42000]: Syntax error or access violation: 1055 Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'drupal.t.score' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I will try to find how the duplicates are filtered out for the normal query. Maybe we can replicate that for the count query.

ricovandevin’s picture

Version: 8.x-1.5 » 8.x-1.x-dev
Component: Views integration » Database backend
Status: Needs work » Needs review
FileSize
745 bytes

The patch attached is the solution I came up with after a lot of debugging. This solution assumes that

  • We want every item_id to appear only once in both the search results and the result count;
  • If we have multiple results for one item_id we are interested in the total score (sum) over all results for that item_id.

These assumption may not be correct. Maybe we should not sum the score but use the maximum score for each item_id. I'm open to any suggestion for a solid solutions for this issue. :-)

Status: Needs review » Needs work
Johnny vd Laar’s picture

The patch in #6 works better, I'm not really sure that we should use sum. Perhaps max is better. If you've indexed 3 fields all with weight 1000 then the sum will be 3000. If the title gets 2000 then this result would be better... Not sure if this is really the intended way. But perhaps the module maintainers can chip in on this.

drunken monkey’s picture

Issue tags: +Needs tests

Thanks for reporting this problem!
I'm a bit surprised by this (though not too much, since I know how incredibly complex the query generation is), since I thought with our level of test coverage, we would have covered this scenario and spotted any such obvious problems there. But apparently that's not the case.

Looking at \Drupal\Tests\search_api\Kernel\BackendTestBase::searchSuccess() and \Drupal\Tests\search_api\Functional\ExampleContentTrait::insertExampleContent(), though, I really can't see why this error wouldn't be detected. As you see, entity #2 has "test" in both name and body, so the item count should be off. Can you maybe find out why this test still passes, or provide a test case that actually fails?

Anyways, we already use SUM(score) in some cases, so I'd also go with that to resolve this issue. Apparently, we just missed one specific scenario in which the sum is needed.
However, when you add the sum, you have to remember to also make sure that the query is grouped by all non-aggregated columns. And, of course, that you don't still have the score column in the GROUP BY.

Anybody’s picture

I can confirm this problem still exists sadly.

drunken monkey’s picture

Sorry, I still can’t reproduce it. I now went and wrote more test code to try and nail this down, but still nothing (at least locally).
Are you maybe all using Postgres? (I tested in MySQL and SQLite.)

If you can’t provide a failing test, or help me reproduce the problem, we won’t be able to fix this problem.
(At least the query included in the IS points to this being a problem with partial matching – can you confirm you’re using that, too, Anybody?)

Anybody’s picture

Nope, MySQL. Whenever I'll be able to reproduce this, I'll post more information.

Marty2081’s picture

I ran into the same issue. Here are steps to reproduce on a vanilla Drupal install with Search API using a database backend:

- Create a database server where partial word matching is enabled.
- Create an index on the created database server with a rendered entity field indexed as full text indexing any node type (make sure the selected view mode for rendering shows the body field for instance).
- Create a Search API view based on the index that has an exposed full text filter (only searching through the one full text field that is indexed) and add a search summary showing the total number of results in the header area.
- Create a node of any type and make sure the body field contains text that includes the search term you are testing with multiple times as part of some words that are repeated different times (to vary the score per string in the index). For instance: test with the string "the" and create the node with the text "the their their theirs theirs theirs them them them them"
- Make sure the created node is indexed.
- Now go to your search view and search for "the". The expected result is that the total number of results is 1, but the actual result is 4.

The result count of 4 is due to the fact that the index is queried with a like query on the string "%the%" resulting in 4 rows since the four entered words are alle indexed with a different score and there is a grouping on score.

The search_api_db_index_text queried with a like query with "%the%" will have a result like:

item_id field_name word score
entity:node/1:en rendered_item the 1000
entity:node/1:en rendered_item their 2000
entity:node/1:en rendered_item theirs 3000
entity:node/1:en rendered_item them 4000

drunken monkey’s picture

Issue summary: View changes

Still nothing, sorry:

(“The” alone had too many other results, so I prepended a “u”.)

Are you using Postgres? Or maybe a (very) old version of the module? I’m really clueless what could be happening here.

Your explanation makes a lot of sense, except that we already realized this problem when implementing this feature and guarded against it – see the following code in Database::createKeysQuery():

          // Add an expression for each keyword that shows whether the indexed
          // word matches that particular keyword. That way we don't return a
          // result multiple times if a single indexed word (partially) matches
          // multiple keywords. We also remember the column name so we can
          // afterwards verify that each word matched at least once.
          $alias = 'w' . ++$this->expressionCounter;
          $like = '%' . $this->database->escapeLike($word) . '%';
          $alias = $db_query->addExpression("CASE WHEN t.word LIKE :like_$alias THEN 1 ELSE 0 END", $alias, [":like_$alias" => $like]);
          $db_query->groupBy($alias);
          $keyword_hits[] = $alias;

Of course, it’s always possible that there is a bug. However, that’s what the automated tests are for, and they also show no problems.
A failing test would really be what we need.
(As you see, the test above passed for both MySQL and Postgres, so it doesn’t seem to be that simple.)

drunken monkey’s picture

FileSize
34.95 KB
Marty2081’s picture

I reproduced the issue with the latest versions of Drupal 8 and Search API using MySQL

It is very important to only index a single full text field. As soon as you have multiple fields the issue is no longer there since the query then changes. Are you sure you only indexed one field to search on?

drunken monkey’s picture

Important information, thanks – but still nothing.
I adapted the tests in the patch, too, but also couldn’t manage to reproduce. (Interestingly, though, I did uncover a bug with SQLite there, so that would be good to fix! Was something quite like #3068180: SQL error: Ambiguous column name.)

(Before committing, though, tests should probably be moved into a regressionTests*() sub-method. And those from the TestBackendBase class removed, as they didn’t catch anything.)

Are you maybe using some additional modules, or specific Views settings? Or does it also happen when you just install the Database Defaults module, switch to partial matching and restrict the search fields to a single one?

Marty2081’s picture

@drunken monkey: as discussed on Slack: I have sent you a database export (through Slack) of a vanilla Drupal 8.7.7 with Search API 8.x-1.14 install which reproduces the issue. I hope that helps.

drunken monkey’s picture

Thanks again for the dump!
With it, I could finally reproduce this problem! (Afterwards, I also tried with my original installation and, of course, it started failing there, too. No idea what I did wrong there last time – sorry for your trouble!)

Being able to reproduce it, I think I also came up with the correct patch to fix this problem. Please test/review!

Thanks a lot again for your great work on this!

Status: Needs review » Needs work

The last submitted patch, 19: 2916534-19--fix_single_field_partial_search.patch, failed testing. View results

drunken monkey’s picture

Marty2081’s picture

First I reproduced the issue with the 1.x-dev version of the module. Then I patched the module (patch applies cleanly) and tested again. I can confirm that the patch from #23 fixes the issue in my case.

  • drunken monkey committed b5d42bc on 8.x-1.x
    Issue #2916534 by drunken monkey, Johnny vd Laar, ricovandevin,...
drunken monkey’s picture

Awesome, thanks for confirming!
Committed.
Thanks again for everyone’s work in here – especially Marty2081 for his invaluable help in getting to the root cause of this!

drunken monkey’s picture

Status: Needs review » Fixed
Anybody’s picture

Thank you very very much! This works perfectly. A new stable release would be very nice for production.

drunken monkey’s picture

Sure.
We’re currently sprinting during DrupalCon, but I’ll definitely create a new release afterwards (read: next week).

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.