Whilst running unit tests for Postgres for DB TNG, I hit a failure on the test of do_search() that occurs only on Postgres. Specifically, do_search generates a query akin to the following:

SELECT COUNT(*) 
FROM (
  SELECT i.type, i.sid, (? * COALESCE((n.sticky), 0)) AS score 
    FROM search_index i 
      INNER JOIN search_total t ON i.word = t.word 
      INNER JOIN node n ON n.nid = i.sid 
    WHERE n.STATUS = 1 AND (i.word = ?) AND i.type = ? 
    GROUP BY i.type, i.sid HAVING COUNT(*) >= ?
)

(? being the generic placeholder for the new DB API.) Under Postgres, that results in the following error:

SQLSTATE[42803]: Grouping error: 7 ERROR: column "n.sticky" must appear in the GROUP BY clause or be used in an aggregate function

The code inside do_search() is simply a nightmare of string parsing so I am not even going to attempt to fix it as part of DB TNG. Instead, I'm marking it here for those who actually comprehend search.module to fix when updating that function for the new API. :-)

Comments

chx’s picture

As score is not aggregated this is right to fail. We need like MAX MIN or SUM in there.

damien tournoud’s picture

Status: Active » Needs review
StatusFileSize
new2.56 KB

There are two issues on PostgreSQL: first, PostgreSQL mandates that aggregate functions be used in all non-grouping columns (fair enough), second POWER() on two numeric columns is treated in arbitrary codecision and thus fails with an overflow error.

Before the patch:

MySQL 5.0:
Search engine queries 105 passes, 0 fails, 0 exceptions
Bike shed 16 passes, 0 fails, 0 exceptions
Search engine ranking 26 passes, 0 fails, 0 exceptions

PostgreSQL 8.3:
Search engine queries 105 passes, 0 fails, 0 exceptions
Bike shed 16 passes, 0 fails, 0 exceptions
Search engine ranking 21 passes, 5 fails, 20 exceptions

After the patch:

MySQL 5.0:
Search engine queries 105 passes, 0 fails, 0 exceptions
Bike shed 16 passes, 0 fails, 0 exceptions
Search engine ranking 26 passes, 0 fails, 0 exceptions

PostgreSQL 8.3:
Search engine queries 105 passes, 0 fails, 0 exceptions
Bike shed 16 passes, 0 fails, 0 exceptions
Search engine ranking 26 passes, 0 fails, 0 exceptions

I still have a failure on Search engine ranking on MySQL 5.1 (see #276039: Test "Search engine ranking" fails on MySQL 5.1), both before and after the patch.

alexanderpas’s picture

Priority: Normal » Critical

Because this breaks the functionality on PostgreSQL, this is a critical issue.

Anonymous’s picture

Status: Needs review » Needs work

The last submitted patch failed testing.

damien tournoud’s picture

Title: do_search() fails hard on Postgres » PostgreSQL surge #6: do_search() fails hard on Postgres

Promoted to the PostgreSQL surge.

grub3’s picture

I looked at Search module and found this issue:

line 952:
$select = "SELECT SUM(i.score * t.count) AS calculated_score FROM {search_index} i $join WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d ORDER BY calculated_score DESC";

This query will always fail under PostgreSQL.

The correct query would be:
$select = "SELECT SUM(i.score * t.count) AS calculated_score, i.type, i.sid FROM {search_index} i $join WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d ORDER BY calculated_score DESC";

as GROUP BY columns must appear in the SELECT clause.

My knowledge of D7 internals is not sufficient to provide a patch, because I don't understand (yet) how is used $select.
Can someone fix and submit a patch?

damien tournoud’s picture

@jmpoure: there is already a patch in #2 (that may need a reroll).

Crell’s picture

The patch in #2 is wrong anyway as this is a perfect case where the query builder should make the code 10x more readable. That probably requires considerable refactoring, though. If anyone who knows the search module well enough to refactor the query is interested, I can help with the DBTNG parts. I just don't grok search module in the slightest. :-)

damien tournoud’s picture

No, we need to backport this, so let's fix it in the old way, and only then convert those parts to the new database API.

damien tournoud’s picture

Status: Needs work » Needs review
StatusFileSize
new3.21 KB

Here we are, reroll of #2.

damien tournoud’s picture

StatusFileSize
new6.14 KB

Ok there was three issues here:
(1) As explained above, the queries were wrong, because all some non grouping columns were not aggregated
(2) PostgreSQL needs explicit casts in expressions, without them it defaults to INTEGER
(3) the test was not refreshing the variables AND the "recent" ranking score computing was wrong (it was (0.5^t)/t0 instead of 0.5^(t/t0))... and this only appears in our test when we fixed (1)...

Fixed. Ran flawlessly on MySQL, PostgreSQL and SQLite.

swentel’s picture

Status: Needs review » Reviewed & tested by the community

Patch fixes the search tests!

dave reid’s picture

Status: Reviewed & tested by the community » Needs review

Wow. Awesome work Damien!

dave reid’s picture

Status: Needs review » Reviewed & tested by the community

Arg...did not mean to change status.

josh waihi’s picture

have tested patch: 188 passes, 0 fails, and 0 exceptions for simple tests search module

works great :)

dries’s picture

Status: Reviewed & tested by the community » Fixed

I've committed this patch to CVS HEAD. Admittedly, despite the fact that I spent about 20 minutes looking at the patch, I'm not sure I 100% grok the SUM()-related changes to do_search(). Frustrating patch, but I'll trust Damien's research and judgment. The fact I don't have PostgreSQL installed doesn't really help. :)

damien tournoud’s picture

Status: Fixed » Patch (to be ported)

This will require a partial backport to Drupal 6.

Some background on the SUM() part: the standard search query is:

  SELECT i.type, i.sid, i.relevance AS score
    FROM {search_index} i
      INNER JOIN search_total t ON i.word = t.word
    WHERE i.word = ? AND i.type = ?
    GROUP BY i.type, i.sid HAVING COUNT(*) >= ?

i.type and i.sid are part of the GROUP BY clause, but not i.relevance. This is invalid SQL, even if we know that because (1) (type, sid) is a unique key for {search_index} and (2) t.work is a primary key, we can only have one i.relevance per (i.type, i.sid) couple (we are smarter than the planner...). We simply transform all i.relevance into SUM(i.relevance) to make the query planner happy.

Note that both MySQL and SQLite happily deals with that situation, even if the result can be quite undetermined, when the result set is not unique.

damien tournoud’s picture

Version: 7.x-dev » 6.x-dev
damien tournoud’s picture

Title: PostgreSQL surge #6: do_search() fails hard on Postgres » do_search() fails hard on Postgres
Issue tags: +PostgreSQL Surge
Scott_Marlowe’s picture

Given the off chance that some future change could result in > 1 relevance row per grouped set, it might be best to go with max(relevance) so that we don't start summing multiple rows together and getting odd results unexpectedly.

Unless summing them IS what we'd want should that somehow ever change (i.e. someone adds a table to the query that makes relevance non-unique).

Note that both MySQL / SQLLite and PostgreSQL technically misbehave as regards group by. MySQL's acceptance of items in the select list that aren't grouped by, aggregated, and can't be guaranteed unique is against spec (and some would say common sense.)

PostgreSQL OTOH, according to the spec, should note that the relevance field is unique per tuple output by group by and allow the query to run. But it's a check that isn't made and while it is a TODO item for pgsql, I don't think it's a priority for anyone.

dpearcefl’s picture

Priority: Critical » Normal
Status: Patch (to be ported) » Postponed (maintainer needs more info)

Has this been fixed in the latest D6?

dpearcefl’s picture

Priority: Normal » Major
Status: Postponed (maintainer needs more info) » Patch (to be ported)
jhodgdon’s picture

No, it's not fixed in Drupal 6. If it had been ported, there would be a patch above and the issue would be marked closed/fixed. If you would like to help, make a patch for Drupal 6.

liam morland’s picture

Issue tags: +PostgreSQL

Tagging

jhodgdon’s picture

Version: 6.x-dev » 7.x-dev
Issue summary: View changes
Status: Patch (to be ported) » Fixed

Talked with Gabor (the Drupal 6 branch maintainer) and D6 issues are really not being committed unless they're really essential -- we really don't have a test system for Drupal 6 and it's too dangerous. So... putting this back to D7 / fixed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.