Whilst running unit tests for Postgres for DB TNG, I hit a failure on the test of do_search() that occurs only on Postgres. Specifically, do_search generates a query akin to the following:
SELECT COUNT(*)
FROM (
SELECT i.type, i.sid, (? * COALESCE((n.sticky), 0)) AS score
FROM search_index i
INNER JOIN search_total t ON i.word = t.word
INNER JOIN node n ON n.nid = i.sid
WHERE n.STATUS = 1 AND (i.word = ?) AND i.type = ?
GROUP BY i.type, i.sid HAVING COUNT(*) >= ?
)
(? being the generic placeholder for the new DB API.) Under Postgres, that results in the following error:
SQLSTATE[42803]: Grouping error: 7 ERROR: column "n.sticky" must appear in the GROUP BY clause or be used in an aggregate function
The code inside do_search() is simply a nightmare of string parsing so I am not even going to attempt to fix it as part of DB TNG. Instead, I'm marking it here for those who actually comprehend search.module to fix when updating that function for the new API. :-)
Comments
Comment #1
chx commentedAs score is not aggregated this is right to fail. We need like MAX MIN or SUM in there.
Comment #2
damien tournoud commentedThere are two issues on PostgreSQL: first, PostgreSQL mandates that aggregate functions be used in all non-grouping columns (fair enough), second POWER() on two numeric columns is treated in arbitrary codecision and thus fails with an overflow error.
Before the patch:
After the patch:
I still have a failure on Search engine ranking on MySQL 5.1 (see #276039: Test "Search engine ranking" fails on MySQL 5.1), both before and after the patch.
Comment #3
alexanderpas commentedBecause this breaks the functionality on PostgreSQL, this is a critical issue.
Comment #4
Anonymous (not verified) commentedThe last submitted patch failed testing.
Comment #5
damien tournoud commentedPromoted to the PostgreSQL surge.
Comment #6
grub3 commentedI looked at Search module and found this issue:
line 952:
$select = "SELECT SUM(i.score * t.count) AS calculated_score FROM {search_index} i $join WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d ORDER BY calculated_score DESC";This query will always fail under PostgreSQL.
The correct query would be:
$select = "SELECT SUM(i.score * t.count) AS calculated_score, i.type, i.sid FROM {search_index} i $join WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d ORDER BY calculated_score DESC";as GROUP BY columns must appear in the SELECT clause.
My knowledge of D7 internals is not sufficient to provide a patch, because I don't understand (yet) how is used $select.
Can someone fix and submit a patch?
Comment #7
damien tournoud commented@jmpoure: there is already a patch in #2 (that may need a reroll).
Comment #8
Crell commentedThe patch in #2 is wrong anyway as this is a perfect case where the query builder should make the code 10x more readable. That probably requires considerable refactoring, though. If anyone who knows the search module well enough to refactor the query is interested, I can help with the DBTNG parts. I just don't grok search module in the slightest. :-)
Comment #9
damien tournoud commentedNo, we need to backport this, so let's fix it in the old way, and only then convert those parts to the new database API.
Comment #10
damien tournoud commentedHere we are, reroll of #2.
Comment #11
damien tournoud commentedOk there was three issues here:
(1) As explained above, the queries were wrong, because all some non grouping columns were not aggregated
(2) PostgreSQL needs explicit casts in expressions, without them it defaults to INTEGER
(3) the test was not refreshing the variables AND the "recent" ranking score computing was wrong (it was (0.5^t)/t0 instead of 0.5^(t/t0))... and this only appears in our test when we fixed (1)...
Fixed. Ran flawlessly on MySQL, PostgreSQL and SQLite.
Comment #12
swentel commentedPatch fixes the search tests!
Comment #13
dave reidWow. Awesome work Damien!
Comment #14
dave reidArg...did not mean to change status.
Comment #15
josh waihi commentedhave tested patch: 188 passes, 0 fails, and 0 exceptions for simple tests search module
works great :)
Comment #16
dries commentedI've committed this patch to CVS HEAD. Admittedly, despite the fact that I spent about 20 minutes looking at the patch, I'm not sure I 100% grok the SUM()-related changes to do_search(). Frustrating patch, but I'll trust Damien's research and judgment. The fact I don't have PostgreSQL installed doesn't really help. :)
Comment #17
damien tournoud commentedThis will require a partial backport to Drupal 6.
Some background on the SUM() part: the standard search query is:
i.type and i.sid are part of the GROUP BY clause, but not i.relevance. This is invalid SQL, even if we know that because (1) (type, sid) is a unique key for {search_index} and (2) t.work is a primary key, we can only have one i.relevance per (i.type, i.sid) couple (we are smarter than the planner...). We simply transform all
i.relevanceintoSUM(i.relevance)to make the query planner happy.Note that both MySQL and SQLite happily deals with that situation, even if the result can be quite undetermined, when the result set is not unique.
Comment #18
damien tournoud commentedComment #19
damien tournoud commentedComment #20
Scott_Marlowe commentedGiven the off chance that some future change could result in > 1 relevance row per grouped set, it might be best to go with max(relevance) so that we don't start summing multiple rows together and getting odd results unexpectedly.
Unless summing them IS what we'd want should that somehow ever change (i.e. someone adds a table to the query that makes relevance non-unique).
Note that both MySQL / SQLLite and PostgreSQL technically misbehave as regards group by. MySQL's acceptance of items in the select list that aren't grouped by, aggregated, and can't be guaranteed unique is against spec (and some would say common sense.)
PostgreSQL OTOH, according to the spec, should note that the relevance field is unique per tuple output by group by and allow the query to run. But it's a check that isn't made and while it is a TODO item for pgsql, I don't think it's a priority for anyone.
Comment #21
dpearcefl commentedHas this been fixed in the latest D6?
Comment #22
dpearcefl commentedComment #23
jhodgdonNo, it's not fixed in Drupal 6. If it had been ported, there would be a patch above and the issue would be marked closed/fixed. If you would like to help, make a patch for Drupal 6.
Comment #24
liam morlandTagging
Comment #25
jhodgdonTalked with Gabor (the Drupal 6 branch maintainer) and D6 issues are really not being committed unless they're really essential -- we really don't have a test system for Drupal 6 and it's too dangerous. So... putting this back to D7 / fixed.