Closed (won't fix)
Project:
Views (for Drupal 7)
Version:
6.x-2.x-dev
Component:
Code
Priority:
Major
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
12 May 2009 at 21:30 UTC
Updated:
15 Nov 2010 at 19:29 UTC
Jump to comment: Most recent file
Comments
Comment #1
darren ohPatch attached.
Comment #2
merlinofchaos commentedI'm going to need a reroll against the 3.x branch as well, it appears. This affects query.inc so the patches will be different.
Comment #3
darren ohRe-roll. I haven't tested the 3.x branch.
Comment #4
merlinofchaos commentedHmm. I think I must've half committed this in the 3.x branch already. That was lame of me. Anyway, committed to 2.x and 3.x branches.
Comment #6
josh waihi commentedI can't say I agree with the whole idea because not only does it look ugly but I question its scale-ability, which is a huge factor for PostgreSQL sites. If we're going to do pgsql specific stuff, I'd rather add non-aggregate fields to the group by clause rather than use first(). Running tests I found that this improved performance by 185%. For the meantime, when other tables with nid are included, this patch fails so attached is the fix for it.
Comment #7
darren ohPlease refer to issue 607418.
Comment #8
darren ohSorry, that issue deals with a different problem. In answer to Josh's comment, adding more fields to the GROUP BY clause could result in duplicate records. We have to use aggregates for Distinct to work. The FIRST() function simply does explicitly what MySQL does implicitly.
I've modified the patch to use the base field alias, as is done in Views 3. Should be ready to commit.
Comment #9
merlinofchaos commentedNote that Josh is concerned that FIRST() causes major performance problems. To which I am sympathetic, but I also feel pulled between terrible forces on this one.
Comment #10
darren ohWe may be able to eliminate the use of FIRST by adding a DISTINCT ON clause. By including every GROUP BY field in the DISTINCT ON clause, we would match the behavior of MySQL.
Comment #11
josh waihi commentedI know this comment isn't going to go down well, but, is there a way that we could wrangle MySQL to perform more like PostgreSQL in this case? I thought that MySQL can handle PostgreSQL syntax fine in this case...
SELECT DISTINCT(field_1), other_fields FROM {table} GROUP BY other_fieldsIs that different toSELECT DISTINCT(field_1), other_fields FROM {table}in MySQL?If not, I'd like to look at the DISTINCT ON idea, though from the documentation, it didn't seem like it was what we're looking for. Hopefully I'm wrong.
@merlinofchaos can you commit #8 please?
Comment #12
YK85 commentedI have come across some issues with distinct.
I was wondering if there has been further development in this area?
Thanks!
Comment #13
grub3 commentedA possible solution would be to use SELECT DISTINCT ON (foo, bar) ON as proposed by Josh Berkus
Reference: http://www.postgresql.org/docs/8.4/static/queries-select-lists.html
Here expression is an arbitrary value expression that is evaluated for all rows. A set of rows for which all the expressions are equal are considered duplicates, and only the first row of the set is kept in the output. Note that the "first row" of a set is unpredictable unless the query is sorted on enough columns to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT ON processing occurs after ORDER BY sorting.)
Could you give an example of the original SQL query working in MySQL and I will do my best to translate it for PostgreSQL using pgAdmin3.
By the way, making extensive use of DISTINCT hits the database performance a lot, because it may provoque sequential scans to differentiate data. Drupal make heavy use of DISTINCT everywhere, because people have the impression that it has no impact on performance. But in database with 100.000 records, it can have a huge impact.
Many times, DISTINCT can be replaced with LEFT JOINs which are much faster, but I need to read your SQL to tell.
In Views, the user clicks on "Distinct" to choose whether results should be distinct. In some cases, when using views, I had the impression that the LEFT ROW was the node and that we could make a LEFT JOIN without asking the user whether it was distinct or not.
Please not this is my first post on Views, so I have NO KNOWLEDGE of any kind on this matter, only SQL.
Kind regards,
Jean-Michel
Comment #14
josh waihi commented@jmpc, this only applies to situations where the user determines that the query must be DISTINCT. Views won't make a query distinct unless the user tells it too. So for example, Joins are generally used where possible.
So our concern here is how Views deals with query building on PostgreSQL when a query must be distinct. View has a working solution at the moment but its crap in terms of performance.
Both Josh Berkus and Darren Oh (#8) suggest using DISTINCT ON which I am fore over the user of FIRST. What we need is a patch that does this and doesn't break MySQL.
Comment #15
robby.smith commentedsubscribing
Comment #16
Letharion commented"Low performance" bugs aren't critical.
@merlin
This one is getting rather old and could use some dev love :)
Comment #17
dawehnerAnd additional we need a patch for views 3.x
Comment #18
merlinofchaos commentedI wasn't really the person working on this, and the DISTINCT thing is kind of an ugly stepchild in Views. It's known to be a hack to try and deal with some issues. It's totally broken in 7.x and probably always will be at this point, so I don't think I am particularly interested in this issue.
If someone wants to take up the cause and fix this, they're welcome to, but for now, I'm just going to close this.