When using Views aggregation setting this module does not seem to be able to filter out duplicates. I'm not sure if this is a bug or working as intended, but in either case being able to do that would be very useful, as duplicates in Views are far worse when also trying to use Views Aggregation.

Alternatively, adding aggregation functions to the fields (e.g. Count, Average, etc) from within this module so that we can avoid using the Views Aggregation all together would be great.

Comments

2pha’s picture

Have you tried the 'distinct' setting under 'Query settings' ? It is under 'Other' on the right hand side of a view admin screen. Views 3 provides this distinct functionality without the need of a module.

Juryiel’s picture

Yes, but it doesn't work in many situations. Basically regular Views aggregation doesn't do a great job of eliminating duplicates no matter what you do if you have complicated filters and relationships.

jay.dansand’s picture

Assigned: Unassigned » jay.dansand
Status: Active » Postponed (maintainer needs more info)

Can you give me more details about your View, aggregation settings, and Views Distinct settings (what field you're setting to "Filter Repeats," and if you're doing it post-render or pre-render)?

Juryiel’s picture

Sure, I've tested the view quite a bit so here's the simplified version that causes issues. If I'm not being clear I can export it for you.

I have 4 nodes of Type A,

They have a field, Field A, and several other numeric fields

Two of the nodes have Value 1 in Field A, one has Value 2, and one has Value 3

I have a filter for node type = Type A, and a contextual filter for Field A to be grabbed from URL

The view works great when I'm not aggregating, and I get back whatever I filter for in the Contextual Filter, no duplicates. However, if I use Views Aggregation (even without any sorting, which causes issues for Views Aggregation), I always get one of the nodes repeated for each Value in Field A. If I ask for all the nodes (no value passed to the contextual filter), for example, I get 7 nodes. 3 of them have Value 1, 2 of them have Value 2, and 2 of them have Value 3. If I ask for all nodes with contextual filter Value 1, once again, I get 3 nodes returned even though I only have 2 nodes. I tried the Distinct and Pure Distinct settings etc from Views, and I also tried to filter out repeats with this module, neither of which worked. Note that in my Views Aggregation Settings I'm using an averaging function, but really, any of the functions I use return duplicates - I compared Nid for example using Count (returns 7 when no Contextual Filter is passed), vs. Count Distinct (returns 4 when no contextual filter is passed).

I'm not sure what pre-and post-render are, but if you mean this setting: 'Use the rendered output of this field', I tried both. The option I'm using in Views Distinct is 'Filter Repeats'.

Also I should say, Views Aggregation doesn't allow per-field aggregation, so if you just want to use its aggregation functions like AVG, COUNT, MAX, etc, on only one field while leaving other fields ungrouped, it's not possible. I don't know much about SQL queries so I'm not sure how difficult it would be to add functions to Views Distinct (so for example, as opposed to just grouping restuls together with Aggregate Repeats, a setting that removes duplicate results and averages the rest would be a useful feature request).

jay.dansand’s picture

Which field are you setting "Filter Repeats" on?

Juryiel’s picture

The numeric fields (so any fields that are not Field A) as well as Content: nid. So like, Field 1, Field 2, and Field 3 are on the node and numeric. I want to pull all Field 1 from each node whose Field A matches the Contextual Filter, and average them together, all Field 2 and average them, etc, but I'm getting duplicates as described, one duplicate for each unique value returned by the Contextual Filter that runs on Field A.

jay.dansand’s picture

I tried a quick toy example and everything seems to work right:

1) Created a view on Users.
2) Created a relationship on "User: Content authored".
3) Displayed the fields "nodes: Content nid" and "User: Name", which ended up with a ton of repeated names (as expected).

Then I tested Views Distinct:
4) Set the User: Name field to "Filter Repeats," which worked as expected and removed results with repeat names.

Then I enabled Views' Aggregation, and tested it out:
5) Edited Aggregation Settings on the field "nodes: Content nid" and set Aggregation type to "Average" and the results were as expected: still one line per user, but under Nid I have the expected useless decimal fractions (since Averaging doesn't make much sense on NIDs, but this is a toy example).

Do you have output or query caching turned on in the View?

Juryiel’s picture

I'll have to wait a few hours until I'm home to check the View settings. I can export the view and post it here too if that would help. Also, how do you know that the NID average is correct and no nodes are repeated in the averaging calculation?

The error is not that I get multiple values, but that the Average is wrong. E.g. when the Field 1 values are 4 and 2 in each of two nodes, the average should be 3. However, the average returned ends up being (4+2+4)/3 because one of the nodes is somehow being duplicated.

Can you recheck your view by doing both a Count, and a Count Distinct on your NID? If everything is working right those should return the same values for each user. If not, Count will be higher.

jay.dansand’s picture

Count and Count Distinct both return the same numbers. Then I did Sum and divided that by the Count numbers manually, and the Average came out correctly.

A thought just occurred to me. I don't think you can solve your problem with Views Distinct (which happens after the SQL query) - the query itself is what creates the aggregation using the SQL GROUP BY function. That means the aggregated value is built by the database, before Views Distinct can run and eliminate duplicates.

So, to solve the duplicate problem, you'll need to fix your query so it doesn't have duplicate rows based on the relationship(s). You can probably do that via hook_views_query_alter() by altering where the GROUP BY is added, or following the example at Forcing SQL groupby in views 3 with query comments where the whole GROUP BY is added in that function, not just altered there.

Juryiel’s picture

Yeah I figured that was the case but not knowing about this module I wasn't sure if you could make it operate before the query was done, or if it was doing something after.

However, one way Views Distinct could help with this is if we could have aggregation functions (like average for example) like Views Aggregation gives. Then I could just not use Views Aggregation at all and it would actually be helpful to be able to aggregate individual fields rather than everything in the view. I guess this might be outside of the scope of this module, but I figured it might not be since you can do more than 'Filter Out' the non distinct values. If it is, feel free to close this.

Thanks for the links. Unfortunately I'm not too familiar with SQL, but hopefully there's enough information there for me to figure it out.

Juryiel’s picture

I was able to solve my problem in a roundabout way (PHP field in views to do the averaging myself via PHP rather than using Views Aggregation). I'd still be interested in a solution since it would be more convenient, so if you're interested in adding this type of functionality to the module let me know and I can reproduce my problem view for you.

jay.dansand’s picture

Status: Postponed (maintainer needs more info) » Postponed

I'm glad you could work it out!

Another option for adding this functionality to the module could be to make a 3rd method for Views Distinct to aggregate duplicates: modifying the underlying query before it is executed and grouping on just the desired field(s) instead of on the whole base table as Views does by default. That'd let the logic get handled by the DB, which allows for more optimization and query caching.

Thanks for the suggestion! I don't know when I'll have time to code it, but whenever I do, it could be a beneficial addition.