Groupwise maximum ('representative') relationships
| Project: | Views |
| Version: | 6.x-3.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
I've posted a patch that adds views support to image_gallery module, essentially making the existing hardcoded galleries entirely with views.
The quick gist of it is that we need a "cover image" for each gallery in a list of galleries -- in other words, a list of terms needs a "representative node" for each term -- say cover node as it's shorter.
I've implemented several ways of getting this: relationships from term to node and also regular fields, with fairly complex options for how to handle subterms.
There is a detailed explanation of my approach in this comment: http://drupal.org/node/405456#comment-1466564
Predictably, someone has asked me if this could be generalized to non-gallery terms. Obviously it could, but image_gallery is no longer the place for it.
Would part of this approach have a place in Views module?

#1
Yes, term to node and node to term relationships should be in Views itself. I should've done both of those long ago but I haven't had time recently.
#2
If you let me know which parts of my patch you'd like for Views I can work on a patch for Views.
Taking another look at it just now, it occurs to me than instead of implementing several relationships for different options (eg first node by title, newest node, etc) it might be cleaner to have only one relationship with a slew of options. But then this is less easily extended by third party modules.
#3
It's generally best to have a relationship with as few options as possible and rely on filters to provide the options. For terms, 'vocabulary' is the best thing to make the relationship use optionally.
#4
You've lost me now...
My approach has been to use a subquery in the join, so the relationship does this:
LEFT JOIN node node_term_data ON ([SUBQUERY]) = node_term_data.nidEach relationship implements a different subquery to get a cover node nid, eg:
- nodes ordered by creation date
- nodes ordered by title
The patch is here: http://drupal.org/files/issues/image_gallery_views_3_B.patch
#5
So I had a bit of a brainwave this morning while still half asleep :D
To reiterate first, the current approach is made up of: (with the current names from the patch; these should become more generic)
- class image_gallery_join_subquery: expects to be given a subquery by the relationship handler that creates it. This creates a join where the left side of the join clause is a subquery that returns a single value. It can be used with any tables as long as the subquery it's given is correct.
- class image_gallery_handler_relationship_gallery_cover: this is used by several relationship data definitions. Each definition provides a different ORDER clause for the subquery. The relationship handler knows how to build the rest of the subquery.
- about three different relationship data definitions, each giving a different ORDER clause (eg, order nodes by title, by time created, etc). This is that way that users select how the gallery node is chosen: do they want the gallery cover image to be the most recent node, the first node by title, etc.
My new idea is: why not use another view to provide the subquery?
We'd then have:
- class image_gallery_join_subquery: as before, expects to be given a subquery for its left side of the clause.
- class image_gallery_handler_relationship_subquery: Provides a user option to choose a view, and code for checking the view is suitable, and code to get just the query SQL from this chosen view.
- relationship data definition: specifies that the chosen query must be a node query, and gives the correlated field to use in the query.
Benefits:
This would allow the user to set the sort order themselves, and save us having to keep defining another relationship whenever there's a feature request for another one.
The system would be lot more reusable, as a new relationship definition could allow something completely different with different bases (I can't think of useful examples yet... say the most recent ubercart order for each user?)
Questions I'm not sure about:
- how do you get a view to return just the query SQL without running it? I seem to remember I had problems doing this.
- also, the query SQL would have to be built with a dummy argument for the correlated field -- we want the SQL to come back with a %s where we can put the name of the correlated field.
#6
i think http://drupal.org/node/488314 is what you need :)
#7
Won't that will return ALL nodes for each term? That just joins term via term_node to node, so if several nodes have that term, the relationship join will produce multiple rows.
That's useful in some cases, but not what we're after here.
The idea here is to get ONE node for each term that is a sort of 'characteristic' for that term.
For image galleries, that translates as the 'cover image node' for each gallery term.
Here's another example use case:
Suppose I have a node type 'animals' and a taxonomy 'species'.
I want to make a list view of species, with the latest animal in each one, like this:
Horse Latest horse node title Latest horse imagefield Latest horse authorCat Latest cat node title Latest cat imagefield Latest cat author
...
My original approach can provide a handful of relationships that cover major cases like "latest created", "first created", "first by node title".
But this new approach, if it's possible, would allow a user who has installed a voting module to set the sort criteria on the relationship subquery view themselves, and then get this:
Horse Favourite horse node title Favourite horse imagefield Favourite horse authorCat Favourite cat node title Favourite cat imagefield Favourite cat author
...
#8
Update from Drupalcamp UK :)
I've coded a fair bit of this approach and found a problem.
As soon as a sort is added to the subquery view, it gains an extra column. This means it can't be used as a subquery.
The way round this is to wrap it up in another SELECT. But then the outer reference field can't find its outer reference.
The full query is this:
SELECT term_data.tid AS tid,term_data.name AS term_data_name,
term_data.vid AS term_data_vid,
node_term_data.nid AS node_term_data_nid,
node_term_data.title AS node_term_data_title
FROM term_data term_data
LEFT JOIN node node_term_data ON (SELECT subquery.nid FROM (SELECT node.nid AS nid,
node.title AS node_title
FROM node node
INNER JOIN term_node term_node ON node.vid = term_node.vid
WHERE term_node.tid = tid <--- this bit doesn't find the tid on the first line.
ORDER BY node_title ASC
LIMIT 1) AS subquery) = node_term_data.nid
WHERE term_data.vid in ('4')
Longwave & I are looking at an alternative method without subqueries: http://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-i...
Stay tuned :)
#9
Quick update:
Doing the join method seems terribly complex as just about the entire view query needs to be recreated to join on to. This is because we're not just dealing with one table, but with a dataset that's probably made of several tables.
I've yet to manage to get this to work properly even as a bare query in phpmyadmin.
So I've gone back to the subquery approach. The relationship handler has an option that lets the user pick a sort -- this gives the criterion for picking the 'best' item in the subquery (eg comment count, post date).
It all works :)
Just need to get it into a fresh checkout of views and roll a patch.
#10
Here's a patch on views HEAD.
It contains the following:
- a join handler that allows joins with a subquery as the left hand side of the join clause. This is a subclass of views_join.
- a relationship handler that allows groupwise maximum relationships to be defined. For end-users I'm calling these 'representative relationships' as 'groupwise maximum' is a pretty awful term. This allows users to pick a sort within the relationship options to determine the 'maximum' part -- the criterion for choosing a representative in other words. When the user options are saved this whips up a temporary view to generate a query string which is stored. The handler uses this and the subquery join to add itself to the query when it is run in a view.
- a relationship from terms to nodes that uses the handler.
- a help file
This works perfectly for the relationship from terms to nodes :)
You can create a term view and add to it a representative node that gives you...
- the latest node for each term
- the node with the most comments
- the first node by title alphabetically
- ... and so on. Just pick a different sort in the drop-down in the relationship options.
I've attached an example view.
However... when I tested a relationship from users to nodes I found a problem: the field name I need for the outer reference in the subquery is the same as the base field of the subquery, so the WHERE clause in the subquery becomes a tautology and fails completely.
The upshot is that for this to work on other bases, it needs a way of altering all the aliases in the subquery.
Ie, we currently get a subquery of:
SELECT node.nid AS nid
FROM node node
INNER JOIN users users ON node.uid = users.uid
WHERE users.uid = users.uid
ORDER BY node.created DESC
LIMIT 1
and we need something like:
SELECT node.nid AS nid_INNER
FROM node node_INNER
INNER JOIN users users_INNER ON node_INNER.uid = users_INNER.uid
WHERE users_INNER.uid = users.uid /* this one is the outer reference */
ORDER BY node_INNER.created DESC
LIMIT 1
But I am stumped and it's late -- anyone got any ideas?
I am marking as 'needs review', because as it is this is fine for the term relationship.
A subsequent improvement to how the subquery is generated so it works on users won't break that.
#11
Got it working -- I have in front of me a view of users showing the latest node for each one :)
Basically, I've added some stuff to generate the query string differently for our subquery:
- a dummy view class whose only change from the regular query class to...
- our own query class that puts a namespace on all tables.
These classes are only needed when the user saves the relationship options on the main view, as they're just used in options_submit() to generate the subquery which is then saved. So they don't even need to be loaded with the handler. So before I roll a patch: should I put them in an INC file in the same folder as the handler and load it with include_once, or in the includes folder and use views_include?
#12
Patch :)
All works.
Try either:
- term view, with most recent node for each term
- user view, with most recent node from each user.
I tried seeing if I could implement a 2nd method that lets the user pick another view to generate the subquery -- this would let you do "List of users with their most recent nodes of type X", but I ran into horrendous problems with namespacing all the tables: can't figure out how to consistently change the aliases for all tables and the fields that are on them. I reckon we can leave this extra feature till later though.
#13
joachim
I think this patch is exactly what I am looking for. I applied it against the latest head. Can you provide any instructions on how to implement a view as outlined below:
I have a cck type called catalog-Item and a taxonomy vocabulary called Catalog-Categories that is associated with my CCK type.
I would like to create a view that pulls one node per vocabulary term.
So I have three nodes:
Category (term) | Node Title
Breaker | Product 1
Breaker | Product 2
Generater | Product 3
And I want the View to return the following:
Breaker | Product 1
Generator | Product 3
#14
Like this:
- create a term view that shows the terms for your vocab
- add a relationship for 'representative node'
- set the sort order -- not sure what it is you want from your example. Maybe node title?
- add the node title field on the relationship.
#15
Thanks.
I don't see an option under relationship for 'representative node' maybe the patch did not apply?
I would like node title and node image field to display.
#16
@15: clear the Views cache.
You can get anything you like on the node to display -- just add fields.
#17
OK I think I am getting closer.
I cleared the views cache and was able to set up the relationship.
If I set node title to us "Representative Node" i still get two results displaying in the preview code for the same taxonomy term. The node titles are the same, the displayed taxonomy term is displayed but the image paths are different (which is odd since the image path should be tied to the node title).
Circuit Breakers [taxonomy term]
Catalog Item 2 [node title]
/yc156-arc-grid2-web.jpg [node image path]
Circuit Breakers [taxonomy term]
Catalog Item 2 [node title]
/catalog-images/05035d.jpg [node image path]
If I set all fields to be Representative Node then I get two results returned in the preview code although each item is identical but still I think there should be just one result returned.
Circuit Breakers [taxonomy term]
Catalog Item 2 [node title]
/catalog-images/05035d.jpg [node image path]
Circuit Breakers [taxonomy term]
Catalog Item 2 [node title]
/catalog-images/05035d.jpg [node image path]
Thanks for your help on this!
#18
You may be getting terms multiple times simply because of the term view itself -- try making the view again but only with the term stuff. See what happens with that.
#19
I set up a brand new view but it still returns two items of the same term. I only used Node Term as the field and filter by node type.
Here is the SQL code if that helps:
SELECT node.nid AS nid, node_term_data__term_data.name AS node_term_data__term_data_name, node_term_data__term_data.vid AS node_term_data__term_data_vid, node_term_data__term_data.tid AS node_term_data__term_data_tid FROM node node LEFT JOIN term_node term_node ON node.vid = term_node.vid LEFT JOIN term_data term_data ON term_node.tid = term_data.tid LEFT JOIN node node_term_data ON (SELECT node_INNER.nid AS nid FROM node node_INNER INNER JOIN term_node term_node_INNER ON node_INNER.vid = term_node_INNER.vid WHERE term_node_INNER.tid = term_data.tid ORDER BY node_INNER.title DESC LIMIT 1) = node_term_data.nid LEFT JOIN term_node node_term_data__term_node ON node_term_data.vid = node_term_data__term_node.vid LEFT JOIN term_node node_term_data_node_term_data__term_node ON node_term_data.vid = node_term_data_node_term_data__term_node.vid LEFT JOIN term_data node_term_data__term_data ON node_term_data__term_node.tid = node_term_data__term_data.tid WHERE node.type in ('catalog_item')#20
I meant only do the term part -- don't add the relationship. That bit alone might be returning too many rows.
Though that view's SQL looks wrong -- you sure that's a term view? Surely it shouldn't have node as its first table in that case?
#21
OK I was creating a node type view not a taxonomy type view.
Now it is just about working except for one thing. The Preview returns just one term and image path but it also brings up two empty results:
Term: Circuit Breakers
/catalog-images/05035d.jpg
Term:
Term:
SELECT term_data.tid AS tid, node_term_data_node_data_field_catalog_item_image.field_catalog_item_image_fid AS node_term_data_node_data_field_catalog_item_image_field_catalog_item_image_fid, node_term_data_node_data_field_catalog_item_image.field_catalog_item_image_list AS node_term_data_node_data_field_catalog_item_image_field_catalog_item_image_list, node_term_data_node_data_field_catalog_item_image.field_catalog_item_image_data AS node_term_data_node_data_field_catalog_item_image_field_catalog_item_image_data, node_term_data.nid AS node_term_data_nid, node_term_data.type AS node_term_data_type, node_term_data.vid AS node_term_data_vid, node_term_data__term_data.name AS node_term_data__term_data_name, node_term_data__term_data.vid AS node_term_data__term_data_vid, node_term_data__term_data.tid AS node_term_data__term_data_tid FROM term_data term_data LEFT JOIN node node_term_data ON (SELECT node_INNER.nid AS nid FROM node node_INNER INNER JOIN term_node term_node_INNER ON node_INNER.vid = term_node_INNER.vid WHERE term_node_INNER.tid = term_data.tid ORDER BY node_INNER.created DESC LIMIT 1) = node_term_data.nid LEFT JOIN content_type_catalog_item node_term_data_node_data_field_catalog_item_image ON node_term_data.vid = node_term_data_node_data_field_catalog_item_image.vid LEFT JOIN term_node node_term_data__term_node ON node_term_data.vid = node_term_data__term_node.vid LEFT JOIN term_node node_term_data_node_term_data__term_node ON node_term_data.vid = node_term_data_node_term_data__term_node.vid LEFT JOIN term_data node_term_data__term_data ON node_term_data__term_node.tid = node_term_data__term_data.tid WHERE term_data.vid in ('8')#22
Once again: please make sure your terms show properly WITHOUT the relationship or any of its fields. If that works ok then we know it's a problem with this patch.
#23
Sorry for not being clear, with just a Term View filtered by my vocabulary the results returns as expected:
Term: Circuit Breakers
Term: Generators
Term: Load Centers
SELECT term_data.tid AS tid, term_data.name AS term_data_name, term_data.vid AS term_data_vid FROM term_data term_data WHERE term_data.vid in ('8')There are two nodes created under Circuit Breakers and no nodes under the other tems so with Representative Relationship I am expecting
Term: Circuit Breakers & One Image path from a representative node to be returned which is happening but then I get two extra empty results returned.
#24
Ah right -- this happens with any relationship unless you tick the 'Require this relationship' box. I've just tested this on the 'most recent node per user' test view and it works fine.
#25
OK this is now working as expected. Is this patch going to make it into Views?
#26
Has anyone tried one of the views while not logged in as UID 1?
I wonder if this might have an effect: http://drupal.org/node/151910
If it causes problems, an alternative approach might be to use http://api.drupal.org/api/function/db_query_temporary/6
#27
subscribing
#28
Subscribing.
#29
Thanks for this GREAT patch! I spent a good deal of time today figuring out how to do just this before merlinofchaos turned me onto this patch. The one problem I've discovered is that with some choices selected in the "Representative sort criterion:" the field will not be properly prefixed with the table it belongs to. Example: I selected "Comment: Id" and received a SQL error. The relevant part of the query is as follows:
term_node_INNER.vid LEFT JOIN comments comments_INNER ON node_INNER.nid = comments_INNER.nid WHERE term_node_INNER.tid = term_data.tid ORDER BY .cid DESC LIMIT 1)Note that commets is properly joined but in the ORDER BY section the .cid doesn't have a table prefixed to it. It should read comments_INNER.cid. Manually changing it and running the same query from the MySQL terminal works properly.
I've found the same behavior with all book, content, comment, taxonomy and user options. Node and search options work fine.
I don't have time to trouble shoot this at the moment but will try to make time for it soon. This is a fantastic addition to views!
#30
Urgh. I hadn't tested comments.
The problem is that the subquery has to be namespaced -- the 'INNER' suffix on everything -- and that requires a version of the Views object (class view_aliased) that's hacked in several places.
IIRC I left comments at all the points of hackery, so it should be figurable if you get to this before I do :)
#31
Thanks for your work on this. subscribing.
#32
I would like to get this into Views 3.x now that we're seriously working on it, so I'd really love to get some extra attention on this. I apologize that I've let this fall off my radar in all the bajillion thigns I have to do, but this is a really cool feature, often requested. Let's see if we can get this into something we'd like to actually commit.
#33
The problem this was hitting was reliably producing a query for a view in which everything has a namespace prefix -- in the View 2.x query object, prefixes for tables and fields seem to come from a whole load of different places, and I remember problems with them overlapping but not covering everything (eg, I'd get some table names with no prefix at all, then fixing that would result in fieldnames with the prefix twice!).
So the first thing would be to see how this will work with the 3.x query building system.