When selecting nodes and sorting on a CCK field (a very typical use case), a massive performance gain can be had by reversing the order of the join:
SELECT node.nid AS nid, node.title AS node_title FROM node node
LEFT JOIN content_type_listing node_data_field_city ON node.vid = node_data_field_city.vid
ORDER BY node_data_field_city_field_city_value ASC LIMIT 0, 100
100 rows in set (1 min 5.36 sec)
VS
SELECT node.nid AS nid, node.title AS node_title FROM content_type_listing node_data_field_city
LEFT JOIN node node ON node.vid = node_data_field_city.vid
ORDER BY node_data_field_city_field_city_value ASC LIMIT 0, 100
100 rows in set (0.01 sec)
This is because it's doing the sort of the primary table.
I wonder if there is any way that Views can be smart enough to make queries in the faster direction?
| Comment | File | Size | Author |
|---|---|---|---|
| #24 | cck_372994-cck2.patch | 1.15 KB | Scott Reynolds |
| #24 | cck_372994-cck3.patch | 1.11 KB | Scott Reynolds |
Comments
Comment #1
merlinofchaos commentedThat is one place where it's really going to be tough. Even though Views is no longer 'node' centric, it is still tied very closely to the primary table and reversing that is something that would take some serious analytical work to do.
Comment #2
davidburnsWouldn't it be safe to say that a majority of the time if the query involves a core table, that table should always be listed first in the query?
While this isn't a global fix, it has the opportunity to at least improve on a number of views that a site would have.
Comment #3
merlinofchaos commentedGiven the modular nature of the query builder, the unclear definition of 'core table', the fact that the real speedup has to do with the WHERE clause matching up with the primary table, you would need some pretty good analytics to do this. This kind of optimization can yield good results, but can also mess queries up, badly, if it gets things wrong.
One thing I learned writing this thing is that the order of JOINs is very important, so attempting to reshuffle things can have major effects. You have to understand the full query to do it. Nothing in the Views query builder ever understands more than its own piece.
Comment #4
robertdouglass commentedYeah - those are the answers I was expecting. I thought of two things. Either we build some query analyzer/optimizer which spots this stuff at the db layer (potentially really cool but fraught with danger), or we give admins a chance to build views that start with a query. It would be great to have an interface that lets you start a view with a query, and to have the formatting and previewing options like before (clearly the query builder part would be turned off in this case). Or we build some sort of decision tree analyzer into views that rewrites some queries in some known cases.
Comment #5
robertdouglass commentedOr we teach Views multiple ways to build some queries and Views tests each one to see which is faster.
Or David Strauss's materialized views will make this discussion irrelevant.
Comment #6
Scott Reynolds commentedFor my interest sake, could you post the EXPLAIN of those SELECTs. Really surprised at this one, MySQL query optimizer usually does a good job of handling this stuff.
Wondering if you could hook into the view querying process and add in a JOIN hint?
I think the EXPLAIN will be more enlightening. As that will expose where the bottlenecks are. There probably is a good reason MySQL decided to JOIN the way it did, as it picks the JOIN order and I think looking at what it says will showcase a way through this.
Comment #7
Scott Reynolds commentedHaving thought about this a lil bit more, I think I understand what Earl is talking about. It revolves around the LEFT JOIN. If it was an INNER JOIN, MySQL would have done it the way described as faster. This is because it has to with the node table to a ALL access type, because there could be NULL values.
In the second query, it is entirely different as it will do an ALL access type on the cck table and not the node table which is significantly smaller.
So had that query used an INNER JOIN it would have been better as MySQL would have scanned the cck table first regardless of the node table being the first table described. (And I also believe that the INNER JOIN is really what you desire here, as you arn't returning other nodes of type listing)
To verify this, remove the ORDER BY, and you will have NULL values. Then change it to an INNER JOIN and boom, problem fixed no NULL values.
Interesting problem, I would just views_data_alter() that table and make it an INNER JOIN and move on :-D.
Tough problem, pretty sure though that Views can't determine this at build time and CCK does this on purpose. It really is a CCK issue, should it be a LEFT JOIN ? OR should it be an INNER JOIN? from a pure perspective, it should be a LEFT JOIN but... the majority use case calls for an INNER JOIN.
Comment #8
robertdouglass commentedScott, thanks for the insights. I understand Earl's comments better now. I think we should brainstorm around "Interesting problem, I would just views_data_alter() that table and make it an INNER JOIN and move on :-D" and see if there are some ways to help people tune their sites and queries. Perhaps some sort of audit tool, and a way to identify views that need the views_data_alter treatment?
Comment #9
Scott Reynolds commentedSo i can explain a bit further. So if you as the developer know that anytime I am selecting fields from the content_type_listing table then in a hook_views_data_alter(&$data) I would change that data definition to use an INNER instead of a LEFT.
Now building a tool within the Views interface would be difficult because the interface doesn't have a concept of table. But perhaps you could build an analyze tool, via a hook_views_analyze(). But even that is per View basis. And the trick described above is affects all Views, thus if you had a View you wanted a LEFT JOIN and another that you wanted as a INNER JOIN you would have problems.
I think the trickiest part might just be figuring out where to ask for "JOIN type"....
Comment #10
webchickSorry, just subscribing for now. Thanks for the thoughtful discussion. I'll see if I can add anything to it in the coming weeks.
Comment #11
webchickBoy. You weren't kidding that INNER JOINs are faster. I just added these ~10 lines to my custom module and shaved 50% off my page loads, which were excruciatingly slow post-D6 upgrade:
Comment #12
webchickEarl, is switching a field to use an INNER JOIN something you'd even remotely consider adding to the UI of Views (and/or is it even possible)? That's what I was looking for when I came searching and found this post instead.
I could picture this working the same way it works to "require" a relationship:
Comment #13
catchsubscribing.
Comment #14
febbraro commentedsubscribing
Comment #15
merlinofchaos commentedFields don't have control over the join that gets used, though. What happens when you have 7 fields off the same table, 3 say to make it INNER and 4 say to make it LEFT?
Comment #16
merlinofchaos commentedBesides, why is that even in the Views queue? CCK controls the left/inner switch on its stuff, not Views. =(
Comment #17
Scott Reynolds commentedI agree with Earl here, CCK should add a relationship to the node table based on node type.
This would then allow for the "require".
I haven't had the time to write the patch, but it should be incredibly straight forward? Because of multi-values, you would have to set up maybe two relationships, one to the "bundled" table and the other to the "delta" table.
edit: This I feel is what really makes relationship so powerful. We can combine them
Comment #18
markus_petrux commentedThis should be addressed first for CCK2 (CCK3 is multigroup stuff).
~~~~~
I think this problem is not easy as it seems. The query always starts from the base table, which is {node}. Then, when you add a CCK field, the table where this field is stored adds a join to the node table (by vid). If you want to reverse this join, the view would have to start from the field table, I think. This join is LEFT, and maybe this could be INNER, but...
1) this is a new feature and it could break other sites, so it should be optional and remain LEFT by default.
2) we could add this as an option, but since this kind of joins are triggered when a CCK field (filter, sort, argument, etc.) is added to the query, where do we add this option? Not easy because the type of join affects all columns of the affected tables.
3) what do we do with joins related to nodereference relationships?
Comment #19
Scott Reynolds commentedI assigned to 3.x because its the front, but u CCK guys do with as you will :-D
But see thats why im advocating relationships. You can choose, on the Field, to use a relationship. (that is btw, how we handle nodereference).
Again, my concern is handling the 'bundling' vs 'delta' fields. But I think that is really the only trick...
Please don't get lock down on optional LEFT/INNER. Relationship handlers do this for you out of the box and allow you to stack them.
Comment #20
markus_petrux commentedNo worries. CCK3 was just a way to help manage the multigroup module without messing too much the main CCK branch for D6, which is still CCK2. See #494100: State of the multigroup module.
Back to the *thing*... so what can we do here?
1) If we want to reverse the order of the first join in the query, unless there's a way to do it and allow the user choose from Views UI (please chime in if you know how to), then we need to start creating the view from the field table. But the problem is that decision is hard to guess until you know why you really need to reverse the order of these tables, and which one needs to be first.
2) We could also let the user choose the type of join between the node table and the first field table, but we cannot turn it into INNER because it will break existing queries where the relation between the node and that particular field table is optional. But here, there's no place in the Views UI where to allow the user choose the type of this join. So...
3) Then we have the joins added by CCK relationships. These have a way to choose the type of join already, but this is not between existing table and the field table, but between the field table and the new node table related to it. Do we want to reverse these joins too?
In think 1) is not a good solution, and 3) is a lot more complex and probably something more generic that affects all other kinds of relationships in Views. And 2)... it somehow depends on Views UI. Since we cannot change these joins to INNER type, where do we ask the user for this option? Isn't it something that affects a lot more stuff that just CCK that basically depends on Views' features?
Comment #21
Scott Reynolds commented4.) a "dirty" solution. Duplicate all CCK $data tables, except change the join to node as INNER
Comment #22
Scott Reynolds commentedActually, Flag module does something cool.
So it basically adds to the 'node' (or 'users' or 'comment') a relationship field that maps the base table 'node' (or 'users' or 'comment') to the flag_content table. CCK could do much the same thing. You notice the use of 'base' and 'base field' are the 'flag content' stuffs. and the 'relationship field' is the nid (or uid or cid). This is actually what CCK could do.
Comment #23
pribeh commentedAddressing webchick in number 11: how can I use the code you posted? Sorry, I'm not a module developer. [updated] nevermind, this is over my head. I'll wait until Scott and/or Markus develop a patch.
Comment #24
Scott Reynolds commentedAttached is a patch against CCK3 and CCK2.
Adds a relationship handler that you can filter the results to ONLY those with the field you specify.
Comment #25
Scott Reynolds commentedUpdated title to really reflect this patch
Comment #26
Scott Reynolds commented/me sighs
That doesn't work...
Its like teasing me...
Comment #27
Scott Reynolds commentedok so the problem with the above patch, is that views standard relationship handler requires the 'base' property to be both
$def['table'] = $this->definition['base'];So afraid that it will require yet another (tm) relationship handler.
Comment #28
pribeh commentedHi,
Just wondering if there was any update on this. I know Scott is uber busy so if it would be cool to hear if anyone else has taken a jab at this.
Comment #29
crea commentedSubscribing. Cool stuff.
Comment #30
radiobuzzer commentedSubscribing