On my 500.000 post forum, I read this query on PostgreSQL logs:
2009-08-25 14:28:30 CEST LOG: durée : 720.124 ms, instruction : SELECT n.nid, r.tid, n.title, n.type, n.sticky, u.name, u.uid, n.created AS timestamp, n.comment AS comment_mode, l.last_comment_timestamp, IF(l.last_comment_uid != 0, cu.name, l.last_comment_name) AS last_comment_name, l.last_comment_uid, l.comment_count AS num_comments, f.tid AS forum_tid FROM node_comment_statistics l INNER JOIN (SELECT DISTINCT ON (nid) * FROM node) n ON n.nid = l.nid INNER JOIN users cu ON l.last_comment_uid = cu.uid INNER JOIN term_node r ON n.vid = r.vid INNER JOIN users u ON n.uid = u.uid INNER JOIN forum f ON n.vid = f.vid WHERE n.status = 1 AND r.tid = 9 ORDER BY n.sticky DESC, l.last_comment_timestamp DESC, n.created DESC LIMIT 50 OFFSET 0
The problem comes from these lines in forum.module:
$sql = db_rewrite_sql("SELECT n.nid, r.tid, n.title, n.type, n.sticky, u.name, u.uid, n.created AS timestamp, n.comment AS comment_mode, l.last_comment_timestamp, IF(l.last_comment_uid != 0, cu.name, l.last_comment_name) AS last_comment_name, l.last_comment_uid, l.comment_count AS num_comments, f.tid AS forum_tid FROM {node_comment_statistics} l INNER JOIN {node} n ON n.nid = l.nid INNER JOIN {users} cu ON l.last_comment_uid = cu.uid INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {users} u ON n.uid = u.uid INNER JOIN {forum} f ON n.vid = f.vid WHERE n.status = 1 AND r.tid = %d");
Again, the problem comes from :
(SELECT DISTINCT ON (nid) * FROM node)
There seems to be design flows in db_rewrite_sql
You should NEVER ***EVER*** begin each transaction containing node table with a complete sequential index on distinct.
I am not proposing a patch and I am asking in which direction to go:
* either fix db_rewrite_sql
* OR remove db_rewrite_sql in this precise query
| Comment | File | Size | Author |
|---|---|---|---|
| #34 | left-join-fix.diff | 1.83 KB | grub3 |
Comments
Comment #1
damien tournoud commentedPlease see #284392: db_rewrite_sql causing issues with DISTINCT.
Comment #2
grub3 commentedDear Damien, please reopen.
This grave issue needs review from the community.
This is such an important issue at Drupal that it could double the speed of the whole framework.
So let us leave this issue open until core developers fix this either locally or in the database abstraction layer.
Comment #3
damien tournoud commentedThere is probably something broken with your installation. There is no reason for db_rewrite_sql() do to anything if you don't have access control enabled, and I see no {node_access} table joined in the query you copied.
Could you please dump the values of
$join, $where, $distinctinside db_rewrite_sql() for this particular query on your installation?Comment #4
grub3 commentedThis is line 574 from forum.module
$sql = db_rewrite_sql("SELECT n.nid, r.tid, n.title, n.type, n.sticky, u.name, u.uid, n.created AS timestamp, n.comment AS comment_mode, l.last_comment_timestamp, IF(l.last_comment_uid != 0, cu.name, l.last_comment_name) AS last_comment_name, l.last_comment_uid, l.comment_count AS num_comments, f.tid AS forum_tid FROM {node_comment_statistics} l INNER JOIN {node} n ON n.nid = l.nid INNER JOIN {users} cu ON l.last_comment_uid = cu.uid INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {users} u ON n.uid = u.uid INNER JOIN {forum} f ON n.vid = f.vid WHERE n.status = 1 AND r.tid = %d");
I don't understand where to look further.
Thanks for your help, I wait further.
Damien, I upgrade to Drupal-dev meanwhile to make sure we use the same code base.
Comment #5
damien tournoud commented@jmpoure: I know where it is, please dump the runtime values of $join, $where and $distinct in your actual Drupal site.
Comment #6
grub3 commentedOkay Damien, I install dev module or display these values by hand.
Comment #7
grub3 commentedStay tuned, I am learning the devel module.
Comment #8
damien tournoud commentedIn db_rewrite_sql(), just after the call to _db_rewrite_sql().
Comment #9
grub3 commentedYes, I read _db_rewrite_sql() and I think it is a flow in design to let a computer run DISTINCT automaticaly when most of the time the query produces distinct results. I installed dev module, I output the information you nee. This is a cool module.
Comment #10
grub3 commentedLook for example, this is my forum:
These queries are natural distinct queries, no need to force them distinct:
SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0, u2.name, ncs.last_comment_name) AS last_comment_name, ncs.last_comment_uid FROM node n INNER JOIN users u1 ON n.uid = u1.uid INNER JOIN term_node tn ON n.vid = tn.vid INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid WHERE n.status = 1 AND tn.tid = 24 ORDER BY ncs.last_comment_timestamp DESC LIMIT 1 OFFSET 0executes in 62 ms not 2380.57
Wait, I give you the needed information.
Comment #11
damien tournoud commentedSomething is wrong on your system, and forces a distinct in those queries. This is clearly not by design.
Comment #12
damien tournoud commentedOh, I see what could happen. Please visit your site using an user that doesn't have the 'administer node' permission, and please copy/paste the resulting queries here.
Comment #13
grub3 commentedOkay. Wait.
Comment #14
grub3 commentedDisconnecting and browsing under Anonymous account gives instant results.
On the converse using a simple user returns very long queries
I need to wait 10 seconds to display the forum list.
Comment #15
grub3 commentedDear Damien, I am upgrading to dev to make sure we are using the same code base.
Comment #16
grub3 commentedAfter upgrading to Drupal-dev, browsing under anonymous returns long queries too:
Comment #17
grub3 commentedOkay, I am lost. Could you tell me exactly how to output the resquested debug information.
Comment #18
damien tournoud commentedWell, the DISTINCT is legit here, but it is not at the correct place. This query should be:
Comment #19
grub3 commentedDear Damien, I remove the contributed access module and it works again:
Three seconds instead of 12.
Comment #20
grub3 commentedBut I still do not understand why results are not cached. I would save a tremendous time on normal middle-size forums. My cache is set to 3 minutes. It is enough. Large forums should use a kind of cache to display.
Comment #21
grub3 commented{node] carries distinct values.
The following query starts processing on {node} and does only INNER JOINS.
Therefore there is no need to force DISTINCT values.
So Drupal code is exact.
Again, when developers understand the logic of DISTINCT, there is no need to add it to SQL.
Adding a DISTINCT on large tables can destroy a query when not needed.
Comment #22
damien tournoud commented@jmpoure: you don't understand how DISTINCT and JOINS work.
^ This query *will* return duplicate entries, even if id is a primary key for table1.
That's exactly what happens here, and why the DISTINCT is needed.
Comment #23
grub3 commentedOkay, I understand:
SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0, u2.name, ncs.last_comment_name) AS last_comment_name, ncs.last_comment_uid FROM node n INNER JOIN users u1 ON n.uid = u1.uid INNER JOIN term_node tn ON n.vid = tn.vid INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid WHERE n.status = 1 AND tn.tid = 2 ORDER BY ncs.last_comment_timestamp DESC LIMIT 1 OFFSET 0Is a complex query returning no result. It is trying to fetch the latest comment on the database container. Therefore it can only result in nothing.
In this case, the cache does not seem to record the result.
Thefore Drupal runs the query, which gets very long answers because of the huge database.
Right?
Comment #24
grub3 commentedDear Damien.
I am afraid no.
In your case, t1.id is a primary key, thus unique.
In SQL92, primary key implies unique Index.
SELECT t1.id FROM table1 t1produces a unique set of t1.id because it is unique by defaut in the database.
When people wrote SQL, they set primary keys and constraint to make sure that records were unique from the entry of the database.
The database make sure t1.id is unique on entry.
Then you agree that this is exactly the same as:
SELECT DISTINCT t1.id FROM table1 t1Then the INNER JOINS produces unique output, by defaut, because it is only inclusive.
Thefore:
Do we agree?
Now in the upper query, last_comment_timestamp and last_comment_name are unique.
A single user cannot enter two comments on the same exact Unix time.
Thefore, the query is unique.
Note that you can force uniqueness setting a double unique index.
Setting unique indexes is the way to make sure tables produces distinct results.
Comment #25
grub3 commentedDear Damien, I tested setting a constraint on a 700.000 entries comment table and some uid,timestamp are not unique. So you are right for the query, but wrong in general about INNER JOINs. But I guess we agree. So you are right to say Drupal should run the DISTINCT clause. But why do we have duplicates?
Comment #26
damien tournoud commentedNo. DISTINCT applies on the *result* of the query, after the query has been processed.
Can return several rows of the t2 table per row of the t1 table (if the relation is 1-N). It would in that case return duplicate values of t1.id. Adding a DISTINCT clause would remove those duplicate values.
Comment #27
grub3 commentedSELECT COUNT (DISTINCT (uid, timestamp)) FROM comments
647021
SELECT COUNT (*) FROM comments
647224
Comment #28
grub3 commentedis the same as
when t1.Id is unique or primary key
If the join is performed on a non unique index, then it may produce duplicates.
So I agree with you.
Comment #29
damien tournoud commentedNo.
Comment #30
grub3 commentedOkay. Do you know why some of my empty SQL results are not cached?
It seems like queries returning empty results are not cached. Right?
For example:
Comment #31
grub3 commentedThis is quite bad, because without this Drupal would be lightning speed.
I will enquire more ... and get back.
Many thanks Damien for help and YES you were right.
The ACL module needs some fixing, urgently.
Comment #32
grub3 commentedDear Damien, I wrote this :
Devel should warn that SQL execution times are from cache
http://drupal.org/node/559158
This is especially with queries returning natural DISTINCT VALUES and that have a DISTINCT in the query.
It make very long queries and people don't notice it Devel module.
If you would like to comment, you are welcome.
Comment #33
grub3 commentedOkay, I think I found the issue debugging the SQL query server side.
The query:
400 ms
should be replaced with:
63 ms.
When a query needs to kwno whether there is a record or not (LIMIT 1 OFFSET 0),
INNER JOINS should be replaced with LEFT JOINS.
Internaly, databases treat INNER JOINS like LEFT JOINS with a DISTINCT CLAUSE.
On large databases like mine this ends-up with a nested loop and DISTINCT sort.
This can be nightmare for a database system.
Comment #34
grub3 commentedBingo:
Excution time divided by two for the forum code !!!
So funny.
Patch attached, please verify under MySQL using Devel module.
Under PostgreSQL, the result is quite impressive.
MySQL internals also treat INNER JOINS like LEFT JOINS with a distinct clause, so this should result in the same gain.
I am going to open a dedicated issue thread for database and start hunting for LIMIT 1 OFFSET 0 queries.
Your guys should leave MySQL, which is not a very serious database and start using PostgreSQL exclusively for devel.
Comment #35
grub3 commentedPatch needs review on a large forum under MySQL.
Otherwize if you know database programming apply safely.
Comment #36
josh waihi commentedYou're talking to PostgreSQL people ;) But keep in mind that Drupal is a application that prides itself in being flexible enough to run on several different OS and databases. Therefore, testing and debugging is needed by all environments, only the ones that get tested stick around. So thanks for sticking with PostgreSQL!
I can't really say much on this subject as I don't enjoy the forum section of Drupal - I'm sure there is a lot of room to improve it in both functionality and performance. In regards to this patch, I really don't think Drupal 6 could actually commit it because its changing SQL in a big way - you can configure PostgreSQL however to optimize how it scans, DBAs do this quite often as the size of there applications grow. You can use
EXPLAINandEXPLAIN ANALYZEat the beginning of queries to find the bottle necks and tweak PostgreSQL accordingly to see if it helps at all.Comment #37
nnewton commentedWhile the relationship between LEFT and INNER join for performance is true and should be considered, its not as clear cut as presented here. A "DISTINCT", although thats not quite what is going on at least for MySQL is not always a performance killer if its well indexed. To that point, an INNER JOIN gives the optimizer more indexes to choose from in the JOIN order than does a LEFT JOIN. This can be a much larger performance "win" than the LEFT joins lack of DISTINCT. Also, if your not careful just blindly replacing INNER JOINs with LEFT JOINs can kill performance depending on the join order and the query, as there are added padded nulls going on here. That can work to your favour, as in your case above, or not as the case maybe.
"Your guys should leave MySQL, which is not a very serious database and start using PostgreSQL exclusively for devel."
Also, please leave this sort of thing at the door.
Comment #38
grub3 commentedI fixed the explanation about LEFT JOINs vs INNER JOINS here:
Replace INNER JOINS with LEFT JOINS whenever possible
INNER JOINs are processed as LEFT JOINs and then the parser looks for NULL values to remove them.
On large tables, an INNER JOIN has more chances to result in a sequential scan, either in memory or on disc. When SORT BY is used, it may even be worse.
On the converse, a LEFT JOIN has more chances to be optimized by the parser, as the database will not bother for removing NULLs thus will be able to access data directly. This leads to performance result ranging from 1 to 5 or more.
This is true for PostgreSQL and MySQL and many other databases.
LEFT JOINs are known to be faster than INNER JOINs in all databases systems.
Sorry, there is no link with DISTINCT.
I wrote that texte during the night and was quite tired.
I hope that you have enough information now.
Please apply the patch safely.
Comment #39
damien tournoud commented#38 makes very little sense.
Could you please run an EXPLAIN on those two queries and report the result? It's probably that the query planner is making the wrong choice here, for some reason. Let's try to understand it, instead of jumping to far-fetched conclusions.
Comment #40
nnewton commentedAt least for mysql I believe you are confused as to how it executes these joins. The looping algorithm shouldn't have to discard nulls really, the block looping algorithm...possibly, but I don't really think its likely to impact much. The queries in #33 execute in .20 seconds and .21 seconds respectively on one of my servers, with the inner join winning. That was on a database of 500,000 nodes.
Comment #41
grub3 commentedThanks I am in contact with PostgreSQL hackers and we enquire on that query.
It seems that LEFT JOINs in MySQL do not execute as fast they could.
According to PostgreSQL developers, this is a know issue.
In PostgreSQL, I can execute this query in 63ms. Maybe drive it further down.
v.s. 200ms under MySQL (see previous post)
So the best is to let us enquire and I will report back later on.
My plan:
As you can notice, a cast happens which forces nested loop.
If we can avoid this cast, the query may execute in 20 ms or less.
We are enquiring on that issue.
I will get back to you.
This is extremely interesting to be able to execute queries on large installs to analyse plans and results.
Kind regards,
Jean-Michel
Comment #42
grub3 commentedI closed this issue.
The planner does not have enough information to find the right query plan.
Eventually, the query planner may even rewrite the query itself.
My database is running on a 650.000 nodes install.
But there are no real connections from the internet.
I will only be able to review this issue when going live.
Reference: http://drupal.org/node/562180