The following functions have poor database performance:
- forum_get_topics
- forum_get_forums
The problem is that they restrict by taxonomy terms (one table) and sort by latest updates (another table). This will always result in temp tables and filesorts. The solution is unifying the WHERE criteria and SORT BY criteria into one table. In the case of forum_get_topics, the table is term_node. For forum_get_forums, it's term_data or term_hierarchy.
By putting a latest_change_timestamp (which would contain either the node's latest change timestamp or the latest comment's) into these tables, we can paginate forums without expensive queries. Once paginated, it's okay to do something a bit more expensive with each item.
Comment | File | Size | Author |
---|
Issue fork drupal-145353
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #1
catchWould adding the extra column and doing the JOIN on forum rather than term_node work? Seems like a better place for it to live if so.
they're essentially the same I think:
... but forum only deals with forum nodes and terms - which on my site means 8,500 rows instead of 25,000. Even on sites with different ratios of forum nodes and terms it's never going to be more.
Comment #2
catchHad a quick look at _forum_topics_unread.
Moved the join from term_node to forum, and removed the AND n.type = 'forum' because it's not necessary with the forum table I think.
devel module says:
Don't think I believe these results, but it's hopefully tidier sql, and I don't see any reason for the joins on term_node when forum works just as well with a smaller table. Would an index on nid, tid help? at the moment there's one for each, but not for both.
/me waits to be shot down for stupid oversight
Comment #3
David Strauss@catch {term_node} already has an index on tid, nid, and a combination of the two.
Comment #4
catch...now I see it. I reckon adding the index to forum would still cut that WHERE clause, plus I don't see a good reason to add last_comment_timestamp and updated to term_node if it's only needed for forum nodes. It means any performance hit on adding is restricted only to forums, but with all the read benefits. Unless similar JOINS are happening elsewhere that could use those fields, but I can't think of anything obvious.
Comment #5
webchickThis is a patch.
Comment #6
catchin that case it'd need
ALTER TABLE {forum} ADD INDEX forum_somethingdescriptive (nid,tid)
somewhere, but I'm not sure exactly where that goes yet.
Comment #7
Tobias Maier CreditAttribution: Tobias Maier commentedthis goes into the forum.install file
read therefore about the new database schema api http://drupal.org/node/146843
http://drupal.org/node/146865
Comment #8
catchThanks Tobias Maier.
Here's a new patch, quick summary:
in _forum_topics_unread it replaces the join on {term_node} with a join on {forum}, and adds an index to forum on tid,nid
This allows us to elimate
type = 'forum'
in_forum_get_topics_unread
(because all nodes in the forum table should be in the forum anyway). It also paves the way for denormalisation into {forum} instead of {term_node} for the query David Strauss outlines above, which would lead to less of a performance hit on inserts - since it'd be restricted to nodes that appear in that table instead of every node.As a side effect, it also allows for a future decoupling of forum.module from the forum node type, since this query would now work with other node types.
Taking a quick look, {term_node} is also used in:
_forum_new
theme_forum_topic_navigation (shudders)
forum_get_forums
forum_get_topics
and could be easily switched in all cases. I'll not do that yet in case I'm going in the wrong direction.
There's also a join on node_comment_statistics that requires a conditional on type = 'forum' in
forum_block
- which could be simplified by putting the latest_changed_timestamp into {forum} as well.Comment #9
David Strauss@catch It looks like a good patch, though I haven't tested it yet. I think it has an extra tab from one of the lines you've added.
Comment #10
catchThis one should fix the code style issue.
Comment #11
Tobias Maier CreditAttribution: Tobias Maier commentedhello catch,
thanks for your patch.
You added the schema update, but you need also an update function for the .install file
thanks
tobi
Comment #12
catchthanks for the review.
This one hopefully with proper update function in .install
No idea how to work out which update number it is for forum, so stuck it at 1.
Comment #13
Tobias Maier CreditAttribution: Tobias Maier commentedthanks for your fast response.
I have some more remarks for you:
you can read here about the update naming convention
http://drupal.org/node/114774#update-n
you did the update function the old way.
see system_update_6019() at system.install
http://cvs.drupal.org/viewcvs/drupal/drupal/modules/system/system.instal...
there you can see how this is gets done using schema api
Comment #14
catchThanks again Tobias.
OK I think I've got it this time. Last patch of the evening though, it's bed time!
Comment #15
ChrisKennedy CreditAttribution: ChrisKennedy commentedforum_update_6001() is missing a brace and doesn't return $ret.
Comment #16
catchThanks. Even my cut and pasting is sloppy :(
Here it is again.
Comment #17
catchOK this patch replaces all references to {term_node} with {forum}, and removes the redundant
n.type = 'forum'
checks from those queries.I also managed to get rid of a join in
_forum_get_topic_order($sortby);
- it was using both {forum} and {term_node} in the same query, for no good reason as far as I can see.It doesn't yet remove the references to
n.type = 'forum'
in forum_block because there's no reference to term_node, but that's not really the point of this issue. However, I'm waiting on David Strauss' denormalisation in to {forum} for that - since I reckon querying on {forum} instead of {node_comment_statistics} when it's got the 'last updated' column would have the same effect. Assuming the eventual result of this patch and related ones might be to remove node_comment_statistics altogether, this may well be necessary later, and would also be consistent with the rest of the module.Comment #18
David Strauss@catch I'm expecting that you mean denormalization into {forum} for latest update timestamps?
Comment #19
catchnormally means denormalisation into {forum} right? ;)
Comment #20
David Strauss@catch There are all sorts of things I can denormalize into {forum}. I was asking if timestamps were what you had in mind.
Comment #21
catchSorry, caught the wrong end of that question, and apologies for mild sarkiness!
Yes I meant timestamps as in your first post:
But as you know, doing that on {forum} instead so that can happen on a smaller result set, and avoid any unnecessary additional inserts on non-forum node types.
Comment #22
catchjust seen this, will need to be re-rolled taking into account: http://drupal.org/node/144295
Comment #23
David StraussIt doesn't seem to need rerolling to me. Nevertheless, here's an updated patch with a tiny key improvement that applies without any offsets.
Comment #24
David StraussRerolled to fix an issue created by #144295. The patch overlaps somewhat with later fixes in that patch because they're necessary for proper forums function.
Comment #25
David StraussForgot the attachment.
Comment #26
David StraussFix a few more glitches.
Comment #27
David StraussOptimizing the forum topics listing is blocked by #148849.
Comment #28
catchComment #29
lilou CreditAttribution: lilou commentedAdd tag.
Comment #30
moshe weitzman CreditAttribution: moshe weitzman commentedpossibly being fixed at #401328: _forum_topics_unread: SQL performance problem can kill the database
Comment #31
catchRan a couple of quick explains on forum_get_forums() queries while looking at #512864: "comment count" query in getForumStatistics() in \Drupal\forum\ForumManager is not limited to forum terms.
The first one isn't so bad:
However that's run once for the whole listing.
This next one is run per topic and does all kinds of nasty:
edit: quick note that these are probably using 1 row because I didn't run these with a proper forum dataset, rather than because they always use one row.
Comment #32
sunUsing index; Using temporary; Using filesort
is solely caused byORDER BY last_comment_timestamp DESC
However, not sure how you managed to get only 1 row in that first reference. On a moderately large forum, I get 11576 rows for a single forum/term here, which ultimately results in the bad performance.
I've already tried a couple of workarounds (adding indexes here and there, storing a comma-separated list of tids in {ncs}, ...) with no luck.
We need to solve the ORDER BY challenge, somehow.
On a related note, we are needlessly joining two times on {users} via
$query->join('users', 'u1', 'n.uid = u1.uid');
, or resp.However, that has a marginal effect only.
Comment #33
andypostWhy not just add a
changed
column toforum
this bring ability to "up" some posts without commenting and improve performanceEDIT related #877020: _forum_topics_unread() broken
Comment #34
artem_sokolov CreditAttribution: artem_sokolov commentedI'm having heavy performance problems with the forums' listings on 6.19 that seem to be exactly like the addressed in this issue. In the same time their are plenty of "slow-forum&node_access"-related issues opened on drupal.org and it is not clear which one is the solution if any:)
Can anybody tell if the patch #26 is ready and worth applying on D6 to solve the problem at least partially?
Was it ever committed to 6.x-dev?
Comment #35
moshe weitzman CreditAttribution: moshe weitzman commentedI think we fixed the topic listing in D7 but not the forums listing. Can anyone confirm?
Comment #36
moshe weitzman CreditAttribution: moshe weitzman commentedActually, I am still seeing problems in both. Mainly forum_forum_load() and _forum_topics_unread().
Comment #37
nnewton CreditAttribution: nnewton commentedSubscribe
Comment #38
catchHaven't looked into this properly (which is sad since it's 11 months after I posted that comment..), but I'm sure that
Could make use of the new {forum_index} table.
Comment #39
Jon Nunan CreditAttribution: Jon Nunan commented{forum_index} does have the 'last_comment_timestamp' field but I think we'd need to put the 'cid' field in there to be able to get the uid of the last commenter.
Comment #40
catchThere's a patch at #1008410: Forum list header sorting broken that seriously refactors the query in question, and if it doesn't fix the performance issue, ought to give us a great idea what's missing. I haven't had time to properly review that patch though.
Comment #41
roderikNow that #100840 is in, this issue gets clearer.
There are two queries that can benefit from the new index table, both in
forum_forum_load()
:1) topic count per forum
current:
can change to:
Explain of latter:
2) 'last comment' query
as mentioned by catch/#38 -- and meatsack/#39 is right that a 'cid' field needs to be added to the table in this case.
current:
can change to:
explain of latter:
results / stuff for review
I just migrated 90000 topics into a new site. With the database populated, Above queries took 50-75% of the original time, according to devel.module query statistics. The speed gain was obvious, but timing results varied every time I ran a test.
Note: I'm not an expert on indexes. I added an index on 'tid' because this makes the second query run faster and is also mentioned in the 'explain'.
Adding an index on 'nid' didn't seem to make a speed difference for the first query, though I had expected that.
And I have no idea why the 'forum_index' is on 'nid + tid + sticky + last_comment_timestamp'. (#100840: security against injection attacks and bulk emailing for tellafriend last changed that index; it added the nid.)
Comment #42
roderik....OK, the first query change is no good. {forum_index} contains shadow copies too (i.e. several rows with different tid, for a given nid.)
These should not get counted in the per-forum comment count.
Later.
Comment #43
andypostmarked as duplicate of this issue #597742: Optimize forum.module query performance
Comment #44
droplet CreditAttribution: droplet commentedthe patch seems has a little problem and can be more optimized.
downgrade the version to D7 to see testbots reviews first. (because it's easier to reroll a D7 patch. sorry for any inconvenience)
Comment #45
droplet CreditAttribution: droplet commentedGreat.
Comment #46
droplet CreditAttribution: droplet commentedfollowing explains do not reflect teh real world usages.
- patch avoid to join all nodes vids.
- comment name is empty if that is a user submitted comment, should not use COALESCE
D7 with devel generated contents
Original
Comment #47
mikeryanNot sure how generally this issue wants to tackle forums performance improvements, should this be a new issue? On D7, when the forum module is enabled, deleting comments in bulk (as in a migration rollback) suffers tremendously from forum_comment_delete's call to _forum_update_forum_index() - even when there are no forum topic nodes (let alone comments on them). The heavy hitter is:
This is to setup a db_update on forum_index, which will do nothing anyway if the parent node wasn't a forum topic. So, we want to avoid this query unless we know the node is already in forum_index - probably just do an inner join to forum_index on the initial comment query in the function. Let me try that... Yes, in a very rough informal test the average time of forum_comment_delete() goes from 54ms to 1ms (this is with 73k comments in the comment table).
Let me know if I should add the optimization to this patch, or open a separate issue.
Thanks.
Comment #48
droplet CreditAttribution: droplet commented@mikeryan,
this thread mainly focus on forum topic display. please open a new issue for the deletion tweak, thanks :)
Comment #49
mikeryanOK, see #1715402: Poor performance of _forum_update_forum_index() when passed a non-forum node.
Thanks.
Comment #50
xjmComment #51
jrbeemanRe-rolled patch in #44 against latest 7.x.
Comment #53
jrbeemanFix integrity constraint violation in #51.
Comment #54
drummThis is pretty similar to my quick solution for Drupal.org at #1543530-11: Drupal.org D7 performance testing. We'll be doing some profiling for D7, but I think the explains speak for themselves.
Attached is a revision of #53, which I will be applying for Drupal.org's codebase, with:
n.status = 1
conditions from both queries,forum_index
only has published nodes already.->orderBy('NULL')
, we don't care about the ordering thatgroupBy
implies, removing a filesort.Explains:
(It would be ideal to remove the useless joins on node when node access modules are not used.)
Comment #55
drummtags
Comment #56
andypostLooks nice
Comment #58
drummNeeds forward porting to 8.x. And at least explain before/after, maybe profiling if the committers require it.
I don't have exact numbers handy, but #54 indeed helped Drupal.org a lot.
Comment #59
droplet CreditAttribution: droplet commentedComment #60
tvn CreditAttribution: tvn commentedaffects D.o tag is enough.
Comment #61
Anybody54: 145353.diff queued for re-testing.
Comment #63
AnybodyThe logic from #54 works great and this is a very very huge problem for larger forums. So what's the plan to bring this forward? Will this be backported to D7 anyway?
Comment #68
drummThe patch in #54 is for D7, so it needs to be updated for D8.
Comment #69
AnybodyAnd needs to be applied on D7 :)
Comment #74
roderik#54, ported to D8, modified for D8 table structure, and more.
I'll first await testbot results and if it's actually green, post separate interdiffs with various explanations (so that someone can profile/'explain' separate parts if they want.)
Comment #75
roderikThis patch has one small change (I forgot the 'private' directive).
Separate interdiffs. It would be good if someone with an active forum on D8 can do some measurements of the differences between baseline, 1, 2 and 3 (note 3 fires less queries). My own quick tests with only one forum with generated comments, looking at the devel webprofiler, showed;
1): the ported patch
There's a difference with D7 / #41 / #54:
...because the comment table in D8 contains no uid. (That's been moved into comment_field_data, which we don't want to use because it can have multiple rows per nid, prompting unresolved questions about display language of comments which are beyond the scope of this issue.)
To be honest I don't know anymore, why we don't keep joining against comment_entity_statistics in the D7 patch and why I added forum_index.cid in #41. Maybe just because #40 told me to. But a D7 forum can probably re-profile this after the D8 patch is committed; I don't have a real-world D7 forum available anymore.
2): removing node
drumm said in #54 "It would be ideal to remove the useless joins on node when node access modules are not used.". So I did, in a separate interdiff so that it can be profiled.
This has unfortunately made me call 'global' functions from within the ForumManager service, so I separated that out into a commented private method and will await critique.
3): reducing last-post queries for multiple forums
It seems strange to me that we (also on D7) run one query to get all the node/comment counts for all forums, and then run individual queries to get the last-post data for each forum. It doesn't seem necessary (anymore?)
So I changed the code so that only one query for last-post data will be made on a forum list page. It's still inconsistent with the node/comment counts however, because it only queries data for all forums where we actually need the data. I'm not sure what the time difference would be time-wise if we queried data for all forums to make the code easier, and afraid to make that change.
Comment #77
drummAttached is a new version of #54 that removes a couple more joins on the node table.
Could use testing with a node access module enabled, it may need conditional joins added on the node table, when necessary, like the D8 patch has. The D8 patch is looking good in general.
Comment #84
smustgrave CreditAttribution: smustgrave at Mobomo commentedThe last patch was for D7 is this needed for D10?
Comment #85
Anybody@smustgrave thanks, the patch in #75 was for Drupal 8 already, so I guess yes. But I also guess it needs a reroll.
I'd suggest creating the 10.x as MR, so we can keep the Drupal 7 patch separate (while I think it should better be a separate issue, if that's ever to be committed?
Comment #88
Bhanu951 CreditAttribution: Bhanu951 as a volunteer commentedComment #89
smustgrave CreditAttribution: smustgrave at Mobomo commentedFixed phpcs error.
Also not sure if the change for forum.services.yml was intended?
Comment #90
smustgrave CreditAttribution: smustgrave at Mobomo commentedComment #91
Bhanu951 CreditAttribution: Bhanu951 as a volunteer commented>Also not sure if the change for forum.services.yml was intended
@smustgrave : The change was intentional, as the user object is being used in new method introduced. Constructor also changed.
Comment #92
andypostif service getting new argument it should be optional and required in 11.0 with BC-shim to prevent site failures on upgrade
Comment #94
quietone CreditAttribution: quietone at PreviousNext commentedForum is approved for removal. See #1898812: [policy] Deprecate forum module for removal in Drupal 11
This is now Postponed. The status is set according to two policies. The Remove a core extension and move it to a contributed project and the Extensions approved for removal policies.
It will be moved to the contributed extension once the Drupal 11 branch is open.