(Note that though this is filed against 5.x, I need it for 6.x as well)
What we have: Drupal's core forum only shows the "time ago" and author for the last post in each forum on the forum overview page.
What we want: Linked comment title (allowing for paginated threads), linked node title, linked author, time ago.
Problem: To get this, we need to find the most recent node and the most recent comment in each forum and use whichever is newer. We need to get data from node, comments, term_node, and users. We also need to make sure we don't show any unpublished node/comment or any that the viewer can't see because of access restrictions.
Core gets its data in forum_get_forums() which is already known to have performance issues. Since advforum can't prevent that from running without hacking core, it's starting out with that performance hit even though it's not using the last topic data provided. In addition to that, advforum's query suffers from its own performance problems. A rough history of my trials with this feature:
1) We started with this function by LasseP
2) With help from various people, it was modified into the monster query seen in function advanced_forum_get_all_last_topics()here
3) There were many reported issues of performance problems: http://drupal.org/node/245028
4) I attempted to add caching but that failed for three reasons: 1, Drupal was never clearing the cache even though I set it for a minute. 2, the cached topic could be one the viewer didn't have access to. 3, errors such as http://drupal.org/node/235749 and http://drupal.org/node/236431 and http://drupal.org/node/249374
5) Plus, the query just plain failed for some people for reasons I was never able to figure out such as http://drupal.org/node/263838
6) All of that started off http://drupal.org/node/241982 where I decided to take Bdragon's advice and make a table to hold the last topic info.
7) While it mostly worked, there were a few problems. One is that the whole thing was getting very complicated with updating the table on node/comment insert and then making sure the node/comment hadn't been deleted or unpublished or made inaccessable and then having to update the table if it was. Another is a node access one again: if a person with lesser permissions viewed the forum list, it would see that the last post wasn't accessable and go retrieve one that is. That's fine until a person with proper permissions comes along and now the wrong last post is in the table. And, finally, we have our old friend the performance nightmare as shown in http://drupal.org/node/267983 .
8) In frustration, I ripped out the week's worth of work I spent on #6 and redid the function using queries instead of a specialized table. You can see that in function _advanced_forum_get_last_topic() here. Unfortunately, reports are that the performance of this is even worse than the nasty query in #2.
At this point, I don't know what to do. I'm no SQL expert and have run out of ideas. I'm putting up the white flag. If someone can't come in with a working patch that takes node access into account and doesn't take ages to load, then I'm just going to have to remove this feature. It's eaten up far too much of my time and I've hit a wall with it.
Michelle
| Comment | File | Size | Author |
|---|---|---|---|
| #19 | last_topic.patch | 2.59 KB | catch |
| #15 | getLastTopic.patch | 3.67 KB | maulwuff |
| #11 | advanced_forum.module.txt | 38.5 KB | dami |
Comments
Comment #1
maulwuff commentedHere is my first solution for this. It is a modified version of 8)
My plan:
- get as little as possible from the node table without extra joining. the node will not be shown in 95% (or more) of all cases. (see the quotient of "number of forum topics" divided by "number of comments for forumtopics".
- use the information from the node to search for newer comments. that is: the comments timestamp MUST be greater than the creation time of the node!
findings of my mod:
the execution times are similar for the node part, but much better for the comments part (3,x vs 1,x seconds) (52000 entries in comments)
on the other hand: 30 times 1,x sec is also 30+ seconds :(
now I came up with an idea: put an index on the timestamp column of the comment-table. wow, execution time drops to 0,2x
I think adding an extra index while installing advforum is ok. It does not affect other modules or the table. (ok, execution times on comments will rise very, very slightly to update the new index, but I think this isn't noticable at all. (look at the index mess in node-table)
here is the code: (everything done in a dumb notepad, I was not able to test it. I spent 3h on it)
edit: just encountered, that db_rewrite_sql cares for user rights? Why is this mentioned with no single word at http://api.drupal.org/api/function/db_rewrite_sql/5 ? When I look at the code of that function, I can't find any evidence for user rights caring, neither. :/
edit: forgot {} for table names
here are the queries I developed from the original code. I put them in to have a trace, to see what I've done to get to my (current) findings.
The tid is from a forum with ~500 topics, and ~17500 comments.
Comment #2
michelle@maulwuff - I contacted David Strauss about this and he has a performance boosting module that looks like it will help. I don't want to make advforum depend on it, though, so I'll be including a normal query for those who don't want to use it and will have a closer look at your changes when I get that far.
Thanks,
Michelle
Comment #3
michelleOk, here's the plan:
1) I committed the revised query from 5.x to 6.x. They will be in the next tarball. I'll give them both a good test tomorrow and, if they are working, create a new set of alphas so people have something that works even though it's slow.
2) There's a couple of things people have done that I really need to get comitted so I'm going to take care of those and make another set of alphas.
3) This is where it gets messy. To do this query right, there's just no getting around hacking up the forum module. Telling people to go patch their forum module is a nightmare I don't want to get into. So I'm going to fork it instead. I will replace the forum_get_forums code with a modified version that gets the additional information needed for this feature. This will be gathered in one of two ways: For people with the DNA module installed, it will use a denormalized table to be able to get it quickly. For people without the module installed, it will use a variation of the query being used now, probably using maulwuff's modifications.
Michelle
Comment #4
maulwuff commentedSorry, I don't know David Strauss. What performance booster does he have?
I had an idea as I surfed around the issues and surrounding pages: Postpone the "link to last topic"-stuff. Don't include it to v. 1.
You wrote on the description page of advforum as your first sentence: "Advanced Forum is a 'glue' module..."
So, why don't we just "glue" together things provided by drupal and ship v1? That means, leave out things which are too hard to do, or take too much time. Advanced forum is developing towards a standalone module right now. At least v1 should be a "glue"-only version ;)
You have already put in lots of features and things which make the standard drupal-forum so much more useful! Even though there is no "jump to last answer"-link, for example.
My advice would be:
- stop all heavy to do features, including last answer-link for now.
- pull out the parts of those features from current alpha-code.
- publish last alpha
- bring in outstanding small issues (including the retheming of comments #268684: Retheming - one way forward !)
- publish first beta
- bugfixing
- beta ... beta? ... rc ? ... release v1
This would give people more trust in the module, if there is a stable thing they can build upon. I for myself am still stuck on alpha3. I am scared by the time consumption to resync every alpha to my needs and fit it to my theme. (But I think this is part of your tactics ;) )
That's because I would have to look up everything which has changed, see if it is compatible with my hacks, and so on.
Integrators view:
I am waiting for a stable one. In the meantime all my folks (and I) are happy with the features which were included in alpha3!
I think, smaller, but secure steps would bring more effectivness and development speed. It is a lot easier to fix or adapt just a hand full of big finished things, than plenty small unfinished ones.
Developers view:
To climb a mountain, you make small, secure steps, instead of big jumps from stone to stone.
end users view:
- Radical Innovation with looots of new features is good for the "wow" effect at the enduser but makes lots of trouble because of the tons of bugs.
- Continuous improvement puts stability first, uses small increments and has its success at the long haul. The End user is happy about stability.
So lets choose:
- more stable builds
- small, secure steps
- continuous improvement
I hope I was able to express myself in a way a native speaker can understand ;)
Comment #5
michelleI understand but that's not going to happen. "more stable builds" means more branches for me to maintain. I do my best to make each alpha as stable as a real release but this module is under active development and things do get missed. Making more branches isn't going to make that any better and would more likley make things worse.
Michelle
Comment #6
michelleForgot to answer your first question. David is http://drupal.org/user/93254 and the module in question is http://drupal.org/project/dna .
Michelle
Comment #7
michelleOk, step 1 is done. Alpha 10/2 seems to be working ok. Now I need to get in a couple of things that people who contributed have been waiting for me to commit and then I'll address the performance issues. Alphas 11/3 should be coming fairly quickly and then the messy part begins.
Michelle
Comment #8
catchIf the issue is with overriding the /forum page only, then how about using http://api.drupal.org/api/function/hook_menu_alter (D6 only) to insert a custom page there instead to save a fork? Or have I misunderstood?
Comment #9
dami commentedJust some quick thoughts:
Yes, we can use hook_menu_alter in d6, and I think that's the way to go.
Have a redundant complex query (very similar to that in forum_get_forum function) in _advanced_forum_get_topic() is a waste. forum_get_forums() already gives us most info needed for last topic of each forum. As a first step, we just need to add the missing pieces in a custom advanced_forum_get_forums() function. This will cut the number of queries in half, as a first step. Then we can go in to further optimize the query if possible.
In d5, again using existing results (uid, uname and timestamp of last topic) from forum_get_forums(), I suspect a much simpler query is possible.
Comment #10
dami commentedThis is rather a proof of concept using custom advanced_forum_get_forums() function. I don't have a large database to test against. So I just generated a small forum using devel (50 users, 1k nodes with 4k comments). The forum has 3 containers and 11 forums (total of 14 forum + containers).
Path: /forum
Original 6.x-1.0-alpha2
First run:
Executed 112 queries in 111.18 milliseconds.
Following run:
Executed 91 queries in 102.87 milliseconds.
Using custom advanced_get_forums()
First run:
Executed 94 queries in 71.13 milliseconds.
Following run:
Executed 73 queries in 60.99 milliseconds.
The result is more or less in line with my expectation. The improvement comes mostly from the fact last_topic related queries are cut in half by using own advanced_get_forums(). However there is little or no improvement on the query itself.
The implementation is certainly not optimal, I just needed some quick code, but I hope the info is somewhat useful. Since it's not really a patch and I am on Windows, so I just attached the whole module file. You probably will need to enable devel module, and 'rebuild menus' first, so the new menu alter is recognized.
Comment #11
dami commentedAre we still having the lost attachment issue? Upload again...
Comment #12
maulwuff commentedForgot to add the code for the index:
ALTER TABLE {comments} ADD INDEX ( `timestamp` )I ran multiple tests today. my comments table holds 52500 entries right now, with ~50 Forums/Boards
alpha10: 271 queries.
my mod: 273 queries.
queryexec is total query time.
times without _advanced_forum_get_last_topic:
I think there is not much space left for improvement to my version. Perhaps someone with a bigger forum can do the same tests.
Comment #13
catchJust a quick note that the core issue related to the underlying issue with all this is here: http://drupal.org/node/26966 - snufkin suggested storing 'last_comment_cid' in node_comment_statistics (which was taken out, for apparently no reason, four years ago).
Also, old patch to add the feature itself to core (which needs work, and doesn't deal with the paging issue), here: http://drupal.org/node/147045
maulwuff - could you roll your changes into a patch? I have a decent sized forum to test with.
edit: ideally a patch against D6 - since that's what I'm testing advforum on.
Comment #14
michelle@catch - Would be awesome if you can test. Sorry I've been AWOL lately. I've been pushing aside all my other responsibilities to work on advforum and they started pushing back so I had to take a little break. I'm hoping I can get back to it next week.
Michelle
Comment #15
maulwuff commented@catch:
I've replaced the whole function. (code is in first block of #1)
A patch is attached, but I suggest replacing it manually. Lots of in and out there.
@Michelle:
please activate trimming of white space at line endings in your editor. There were lots of hunks in the patch with just deleting whitespaces. ;)
Comment #16
catchThanks! I'll review this over the next day or so.
Comment #17
ianshafer commentedI have a forum with ~1.7M comments. I installed the patch (by hand) and it greatly improved the performance. (Before patch it took over 5 minutes to load the forum list, after it takes ~5 seconds.
I'm running Drupal 6.2.
Comment #18
jwilson3I had to fix a one line error in the $comment query regarding the comment author error to get everything working the way i needed.... but other than that, your patch worked excellently on alpha10 (applied with care by hand).
thanks for your work!
Comment #19
catchSeems to work pretty well for me. Functional D6 re-roll for testing purposes.
Comment #20
catchHmm. This only appears to work on the main topic index. When in a container, I get n/a.
Comment #21
michelleOk, I've spent a bit of time playing around with this patch and am not having any luck. When I add back in the db_rewrite_sql that the patch takes out, it chokes on the query and I can't figure out why. Plus there's the issue of the index. I don't want the module touching a core table. If someone wants to re-roll a patch that works with db_rewrite_sql, I'll take a look. In the mean time, I'm going to investigate the DNA option.
Michelle
Comment #22
catchYep. While it initially looked faster, I think that's probably because the query wasn't returning valid results. I can see DNA being used for a few important projects, so adding it as a requirement ought to be fine.
Comment #23
maulwuff commentedcatch, do you mean something like /forum/50 ?
this works for me, too. Maybe you messed something up while porting it do D6? Or maybe there is another table structure in D6?
what's the advantage of db_rewrite_sql ? I don't see any, but messing things up on an optimized query.
Comment #24
catchmaulwuff: db_rewrite_sql ensures that access restrictions are honoured by the query - so forum_access, tac_lite etc. While you might want to hack it out on your site if you know you're not using any of these, it's not good for a generalised solution.
Comment #25
michelle"what's the advantage of db_rewrite_sql"
Access control. Not everyone has their forums visible to all users. So that has to be in there. I almost missed that you took it out since it had been so long since I looked at this stuff the first time. When you're providing patches, it's not a good idea to remove things you don't understand.
Michelle
Comment #26
maulwuff commentedI had a look at db_rewrite_sql(), but did not find anything on access control.
http://api.drupal.org/api/function/db_rewrite_sql/5
"Rewrites node, taxonomy and comment queries. Use it for listing queries. Do not use FROM table1, table2 syntax, use JOIN instead."
the onyl important function in db_rewrite_sql is
http://api.drupal.org/api/function/_db_rewrite_sql/5
"Collects JOIN and WHERE statements via hook_db_rewrite_sql(). Decides whether to select primary_key or DISTINCT(primary_key)"
After reading closer, I encountered "hook_db_rewrite_sql", which is the thing that matters here:
The key message is NOT "db_rewrite_sql does access control", BUT "db_rewrite_sql ensures hook_db_rewrite_sql() is handled properly".
That explains, why there is no single word on access control in the API.
I don't want to justify leaving out db_rewrite_sql, but give you the right understanding of what db_rewrite_sql really does. If there is no module which implements hook_db_rewrite_sql, there is no query filtering (=no access control) at all! That's an important difference.
Back to the patch: my forums are restricted very tightly. I'll have a look at them again.
Comment #27
michelleYes, I know what db_rewrite_sql does. And it needs to be there. I appreciate you taking the time to work on this but I can't use this as it stands.
Catch and I talked about this on IRC and I think we have a plan that will work. I should have something by the end of the day today.
Michelle
Comment #28
michelleOk, I committed a change to the D6 branch. I still need to backport it to D5. At this point, it should be about the same speed as the core forum module. I'm still planning on investigating David's DNA module but I'll add that in as an option for people with huge forums. For normal sized forums, this should be just fine.
Leaving this active until it's on the D5 branch as well.
Michelle
Comment #29
catchCVS upped and this is much, much faster. Nice one Michelle!
Comment #30
michelleYay, good to hear. I had devel working hard turning out thousands of nodes on my dev site so I could give it a real world test. While it could use more testing, I think it's fairly safe to say if core forum is fast enough for your site than advforum will be, too. Thanks for the tips. I was so focused on getting the queries to match my vision that I had tunnel vision and didn't realize just changing my vision a bit made this work a lot better.
I'll get this to D5 ASAP, hopefully by Monday. It's going to be a bit trickier because I have to backport some D6 functions and work around the lack of hook_menu_alter().
Michelle
Comment #31
michelleOk, backported to D5. I'm marking this fixed, though I still want to look into DNA at some point to speed this up beyond what the core forums can do.
Michelle
Comment #32
Kirk commentedjust wanted to see if I follow this correctly. The next release has better performance on forum-list? If so, that's great.
Comment #33
catchsparq-l - yes, it's a lot better.
Comment #34
Anonymous (not verified) commentedAutomatically closed -- issue fixed for two weeks with no activity.
Comment #35
mennonot commentedAfter reading the performance issues on this forum, I decided to upgrade a client's site to Alpha17 (from a June 2008 build) in hopes of dealing with a persistently long time (over 5000 ms) to run the advanced_forum_get_forums query when loading the /forum page. I upgraded the site and I'm still getting results like this from the logged queries:
5038.79 advanced_forum_get_forums
430.44 advanced_forum_get_forums
Here are the stats for our forums:
Topics: 103179, Posts: 1714493
Any suggestions?
Michelle, if you're interested in poking around on our site as a case study in a large number of nodes and comments, we'd be happy to give you access.
Comment #36
michelleThis isn't an AF issue; it's core that's slow. You want #145353: Forums performance improvements if you want to help.
Michelle