(Note that though this is filed against 5.x, I need it for 6.x as well)

What we have: Drupal's core forum only shows the "time ago" and author for the last post in each forum on the forum overview page.

What we want: Linked comment title (allowing for paginated threads), linked node title, linked author, time ago.

Problem: To get this, we need to find the most recent node and the most recent comment in each forum and use whichever is newer. We need to get data from node, comments, term_node, and users. We also need to make sure we don't show any unpublished node/comment or any that the viewer can't see because of access restrictions.

Core gets its data in forum_get_forums() which is already known to have performance issues. Since advforum can't prevent that from running without hacking core, it's starting out with that performance hit even though it's not using the last topic data provided. In addition to that, advforum's query suffers from its own performance problems. A rough history of my trials with this feature:

1) We started with this function by LasseP
2) With help from various people, it was modified into the monster query seen in function advanced_forum_get_all_last_topics()here
3) There were many reported issues of performance problems: http://drupal.org/node/245028
4) I attempted to add caching but that failed for three reasons: 1, Drupal was never clearing the cache even though I set it for a minute. 2, the cached topic could be one the viewer didn't have access to. 3, errors such as http://drupal.org/node/235749 and http://drupal.org/node/236431 and http://drupal.org/node/249374
5) Plus, the query just plain failed for some people for reasons I was never able to figure out such as http://drupal.org/node/263838
6) All of that started off http://drupal.org/node/241982 where I decided to take Bdragon's advice and make a table to hold the last topic info.
7) While it mostly worked, there were a few problems. One is that the whole thing was getting very complicated with updating the table on node/comment insert and then making sure the node/comment hadn't been deleted or unpublished or made inaccessable and then having to update the table if it was. Another is a node access one again: if a person with lesser permissions viewed the forum list, it would see that the last post wasn't accessable and go retrieve one that is. That's fine until a person with proper permissions comes along and now the wrong last post is in the table. And, finally, we have our old friend the performance nightmare as shown in http://drupal.org/node/267983 .
8) In frustration, I ripped out the week's worth of work I spent on #6 and redid the function using queries instead of a specialized table. You can see that in function _advanced_forum_get_last_topic() here. Unfortunately, reports are that the performance of this is even worse than the nasty query in #2.

At this point, I don't know what to do. I'm no SQL expert and have run out of ideas. I'm putting up the white flag. If someone can't come in with a working patch that takes node access into account and doesn't take ages to load, then I'm just going to have to remove this feature. It's eaten up far too much of my time and I've hit a wall with it.

Michelle

Comments

maulwuff’s picture

Here is my first solution for this. It is a modified version of 8)

My plan:
- get as little as possible from the node table without extra joining. the node will not be shown in 95% (or more) of all cases. (see the quotient of "number of forum topics" divided by "number of comments for forumtopics".

- use the information from the node to search for newer comments. that is: the comments timestamp MUST be greater than the creation time of the node!

findings of my mod:
the execution times are similar for the node part, but much better for the comments part (3,x vs 1,x seconds) (52000 entries in comments)
on the other hand: 30 times 1,x sec is also 30+ seconds :(

now I came up with an idea: put an index on the timestamp column of the comment-table. wow, execution time drops to 0,2x
I think adding an extra index while installing advforum is ok. It does not affect other modules or the table. (ok, execution times on comments will rise very, very slightly to update the new index, but I think this isn't noticable at all. (look at the index mess in node-table)

here is the code: (everything done in a dumb notepad, I was not able to test it. I spent 3h on it)

edit: just encountered, that db_rewrite_sql cares for user rights? Why is this mentioned with no single word at http://api.drupal.org/api/function/db_rewrite_sql/5 ? When I look at the code of that function, I can't find any evidence for user rights caring, neither. :/
edit: forgot {} for table names

function _advanced_forum_get_last_topic($tid) {
  // Get the most recently created node, in the given forum, that the user has access to
  //leave out authors name, it's extra joining for nothing in  95% or more of all cases
  $query = "select 
                   n.nid AS topic_id,
                   n.title AS topic_title, 
                   n.created AS topic_timestamp,
                   n.type AS topic_type,
                   n.uid topic_author_id                   
            FROM {node} n
            INNER JOIN {term_node} tn on n.nid = tn.nid 
            WHERE n.type = 'forum' AND               
               n.status = 1 AND                              
               tn.tid = %d        
            ORDER BY n.created DESC
            limit 1 ";
            
  $node = db_fetch_object(db_query($query, $tid));

  // If there is a node, then we need to look for the most recent comment       
  if ($node) {
    $query = "SELECT  c.nid AS topic_id, 
                     c.cid AS reply_id,
                     c.timestamp AS reply_timestamp,
                     c.subject AS reply_title,
                     c.uid topic_author_id,
                     n.title AS topic_title, 
                     n.created AS topic_timestamp,
                     n.type AS topic_type,     
                     c.name AS reply_author_name          
              FROM {comments} c
              INNER JOIN {term_node} tn ON c.nid = tn.nid
              INNER JOIN {node} n ON c.nid = n.nid              
              where c.timestamp > %d
              AND c.status = 0 AND
              tn.tid = %d  AND 
              n.status = 1 AND c.status = 0
              ORDER BY c.timestamp DESC
              limit 1";

    $comment = db_fetch_object(db_query($query, $node->topic_timestamp, $tid));

    if($comment){
      //there is a newer comment, than the last created node in this forum
      $comment->tid = $tid;
      return $comment;
    }else{
      //the node is the newest thing. get the authors name
      $query = "select u.name AS topic_author_name from {users} u 
                where u.uid = %d";
       $username = db_result(db_query($query, $node->topic_author_id));         
       
      $node->tid = $tid;
      $node->reply_id = 0;
      $node->topic_author_name = $username;
      return $node;
    }
    
  }
}

here are the queries I developed from the original code. I put them in to have a trace, to see what I've done to get to my (current) findings.
The tid is from a forum with ~500 topics, and ~17500 comments.

/*
//node table
//original
SELECT n.nid AS topic_id,
                   n.title AS topic_title, 
                   n.created AS topic_timestamp,
                   n.type AS topic_type,
                   n.uid topic_author_id,
                   u.name AS topic_author_name
            FROM node n
            INNER JOIN term_node tn on n.nid = tn.nid
            INNER JOIN users u on n.uid = u.uid
            WHERE tn.tid = 35 AND n.status = 1
            ORDER BY n.created DESC
            
            
select n.nid AS topic_id,
                   n.title AS topic_title, 
                   n.created AS topic_timestamp,
                   n.type AS topic_type,
                   n.uid topic_author_id
            FROM node n
            inner JOIN term_node tn on n.nid = tn.nid            
            WHERE tn.tid = 35 AND n.status = 1
            ORDER BY n.created DESC            

final good one:                       
select  n.nid AS topic_id,
                   n.title AS topic_title, 
                   n.created AS topic_timestamp,
                   n.type AS topic_type,
                   n.uid topic_author_id                   
            FROM node n
            inner JOIN term_node tn on n.nid = tn.nid 
            WHERE n.type = 'forum' AND               
               n.status = 1 AND                              
               tn.tid = 35        
            ORDER BY n.created DESC
            limit 1                       
                       
                       
                       
                       
                       
// comments table:                       
//original:                       
SELECT c.nid AS topic_id, 
                     c.cid AS reply_id,
                     c.timestamp AS reply_timestamp,
                     c.subject AS reply_title,
                     c.uid topic_author_id,
                     n.title AS topic_title, 
                     n.created AS topic_timestamp,
                     n.type AS topic_type,     
                     c.name AS reply_author_name           
              FROM comments c
              INNER JOIN term_node tn ON c.nid = tn.nid
              INNER JOIN node n ON c.nid = n.nid
              WHERE tn.tid = 35 AND n.status = 1 AND c.status = 0
              ORDER BY c.timestamp DESC                       
                                         
final good one:              
SELECT  c.nid AS topic_id, 
                     c.cid AS reply_id,
                     c.timestamp AS reply_timestamp,
                     c.subject AS reply_title,
                     c.uid topic_author_id,
                     n.title AS topic_title, 
                     n.created AS topic_timestamp,
                     n.type AS topic_type,     
                     c.name AS reply_author_name          
              FROM comments c
              INNER JOIN term_node tn ON c.nid = tn.nid
              INNER JOIN node n ON c.nid = n.nid
              
              where c.timestamp > 1211927836
              AND c.status = 0 AND
              tn.tid = 35  AND 
              n.status = 1 AND c.status = 0

              ORDER BY c.timestamp DESC, c.nid DESC
                limit 1 
michelle’s picture

@maulwuff - I contacted David Strauss about this and he has a performance boosting module that looks like it will help. I don't want to make advforum depend on it, though, so I'll be including a normal query for those who don't want to use it and will have a closer look at your changes when I get that far.

Thanks,

Michelle

michelle’s picture

Title: Need assistance from SQL guru to make last topic in forum work properly and quickly » Making "last topic in forum" work properly and quickly

Ok, here's the plan:

1) I committed the revised query from 5.x to 6.x. They will be in the next tarball. I'll give them both a good test tomorrow and, if they are working, create a new set of alphas so people have something that works even though it's slow.

2) There's a couple of things people have done that I really need to get comitted so I'm going to take care of those and make another set of alphas.

3) This is where it gets messy. To do this query right, there's just no getting around hacking up the forum module. Telling people to go patch their forum module is a nightmare I don't want to get into. So I'm going to fork it instead. I will replace the forum_get_forums code with a modified version that gets the additional information needed for this feature. This will be gathered in one of two ways: For people with the DNA module installed, it will use a denormalized table to be able to get it quickly. For people without the module installed, it will use a variation of the query being used now, probably using maulwuff's modifications.

Michelle

maulwuff’s picture

Sorry, I don't know David Strauss. What performance booster does he have?

I had an idea as I surfed around the issues and surrounding pages: Postpone the "link to last topic"-stuff. Don't include it to v. 1.
You wrote on the description page of advforum as your first sentence: "Advanced Forum is a 'glue' module..."
So, why don't we just "glue" together things provided by drupal and ship v1? That means, leave out things which are too hard to do, or take too much time. Advanced forum is developing towards a standalone module right now. At least v1 should be a "glue"-only version ;)

You have already put in lots of features and things which make the standard drupal-forum so much more useful! Even though there is no "jump to last answer"-link, for example.

My advice would be:
- stop all heavy to do features, including last answer-link for now.
- pull out the parts of those features from current alpha-code.
- publish last alpha
- bring in outstanding small issues (including the retheming of comments #268684: Retheming - one way forward !)
- publish first beta
- bugfixing
- beta ... beta? ... rc ? ... release v1

This would give people more trust in the module, if there is a stable thing they can build upon. I for myself am still stuck on alpha3. I am scared by the time consumption to resync every alpha to my needs and fit it to my theme. (But I think this is part of your tactics ;) )
That's because I would have to look up everything which has changed, see if it is compatible with my hacks, and so on.

Integrators view:
I am waiting for a stable one. In the meantime all my folks (and I) are happy with the features which were included in alpha3!
I think, smaller, but secure steps would bring more effectivness and development speed. It is a lot easier to fix or adapt just a hand full of big finished things, than plenty small unfinished ones.

Developers view:
To climb a mountain, you make small, secure steps, instead of big jumps from stone to stone.

end users view:
- Radical Innovation with looots of new features is good for the "wow" effect at the enduser but makes lots of trouble because of the tons of bugs.
- Continuous improvement puts stability first, uses small increments and has its success at the long haul. The End user is happy about stability.

So lets choose:
- more stable builds
- small, secure steps
- continuous improvement

I hope I was able to express myself in a way a native speaker can understand ;)

michelle’s picture

I understand but that's not going to happen. "more stable builds" means more branches for me to maintain. I do my best to make each alpha as stable as a real release but this module is under active development and things do get missed. Making more branches isn't going to make that any better and would more likley make things worse.

Michelle

michelle’s picture

Forgot to answer your first question. David is http://drupal.org/user/93254 and the module in question is http://drupal.org/project/dna .

Michelle

michelle’s picture

Ok, step 1 is done. Alpha 10/2 seems to be working ok. Now I need to get in a couple of things that people who contributed have been waiting for me to commit and then I'll address the performance issues. Alphas 11/3 should be coming fairly quickly and then the messy part begins.

Michelle

catch’s picture

If the issue is with overriding the /forum page only, then how about using http://api.drupal.org/api/function/hook_menu_alter (D6 only) to insert a custom page there instead to save a fork? Or have I misunderstood?

dami’s picture

Just some quick thoughts:

Yes, we can use hook_menu_alter in d6, and I think that's the way to go.
Have a redundant complex query (very similar to that in forum_get_forum function) in _advanced_forum_get_topic() is a waste. forum_get_forums() already gives us most info needed for last topic of each forum. As a first step, we just need to add the missing pieces in a custom advanced_forum_get_forums() function. This will cut the number of queries in half, as a first step. Then we can go in to further optimize the query if possible.

In d5, again using existing results (uid, uname and timestamp of last topic) from forum_get_forums(), I suspect a much simpler query is possible.

dami’s picture

This is rather a proof of concept using custom advanced_forum_get_forums() function. I don't have a large database to test against. So I just generated a small forum using devel (50 users, 1k nodes with 4k comments). The forum has 3 containers and 11 forums (total of 14 forum + containers).

Path: /forum
Original 6.x-1.0-alpha2
First run:
Executed 112 queries in 111.18 milliseconds.

Following run:
Executed 91 queries in 102.87 milliseconds.

Using custom advanced_get_forums()
First run:
Executed 94 queries in 71.13 milliseconds.

Following run:
Executed 73 queries in 60.99 milliseconds.

The result is more or less in line with my expectation. The improvement comes mostly from the fact last_topic related queries are cut in half by using own advanced_get_forums(). However there is little or no improvement on the query itself.

The implementation is certainly not optimal, I just needed some quick code, but I hope the info is somewhat useful. Since it's not really a patch and I am on Windows, so I just attached the whole module file. You probably will need to enable devel module, and 'rebuild menus' first, so the new menu alter is recognized.

dami’s picture

StatusFileSize
new38.5 KB

Are we still having the lost attachment issue? Upload again...

maulwuff’s picture

Forgot to add the code for the index:
ALTER TABLE {comments} ADD INDEX ( `timestamp` )

I ran multiple tests today. my comments table holds 52500 entries right now, with ~50 Forums/Boards

alpha10: 271 queries.
my mod: 273 queries.

queryexec is total query time.

without index
avg of 2x4 runs each:
queryexec    pageexec
3,16             4,33         alpha10
2,78             3,69         my mod     ,4 | ,6 better

with index
avg of 2x4 runs each:
queryexec    pageexec
2,44              3,56        alpha10	   ,5 | ,7 better compared to without index
2,15              3,33        my mod     ,6 | ,3 better compared to without index

times without _advanced_forum_get_last_topic:

queryexec    pageexec
1,9               2,7  

I think there is not much space left for improvement to my version. Perhaps someone with a bigger forum can do the same tests.

catch’s picture

Just a quick note that the core issue related to the underlying issue with all this is here: http://drupal.org/node/26966 - snufkin suggested storing 'last_comment_cid' in node_comment_statistics (which was taken out, for apparently no reason, four years ago).

Also, old patch to add the feature itself to core (which needs work, and doesn't deal with the paging issue), here: http://drupal.org/node/147045

maulwuff - could you roll your changes into a patch? I have a decent sized forum to test with.

edit: ideally a patch against D6 - since that's what I'm testing advforum on.

michelle’s picture

@catch - Would be awesome if you can test. Sorry I've been AWOL lately. I've been pushing aside all my other responsibilities to work on advforum and they started pushing back so I had to take a little break. I'm hoping I can get back to it next week.

Michelle

maulwuff’s picture

StatusFileSize
new3.67 KB

@catch:
I've replaced the whole function. (code is in first block of #1)

A patch is attached, but I suggest replacing it manually. Lots of in and out there.

@Michelle:
please activate trimming of white space at line endings in your editor. There were lots of hunks in the patch with just deleting whitespaces. ;)

catch’s picture

Status: Active » Needs review

Thanks! I'll review this over the next day or so.

ianshafer’s picture

I have a forum with ~1.7M comments. I installed the patch (by hand) and it greatly improved the performance. (Before patch it took over 5 minutes to load the forum list, after it takes ~5 seconds.

I'm running Drupal 6.2.

jwilson3’s picture

I had to fix a one line error in the $comment query regarding the comment author error to get everything working the way i needed.... but other than that, your patch worked excellently on alpha10 (applied with care by hand).

thanks for your work!

catch’s picture

StatusFileSize
new2.59 KB

Seems to work pretty well for me. Functional D6 re-roll for testing purposes.

catch’s picture

Hmm. This only appears to work on the main topic index. When in a container, I get n/a.

michelle’s picture

Status: Needs review » Needs work

Ok, I've spent a bit of time playing around with this patch and am not having any luck. When I add back in the db_rewrite_sql that the patch takes out, it chokes on the query and I can't figure out why. Plus there's the issue of the index. I don't want the module touching a core table. If someone wants to re-roll a patch that works with db_rewrite_sql, I'll take a look. In the mean time, I'm going to investigate the DNA option.

Michelle

catch’s picture

Yep. While it initially looked faster, I think that's probably because the query wasn't returning valid results. I can see DNA being used for a few important projects, so adding it as a requirement ought to be fine.

maulwuff’s picture

catch, do you mean something like /forum/50 ?
this works for me, too. Maybe you messed something up while porting it do D6? Or maybe there is another table structure in D6?

what's the advantage of db_rewrite_sql ? I don't see any, but messing things up on an optimized query.

catch’s picture

maulwuff: db_rewrite_sql ensures that access restrictions are honoured by the query - so forum_access, tac_lite etc. While you might want to hack it out on your site if you know you're not using any of these, it's not good for a generalised solution.

michelle’s picture

"what's the advantage of db_rewrite_sql"

Access control. Not everyone has their forums visible to all users. So that has to be in there. I almost missed that you took it out since it had been so long since I looked at this stuff the first time. When you're providing patches, it's not a good idea to remove things you don't understand.

Michelle

maulwuff’s picture

I had a look at db_rewrite_sql(), but did not find anything on access control.

http://api.drupal.org/api/function/db_rewrite_sql/5
"Rewrites node, taxonomy and comment queries. Use it for listing queries. Do not use FROM table1, table2 syntax, use JOIN instead."

the onyl important function in db_rewrite_sql is
http://api.drupal.org/api/function/_db_rewrite_sql/5
"Collects JOIN and WHERE statements via hook_db_rewrite_sql(). Decides whether to select primary_key or DISTINCT(primary_key)"

After reading closer, I encountered "hook_db_rewrite_sql", which is the thing that matters here:
The key message is NOT "db_rewrite_sql does access control", BUT "db_rewrite_sql ensures hook_db_rewrite_sql() is handled properly".

That explains, why there is no single word on access control in the API.

I don't want to justify leaving out db_rewrite_sql, but give you the right understanding of what db_rewrite_sql really does. If there is no module which implements hook_db_rewrite_sql, there is no query filtering (=no access control) at all! That's an important difference.

Back to the patch: my forums are restricted very tightly. I'll have a look at them again.

michelle’s picture

Yes, I know what db_rewrite_sql does. And it needs to be there. I appreciate you taking the time to work on this but I can't use this as it stands.

Catch and I talked about this on IRC and I think we have a plan that will work. I should have something by the end of the day today.

Michelle

michelle’s picture

Status: Needs work » Active

Ok, I committed a change to the D6 branch. I still need to backport it to D5. At this point, it should be about the same speed as the core forum module. I'm still planning on investigating David's DNA module but I'll add that in as an option for people with huge forums. For normal sized forums, this should be just fine.

Leaving this active until it's on the D5 branch as well.

Michelle

catch’s picture

CVS upped and this is much, much faster. Nice one Michelle!

michelle’s picture

Yay, good to hear. I had devel working hard turning out thousands of nodes on my dev site so I could give it a real world test. While it could use more testing, I think it's fairly safe to say if core forum is fast enough for your site than advforum will be, too. Thanks for the tips. I was so focused on getting the queries to match my vision that I had tunnel vision and didn't realize just changing my vision a bit made this work a lot better.

I'll get this to D5 ASAP, hopefully by Monday. It's going to be a bit trickier because I have to backport some D6 functions and work around the lack of hook_menu_alter().

Michelle

michelle’s picture

Status: Active » Fixed

Ok, backported to D5. I'm marking this fixed, though I still want to look into DNA at some point to speed this up beyond what the core forums can do.

Michelle

Kirk’s picture

just wanted to see if I follow this correctly. The next release has better performance on forum-list? If so, that's great.

catch’s picture

sparq-l - yes, it's a lot better.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

mennonot’s picture

Version: 5.x-1.x-dev » 5.x-1.0-alpha17
Status: Closed (fixed) » Active

After reading the performance issues on this forum, I decided to upgrade a client's site to Alpha17 (from a June 2008 build) in hopes of dealing with a persistently long time (over 5000 ms) to run the advanced_forum_get_forums query when loading the /forum page. I upgraded the site and I'm still getting results like this from the logged queries:

5038.79 advanced_forum_get_forums
430.44 advanced_forum_get_forums

Here are the stats for our forums:
Topics: 103179, Posts: 1714493

Any suggestions?

Michelle, if you're interested in poking around on our site as a case study in a large number of nodes and comments, we'd be happy to give you access.

michelle’s picture

Status: Active » Closed (fixed)

This isn't an AF issue; it's core that's slow. You want #145353: Forums performance improvements if you want to help.

Michelle