Hi Michelle... again, thanks a ton for this module :)

I've been working on porting my site/forum over to Drupal from a custom CMS and have run into a snag that it looks like Advanced Profile may be able to remedy (as the Drupal core forum fixes won't come around until 7.x).

My issue is partly related to, though regarding a different query from this issue http://drupal.org/node/241982 -- Create table for advanced_forum_get_all_last_topics()

First some quick background info:

My (local development copy) forum has about 95,000 topics (nodes) and 711,000 replies (comments). I've imported these manually using SQL queries, updated all the related tables/sequences, and ran this script (the top one) to build the node_comment_statistics table (the one in Devel module has a bug, and this seemed to work better). I noticed right away the first time I viewed the main forum page (both with and without Advanced Forum installed) that there was a significant slowdown happening somewhere (15+ seconds to load the main forum page, or containers that hold subcategories... however forum nodes themselves load just fine in well under 1 second).

I turned on Devel module and noted the queries that were causing the trouble were all those related to "forum_get_forums()"... the biggest culprit looks like the instances of forum_get_forums that issue #241982 is designed to fix. However there is one additional forum_get_forums query which is responsible for generating per-category topic/post counts on the fly, and with a large number of topics/posts this query alone takes 3-4 seconds to run (possibly a lot longer on a live server or with more nodes/comments).

This is the query from forum.module I'm referring to, which produces topic/thread counts on the fly:

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.nid = r.nid WHERE n.status = 1 AND n.type = 'forum' GROUP BY r.tid

My feature request is to collect, store, and retrieve these counts in a similar way to what was suggested for "last topics" in http://drupal.org/node/241982#comment-811523
Being able to have a pre-calculated number to look up with a simple SELECT query instead of a complex query counting tens or hundreds of thousands (even millions) of nodes/comments on the fly would completely eliminate this slow down and scale infinitely. It looks like they might be planning on fixing the inefficiencies of forum_get_forums for Drupal 7, but for the next year or two I think Advanced Forum is the ideal candidate to solve this :)

Though I'm too new with programming to be sure, my thought is that initially you could populate this table with the "as current" counts/values to start off on, building the initial values the Drupal core way with the forum_get_forums query. After that you could, as mentioned in #241982, hook into the creation of forum nodes and comments and increment the counts as new content is added. You'd then have to stop the core forum_get_forums queries from running and use your own in their place (or else the performance benefit would be lost), and place the nice instantly-available counts into the forum listing instead.... I don't know if/how to modify forum.module's functionality in this way (maybe hook_db_rewrite_sql() does this?... I'm still getting my feet wet with PHP/making Drupal modules). I'm guessing you have a plan for that as the same is required for the last topics issue.

Thanks for all your amazing work Michelle, and please let me know if there's any way I can help. If it would be helpful or provide ideas, I'd be happy to release the source code of my custom CMS to you (not written by me, I'm not a great programmer yet) which solves the same issue I'm describing (note the instant load times on neverside.com)... the programmer who wrote it was fanatical about squeezing out performance :D

Thanks!

- David

Comments

michelle’s picture

Any chance this fanatical programmer wants to join the Drupal community? :)

There's a couple problems here:

1) My SQL is horribly rusty (4.5 years since I worked with it professionally). I've been trying to solve the last topic in forums problem for months and am going around in circles and not having any luck.

2) Advanced forum is an add on to core forum, not a replacement. So far I've managed to do everything without hacking core. I haven't had a chance to research this request, yet, but I suspect it would require me to do so.

A few of us had planned on pushing some major changes into core forum for D7 but that's getting pushed off to D8, now. Because of that, I may end up making advforum a temporary fork of core rather than an add on. That's not a decision to be taken lightly, though, so I will have to really look at the pros and cons of it. This issue would be one pro as it would be doable. But we'll have to see.

Michelle

dnewkerk’s picture

Hi Michelle -

Haha unfortunately no he wouldn't likely join us :( ... I haven't been in touch with him in a year or so (he decided to move on from our project, which he wrote the entire framework/API for but left little documentation, which is what set me off in search of an alternative system that I'd be able to manage myself). I've had people assess his code though and state that it was spectacular ;) (it's just incomplete). However I do have some other members on my forum who've been helping me with forming the trickier SQL queries needed to transfer over to Drupal, and it's feasible they may be able to lend a hand (they have no experience with Drupal, but are good with MySQL/PHP... one of them is even in the midst of writing a CMS framework of his own). That said, I'm even willing to invest some money into hiring help if it would help us accomplish this - I could initially budget at least several hundred dollars or more to get this performance bug worked out. In fact, to avoid this getting pushed all the way to Drupal 8, I'd be willing to put in more funds and work on a fundraiser to raise several thousand dollars to hire a developer if that would get things solved in time. If you have them handy, would you mind directing me to the issue(s) where the plans for waiting for 8 are being discussed? If the performance issue itself can't be solved in time, I think I could resort to caching to allow the forum to be functional (e.g. Gallery must be caching or have their own solution to the forum_get_forums queries, as their forum loads fast even for logged in users).

My SQL knowledge is just being born haha, so rusty-from-professional is probably a good step up :D Though I'll definitely try to apply what I've learned so far to making suggestions, and ask for advice from my members.

Hmm definitely not a fan of forking Drupal if it can be at all avoided... that was what initially sent me packing from my original plan of using vbDrupal instead of the core Drupal forum. As it's only a single block of code responsible for those performance killing queries (particularly line 719 and line 744), I'd sooner go for a simple setup guide like "Replace lines x and y in forum.module with this block of code". That appears to me to be all that would be required of a hack if Drupal's hooks don't save us in this situation... and that would be a lot more manageable than a fork of Drupal itself (e.g. I already have to take some similar steps when upgrading Drupal, such as manually copying jQuery update into misc, ImageMagick into includes, etc - so one more step is no biggie haha).

Proposed idea/rough solution
If we are able to store the needed statistics and last comment data (e.g. cache it in a sense) into our own table (let's call it "forum_statistics" table for now, for lack of a better name), then either by means of a Drupal hook (preferably) or a tiny hack, replace the 2 queries in those two lines of forum.module with a greatly simplified SELECT query to grab our pre-stored data from our forum_statistics table instead of asking Drupal to recount/refetch everything on the fly, everything would work perfectly and the performance issue would be fixed adequately. The trickier part is keeping the forum_statistics table up to date as new topics and posts are created. I don't know yet how to code that, but I believe it can be done through Drupal hooks and no hacks (based on Bdragon's comment and what I've tried to learn so far about those mentioned hooks). It's basically advanced_forum module sticking its nose into node/comment creation process (which Drupal allows I believe) and asking "Hey, is this a forum node or a forum comment? Oh it is, ok I will update the appropriate counter in my forum_statistics table then". The part that might possibly require the small hack for forum.module is just whether we can use a hook or not to override the forum_get_forums queries... in particular I need to ask people for clarification on what the capability of hook_db_rewrite_sql() can actually do - e.g. can it only augment a query, or fully replace it?). Fixing it all officially in Drupal core is another matter (probably more involved, though basically the same idea involving a new table for stats and a having forum.module be the one stepping into node/comment creation to ask what kind of content it is and updating the stats) but this would likely do the trick for Advanced Forum in the mean time.

If you were pondering a Drupal fork, might be you open to a small hack instead if we can work it out that way? I could be mistaken, but at this point I believe if a hack is needed it would be a very tiny one, and localized to just forum.module and within only 2 or so lines of code. We may even be able to make the hack optional for those who don't need it.

Anyhow... I don't know yet how to make patches, but I'll play with some ideas on the queries in both forum.module and in Advanced Forum and see if I come up with anything. I might be able to get as far as making a static/non-updating forum_statistics table as a proof of concept, and get forum.module and Advanced Forum to get the needed data from it. I'll try but after that someone with more module writing experience may be a better choice for writing the part of Advanced Forum that hooks into node/comment creation. If I come up with anything I'll try to make a patch or at least clearly describe to you the code I've changed.

Talk to you soon :) Let me know if you have any feedback or ideas... I shall be around :D

- David

michelle’s picture

I'm just heading off for the night. Took a break from this to watch the last couple eps of Doctor Who (excellent eps, I might add). I wanted to say a couple things quick:

1) When I say fork, I mean the forum module, not all of Drupal. It's getting to the point that my working around core in advanced forum is becoming a hinderance. Telling people to make a couple of changes to forum.module is not an option. Trust me, simply getting people to add a few lines to template.php has been a challange. ;) What I'm contemplating is pulling all the forum.module code into advanced forum and removing it as a requirement. So advanced forum would replace core forum instead of augmenting it. If I do this, I will make as few changes to what was the core forum module as possible and always with an eye towards submitting those changes as patches to core. If I can succeed in getting them into core in D7, then I can "unfork" the module and go back to being an add on in D7.

2) Bdragon's suggestion of using a table is what is in alpha 9 / alpha 1. The problem is that, after spending a week getting it working, I realized that it falls apart with node access. As soon as you have a node access module that allows some people to see some forum posts that others can't, any sort of caching or pseudo caching of the last topic is problematic. In my case, I'm fairly sure (though untested) that it was secure in that a person wouldn't see a title they weren't allowed to. But the opposite is a problem where the last post in the table would be replaced with something not the last post if the person viewing didn't have permissions to see the last post. Then people with permissions would get the wrong one. Plus, the code was getting more and more complex as I kept trying to fix it to handle one exception after another. And, as you saw, it was still slow. In frustration, I ripped all that code out and replaced it with another attempt at using straight queries, which is what is in the current dev. And that's even slower yet. So I don't know what to do.

If you have the funds to hire someone who can fix this, that would be awesome. For me, this is a hobby. Aside from the occasional bounty for a feature, I'm not getting paid for this module at all. So I can't afford to hire anyone.

Ok, so that wasn't exactly quick... LOL! Hopefully that clarifies the situation a bit.

Michelle

dnewkerk’s picture

Hi Michelle -

Thanks for the clarifications... I also noticed your new white-flag thread today and that looks promising :)

No Drupal fork... whew! (big sigh of relief haha). I misunderstood and thought you meant forking Drupal itself. Doing just the forum module and contributing patches from it to core sounds like a good idea to me if that's the route you choose to take (if it comes to it). On my end I've come to the conclusion that no matter what, unless the genius fellows helping with Advanced Forum can come up with a solution (I have faith!) then I'll have to hack forum.module to get rid of those queries (ideally replaced with some better queries to get at the same info, but worst case, I'd give up that statistics column in the forum completely for now... maybe I'd go a route more like the setup of SitePoint's forum). I installed the forum.module patch today from Advanced Cache module, and that made everything work terrific (haven't tested with Advanced Forum module installed yet though)... with the Advanced Cache forum patch and the paths patch, load time was all the way under 200ms. But... the house of cards tumbles down as soon as someone posts anything new and the cache is reset (*sigh*)... as soon as they do, the next person to load the forum gets the full delay again (15+ seconds minimum). A combination with additional caching and performance tweaks might be able to get things into shape enough, but I've haven't found a solution yet that doesn't end up with the deadly slow down for the poor unfortunate soul who visits right after a new post has been made. On an active forum that will likely be just about everyone :P

Hmm the node access part throws a big wrench into my idea haha (or rather, your idea I was repeating without realizing). Hopefully the Drupal ninjas will have mercy on us and lend a hand :D

So far as hiring someone to help... I don't necessarily have the funds myself (post college student trying to make ends meet in southern california haha), though I want this to work so badly that I'm willing to give up something else in life to make it happen :D And Drupal is worth it. In any case, it would help many other people as well, versus me spending whatever development funds I have trying to improve my own proprietary system. Anyhow, if it comes to that and we end up unable to solve things ourselves, let's figure out what we believe needs to be done and what direction is best to go in that will eventually have a chance of helping and being adopted by core rather than branching in our own direction (e.g. figure out what we want to accomplish, even though we can't code it ourselves). For myself, I'll commit up to $400 up front... if we need more, I'd like to help raise funds, and could even raise my commitment further if needed (just have to spend some extra time in the old IT support job I do on the side haha).

Thanks Michelle and please keep me posted if/when it comes time to hire help :) Keep up the awesome work. As before, please let me know if there's anything you'd like me to test.

- David

michelle’s picture

As I said in the other post, I decided forking forum, at least temporarily, is the only way to do this right. So I'll look into getting this query under control as well.

Michelle

michelle’s picture

Status: Active » Closed (won't fix)

At some point I will be investigating David Strauss's DNA module for handline performance issues on high traffic sites, which will take care of this issue, but I have other things I want to get done first.

Michelle

jons.drupal.org’s picture

Title: Pre-calculate and store topic/post counts for forum categories, eliminating forum.module's slow forum_get_forums query » drupal core

Hi, everyone i am a beginner i was wondering if anyone can help and send the drupal core via my email [removed email so as to not feed spambots]. please don't be mad at me if i am not doing the right thing.

michelle’s picture

Title: drupal core » Pre-calculate and store topic/post counts for forum categories, eliminating forum.module's slow forum_get_forums query

I'm not mad, but you do need to head on over to the forums if you are stuck on something rather than hijacking ancient issues. FWIW, though, if you can't download Drupal the normal way and need someone to email it to you, you are going to be nothing but frustrated trying to use it. You might want to have a look at Wordpress.

Michelle