With comment.module enabled, a row in node_comment_statistics is created upon node creation. If nodes are created prior to enabling comment.module, there is no row in node_comment_statistics for the UPDATE query in _comment_update_node_statistics() to act upon.

Comments

greggles’s picture

So, just to throw out some possible solutions:
1) when the module is enabled add records to the table for all nodes that are not already in it
2) when a comment is added check for the existence of the record prior to trying the update
3) check the success of the update (if possible) and if the updated affected 0 rows then do an insert of the proper values

Maybe there are some more, these just came to my head. I think I like either 2 or 3 the most. #1 seems like it could be a problem if you have many nodes and then enabled comment module because the update may take a long time. 2 and 3 leave us with some data inconsistency, but are more likely to work in more cases.

allie micka’s picture

#4: The least invasive thing you can do is modify the node_comment_statistics function to DELETE the stats for the node, and then INSERT a new record containing the node comment statistics. This will be cleaner and a teensy bit more efficient than SELECT + (UPDATE|INSERT)

This will correct the present behavior, where nodes created prior to turning the comment module on will never have corresponding rows in the node_comment_statistics table, but any node that has a comment will be noted correctly.

The question is, how bad of a deal is that? Are there queries that are doing some kind of an INNER JOIN against the node_comment_statistics table? If so, we should figure out how to put those legacy entries back in (e.g. #1)

If this is not a problem, why are we storing a record for every node anyway? Can we just store an entry in node_comment_statistics for nodes with >= 1 comment?

greggles’s picture

I believe delete/insert would require two locks. For most users (myisam tables) that means two table locks. That seems potentially expensive because it would block selects on the table (e.g. any page that displays the count would have to wait for these two operations). Users of InnoDB tables would only have row locks, but that is not as common of a storage engine.

It's all conjecture until we do a benchmark, of course. It could be that option 1 is fast enough for most situations.

The first thing that _comment_update_node_statistics($nid) does is:
$count = db_result(db_query('SELECT COUNT(cid) FROM {comments} WHERE nid = %d AND status = %d', $nid, COMMENT_PUBLISHED));

So, we already have an if structure for this being the first comment vs. it being an update. I think the simplest thing is to change around those queries to so that the else (which is the "this is the first comment" block) does an insert instead of an update.

Allie raises a good point that we probably also need to look for inner joins on node_comment_statistics to make sure this won't cause other problems.

bjornarneson’s picture

I think the simplest thing is to change around those queries to so that the else (which is the "this is the first comment" block) does an insert instead of an update.

I think that this would break on cases in which all comments were being deleted from a node. $count would return zero, but we'd already have a row in node_comment_statistics and wouldn't want to insert another one.

bjornarneson’s picture

Forget I just wrote that. Not thinking clearly. :)

greggles’s picture

I think your comment in #4 makes sense. Pretty tricky, I guess.

greggles’s picture

Status: Active » Needs review
StatusFileSize
new2.95 KB

Attached is a patch which does a left join in the first query to check for a record in the node_comment_statistics and if that join shows there is no record in the table it performs the queries as inserts instead of updates.

The comment I added is to clear up that the if(!isset) uses result values from an earlier query (which seemed odd to chx in his review so he asked for a comment).

I believe we could remove the insert from the "no comments" case because my limited testing showed that case only happens when we delete a comment in which case it had to be made in which case there should already be a record in node_comment_statistics. However, the current solution seems more robust at very little extra cost: we already have to have the left join so the cost is just one !isset().

Reviews/improvements welcome.

ChrisKennedy’s picture

Sorry, what feature is "comment tracking" referring to? I'm not sure what the testcase would be, otherwise I would verify this patch.

greggles’s picture

One manifestation of this issue is in the search results page.

Steps to test:
1. disable comment module
2. create content (not forum content)
3. enable comment module
4. edit the content created in 2 to enable comments
5. comment on your content created in 2
6. create a new piece of content and comment on it - use words in this content that are similar to the content in step 2
7. enable search (and run cron.php)
8. search with a term that will return the content created in steps 2 and 6

Expected results:
Normal search results page

Actual results:
Content from step 6 appears normal with the last line reading something like "Page - greg - 12/10/2006 - 21:34 - 0 comments"
Content from step 2 is listed with the last line reading something like "Page - greg - 12/11/2006 - 22:52 - comments"

So, this is actually _still_ a problem regardless of my patch. Once a comment is made on each node the search results will look right, but until a comment is added the data shows up incorrectly. One solution would be to provide default values in the case that node_comment_statistics doesn't have all the right data which seems to be what the tracker page does (tracker is another page which uses node_comment_statistics). Or, we can document these new corner cases and commit this as a "basically solves the problem in most cases and for the ongoing situation so that's good enough".

One solution to this would be the solution I proposed as course #1 in comment #1. I can only conceive of doing that as 3 or so queries which could take a while on a site with a lot of content. Even one query might be too much.

I'm happy to keep working on this, but would appreciate some direction on which way to go. I could create a patch which does the item in #1 and benchmark that if it seems appropriate. Or I could look into creating defaults and confirming the left joins to node_comment_statistics if we want to include that into this patch.

greggles’s picture

StatusFileSize
new3.47 KB

Attached problem fixes the " comments" displays instead of "0 comments" if there is no record in node_comment_statisticsf problem that I described in my last comment.

The solution was to use an intval() on the return of the count which I have added a comment because it's a little unclear why that might be a "false".

This is the "provide defaults for joins to node_comment_statistics" solution and may not be 100% complete, but it's a start. I'll audit the rest of the joins but wanted to provide this to show what this solutions looks like.

ChrisKennedy’s picture

The comments need to be modified to have a space after the //, use sentence capitalization, and end with a period.

ChrisKennedy’s picture

So, is this really a critical bug? A display error in the search results for a fairly rare test case seems like a minor bug. Does this cause fatal errors anywhere or otherwise break things?

greggles’s picture

StatusFileSize
new6.69 KB

Ok, I don't know if this is a critical, but here's another patch and another test scenario:

1. Disable comment module
2. Create content
3. Enable comment module
4. Enable tracker module
5. visit the tracker page (?q=tracker)

Expected results:
See your content

Actual results:
Content that was created when comment module was disabled is not visible in the tracker.

So, those are the two scenarios (in core) that I found that are affected by this issue. I don't know if that makes it critical or not.

With this patch I believe that it now follows the comment standards and have addressed every situation in core that could be affected by the problem identified by barneson.

Also, I noticed another place that could be affected by this solution path: the comment_nodeapi when $op = load will return an empty array. I was thinking that we could test for the array being null and, if it is, do a _comment_statistics_update but node_load is already a relatively slow operation so that seems like a suboptimal solution.

Steven’s picture

The idea seems ok, but the code is a bit messy. There is quite some duplication going on. Also, the code comments need to be more to the point.

However, we could consider an alternative: when comment.module is enabled, insert the missing records. This should be a relatively fast operation that can be done in one go. Something like:

$result = db_query('SELECT n.nid, c.comment_count FROM {node}  LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid  WHERE c.comment_count IS NULL');
while ($node = db_fetch_object($result)) {
  db_query('INSERT INTO {node_comment_statistics} VALUES (%d, ...)', $node->nid);
}
Steven’s picture

Status: Needs review » Needs work
Steven’s picture

StatusFileSize
new920 bytes

It's simple really... any nodes that have been posted since comment.module has been disabled, have no comments. And no other comments have been posted either, so all existing records in node_comment_statistics must be up to date.

Patch attached.

Steven’s picture

Status: Needs work » Needs review
Steven’s picture

Testing on my powerbook shows that it takes about 10 seconds to insert 15000 rows. I find it highly unlikely that someone would leave comment.module turned off for that long, only to reenable it. Or that you would turn on comments on a 10k node site. I think the only realistic scenario for this is when you import content. In that case, you should really be using node_save() or generating the right node_comment_statitics rows yourself.

Given all this, I think the performance is good enough to fix the problem.

neclimdul’s picture

Stevens patch seems to work for me. +1 for a simple fix.

eaton’s picture

Status: Needs review » Reviewed & tested by the community

Well. That certainly works. It's exactly the sort of thing hook_enable() was designed for, too. Anyone see any problems with this? I just tried it and it appears to work nicely...

matt westgate’s picture

I like Steven's approach. It seems like comment module should be responsible for playing catchup rather then node module "pretending" like comment module is always there. In Lullabot land for example, we implement comments as nodes these days rather than using comment module and so avoiding extra meaningless queries are appreciated.

greggles’s picture

This looks like the right way to me, it was my first idea...

My only question is if it's really scalable - laptops are faster than an overburdened-shared-host and overburdened-shared-host is the main location of Drupal installations.

Steven’s picture

Overburdened nodes aren't going to run 10k node sites. The really big sites will have a dedicated server. Plus, we have at least 30 seconds + db run time (depending on config), to do this update. That's a lot of nodes that can be processed.

dopry’s picture

To get a little more efficiency out of the enable hook you could try something like:

INSERT INTO {node_comment_statistics} (nid, last_comment_timestamp, last_comment_name, last_comment_uid, comment_count) SELECT n.nid, n.created, NULL, n.uid, 0 FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE c.comment_count IS NULL

So everything is done in the database....

dries’s picture

dopry's suggestion would solve the issue of getting a timeout.

Steven’s picture

Status: Reviewed & tested by the community » Fixed

Great! MySQL run time is not included in the PHP time limit. Committed to HEAD.

dopry’s picture

Status: Fixed » Needs review

This patch with the single db query should resolve most potential timeout issues, unless you have super gigantor tables. I'm not absolutely certain this query will work as is on pgsql. I've tested on mysql 4.1 with no problems. Should work with mysql >4.0.14, will break on earlier versions of mysql due the the join against the table we're inserting into.

Not sure what the lowest version of mysql we're supporting is, but could probably be rewritten to use a temporary table, which would still probably be better than a mess of individual queries.

dopry’s picture

Status: Needs review » Fixed

nm already hit. and I forgot the patch.

chx’s picture

Status: Fixed » Active

Prior to MySQL 4.0.14, the target table of the INSERT statement cannot appear in the FROM clause of the SELECT part of the query. This limitation is lifted in 4.0.14. In this case, MySQL creates a temporary table to hold the rows from the SELECT and then inserts those rows into the target table.

Either we raise the MySQL limit to 4.0.14 or we roll a patch which simulates this. We have temporary table query...

webernet’s picture

Since we are very close to a release candidate, and Drupal 5 is still compatible with MySQL 3.23 (except for this bug), I would like to see this fixed so version 5 will continue to work on as many systems as possible.

It's already been decided that Drupal 6 will be MySQL 4.1+...

chx’s picture

Status: Active » Needs review
StatusFileSize
new953 bytes
webernet’s picture

StatusFileSize
new1.12 KB

Attached patch explicitly creates the temporary table in order to fix the issue on MySQL 3.23.

webernet’s picture

StatusFileSize
new1.12 KB

Remove unnecessary {}.

dopry’s picture

Status: Needs review » Reviewed & tested by the community

Seems to be the ticket. keeps all the dirty work in the DB. Meets our mysql version support target. still works in my mysql4.1/php5 testing environment.

Steven’s picture

Status: Reviewed & tested by the community » Fixed

Good call. Tested and verified locally. Committed to HEAD.

serval-1’s picture

The tracker module depends on node_comment_statistics.
As discussed in issue #87590, this patch solves problems with the tracker module when the comment module is enabled.
But when the comment module is disabled, nodes are still not appearing in the tracker. A patch for RC1 is provided.

ChrisKennedy’s picture

Status: Fixed » Needs work

This issue has three more tasks imo:
1. It needs an _update function in system.install for 5.x upgrades (if comment.module is already enabled).
2. It needs to be backported to 4.7
3. It needs to be run on drupal.org, see http://drupal.org/node/105715 (or if #1 is done, it will be done automatically when d.o is upgraded to 5.0).

Steven’s picture

Status: Needs work » Fixed

IMO this issue is not so critical as to require a backport for 4.7. It's a side-effect that only results when you turn comment.module on/off on an existing site. Any 4.7 site out there that has had this problem has had to deal with it already.

The same reasoning could be applied to the update: yes, it would be nice, but a stand-alone PHP snippet can do just as well. In fact, all that needs to be done is to disable/enable comment.module.

Same for drupal.org.

In any case, these should be opened as separate issues. As Drupal.org is going to upgrade to 5.0 soon, I see no reason to hurry with this, or have it block a release.

Anonymous’s picture

Status: Fixed » Closed (fixed)