Forum count incorrect when using access control modules

crischan - January 28, 2007 - 22:17
Project:Drupal
Version:6.x-dev
Component:forum.module
Category:bug report
Priority:normal
Assigned:Unassigned
Status:needs review
Description

Hi,
I'm using forum_access on a private forum and noticed that the "posts" number on the main forum site is wrong. It's always too high. I have tracked the problem to line 724 in forum.module where a $sql = db_rewrite_sql($sql); adds the access control sql to the original sql. In my opinion some inner join in there changes the SUM(l.comment_count) in the original sql.

I wont have time to look into this myself for the next weeks, but if anyone can identify the problem or even come up with a solution...

[Note from maintainer: See also #242605: Wrong number of posts is displayed for a lot more discussion on this issue.]

#1

merlinofchaos - January 28, 2007 - 22:51
Project:Forum Access» Drupal
Version:5.x-1.6» 5.x-dev
Component:Code» forum.module

Ugh. poll.module had a similar problem.

What happens is that if you end up with multiple records that give you a grant to the forum, you get the counts doubled. Forum access module can't do anything about this; the query needs to be redone to account for this. :/

#2

crischan - January 29, 2007 - 14:05
Status:active» needs review

Ok, during lunch break I went over it and found a quick fix for the forum.module sql to return the right post count when multiple grant records are returned for a forum.
The fix is to give a DISTINCT to the SUM-function in the query. See attached patch. Works fine on my forum.

AttachmentSizeStatusTest resultOperations
forum_post_count_fix.patch894 bytesIgnoredNoneNone

#3

crischan - January 29, 2007 - 14:07
Title:Forum count gets wrong» I forgot

Really, this behavior should be fixed in db_rewrite_sql. This is of course a more sensible thing to touch, so I'll leave that for the maintainer of that function.

#4

spatz4000 - January 29, 2007 - 14:51
Title:I forgot» Forum count gets wrong

#5

merlinofchaos - January 29, 2007 - 16:36

I'm not sure this is fixable in db_rewrite_sql -- I've given it a fair bit of thought and it's very difficult to properly apply the DISTINCT keyword in a way that makes sense. Good call on this patch though; I will give it a real review a little later; getting this into core is a good idea.

#6

RobRoy - February 7, 2007 - 03:14

Marked http://drupal.org/node/116687 a dupe of this.

The patch fixes some TAC issues I was having so gets a +1 from me.

#7

drumm - July 18, 2007 - 07:18
Version:5.x-dev» 6.x-dev
Status:needs review» needs work

As far as I can tell, DISTINCT in this context, in both MySQL and Postgres, means distinct values, not distinct rows from the original table. Maybe I am reading the documentation wrong, but I think that if two nodes have the same count, they will not both get added to the sum since the values are the same.

#8

jjrey - January 20, 2008 - 03:27
Title:Forum count gets wrong» Forum count gets wrong (alternate patch)

I have tested the attached patch with Drupal 5.6 + mysql 4.1.22. It should fix forums post count and poll count problems where they get overcounted if a user has multiple roles. It also does not have the problem that the above patch has where only unique comment counts get counted.

The SQL generated may not work on other databases.

To apply: cd modules/node
patch -p0 < node_patch.diff

AttachmentSizeStatusTest resultOperations
node_patch.diff1.06 KBIgnoredNoneNone

#9

jjrey - January 20, 2008 - 03:31
Title:Forum count gets wrong (alternate patch)» Forum count gets wrong

didnt mean to change issue title...

#10

Liam McDermott - February 28, 2008 - 21:46

#11

techgy - June 2, 2008 - 19:55

Has this issue been addressed yet for Drupal 6.2?
It should be fixed in the module and not patched :)
Another option would be to remove the count all together. If it cannot be easily corrected, it's useless to leave it there.

#12

yelloroadie - July 27, 2008 - 05:07

I am still having this issue in 6.3, but adding in "DISTINCT" seems to solve the problem also (it's at line 509 in 6.3). I'm thinking this really should be committed as a permanent change?

#13

aaron - August 20, 2008 - 18:46

I think this is a duplicate of #242605: Wrong number of posts is displayed. But this contains information that doesn't, so I'll leave them both open (but pointing to each other).

#14

mpg - September 11, 2008 - 06:13
Priority:normal» critical

Hi Team,

I am have only been working with drupal for just over a year now but am a web engineer at the #1 aquarium in the US so I am picking it up fast.

I am working on a site (drupal 5.5) and have been looking and researching the best way to add permissions to the forum module but have not found a solid solution. The Forum Access module is one that i keep reading about but there is this count issue. I have also researched the Taxonomy Access Control module but hear that it is heavy and will hurt performance and Taxonomy Access Control Lite does not have the control i am looking for...

Forum permissions seems like a simple, must have feature when setting up a forum...so what do you folks recommend keeping in mind i would like something that works and doesn't break anything once installed? Is

thank you.

#15

spatz4000 - September 11, 2008 - 12:32
Priority:critical» normal

Please don't hijack threads.

#16

Michelle - September 14, 2008 - 22:11
Title:Forum count gets wrong» Forum count incorrect when using access control modules

I marked the other thread dupe and pointed to it in the OP here and also am changing the title to reflect the problem better.

Michelle

#17

mpg - October 29, 2008 - 06:35
Priority:normal» critical

Drupal Team, this seems like a CRITICAL issue to me.

Who is going to use this module knowing that it displays the wrong "Post" number?

This module is almost complete and once this issue is fixed it will be the best option for controlling permissions in forums and will be added to Drupal core.

We need support from the Drupal wizards, be a hero and help fix this!

#18

Michelle - October 29, 2008 - 14:02
Version:6.x-dev» 7.x-dev
Priority:critical» normal
Status:needs work» duplicate

Incompatibility with contrib isn't really a critical issue. At any rate, this issue looks like it will be taken care of by #299176: Replace db_rewrite_sql() with hook_query_alter().. If that alone won't solve it, then this needs to be set back to active rather than a patch as the patch on this issue is no longer the answer.

Michelle

#19

mpg - October 29, 2008 - 19:09
Version:7.x-dev» 6.x-dev
Priority:normal» critical
Status:duplicate» needs work

Michelle,

what do you mean by "Incompatibility with contrib isn't really a critical issue." ? What is "contrib" ?

I am using 5.5 so #299176 will not the folks with older versions.

thanks

#20

Michelle - October 29, 2008 - 19:30
Version:6.x-dev» 7.x-dev
Priority:critical» normal
Status:needs work» duplicate

Contrib is short for contributed module. This issue only happens when used in conjunction with a contributed node access module such as forum access. It will be fixed, presumably, by the fact that db_rewrite_sql is being replaced in D7. Hence me marking this as a duplicate of that one. I bumped it to D7 because bugs are fixed in the latest version and then backported. Backporting this is probably not possible, however, as it is part of a much larger patch.

Michelle

#21

Frank Steiner - October 31, 2008 - 11:45

So that means the issue will stay in D6 forever? :-(

#22

Michelle - October 31, 2008 - 13:20
Version:7.x-dev» 6.x-dev
Status:duplicate» active

Well, seems I was wrong. I talked with some others on IRC and since the fix for this is not possible to backport, and since this doesn't change the API, it's possible this could get into D6 if someone is willing to write a patch and get some reviews on it.

Setting the status to reflect this.

Michelle

#23

Frank Steiner - November 4, 2008 - 08:21
Status:active» needs work

Hi,

I applied the patch to drupal 6.6 but it doesn't count posts by anonumous users. So for a forum with one topic and 3 replies by anonymous users, the forum overview will say "1 topic, 1 post" instead of "1 topic,4 posts".

Without anonymous posts, the patch works fine in all my forums.

#24

carnage - July 2, 2009 - 13:02

maybe something like:

... count(distinct(n.nid))/count(n.nid) AS ratio, sum(l.comments) / ratio ...

would solve the problem.

Edit:
The above didn't work in every case; i eventually fixed this problem with these lines of php code

//regex magix
$regex = '#INNER JOIN \{node_access\} (na ON na.nid = n.nid) WHERE (.+) AND #';//n.status = 1 \)#';
$replace = 'INNER JOIN ( SELECT DISTINCT na.nid FROM {node_access} na WHERE \2 ) \1 WHERE ';
$sql = preg_replace($regex,$replace,$sql);

Placed after db_rewrite_sql;

I have a patch, but its for advanced forum which uses the same query but different logic following it.

#25

DanielTheViking - September 3, 2009 - 15:10

Subscribing.

#26

salvis - October 4, 2009 - 15:21

The work done in #284392: db_rewrite_sql causing issues with DISTINCT, especially #204 and later, touches this issue here.

#27

JuliaKM - October 21, 2009 - 18:37

I ran into the same problem using Drupal 6.14 with Node privacy by role. For one particular role, my forum topic and post counts were doubled.

Applying patch #204 from the issue @salvis mentions did fix the problem topic counts.

However, my post counts were still doubled. My hacky temporary solution is to modify the following in advanced_forum.module on line 984.

$sql = "SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count
FROM {node} n
INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
INNER JOIN {term_node} r ON n.vid = r.vid
WHERE n.status = 1
GROUP BY r.tid";

The new version is:

$sql = "SELECT r.tid, COUNT(DISTINCT(n.nid)) AS topic_count, SUM(DISTINCT(l.comment_count)) AS comment_count
FROM {node} n
INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
INNER JOIN {term_node} r ON n.vid = r.vid
WHERE n.status = 1
GROUP BY r.tid";

If you apply patch #204, then the COUNT(DISTINCT(n.nid)) will be automatically added for you.

Please note that if you do not use Advanced Forum, the change will have to be made in forum.module instead. As mentioned previously, this bug is not related to Advanced Forum but instead is caused by how the access modules work with the forum module.

I explained the problem in more detail here.

#28

JuliaKM - October 21, 2009 - 18:44

Here are patches for the SQL change mentioned in #27 against Advanced Forum and Forum.

AttachmentSizeStatusTest resultOperations
advanced-forum-113611.patch854 bytesIgnoredNoneNone
forum-113611.patch1.08 KBIgnoredNoneNone

#29

salvis - October 21, 2009 - 20:58

Thank you for working on this. I don't have any answers, but maybe some ideas that could help you on your quest.

The number of topics is the number of nodes — that's clear, but what exactly should the "number of posts" be? Is it just the comments? or nodes + comments? For those who aren't deep inside this issue, it would be easier to understand if you could speak of nodes and comments and eliminate an additional source of confusion.

SELECT COUNT(DISTINCT(n.nid)) AS topic_count, SUM(DISTINCT(l.comment_count)) AS comment_count

I'm not sure this is legal and well-defined. Can you say in English what this should mean? You may have a test case where this gives the right result, but whether two nodes have the same or different comment counts should not influence the counting.

You keep writing that your numbers are doubled. That's not the general case. Your numbers may be doubled because you have two roles that are allowed to access the forum that you're looking at, or something like that. If I remember correctly, then it's the more roles you have the more your numbers will exceed the correct counts.

If you have the time and energy to pursue this (as you seem to be having), take a forum with a small number of topics and use phpMyAdmin to retrieve the results of the various queries (without COUNT, SUM, and the DISTINCTs). This will give you multiple records with the same nid (and they ought to have the same comment_counts, too). If you keep only one record per nid, count them and sum up the comment_counts, then I think you should get the correct result. Can you confirm this (when you do it on a sheet of paper)?

Add the DISTINCT (but not COUNT and SUM) and check what records you get. I thought this would do just what we wanted (the DISTINCT(nid) would eliminate the duplicates caused by matching multiple JOINs, and COUNT and SUM could then work on the correct result set), but I'm not sure it's correct... COUNT and SUM on your sheet of paper, and then add them to the query and see what you get.

This is a tricky problem and you seem to be kind of shooting in the dark. DISTINCT(l.comment_count) doesn't look right...

There's probably no way around actually sitting down with paper and pencil and working this out with carefully designed numbers of nodes, comments, and roles (matching and non-matching), to really understand what is going on.

#30

JuliaKM - October 22, 2009 - 21:42

Thanks for all of the advice salvis.

To clarify, the topic count is the number of new nodes for a particular forum taxonomy term and the post count is what you get when you add together the topic count and the comment count for all nodes in a particular form id. Here's the part of the forum_get_forums function that sets these counts. I find the terminology quite confusing.

      $forum->num_topics = $counts[$forum->tid]->topic_count;
      $forum->num_posts = $counts[$forum->tid]->topic_count + $counts[$forum->tid]->comment_count;

I think that I've figured out a pattern but might be wrong. (I've been staring at these numbers a long time.) The first time that you add a role when an access module is installed and assign it to a user, that user will see double the topic and post counts and the database will show double the topic and comment counts for that tid.

You can continue to add new roles to that user without seeing further multiplication as long as the new roles have the exact same permissions as the ones already assigned to the user. However, when you add a new role with different permissions, you will get a new multiplier to the total count. In the test cases below, you'll see that when I add a new role with different permissions for that node type, the number of topics and comments multiplies. When I add another Role with a different permission level, the number of topics and comments multiplies again.

Here's my attempt at simplified explanation of what's going on.

Is an access module installed?
If yes, then multiply the real topic count by 2.

How many distinct sets of role permissions exist?
If there is only one distinct set of role permissions for that node type, do nothing.

If there is more than one distinct sets of role permissions, then take the number of distinct sets of role permissions, subtract one, and multiply by the original real topic count. Your total number of topics is the real topic count multiplied by 2 plus the number of distinct sets of role permissions minus one and multiplied by the real topic count. Here it is in equation form:

t = number of real topics
n = number of distinct sets of role permissions
total topics = 2t + t(n - 1)

How I Tested This
To test how counts are being changed after installing a forum access module, I set up a fresh 6.14 install, enabled forum, and used devel to generate content and a forum taxonomy.

I tested how the posts and topics change for the first term forum, "trabrospi." Here are my different test scenarios.

Test 1: Jon Doe has 1 Role And No Access Modules Installed

Scenario 1A: Using base forum with no access control modules. There are 2 new user roles: Role 1 and Role 2. Logged in as an administrator (UID 1).

Scenario 2B: Using base forum with no access control modules. There are 2 new user roles: Role 1 and Role 2. Logged in as new user, Jon Doe, with one user role assigned (Role 1).

Result
On http://testsite/forum trabrospi displays:
Topics: 2
Posts: 11

Query used to pull records

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n
INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.vid = r.vid WHERE n.status = 1 GROUP BY r.tid

Query Result for tid 11 (trabrospi):
topic count: 2
comment count: 9

Test 2: Administrator and Access Module Installed

Scenario: Using base forum with Node privacy by role 6.x-1.5. All roles have view permissions. Role 2 has delete, update, and edit permissions only. Logged in as an administrator (UID 1).

There are 2 new user roles: Role 1 and Role 2. Logged in as new user, Jon Doe, with one user role assigned (Role 1).

Result
On http://testsite/forum trabrospi displays:
Topics: 2
Posts: 11

Query Generated in forum_get_forums:
SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n
INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.vid = r.vid WHERE n.status = 1 GROUP BY r.tid

Query Result for tid 11 (trabrospi):
topic count: 2
comment count: 9
total: 11 (correct)

Test 3: Jon Doe has 1 Role and Access Module Installed

Scenario: Using base forum with Node privacy by role 6.x-1.5. All roles have view permissions. Role 2 has delete, update, and edit permissions only. Logged in as Jon Doe with Role 1. There are 2 distinct node access roles assigned to Jon Doe.

Result
On http://testsite/forum trabrospi displays:
Topics: 4
Posts: 22

Query Generated in forum_get_forums:

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n
INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {node_access} na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid

Query Result for tid 11 (trabrospi):
topic count: 4
comment count: 18
total: 22 (wrong)

Test 4: Jon Doe Has 2 Roles and Access Module Installed

Scenario: Using base forum with Node privacy by role 6.x-1.5. All roles have view permissions. Role 2 has delete, update, and edit permissions only. Logged in as Jon Doe with Role 1 and Role 3. There are 2 distinct node access roles assigned to Jon Doe.

Result
On http://testsite/forum trabrospi displays:
Topics: 4
Posts: 22

Query Generated in forum_get_forums:

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n
INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {node_access} na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid

Query Result for tid 11 (trabrospi):
topic count: 4
comment count: 18
total: 22 (wrong)

Test 5: Anonymous and Access Module Installed

Scenario: Using base forum with Node privacy by role 6.x-1.5. All roles have view permissions. Role 2 has delete, update, and edit permissions only. Not logged in (anonymous user).

Result
On http://testsite/forum trabrospi displays:
Topics: 2
Posts: 11

Query Generated in forum_get_forums:

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {node_access} na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 0 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid

Query Result for tid 11 (trabrospi):
topic count: 2
comment count: 9
total: 11 (correct)

Test 7: Jon Doe has 3 Roles and Access Module Installed

Scenario: Using base forum with Node privacy by role 6.x-1.5. All roles have view permissions. Role 2 has delete, update, and edit permissions only. Logged in as Jon Doe with Role 1, Role 2 and Role 3. There are 2 distinct node access roles.

Result
On http://testsite/forum trabrospi displays:
Topics: 4
Posts: 22

Query Generated in forum_get_forums:

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {node_access} na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 5 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 6 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid

Query Result for tid 11 (trabrospi):
topic count: 4
comment count: 18
total: 22 (wrong)

Test 8: Jon Doe has 4 Roles and Access Module Installed

Scenario: Using base forum with Node privacy by role 6.x-1.5. All roles have view permissions. Role 2 has delete, update, and edit permissions only. Role 4 has edit and update permissions. Logged in as Jon Doe with Role 1, Role 3, Role 4 and Role 5. There are 3 distinct node access roles assigned to Jon Doe.

Result
On http://testsite/forum trabrospi displays:
Topics: 8
Posts: 44

Query Generated in forum_get_forums:

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {node_access} na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 4 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 6 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid

Query Result for tid 11 (trabrospi):
topic count: 8
comment count: 36
total: 44 (correct)

#31

salvis - October 22, 2009 - 23:59

Wow, you've done a lot of work, but the real goals is not to find a formula for the numbers. Even if you find one that is 100% correct, this doesn't help us to understand what is going wrong and to find a way to fix it.

If you take the query from Test 1

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count
  FROM {node} n
  INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
  INNER JOIN {term_node} r ON n.vid = r.vid
  WHERE n.status = 1 GROUP BY r.tid

and you remove COUNT and SUM, then you get

SELECT r.tid, n.nid, l.comment_count
  FROM {node} n
  INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
  INNER JOIN {term_node} r ON n.vid = r.vid
  WHERE n.status = 1 GROUP BY r.tid

Remove the braces, feed this to phpMyAdmin, and print out what it gives you.

Test 2 is the same as Test 1 because you ran as user 1. That user bypasses node access completely.

Now take the one from Test 3:

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count
  FROM {node} n
  INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
  INNER JOIN {term_node} r ON n.vid = r.vid
  INNER JOIN {node_access} na ON na.nid = n.nid
  WHERE (
    na.grant_view >= 1
    AND (
      (na.gid = 0 AND na.realm = 'all')
      OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role')
      OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role')
      OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user')
    )
  )
  AND ( n.status = 1 )
  GROUP BY r.tid

Change it (first line only) to

SELECT r.tid, n.nid, l.comment_count, na.gid, na.realm

This will probably give you four rows rather than two, and the na columns should let you make the connection to your roles. Add an additional role out of (2, 3, 12), and you'll probably get six rows. Apparently, both of your nodes are accessible to the same roles. Use devel_node_access.module (part of the Devel module) to cover that side of the medal.

I hope this will let you understand how the different parameters influence the result. Then you can start experimenting with DISTINCT, GROUP BY, HAVING, etc. and try to find a query that gives you just the right rows. Then you can try to COUNT and SUM them and hopefully get to the right numbers.

As I said before, I don't know whether my ideas really lead you towards a solution, but given my limited understanding, I'd be searching in that direction... If anyone can support this more effectively, please come in!

#32

JuliaKM - October 23, 2009 - 16:08

Thanks again for your help salvis.

I have been investigating more and I think that the problem is related to adding "INNER JOIN node_access na ON na.nid = n.nid" when you have a node access module. Since there are multiple nids in that table, when you join with the node table, it adds duplication.

I've tried a number of combinations of GROUP BY and HAVING but wasn't able to get either to fix the duplication problem because it can only be fixed in the SELECT query. Using HAVING or GROUP BY just rearranges the duplicate records that have already been selected. The only options that work for reducing the records are adding the two DISTINCTs or getting rid of the INNER JOIN on node_access (which isn't an option).

I was initially afraid that adding DISTINCT to l.comment_count and n.nid would both fix the problem and make the WHERE part of the query no longer useful. However, it appears that adding the DISTINCT to both does fix the problem and also preserves permissions if someone using a node access modules does not have access to the forum. In that case, the post and topic counts appear as 0, which I think is the desired behavior.

Adding patch #230 from #284392: db_rewrite_sql causing issues with DISTINCT adds a DISTINCT to the n.nid but not the l.comment_count. I think that the best solution would be to modify patch #230 to also add DISTINCT to the SUM function. An alternative is to add the DISTINCTs to the forum.module as shown here. The patch in #284392: db_rewrite_sql causing issues with DISTINCT would work fine with adding DISTINCTs to the query because it doesn't add a DISTINCT when one already exists. However, I understand that since this problem is caused by using contrib access modules, it might not make the cut as a core improvement.

Here are the results returned from your questions in #31.

Query 1: No Access Modules Used and no INNER JOIN on node_access

SELECT r.tid, n.nid, l.comment_count
  FROM node n
  INNER JOIN node_comment_statistics l ON n.nid = l.nid
  INNER JOIN term_node r ON n.vid = r.vid
  WHERE n.status = 1 GROUP BY r.tid

Result for tid 11:
tid: 11
nid: 94
comment count: 5

Query 2: With 3 Roles and INNER JOIN on node_access

SELECT r.tid, n.nid, l.comment_count, na.gid, na.realm
  FROM node n
  INNER JOIN node_comment_statistics l ON n.nid = l.nid
  INNER JOIN term_node r ON n.vid = r.vid
  INNER JOIN node_access na ON na.nid = n.nid
  WHERE (
    na.grant_view >= 1
    AND (
      (na.gid = 0 AND na.realm = 'all')
      OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role')
      OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role')
      OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user')
    )
  )
  AND ( n.status = 1 )
  GROUP BY r.tid

result for tid 11:
tid: 11
nid: 94
comment_count: 5
gid: 2
realm: node_privacy_by_role

#33

salvis - October 23, 2009 - 20:54

Does your DISTINCT on l.comment_count also work if the two nodes have the same comment counts?

Part of the problem here is that we're dealing with non-standard MySQL features that are not really documented. In Standard SQL you cannot apply DISTINCT to a single column. The standard syntax is

SELECT DISTINCT col1, col2, col3, AGG1(col4), AGG2(col5) FROM ...

This is interpreted as "select rows such that there are no two rows where the combination of col1 and col2 and col3 is the same, and apply the aggregation functions AGG1 and AGG2 (such as SUM, COUNT, MAX, etc.) to the col4 and col5 values in the selected rows. IOW, DISTINCT automatically applies to all columns that are not aggregated.

I have been investigating more and I think that the problem is related to adding "INNER JOIN node_access na ON na.nid = n.nid" when you have a node access module. Since there are multiple nids in that table, when you join with the node table, it adds duplication.

Yes, the duplication occurs if more than one of the OR clauses match. Each role that your user has will cause one OR (na.gid = RID AND na.realm = 'node_privacy_byrole_role') clause.

I've tried a number of combinations of GROUP BY and HAVING but wasn't able to get either to fix the duplication problem because it can only be fixed in the SELECT query. Using HAVING or GROUP BY just rearranges the duplicate records that have already been selected. The only options that work for reducing the records are adding the two DISTINCTs or getting rid of the INNER JOIN on node_access (which isn't an option).

GROUP BY is an alternative mechanism for eliminating duplicates. GROUP BY nid should eliminate duplicate nids.

In that case, the post and topic counts appear as 0, which I think is the desired behavior.

Yes, definitely.

#34

JuliaKM - October 25, 2009 - 02:15

Does your DISTINCT on l.comment_count also work if the two nodes have the same comment counts?

Unfortunately, you are right. The query stops returning accurate results if two nodes have the same comment count. Nice catch!

I tried grouping by n.nid and realized that this wouldn't provide accurate results without rewriting the query because multiple nodes can be associated with by the same taxonomy term.

However, I did find one thing that may be the start of a solution. If I add 'na.gid' as a GROUP BY field, then my counts appear to be accurate. The catch is that there are duplicate rows (a new row for each gid). The other problem is that I'm not sure how to add in the GROUP BY since it would only apply in cases where an INNER JOIN has been added to the node_access table. I think that the INNER JOIN is added in _db_rewrite_sql.

Here's a revised query that includes na.gid in GROUP BY if you want to test.

SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count FROM node n
INNER JOIN node_comment_statistics l ON n.nid = l.nid
INNER JOIN term_node r ON n.vid = r.vid
INNER JOIN node_access na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 4 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 5 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )
GROUP BY r.tid, na.gid

#35

salvis - October 25, 2009 - 14:52

I tried grouping by n.nid and realized that this wouldn't provide accurate results without rewriting the query because multiple nodes can be associated with by the same taxonomy term.

I don't quite understand what you mean. We certainly do have to do some amount of rewriting. What do you mean with "multiple nodes can be associated with by the same taxonomy term"? A forum topic can be in more than one forum? No, we don't allow that.

Try to list the na.gids and understand how they're used, before you GROUP BY them. I think the duplicate records have the same nid but different gids. Grouping by gid seems to go the wrong way, eliminating same-gid rows rather than eliminating different-gid rows.

Maybe we should start experimenting with a subquery. I've shied away from this because db_rewrite_sql() (yes, that's what adds the JOINs) doesn't support subqueries, and I think I read it's less efficient than JOINs, but forum.module might be able to let db_rewrite_sql() prepare the subquery and then wrap the query around it.

Something like replacing

<?php
    $sql
= "
      SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count
        FROM {node} n
        INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
        INNER JOIN {term_node} r ON n.vid = r.vid
        WHERE n.status = 1
        GROUP BY r.tid"
;
   
$sql = db_rewrite_sql($sql);
   
$_counts = db_query($sql);
?>

with

<?php
    $sql
= "
      SELECT r.tid AS tid, n.nid AS nid, l.comment_count AS nid_comment_count
        FROM {node} n
        INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
        INNER JOIN {term_node} r ON n.vid = r.vid
        WHERE n.status = 1
        GROUP BY r.tid, n.nid"
;
   
$sql = db_rewrite_sql($sql);
   
$sql = "
      SELECT tid, COUNT(nid) AS topic_count, SUM(nid_comment_count) AS comment_count
        FROM ($sql) AS forum_content_list
        GROUP BY tid"
;
   
$_counts = db_query($sql);
?>

This is untested, but maybe something along these lines could work? Can you check whether the result of the subquery (by itself) is reasonable?

#36

JuliaKM - October 25, 2009 - 18:27

I think the duplicate records have the same nid but different gids. Grouping by gid seems to go the wrong way, eliminating same-gid rows rather than eliminating different-gid rows.

After thinking about this more, I agree. Since the nid is the distinct element that we care about, it wouldn't make any sense to involve gid.

Your subquery method appears to work perfectly. I'm attaching a patch with the change. I'm a bit hesitant to mark it as "needs review" without hearing from you first.

When using an access control module with forum, the subquery is rewritten as:

SELECT r.tid AS tid, n.nid AS nid, l.comment_count AS nid_comment_count
FROM {node} n
INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
INNER JOIN {term_node} r ON n.vid = r.vid INNER JOIN {node_access} na ON na.nid = n.nid
WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 4 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 5 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid, n.nid

This returns the following, accurate rows for tid 11:
tid: 11
nid: 94
nid_comment_count: 4

tid: 11
nid: 95
nid_comment_count: 4

When I add in the parent query, the following is output as $sql:

SELECT tid, COUNT(nid) AS topic_count, SUM(nid_comment_count) AS comment_count FROM (SELECT r.tid AS tid, n.nid AS nid, l.comment_count AS nid_comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
INNER JOIN {term_node} r ON n.vid = r.vid
INNER JOIN {node_access} na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 4 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 5 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid, n.nid) AS forum_content_list GROUP BY tid

For tid 11, this returns the following, accurate row:
tid: 11
topic_count: 2
comment_count: 8

I tested to see what would happen after turning off my access control module (Node privacy by role). After turning off the module, the following query ran instead and appears to be accurate:

SELECT tid, COUNT(nid) AS topic_count, SUM(nid_comment_count) AS comment_count FROM (SELECT r.tid AS tid, n.nid AS nid, l.comment_count AS nid_comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.vid = r.vid WHERE n.status = 1 GROUP BY r.tid, n.nid) AS forum_content_list GROUP BY tid

I also applied patch #233 from #284392: db_rewrite_sql causing issues with DISTINCT and the query still returned accurate results.

AttachmentSizeStatusTest resultOperations
drupal-forum-duplicates-113611.patch1.19 KBIgnoredNoneNone

#37

salvis - October 26, 2009 - 08:43

Ah, some light at the end of the tunnel — great!

I tested to see what would happen after turning off my access control module

Good point. Introducing a subquery even for sites that are not using an NA module will raise resistance. Try comparing $sql before and after db_rewrite_sql(), and if it's unchanged, then use the old query (with the DISTINCT and SUM).

The 'AS tid' and 'AS nid' are probably not needed, try removing them.

Can you do (and report on) some more tests to build some confidence? The silence of the community is deafening — it'd be nice to get some feedback about where we're heading here, so go ahead and mark it NR after making these changes...

#38

JuliaKM - October 27, 2009 - 19:09

I did some testing and found that I couldn't join the two queries into one $sql without running into problems with db_rewrite_sql. db_rewrite_sql has to run on the first part of the query to avoid the duplicate post issue.

The final $sql with the working query after the subquery outputs as:

SELECT tid, COUNT(nid) AS topic_count, SUM(nid_comment_count) AS comment_count
FROM (SELECT r.tid AS tid, n.nid AS nid, l.comment_count AS nid_comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
INNER JOIN {term_node} r ON n.vid = r.vid
INNER JOIN {node_access} na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 2 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 3 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 4 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 5 AND na.realm = 'node_privacy_byrole_role') OR (na.gid = 12 AND na.realm = 'node_privacy_byrole_user'))) AND ( n.status = 1 )GROUP BY r.tid, n.nid) AS forum_content_list GROUP BY tid

The example above is when db_rewrite_sql is called after the first query and before the second.

When I combine the two queries and have db_rewrite_sql follow, db_rewrite_sql rewrites $sql and I get an error message that "Unknown column 'n.nid' in 'on clause' query." The unknown n.nid is coming from db_rewrite_query and is added in the INNER JOIN to node_access. I'm not sure how to get around this.

Also, I don't think that we can remove 'AS nid'. I am able to remove 'AS tid' but when I remove 'AS nid' db_rewrite_sql adds a 'DISTINCT' to nid, which creates a SQL error. This is rather confusing.

I've installed PostgreSQL locally and plan to test the revised query in PostgreSQL next.

If you want to test, here's the code I'm using in form_get_forums. I'm also attaching a new patch.

<?php
    $sql
= "SELECT r.tid, n.nid AS nid, l.comment_count AS nid_comment_count FROM {node} n INNER JOIN {node_comment_statistics} l ON n.nid = l.nid INNER JOIN {term_node} r ON n.vid = r.vid WHERE n.status = 1 GROUP BY r.tid, n.nid";
   
$sql = db_rewrite_sql($sql);
   
$sql = "SELECT tid, COUNT(nid) AS topic_count, SUM(nid_comment_count) AS comment_count FROM ($sql) AS forum_content_list GROUP BY tid";
?>

AttachmentSizeStatusTest resultOperations
drupal-forum-dups-113611-2.patch1.18 KBIgnoredNoneNone

#39

JuliaKM - October 27, 2009 - 20:44

The revised $sql caused an error in PostgreSQL. Here is the message.

    * warning: pg_query() [function.pg-query]: Query failed: ERROR: column "l.comment_count" must appear in the GROUP BY clause or be used in an aggregate function in /Users/JuliaKM/workspace/drupal_testing/drupal-6.14-postgres/includes/database.pgsql.inc on line 139.
    * user warning: query: SELECT tid, COUNT(nid) AS topic_count, SUM(nid_comment_count) AS comment_count FROM (SELECT r.tid, n.nid AS nid, l.comment_count AS nid_comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid = l.nid INNER JOIN term_node r ON n.vid = r.vid WHERE n.status = 1 GROUP BY r.tid, n.nid) AS forum_content_list GROUP BY tid in /Users/JuliaKM/workspace/drupal_testing/drupal-6.14-postgres/modules/forum/forum.module on line 504.

I fixed this by adding l.comment_count to the GROUP BY. I'm attaching a revised patch in the next comment.

In addition, just want to note that the same multiple problem exists in PostgreSQL when you combine forum.module with an access module. The revised query fixes the problem for PostgreSQL users as well.

#40

JuliaKM - October 27, 2009 - 20:46
Status:needs work» needs review

Here's the new patch that works for PostgreSQL users. Marking as needs review so that we can get some more feedback.

AttachmentSizeStatusTest resultOperations
drupal-forum-dups-113611-3.patch1.2 KBIgnoredNoneNone

#41

salvis - October 27, 2009 - 20:54
Status:needs review» needs work

When I combine the two queries and have db_rewrite_sql follow

What? Now I'm completely confused! Seems I didn't make myself clear. What I meant was this:

<?php
   
// See #113611 for the history of the following code.
    // Start with a query that would list all nodes grouped by tid.
   
$sql = "
      SELECT r.tid AS tid, n.nid AS nid, l.comment_count AS nid_comment_count
        FROM {node} n
        INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
        INNER JOIN {term_node} r ON n.vid = r.vid
        WHERE n.status = 1
        GROUP BY r.tid, n.nid"
;
   
$sql_rewritten = db_rewrite_sql($sql);
    if (
$sql_rewritten == $sql) {
     
// If the query was NOT rewritten, we directly count the nodes and sum their comments.
     
$sql = "
        SELECT r.tid, COUNT(n.nid) AS topic_count, SUM(l.comment_count) AS comment_count
          FROM {node} n
          INNER JOIN {node_comment_statistics} l ON n.nid = l.nid
          INNER JOIN {term_node} r ON n.vid = r.vid
          WHERE n.status = 1
          GROUP BY r.tid"
;
     
// Just to be safe we pass this through db_rewrite_sql(), too, but we don't expect any changes (otherwise our strategy may have failed!).
     
$sql = db_rewrite_sql($sql);
    }
    else {
     
// If the query WAS rewritten, then it now selects the accessible nodes and we wrap the counting and summing around it.
     
$sql = "
        SELECT tid, COUNT(nid) AS topic_count, SUM(nid_comment_count) AS comment_count
          FROM ($sql_rewritten) AS forum_content_list
          GROUP BY tid"
;
     
// There's no need to pass this through db_rewrite_sql() again, because $sql_rewritten returns exactly what we're allowed to see.
   
}
   
$_counts = db_query($sql);
?>

I've included the 'AS tid' and 'AS nid' again because they don't hurt and they may help, but do try removing them to see whether they're really needed.

BTW, it's unlikely that anyone is able to understand the subtle differences of the SQL statements if they appear on one line. I know that the Drupal custom is to obfuscate SQL in this way, but maybe you can try to propose a patch where the SQL is actually readable. Let's see what happens...

#42

salvis - October 27, 2009 - 20:58

Sorry, our posts crossed.

I think we absolutely need to avoid introducing a subquery for sites that don't use access modules (because it means performance overhead), which my strategy does. Adapt it to what you learned with pgsql, and we should be set...

#43

JuliaKM - October 27, 2009 - 21:32
Status:needs work» needs review

Here's a revised patch using your new query and fixing the PostgreSQL error. I had misunderstood what you meant by combining the two queries. Sorry about that.

I ran the patch by Coder and it didn't bark at the multi-line SQL so I left in the spacing. I also left in the 'AS tid' and 'AS nid' because it seemed odd to remove 'AS tid' when removing 'AS nid' throws a SQL error related to adding a DISTINCT. I figure it can't hurt to leave both in but I'll wait and see what others say.

AttachmentSizeStatusTest resultOperations
drupal-forum-dups-113611-4.patch1.83 KBIgnoredNoneNone

#44

salvis - November 26, 2009 - 00:25

Kind of disappointing that we don't get any feedback here...

 
 

Drupal is a registered trademark of Dries Buytaert.