I'm using domain access with the drigg module and am finding that for anonymous users the node listing on the front page has each node summary showing up twice. I've traced the problem to the sql alterations made by domain access befor the query is passed to query that is fed to pager_query to retrieve the list. From looking at this it appears to be generic issue that I would expect to effect many other modules.

The original Query is:

SELECT * FROM {node} n WHERE n.type='drigg' AND n.status = 1 and n.promote = 1 ORDER BY n.sticky DESC, dn2.promoted_on DESC

Then after 'db_rewrite_sql' had alter the query:

SELECT * FROM {node} n INNER JOIN {node_access} na ON na.nid = n.nid LEFT JOIN {drigg_node} dn2 ON dn2.dnid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 2 AND na.realm = 'domain_id'))) AND ( 1=1 AND !(n.type='drigg' AND dn2.killed=1 ) ) AND ( n.type='drigg' AND n.status = 1 and n.promote = 1 ) ORDER BY n.sticky DESC, dn2.promoted_on DESC

There are a few drigg specific changes as well but my testing shows that the part that is cause the duplicates is (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 2 AND na.realm = 'domain_id'). The node_access table does have 2 rows for each of the relevant nodes one with gid = 0 AND realm = 'domain_site. and the other with gid = 2 and realm = 'domain_id'.

So is the expected behaviour of the query replacement for this module? I have experimented with unticking 'Send to all affiliates' for the node and this does remove the duplicate. But it really doesn't seem that reliable to expect users to never have both ticked.

CommentFileSizeAuthor
#7 pager_distinct.patch818 bytesagentrickard

Comments

agentrickard’s picture

This error is actually coming from the Drigg portion of the query alteration as far as I can tell.

It is related to #264092: pager count(*) error . But unless you are using the multiple_node_access() patch, this is not a DA issue. The query works as designed as part of the node access system.

I think this is the part of the query that is the problem:

AND ( 1=1  AND !(n.type='drigg' AND dn2.killed=1 ) )

The other way to fix this, if you wrote the original query, is run a DISTINCT(n.nid) instead of a SELECT *.

dejbar’s picture

Actually I checked that it wasn't the part you mentioned by removing that condition and anything to do with 'drigg_node' from the query (such as the LEFT JOIN). When you strip this away it becomes clear that it is the join to node_access that causes the 2 rows.

SELECT * 
FROM {node} n   INNER JOIN 
        {node_access} na 
              ON na.nid = n.nid  
WHERE  (na.grant_view >= 1 AND 
              (   (na.gid = 0 AND na.realm = 'all') OR 
                  (na.gid = 0 AND na.realm = 'domain_site') OR 
                  (na.gid = 2 AND na.realm = 'domain_id')
              )
           ) AND 
           ( n.type='drigg' AND  n.status = 1 and n.promote = 1 ) 
ORDER BY n.sticky DESC; 

There are 2 rows in node_access that match the condition so naturally there will be 2 rows in the output. I am pretty sure that 'select * from {node} n' would produce the same duplication.

I'm not the author of the module or that query so I'm not really in a position to dive in and refactor the code. I can see that DISTINCT would work but I would certainly feel more comfortable if queries produced the correct output in general.

agentrickard’s picture

Title: Duplicate Listings for Anonymous Users » page_query() bug for SELECT(*) with node access
Project: Domain » Drupal core
Version: 5.x-1.9 » 7.x-dev
Component: Code » node system

We had seen this before but thought it was limited to the multiple_node_access.patch introducing AND logic. The actual problem is a core issue with the node access API and with pager_query().

The problem is here:

http://api.drupal.org/api/function/pager_query/5

  // Construct a count query if none was given.
  if (!isset($count_query)) {
    $count_query = preg_replace(array('/SELECT.*?FROM /As', '/ORDER BY .*/'), array('SELECT COUNT(*) FROM ', ''), $query);
  }

There is a patch discussed at http://drupal.org/node/264092#comment-946563 -- but it would only help with queries that currently pass DISTINCT.

This bug affects all versions of Drupal, and a core patch is the proper solution.

node_db_rewrite_sql(), where this SQL gets written, respects DISTINCT, but obviously cannot operate on COUNT(*), or SELECT(*).

Given how Domain Access needs to check its grants, there does not appear to be anything we can do about the query, short of a nasty REGEX of our own in a hook_db_rewrite_sql() function that looks for the presence of DISTINCT in the query and inserts it as needed.

Bumping to Drupal 7.x to see if anyone has any ideas.

dejbar’s picture

The first thing that comes to mind is LEFT joining to a different version of the table once for each OR'd condition. Then you can check to see if at least one of the 'OR'ed condition has been met by check if at least one of the table versions has been joined. The above query would translate as :-

SELECT *
FROM node n   
          LEFT JOIN node_access na1
              ON na1.nid = n.nid AND na1.grant_view >= 1 AND na1.gid = 0 AND na1.realm = 'all'
         LEFT JOIN node_access na2
              ON na2.nid = n.nid AND na2.grant_view >= 1 AND na2.gid = 0 AND na2.realm = 'domain_site'
         LEFT JOIN node_access na3 
              ON na3.nid = n.nid AND na3.grant_view >= 1 AND na3.gid = 2 AND na3.realm = 'domain_id' 
WHERE  (na3.nid is not null OR na2.nid IS NOT NULL OR na3.nid IS NOT NULL
           ) AND
           ( n.type='drigg' AND  n.status = 1 and n.promote = 1 )
ORDER BY n.sticky DESC; 

I've testing the query and it behaves correctly in this situation. Although it might look a bit unseemly I think it should run pretty much as efficiently as the original. It seems likely to me that this method could be used in Domain Access without touching the core although I haven't really looked into it. I don't know what the next step for this would be but unless you believe it won't work then I can look into this further.

I can understand the reasons for moving this to Drupal 7. But I think it would be a good idea to leave something in place in original Domain Access part as well so that other users who encounter this issue might be able to find it.

agentrickard’s picture

Version: 7.x-dev » 6.x-dev

There is already #264092: pager count(*) error in the queue.

The problem with D7, of course, is that the new database layer will change this entirely.

But this is a core problem that affects all node access, modules, so it doesn't belong in the DA queue. Understand that DA does not write this query. Core does based on how the API works.

Dropping to D6 as more appropriate as a bug.

dejbar’s picture

OK I see what you mean. I just assumed it was rewritten in Domain Access. So were is it rewritten? Is it in the part that the multiple_node_access() patch applies to? I haven't applied that to my installation. Perhaps there is an off chance it could help with this situation?

So would this apply whenever there was 2 applicable ways that a user was allowed to access the node? i.e. 2 rows in node_access that applied.

BTW I personally wouldn't characterise this as just a 'page_query() bug' (or did you mean 'pageR_query'). Either way anyone, from a module developer to somebody just embedding a bit of PHP on their node template, might want to perform a SELECT query and expect it to return the correct number of rows. Perhaps if there was something that clearly warned them to always use DISTINCT in queries it would be OK but otherwise this is just laying traps for the unwary.

My solution above assumed compatibility with older version of mysql was required. I'm sure a SUB SELECT select would look a lot neater.

agentrickard’s picture

Title: page_query() bug for SELECT(*) with node access » pager_query() ignores DISTINCT queries
Status: Active » Needs review
StatusFileSize
new818 bytes

Any node query should be run through db_rewrite_sql() for security reasons. db_rewrite_sql() appropriately makes that query a DISTINCT on the nid.

pager_query() drops the DISTINCT when calculating the page total. That makes it a bug in pager_query().

See, for instance, the queries run by the default node page, which is correct in one case, but not in the other, due to this bug.

This is the query that node_db_rewrite_sql() sends to pager_query():

0.19	1	pager_query	SELECT DISTINCT(n.nid), n.sticky, n.created FROM node n INNER JOIN node_access na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))) AND ((n.type NOT IN ('project_project', 'project_issue'))) AND ( n.promote = 1 AND n.status = 1 ) ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10

This is how that query is altered to count the number of pages to return:

0.19	1	pager_query	SELECT COUNT(*) FROM node n INNER JOIN node_access na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))) AND ((n.type NOT IN ('project_project', 'project_issue'))) AND ( n.promote = 1 AND n.status = 1 )

The proper solution is not to drop the DISTINCT when contructing the pager_query().

If a module or code snippet runs a node lookup without db_rewrite_sql(), that is a security bug in that specific code.

agentrickard’s picture

I re-read the original query and it may be malformed -- SELECT * is a bad idea. Node module uses SELECT DISTINCT(nid), etc for a very good reason. That would be a bug in drigg. But the pager query is still broken.

dejbar’s picture

Title: pager_query() ignores DISTINCT queries » page_query() bug for SELECT(*) with node access
Version: 6.x-dev » 7.x-dev
Status: Needs review » Active

OK I think we have been somewhat misunderstanding each other throughout this. Most because of my ignorance and party because you are mainly interesting in a different related problem. I think what you are saying is this.

'When you feed a node query to db_rewrite_sql() it needs to add a DISTINCT on nid to the query. However if you try to use 'SELECT *' it can't do that so things can get messed up, such as in this case where too many rows got returned. The moral of the story is - don't use SELECT *'

Is it clearly documented anywhere that 'SELECT *' will break things? If so then I guess you could say it is a Drigg bug. If not then it definitely should be documented.

I do think that the 'LEFT JOIN' style of adding constraints to the query I mentioned above would more accurately maintain the original purpose of queries than the OR'ed conditions plus 'DISTINCT(nid)' that is currently used. However if the only price that is paid is not being able to use 'SELECT *' then fair enough as long as everybody knows not to use it.

agentrickard’s picture

Title: page_query() bug for SELECT(*) with node access » SELECT * FROM {node} violates db_rewrite_sql()
Project: Drupal core » Drigg
Version: 7.x-dev » 5.x-2.x-dev
Component: node system » Code

I looked at this again last night, and you're right. We're talking about two different things. My apologies.

The issue is that SELECT * does not work with db_rewrite_sql(). I can't (looking very quickly) see where that is documented, but core never uses that syntax. And it is very clear from looking at how the function works that you must specify a primary field in the query. SELECT nid, * should work fine.

That suggests two things:

- A drigg bug/patch
- A documentation patch

To do things as you suggest is a much more difficult solution to the problem, since it would involve patching, testing, and altering the current hook_db_rewrite_sql() and/or node_db_rewrite_sql(). And then getting people to test the much more complex SQL on Postgres.

Much easier to fix Drigg than try to convince everyone that a core API change is warranted, especially since this is obviously a known limitation. See, for example, node_page_default().

Still, my apologies for misunderstanding the original iessue.

Changing title and moving to the Drigg queue.

agentrickard’s picture

Compare the following queries:

WORKS

db_query(db_rewrite_sql("SELECT n.nid, n.* FROM {node} n"));

FAILS:

db_query(db_rewrite_sql("SELECT n.* FROM {node} n"));
agentrickard’s picture

In #drupal, we agree that this is a documentation error in the API, so a patch for that would be appreciated, too. But that is a separate issue.

mercmobily’s picture

Hi,

Thank you people!

@dejbar : are you able to send me a patch that works against Drigg...?

Bye,

Merc.

dejbar’s picture

Hi @mercmobily.

Actually the end result of this long and winding thread is that @agentrickard is saying is that the problem is with Drigg because the queries violate an undocumented requirement of db_rewrite_sql(). Namely that they shouldn't use 'SELECT *'. SELECT queries (on nodes at least) need to name the fields explicitly because of the way node authorisation works. So any patch would actually have to be to Drigg. I'll leave the issue of whether violating an undocumented requirement should be considered a bug for you to discus with the Drupal core people.

One of the problems (the one we had) that can occur is when the user in question has more than one permission (in the node_access table) allowing them to access the node. The joins added by db_rewrite_sql() hooks to allow/disallow access to the node cause the query to return a row for each positive permission the user has. In our case both permissions were from Domain Access but any number of access related modules be set to allow access.

Anyway we 'solved' the problem by turning of one of the options by default so admins don't accidentally leave it on and made sure that normal users can't access this setting. I'll post this in the original Issue.

I'd love to have the time to produce a patch to Drigg for this. It looks like pretty simple work of just going through and replacing 'SELECT *' with whatever the required fields are. But unfortunately we are well behind in getting Drigg integrated and running on our site.

Cheers,
Daniel.

mercmobily’s picture

Hi,

No worries, I will deal with this.
SELECT * are bad anyway -- but I didn't know that till recently.

Will get this fixed -- but I will do it after the 5th of November, when I finally get the latest Drigg V6. At that point I will fix both versions.

THANK YOU for helping out on this. I don't think I would have managed to come on top of this.

Bye,

Merc.

agentrickard’s picture

I will try to submit a documentation patch over the weekend.

drupalina’s picture

subscribing

taqwa’s picture

ditto

drupalina’s picture

I don't want to report this as a new bug, because it seems like it could be well related:
I have a Drigg site with "Organic Groups" and "Organic Groups access configuration". When OG is enabled on it's own it doesn't conflict with Drigg, but when "OG access configuration is enabled", then to members of particular groups Drigg listings in Upcoming and Popular will display those teasers that were also cross-posted to one of the groups where that user is a member TWICE.

The new problem is with TAC_lite module. If you have it enabled then the teasers for nodes that were cross-posted to a group that end-user is a member of will display TREE TIMES!

In Views module there is an option which says "Don't display duplicate entries" to make sure that each teaser is displayed only once. Could it be the case that Drigg listings should just enter that kind of line in it's listings...

magpie5212’s picture

Does anyone know if there is a 6.* equivalent of this? I have this on symptom for anonymous users on my site.

Found it.

FrancoisL’s picture

Hey magpie5212,

can you explain how you resolved it I think I've the same problem.

Thanks

avpaderno’s picture

Status: Active » Closed (outdated)

I am closing this issue, as Drupal 5 is no longer supported.