taxonomy_term_count_nodes() (used by directory.module and some other contrib modules) return the wrong count when nodes are assigned several terms (multi-select vocabulary). The node is counted as many times at it has terms.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

beginner’s picture

Status: Active » Needs review
FileSize
4.01 KB

This is an almost complete rewrite of taxonomy_term_count_nodes().
Surprisingly, this module is not used by core, but it is by some contrib module.

The rewrite amounts to a change of API, fixing the bug and adding some functionality in the process.

As such, the patch will most certainly NOT be accepted for Drupal-5.
However the patch is against Drupal-5 and to be tested with Drupal-5, because the contrib modules to test with are not ported to D6 yet.

To reproduce the bug:
enable the contrib directory.module.
Create a node which has many terms within the same vocab (multi-select) (the terms should be descendants of the same parent term.
In the settings, enable 'show node counts'.
See the directory main page at ?q=directory: the nodes are counted twice at the common parent term.

The way to correct the bug was to count separately the term's own nodes and those of child terms, making sure there are duplicate.
Since we are doing so, we are now able to return more information upon return: instead of returning an integer, we return an array of different values, as noted at the top of the patch.

If this way of solving the problem is properly tested and deemed acceptable, then I can find a way to roll a proper patch for Drupal-6.

Btw, project.module does "not use taxonomy_term_count_nodes() because it includes child terms' counts" (quote from project.module code). Instead, it has created its own implementation With this patch, it could use this function again by looking at the proper value within the returned array.

beginner’s picture

Version: 5.x-dev » 6.x-dev
FileSize
5.48 KB

Here is a patch for D6/HEAD.

It is a complete rewrite of the function.

Here it is, for easier review:


/**
 * Count the number of published nodes classified by a term.
 *
 * @param $tid
 *   The term's ID
 *
 * @param $type
 *   The $node->type. If given, taxonomy_term_count_nodes only counts
 *   nodes of $type that are classified with the term $tid.
 *
 * @param $save_to_db
 *   This function is recursive, and we don't need to save the result in the DB at each iteration.
 *   $save_to_db is set to FALSE at each nested call, so that the actual saving to DB can happen only
 *   when the function exits the last time.
 *
 * @return $array
 *   where:
 *     $array['count_own'] being the number of nodes within the term proper.
 *     $array['count_children'] being the number of nodes in children terms, not counting those which are already counted in the parent term.
 *     $array['own_nodes'] array of nid's within this $tid.
 *     $array['children_nodes'] array of all the descendent nid's from children terms, not including those already set as one's own.
 *
 *   Results are statically cached.
 * Also, in order to improve performance accross requests, we cache the serialized array on database, in {cache_page} (this table is flushed each time a node or a taxonomy item is added/updated/deleted).
 */
function taxonomy_term_count_nodes($tid, $type = 0, $save_to_db = TRUE) {
  static $count = array();
  $modified = FALSE; // We keep track of modification of $count, to check whether we need to save it to DB.


  if (empty($count)) {
    $count = cache_get('taxonomy_term_count_nodes', 'cache_page');
    $count = unserialize($count->data);
  }

  if (!isset($count[$type])) {
    $modified = TRUE;
    $count[$type] = array();
    // In the queries below, we cannot use 'SELECT t.tid, COUNT(n.nid) AS c FROM ...'
    // because a node may be assigned more than one term and be counted more than once.
    // We therefore take note of the nid's and count the number of items in the $count array,
    // making sure there is no duplicate.

    // $type == 0 always evaluates TRUE if $type is a string
    if (is_numeric($type)) {
      $result = db_query(db_rewrite_sql('SELECT t.tid, n.nid FROM {term_node} t JOIN {node} n ON t.nid = n.nid WHERE n.status = 1 '));
    }
    else {
      $result = db_query(db_rewrite_sql("SELECT t.tid, n.nid FROM {term_node} t JOIN {node} n ON t.nid = n.nid WHERE n.status = 1 AND n.type = '%s'"), $type);
    }
    while ($item = db_fetch_object($result)) {
      if (!isset($count[$type][$item->tid])) {
        $count[$type][$item->tid] = array('own_nodes' => array());
      }
      $count[$type][$item->tid]['own_nodes'][$item->nid] = 1;
    }
  }

  if (!isset($count[$type][$tid]['count_own'])) {
    $modified = TRUE;
    $count[$type][$tid]['count_own'] = count($count[$type][$tid]['own_nodes']);
    $count[$type][$tid]['count_children'] = 0;
    $count[$type][$tid]['children_nodes'] = array();
    foreach (_taxonomy_term_children($tid) as $c) {
      $children = directory_taxonomy_term_count_nodes($c, $type, FALSE); // FALSE: we do not need to save $count in the db at this iteration.
      // Add the children's own nodes:
      foreach ($children['own_nodes'] AS $child_nid => $n) {
        if (!isset($count[$type][$tid]['own_nodes'][$child_nid])) { // make sure the nid is not already counted for the parent.
          $count[$type][$tid]['children_nodes'][$child_nid] = 1;
        }
      }
      // Add the nodes of the children's children.
      foreach ($children['children_nodes'] AS $child_nid => $n) {
        if (!isset($count[$type][$tid]['own_nodes'][$child_nid])) { // make sure the nid is not already counted for the parent.
          $count[$type][$tid]['children_nodes'][$child_nid] = 1;
        }
      }
    }
    $count[$type][$tid]['count_children'] = count($count[$type][$tid]['children_nodes']);

  }

  if ($modified && $save_to_db) {
    $cache = serialize($count);
    // the cached data will be used for at least one hour before being flushed.
    cache_set('taxonomy_term_count_nodes', 'cache_page', $cache, time() + 60 * 60);
  }

  return $count[$type][$tid];
}
stella’s picture

Status: Needs review » Needs work

Hi,

I tested the drupal 6.x patch version and it gives the error below. The patch seems to call a function from the directory module. Surely any patch to the taxonomy module should be independent of the directory module?

Fatal error: Call to undefined function directory_taxonomy_term_count_nodes() in /var/www/drupal_6.1_20070904/html/modules/taxonomy/taxonomy.module on line 814

Cheers,
Stella

Pasqualle’s picture

Priority: Normal » Critical

need a verdict here

issue summary:
1. the function is incorrect
2. the function is not used by core
3. the patch is an API change
4. the patch still not work

the possibilities:
1. drupal 7.x
2. fix the patch quickly
3. remove the function

feel free to change back the priority after final decision

moshe weitzman’s picture

we are accepting bug fixes before release and afterwards. so i think 2 is reasonable. the patch here looks more complicated than i expected.

beginner’s picture

Priority: Critical » Minor
Status: Needs work » Needs review

my verdict:
1) the function is not used by core -> minor
2) but it's useful and used by some contrib modules -> so we keep it in core (unless someone objects).
3) a bugfix can happen any time, so keep it in the D6 queue, even if the fix appears after the D6 release.

About the proposed fix:
a) forget the patch for a while. See the general idea in #2.
b) Yes, it's complicated. That's why I haven't bothered to update the patch: I need a proper review on the concept first.
c) In order to get that review, I set CNR. The patch doesn't apply but I need feedback on the idea.
d) the process is quite cpu intensive (??), so I cache the results, but I am far from sure it's the right approach.

So, please:
http://drupal.org/patch/review

Take a look at the big picture first and NOT the details.
Simply saying "I [don't] like this feature" or "-1" is of no use and is strongly discouraged. Similarly, diving into a patch and saying nothing but "there is a typo in function so and so" could be a waste of time as you could be continuing a patch that has no hope of being committed.

mlncn’s picture

Priority: Minor » Normal

I have an alternate, simpler patch that applies against Drupal 5.

I think the priority is normal, it's not good for Drupal 6 to ship with bugs. The bug in this function in Drupal 5 certainly burnt me-- it would have been better if it weren't there.

The difference in my approach is that it doesn't change the API or take the step of caching this result as a Drupal variable.

If I add in beginner's check for not double-counting (or worse) nodes belonging to both a term and one or more children terms –

        if (!isset($count[$type][$tid]['own_nodes'][$child_nid])) { // make sure the nid is not already counted for the parent.
          $count[$type][$tid]['children_nodes'][$child_nid] = 1;
        }

– I think I'd prefer my patch.

benjamin, Agaric Design Collective

beginner’s picture

Merci beaucoup. I'll test your patch.

I'll make time for that next year (!!) ;)

brenda003’s picture

Status: Needs review » Needs work

I tested both patches on D5. Firstly, returning an array was unexpected results (yes, even after reading that's what happens!). There must be a better way to do this. I also received several errors on line 1105 about Invalid argument supplied for foreach.

This line:

foreach ($children['children_nodes'] AS $child_nid => $n)

Still the issue of having the parent terms tallying the TOTAL number of nodes both within the parent term and the children term doesn't seem to be working. The array includes both count_own and count_children - so I'm guessing we'll need to add these up. This feels dirty.

Agaric's approach is cleaner and a good starting point, perhaps another placeholder for DISTINCT should be added to the function to complete it.

Wim Leers’s picture

beginner's patch is way too complex. But I've seen it working in the directory module.

Benjamin Melançon's patch doesn't work at all. All it does, is exclude the children completely from being counted. I'd even call that a new bug.

A simple illustration of the problem:

- home cinema sets (0 nodes)
   - sound (3 nodes)
   - video (2 nodes)

3 nodes are tagged with "sound", 2 nodes are tagged with "video". 1 of the 3 nodes that is tagged with "sound" is *also* tagged with "video". So the affected number of nodes is 4, *not* 5.

So when we retrieve the node count of the "home cinema tag", we will get 0 for the actual tag, 3 for "sound", 2 for "video". But we're actually counting the same node *twice*! So we're getting instead of 4.

Obviously, Benjamin Melançon's patch, by simply excluding all children, doesn't fix this.

Wim Leers’s picture

Version: 6.x-dev » 7.x-dev
Assigned: beginner » Wim Leers
Status: Needs work » Needs review
FileSize
2.39 KB

I've fixed this for D5. I ported it to D6/D7 (which use versioned taxonomy terms), but didn't test it there, it's a trivial port though, so I expect it to work.

I know I need to add more comments, but I'd like to get some feedback on my approach first. It's fairly trivial: I keep the current taxonomy_count_nodes() function, but in order to get the correct count (i.e. only count child nodes if no ancestor nodes are assigned to the same node), I exclude nodes from the count that match ancestor terms.

Wim Leers’s picture

Status: Needs review » Needs work

I just realized this doesn't work 100% correctly yet. It doesn't take one case into account yet: when a node is assigned 2 terms of the same level (i.e. sound and video in my example of #10).

I guess we will have to resort to collecting an array of nids (vids in D7) and then keeping only the uniques. That's all I can think of ATM at least.

beginner’s picture

Status: Needs work » Needs review

Keeping an array of nids and keeping only the uniques is precisely what I am doing.

I am sorry that Benjamin's patch does not work out. I know my approach is very complex, but I thought a long time about it, and so far, it is the only one that works.

Please read again comment #6, then review #2.

Wim Leers’s picture

While thinking about this, I've found a way to do it without keeping an array of nids (which is *very* bad for scalability). I'll update my patch tonight.

Wim Leers’s picture

Updated patch.

Pros/cons:
+ *less* code than what's currently in core
+ no recursion is needed.
+ less PHP memory usage (thanks to lack of recursion: we only cache the counts for the terms that were actually requested, instead of for all children too)
- more stress on the DB server
? scalability

This solution is also much more scalable than beginner's patch: we don't have to collect all nids on the server side, we simply let the DB do all the work (as it should be).
However, if you ask for the node count of a term with a lot of descendants (children, grandchildren, and so on), the query time will increase. We'd have to do benchmarks to make definitive conclusions.

On the other hand, this *fixes* the bug, with *simpler* code. It's not used by Drupal core itself, so minimal benchmarking should be sufficient.

catch’s picture

Status: Needs review » Needs work

This needs re-rolling for dbtng (and I'm not sure how that works for the IN() here). Will definitely need benchmarks too.

catch’s picture

FileSize
3.58 KB

Patch no longer applied. Started initial dbtng conversion but then realised this needs refactoring to use the SELECT query builder (and I'm not sure how that plays with COUNT() and DISTINCT() yet). So posting untested initial re-roll so there's an applicable patch to work against later. Also this will need a test.

taxonomy_get_tree() needs fixing anyway for performance, not the job of this patch. I also agree that we only need minimal benches as the function is pretty far outside the critical path.

I grepped contrib yesterday to see if it this got used before seeing this issue - looking for dead code in taxonomy module. It does seem to be a useful utility function, and it's something which would be handy for term administration too - finding 'unassigned' terms for bulk deletion for example. So simplifying it and having it actually work seems like the best approach to take.

We might be able to simplify things even further by fetching the results into an array keyed by nid using fetchAllAssoc('nid'), which would remove duplicates without further array munging - then return a count on the array. Then it'd just be a simple select and count(). No idea how this would compare for scaling with a large result set to the distinct and count, but will look into that as well.

catch’s picture

Status: Needs work » Needs review
FileSize
2.59 KB

Here it is with dbtng conversion, addExpression() makes the DISTINCT(COUNT(nid)) easy to do :)

Tests forthcoming, hopefully later today.

catch’s picture

Found a place to use the function in an existing test - this doesn't deal with the original bug, but it gives us some coverage, and also allows us to use an API function instead of a direct query in one place.

On my system I'm getting array to string conversion notices and two fails, whilst testing the function via devel's execute PHP works so far. So not quite sure what's happening yet. Will leave at CNR so it runs via the test bot.

catch’s picture

Re-rolled for whitespace and to nudge the test bot.

Dries’s picture

catch, your last patch does not seem to have the tests that the second-to-last patch had?

catch’s picture

Status: Needs review » Needs work

Dries, you're right - and even though the penultimate patch comes up 4k, I only see the taxonomy.module hunks, hence the bad re-roll. If I don't have it locally, it should be easy to recreate anyway.

CNW for now (and there were test failures with the actual tests - which I wanted the testbot to double check for me, which can't happen if they aren't in the patch, doh!).

catch’s picture

Some initial tests, while it seems to work when running in devel, it's not happening in the test. Specifically, despite saving a parent/child relationship between terms, taxonomy_get_tree for the parent doesn't return this. Not done much debugging yet, posting for reference.

catch’s picture

Status: Needs work » Needs review
FileSize
6.15 KB

Here it is with working tests. I had to add a $reset argument to both taxonomy_get_tree() and taxonomy_term_count_nodes() to get the tests not failing dismally. We need this anyway until standardise static caching goes in. The tests handle individual terms, terms with no nodes of their own but with children, the case above when two children are assigned to the same node, and the node type argument.

As noted above, it's not used in core (which means the tests are the only coverage we get) - although it might come in handy later if we want to add removing 'orphaned' taxonomy terms in taxonomy administration (which I have plans to).

catch’s picture

Extra code removal (_taxonomy_term_children() -which is redundant now).

catch’s picture

And the patch...

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
7.44 KB

Re-rolled for changes to taxonomy_get_tree() parameters.

theabacus’s picture

Hmmm, both patches (#11 & #15) fail on Druapl 6.6. Hunk fail on lke 896 and 910 respectively.

catch’s picture

@theabacus - any chance you could test it on HEAD? Once it's in we could potentially do a backport, but that won't happen until it's committed to HEAD.

theabacus’s picture

The patch applied successfully [Hunk #1 succeeded at 915 (offset 5 lines)] and I ran it through some testing. I was unable to do extensive testing due to the 2nd error. Please let me know if any clarity is needed. I will be happy to test the next version when it is released.

Taxonomy Tree used
> Edibles (Vocab)
> Fruit
> Apple
> Golden Delicious
> Granny Smith
> Orange

Error occurs when node has a single term.
(Error: notice: Undefined index: 2 in .../drupal-6.x-dev/modules/taxonomy/taxonomy.module on line 966.)

Error occurs when creating a term over 1 level deep (i.e. creation of Granny Smith).
Fatal error: Call to undefined function hs_taxonomy_term_count_nodes_exclude_ancestors() in .../drupal-6.x-dev/modules/taxonomy/taxonomy.module on line 963

Term counts still returns incorrect count in certain circumstances:

Node terms: Apple, Orange
Fruit term count: 2 (Incorrect. Should be 1)

Node terms: Fruit, Apple
Fruit term count: 1 (Correct)

Node Terms: Fruit, Apple, Orange
Fruit term count: 1 (Correct)

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review

re-testing.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
7.44 KB

Works for me, uploading again.

CitizenKane’s picture

Patch applies cleanly to D7 HEAD and all tests pass. Works with hierarchical terms. However, this does not seem to work with synonyms. Using the follow procedure:

1.) Make terms ny and New York.
2.) Make New York a synonym of ny.
3.) Make two nodes, tag one with New York and the other with ny.
4.) do taxonomy_term_count_nodes with tid of ny.

In my testing this returns 1. I would instead expect this to return 2. Any thoughts? I'm not sure if that's the desired behavior for synonyms.

catch’s picture

Synonyms are just a lexical synonyms for individual terms stored as text strings, they don't actually involve any kind of relationships between terms. I think based on this we're probably RTBC here, but I won't change the status since the last patch has my name on it.

CitizenKane’s picture

Status: Needs review » Reviewed & tested by the community

Based on #36 and #37 this seems RTBC.

catch’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
7.47 KB

Re-rolled for changes to taxonomy module tables.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Title: taxonomy_term_count_nodes returns wrong count » taxonomy_term_count_nodes returns wrong count (+ tests)
Status: Needs work » Needs review
FileSize
7.47 KB

Let's try that again.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review

retesting.

drewish’s picture

Status: Needs review » Needs work
  * @param $max_depth
  *   The number of levels of the tree to return. Leave NULL to return all levels.
  *
+ * @param $reset
+ *   Whether to reset the static cache.
+ *
  * @param $depth

I don't like the extra lines between @params. We don't do that elsewhere in core. I'm not saying you should strip them out of the entire module but you should drop them where you're already touching the docs.

+function taxonomy_term_count_nodes($tid, $type = 0, $reset = FALSE) {

I really hate 0 as a default for $type. What's wrong with NULL or ""? Also it makes the tests really confusing seeing "page" some places and 0 others.

catch’s picture

Re-roll per drewish's comments, also moved some more stuff inside the static check in taxonomy_term_count_nodes().

catch’s picture

Status: Needs work » Needs review
catch’s picture

One local fix didn't make it into the patch, this should be it.

webchick’s picture

Status: Needs review » Needs work

Comments from IRC:

I found it odd that $reset was not the last parameter in taxonomy_get_tree(), but catch pointed out that the final parameter was for internal use only, so this makes sense.

+  // If $type is NULL, change it to 0 to allow it to be used as an aray key

Should be "array" not "aray"

In the tests, there's lots of stuff like:

taxonomy_term_count_nodes($term3->tid, 0, TRUE)

That second param was changed to NULL per drewish's feedback, but the test calls are still 0.

Also, the first time you call the reset param, could you explain in a comment why you're doing so? Perhaps do the same under:

+ * @param $reset
+ *   Whether to reset the static cache.

Wim, if you're out there, I'd love to see you review this one more time. Otherwise I'll probably commit it once these minor issues are fixed.

catch’s picture

Status: Needs work » Needs review
FileSize
8.64 KB

Re-rolled with those changes.

webchick’s picture

Status: Needs review » Fixed

Awesome. :) Thanks a lot!

Committed to HEAD.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

catch’s picture

Version: 7.x-dev » 6.x-dev
Status: Closed (fixed) » Patch (to be ported)

Still a bug in Drupal 6.

catch’s picture

Priority: Normal » Critical
catch’s picture

Status: Patch (to be ported) » Needs review
FileSize
2.08 KB

Backport based on Wim's patch in #15. I've left out the API changes since while they're necessary for the simpletests to work, we might not need them just to make this less broken.

This patch is completely untested - I don't have any sites using taxonomy_term_count_nodes() at the moment (or not that I can think of).

JeremyFrench’s picture

Subscribe, I've just spent several hours trying to work out why my code wasn't adding items to the taxonomy in a simpletest.

ibis’s picture

subscribe

perarnet’s picture

Version: 6.x-dev » 6.19
FileSize
2.52 KB

Rerolled patch in #54 with latest Drupal release. Initial testing ok.

Status: Needs review » Needs work

The last submitted patch, taxonomy_count_nodes-1.414.2.16.patch, failed testing.

perarnet’s picture

Status: Needs work » Needs review
FileSize
2.79 KB

Trying again - with cvs patch

Status: Needs review » Needs work

The last submitted patch, taxonomy_count_nodes-1.414.2.16.patch, failed testing.

PEF-dupe’s picture

Fix just with add DISTINCT in the two SQL requests:

- $result = db_query(db_rewrite_sql('SELECT t.tid, COUNT(n.nid) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 GROUP BY t.tid'));
+ $result = db_query(db_rewrite_sql('SELECT t.tid, COUNT( DISTINCT n.nid ) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 GROUP BY t.tid'));

and

- $result = db_query(db_rewrite_sql("SELECT t.tid, COUNT(n.nid) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 AND n.type = '%s' GROUP BY t.tid"), $type);
+ $result = db_query(db_rewrite_sql("SELECT t.tid, COUNT( DISTINCT n.nid ) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 AND n.type = '%s' GROUP BY t.tid"), $type);

It's works for me when I use the block of Catalog module in Ubercart.

JeremyFrench’s picture

Rename patch from #59 and remove strange first line. Still completely untested.

Andrew Gorokhovets’s picture

+1

kenorb’s picture

Version: 6.19 » 6.20

Patch #62 is working for me. My terms were doubled, after patch #62 - it fixes the issue.

dpearcefl’s picture

Please repost the patch in #62 with a proper filename (so the system knows it is a patch) and mark the issue "needs review".

http://drupal.org/node/1054616
[description]-[issue-number]-[comment-number].patch

anawillem’s picture

FileSize
727 bytes

PLEASE.

We also need this urgently and have a policy against hacking the sacred core of drupal. Knowhatimean?

pretty please.

JeremyFrench’s picture

Status: Needs work » Needs review
FileSize
1.93 KB

Patch attached, first I have done on core with git. Hopefully it is all correct.

[description]-[issue-number]-[comment-number].patch + D6 so testbot ignores it.

Refactored a little and tested in a limited sense. Could do with testing in a more general sense.

Status: Needs review » Needs work

The last submitted patch, taxonomy_count_nodes_D6_2.patch, failed testing.

JeremyFrench’s picture

Status: Needs work » Needs review

Not sure why test bot tested, and why it tested the wrong patch. Trying to set to needs review again.

dmention_eblack’s picture

The fix that PEF posted in comment #61 fixed the issue for me.

Has any progress been made on getting this into core?

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.