I'm marking this up as a bug report rather than a support request, because several people have looked at this now and we're all totally stumped. Take a look at the attached module. It's fairly complicated, but the bit that matters is in _drivingforce_import_rebuild_taxonomy which is called by _drivingforce_import_process_file.

Basically there is a for loop, looping through an array of file objects and processing the XML documents referenced by the file objects for import. There are several documents per node, all nearly identical except for the category information in the XML document's head. As such, I need to save the first document I come across with a matching id (I've called this $tm_id) - this I can do, successfully (see the _drivingforce_import_prepare_article function - it works great). The node saves with the first category applied. Then the problems start:

Within the same for loop, the *next* time the script finds a document with the same $tm_id it checks, sees it already has a node with that id and instead of creating a new node (using _drivingforce_import_prepare_article) it loads the existing node like this: node_load($result, NULL, TRUE);

Note, I *am* resetting the cached node. I then call _drivingforce_import_rebuild_taxonomy, which rebuilds the taxonomy in $node->taxonomy on that node and saves it.

If I do a select on the term_node, the *instant* after the node_save in the _drivingforce_import_rebuild_taxonomy function, the result is correct.

BUT when I next load the node, the applied additional terms are gone and it's as though the whole thing was a dream.

Entire module is attached. Hoping someone can shed some light. I'm at a total loss.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

greg.harvey’s picture

Title: Taxonomy not being saved when node_save is used to apply it in a loop. » Taxonomy module does not respect the $reset parameter of node_load()

Ok, I think I can expand on this and I think it is a bug. Even when the $reset parameter in node_load() is set to TRUE, old taxonomy still appears to be cached.

Having moved the part that processes the taxonomy to an entirely separate loop, I discovered now the first AND last terms are saving, rather than just the first as before. Looking at the before and after node objects of each loop, I can see the node_load is loading the wrong (initial) taxonomy every time, even though in the interim taxonomy was updated. My initial node_save saves taxonomy successfully, the following node_load brings it back successfully, the second node_save succeeds (I can prove this) but after the second node_load our taxonomy seems to still be cached from the previous one, in spite of the reset. Consequently, the next node_save overwrites the successful prior save with the NEXT term to be processed, having failed to load in and retain it.

So there must be some caching within taxonomy that ignores the node.module's request for a reset of the cache. I'll see if I can find out precisely what.

greg.harvey’s picture

And here is the same issue, referenced elsewhere: #471074: Taxonomy synchronization caching bug

stewsnooze’s picture

Just humour me here and try looking in


function taxonomy_node_get_terms($node, $key = 'tid') {
  static $terms;

  if (!isset($terms[$node->vid][$key])) {
    $result = db_query(db_rewrite_sql('SELECT t.* FROM {term_node} r INNER JOIN {term_data} t ON r.tid = t.tid INNER JOIN {vocabulary} v ON t.vid = v.vid WHERE r.vid = %d ORDER BY v.weight, t.weight, t.name', 't', 'tid'), $node->vid);
    $terms[$node->vid][$key] = array();
    while ($term = db_fetch_object($result)) {
      $terms[$node->vid][$key][$term->$key] = $term;
    }
  }
  return $terms[$node->vid][$key];
}

change the static $terms;
to $terms = array();

I've had a weird thing like this before and I fixed it by moving the node creation to the batch API and only created one per batch (Super slow) unless you hack core.

greg.harvey’s picture

@stewsnooze Spot on! That's where the bug lies - that sneaky static.

Worked around it by binning the API (it's an import module, so code maintenance not an issue). So this:

  $existing_terms = $node->taxonomy;

Became this:

  // existing taxonomy terms must be loaded from the db
  // node_load can't get around caching problems with taxonomy.module
  $result = db_query("SELECT * FROM {term_node} WHERE nid = %d AND vid = %d", $node->nid, $node->vid);
  while ($term = db_fetch_object($result)) {
    $existing_terms[] = $term;
  }

Works now, but taxonomy_node_get_terms really needs a $reset that can be called by node_load when it's $reset == TRUE!

greg.harvey’s picture

Title: Taxonomy module does not respect the $reset parameter of node_load() » taxonomy_node_get_terms() needs a $reset param that node_load can call when *it* is reset

Updating title to suit.

berenddeboer’s picture

Version: 6.14 » 6.17

Subscribing, is still an issue and this kind of caching can completely stump people. If you do a node_save, then a node_load, update the terms in your node and a node_save again, and then again node_load your assigned terms are gone.

This is completely surprising and unexpected behaviour. IMO taxonomy_node_get_terms should not cache at all or get cleared upon node_save.

PS: $reset is not passed in by the nodeapi hook, so cannot be used here.

berenddeboer’s picture

Status: Active » Needs review
FileSize
633 bytes

Attached a small patch that makes sure that when the calls comes through the nodeapi load, the terms are not cached. Caching for every other use case is left intact.

Status: Needs review » Needs work

The last submitted patch, taxonomy_node_get_terms.patch, failed testing.

greg.harvey’s picture

Doing it like node_load() would be more consistent:

function taxonomy_node_get_terms($node, $key = 'tid', $reset = FALSE) {
  static $terms = array();

  if ($reset) {
    $terms = array();
  }

Would that to it? It would make more sense because it would follow the same pattern as this:
http://api.drupal.org/api/function/node_load/6

berenddeboer’s picture

But how do you get the reset parameter right there? I think $reset is overkill as this caching is overkill.

greg.harvey’s picture

Version: 6.17 » 6.x-dev

Something like this, no?

function taxonomy_node_get_terms($node, $key = 'tid', $reset = FALSE) {
  static $terms;

  if ($reset) {
    unset($terms[$node->vid][$key]);
  }

  if (!isset($terms[$node->vid][$key])) {
    $result = db_query(db_rewrite_sql('SELECT t.* FROM {term_node} r INNER JOIN {term_data} t ON r.tid = t.tid INNER JOIN {vocabulary} v ON t.vid = v.vid WHERE r.vid = %d ORDER BY v.weight, t.weight, t.name', 't', 'tid'), $node->vid);
    $terms[$node->vid][$key] = array();
    while ($term = db_fetch_object($result)) {
      $terms[$node->vid][$key][$term->$key] = $term;
    }
  }
  return $terms[$node->vid][$key];
}

Mind you, I can't see this ever getting committed, since all focus is on Drupal 7 and this function no longer even exists once Drupal 6 dies out.

Edit: And yes, there will then need to be a follow-up patch to tackle the hook_nodeapi() implementation.

jiv_e_old’s picture

Status: Needs work » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.