I'm marking this up as a bug report rather than a support request, because several people have looked at this now and we're all totally stumped. Take a look at the attached module. It's fairly complicated, but the bit that matters is in _drivingforce_import_rebuild_taxonomy, which is called by _drivingforce_import_process_file.
Basically there is a for loop, looping through an array of file objects and processing the XML documents referenced by the file objects for import. There are several documents per node, all nearly identical except for the category information in the XML document's head. As such, I need to save the first document I come across with a matching id (I've called this $tm_id) - this I can do, successfully (see the _drivingforce_import_prepare_article function - it works great). The node saves with the first category applied. Then the problems start:
Within the same for loop, the *next* time the script finds a document with the same $tm_id, it checks, sees it already has a node with that id and, instead of creating a new node (using _drivingforce_import_prepare_article), it loads the existing node like this: node_load($result, NULL, TRUE);
Note, I *am* resetting the cached node. I then call _drivingforce_import_rebuild_taxonomy, which rebuilds the taxonomy in $node->taxonomy on that node and saves it.
If I do a select on the term_node table the *instant* after the node_save in the _drivingforce_import_rebuild_taxonomy function, the result is correct.
BUT when I next load the node, the applied additional terms are gone and it's as though the whole thing was a dream.
Entire module is attached. Hoping someone can shed some light. I'm at a total loss.
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | taxonomy_node_get_terms.patch | 633 bytes | berenddeboer |
| | drivingforce_import.txt | 19.9 KB | greg.harvey |
Comments
Comment #1
greg.harvey
Ok, I think I can expand on this and I think it is a bug. Even when the $reset parameter in node_load() is set to TRUE, old taxonomy still appears to be cached.
Having moved the part that processes the taxonomy to an entirely separate loop, I discovered that now the first AND last terms are saving, rather than just the first as before. Looking at the before and after node objects of each loop, I can see that node_load is loading the wrong (initial) taxonomy every time, even though the taxonomy was updated in the interim. My initial node_save saves taxonomy successfully, the following node_load brings it back successfully, and the second node_save succeeds (I can prove this), but after the second node_load our taxonomy seems to still be cached from the previous one, in spite of the reset. Consequently, the next node_save overwrites the successful prior save with the NEXT term to be processed, having failed to load in and retain it.
So there must be some caching within taxonomy that ignores the node.module's request for a reset of the cache. I'll see if I can find out precisely what.
Comment #2
greg.harvey
And here is the same issue, referenced elsewhere: #471074: Taxonomy synchronization caching bug
Comment #3
stewsnooze
Just humour me here and try looking in
change the static $terms; to $terms = array();
I've had a weird thing like this before and I fixed it by moving the node creation to the batch API and only creating one node per batch (super slow), unless you hack core.
Comment #4
greg.harvey
@stewsnooze Spot on! That's where the bug lies - that sneaky static.
Worked around it by binning the API (it's an import module, so code maintenance not an issue). So this:
Became this:
Works now, but taxonomy_node_get_terms really needs a $reset that can be called by node_load when it's $reset == TRUE!
Comment #5
greg.harvey
Updating title to suit.
Comment #6
berenddeboer
Subscribing; this is still an issue, and this kind of caching can completely stump people. If you do a node_save, then a node_load, update the terms in your node, do a node_save again and then a node_load again, your assigned terms are gone.
This is completely surprising and unexpected behaviour. IMO taxonomy_node_get_terms should not cache at all, or its cache should get cleared upon node_save.
PS: $reset is not passed in by the nodeapi hook, so cannot be used here.
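The save/load sequence described above can be reproduced outside Drupal with a small self-contained sketch. Everything here is a hypothetical stand-in: the global array plays the role of the term_node table, demo_save_terms() the save path, and demo_get_terms() a getter that memoizes in a function-level static with no way to reset it. It is not Drupal core code, only an illustration of why a write followed by a read can return old data.

```php
<?php
// Hypothetical in-memory stand-in for the term_node table: nid => list of term ids.
$GLOBALS['demo_term_node'] = array();

// Stand-in for the save path: overwrites the stored terms for a node.
function demo_save_terms($nid, array $tids) {
  $GLOBALS['demo_term_node'][$nid] = $tids;
}

// Stand-in for a getter that caches per node id in a static variable.
// Once a node id has been read, later calls never look at storage again.
function demo_get_terms($nid) {
  static $terms = array();
  if (!isset($terms[$nid])) {
    $terms[$nid] = isset($GLOBALS['demo_term_node'][$nid])
      ? $GLOBALS['demo_term_node'][$nid]
      : array();
  }
  return $terms[$nid];
}

demo_save_terms(1, array(10));       // first save: one term
$first = demo_get_terms(1);          // first load: fills the static cache
demo_save_terms(1, array(10, 20));   // second save: storage now has two terms
$second = demo_get_terms(1);         // second load: answered from the static cache
```

In this sketch the storage array holds both terms after the second save, while $second still contains only the first one, which matches the "correct in the table, wrong on the next load" symptom reported in this thread.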
Comment #7
berenddeboer
Attached a small patch that makes sure that when the call comes through the nodeapi load, the terms are not cached. Caching for every other use case is left intact.
Comment #9
greg.harvey
Doing it like node_load() would be more consistent:
Would that do it? It would make more sense because it would follow the same pattern as this:
http://api.drupal.org/api/function/node_load/6
Comment #10
berenddeboer
But how do you get the reset parameter right there? I think $reset is overkill, as this caching is overkill.
Comment #11
greg.harvey
Something like this, no?
Mind you, I can't see this ever getting committed, since all focus is on Drupal 7 and this function no longer even exists once Drupal 6 dies out.
Edit: And yes, there will then need to be a follow-up patch to tackle the hook_nodeapi() implementation.
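For illustration, a static cache with an optional reset flag, in the style of the third parameter of node_load(), might look roughly like the sketch below. This is a self-contained example with hypothetical names (demo_store, demo_get_terms_resettable); it is neither the patch discussed in this thread nor the real signature of taxonomy_node_get_terms, and it says nothing about how the flag would be passed through hook_nodeapi().

```php
<?php
// Hypothetical backing store standing in for the database: nid => term ids.
$GLOBALS['demo_store'] = array(1 => array(10));

// Getter with a static cache that the caller can clear via $reset.
function demo_get_terms_resettable($nid, $reset = FALSE) {
  static $terms = array();
  if ($reset) {
    // Drop everything cached so far; the lookup below re-reads storage.
    $terms = array();
  }
  if (!isset($terms[$nid])) {
    $terms[$nid] = isset($GLOBALS['demo_store'][$nid])
      ? $GLOBALS['demo_store'][$nid]
      : array();
  }
  return $terms[$nid];
}

$before = demo_get_terms_resettable(1);       // caches the single stored term
$GLOBALS['demo_store'][1] = array(10, 20);    // storage changes behind the cache
$stale = demo_get_terms_resettable(1);        // no reset: cached value is returned
$fresh = demo_get_terms_resettable(1, TRUE);  // reset: value is re-read from storage
```

The trade-off raised earlier in the thread still applies: a reset flag only helps if every caller that writes and then re-reads actually passes it, whereas dropping the cache or clearing it on save needs no cooperation from callers.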
Comment #12
jiv_e_old
Related issues:
http://drupal.org/node/605182
http://drupal.org/node/767104
http://drupal.org/node/471074