The autotaxonomy module is creating multiple identical taxonomy terms on my Drupal 4.6.4 server. Looking at the code, I can see it is designed to avoid exactly this, so I'll try to provide enough information to reproduce it. It happens when the module creates a new term (one that does not yet exist in the vocabulary) *and* that term is used on more than one item from the same feed. A new term is then created for every RSS item containing it, even after the first item has already created it.

If the taxonomy term exists before the feed is processed or the new term is only encountered once in the feed, everything works fine.

At line 65 of aggregator_autotaxonomy.module (CVS), module_invoke() calls into the taxonomy module when $cat_names[$category_name] is unset, which eventually runs a database query there.
So the term should not be created again if it already exists in the database, but it is. Perhaps the first insert wasn't flushed? Or is something cached deeper in the database layer and not refreshed?

In any case, there's a second bug here, and it belongs to the taxonomy module: it will happily create terms with duplicate names. Since terms are ultimately identified by TID, perhaps it doesn't need to care about names, but it is frustrating that duplicates are allowed. And even if that were changed so that taxonomy_save_term refused duplicates, it would break aggregator2, since node->taxonomy will be undefined if the call to taxonomy_save_term fails.
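
A minimal sketch of both halves of that point, assuming a hypothetical in-memory term list in place of Drupal's taxonomy tables: save_term_unique(), find_term_tid() and $next_tid are illustrative names, not taxonomy module API. Instead of failing on a duplicate name it returns the existing tid, and the caller only appends to $node->taxonomy when a tid actually came back, so a refused save cannot leave it holding an undefined value.

```php
<?php
// Hypothetical stand-in for the taxonomy tables; not Drupal's API.
$terms = array();
$next_tid = 1;

// Return the tid of the term named $name in vocabulary $vid, or null.
function find_term_tid(array $terms, $vid, $name) {
  foreach ($terms as $term) {
    if ($term['vid'] == $vid && $term['name'] === $name) {
      return $term['tid'];
    }
  }
  return null;
}

// Create the term only if no term with that name exists in $vid.
// Returns the existing or new tid, or false if the name is empty.
function save_term_unique(array &$terms, &$next_tid, $vid, $name) {
  $name = trim($name);
  if ($name === '') {
    return false;
  }
  $tid = find_term_tid($terms, $vid, $name);
  if ($tid !== null) {
    return $tid; // Reuse instead of inserting a duplicate.
  }
  $tid = $next_tid++;
  $terms[] = array('tid' => $tid, 'vid' => $vid, 'name' => $name);
  return $tid;
}

// Caller side: only touch $node->taxonomy when a tid came back.
$node = new stdClass();
$node->taxonomy = array();
foreach (array('Linux', 'Linux', '', 'BSD') as $category_name) {
  $tid = save_term_unique($terms, $next_tid, 1, $category_name);
  if ($tid !== false && !in_array($tid, $node->taxonomy)) {
    $node->taxonomy[] = $tid;
  }
}
echo implode(',', $node->taxonomy), "\n"; // 1,2
```

The lookup-then-insert is not atomic, so two concurrent requests could still both insert the same name; a unique index on (vid, name) would be the stricter option.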

I also notice that the module_invoke() call to get_term_by_name() does not specify a vocabulary. Perhaps this is the root of the problem; if not, it could cause future bugs if autotaxonomy then assigns the node a term from an unrelated vocabulary, or one not allowed for this content.

Comments

macgirvin

The following patch to aggregator2_autotaxonomy.module fixes the problem. I'm not sure it is the correct fix, but it produces the desired result. I suspect some database caching is going on somewhere.

// Create a new category term
if (count($cat_names[$category_name]) == 0) {
  $term = array();
  $term['name'] = $category_name;
  $term['description'] = t('Auto generated by aggregator2 autotaxonomy');
  $term['vid'] = $vocab;
  $term['weight'] = 0;
  $term = taxonomy_save_term($term);
  $node->taxonomy[] = $term['tid'];
+ // Cache the new term so later items in the same feed reuse it
+ $cat_names[$category_name][0]->tid = $term['tid'];
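
To see why the one added line matters, here is a self-contained simulation, assuming a per-run cache shaped like autotaxonomy's $cat_names; lookup_terms(), process_feed() and the $fix flag are hypothetical stand-ins, not the module's code. A cache miss stores the lookup result even when it is empty, so without the patch every later item carrying the same new category still sees an empty entry and creates another term.

```php
<?php
// Hypothetical stand-in for taxonomy_get_term_by_name(): scan $db_terms.
function lookup_terms(array $db_terms, $name) {
  $found = array();
  foreach ($db_terms as $term) {
    if ($term->name === $name) {
      $found[] = $term;
    }
  }
  return $found;
}

// Assign terms for every category of every feed item using a per-run cache.
// With $fix = true the cache is updated after a term is created (the patch).
function process_feed(array $items, $fix) {
  $db_terms = array();
  $cat_names = array();
  $next_tid = 1;
  foreach ($items as $categories) {
    foreach ($categories as $category_name) {
      if (!isset($cat_names[$category_name])) {
        // Cache miss: query once and remember the (possibly empty) result.
        $cat_names[$category_name] = lookup_terms($db_terms, $category_name);
      }
      if (count($cat_names[$category_name]) == 0) {
        // Create a new category term.
        $term = new stdClass();
        $term->tid = $next_tid++;
        $term->name = $category_name;
        $db_terms[] = $term;
        if ($fix) {
          // Equivalent of the patched line: record the new term in the cache.
          $cat_names[$category_name][0] = $term;
        }
      }
    }
  }
  return $db_terms;
}

$items = array(array('php', 'drupal'), array('drupal'), array('drupal', 'rss'));
echo count(process_feed($items, false)), "\n"; // 5: 'drupal' is created three times
echo count(process_feed($items, true)), "\n";  // 3
```

Without the cache update, 'drupal' is created once per item that mentions it; with it, each name is created once per run, which matches the behavior reported above.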

ahwayakchih

Thanks!

It's not database caching; the autotaxonomy module caches lookup results itself, so it doesn't issue queries that aren't needed.
Of course, there's a chance that someone else creates the same term in the meantime; the module won't know about it and will create a duplicate...

It doesn't use the vid when searching for a term because it tries to reuse terms regardless of which vocabulary they belong to.
But maybe it would be good to make that optional?
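
An optional vocabulary restriction could look like the sketch below, assuming terms are objects with tid, vid and name fields; find_terms_by_name() and the sample data are hypothetical. Passing no vid keeps the current reuse-across-vocabularies behavior, while passing one limits matches to that vocabulary, which would also address the earlier concern about terms landing in an unrelated vocabulary. In the module itself this would presumably mean adding a vid condition to the name lookup.

```php
<?php
// Hypothetical term list: the same name exists in two vocabularies.
$terms = array(
  (object) array('tid' => 1, 'vid' => 1, 'name' => 'Apple'),
  (object) array('tid' => 2, 'vid' => 2, 'name' => 'Apple'),
);

// Find terms by name. With $vid = null any vocabulary matches (current
// behavior); with a vid, only terms in that vocabulary are returned.
function find_terms_by_name(array $terms, $name, $vid = null) {
  $found = array();
  foreach ($terms as $term) {
    if ($term->name === $name && ($vid === null || $term->vid == $vid)) {
      $found[] = $term;
    }
  }
  return $found;
}

$any = find_terms_by_name($terms, 'Apple');       // both vocabularies
$scoped = find_terms_by_name($terms, 'Apple', 2); // vocabulary 2 only
echo count($any), ' ', $scoped[0]->tid, "\n";     // 2 2
```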