I found that the leech yahoo terms tool (cool, btw) creates duplicate terms, especially for popular ones (not sure how that relates),

which leads to a confusing link structure, e.g.:

/terms/cars
/terms/cars-0
/terms/cars-1
/terms/cars-2
/terms/cars-3
/terms/cars-4
/terms/cars-5
/terms/cars-6

all assigned to the same node.

I assume the code that decides whether or not to create a term has a bug and
does not sanitize the input or normalize its case (lower/upper)...

Any help on this?

Comment | File | Size | Author
#1 | fixduplicateterms.php.txt | 1.2 KB | alex_b

Comments

alex_b

Status: new | File size: 1.2 KB

Hi Christoph,

I made a change to the current development version (tag DRUPAL-4-7, http://drupal.org/node/106896) that applies array_unique() to the array the Yahoo Terms service returns.

CVS diff:
http://cvs.drupal.org/viewcvs/drupal/contributions/modules/leech/leech_y...
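For readers unfamiliar with PHP's array_unique(): it drops repeated values and keeps the first occurrence of each. The Python sketch below mimics that behaviour on a plain list of term names (unlike array_unique, it re-indexes instead of preserving keys). The function name and the case_insensitive flag are hypothetical and not part of the fix; the flag only illustrates that a case-sensitive comparison, like array_unique's default, would still let "Cars" and "cars" through, which relates to the lower/upcase suspicion in the original report.

```python
def unique_terms(terms, case_insensitive=False):
    """Drop repeated term names, keeping the first occurrence and the order.

    Hypothetical helper: mirrors array_unique() on a list. With
    case_insensitive=True (an assumption, not in the leech fix),
    names that differ only in case or surrounding spaces also count as duplicates.
    """
    seen = set()
    result = []
    for term in terms:
        key = term.strip().lower() if case_insensitive else term
        if key not in seen:
            seen.add(key)
            result.append(term)  # keep the original spelling of the first hit
    return result


print(unique_terms(["cars", "google", "cars"]))
print(unique_terms(["Cars", "cars"], case_insensitive=True))
```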

After this fix I still get duplicate terms from leech_yahoo_terms, because there are already duplicate entries in the term_data table. When I delete all duplicates from term_data, yahoo_terms no longer creates new duplicates.

My hunch is that those duplicate terms entered the system through duplicates in the Yahoo Terms API response. I could not fully test this, though.

I attached a small script that deletes duplicates from term_data without losing any tags on nodes. Copy it into a node body, set the variable $termname to the duplicated tag name, set the body's input format to PHP, and view the node.
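The attached script is not reproduced here. As a rough illustration of how such a merge can avoid losing node tags, the Python sketch below works on an SQLite copy of two heavily simplified tables, term_data(tid, vid, name) and term_node(nid, tid); the trimmed schema and the function name merge_duplicate_terms are assumptions. It keeps the lowest tid, moves every node assignment onto it, then deletes the duplicate terms. Limiting the merge to one vocabulary (vid) is also an assumption of this sketch, chosen because identical names in different vocabularies are allowed by design. A real Drupal cleanup would also need to handle other taxonomy tables (such as term_hierarchy), which this sketch ignores.

```python
import sqlite3


def merge_duplicate_terms(conn, termname, vid):
    """Merge all terms named `termname` in vocabulary `vid` into the lowest tid.

    Hypothetical helper on a simplified schema:
      term_data(tid, vid, name), term_node(nid, tid) with PRIMARY KEY (nid, tid).
    Returns the tid that was kept, or None if there was nothing to merge.
    """
    cur = conn.cursor()
    tids = [row[0] for row in cur.execute(
        "SELECT tid FROM term_data WHERE name = ? AND vid = ? ORDER BY tid",
        (termname, vid))]
    if len(tids) < 2:
        return None  # zero or one term: no duplicates
    keep, dups = tids[0], tids[1:]
    marks = ",".join("?" * len(dups))  # e.g. "?,?" for two duplicates
    # Re-tag nodes with the kept term; OR IGNORE skips nodes already tagged with it.
    cur.execute(
        f"INSERT OR IGNORE INTO term_node (nid, tid) "
        f"SELECT nid, ? FROM term_node WHERE tid IN ({marks})",
        [keep, *dups])
    # Drop the old assignments and the duplicate terms themselves.
    cur.execute(f"DELETE FROM term_node WHERE tid IN ({marks})", dups)
    cur.execute(f"DELETE FROM term_data WHERE tid IN ({marks})", dups)
    conn.commit()
    return keep
```

The INSERT OR IGNORE step relies on the (nid, tid) primary key: a node that already carries the kept term is simply skipped instead of raising an error, so no node ends up with the same term twice.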

If you end up getting new duplicates again, let me know.

Alex

alex_b

BTW, I just created leech version 1.3 - it contains the above fix as well.

Christoph C. Cemper

Hi Alex,

Thanks, but that didn't do it...

Maybe I should explain that I made a copy of your yahooterm module and had it triggered by the "submit" hook,
so I can use it on normal blog posts...

However, whenever I submit a node again, it adds more duplicates,

so I assume the program logic doesn't account for tags that already exist in the DB, or does it?

thanks
christoph

Christoph C. Cemper

I could track down the real problem.

Whenever another vocabulary already has that term, the code re-creates the term in the to-be-used-by-leech-voc

i.e.:
First post:

voc1 has cars
vocleech generates cars and creates cars-1

Second post:

voc1 has cars
vocleech has cars
but the post still generates cars again, as cars-2,
and adds cars, cars-1 and cars-2 to the post's taxonomy

crazy

Christoph C. Cemper

This means the real bug must be in yahootermfilter_create_vocabulary_items,
which incorrectly checks whether the term is already in the voc and creates dupes.

Christoph C. Cemper

OK,
the whole else branch after if ($curr_term[$i]->vid == $vid) {
is actually the reason... this loop generates a new term every time it finds the term in another voc.

Pretty sick, actually: why not ignore that one and continue searching?

Actually, I'd even reuse the term from the other voc (it MIGHT even already be assigned to that node).
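The fix suggested above (skip matches from other vocabularies, keep searching, and only create a term once the whole search has found nothing in the target vocabulary) can be sketched in Python. This is not the module's code: Term, find_or_create_term, the create callback and the reuse_other_vocabularies flag are all hypothetical names. The sketch assumes `matches` already holds every term with the given name from all vocabularies, roughly what $curr_term seems to hold in the quoted code. The optional flag models reusing a term from another vocabulary, as suggested in the last sentence.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Term:
    tid: int
    vid: int
    name: str


def find_or_create_term(matches: Iterable[Term], name: str, vid: int,
                        create: Callable[[str, int], Term],
                        reuse_other_vocabularies: bool = False) -> Term:
    """Return an existing term for `name` in vocabulary `vid`, or create one.

    Hypothetical sketch: matches in other vocabularies never trigger a
    creation inside the loop; at most one term is created, after the loop.
    """
    fallback: Optional[Term] = None
    for term in matches:
        if term.vid == vid:
            return term  # already in the target vocabulary: reuse it
        if fallback is None:
            fallback = term  # match in another vocabulary: remember it, keep searching
    # Reached only when the target vocabulary has no term with this name.
    if reuse_other_vocabularies and fallback is not None:
        return fallback
    return create(name, vid)
```

With this structure, submitting the same node twice finds the term created the first time and reuses it, instead of producing cars-1, cars-2 and so on.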

funana

Same here...

(just to follow this thread)

aron novak

Assigned: Unassigned » aron novak
Category: bug » feature

So you need a vocabulary-insensitive version of leech_news_yahoo_terms. It's definitely not a bug: the original design forces the module to be vocabulary-sensitive, because that was a requirement. It's easy to add a settings option that makes the module vocabulary-insensitive when searching for existing terms.
I'll attach a patch soon.

aron novak

Priority: Critical » Normal

alex_b

Maybe I should explain that I made a copy of your yahooterm module and had it triggered by the "submit" hook,
so I can use it on normal blog posts...

Christoph,

Are you using leech_yahoo_terms as a standalone module? If so, could you provide the necessary changes as a patch on http://drupal.org/node/107814? That would be a great help: in the long term, leech_yahoo_terms should be a standalone module usable with whatever node type you want to auto-tag.

Thanks a lot, Alex

Christoph C. Cemper

@aron: no way... this isn't about voc insensitivity... it should create new terms in ONE voc... but your SQL/PHP logic is broken: it created that new term over and over again because it did the CHECK across all vocs, but created the new term only in the defined target voc.

Christoph C. Cemper

Category: feature » bug

Not sure if you reproduced the same setup, but with

one VOC holding the term "google"
and term creation finding "google" as a match for creation in the target VOC (VOC2),

it does create duplicates... just test it :-) it IS buggy.

patchak

Well, I just installed another fresh copy of leech and leech yahoo terms, and I have to say that I had some duplicated terms as well... It's weird because it's only a few terms, and always the same ones, like youtube, google, gmail, etc...

I had installed this on the same site last week, and this is the first time I've gotten duplicate terms.

Any ideas on what causes this or how to fix it?

Thanks

patchak

Update:

It seems that the site I thought was a fresh install was not fresh at all... It still had terms in the db even though I thought I had erased everything.

Well, it turns out that on a truly fresh install I have no dup terms, yay!

I'll let you know if the problem comes back, but at the moment it looks A1.

Patchak

aron novak

Status: Active » Closed (works as designed)

Whenever another vocabulary already has that term, the code re-creates the term in the to-be-used-by-leech-voc

Yes. I think you expect different behaviour from the module than we intended. It's a by-design issue. I modified the module (restored per-feed settings and made other small improvements), tested it a lot, and never experienced the following:
- identical terms in specific vocab
- same terms connected to specific node
The module's by-design behaviour:
- creates identical terms in different vocab
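To check an existing database against this by-design behaviour (identical names allowed across vocabularies, but not within one), for example leftover terms like the ones patchak found, a query along the lines of the Python sketch below could list the offenders. It runs against a simplified term_data(tid, vid, name) table in SQLite; the function name is hypothetical, and comparing names case-insensitively via LOWER() is an assumption of this sketch, not something the module is known to do.

```python
import sqlite3


def find_duplicate_terms(conn):
    """List (vid, lowercased name, count) for names repeated inside one vocabulary.

    Hypothetical helper on a simplified term_data(tid, vid, name) table.
    The same name in two different vocabularies is not reported.
    """
    return conn.execute(
        "SELECT vid, LOWER(name), COUNT(*) FROM term_data "
        "GROUP BY vid, LOWER(name) HAVING COUNT(*) > 1 "
        "ORDER BY vid, LOWER(name)"
    ).fetchall()
```

An empty result means each vocabulary holds at most one term per name, which matches the first point in the list above.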

patchak and I tested this and found everything A1. Btw, patchak, thanks a lot for the helpful response.