Some logic for duplicate stories from different sources

Project: Memetracker
Version: 6.x-1.1-alpha6
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: active

Issue Summary

I'm tracking news with memetracker, and when multiple outlets run the same AP story, it creates a big meme of identical items from different sources.

While memetracker should see these as a meme and give them weight, it'd be great to have some theme logic that would check for duplicates within the related items of a meme and display them as child links below the oldest of the bunch.

Does this make sense? Complete duplicate items should maybe look a little different than unique ones that share keywords.

Comments

#1

If I'm remembering this correctly, doesn't the FeedAPI have some functionality to check for dupes? Could this be leveraged here?

#2

In this case, I think we want the items themselves, because they're part of the emerging meme.

But we don't want to display them side by side, right next to each other. It's boring and uninteresting.

Better would be to leave the first dupe alone and have the ones that come after nested beneath the original one as children. After all, they were late to the party!

#3

@bonobo, it does have that function, but it doesn't seem to work all the time... not sure why.

#4

Brad -- could you attach a (rough) mockup of what you're envisioning this would look like? That'd be helpful.

#5

Here's what I was thinking.

So maybe you run a test against each item in the related_content array, looking for items that share a headline or a summary. If you find any, group them together, with the older item as the parent.
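Very roughly, the grouping could look something like this. It's a Python sketch just to show the logic (memetracker itself is PHP), and the item keys 'headline', 'summary' and 'timestamp' plus the function names are made up for illustration; I don't know what the real related_content items look like.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't hide a dupe."""
    return " ".join((text or "").lower().split())


def group_duplicates(related_content):
    """Nest items that share a headline or a summary under the oldest one.

    ASSUMPTION: each item is a dict with 'headline', 'summary' and
    'timestamp' keys. These names are hypothetical, not memetracker's schema.
    Returns a list of {"parent": item, "children": [items]} dicts,
    ordered by the parent's timestamp.
    """
    groups = []
    by_headline = {}  # normalized headline -> group that first used it
    by_summary = {}   # normalized summary  -> group that first used it

    # Walk oldest-first, so the first item seen in a group becomes the parent.
    for item in sorted(related_content, key=lambda i: i["timestamp"]):
        headline = normalize(item.get("headline"))
        summary = normalize(item.get("summary"))

        # Empty headlines/summaries never count as a match.
        group = (by_headline.get(headline) if headline else None) or (
            by_summary.get(summary) if summary else None
        )

        if group is not None:
            group["children"].append(item)  # late to the party: nest it
        else:
            group = {"parent": item, "children": []}
            groups.append(group)

        if headline:
            by_headline.setdefault(headline, group)
        if summary:
            by_summary.setdefault(summary, group)

    return groups
```

The theme layer could then print each parent normally and its children as indented links underneath. Note this only catches exact matches after normalization, and it doesn't merge two existing groups if a later item happens to match both.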

Here's what would be great, see attached.

Attachment: mockupdupes.gif (34.45 KB)

#6

Hi, is there any solution to this issue? I have been seeing this kind of duplication for quite some time: even when there is just one entry in the memetracker_content table, there are duplicates (or multiples thereof) in memetracker_search_1.