No related content anyhow, no clouding

fhelmschrott - September 7, 2008 - 13:53
Project: Memetracker
Version: 6.x-1.1-alpha6
Component: Code
Category: support request
Priority: normal
Assigned: Unassigned
Status: active
Description

Hi Kyle,

I gave Memetracker a first try today. Forgive me if I have misunderstood things: this is my first time with memetracking, but not my first with Drupal.

I installed all the dependencies, including the Python part. I also ran "from pycloud import *" and got no error. Apart from that, pycloud seems to work with numpy in between, or even depends on it: I installed Numeric first and got a missing-dependency error from the setup script. After installing the missing numpy, everything worked well.
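
For anyone repeating this check, here is a minimal sketch of how the Python dependencies could be verified in one go. The module names "numpy" and "pycloud" are taken from this post; the helper name check_modules is made up for illustration and is not part of Memetracker.

```python
import importlib


def check_modules(names):
    """Try to import each named module.

    Returns a dict mapping module name to None (import worked)
    or to the ImportError message (import failed).
    """
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Module names as mentioned in this issue; adjust to your install.
    for name, error in check_modules(["numpy", "pycloud"]).items():
        status = "OK" if error is None else "MISSING (" + error + ")"
        print(name + ": " + status)
```

A module reported as MISSING here would explain a setup script complaining about a dependency.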

OK, not everything: I set up some feeds and fetched the posts. Unfortunately, all I get is duplicate memes (the same story produces two memes), and I don't see any content relations. On the first try I added 150+ feeds from an OPML file, but since that is a bit hard to handle I have now added just 5 topic-related feeds, still with the same result. Memetracking generally seems to work, since I see varying search scores in the database. What I don't understand is why the Memetracker module gives search scores to identical cids: the table shows, for example, id 543 and 543 with a search score of, for example, 446.67. Maybe this has to do with the duplicate stories?

I'm a bit confused about how everything should work and where errors might come from.

Could you perhaps provide an OPML file of test feeds that should have related content, so we can check whether it basically works? That would be great, I think.

#1

kyle_mathews - September 14, 2008 - 02:52

Duplicate content can come from a couple of sources. Sometimes FeedAPI will import the same content twice. More often, Memetracker will add an item that has already been added, creating a duplicate piece of content. I've tried to eliminate that error as much as I can, but I still see the problem cropping up (as you experienced) for reasons I don't fully understand. That Memetracker gives a search score to the same cid isn't a problem. The only way you'd see duplicate memes is if there are two copies of the content in the database.

One way to find the cids of content is to hover over the links on the meme browsing page. The cid is appended to the end of each link. If you look at the two duplicate links in your memes, they will have different cids.
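
If hovering over links gets tedious, the same check could be scripted. The following is only a sketch: it assumes you have already exported (cid, link) pairs from the database by whatever means you prefer. The function name and the sample rows in the usage note are hypothetical, and nothing here reflects Memetracker's actual table layout.

```python
from collections import defaultdict


def find_duplicates(items):
    """Group (cid, link) pairs by link and return links stored more than once.

    Links are compared after trimming whitespace and a trailing slash,
    which is a crude normalization chosen for this sketch.
    """
    by_link = defaultdict(list)
    for cid, link in items:
        key = link.strip().rstrip("/")
        by_link[key].append(cid)
    # Keep only links that map to more than one cid.
    return {link: cids for link, cids in by_link.items() if len(cids) > 1}
```

Called with made-up rows such as [(101, "http://example.com/story"), (102, "http://example.com/story/")], it returns that link together with both cids, which is the situation described above: one story, two copies, two different cids.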

Have you had time to experiment with Memetracker since writing this issue? Have you had any better luck?

Including a sample OPML file is a great idea. A good one to start with is Techmeme's top 100 blog OPML file -- grab it here:
http://www.techmeme.com/lb.opml
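
To see which feeds an OPML file contains before importing it, something like the sketch below should work. It assumes the file follows the usual OPML convention of outline elements carrying an xmlUrl attribute; the function name feed_urls and the sample document in the usage note are made up for illustration.

```python
import xml.etree.ElementTree as ET


def feed_urls(opml_text):
    """Return the feed URLs found in an OPML document.

    Looks at every <outline> element, at any nesting depth, and
    collects the xmlUrl attribute where one is present.
    """
    root = ET.fromstring(opml_text)
    return [
        outline.get("xmlUrl")
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]
```

For example, feed_urls('<opml version="1.1"><body><outline text="Blog" xmlUrl="http://example.com/feed"/></body></opml>') gives a list with that one URL. Starting with a handful of the listed feeds keeps the first test run manageable.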

#2

fhelmschrott - September 14, 2008 - 06:07

Hey Kyle,

Unfortunately, I haven't had much time yet to continue trying. And yes, I think there are definitely duplicate pieces of content in the database. I'll clean everything up completely, give it a try with the OPML file from Techmeme, and let it run for a few days. Let's see how that goes and whether we get content relations or not.

so long,
Frank

 
 
