Hey folks,

I just published a new module that does the same thing as this one, but with Views integration and weighting by vocabulary.

I'm not sure how you want to proceed, because it's not really an "update" per se, more of a rewrite. Is there an advantage to merging the two? Or should we make mine the 5.2.x version?

Anyway, here's the module page: Similar Nodes (http://drupal.org/project/similarnodes)

Best,
Jacob

Comments

jjeff

Eek! Yes. Let's integrate them. I've been pushing for Views support in this module for some time now.

Are there plans for a D6 version of Similar Nodes?

Let's do a quick compare/contrast of the modules to figure out what we'll need to do to get them to meet in the middle.

wayland76

If I may make a suggestion, I'd propose that we structure things this way: one main module with various sub-modules.

Each sub-module would use its own criteria for deciding whether nodes are similar, and then assign various nodes a similarity rating from 0 to 9.

The main module would then (optionally, based on what is selected in the interface) combine the different scores, optionally weight them, and then display a list of the most similar nodes.

The sub-modules could then reuse code from the following modules:
* The current Similar By Term: http://drupal.org/project/similarterms
* Similar Entries: http://drupal.org/node/25974
* Related Nodes: http://drupal.org/node/39822 (if there's anything useful in it)
* Similar Nodes: http://drupal.org/project/similarnodes

Would that work?
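To make the proposed combining step concrete, here is a minimal Python sketch. It assumes each sub-module is exposed as a function that takes a node ID and returns a {candidate_nid: score} map with integer scores from 0 to 9; the names `combine_scores` and `Scorer` and the weights dictionary are hypothetical and are not part of any of the modules listed above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sub-module interface: given a source node ID, return
# {candidate_nid: score}, where each score is an integer rating from 0 to 9.
Scorer = Callable[[int], Dict[int, int]]


def combine_scores(
    nid: int,
    scorers: Dict[str, Scorer],
    weights: Dict[str, float],
    limit: int = 5,
) -> List[Tuple[int, float]]:
    """Combine per-sub-module scores into a weighted average and return the top matches."""
    totals: Dict[int, float] = {}
    total_weight = sum(weights.get(name, 1.0) for name in scorers)
    for name, scorer in scorers.items():
        weight = weights.get(name, 1.0)  # sub-modules without a weight count once
        for candidate, score in scorer(nid).items():
            if candidate == nid:
                continue  # a node is never listed as similar to itself
            if not 0 <= score <= 9:
                raise ValueError(f"{name} returned out-of-range score {score}")
            totals[candidate] = totals.get(candidate, 0.0) + weight * score
    # Divide by the total weight so the combined score stays on the 0-9 scale.
    if total_weight > 0:
        totals = {c: s / total_weight for c, s in totals.items()}
    # Highest score first; ties broken by node ID so the order is stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    def by_terms(nid: int) -> Dict[int, int]:
        return {2: 9, 3: 4, 4: 1}

    def by_text(nid: int) -> Dict[int, int]:
        return {2: 3, 3: 8}

    # Node 2 ranks first with a combined score of 7.0.
    print(combine_scores(1, {"terms": by_terms, "text": by_text}, {"terms": 2.0, "text": 1.0}))
```

In this sketch a candidate that a sub-module does not return counts as 0 for that sub-module. Dividing by the total weight means the weights change each sub-module's relative influence without changing the 0 to 9 range of the result.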

encho

That would be a dream come true :-)

catch

FWIW, I've started a list of content recommendation modules over at groups.drupal.org in a bid to help some of these modules consolidate, and found this issue while I was doing it: http://groups.drupal.org/node/12347

I've been using Similar By Terms for a while and really like it; if y'all can work together, that'd be extra nice.

JacobSingh

Hey guys,

I'm really short on time to deal with this right now, although I'd rather it not sit. Similar Nodes does pretty much exactly what Similar Terms does, but it does it through Views and also allows vocabulary-based weighting. I think what Wayland is describing is also what I expect in the future, which is why I named my module similarnodes.

How to accomplish this is a little trickier. You will end up with either some truly massive tables or some truly gnarly joins. If I have 5,000 nodes on the site, that is essentially 5000^2, or 25 million, records. I'm not sure whether my grouping and sorting hacks in Views (which should be rewritten for Views 2 soon) will hold up under that.
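The table-size arithmetic above can be checked with a few lines of Python. This is a worst-case count that assumes a row is stored for every pair of nodes; `pair_table_rows` is a hypothetical helper, not code from either module. For 5,000 nodes it gives 25,000,000 rows when every ordered pair (including a node paired with itself) is stored, or 12,497,500 when each unordered pair of distinct nodes is stored only once.

```python
from typing import Dict


def pair_table_rows(nodes: int) -> Dict[str, int]:
    """Worst-case row counts for a precomputed node-to-node similarity table."""
    return {
        # one row per ordered (nid1, nid2) pair, self-pairs included: n * n
        "ordered_with_self": nodes * nodes,
        # one row per unordered pair of distinct nodes: n * (n - 1) / 2
        "unordered_distinct": nodes * (nodes - 1) // 2,
    }


if __name__ == "__main__":
    for n in (1_000, 5_000):
        rows = pair_table_rows(n)
        print(f"{n:>5} nodes: {rows['ordered_with_self']:,} ordered rows, "
              f"{rows['unordered_distinct']:,} unordered pairs")
```

Either way the count grows quadratically, so doubling the number of nodes roughly quadruples the worst-case table.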

Anyway, I have time to give support, write documentation, and help people understand the hacks, but I don't have time right now to port any code one way or the other, and I won't for about a month. I can do little things here and there, but I can't be the primary maintainer of this merge. I'm happy to consolidate into this project, but I would like CVS access, because I have to maintain this module for a real client. I think this one (Similar Terms) has a larger community, but if we merge, it would have to be a 5--2 version without backward compatibility (BC), because it has a new dependency and is written in a totally different way anyway.

Okay, jjeff, let me know what you would like to do about this; I'm happy to help.

Best,
JacobSingh

rmiddle

I already have a pretty large site with a large number of nodes. What you describe means that my site would crawl because of the size and scope of the database. Adding Views and weight support is a plus, but it sounds like your setup has scaling problems that Similar Terms currently does not seem to have. I have read through your code, and there are some things worth using, but I believe your code is too immature to use wholesale right now. I am not speaking for jjeff, but I have been doing a lot of work on this code lately, and personally I don't see any benefit in merging the two projects right now.

Thanks
Robert

JacobSingh

Hi Robert,

I don't really know how it would scale. It probably depends on your MySQL setup and how much RAM you have allocated. I think that because the entire table is numeric, it should index nicely.

For most sites, up to 1,000 nodes is probably fine. If you're volunteering to test the performance, by all means :) I'm using it on http://hub.witness.org, and so far it seems to work okay. That's a pretty big site too.

I disagree that your having spent time on this code recently makes it a bad idea to merge them. I've often worked really hard on something only to find out that someone else had come up with a better or more accepted solution, and happily ditched my code because that was the direction that made sense. Maybe I misunderstood you, but that's what I got from your last sentence; please correct me if I'm wrong.

I think that performance is the main unknown here, but it's not necessarily an issue. I was talking about the proposal to have a true "similar nodes" module that can implement different relationships and build a huge nid1|nid2|relationship_type_weight relation table, which would be many times bigger than the one I've built. So far, Similar Nodes has actually performed very fast, and it would probably perform as fast as Similar Terms or faster, because it uses a cache table in the middle to build and store the relationships instead of making joins.
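A cache table of the nid1|nid2|score kind described above can be sketched as follows. This illustrates the general precompute-then-lookup technique and is not the actual Similar Nodes code. It uses Python's built-in sqlite3 only to stay self-contained (the sites discussed here run MySQL), and the tables `term_node`, `term_data`, `vocab_weight`, and `similar_cache` are a simplified, hypothetical schema. Each pair of nodes scores the sum of the vocabulary weights of the terms they share.

```python
import sqlite3
from typing import List, Tuple


def build_similarity_cache(conn: sqlite3.Connection) -> None:
    """Precompute nid1|nid2|score rows so page views avoid the term self-join."""
    conn.executescript("""
        DROP TABLE IF EXISTS similar_cache;
        CREATE TABLE similar_cache (
            nid1  INTEGER NOT NULL,
            nid2  INTEGER NOT NULL,
            score REAL    NOT NULL,
            PRIMARY KEY (nid1, nid2)
        );
        -- Each shared term adds its vocabulary's weight (1.0 if unweighted).
        INSERT INTO similar_cache (nid1, nid2, score)
        SELECT a.nid, b.nid, SUM(COALESCE(w.weight, 1.0))
        FROM term_node a
        JOIN term_node b ON a.tid = b.tid AND a.nid <> b.nid
        JOIN term_data d ON d.tid = a.tid
        LEFT JOIN vocab_weight w ON w.vid = d.vid
        GROUP BY a.nid, b.nid;
    """)


def similar_nodes(conn: sqlite3.Connection, nid: int, limit: int = 5) -> List[Tuple[int, float]]:
    """Cheap lookup: read one node's precomputed rows via the primary key."""
    cur = conn.execute(
        "SELECT nid2, score FROM similar_cache WHERE nid1 = ? "
        "ORDER BY score DESC, nid2 LIMIT ?",
        (nid, limit),
    )
    return cur.fetchall()


def demo_connection() -> sqlite3.Connection:
    """Tiny in-memory sample: nodes 10, 11, 12 tagged with terms from vocabularies 1 and 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER NOT NULL);
        CREATE TABLE term_node (nid INTEGER NOT NULL, tid INTEGER NOT NULL);
        CREATE TABLE vocab_weight (vid INTEGER PRIMARY KEY, weight REAL NOT NULL);
        INSERT INTO term_data VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO vocab_weight VALUES (1, 1.0), (2, 3.0);
        INSERT INTO term_node VALUES (10, 1), (10, 3), (11, 1), (12, 3), (12, 2);
    """)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    build_similarity_cache(conn)
    print(similar_nodes(conn, 10))  # [(12, 3.0), (11, 1.0)]
```

The expensive self-join runs only when the cache is rebuilt, and each page view reads one node's rows through the primary key. Because the join only produces pairs that share at least one term, the cache holds fewer rows than the n^2 worst case when tagging is sparse, though a heavily tagged site can still approach it.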

The other issue is "out-of-the-box"-ness... I think that Similar Nodes could benefit from a Panels 2 pane using node context, as well as a block (I think I kinda set this up using Panels PHP args).

The final issue is just review. As you say, the code is "immature", and could use some more tire kicking / review. I don't think that means that it should be disregarded however, as it provides much sought after functionality.

Still, if the community's preference is to stick with Similar Terms and rewrite Views support, I will try to help with that, and feel free to cannibalize my code to do it.

Best,
Jacob

rmiddle

JacobSingh wrote on June 30, 2008 - 22:08:

I don't really know how it would scale. It probably depends on your MySQL setup and how much RAM you have allocated. I think that because the entire table is numeric, it should index nicely.

For most sites, up to 1,000 nodes is probably fine. If you're volunteering to test the performance, by all means :) I'm using it on http://hub.witness.org, and so far it seems to work okay. That's a pretty big site too.

1,000 nodes is a really small site. Almost any decent site is going to post more than 100 new nodes a month, so it would hit 1,000 nodes within 10 months, and that's without anything else generating nodes.

I disagree that your having spent time on this code recently makes it a bad idea to merge them. I've often worked really hard on something only to find out that someone else had come up with a better or more accepted solution, and happily ditched my code because that was the direction that made sense. Maybe I misunderstood you, but that's what I got from your last sentence; please correct me if I'm wrong.

I agree. As a matter of fact, I dumped code just this week. I am just speaking as someone who has added several new features to this module, including a 6.x release in the last month. The code in this module is really well written and clean, and it seems to scale pretty well with large numbers of nodes and hits, based on my own use and lullabot.com's use of this module.

I think that performance is the main unknown here, but it's not necessarily an issue. I was talking about the proposal to have a true "similar nodes" module that can implement different relationships and build a huge nid1|nid2|relationship_type_weight relation table, which would be many times bigger than the one I've built. So far, Similar Nodes has actually performed very fast, and it would probably perform as fast as Similar Terms or faster, because it uses a cache table in the middle to build and store the relationships instead of making joins.

OK, I might have misread what you wrote, since it sounded in #5 as if your current code stores nodes^2 rows, which would be a huge amount of data and would kill performance on almost any site over time. However, if that was a theory you were floating for possible future updates, then that's a different story.

I agree that caching generally produces more efficient code, and it is on my to-do list for Similar By Terms. But if you are generating a huge table, that will wipe out any performance gain that caching provides.

The final issue is just review. As you say, the code is "immature", and could use some more tire kicking / review. I don't think that means that it should be disregarded however, as it provides much sought after functionality.

I didn't say that we should disregard your code; in fact, I said that there are some good things about it. I am just saying that the code you have written is a lot more complex, and sometimes that is better and sometimes it isn't. Personally, I think caution is in order, and sometimes a simple solution is the best option.

Thanks
Robert

summit

Hi,
I would love to see similarterms and similarnodes integrated. Is there a timeline, please?
Thanks a lot in advance for considering combining these fine modules.

Greetings,
Martijn

rmiddle

Status: Active » Closed (won't fix)

I consider this a dead issue at this point. Our two code bases are way too different.

Thanks
Robert