By jhaggenburg on
Ok, the client really got me now. This is the idea:
Users can fill in two extra fields (only options that I can control), let's say "skills" and "hobbies". With this, I need to make a top 10 of best matches! So, if two users have chosen the same skill, that's a match. If two users have chosen the same skill and hobby, that's an even better match. And with this, I need to make an overview of these top matches, linkable to an overview of the people inside that match.
I'm thinking taxonomy, I'm thinking views or panels... But how? Can anyone push me in the right direction?
J.
Comments
Skills and hobbies imply
Skills and hobbies imply plural to me: people can have 0 or more of each. If that is the case, then just picking the top ten is not a small task. If one considers just the case of zero or one, you first need to decide which is more important for the match, skill or hobby; I am going to pick skill. Then you have logic something like (note this is pseudo code)
The point here is that I cannot see a way to construct a single query (so I do not think Views will help) that is going to return your list of top 10 matches. Allowing for multiple skills and hobbies makes the problem more complicated (my math skills are rusty, but I think this requires combinatorics or permutations). Note that the more general solution becomes increasingly slower as the number of users grows.
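A minimal Python sketch of logic along those lines, assuming each user has at most one skill and one hobby and that a shared skill outranks a shared hobby. The names `users`, `match_score` and `top_matches`, the sample data and the weights 2 and 1 are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical data: uid -> (skill, hobby); None means "not filled in".
users = {
    "A": ("php", "chess"),
    "B": ("php", "sailing"),
    "C": ("php", "chess"),
    "D": ("design", "chess"),
}

def match_score(a, b):
    """Assumed weighting: a shared skill counts 2, a shared hobby counts 1."""
    skill_a, hobby_a = a
    skill_b, hobby_b = b
    score = 0
    if skill_a is not None and skill_a == skill_b:
        score += 2
    if hobby_a is not None and hobby_a == hobby_b:
        score += 1
    return score

def top_matches(users, limit=10):
    """Score every unordered pair of users and return the best `limit` pairs."""
    scored = [
        (match_score(users[u], users[v]), u, v)
        for u, v in combinations(sorted(users), 2)
    ]
    # Drop pairs that share nothing at all.
    scored = [row for row in scored if row[0] > 0]
    # Highest score first; ties are broken alphabetically by user id.
    scored.sort(key=lambda row: (-row[0], row[1], row[2]))
    return scored[:limit]

for score, u, v in top_matches(users):
    print(u, v, score)
```

Every pair is scored in Python rather than in one query, which is why this gets slower as the number of users grows.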
link table / match nodes
Interesting question!
I think taxonomy would be perfect to store the skills and hobbies. That's a piece of cake.
Combining all users and making a top-10 requires that you calculate all values. It is, in terms of database management, a many-to-many relation. Many-to-many relations are best solved by using a linking table in between. Say you have three users: A (hobby x, skill y), B (hobby z, skill y) and C (hobby x, skill y). You would have to calculate the 'match-value' of A*B, B*C and A*C before you can create your top-10. The link table would be:
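As a rough illustration, the three match values for this example could be calculated like this in Python. The scoring rule (one point per field on which both users agree) and the names `match_value` and `link_table` are assumptions for the sketch.

```python
from itertools import combinations

# The three example users: uid -> {"hobby": ..., "skill": ...}
users = {
    "A": {"hobby": "x", "skill": "y"},
    "B": {"hobby": "z", "skill": "y"},
    "C": {"hobby": "x", "skill": "y"},
}

def match_value(u, v):
    # Assumption: one point for each field on which both users agree.
    return sum(1 for field in ("hobby", "skill") if u[field] == v[field])

# One row per unordered pair: (user 1, user 2, match value).
link_table = [
    (a, b, match_value(users[a], users[b]))
    for a, b in combinations(sorted(users), 2)
]

for row in link_table:
    print(*row)
```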
Now the question is: how do you create such a table? You can write a PHP script yourself which calculates the matches every time a user is created or changed, and puts the results in a separate database table. However, Views is not designed to work with custom tables. Maybe it is a better option to calculate the matches and store them as nodes. This would require a new node type, "match", consisting of two user reference fields and one field for the match value. Your custom script would have to fire every time a user is created or changed. It compares the new user with all existing users and adds a match node for every comparison. This way, Views can easily access the results and publish a top 10. If you wonder how to write a script that adds nodes, here is some info: http://drupal.org/node/67887#comment-209250.
By the way, Nevets is right: a solution like this grows quickly when the number of users grows. With n users there are n(n-1)/2 pairs to calculate, so 10 users already means 45 combinations and 250 users means 31,125.
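To show the "recalculate on every save" idea outside Drupal, here is a small sketch using Python's built-in `sqlite3` module. It uses a plain link table instead of match nodes, and the table names `profile` and `user_match`, the function names and the one-point-per-shared-field scoring are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (uid TEXT PRIMARY KEY, hobby TEXT, skill TEXT);
    CREATE TABLE user_match (
        uid_a TEXT, uid_b TEXT, score INTEGER,
        PRIMARY KEY (uid_a, uid_b)
    );
""")

def save_user(uid, hobby, skill):
    """Store the profile, then rebuild this user's rows in user_match."""
    conn.execute(
        "INSERT OR REPLACE INTO profile VALUES (?, ?, ?)", (uid, hobby, skill)
    )
    conn.execute(
        "DELETE FROM user_match WHERE uid_a = ? OR uid_b = ?", (uid, uid)
    )
    others = conn.execute(
        "SELECT uid, hobby, skill FROM profile WHERE uid <> ?", (uid,)
    ).fetchall()
    for other_uid, other_hobby, other_skill in others:
        # Assumed scoring: one point per shared field.
        score = int(hobby == other_hobby) + int(skill == other_skill)
        a, b = sorted((uid, other_uid))  # store each pair once, smaller uid first
        conn.execute("INSERT INTO user_match VALUES (?, ?, ?)", (a, b, score))
    conn.commit()

def top_matches(limit=10):
    """Return the best-scoring pairs, highest score first."""
    return conn.execute(
        "SELECT uid_a, uid_b, score FROM user_match "
        "ORDER BY score DESC, uid_a, uid_b LIMIT ?",
        (limit,),
    ).fetchall()

save_user("A", "x", "y")
save_user("B", "z", "y")
save_user("C", "x", "y")
print(top_matches())
```

The top 10 is then a cheap sorted read; the cost is paid when a profile is saved, where one user is compared with all the others.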
I'm curious what others think of this approach. Maybe there are many other (better?) methods to tackle your problem - I'd like to read them all.
Thanks!
Thanks for the replies! It certainly is a tricky one... and by the way, I'm just getting started with Drupal! :) But it's already clearer to me than it was before, so let's make it a bit more concrete. The user database is gonna hold +/- 250 members, who will be given a list of +/- 10 skills and +/- 10 hobbies, and luckily they can't pick multiple options.
Nevets: Looks quite simple actually, but I'm a bit nervous about performance. This is gonna be triggered every time this overview-page is loaded right?
Marcvangend: Your table seems great, but at some point it confuses me a bit. Let's stay with your example, which would be:
It could be a lack of sleep, but I've looked it over and over again for 30 minutes now and somehow I can't find the link with your table:
But anyway, the whole idea of keeping the score in separate nodes sounds cool. And these should contain some kind of link with the users as well, right?
J.
edit: Moment of light, maybe :)
Let's say that we've got a few more users/skills/hobbies:
I think that with this, it should count how many times every skill occurs among these users...
So for my own state of mind: the first table is growing horizontally, the counting tables vertically. I guess when counting the score and putting that in separate nodes (one for every skill and hobby), combined with the complete list of the particular users, it might just work...?
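That counting step could look something like this in Python, using `collections.Counter`. The sample users and the skill and hobby names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample: uid -> (skill, hobby)
users = {
    "A": ("skill1", "hobby1"),
    "B": ("skill1", "hobby2"),
    "C": ("skill2", "hobby1"),
    "D": ("skill1", "hobby1"),
}

# Count how many users picked each skill and each hobby.
skill_counts = Counter(skill for skill, _ in users.values())
hobby_counts = Counter(hobby for _, hobby in users.values())

# most_common() sorts by count, highest first.
print(skill_counts.most_common())
print(hobby_counts.most_common())
```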
Rotate the table, so
Rotate the table, so
becomes
and you avoid growing the table horizontally (which means adding columns); instead you add data as rows.
In reality, though, this table is not needed, since the profile already stores the mapping of uid to skill and hobby.
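A tiny Python illustration of that rotation, with invented sample values: the wide layout gains a column per user, the rotated one gains a row per user.

```python
# Wide layout: one column per user, so every new user adds a column.
wide = {
    "skill": {"A": "s1", "B": "s1", "C": "s2"},
    "hobby": {"A": "h1", "B": "h2", "C": "h1"},
}

# Rotated layout: one row per user, so every new user adds a row.
rows = [
    {"uid": uid, "skill": wide["skill"][uid], "hobby": wide["hobby"][uid]}
    for uid in sorted(wide["skill"])
]

for row in rows:
    print(row["uid"], row["skill"], row["hobby"])
```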
Explanation of my table
I understand, my table could use some more explanation. Of course one needs some kind of logic to calculate the "match value" (score) of two users. I assumed that this logic would be: if two users share the same hobby or skill, their score increases by 1. The link table would be used to store the match value of each possible pair of users, so they can be compared and the top 10 can be generated.
In my example, the user table would be (just like yours, but rotated, because this is how database tables work):
User | Hobby | Skill
A | x | y
B | z | y
C | x | y
For each possible combination of users, the score is stored in a table. Users A and B have one match (score = 1), B and C also have one match (score = 1) and A and C have two matches (score = 2). So the table becomes:
User 1 | User 2 | Score
A | B | 1
B | C | 1
A | C | 2
When a new user is registered, this would lead to 3 new rows in the score table: A*D, B*D and C*D. This is why the table grows fast: n users generate n(n-1)/2 pairs, so 250 users generate 31,125 rows. Also, if users can only choose one skill and one hobby, the score can only be 0, 1 or 2. That's a very coarse scale for a tool like this; I guess you can hardly select a top 10, because hundreds of combinations will have the top score. (Assuming you use the logic I described earlier.)
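As a quick check on the growth, assuming each unordered pair of users gets exactly one row in the score table, the row count is n(n-1)/2. The helper name `pair_count` is made up for this sketch.

```python
from math import comb, factorial

def pair_count(n):
    """Number of unordered pairs among n users: n * (n - 1) / 2."""
    return comb(n, 2)

for n in (3, 4, 10, 250):
    print(n, pair_count(n))

# Adding user number n + 1 creates n new pairs (3 for user D in the example).
print(pair_count(4) - pair_count(3))

# n! counts the orderings of all n users, a different and far larger quantity.
print(factorial(10))
```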
Oh yeah, I totally agree.
Oh yeah, I totally agree. That was mostly for my own imagination; I wasn't thinking SQL yet, and indeed this data isn't gonna be a database table. Besides that, should this be the way to go?
Ok, couldn't sleep last
Ok, couldn't sleep last night, so I thought things through... Imagine:
- A new node type, filled with a node for every skill and every hobby
- Two node-reference fields in the user's profile (to be sure that the identification to the skill/hobby node is right and new skills/hobbies can easily be added)
Ok. As soon as a user changes his profile, it will add a user reference to this specific user on the node for the chosen hobby and/or skill. It will count the total number of user references and store it in a numeric field in that node, called "Score". It could also be an option to not count at all, but just add +1 to the score, but I'm not sure about the reliability and possible flaws.
Assuming that all this is possible (is it? Is a list of user-references available as an array or something?) It would now be possible to view a top list of high-score matches. Quite simple and light actually, assuming it ís possible :)
Now, another problem would be the fact that an exact match (both hobby and skill) would have to count more. So I thought, after this action of setting the score, it would set a "Bonus" by going through the user-reference list of the chosen skill, comparing it with the user-reference list of the chosen hobby, counting the number of matches and setting the bonus score.
What would you guys (/girls?) think about this solution? Would it be a neat way, possibly scalable, or am I maybe thinking in the wrong direction entirely? With that: is it possible to use the list of users added to this node and load this particular data? I haven't looked into the internals of node/user-reference fields yet (but I will soon...).
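In plain Python terms, leaving Drupal's reference fields aside, the score and bonus could be sketched like this. The user-reference lists are modelled as sets, and all names and sample data are assumptions.

```python
# Hypothetical user-reference lists: node title -> set of user ids.
skill_refs = {"php": {"A", "B", "C"}, "design": {"D"}}
hobby_refs = {"chess": {"A", "C", "D"}, "sailing": {"B"}}

def score(refs, name):
    """Score of a skill or hobby node: the number of users referencing it."""
    return len(refs[name])

def bonus(skill, hobby):
    """Number of users who chose both this skill and this hobby."""
    return len(skill_refs[skill] & hobby_refs[hobby])

print(score(skill_refs, "php"))   # users with the skill "php"
print(score(hobby_refs, "chess")) # users with the hobby "chess"
print(bonus("php", "chess"))      # users with both
```

Recounting from the reference list on every save, instead of adding +1, means the score cannot drift when a user switches from one skill to another.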
J.
If I may and if I understand
If I may and if I understand correctly:
Something is left out of the equation: if 14 users share the same combination, then 4 valid people get left out of the top 10.
As I see it now, I don't
As I see it now, I don't think I agree. The final list will be a top 10, sorted on score. The first entry should hold a complete list of all its users, then comes the group for #2, then #3. So, I don't really think that people will be left out. But maybe I'm overlooking something here!
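A short Python sketch of what I mean, assuming the top 10 ranks groups (skill/hobby combinations) rather than individual users. The function name `top_groups` and the sample data are made up.

```python
from collections import defaultdict

def top_groups(users, limit=10):
    """Group users by (skill, hobby) and return the `limit` largest groups."""
    groups = defaultdict(list)
    for uid, combo in users.items():
        groups[combo].append(uid)
    # Largest group first; ties are broken by the combination itself.
    ranked = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return ranked[:limit]

# Hypothetical data: 14 users share one combination, 2 share another.
users = {f"u{i}": ("php", "chess") for i in range(14)}
users.update({"v1": ("design", "sailing"), "v2": ("design", "sailing")})

best_combo, members = top_groups(users)[0]
print(best_combo, len(members))
```

The limit applies to the number of groups, so all 14 users of the first group stay in the list.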