I apologize for not following along with the discussion of 5.3 as it has progressed. If I'm not understanding something please let me know.

In our implementation of User Points we've dealt a lot with a single set of points and max_points figures. Categories are cool but they made it difficult to report on the total points (or max_points) a user has accrued across all categories. These patches should, hopefully, solve that problem.

The install patch creates a new table called "userpoints_summary" that will hold summary data -- points and max_points -- across all categories. The userpoints_update_9 function grabs legacy data from the userpoints table, summarized the data across categories and inserts it into the newly created userpoints_summary table.

The module patch modifies the _userpoints_update_cache function to update the userpoints_summary table each time it updates the userpoints table.

I've tested this with my Top Contributors module and it appears to work as advertised.

Please let me know if I'm missing (or misunderstanding) something.

Thanks!
Kevin

Comments

kbahey’s picture

Project: User Points » User Points Contributed modules
Version: master » 5.x-3.x-dev
Component: Code: userpoints API » Code

Looks like another candidate for a contrib module.

It has its own table, and using the hooks, it summarizes things, and provides some calls.

Commit it when you get CVS access.

jredding’s picture

wait.. wait.. Do NOT commit this as this modifies the original userpoints.module file thus it makes a change to core, I do not recommend we do this. If a change is needed to core then we should put these changes straight into core and not make it a contributed module. In My Humble Opinion.

I have two main questions
SELECT uid, sum(max_points) as mp
FROM userpoints
GROUP BY uid
ORDER by mp DESC

Then run that through db_query_range to limit your result set. I fully understand that SUM() can be expensive query BUT the expense can be easy mitigated with an index especially considering that if you have a few categories the GROUP BY clause will run extremely fast. And fast is relative to the site.

That's question #1
Question #2
Why another table?
The summary caching table is exactly the same as the first caching table except that it holds 1 line per person. Why not just create a category called "summary" then put the points in there. You can grab them with the hook_userpoints watching for "points after"

Question #3
If it absolutely positively can't be done through the use of SUM and hooking points_after is a pain, why not throw this directly into core? Its already been request although the previous request was completely fine with a SUM.

Sorry if this whole thing seems harsh but I think this is gone about the completely wrong way. As it stands right now we have a fairly decent change to core that adds a new table, modifies internal functions and then provides zero functionality to the user unless the top contributers contrib module is installed.

kbahey’s picture

- Yes, SUM can be expensive when the table grows a lot. I think this is why Kevin is writing this module.

- Let us explore more indexing. Kevin, did you try to do some EXPLAINS on the slow queries to see why they are slow and whether they would benefit from indexes?

- If Kevin still does need the summary, then it can be a contributed module, which watches for "points after", and updates the summary table with the info there, completely separate from core.

jredding’s picture

My only concern is that the patch as is modifies userpoints.install and userpoints.module and isn't functional without the top contributers contrib module.

This has been a requested feature though and it can be done in core with the existing table. tid = 0 is the uncategorized points and tid = -1 could contain the summary points if we really wanted this available in core. No second table needed a minor, minor change done to the update_cache function and then a new block added.

kbahey’s picture

-1 on changing API tables, regardless.

The category route sounds viable, but I have 2 concerns:

- Performance remains to be assessed. The reason is, we are still doing selects from the big table, so maybe we don't get the performance benefit Kevin is after. Maybe a WHERE tid = $summary_tid would make it fast, but we can never know until we try.

- I don't like having -1 as a tid though, looks like a kludge. Make it a proper positive number with name "summary" if that is the way we go.

Kevin can chose whether he wants to go the "category as a summary" or "contrib module with 'points after' and summary table'. If he decides for the latter, then no API changes should be done (schema, or module), and it should all be in contrib with its own summary table.

Either is fine from my point of view.

kmillecam’s picture

Hmm, you two make some good points.

I know that other users have asked for a "Highest Users" block that lists summary data. I don't believe this is possible with the current core code. Maybe the addition of the "summary" category (with minor changes to _userpoints_update_cache) would solve the problem for all of us? (Core users get a "Highest Users" block that reflects summary data and I get a new category that can be used to build a "Top Contributors" block.)

My reason for adding another table wasn't for performance reasons. It was just how my mind thought of solving the problem.

It's really nice to have other brains (like yours) helping to think things through.

Kevin

kmillecam’s picture

Before you make any changes to core, let me try the SUM approach. If it turns out to be too expensive, we can revisit.

Thanks,
Kevin

jredding’s picture

Sorry if I came on too strong, the work is good I just had some concerns. I'm currently setting up a test environment to figure out what the impact of using SUM is. My guess is with some good indexes it won't be very heavy but i don't know this to be the case. Personally I like SUM()'s in DBs because it reduces the possibility of having out of sync issues in your DB.

Report back with your findings and I'll do the same.

jredding’s picture

sorry 'dupe

kbahey’s picture

I know that other users have asked for a "Highest Users" block that lists summary data. I don't believe this is possible with the current core code.

Why is it not? You can just create a contrib module that does a SUM and GROUP BY uid, and it will sum all the categories for you. You can make that run from hook_cron, and even store the results by cache_set() and update it every hour or half a day.

Maybe the addition of the "summary" category (with minor changes to _userpoints_update_cache) would solve the problem for all of us? (Core users get a "Highest Users" block that reflects summary data and I get a new category that can be used to build a "Top Contributors" block.)

In this case, the summary category would conflict the above solution, or the above solution has to know which tids to exclude if points do overlap/span tids.

jredding’s picture

I started doing some research on the implication of using SUM() and came up with some lightbulbs moments but no hard data yet.

1) we're talking about the userpoints table and not the userpoints_txn table. At most the userpoints table will have # of site users X # of point categories in it. So if your site has 100K and you use 3 point categories you have 300K lines, which really isn't that much especially for a SUM.
So i think the performance issues are on sites with both a ton of users AND a ton of userpoint categories. Of course then you probably have bigger fish to fry than this one potentially slow performing block.

2) in D5 core.. upload.module, forum.module and search.module all utilize the SUM function to provide statistics. forum.module utilizes it to provide a count of the number of comments.
While using other code to justify the use isn't the best method I'm simplying point it out to say that we may be created a solution for a problem that doesn't really exist.

I still think a SUM() is a very clean way of doing this. Its easy to read, has zero out-of-sync possibilities and would fit very nicely into a contrib module. Heck it could even go into core. I just don't see the need for an additional table and an additional caching row. For performance issues an index on the max_points field would speed up the query significantly and moreover the block code itself could be cached.

kmillecam’s picture

I'm going to give the SUM() implementation a try. Thanks for the direction guys.

Kevin

kmillecam’s picture

Status: Active » Closed (fixed)

I added SUM() functions to the Top Contributors module (now committed to the 5.x-3 branch) and it appears to be working well.

Closing request.