Posted by memger on September 17, 2008 at 12:34am
| Project: | Download Count |
| Version: | 6.x-2.0-beta1 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (won't fix) |
Issue Summary
Hi,
Is it possible to add a feature to the download_count plugin that makes sure each download is counted only once per visitor? The "problem" currently is that many people use download managers, which open multiple connections per download, so it's not uncommon for the plugin to count 25 downloads for one user. Is there a way to achieve this?
Comments
#1
Yes, me too. I would not want my audio download counts skewed or dishonest. If somebody wants to make their song appear more popular than it really is, the current download counts make that possible: all a person has to do is start a download, cancel it, start it again, and so on to produce bogus download counts for their audio file. I haven't found any module yet that does what you suggest. If you find one, let me know; if anyone else has a suggestion for this, great.
Eric
#2
Not that I know how to handle this, but I'm keeping it open against the new version for the time being.
#3
Not sure, but the advertisement module has a filter for click-throughs etc. I wonder if code could be borrowed from it to apply the same sort of filter per user and for anonymous visitors?
Eric
#4
cool... thanks for the info. I'll take a look.
#5
Many thanks to WorldFallz for taking over this module, and so quickly turning things around. I needed this module to be working when we had a major release in the community served by my Drupal site a couple days ago, and your new versions of the module showed up just in time to start counting downloads.
It does seem that I am still getting multiple records of a download for the same IP address and the same timestamp, so if the bug with node revisions duplicating download count is fixed, it must be download accelerators causing the issue. It would be great to see this fixed. Thanks again for your work!
#6
See #823200: add ip address and referrer for downloads -- the module now collects IP info so we're part way there. I haven't done anything about flood checking yet, but at least the info is there and you can use it in queries and views.
#7
OK, I've been playing with different ways to do this for a while. I don't think it makes sense to take a performance hit and query the db for existing download data before storing new info. Since the IP address is now part of the data, you can do your IP filtering on the reporting/querying side.
If someone wants to submit a patch, I'd be open to taking a look-- but I won't take a performance penalty during data storage.
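As a rough illustration of filtering on the reporting side, the sketch below counts downloads per file while treating repeated hits from one IP in the same time bucket as a single download. It uses Python's sqlite3 purely as a stand-in for the site database; the table layout (`dcid`, `fid`, `ip_address`, `timestamp`), the sample rows, and the `filtered_counts` helper are assumptions made for the example, not the module's actual schema or code.

```python
import sqlite3

# Hypothetical, simplified stand-in for the module's download log table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_count ("
    "dcid INTEGER PRIMARY KEY, fid INTEGER, ip_address TEXT, timestamp INTEGER)"
)

# Made-up sample data: one IP hits file 1 three times within two seconds
# (download-manager style), then comes back much later.
rows = [
    (1, "10.0.0.1", 1000),
    (1, "10.0.0.1", 1000),
    (1, "10.0.0.1", 1001),
    (1, "10.0.0.2", 1005),
    (1, "10.0.0.1", 5000),
    (2, "10.0.0.3", 1000),
]
conn.executemany(
    "INSERT INTO download_count (fid, ip_address, timestamp) VALUES (?, ?, ?)",
    rows,
)


def filtered_counts(conn, window_seconds=60):
    """Count downloads per file, collapsing hits from the same IP that fall
    into the same fixed time bucket (integer division of the timestamp)."""
    sql = """
        SELECT fid, COUNT(*) AS downloads
        FROM (
            SELECT DISTINCT fid, ip_address, timestamp / ? AS bucket
            FROM download_count
        )
        GROUP BY fid
        ORDER BY fid
    """
    return dict(conn.execute(sql, (window_seconds,)).fetchall())


# Unfiltered counts, for comparison.
raw_counts = dict(
    conn.execute(
        "SELECT fid, COUNT(*) FROM download_count GROUP BY fid ORDER BY fid"
    ).fetchall()
)

print("raw:", raw_counts)
print("filtered:", filtered_counts(conn))
```

Fixed buckets are the cheapest option but not exact: two hits a few seconds apart that straddle a bucket boundary would still be counted twice.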
#8
Just trying to figure out what this all means. On my site, where the audio module posts to the front page, it shows that the song has x number of downloads. Does your download count code have anything to do with that, or is that just what the audio module reports? If it is the audio module's code showing this download info, I guess I should see about turning that display off for the public? Then I would show the actual download count from your code to the public. How do I do this?
Thanks
Eric
#9
I would be very interested in seeing a patch that excludes the duplicate entries, even if it makes the server work a little harder. When I look through my logs for "download" events, I see tons of duplicate and triplicate entries for a file (same minute, same IP) which tell me that my download counts are grossly exaggerated.
Even if it involves a performance hit, if the IP filtering was an optional feature, wouldn't that be acceptable? I don't see how else the module can avoid storing wrong numbers besides checking as a download begins whether that IP already just got logged downloading that file. Is that really going to make the server work a lot harder? It just has to look at entries from the same minute. True, the filtering could happen at the time the log is queried, but then we're talking about filtering numerous entries at once every time the count for a file is requested, and meanwhile the log is still wrong.
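The optional write-time check described above could look roughly like the following sketch: before logging a download, look for a recent row with the same file and IP, and skip the insert if one exists. This is Python with sqlite3 standing in for the Drupal database; the table layout, the index, and the `record_download` helper with its `filter_duplicates` switch are all hypothetical and not part of the module.

```python
import sqlite3


def make_log(conn):
    """Create a hypothetical, simplified download log table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS download_count ("
        "dcid INTEGER PRIMARY KEY, fid INTEGER, ip_address TEXT, timestamp INTEGER)"
    )
    # An index on the lookup columns keeps the extra query from scanning the table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_fid_ip_ts "
        "ON download_count (fid, ip_address, timestamp)"
    )


def record_download(conn, fid, ip_address, now,
                    window_seconds=60, filter_duplicates=True):
    """Log a download unless the same IP already logged the same file
    within the last window_seconds. Returns True if a row was written."""
    if filter_duplicates:
        recent = conn.execute(
            "SELECT 1 FROM download_count "
            "WHERE fid = ? AND ip_address = ? AND timestamp > ? LIMIT 1",
            (fid, ip_address, now - window_seconds),
        ).fetchone()
        if recent is not None:
            return False  # treated as a repeat connection, not a new download
    conn.execute(
        "INSERT INTO download_count (fid, ip_address, timestamp) VALUES (?, ?, ?)",
        (fid, ip_address, now),
    )
    return True


conn = sqlite3.connect(":memory:")
make_log(conn)
# Three near-simultaneous hits, then one 70 seconds later.
results = [record_download(conn, 1, "10.0.0.1", t) for t in (1000, 1000, 1001, 1070)]
print(results)
```

With `filter_duplicates=False` the function skips the lookup entirely, which is how an opt-in setting would avoid the extra query for sites that do not want it.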
#10
What exactly do you mean by a performance hit? Could you give some example numbers in comparison to what is already being done? Are we talking about nanoseconds or milliseconds, and how many sites, at what user numbers, would this negatively affect? As a site owner, performance is way down my list of priorities as long as only a reasonable number of visitors are subjected to a performance hindrance. Much more important are usability and accuracy of functionality. We have to look at the bigger picture sometimes when considering trade-offs here.
Take an Escort vs. a Corvette (car scenario): they both have to use the same ramp to get onto a congested highway. The Escort will always outsell the Corvette, and more people will use the Escort as well, so usability = priority. On the other hand, if the performance hit brings it down to the equivalent of a horse and buggy trying to get on that same highway, well, maybe then you have a point to consider about performance. So, what exactly is this performance hit in comparable numbers?
With all of this being said, I also wonder what exactly the big-boy web sites have that makes their counts more accurate. Or are theirs not accurate either, making this a moot point to even try to accomplish?
Eric
#11
I don't have any numbers, but adding even a single additional query to the db before every file download is bound to have performance implications. I just don't see the value added when you can simply do the filtering on output. I can see how it would be a nice feature to check a box and have the built-in data display filter by ip with a variable time frame, but it's not something I myself need or can spare the time to code at the moment. My emphasis is on getting the darn beta done. Besides, I expect most of the heavy lifting for output to be handled by views.
Of course, if someone submits a patch I'm more than happy to consider it. But it would have to be an option-- I don't want to force users to have to use that additional query on every download.
#12
Hi WorldFallz,
First of all, thanks for the great plugin. I'm kind of a Drupal newbie. You say that "you can simply do the filtering on output"; could you (or anyone) provide some guidelines on how to do this?
Regards
James
#13
Probably the simplest way to do it is use views3 to make a view by ip and aggregate it. You could probably also do it with views2 and the views_groupby module.
#14
I have to say, I'm curious as to how exactly to get a view to remove duplicate IPs. I upgraded to the 3.x-dev version of Views that adds the grouping feature, but the most I can do is make a view that uses Count on the "Download count: IP address" field, and this counts the duplicates too, so I arrive at the same exact number as your module's "File: Download count" field. (The real issue is when a download occurs in the *same minute* from the same IP, but for the sake of simplicity, let's just talk about filtering for one field, IP address.)
I picked a relatively unpopular file to look at as a case study since I can count all its downloads with the naked eye. It's been downloaded 20 times according to DL Count, but I can see that only 17 unique IPs downloaded it (this probably doesn't sound like a major issue, but some files have been downloaded thousands of times and have numerous duplicate IPs occurring up to 10 times each). It doesn't seem possible to tell Views that a file downloaded by the same IP is to be ignored when using the grouping feature's Count function.
I also tried turning on the Distinct option, but apparently there's no way to tell Views which field to filter for distinct values, so it does not produce any intelligible results. There's also an option under the Table settings to group the display by a certain column, and when I group it by IP address, it correctly displays the duped IPs in groups, but I can find no way to *count* those groups. Sorry, I know this isn't a Views 101 class, but if you happen to have the answer, I'd love to hear it. Otherwise I will search for alternatives (e.g., maybe I can do this on the theme level).
P.S.: If it helps to have some sample data to look at, I'd be glad to email you my download_count tables so you can see the duplication issue.
#15
Yeah-- this doesn't appear to be as straightforward as I thought it would be. I finally got some time to devote to creating a views3 view and couldn't get it formatted properly. Unfortunately, you're probably looking at a custom query for the time being. I probably won't get to flood control for a while unless someone steps up with a patch.
As for the duplicates-- is every field in the row (including the timestamp) duplicated? In order for the records to truly be duplicates, they would have to match exactly. Afaik-- there's nothing in the module that could be doing that in the download_count table. If it's in the download_count_statistics table, there could be a bug in the cron aggregation function.
#16
"Unfortunately, you're probably looking at a custom query for the time being." I will probably investigate the use of a customized theme. I don't know SQL, but I do program, so it should be easier for me to issue a SQL call like the one your module uses and then iterate over the results in PHP (as soon as I learn PHP :-).
"As for the duplicates-- is every field in the row (including the timestamp) duplicated?" Besides the unique download ID, yes... pretty much. A typical occurrence would be seeing the same IP download the same file three times in a row with the same timestamp (down to the second), then three more times a second later. Or three times and then six more in a couple seconds.
And this is in the download_count table, not the -_statistics table, plus the bug predates the addition of statistics to the module. Perhaps I shouldn't call it a bug, but whatever it is that's happening.
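The "fetch the rows and iterate over them" approach mentioned above might be sketched like this. It is written in Python rather than PHP, and the `(fid, ip_address, timestamp)` row shape, the `collapse_bursts` helper, and the sample data are assumptions for illustration only. A burst is taken to be consecutive hits on the same file from the same IP, each within `gap_seconds` of the previous one; only the first row of each burst is kept.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str, int]  # (fid, ip_address, timestamp) -- assumed row shape


def collapse_bursts(rows: Iterable[Row], gap_seconds: int = 60) -> List[Row]:
    """Keep only the first row of each burst of hits on one file from one IP."""
    kept: List[Row] = []
    last_seen: Dict[Tuple[int, str], int] = {}  # (fid, ip) -> latest hit time
    for fid, ip, ts in sorted(rows, key=lambda r: r[2]):
        key = (fid, ip)
        previous = last_seen.get(key)
        if previous is None or ts - previous > gap_seconds:
            kept.append((fid, ip, ts))
        # Always advance the marker so a long burst keeps extending itself.
        last_seen[key] = ts
    return kept


# Made-up data shaped like the pattern described above: three identical rows,
# three more a second later, another visitor, and a genuine repeat much later.
rows = (
    [(1, "10.0.0.1", 100)] * 3
    + [(1, "10.0.0.1", 101)] * 3
    + [(1, "10.0.0.2", 101), (1, "10.0.0.1", 500)]
)
deduped = collapse_bursts(rows)
print(len(rows), "logged rows ->", len(deduped), "downloads")
```

Unlike fixed time buckets, this sliding gap does not split a burst that happens to straddle a minute boundary, at the cost of pulling every row into application code.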
#17
I've been following this issue closely and tried the Views alternative with no success either, so I will try the custom SQL route. If I can't get it to work, do you have any suggestions?
"I will probably investigate the use of a customized theme". How would you implement a fix on a theme level?
#18
Hey, I managed to do this by counting the unique IPs per file. Here is what I did:
I added the IP address field to the view, then used a custom module to override the query, adding COUNT(DISTINCT node_node__download_count.ip_address) and GROUP BY nid to the original query!
This thread on query overriding pointed me in the right direction and was very helpful: http://drupal.org/node/409808
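To show what COUNT(DISTINCT ...) with GROUP BY nid changes, here is a small sketch using Python's sqlite3 as a stand-in database. The simplified table, the node id, and the sample IPs are made up; the numbers mirror the 20 downloads from 17 unique IPs case mentioned earlier in the thread. The `node_node__download_count` alias in the comment above is generated by Views, so the sketch queries a plain hypothetical `download_count` table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_count ("
    "dcid INTEGER PRIMARY KEY, nid INTEGER, ip_address TEXT)"
)

# 17 distinct IPs, three of which downloaded the file twice: 20 rows in total.
ips = [f"192.0.2.{n}" for n in range(1, 18)]
hits = ips + ips[:3]
conn.executemany(
    "INSERT INTO download_count (nid, ip_address) VALUES (?, ?)",
    [(10, ip) for ip in hits],
)

# COUNT(*) counts every logged row; COUNT(DISTINCT ip_address) counts each
# IP once per node because of the GROUP BY.
per_node = conn.execute(
    "SELECT nid, COUNT(*) AS total, COUNT(DISTINCT ip_address) AS unique_ips "
    "FROM download_count GROUP BY nid"
).fetchall()

for nid, total, unique_ips in per_node:
    print(f"node {nid}: {total} logged downloads, {unique_ips} unique IPs")
```

Note that this counts each IP once for all time, so a visitor who legitimately downloads the same file on different days is also collapsed into one.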
Regards,
James
#19
Awesome-- thanks for posting that back. I might be able to add a 'distinct ip' filter to the filters provided by the module that does exactly this. I'll take a look and post back when I get a chance.
#20
somewhat related-- added flood control to the 7.x-3.x branch.