Hey guys,
Noticing an issue involving scaling with File Downloads on a multi-tier system. InnoDB is slower when it comes to SELECT statements, and today we released a new product on our new multi-tier Drupal 6 environment (hooray!) that caused a lot of people to try to buy the new products all at once. Since our products are all downloads, I looked in the db processlist and saw dozens of "SELECT * from uc_file_users ... WHERE ufu.file_key = 'ef1......'; type selects.
My gut feeling is that part of the issue is the text search that has to happen. With a couple million rows in the uc_file_users table, it takes anywhere from 3 to 18 seconds to complete one search (via mysql Command-Line!) using the file_key as the index; and less than one second to do it by uid and fid. Here's the output:
(I used a different new file_key each time to avoid mysql caching the query/results)
mysql> select pfid, granted, file_key from uc_file_users where file_key = '7d12a475e2ac761501fa3f217147683d';
+------+------------+----------------------------------+
| pfid | granted | file_key |
+------+------------+----------------------------------+
| 1755 | 1261449969 | 7d12a475e2ac761501fa3f217147683d |
+------+------------+----------------------------------+
1 row in set (4.67 sec)
mysql> select pfid, granted, file_key from uc_file_users where uid = 6 and fid = 11653;
+------+------------+----------------------------------+
| pfid | granted | file_key |
+------+------------+----------------------------------+
| 1749 | 1261447848 | 81931a1497e69ff9809ab9e089795bb0 |
+------+------------+----------------------------------+
1 row in set (0.00 sec)
I propose a patch be written that allows this type of select FIRST, if the user is logged in. Either that, or (maybe down the road) we add the uid to the entire URL; that way we can validate against the user without having to worry about the user actually being logged in.
To be honest we need this type of functionality in order to scale; if 10,000 people all try clicking a link within a few seconds of each other, it could take quite some time for the site to respond again.
| Comment | File | Size | Author |
|---|---|---|---|
| #13 | 666238_file_key.patch | 4.45 KB | Island Usurper |
Comments
Comment #1
torgospizzaProof of concept, the results were immediate.
In uc_file.module (anywhere, its just a hack of uc_file_get_by_hash()):
In uc_file.pages.inc (line ~136):
If you think this is a good idea, I'll write a patch. But like I said, works for me and takes the load off the server without having a compound index on fid and file_key. That might be another good thing to do:
create index uc_file_fid_file_key ON uc_file_users(fid, file_key);Thoughts?
Comment #2
torgospizzaUpdating this, just ran into the issue again with someone using a download accelerator. Brought the site to a screeching halt because he was downloading 4 files simultaneously, using about 5 connections each. This caused 20 threads to open up in mysql, each of them doing a text search for the hash.
I'll be building the compound index tomorrow morning, and investigating other ways to work around this. The hash is good to have but I think we should look at using alternatives first, and only use the hash as a last resort.
Comment #3
tr commentedHow about altering the uc_file_users table so that file_key is an index?
Selecting on an unindexed key is akin to searching through unsorted data - the amount of time it takes scales like N (the number of elements to be searched). Searching sorted data, on the other hand, scales like log N. The difference is negligible when N is small, but for millions of rows the effect is huge. (If N=1x10^6, log N = 6 ... so it should be on the order of 100,000 times faster if indexed).
Selecting on fid is fast because fid is already an index on that table. Selecting on file_key should be just as fast if file_key is indexed. No code changes other than the ALTER TABLE should be necessary if this is the real problem.
EDIT: Oh, re-reading the thread I see you did mention creating an index as a possibility. Did you try it? I think it should work.
Comment #4
torgospizzaYeah, that's part of the solution, I think. I'm interested in running some benchmarks before and after to see what kind of an improvement I get.
It's a shame that file_key wasn't already an index and we in particular had to wait until we already had 2+ million rows in the table :)
Comment #5
tr commentedDid you ever try adding the index to see if it fixed the problem? I would like to get this fixed for the next point release of Ubercart.
Comment #6
tr commentedTagging
Comment #7
torgospizzaI was able to index it, and frankly, I think using the integer is still faster than a keyed index. Especially in our case where we literally have 2.5 million rows to search through; though I could be wrong.
I think indexing it is still a good thing to do, or possibly even a compound index on uid and file_key which is how the query runs.
Comment #8
404 commentedsubscribe
Comment #9
jwilson3subscribe
Comment #10
tr commentedMoving to 7.x-3.x. Still need someone to do some work on this.
Comment #11
longwaveChanging component.
Comment #12
Island Usurper commentedThe index should be on file_key itself because the query isn't searching on fid first. That or the query should be modified to say
WHERE fid = %d AND file_key = '%s'That said, I don't think anonymous users can be granted file downloads. Unless there's some way that a user can get two rows in uc_file_users for the same file, we can probably just drop the file key and rely on the uid.
Comment #13
Island Usurper commentedLet's give this a shot.
Comment #14
Island Usurper commentedWorks well for me.