Hey guys,

I'm working on working out the bugs for our D6 site (yeah I know, I'm a little behind) and noticed that, when visiting a user/#/edit page, it takes quite some time to finally load due to the way _uc_file_gather_files is checking every single file in the download files directory for an fid.

The problem with this, is that we have thousands of files for sale on our site. To check each one of those files, and search for its fid in the uc_files table, takes about 17 seconds on average, and this is using a pretty beefy database server.

My suggestion is to only "refresh" the uc_files table when new files are uploaded via the admin, or a "refresh files directory" button is pressed, similar to the way the theme system's "clear all cached data" is implemented. There's no reason why this directory should be refreshed every time a user needs to edit his password.

If the concern is that you want to make sure you can add files to a user, either a) just scrape the directory, and allow an fid to be assigned / corroborated after form submit, or b) add a new "Add Files" tab to the user form.

I need to get this worked on immediately, since we're hoping to migrate this week. Let me know if you'd like me to work on a patch and how you'd most like to see this fixed.

CommentFileSizeAuthor
#2 uc_file.norefresh.patch849 bytestorgosPizza
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

torgosPizza’s picture

Another solution I just thought of would be to add a new permission and a new config (or maybe just the new config). If we add an "administer file downloads" permission for the uc_file module, you could make sure that the _gather_files function is only called when the form is viewed by a user with this permission

Secondly, a config option could include things like "refresh files list on user edit form" or "refresh files list when a file feature is added to a product" or something of that nature, so a store admin can control when this action is fired. Like I said, for our site, which has thousands of users, many of whom will have product files for sale, plus our own products which contain hundreds, maybe thousands of files themselves - the amount of time it takes to do a files list refresh can add up to be quite a lot. The solutions I've proposed will help keep that costly action from being fired when it's not a good time to do so, and will give the admin more control over when it does happen.

Again I'm more than willing to write and submit a patch to include any of these solutions. We just need it done asap.

torgosPizza’s picture

Title: _uc_file_gather_files needs to be optimized or moved » _uc_file_gather_files needs to be fixed (or needs config)
FileSize
849 bytes

Here's what I added to uc_file.module, at line 152. Right now it just runs _uc_file_gather_files but I created a condition to check a) that it's an admin and b) that a configuration option, which does not yet exist, is set (but I created it in my patch). I think this is the best way to go as it'll take less code to implement than pretty much anything else.

  // Rebuild the file list.
  if (user_access('administer users') && variable_get('uc_file_refresh_files_user', FALSE) == TRUE) {
    uc_file_refresh();
  }

Open to suggestions of course but this was easy enough to implement. One thing that might be considered (and that this patch leaves room for) is finding other configuration options for when to refresh the files.

The patch is attached.

sgriffin’s picture

Is this function also called when adding a File Download feature?
I've noticed that adding a file download feature is taking a similar amount of time and of course getting worse.

// Make sure we have an up-to-date list for the autocompletion.
uc_file_refresh();

Does this mean I can disable this at the trade off for turning off autocompletion?

This is a heavy duty function indeed.
Checking into this function, it would also be a good idea to have an option to remove the recursive directory check, OR be able to specify matching paths to ignore (better).
For example, I re-use the drupal files directory because I sell images which are re-used by imagecache and this causes and 100% increase in the amount of files for each imagecache preset, that are found via this function, and in my case are useless for the store.

torgosPizza’s picture

Yup, this function is called whenever you add a new File Download feature to a product, as well as when you Edit a user. I imagine it's also called elsewhere in uc_file.

I would at the very least like to see a button that would allow you to manually push the changes. Lately I've had to comment out the call to this function when adding files because it simply takes too long to scan all of our files, which number in the thousands.

longwave’s picture

landry’s picture

as proposed in http://drupal.org/node/1065664, maybe uc_file_refresh() could be migrated to a drupal batch, called upon admin decision.

torgosPizza’s picture

+1 to that idea. Definitely needs to happen.

landry’s picture

Thought a bit about it... here's how i see things :
- uc_file_refresh() could take an optional subdir argument to avoid scanning everything under uc_file_base_dir
- i'm not sure it's a good idea to have it automatically triggered, or it should be optional. (ie 'do you want ubercart to automatically scan the dir ?')
- a form in the admin part could trigger the call, with a field for the optional subdir to scan

Now, for the batch itself :
- for prune_files(), there should be only one prune_db() call, that requires a new prune_db_multiple() func taking an array of fids. It's better imo to delete X records at once rather than deleting only 1 record X times.
- prune_files() could be one batch operation, only gathering fids to prune
- as for gather_files(), there should be a batch operation gathering filenames under uc_file_base_dir, and only doing that
- and then each filename should be handled in a separate batch operation (or rather 100 by 100 using array_splice ?), calling module_invoke_all too
- fids/filenames could be passed between operations through $context['sandbox'] array

Thoughts ? Does it make sense ?

longwave’s picture

Component: Code » File downloads
Priority: Critical » Major

Changing component.

longwave’s picture

#815380: Add index to uc_files.filename helped speed up _uc_file_gather_files() for me by about 60%, and #885010: Move uc_file File Downloads form to "Files" tab removes the refresh from the user edit page. Let's get #885010: Move uc_file File Downloads form to "Files" tab fixed and then think about improving this further.

It would probably be best to make the refresh only occur when an admin decides it's necessary (or when the files path setting is changed), and add error checking to deal with files that have been deleted since the last refresh when a user tries to view or download.

torgosPizza’s picture

Awesome. I agree with your last part, and especially about waiting until the tab is changed. Now that we're on better hosting it hasn't been an issue, but every day we get more products and could be edging closer towards having the issue again :) Thanks!

sgriffin’s picture

Please add this! All image/imagecache files are getting scanned currently if your UC files are the original images.
Maybe a config textfield for lines containing directories to omit would be good.

// - uc_file_refresh() could take an optional subdir argument to avoid scanning everything under uc_file_base_dir

longwave’s picture

Status: Active » Closed (works as designed)

Not sure this is really needed now all the issues mentioned in #10 are fixed, nobody has reported performance problems since then.