Given that there are several duplicates of the same issue, but none that stood out to cover all parts of the problem, I wanted to create this "Master Issue" that could be used to cover all parts of this optimization & configuration effort.
Many users have complained of problems exporting aka downloading the results/submissions from webforms, either as CSV/delimited files or as Excel spreadsheets with either error messages or more commonly just blank screens after several minutes waiting. Further, the watchdog log / dblog may not record any error messages.
There are two parts to this problem:
- memory limits building the output
- timeouts because it takes too long to finish building the output
Solving Problem 1: Memory Limits
Resolving memory limits may be as simple as adjusting the .htaccess file where it says:
# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
php_value mbstring.http_input pass
php_value mbstring.http_output pass
php_value mbstring.encoding_translation 0
</IfModule>
just add the following extra lines before the line:
# New settings:
php_value memory_limit 128M
Unfortunately some web hosts may limit your control over this and may not allow the memory_limit value be configured through a .htaccess file, in which case it may have to be modified in the system's primary php.ini file. The standard system requirements call for this to be 16M (i.e. 16 megabytes) but you may find a need for it to be 64M, 96M or even 128M depending on the number of records or fields in the webform.
Solving Problem 2: Timeout
The PHP configuration variable max_execution_time controls how long the PHP parser will allow a script to run before being forcibly terminated. By default this is set to 30 seconds, but depending on the number of records and fields this will probably not be sufficient. Similarly to the memory limit, this can be configured through the .htaccess file, as follows:
# New settings:
php_value max_execution_time 180 #seconds
Unfortunately, even this may not be sufficient as it may still take a very long time to complete.
Proposed Solution
The output generation is controlled by the webform_results_download() function in webform_report.inc which generates a delimited file in the desired format. Tracing through the file, the initial lines run rather quickly but the "foreach ($submissions..." block can still take several hundred miliseconds to save each record, which for a heavily used webform can still cause a timeout even with a generous max_execution_time. This structure needs to be optimized to be much faster.
| Comment | File | Size | Author |
|---|---|---|---|
| #43 | 423350-webform-select-cache-6-2-9.patch | 3.2 KB | mrfelton |
| #38 | webform_select_cache_6_2_9.patch | 3.15 KB | EmanueleQuinto |
| #37 | webform_select_cache_6_2_9.patch | 3.14 KB | EmanueleQuinto |
| #32 | webform_select_cache.patch | 3.54 KB | quicksketch |
| #31 | webform_select_cache-6.x-3.0-beta2-v3.patch | 3.8 KB | leprechau |
Comments
Comment #1
quicksketchThanks Damien for taking initiative on this. I marked the other issues you found as duplicates.
#273885: blank page when downloading results (memory exhausted)
#392290: Submissions: Modifying CSV download function, or SQL statement
#323493: Cannot download results of webform
#407460: "Download" results link leads to blank screen
Comment #2
damienmckennaI added a ton of microtimer() statements to the webform_results_download() function and recorded the execution time taken to output ten records on my MacBook Pro. The top row indicates what the field was to store, the second indicates the time taken to output that one cell; the first column contains the row number, second-last column indicates how long the fwrite() call took, the last column indicates how long the processing of the entire record took; the State select field stores a list of all US States, Canadian provinces and US outlying territories grouped with optgroups, while the Country select stores a list of ~240 countries.
As you can see, textfield and email fields fly through in no time, but selects take a very long time, especially if there are a lot of predefined options to check against.
Comment #3
damienmckennaOops, I renamed the file before uploading..
Comment #4
damienmckennaHere's a patch for _webform_filter_values() that keeps track of whether it has already loaded the user profile, which saves on a large number of unneeded queries.
Comment #5
damienmckennaFYI the first patch has boosted the local execution time from ~0.7s per row to ~0.29s, as seen in the attached CSV.
Comment #6
damienmckennaAnother patch, this one keeps a static array of the processed options variable from components/select.inc, indexed by the md5sum of the unprocessed options variable.
Comment #7
damienmckennaFYI the second patch has given an approximate 100:1 speed optimization, as seen in the attached CSV.
Comment #8
damienmckennaI've tested both of the patches on a copy of our production database and I can now successfully output a 6877 record webform in a matter of minutes versus it timing out after ten minutes having barely gotten half-way. Please test the patches and provide feedback. Thank you.
Comment #9
damienmckennaPlease also test the attached cumulative patch for Dupal 5.
Comment #10
damienmckennaHave identified a problem stemming from not separating the cached select lists based on whether they were set to be $flat or not. Here's a new patch for D6 that takes this into account.
Comment #11
damienmckennaAn updated patch for D5 based on the bug discovered for comment 10.
Comment #12
damienmckennaFYI the problem identified in comment 10 would only be a problem when using option groups (optgroup).
Comment #13
leenwebb commentedFWIW, my issue at #407460: "Download" results link leads to blank screen is NOT a timeout issue. My webform had 5 fields, and only 4 submissions -- I'm pretty certain that if the server can handle views and aggregator and all of the other parts of that site, it's not a memory error that leaves this screen blank.
Comment #14
damienmckennaJust to see, please try the patch and see if it makes a difference.
Comment #15
leenwebb commentedOooh, turns out I had some other problem. I tried to apply the patch and got a big FAIL message. So I re-extracted the webform module and now everything works perfectly (sans patch). Thanks for your help!
(I always think it is so weird when that happens -- I expect that if some file gets corrupted I will end up with big PHP errors and a huge mess, not some stealth unobtrusive error.)
Comment #16
damienmckennaleenwebb: What was the error message? D5 or D6? I patched against the relevant branch in CVS rather than the release versions.
Comment #17
leenwebb commentedD5; it was a patch error that said that Chunk 2 had failed during the webform.module patching. I figured that since I hadn't changed anything in the actual webform module, something was amiss! Hence the re-extraction of a re-downloaded zip and now everything works as expected.
Comment #18
fredcy commentedI came up with a similar patch (to #4) before seeing this issue, and for me it alone speeds things up enough to avoid the PHP timeout problem.
Comment #19
jakraska commentedI applied the patches in #4 and #6 and it fixed my timeout issue - Havent noticed any bad effects yet. Thanks for the patch
Comment #20
damienmckennajakraska: thanks. Maybe we'll be able to twist Quicksketch's arm into pushing out an update to lock it in? :)
Comment #21
quicksketchI definitely like the improvements to _webform_filter_values(). There's no good reason to load that multiple times. The changes to select.inc look extremely suspicious though. Right now there's accounting in the static variable for "$flat = TRUE" and "$flat = FALSE", but it doesn't account for multiple select lists that are both flat or both not flat. It looks like this would erroneously return options for the first select list in the form for all the remaining select lists. Perhaps the $cid needs to be passed into the select list options.
A small code-style issue also: comments should be formatted as sentences, starting with capital letters and ending with periods.
Comment #22
quicksketchJust a note that caching to _webform_filter_values() have been included in the 2.7 version. The select list options would still be good to cache, but they still require work per my comments in #21.
Comment #23
damienmckennaThanks, I'll look into it.
Comment #24
quicksketchUnless there's further interest in completing these changes to the select list options, this will probably go unchanged. The caching in _webform_filter_values() already has corrected the bulk of the problem this issue was trying to solve.
Comment #25
damienmckennaI've not been able to get time to work on this since last Summer, I'll try out the latest release and see if there are any other improvements I can submit.
Comment #26
leprechau commentedI would like to take a look at this issue some more. I am currently running webform-6.x-3.0-beta2 and routinely have 30k+ submissions to my monthly sweepstakes webform. If I disable the output of my two option selects for country and state/province I can download all results with just a few seconds total processing time. However, if I enable both of those selects the process times out even with 512M max script size and 600 seconds max execution time which just seems totally ridiculous. This is definitely something that needs more work.
Comment #27
leprechau commentedI just modified select.inc based somewhat on the earlier patch in #10 taking into account the comments in #21 and decided to just not cache non-flat select lists. I can now successfully generate and download a webform with 40k results resulting in a 9.5MB csv with just a few seconds parsing time. I am going to go ahead and make a patch and post it. It would be great to get this included in 3.0 final as I would almost consider it a bug.
Comment #28
leprechau commentedHere is the patch I am currently using...it is working fine for me both on webform display and when downloading results. Please comment on your thoughts.
Comment #29
quicksketchThanksk leprechau! I don't understand why $flat determines when an item is cached or not. For select lists, $flat = FALSE, but for both radios and checkboxes this is TRUE. So we're only caching for select lists? This patch also deviates pretty far from coding standards, if you could clean it up a bit that would be great.
Comment #30
leprechau commentedYes, I was only interested in caching select lists. I can't see a situation where someone would have dozens of check/radio boxes listed down a page. I think most people when listing several dozen options would use a select list. I may be incorrect but it seemed rational to me.
I will re-read the drupal coding standards page and make changes where appropriate.
Comment #31
leprechau commentedOkay after some input from quicksketch on irc I think this should be committable. The code is cleaner and less involved than previously suggested patches.
Comment #32
quicksketchThanks leprechau, I shortened up the variable name for readability (just $options until the end of the loop, then assign it to $cached_options[$flat][$md5]). Committed to both 3.x branches.
Comment #33
leprechau commentedLooks great, thank you.
Comment #35
Jorrit commentedWould it be possible to apply those changes to 6.x-2.x as well?
Comment #36
ldweeks commentedI'm running a site with a webform that currently has just over 500 submissions. It's a large form, and I get a fatal error when I try to download a CSV file.
I would also be very interested in a port of #32 to the 2.x branch. Thanks for your work!
Comment #37
EmanueleQuinto commentedThis patch is a backport of #32 and apply to Webform 6.x-2.9 (select.inc, v 1.22.2.30) and, at least, works for us.
Please disregard this attached patch, small typo at the end
Comment #38
EmanueleQuinto commentedThis should work.
Comment #39
alanburke commentedSubscribe
Comment #40
alanburke commentedPatch at 38 is working fine for me on a 6.2.9 install.
Comment #41
machi27 commentedhi, could anyone pls post the file itself (select.ini patched) for version 6.2.9, i dont have the tools to patch it here
thanks
Comment #42
quicksketchThis will not be backported, I suggest upgrading to 3.x now that it is the recommended/stable release.
Comment #43
mrfelton commentedHere is the patch from #38 against 6.x-2.x that will apply from drush make, for those that need it.
Comment #44
luthien commentedI'm having the same problem with 6.x-3.18 version. Blank page for table view and download results view. Only 800 submissions, the form has several select.
memory_limit = 128M
max_execution_time = 30 (180 same results)