Improve efficiency of options in select.inc

DamienMcKenna - April 4, 2009 - 00:24
Project:Webform
Version:6.x-2.6
Component:Code
Category:bug report
Priority:normal
Assigned:Unassigned
Status:needs work
Description

Given that there are several duplicates of the same issue, but none that stood out to cover all parts of the problem, I wanted to create this "Master Issue" that could be used to cover all parts of this optimization & configuration effort.

Many users have complained of problems exporting aka downloading the results/submissions from webforms, either as CSV/delimited files or as Excel spreadsheets with either error messages or more commonly just blank screens after several minutes waiting. Further, the watchdog log / dblog may not record any error messages.

There are two parts to this problem:

  1. memory limits building the output
  2. timeouts because it takes too long to finish building the output

Solving Problem 1: Memory Limits

Resolving memory limits may be as simple as adjusting the .htaccess file where it says:

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
  php_value magic_quotes_gpc                0
  php_value register_globals                0
  php_value session.auto_start              0
  php_value mbstring.http_input             pass
  php_value mbstring.http_output            pass
  php_value mbstring.encoding_translation   0
</IfModule>

just add the following extra lines before the line:
  # New settings:
  php_value memory_limit 128M

Unfortunately some web hosts may limit your control over this and may not allow the memory_limit value be configured through a .htaccess file, in which case it may have to be modified in the system's primary php.ini file. The standard system requirements call for this to be 16M (i.e. 16 megabytes) but you may find a need for it to be 64M, 96M or even 128M depending on the number of records or fields in the webform.

Solving Problem 2: Timeout

The PHP configuration variable max_execution_time controls how long the PHP parser will allow a script to run before being forcibly terminated. By default this is set to 30 seconds, but depending on the number of records and fields this will probably not be sufficient. Similarly to the memory limit, this can be configured through the .htaccess file, as follows:

  # New settings:
  php_value max_execution_time 180 #seconds

Unfortunately, even this may not be sufficient as it may still take a very long time to complete.

Proposed Solution

The output generation is controlled by the webform_results_download() function in webform_report.inc which generates a delimited file in the desired format. Tracing through the file, the initial lines run rather quickly but the

<?php
"foreach ($submissions..."
?>
block can still take several hundred miliseconds to save each record, which for a heavily used webform can still cause a timeout even with a generous max_execution_time. This structure needs to be optimized to be much faster.

#2

DamienMcKenna - April 4, 2009 - 05:21

I added a ton of microtimer() statements to the webform_results_download() function and recorded the execution time taken to output ten records on my MacBook Pro. The top row indicates what the field was to store, the second indicates the time taken to output that one cell; the first column contains the row number, second-last column indicates how long the fwrite() call took, the last column indicates how long the processing of the entire record took; the State select field stores a list of all US States, Canadian provinces and US outlying territories grouped with optgroups, while the Country select stores a list of ~240 countries.

As you can see, textfield and email fields fly through in no time, but selects take a very long time, especially if there are a lot of predefined options to check against.

AttachmentSize
webform-n423350-timer_logs.txt 0 bytes

#3

DamienMcKenna - April 4, 2009 - 05:23

Oops, I renamed the file before uploading..

AttachmentSize
webform-n423350-timer_logs.csv_.txt 1.7 KB

#4

DamienMcKenna - April 4, 2009 - 14:12

Here's a patch for _webform_filter_values() that keeps track of whether it has already loaded the user profile, which saves on a large number of unneeded queries.

AttachmentSize
webform-n423350-load_profile_user_once.patch 1.31 KB

#5

DamienMcKenna - April 4, 2009 - 14:56

FYI the first patch has boosted the local execution time from ~0.7s per row to ~0.29s, as seen in the attached CSV.

AttachmentSize
webform-n423350-timer_logs2.csv_.txt 1.74 KB

#6

DamienMcKenna - April 4, 2009 - 15:01

Another patch, this one keeps a static array of the processed options variable from components/select.inc, indexed by the md5sum of the unprocessed options variable.

AttachmentSize
webform-n423350-cache_processed_select_options.patch 3.5 KB

#7

DamienMcKenna - April 4, 2009 - 15:02

FYI the second patch has given an approximate 100:1 speed optimization, as seen in the attached CSV.

AttachmentSize
webform-n423350-timer_logs3.csv_.txt 1.74 KB

#8

DamienMcKenna - April 4, 2009 - 15:12
Status:active» needs review

I've tested both of the patches on a copy of our production database and I can now successfully output a 6877 record webform in a matter of minutes versus it timing out after ten minutes having barely gotten half-way. Please test the patches and provide feedback. Thank you.

#9

DamienMcKenna - April 4, 2009 - 15:22
Title:Timeouts and memory errors exporting/downloading CSV/Excel spreadsheet of results» Timeouts and memory errors exporting/downloading CSV/Excel spreadsheet of results (D5, D6)

Please also test the attached cumulative patch for Dupal 5.

AttachmentSize
webform-n423350-drupal5.patch 4.83 KB

#10

DamienMcKenna - April 5, 2009 - 21:07

Have identified a problem stemming from not separating the cached select lists based on whether they were set to be $flat or not. Here's a new patch for D6 that takes this into account.

AttachmentSize
webform-n423350-cache_processed_select_options_v2.patch 3.6 KB

#11

DamienMcKenna - April 5, 2009 - 21:08

An updated patch for D5 based on the bug discovered for comment 10.

AttachmentSize
webform-n423350-drupal5_v2.patch 4.94 KB

#12

DamienMcKenna - April 5, 2009 - 21:16

FYI the problem identified in comment 10 would only be a problem when using option groups (optgroup).

#13

leenwebb - April 6, 2009 - 21:19

FWIW, my issue at #407460: "Download" results link leads to blank screen is NOT a timeout issue. My webform had 5 fields, and only 4 submissions -- I'm pretty certain that if the server can handle views and aggregator and all of the other parts of that site, it's not a memory error that leaves this screen blank.

#14

DamienMcKenna - April 7, 2009 - 00:16

Just to see, please try the patch and see if it makes a difference.

#15

leenwebb - April 8, 2009 - 15:25

Oooh, turns out I had some other problem. I tried to apply the patch and got a big FAIL message. So I re-extracted the webform module and now everything works perfectly (sans patch). Thanks for your help!

(I always think it is so weird when that happens -- I expect that if some file gets corrupted I will end up with big PHP errors and a huge mess, not some stealth unobtrusive error.)

#16

DamienMcKenna - April 8, 2009 - 16:17

leenwebb: What was the error message? D5 or D6? I patched against the relevant branch in CVS rather than the release versions.

#17

leenwebb - April 8, 2009 - 16:31

D5; it was a patch error that said that Chunk 2 had failed during the webform.module patching. I figured that since I hadn't changed anything in the actual webform module, something was amiss! Hence the re-extraction of a re-downloaded zip and now everything works as expected.

#18

fredcy - May 6, 2009 - 18:55

I came up with a similar patch (to #4) before seeing this issue, and for me it alone speeds things up enough to avoid the PHP timeout problem.

#19

jakraska - May 28, 2009 - 16:53
Version:» 6.x-2.6

I applied the patches in #4 and #6 and it fixed my timeout issue - Havent noticed any bad effects yet. Thanks for the patch

#20

DamienMcKenna - May 28, 2009 - 17:08

jakraska: thanks. Maybe we'll be able to twist Quicksketch's arm into pushing out an update to lock it in? :)

#21

quicksketch - May 31, 2009 - 22:23
Status:needs review» needs work

I definitely like the improvements to _webform_filter_values(). There's no good reason to load that multiple times. The changes to select.inc look extremely suspicious though. Right now there's accounting in the static variable for "$flat = TRUE" and "$flat = FALSE", but it doesn't account for multiple select lists that are both flat or both not flat. It looks like this would erroneously return options for the first select list in the form for all the remaining select lists. Perhaps the $cid needs to be passed into the select list options.

A small code-style issue also: comments should be formatted as sentences, starting with capital letters and ending with periods.

#22

quicksketch - November 6, 2009 - 00:55
Title:Timeouts and memory errors exporting/downloading CSV/Excel spreadsheet of results (D5, D6)» Improve efficiency of options in select.inc

Just a note that caching to _webform_filter_values() have been included in the 2.7 version. The select list options would still be good to cache, but they still require work per my comments in #21.

#23

DamienMcKenna - November 6, 2009 - 05:59

Thanks, I'll look into it.

 
 

Drupal is a registered trademark of Dries Buytaert.