Currently every time a PDF is generated with wkhtmltopdf, the print module does a version check on the selected wkhtmltopdf binary. On New Relic it was shown that this version check was accountable for approximately 14 - 24% of the time taken to generate the PDF. I am proposing that a variable is added which stores the manually entered version number to prevent this expensive operation. In the patch (in the first comment so I could get an issue number) I've added a text field to the print_pdf admin form where the version number can be entered, or left blank (default, and recommended) to continue to autodetect at run time.
I set up a Drupal 7.17 site with the standard installation profile, devel, the current print module 7.x-1.x-dev, and wkhtmltopdf version 0.11.0 rc2 (located in sites/all/libraries).
The image below shows the performance improvement as reported by xhprof on an HP Microserver.
The 400ms improvement is the same decrease in time which I was seeing on a VMWare VPS.
Attached are callgraphs from xhprof for the unpatched print module (callgraph-dev-unpatched.png), the patched module with a version string set (callgraph-dev-patched-version-set.png) and the patched module without a version string set (callgraph-dev-patched-version-unset.png). These are roughly representative of the roughly 6 runs I did of each. The diff callgraph (callgraph-diff.png) shows the difference between the unpatched module and the patched module with the version set.
The case where the patched module is used without a version string set did not show any performance degradation on my set up.
Edit: The changes the patch makes should also be backwards compatible, and the introduction of the new setting defaults to the old behaviour.
Comment | File | Size | Author |
---|---|---|---|
#3 | print-wkhtmltopdf-performance-1867656-3.patch | 3.33 KB | Phizes |
#2 | print-wkhtmltopdf-performance-1867656-2.patch | 3.59 KB | Phizes |
#1 | print-wkhtmltopdf-performance-1867656-0.patch | 2.89 KB | Phizes |
callgraph-dev-patched-version-unset.png | 251.21 KB | Phizes | |
callgraph-dev-patched-version-set.png | 259.61 KB | Phizes |
Comments
Comment #1
Phizes CreditAttribution: Phizes commentedThe patch for the issue above.
Comment #1.0
Phizes CreditAttribution: Phizes commentedUpdate to say patch is in first comment, as it seems I can't edit to add files.
Comment #2
Phizes CreditAttribution: Phizes commentedUpdated patch, I had forgotten the variable_del() in hook_uninstall().
Comment #3
Phizes CreditAttribution: Phizes commentedUpdated patch, removed the accidental change of file modes.
Comment #4
jcnventura CreditAttribution: jcnventura commentedI don't like the patch as proposed.. What about the following solution:
1. Call version as is once, and store the stat() of the file executed/called as:
type|filename|size|mtime|version detected
2. Next time that the *_version() function is called, call stat() again and if the filename, size and mtime are the same reuse the version stored.
3. Else execute current code and update the stored data
If the stat() calls are too expensive, one possibility is to add a no-stat option that will simply skip the stat() in step 2 when used inside PDF generation code. The stat(), or even the full version code should always be used during the hook_requirements() version call. This would improve performance at the expense of correct version data, but I guess most installations aren't updating the PDF libs every day.
Comment #5
Phizes CreditAttribution: Phizes commentedYour proposed solution is much more sensible. I do have an idea of the implementation, though I am somewhat involved with work currently. I will be using the print module in the upcoming weeks, but I'm not sure if I will have the time to work in an updated patch just yet.
I would definitely implement the no-stat() option as the latest released wkhtmltopdf binary is almost 2 years old, and I try to avoid stat() calls whenever possible.
Thanks for the input.
Comment #5.0
Phizes CreditAttribution: Phizes commentedAdded in extra information.
Comment #8
jcnventura CreditAttribution: jcnventura commentedI've implemented a simpler version of the no-stat that I proposed in #4.
The module is mostly working as before, except that it uses the stored version information during the PDF generation. It will refresh the version information on the pdf settings page and in the status report page.. It also deletes it when flushing caches, which should cover any likely case where the binary gets updated.