Currently every time a PDF is generated with wkhtmltopdf, the print module does a version check on the selected wkhtmltopdf binary. On New Relic it was shown that this version check was accountable for approximately 14 - 24% of the time taken to generate the PDF. I am proposing that a variable is added which stores the manually entered version number to prevent this expensive operation. In the patch (in the first comment so I could get an issue number) I've added a text field to the print_pdf admin form where the version number can be entered, or left blank (default, and recommended) to continue to autodetect at run time.

I set up a Drupal 7.17 site with the standard installation profile, devel, the current print module 7.x-1.x-dev, and wkhtmltopdf version 0.11.0 rc2 (located in sites/all/libraries).

The image below shows the performance improvement as reported by xhprof on an HP Microserver.
Print module performance, patched vs unpatched.

The 400ms improvement is the same decrease in time which I was seeing on a VMWare VPS.

Attached are callgraphs from xhprof for the unpatched print module (callgraph-dev-unpatched.png), the patched module with a version string set (callgraph-dev-patched-version-set.png) and the patched module without a version string set (callgraph-dev-patched-version-unset.png). These are roughly representative of the roughly 6 runs I did of each. The diff callgraph (callgraph-diff.png) shows the difference between the unpatched module and the patched module with the version set.

The case where the patched module is used without a version string set did not show any performance degradation on my set up.

Edit: The changes the patch makes should also be backwards compatible, and the introduction of the new setting defaults to the old behaviour.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Phizes’s picture

Status: Active » Needs review
FileSize
2.89 KB

The patch for the issue above.

Phizes’s picture

Issue summary: View changes

Update to say patch is in first comment, as it seems I can't edit to add files.

Phizes’s picture

Updated patch, I had forgotten the variable_del() in hook_uninstall().

Phizes’s picture

Updated patch, removed the accidental change of file modes.

jcnventura’s picture

Status: Needs review » Needs work

I don't like the patch as proposed.. What about the following solution:

1. Call version as is once, and store the stat() of the file executed/called as:
type|filename|size|mtime|version detected
2. Next time that the *_version() function is called, call stat() again and if the filename, size and mtime are the same reuse the version stored.
3. Else execute current code and update the stored data

If the stat() calls are too expensive, one possibility is to add a no-stat option that will simply skip the stat() in step 2 when used inside PDF generation code. The stat(), or even the full version code should always be used during the hook_requirements() version call. This would improve performance at the expense of correct version data, but I guess most installations aren't updating the PDF libs every day.

Phizes’s picture

Your proposed solution is much more sensible. I do have an idea of the implementation, though I am somewhat involved with work currently. I will be using the print module in the upcoming weeks, but I'm not sure if I will have the time to work in an updated patch just yet.

I would definitely implement the no-stat() option as the latest released wkhtmltopdf binary is almost 2 years old, and I try to avoid stat() calls whenever possible.

Thanks for the input.

Phizes’s picture

Issue summary: View changes

Added in extra information.

  • jcnventura committed b5c4f5f on 7.x-2.x
    Issue #1867656 by jcnventura: Improve wkhtmltopdf PDF generation...

  • jcnventura committed 719f08c on 7.x-2.x
    Issue #1867656 by jcnventura: Clear cached wkhtmltopdf version during cc...
jcnventura’s picture

Status: Needs work » Fixed

I've implemented a simpler version of the no-stat that I proposed in #4.

The module is mostly working as before, except that it uses the stored version information during the PDF generation. It will refresh the version information on the pdf settings page and in the status report page.. It also deletes it when flushing caches, which should cover any likely case where the binary gets updated.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.