Cache PDF file in server
Super G - July 11, 2009 - 15:20
| Project: | Printer, e-mail and PDF versions |
| Version: | 6.x-1.7 |
| Component: | User interface |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | won't fix |
Description
I would like to generate PDF versions of the stories on my site. My stories are static, and it makes more sense to generate a PDF file from a story once and save it on the server than it does to generate it on-the-fly each time the file is requested. Are the developers of this module planning to provide an option that allows for the saving of generated PDF files on the server, rather than having them streamed to the user?
My apologies if this has been covered elsewhere or is too far beyond the scope of this module.

#1
No. What you're referring to would mean developing some kind of 'cache' of PDF files. This could be done, but it would require tracking the revision date of the node and the generation date of the latest PDF, so that the 'cached' copy can be invalidated if the node has changed since the PDF was last generated.
This would save some seconds of PDF generation time, but would waste a lot of space on the hard disk. Since in most situations the only limit is hard disk space, this feature is probably not useful to the vast majority.
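The invalidation check described above can be sketched roughly as follows. This is only an illustration in Python, not code from the module: `pdf_is_stale`, `get_pdf`, the `node-<id>.pdf` file naming and the `render` callback are all hypothetical names, and the file's modification time is assumed to stand in for the PDF generation date.

```python
import os


def pdf_is_stale(node_changed, pdf_path):
    """Return True if the cached PDF is missing or older than the node.

    node_changed is the node's last-modified time as a Unix timestamp.
    The PDF's generation date is assumed to be the file's mtime.
    """
    try:
        generated = os.path.getmtime(pdf_path)
    except OSError:
        return True  # no cached copy yet
    return generated < node_changed


def get_pdf(node_id, node_changed, cache_dir, render):
    """Return the path of an up-to-date PDF, regenerating it only if needed.

    render is a hypothetical callback that takes a node id and returns
    the PDF contents as bytes.
    """
    path = os.path.join(cache_dir, f"node-{node_id}.pdf")
    if pdf_is_stale(node_changed, path):
        with open(path, "wb") as fh:
            fh.write(render(node_id))
    return path
```

With this shape, only the first request after a node change pays the generation cost; later requests are served from the saved file.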
#2
I run a fairly high traffic site. I am concerned that potentially hundreds of users trying to generate PDFs at once would significantly impact the performance and reliability of my site. I am interested in maintaining a repository of PDFs anyway, so the disk space is not an issue for me.
What I'm thinking now is that I would have the saving of a node trigger the generation of the PDF, which would then be saved on the server. This seems different enough from "Printer, e-mail and PDF versions" to warrant a separate, albeit fairly simple module.
Thanks for the feedback.
#3
Makes a lot of sense to me, Super G. Not a 'cache', but an archive of snapshots, in PDF, in parallel to the DB contents generating the nodes, which are dynamic and inherently lack snapshots in the typical hosting MySQL situation. I agree with you: disk space is cheap and responsive relative to server CPU in the typical shared hosting environment, particularly with Drupal, which is so DB query intensive.
#4
I think that generating a PDF of all nodes would both slow down the site too much and take up too much space. Unless, of course, your users would really like to download all nodes as PDF.
I see these options:
Option 1: Saving all nodes as PDF
- Pros:
Faster for all users
- Cons:
Slower site as each node editing action forces a PDF generation
Huge disk space waste as PDFs are stored for all nodes, including nodes not accessed in several weeks/months
Option 2: PDF 'cache'
- Pros:
Faster for most users
Smaller disk space usage
Ability to configure cache size to leave only the top accessed PDFs
- Cons:
No speed-up for the first user to ask for the PDF of a node modified recently
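A size-limited cache along the lines of Option 2 could look roughly like the sketch below. It is written in Python purely for illustration, and it uses least-recently-used eviction as a simple stand-in for "top accessed" (a frequency-based policy would be another choice). The class `PdfCacheIndex` and its methods are hypothetical names, and the index only tracks sizes; deleting the evicted files is left to the caller.

```python
from collections import OrderedDict


class PdfCacheIndex:
    """Tracks cached PDFs and evicts the least recently used ones
    once the total size exceeds max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total = 0
        self._entries = OrderedDict()  # node_id -> size in bytes

    def touch(self, node_id):
        """Record an access; return True if the PDF is cached."""
        if node_id in self._entries:
            self._entries.move_to_end(node_id)
            return True
        return False

    def add(self, node_id, size):
        """Register a newly generated PDF and return the evicted node ids."""
        if node_id in self._entries:
            self.total -= self._entries.pop(node_id)
        self._entries[node_id] = size
        self.total += size
        evicted = []
        # Always keep at least the entry that was just added.
        while self.total > self.max_bytes and len(self._entries) > 1:
            old_id, old_size = self._entries.popitem(last=False)
            self.total -= old_size
            evicted.append(old_id)
        return evicted
```

The caller would delete the files for whatever `add` returns, so disk usage stays near the configured limit while recently requested PDFs remain available.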
#5
Well, every site is different. For me, server performance and reliability are paramount, and I have a lot of disk space, so I'm going with the "save all nodes as PDF" approach. I wrote a small custom module that triggers a "generate PDF" action when a node is saved. Right now, the module is specific to my site, but if someone wants to help me make it worthy of being listed on drupal.org, drop me a line.
#6
You don't understand..
Your solution makes your server LESS performant. Unless, of course, you rarely create/edit nodes, and all your users are downloading several PDFs all the time. Think about it, and you'll probably reach the same conclusions.
#7
Hi SuperG. I am interested in this small module which triggers the generation of a PDF when the node is saved. I am currently implementing a site where the content is mostly static. It should be possible to modify your trigger so that it allows an administrator to request the generation of PDFs for specific node types. If you can share your module, I will look into extending it to allow generation of content "by request" of an administrator (or another user with sufficient permissions).
#8
If the nodes are rarely edited, it is the better solution.
For me, for example, that is the case.
Why not provide this functionality as an option for experienced users?
For example, we could choose between:
1- normal mode
2- generated on creation/update of the node
3- generated on the fly the first time, then regenerated on update
It just needs to warn the administrator about what he's doing.
I don't understand your opinion, jcnventura. An administrator is not a baby.
#9
No.. I'm not a baby but I'm going to act like one on this issue, because you are and you need to learn some manners.
Understand two things:
1. You're using software for FREE that is designed by me in my free time.
2. I decide what I do in my free time.
So, if you want this one, this feature now costs you 5000€ (five K euros). Normally, I'm willing to accept patches that other people do and keep on maintaining those for free. In this one case, even if that happens, I won't accept it. You can, of course, fork the module, and keep on maintaining it yourself.. It would actually create more free time for me, so I would welcome it.
João
#10
Oops, maybe it is my English, so I want to clarify.
When I said "an administrator is not a baby", it was not about you, but about the people who use "your" module.
I wanted to post a patch and I thought that you were opposed to this. That's why I said "I don't understand your opinion".
If you were hurt by what I wrote, sorry, it wasn't my intention.
I'm a developer, and I wanted to help advance "your" project, which is used by many people on Drupal.
5000 euros lol
Peace and THX A LOT FOR YOUR WORK !!!
#11
Thanks for clearing it up.. Yes, it really annoyed me at the time, as the phrase only made sense to me as an insult.
The main reason why I don't provide such an option is not that I am fundamentally against it, as I think it would be extremely useful and increase performance in sites that use the PDF functionality. It's the lack of time to do it, and the fact no-one has done it for me. Of course, if someone did sponsor this it would probably move up in the list of stuff to do in my free time :)
It's easy to add a node API hook and have the module create a PDF file each time a node is edited/created. Doing that, however, would slow down the node creation/editing process so much that in extreme cases it may lead to PHP timeouts with unknown consequences (the worst being loss of the node contents). Also, some kind of interface must be provided to let the user re-create the PDFs for all nodes when a newer/better version of the PDF tool is released. So, for this option, the best approach would be to use the Job Queue to schedule the creation of these PDFs during the next cron run (of course, if a user were to ask for the PDF before the cron execution, then he would get a delay, and the cron job would now be irrelevant).
As this could eventually lead to several gigabytes of wasted space on the server, I would prefer the cache option, where the first user to access a PDF would get the delay, but all the others would just download the cached copy... This would allow the PDF 'store' to be configurable, and all the commonly used PDFs would be instantly available. The problem with this is that handling a file cache is something that would probably benefit from a third-party module, but the one available (http://drupal.org/project/fastpath_fscache) doesn't seem to be actively maintained, and is used by almost no-one.
João
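The queue-and-cron idea from comment #11 can be sketched as follows. This is a Python illustration only, not the Drupal Job Queue API: `PdfJobQueue`, `on_node_save`, `cancel`, `run_cron` and the `generate` callback are hypothetical names. Saving a node only records a job, the slow PDF generation happens later in the cron run, and a job can be cancelled if the PDF was already produced on demand in the meantime.

```python
from collections import deque


class PdfJobQueue:
    """Defers PDF generation from node save time to the next cron run."""

    def __init__(self):
        self._pending = deque()  # node ids in the order they were queued
        self._queued = set()     # node ids that still need a PDF

    def on_node_save(self, node_id):
        """Called when a node is created or edited; cheap, no PDF work here."""
        if node_id not in self._queued:
            self._queued.add(node_id)
            self._pending.append(node_id)

    def cancel(self, node_id):
        """Drop a job, e.g. because a user already triggered the PDF."""
        self._queued.discard(node_id)

    def run_cron(self, generate, limit=10):
        """Generate up to limit PDFs; return the node ids processed."""
        done = []
        while self._pending and len(done) < limit:
            node_id = self._pending.popleft()
            if node_id not in self._queued:
                continue  # cancelled or already handled
            self._queued.discard(node_id)
            generate(node_id)
            done.append(node_id)
        return done
```

The `limit` argument is one way to keep a single cron run from hitting the timeouts mentioned above; the remaining jobs simply wait for the next run.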