It would be great to support PDF documents by converting them to JPG.

(this would probably require support with imagefield or filefield as well)

Drupal 7 Solutions:
As this was closed as "won't fix", there are now two potential solutions for Drupal 7:

* http://drupal.org/project/pdf (Embeds PDFs in a page using HTML5 and JavaScript)
* http://drupal.org/project/pdf_to_imagefield (Converts PDFs to images)

Comments

duntuk’s picture

Category: task » feature
egfrith’s picture

I'm interested in this issue, as I'm looking into writing some code to create previews of PDF documents uploaded in a filefield and make them available to Views.

The conversion would require imagemagick to be installed, as gd can't convert pdf to jpg.
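
A minimal sketch of what the ImageMagick step could look like, assuming the `convert` binary is on the PATH and Ghostscript is installed (ImageMagick delegates PDF rendering to it). The helper name `pdf_preview_cmd`, the file names and the default width are hypothetical and only for illustration; this is not code from any of the modules discussed here.

```shell
#!/bin/sh
# Sketch only: build an ImageMagick command that renders the first page of a
# PDF as a JPEG preview. pdf_preview_cmd is a hypothetical helper name.
pdf_preview_cmd() {
  src=$1            # path to the source PDF
  dest=$2           # path of the preview image to write
  width=${3:-200}   # assumed default preview width in pixels
  # "[0]" selects the first page; the "jpeg:" prefix forces JPEG output
  # regardless of the destination file name.
  printf "convert '%s[0]' -resize %s -quality 90 'jpeg:%s'\n" "$src" "$width" "$dest"
}

# Print the command for inspection. To run it, pipe the output to sh:
#   pdf_preview_cmd report.pdf report.jpg 300 | sh
pdf_preview_cmd report.pdf report.jpg 300
```

The function only prints the command, so it can be reviewed before anything is executed. Paths containing single quotes would need extra escaping before the output is piped to a shell.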

egfrith’s picture

Hmmm... looks like PDF to jpg conversion isn't going to happen in imagefield module: #339266: Feature Request: Convert PDF to image. Perhaps in filefield module? Or as a contrib module? There is also the pdfstamper module, though this is not yet views enabled, and does more than I really want: #391308: Future direction of the module.

TyraelTLK’s picture

Subscribing

egfrith’s picture

To get this to work, a sequence of patches to imageapi and imagecache modules is required:

  1. imageapi module: create an imageapi_image_get_info() function to replace the use of getimagesize() in imagecache: #416254: Add equivalent of image_get_info() at the toolkit level
  2. imagecache module: use imageapi_image_get_info() instead of getimagesize(). No patch yet; for now, replace getimagesize($src) around line 412 of the imagecache module with imageapi_image_get_info($src).
  3. imageapi module: the patch at #375218: Changing file type with imagemagick needs to be applied, so that imageapi_imagemagick saves pdf files which have been converted to a browser-viewable format (e.g. jpeg or png) with a .pdf extension.

Then, create an imagecache preset which contains the "Change File Format" action from imagecache_coloractions module. You can specify that the pdf (or any other type) is converted to jpeg, png or gif.

To move the work on this forward, reviews are needed of #416254: Add equivalent of image_get_info() at the toolkit level.

fei’s picture

Subscribing (thanks for the support)

egfrith’s picture

I've got this working - just. I've edited comment #5 so that it gives up-to-date instructions.

egfrith’s picture

Attachment: new, 1.21 KB

There is now a patch for imagecache. Here are updated instructions for testing:

1. imageapi module: apply latest 6.x patch at #416254: Add equivalent of image_get_info() at the toolkit level
2. imagecache module: apply patch attached here.
3. imageapi module: apply patch at #375218: Changing file type with imagemagick

Then, create an imagecache preset which contains the "Change File Format" action from imagecache_coloractions module. You can specify that the pdf (or any other type) is converted to jpeg, png or gif.

Alex Andrascu’s picture

All OK after applying the patches in steps 1 and 2 of #8, but the last one fails against my version. Please review.

egfrith’s picture

I've just tested this with the latest -dev version of imageapi. The last patch applies OK for me, though with an offset:

$ patch -p0 < imageapi_375218-2.patch
patching file imageapi/imageapi_imagemagick.module
Hunk #1 succeeded at 111 (offset 9 lines).

Does it apply at all for you?

Alex Andrascu’s picture

I applied the patches in the order you describe in #8 with TortoiseSVN; maybe that's why. Anyhow, I applied the last one by hand and it seems to be working. Now I don't know how to use all this to write a raw ImageMagick command to try a tiff->jpg conversion.

Thanks for the blitz reply :)

[Update]

I figure we should make a cumulative patch combining

#416254: Add equivalent of image_get_info() at the toolkit level
#375218: Changing file type with imagemagick

for this to work without errors.

egfrith’s picture

Have you tried using imagecache_actions.module (as described at the end of #8)? It may not be how you want to do things in the long run, but it would confirm whether things are working. I'd be interested to know!

Re the patch, once you've confirmed things are working, perhaps it would make sense to post a new combined patch to #416254: Add equivalent of image_get_info() at the toolkit level

Alex Andrascu’s picture

I guess we're very close now... I just lack some ImageMagick skills.

 ImageMagick command: /usr/bin/identify -format "%w %h %m" 'sites/default/files/ads/angel_copy_0.tif'
ImageMagick output: 626 926 TIFF 
ImageMagick command: /usr/bin/identify -format "%w %h %m" 'sites/default/files/ads/angel_copy_0.tif'
ImageMagick output: 626 926 TIFF 
ImageMagick command: /usr/bin/convert 'sites/default/files/ads/angel_copy_0.tif[0]' -resize 50x74! -colorspace RGB -quality '90' -append 'jpeg:sites/default/files/imagecache/thumbnail_100X100/ads/angel_copy_0.tif'
ImageMagick output: 

[UPDATE]

Holy moly, this is working :)

It just doesn't append .jpg to the end of the file name.
It creates a JPEG with the .tif extension. I wonder where the problem is.

egfrith’s picture

Great! Yes, the file has to have the original extension, otherwise imagecache will think it doesn't exist. As far as I can see, this doesn't cause problems when viewing in browsers.
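
To spot files whose content no longer matches their extension, something like the following sketch might help. It compares a file name against the format name that `identify -format "%m"` reports (TIFF, JPEG and so on, as in the log output earlier in the thread). The function name `format_matches_extension` and the alias list are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: does a file's extension agree with its real image format?
# Usage: format_matches_extension FILE FORMAT
# FORMAT is the value printed by `identify -format "%m"`, e.g. TIFF or JPEG.
format_matches_extension() {
  # Lower-case both the extension and the reported format before comparing.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  fmt=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$fmt:$ext" in
    # Common aliases where the extension differs from the format name.
    jpeg:jpg|jpeg:jpeg|tiff:tif|tiff:tiff) return 0 ;;
    # Otherwise require an exact match, e.g. png:png or pdf:pdf.
    *) [ "$fmt" = "$ext" ] ;;
  esac
}

# Example with a real file (requires ImageMagick; "[0]" limits the check to
# the first frame or page):
#   f=sites/default/files/imagecache/thumbnail_100X100/ads/angel_copy_0.tif
#   format_matches_extension "$f" "$(identify -format '%m' "$f[0]")" \
#     || echo "extension does not match content: $f"
```

A JPEG saved under a .tif name, as described above, would make the function return a non-zero status.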

Alex Andrascu’s picture

No it doesn't :) But we need to fix this anyhow.

schildi’s picture

Not sure if this hint is helpful for your project, but
- converting PDF to JPG will drop the complete text (no cut and paste any more)
- you will get the well known JPEG-artefacts around sharp edges

Maybe you could have a look at the DjVu format, which is also a raster format but preserves the text when converting from PDF. The text is still selectable, and it has some other advantages. For more background, please see http://en.wikipedia.org/wiki/DJVU.

The disadvantage might be that the format is not as widespread today.

egfrith’s picture

Thanks for your hint schildi. I hadn't thought about DJVU, which does have the advantages you say over jpeg. However, is it viewable in a browser? And can imagemagick convert to it?

Your comment also reminds me that I've had problems with the jpegs that imagemagick has produced from some PDF files. On some machines I've used (all Linux) they have either not displayed in the browser or displayed only partially. On other machines (again Linux) they have been fine. The workaround has been to convert the files to png rather than jpeg.

egfrith’s picture

@alex_andrascu: I agree fixing the filenames would be nice, but I think that it is a separate - and potentially very thorny - issue. I think I may have seen it discussed elsewhere, so it might be worth searching.

cbrody’s picture

I've got #8 to work using a CCK filefield and Views to display the imagecache converted images but the images are each displayed multiple times in the view (as many times as there are images, e.g. three images results in each being displayed three times). Any hints?

schildi’s picture

On Linux, DjVu support comes as some standalone applications (converters such as cjb2) and a plugin for Firefox.
I checked this out and it worked well for me.
For a complete conversion cycle, you can start from e.g. a jpeg or tif file and use one of the converters mentioned above to create the djvu file.
For example command lines, see

http://en.wikisource.org/wiki/Help:DjVu_files

Converting from png is also described as possible. You would probably have to use "convert" to get a pbm stream and pipe the result through cjb2 (not checked).

vthirteen’s picture

subscribing

egfrith’s picture

@19 cbrody: I'm not sure that this is an issue with the code in this patch. To test whether it is, can you check that the multiple images are actually one image file? E.g. examine the HTML on the pages on which you have the multiple images displayed, and find the href of a converted image, and then view it in the browser on its own. Also, you could check the HTML of the page to make sure there aren't multiple hrefs to the same image.

If the image itself is fine, and there are multiple hrefs, perhaps there is a problem with the view?

cbrody’s picture

Hi egfrith, the img src and href are the same for all the images. It seems this could be a problem with Views, as I have it set to select distinct and group multiple values. The query is as follows:

SELECT DISTINCT(node.nid) AS nid,
  node_data_field_menu.field_menu_data AS node_data_field_menu_field_menu_data,
  node_data_field_menu.nid AS node_data_field_menu_nid,
  node.type AS node_type,
  node.vid AS node_vid
FROM node node
LEFT JOIN content_field_menu node_data_field_menu ON node.vid = node_data_field_menu.vid
WHERE node.status <> 0

agileware’s picture

Subscribing.

rachel_norfolk’s picture

subscribing

egfrith’s picture

I've merged the two imageapi patches, and fixed a problem with one of them which prevented images appearing the first time they were generated, leaving "Failed generating an image..." messages in the logs.

Here are updated instructions for using the patch:

1. imageapi module: apply latest 6.x patch at #416254: Add equivalent of image_get_info() at the toolkit level, #19
2. imagecache module: apply patch attached at #8.

Then, create an imagecache preset which contains the "Change File Format" action from imagecache_coloractions module. You can specify that the pdf (or any other type) is converted to jpeg, png or gif.

Other news: the changes to the core code now mean that this functionality should be in D7 with the imageapi_imagemagick module; see #269337: Support for more image types (PDF, TIFF, EPS, etc.).

Another point: it seems that some versions of Safari have a built-in PDF viewer, so JPEG or PNG files which have a .pdf ending aren't displayed, because the built-in viewer tries to display them as PDFs. At the moment, the best guess I have about what to do about this is to implement a wrapper module for imagecache that would map URLs such as imagecache_wrapper/files/test.jpg to imagecache/files/test.pdf ... but other ideas are welcome.

rachel_norfolk’s picture

I'm wondering if there is another way to approach this that might be more flexible.

In the flashvideo module, they have a flashvideo_cck module that takes the incoming video file in one cck field, converts it and sticks the .flv result into a second cck field. The module itself hides the appropriate fields on the input form from the authors.

If we were to implement the pdf --> .jpg system in a similar way, we would have access to the original pdf and also to the resultant jpg. It may even be possible to output multiple pages of the pdf into multiple occurrences of the jpg cck field.

I know what I'd like to do but I'd need guidance on how to do it. I am a willing volunteer to help, though...

rachel_norfolk’s picture

And I guess, because the filename can then properly reflect the content, Safari will be okay...

egfrith’s picture

Thanks for your comments ricklawson.

At present we do have access to the original PDF - it's at a location like /files/original.pdf . The problem is that the file ending of the resultant JPEG or PNG file is also .pdf

The solution you propose should fix this problem, but I'm wondering if it's more complicated than it needs to be? I was thinking of a bit of code that didn't have to insert anything into the database, but which would pretty much use the tools given by imagecache. Also, we would have to work out how to map different imagecache presets onto the CCK fields. And what would happen when the presets are altered or flushed?

It might be possible to create a module that effectively re-implements imagecache_cache() so that, if asked to generate presetname/files/original.jpg, it would look for /files/original.pdf if it couldn't find /files/original.jpg
http://drupalcontrib.org/api/function/imagecache_cache/6
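
The fallback lookup described above can be sketched outside Drupal as a small shell function. This is only an illustration of the path-mapping idea, not a re-implementation of imagecache_cache(); the name `find_source`, the optional root argument and the list of fallback extensions are all assumptions.

```shell
#!/bin/sh
# Sketch only: resolve the source file for a requested derivative path.
# Usage: find_source REQUESTED_PATH [ROOT]
# Prints the path (relative to ROOT) of the file to convert, or returns 1.
find_source() {
  requested=$1    # e.g. files/original.jpg
  root=${2:-.}    # directory that the paths are relative to
  # 1. Prefer a file that exists under exactly the requested name.
  if [ -f "$root/$requested" ]; then
    printf '%s\n' "$requested"
    return 0
  fi
  # 2. Otherwise swap the extension and look for a convertible original.
  base=${requested%.*}
  for ext in pdf tif tiff; do   # assumed list of fallback source types
    if [ -f "$root/$base.$ext" ]; then
      printf '%s\n' "$base.$ext"
      return 0
    fi
  done
  return 1
}

# Example: with only files/original.pdf on disk, this prints
# "files/original.pdf":
#   find_source files/original.jpg /path/to/site
```

Because the mapping is derived from the requested path alone, nothing has to be stored in the database, which is the property the wrapper idea relies on.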

boobaa’s picture

Subscribe

iva2k’s picture

This would be an awesome feature to have. Please, think also of supporting other file types, or at least a roadmap to do it. Can it potentially employ a mimedetect module to recognize file types?

Once the feature is committed, I would pick it up in the iTweak Upload module. People there are requesting previews of file types other than images: #601896: Allow preview / thumbnail for PDF and other non-image attachments.

What I would like to have from imagecache is a function that returns TRUE if there is a preview image for any given file, either an image or any other supported type, like PDF. This will decouple nicely and make iTweak Upload's code support (without modifications) any future imagecache updates. See _itweak_upload_isimage() and itweak_upload_itweak_upload_preview() functions in itweak_upload.module - these are the ones I will modify/replace with corresponding imagecache call.

@drewish
Before I get too excited, please chime in: would you consider committing a final patch from this issue to the imagecache project? What would be your requirements?

egfrith’s picture

There's now a solution for the problem with Safari (and some other browsers, it turns out) not displaying converted thumbnails. See #628146: File extensions don't match the actual MIME type - Some browsers do not display converted images. The code is in the attachment - it's not committed to a module yet.

Alex Andrascu’s picture

Any updates on this? We can't let this die when we're so close.

egfrith’s picture

Hello Alex,

I think drewish (maintainer of imagecache and imageapi) has been focussing his effort on Drupal 7.

In order for the imageapi_imagemagick module to be ported fully to D7, a version of the patch for D6 at #416254: Add equivalent of image_get_info() at the toolkit level will have to be applied. I think the D6 patch is pretty much ready. I don't know what drewish's ideas are for the future of the D6 versions of these modules, but when he gets to work on imagecache for D7, it would be great from my point of view if he could commit the D6 patch beforehand.

Alex Andrascu’s picture

Wonderful news! I just looked at the patch at #416254: Add equivalent of image_get_info() at the toolkit level and I can't wait to play with it a little.
Thanks David.

sorensong’s picture

Very interested in seeing this functionality included, also. Thanks a lot!

mattwmc’s picture

RE: convert PDF to JPG support

So is this a yes or no for drupal 6?

samdeskin’s picture

what do y'all think about displaying PDFs as HTML5 instead of JPGs?

egghunter’s picture

subscribing

dman’s picture

Guys.
It looks like imagecache is not the place for radical document conversions to happen. There are so many kinks that have to be built into the process it gets hard to handle.

http://drupal.org/project/pdf_to_imagefield
- is a proper solution, and takes the #27 approach.
Right now it's a total doc conversion (all pages), but it's just a small feature request to #798996: Just create an image of the first page?

I'd really suggest pulling this feature request out of imagecache, as it doesn't fit. I suggest closing it here as 'by design' and concentrating efforts on a fuller solution over at that other module.
(You can still run imagecache effects on the results it produces)

shenzhuxi’s picture

@dman
I use poppler for the converting in my module http://drupal.org/project/fileviewer.
How does pdf_to_imagefield solve the same problem?

dman’s picture

Status: Active » Closed (works as designed)

Not the same approach.
If you use a browser utility to download and display the full PDF inline, that's fine.
Folk here want to create a jpeg, e.g. of the first page, that can be used as a thumbnail or preview.
A different task.
Neither job is likely to be taken care of within imagecache module itself.

jejk’s picture

subscribe

develcuy’s picture

Attachment: new, 949 bytes

Alternative patch from frankiedesign at http://drupal.org/node/460132

develcuy’s picture

Status: Closed (works as designed) » Active

What about providing a way to extend imagecache so that a plugin module can provide support for PDF files? See the previous patch; perhaps it is a good start for creating a hook.

shenzhuxi’s picture

http://drupal.org/project/fileviewer
My module supports the image handler in Drupal 7 core for the PDF thumbnails now.

upupax’s picture

[deleted]

my fault.

romiomon’s picture

How can I convert a PDF to an image thumbnail on Drupal 7?
Please advise.

jwilson3’s picture

Issue summary: View changes

Add Drupal 7 solutions

jwilson3’s picture

Status: Active » Closed (won't fix)

Note: The FileViewer module page mentioned in previous comments now directs people to use http://drupal.org/project/pdf, which uses HTML5 to embed and display pdfs inline inside a page.