PROBLEM:

When you click the 'upload' button to attach a PDF file to a node of type 'copyzol', the site needs to be able to count the number of pages in that PDF file. Bounty of $40. We really need this ASAP. Skype me at teodyseguin.

Thanks.

Comments

dman’s picture

FYI, http://drupal.org/project/pdf_to_imagefield has code you can scour to find an answer to this.
There are two classes of PDF. In one, the page count can be scraped directly from the embedded metadata by string-scanning. In the other (I think older encodings, or output from different encoders), the only reliable count comes from rendering each page and counting the results. That method requires you to have the tcpdf application installed on the server, which means you must have good control over the host, and it may take some liaison.

I'll leave the bounty for someone who wants to experiment with this as an easy job, but if you do get it sorted in a clean way - come join us as a maintainer of http://drupal.org/project/pdf_to_imagefield
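
For illustration, the string-scanning idea described above could be sketched roughly as follows. This is only a minimal sketch, not the code from pdf_to_imagefield: the function names sketch_pdf_count_page_objects() and sketch_pdf_page_count() are hypothetical, and it assumes the page objects are stored uncompressed. PDFs that keep their objects in compressed object streams (possible from PDF 1.5 onwards) will report 0 here and need one of the rendering-based methods instead.

```php
<?php
/**
 * Hypothetical helper: count "/Type /Page" objects in raw PDF data.
 *
 * The lookahead skips "/Type /Pages", which marks a page-tree node
 * rather than an individual page.
 */
function sketch_pdf_count_page_objects($data) {
    $matches = array();
    $count = preg_match_all('/\/Type\s*\/Page(?![A-Za-z])/', $data, $matches);
    // preg_match_all() returns FALSE on error; treat that as "unknown".
    return $count === FALSE ? 0 : $count;
}

/**
 * Hypothetical wrapper: read the file and scan it.
 *
 * Returns 0 when the file cannot be read or no page objects are visible.
 */
function sketch_pdf_page_count($filepath) {
    $data = @file_get_contents($filepath);
    if ($data === FALSE) {
        return 0;
    }
    return sketch_pdf_count_page_objects($data);
}
```

A result of 0 should be read as "could not tell", not as "empty document", so a caller would normally fall back to another method in that case.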

Drave Robber’s picture

To my knowledge, by far the simplest (and not too resource-consuming, at least compared to sifting through the file with preg_match()) method is this. Requires ImageMagick.

(I'm leaving it open, too.)

criznach’s picture

If you don't have decent hosting this may not be easy. As dman said, there are many different types of PDFs, and to accept any arbitrary file will require a robust solution. I see a few options...

  • TCPDF + FPDI - only supports up to PDF 1.4
  • TCPDF + FPDI PDF Parser - should work with 1.5+, but requires a 100 euro license and good hosting.
  • PDF to imagefield - requires Imagemagick - not all hosts support it.
  • Imagemagick - utilizes ghostscript - could just use ghostscript - again, not all hosts support it.
  • preg_match search - may be resource intensive, but any solution is going to load the file. Not sure if this will work with 1.5+
dman’s picture

Here's the code that uses imagemagick identify
http://drupalcode.org/project/pdf_to_imagefield.git/blob/refs/heads/6.x-...
Caveats about installing ghostscript etc are on the proj page.
But that's the slow version.

I think the faster but more haphazard string approach is here: (a grep not a preg match)
http://drupalcode.org/project/pdf_to_imagefield.git/blob/refs/heads/7.x-...
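
As a rough illustration of the slower ImageMagick route mentioned above, something like the following could work. It is a sketch under assumptions, not the module's actual code: the function names are hypothetical, it assumes a Unix-like host where the identify binary is on the PATH and Ghostscript is installed, and it relies on identify's %n format escape (the number of images in the sequence) being printed once per page.

```php
<?php
/**
 * Hypothetical helper: pull the page count out of identify's output.
 *
 * With -format "%n\n", identify prints one line per page and every
 * line holds the same total, so the first numeric line is enough.
 */
function sketch_identify_parse_count($output_lines) {
    foreach ($output_lines as $line) {
        $line = trim($line);
        if (preg_match('/^[0-9]+$/', $line)) {
            return (int) $line;
        }
    }
    return 0;
}

/**
 * Hypothetical wrapper: run identify on a PDF and return the page count.
 *
 * Returns 0 if identify is missing, fails, or prints nothing usable.
 */
function sketch_identify_page_count($filepath) {
    $cmd = 'identify -format ' . escapeshellarg('%n\n')
         . ' ' . escapeshellarg($filepath) . ' 2>/dev/null';
    $output = array();
    $status = 1;
    exec($cmd, $output, $status);
    return $status === 0 ? sketch_identify_parse_count($output) : 0;
}
```

Because identify hands the PDF to Ghostscript, this can be slow on large documents, so it is probably best kept as a fallback for files where the string scan finds nothing.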

odyseg’s picture

Thanks for all the replies and suggestions. I have found a developer who solved this :)

odyseg’s picture

Here is the function we use to count the number of pages when you upload a PDF file:

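function copyzol_pdf_count_get_number_of_pages($filepath) {
    // Resolve the path relative to the current working directory.
    $filepath = realpath("./$filepath");
    $max = 0;

    // Fast path: scan the file for "/Count N" entries and keep the largest,
    // which is normally the total stored at the root of the page tree.
    $fp = $filepath ? @fopen($filepath, 'rb') : FALSE;
    if ($fp) {
        while (($line = fgets($fp, 255)) !== FALSE) {
            if (preg_match('/\/Count ([0-9]+)/', $line, $matches)) {
                $max = max($max, (int) $matches[1]);
            }
        }
        fclose($fp);
    }

    // Fallback: let Imagick count the pages when the scan finds nothing.
    if ($max == 0 && $filepath && class_exists('imagick')) {
        $im = new imagick($filepath);
        $max = $im->getNumberImages();
    }

    // Assume at least one page if the count could not be determined.
    if ($max == 0) {
        $max = 1;
    }

    return $max;
}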

We have ImageMagick installed on our server to make this work.

iwant2fly’s picture

Thank you. This code helped out greatly in a module we had developed.