Re: PDF images problem

From John R. McPherson
Date Fri, 12 Jul 2002 08:39:02 +1200
Subject Re: PDF images problem
In-Reply-To (OF814C4402-91AA421F-ON85256BF3-00640AA7-altarum-org)
On Thu, Jul 11, 2002 at 03:14:13PM -0400, steve.brophy@altarum.org wrote:
> I'm also having a problem with PDF images which might be related to the
> pdftohtml version.
>
> Using greenstone 2.38 on Windows NT, the pdf file import works fine except
> for the files with security on them. Those result in the pdftohtml
> (version 3.1) program error of "decryption support not included"

> Is the greenstone 'pdftohtml.exe' program customized to work with ppm
> images, or has the pdftohtml program changed its image support along the
> way? Could this be a problem with my windows pdftohtml or ghostscript
> setup, or a more general situation similar to what Elias is seeing?


The pdftohtml program creates an HTML file and extracts any images
in the PDF to .PPM format. Our wrapper script, pdftohtml.pl, reads
the images.log file and converts the .ppm files to .png. (We include
a binary, pnmtopng.exe, that does this for Windows users.)
Some images are embedded in the PDF as JPEG data, and these are the
ones you sometimes see extracted as .JPEG files.
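
The conversion step described above can be sketched roughly as follows.
This is only an illustration in Python, not the actual pdftohtml.pl
code (which is Perl). It assumes that images.log lists one image
filename per line, which may not match the real log format, and that a
pnmtopng binary is on the PATH. The function names ppm_targets and
convert_all are made up for this example.

```python
import subprocess
from pathlib import Path


def ppm_targets(log_path):
    """Return (ppm, png) path pairs for every .ppm file named in the log.

    Assumption: the log holds one image filename per line.
    """
    pairs = []
    for line in Path(log_path).read_text().splitlines():
        name = line.strip()
        # Skip blank lines and non-PPM entries such as .jpg files.
        if name.lower().endswith(".ppm"):
            ppm = Path(name)
            pairs.append((ppm, ppm.with_suffix(".png")))
    return pairs


def convert_all(log_path, converter="pnmtopng"):
    """Convert each listed .ppm to .png by running the converter.

    pnmtopng writes the PNG data to standard output, so the output
    is redirected into the target file.
    """
    for ppm, png in ppm_targets(log_path):
        with open(png, "wb") as out:
            subprocess.run([converter, str(ppm)], stdout=out, check=True)
```

Calling convert_all("images.log") in the output directory would then
leave a .png next to each .ppm, while any .jpg entries are left alone.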

pdftohtml is based on the code for "xpdf", a free PDF viewer. The
creator of xpdf refused to put in code to read "encrypted"
PDF files because he didn't want to get into legal trouble under
U.S. laws regarding encryption, circumvention of copyright
controls, etc. That is why files with security on them produce the
"decryption support not included" error.

Until recently, pdftohtml was not being maintained, as the person who
wrote it evidently finished his degree and left university. But
just recently it has been taken up again by someone; have a look
at http://pdftohtml.sourceforge.net/. If we find enough spare time
we might look into updating the version shipped with Greenstone, but
so far I don't think there are any major differences between the new
version and ours. Our version is based on the code for version 0.31.

Hope this helps
John McPherson