John, Katherine, GSDL folks,
I have some problems with PDF files, too (again).
I had used GSDL 2.40a on RHLinux8 and had to go around several times,
and always got
I am using GSDL 2.41 now, in both a local Windows version and with
RHLinux9, which I recompiled to enable z39.50 (as I did with 2.40a).
The problem is this. I wrote the files in OpenOffice Writer, saved them
as .doc, and then converted them with the internal PDF converter, which I
believe is based on Ghostscript and which, I have heard, produces
images (bitmaps or JPEGs, I don't know). When using either the
Collector or GLI with the local gsdl (I do this first, before working with
the linux gsdl) no PDF file is shown in the final list of the
In fact no file is shown.
Then, going a step back, I used GLI with the .doc generated by OOWriter
instead of the PDF version of the file. The list then contained an
HTML text file and an icon that is supposed to open the .doc file
(it does not work at all). Only the HTML text works.
Am I missing something in the PDFPlug parameters, or in the classify
parameters? Is this functionality crippled in the Windows local version,
and should it work in the Linux server version?
What I need is simple: a list of PDF files, with the title and filename next
to the PDF icon, and perhaps the text version too, nothing else (no
.doc files, since they do not seem to work properly).
Any help? Sorry for taking your time if this has been answered before.
If it has, please point me to this not-so-FAQ.
Azael Barrera, Ph.D.
Director, Information and Communication Technology Transfer.
Secretaría Nacional de Ciencia, Tecnología e Innovación
From: John R. McPherson [mailto:email@example.com]
Sent: Wednesday, March 10, 2004 18:24
To: Diego Spano
CC: Greenstone (users)
Subject: Re: RV: [greenstone-users] PDFPlug
Diego Spano wrote:
> Hi John, you are right, png files are better than jpg, but Greenstone
> doesn't process them! I made a PDF document composed of png files and
> imported it into Greenstone, but it doesn't export each page, so when I
> browse the document in the collection I see no images! If I use jpg
> files, Greenstone processes them with no problems!
> Is something about the PDFPlug? I also use -complex option but nothing
It looks like "pdftohtml", the converter program we use, handles .JPG
images differently from other image types. The older versions of pdftohtml
always extracted images; it looks like the newer version only
extracts JPG images by default, and will only extract other image types
if the -complex option is used.
If the complex option is used, then the images are extracted, but then
pdftohtml does some annoying things:
places in the .HTML file, which means that if you add other stuff around
it (such as greenstone html code), all the text is overlapping and out
2) It makes a big image of the page and uses that as the background,
drawing the extracted text on top of the image.
Basically, if you have too much text at the top of the page, it might
make it all render funny. Also, I don't know how well Internet Explorer
Anyway, I tried it with a pdf file I created using "pdflatex" and
embedding .PNG images in it, and it worked ok when I gave PDFPlug the
"-complex" option. I'm using greenstone v 2.41 on linux.
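
For reference, a minimal sketch of how the "-complex" option might be given
to PDFPlug in a collection's etc/collect.cfg. This assumes the usual
one-plugin-per-line syntax of that file; the file location and the
surrounding plugin lines are assumptions and are omitted here.

```
# Fragment of etc/collect.cfg (assumed location); other plugin lines omitted
plugin  PDFPlug -complex
```

The collection would need to be rebuilt after a change like this for the
option to take effect.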