Re: RV: [greenstone-users] PDFPlug

From: Katherine Don
Date: Thu, 01 Apr 2004 15:57:05 +1200
Subject: Re: RV: [greenstone-users] PDFPlug
In-Reply-To: (B7E1F79C9774334CA7400990CFEB75033A2AC8-intranet-senacyt-int)
Hi Azael

With PDF and Word files, you should just be able to add them to your
collection, rebuild, and they should appear in a titles a-z list with
links to the html and pdf/word files. The html may be strange, but the
link to the original should always work. I don't think there are any
special options needed for this to work.

Do you get error messages (using GLI) when building your collection of
PDFs?

If you are still having problems with this, please send me (off the
list) 2 of your pdfs with their corresponding word documents, and your
collect.cfg file, and I'll have a look at them for you.
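Before sending a collect.cfg, it can help to check which plugin and classifier lines it actually contains. The Python below is a minimal sketch under stated assumptions: it assumes the Greenstone 2 convention of one directive per line (for example a line like: plugin PDFPlug -complex), it splits arguments on whitespace only, so quoted values containing spaces are not handled, and the helper names summarize_collect_cfg, plugin_options and summarize_file are hypothetical.

```python
# Minimal collect.cfg inspector (sketch). Assumes the Greenstone 2 layout of
# one directive per line, e.g. "plugin PDFPlug -complex".
# All helper names here are hypothetical.

DIRECTIVES = ("plugin", "classify")


def summarize_collect_cfg(text):
    """Map each directive in DIRECTIVES to a list of argument lists,
    one per matching line. Arguments are split on whitespace only."""
    summary = {name: [] for name in DIRECTIVES}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        if tokens[0] in summary:
            summary[tokens[0]].append(tokens[1:])
    return summary


def plugin_options(summary, plugin_name):
    """Return the options given to plugin_name, or None if it is not listed."""
    for args in summary["plugin"]:
        if args and args[0] == plugin_name:
            return args[1:]
    return None


def summarize_file(path):
    """Read a collect.cfg from disk and summarize it."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return summarize_collect_cfg(fh.read())
```

For example, summary = summarize_file("collect.cfg") followed by plugin_options(summary, "PDFPlug") returns the options passed to PDFPlug, or None if PDFPlug is not listed at all, which could explain PDFs silently missing from a built collection.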

Regards,
Katherine Don

Azael Barrera wrote:
> John, Katherine, GSDL folks,
>
> I am having problems with PDF files too (again).
>
> I used GSDL 2.40a on RHLinux8, went through the process several times,
> and always got something wrong.
>
> I am using GSDL 2.41 now, in both a local Windows version and with
> RHLinux9, which I recompiled to enable z39.50 (as I did with 2.40a).
>
> The problem is this. I worked on files in OpenOffice Writer, saved them
> as .doc, and then converted them with the built-in PDF converter, which I
> believe is based on ghostscript-to-pdf and which, I have heard, creates
> images (bitmaps or JPEGs, I don't know which). When using either the
> Collector or GLI with the local GSDL (I do this first, before working
> with the Linux GSDL), no PDF file is shown in the final list of the
> collection. In fact, no file is shown at all.
>
> Then, going a step back, I used GLI with the .doc generated by OOWriter
> instead of the PDF version of the file. The list contained an HTML text
> file and an icon that is supposed to bring up the .doc file (which does
> not work at all). Only the HTML text works.
>
> Am I missing something with PDFPlug parameters? And with classify
> parameters? Is this functionality crippled in the Win-local version but
> should work in the linux-server version?
>
> What I need is simple: a list of PDF files, with the title and filename
> next to the icon that opens the PDF, and perhaps the text version too,
> nothing else (no .docs, since they do not seem to work properly).
>
> Any help? Sorry for taking your time if this has been answered before.
> If it has, please point me to this not-so-FAQ.
>
>
> Azael Barrera, Ph.D.
> Director - Information and Communication Technology Transfer
> Secretaría Nacional de Ciencia, Tecnología e Innovación
> E-mail: abarrera@senacyt.gob.pa
>
>
> -----Original Message-----
> From: John R. McPherson [mailto:jrm21@cs.waikato.ac.nz]
> Sent: Wednesday, 10 March 2004 18:24
> To: Diego Spano
> CC: Greenstone (users)
> Subject: Re: RV: [greenstone-users] PDFPlug
>
> Diego Spano wrote:
>
>>Hi John, you are right, PNG files are better than JPG, but Greenstone
>>doesn't process them!!! I made a PDF document composed of PNG files and
>>imported it into Greenstone, but it doesn't export each page, so when I
>>browse the document in the collection I see no images! If I use JPG
>>files, Greenstone processes them with no problems!
>>
>>Is it something to do with PDFPlug? I also used the -complex option, but
>>nothing happens.
>
>
> It looks like "pdftohtml", the converter program we use, handles .JPG
> images differently to other image types. Older versions of pdftohtml
> always extracted images; it looks like the newer version only extracts
> JPG images by default, and will only extract other image types if the
> -complex option is used.
>
> If the complex option is used, then the images are extracted, but then
> pdftohtml does some annoying things:
> 1) It uses horrible JavaScript to place the extracted text at particular
> places in the .HTML file, which means that if you add other stuff around
> it (such as Greenstone HTML code), all the text is overlapping and out
> of place.
> 2) It makes a big image of the page and uses that as the background,
> drawing the extracted text on top of the image.
>
> Basically, if you have too much text at the top of the page, it might
> make it all render funny. Also, I don't know how well Internet Explorer
> handles all the placement JavaScript.
>
> Anyway, I tried it with a pdf file I created using "pdflatex" and
> embedding .PNG images in it, and it worked ok when I gave PDFPlug the
> "-complex" option. I'm using greenstone v 2.41 on linux.
>
> John
>
>
> _______________________________________________
> greenstone-users mailing list
> greenstone-users@list.scms.waikato.ac.nz
> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
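To see outside Greenstone which images pdftohtml extracts from a given PDF, one option is to run it directly in both modes and compare the output. The Python below is a sketch under assumptions: it assumes a pdftohtml binary (for example the Poppler build) is on the PATH, that its -c flag is the "complex output" mode behind PDFPlug's -complex option, and that -noframes produces a single HTML file. That flag mapping, and whether a current build still extracts only JPEGs in plain mode, should be checked against your own pdftohtml. The function names are hypothetical.

```python
import shutil
import subprocess
from collections import Counter
from pathlib import Path


def pdftohtml_command(pdf_path, out_dir, complex_mode):
    """Build a pdftohtml command line (hypothetical helper).
    Assumption: -c selects complex output, -noframes writes one HTML file."""
    cmd = ["pdftohtml", "-noframes"]
    if complex_mode:
        cmd.append("-c")
    # Output files are named from this base, e.g. page.html plus images.
    cmd += [str(pdf_path), str(Path(out_dir) / "page")]
    return cmd


def image_types(out_dir):
    """Count files in out_dir by lower-cased extension, ignoring .html."""
    counts = Counter(
        p.suffix.lower() for p in Path(out_dir).iterdir() if p.is_file()
    )
    counts.pop(".html", None)
    return counts


def compare_modes(pdf_path, work_dir):
    """Convert the same PDF in plain and complex mode, each into its own
    subdirectory, and return the extension counts for each run."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    results = {}
    for complex_mode in (False, True):
        label = "complex" if complex_mode else "plain"
        out_dir = Path(work_dir) / label
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            pdftohtml_command(pdf_path, out_dir, complex_mode), check=True
        )
        results[label] = image_types(out_dir)
    return results
```

For example, compare_modes("report.pdf", "/tmp/pdftest") returns extension counts for both runs, which shows whether .png files appear only in the complex output, as described earlier in this thread.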