RE: [greenstone-users] one trouble pdf files

From: James Brunskill
Date: Mon, 22 Jan 2007 11:05:32 +1300
Subject: RE: [greenstone-users] one trouble pdf files
In-Reply-To: (BAY124-W45DF17BA567793988A0DFAC9A90-phx-gbl)

Hi Israel,


I'm not sure I understand your question, but I think you want to convert your PDF files to images rather than HTML.

To do that, try putting something like this in your collect.cfg:


plugin              PDFPlug -use_sections -convert_to pagedimg_png




James Brunskill

Library Systems Consultant

The University of Waikato

Ph: +64 7 838 4323

From: [] On Behalf Of Israel Abraham Flores Cruz
Sent: Saturday, 20 January 2007 4:03 a.m.
Subject: [greenstone-users] one trouble pdf files


Hi, my name is Israel and I've just come back. First of all, thank you very much for your help: I can at least do an advanced search now. When I converted a CDS/ISIS database to GSDL, I removed the brackets '[ ]' from the database field names (e.g. AUTOR[10] became AUTOR_10), and I haven't had any problems since.

I have another problem when I try to build a new collection from PDF files. I followed the tutorial "Enhanced PDF handling", which says:

If conversion to HTML doesn't produce the result you like, PDF documents can be converted to a series of images, one per page. This requires ImageMagick and Ghostscript to be installed.
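Whether the two tools are actually reachable can be checked before rebuilding the collection. The Python sketch below is an editorial addition, not part of Greenstone: it looks for Ghostscript and ImageMagick executables on the PATH of the shell it runs in. The executable names are assumptions: gswin32c/gswin64c are Ghostscript's usual Windows console binaries (gs elsewhere), and for ImageMagick 6 it checks identify before convert, because Windows also ships an unrelated convert.exe (the FAT-to-NTFS disk converter). Greenstone may set up its own environment when building, so a hit here is a hint rather than proof that PDFPlug will find the tools.

```python
import shutil

# Assumed executable names; adjust them for your installation.
# Ghostscript: gswin32c / gswin64c are the Windows console binaries, gs elsewhere.
# ImageMagick 6: "identify" is checked first because Windows has its own,
# unrelated convert.exe that would make a "convert" hit misleading.
CANDIDATES = {
    "Ghostscript": ["gswin32c", "gswin64c", "gs"],
    "ImageMagick": ["identify", "convert"],
}


def find_tools(candidates=CANDIDATES):
    """Map each tool to the first matching executable found on PATH, or None."""
    found = {}
    for tool, names in candidates.items():
        found[tool] = None
        for name in names:
            path = shutil.which(name)
            if path:
                found[tool] = path
                break
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

A "not found on PATH" line for either tool suggests the installer did not add its folder to PATH, which is worth checking before looking at the collect.cfg options.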

For now, I've been working with the Local Library version of Greenstone, and I don't know where I should install Ghostscript. Its default path is C:\Archivos de programa\gs; I installed Greenstone in C:\Archivos de programa\GSDL, so I changed the Ghostscript install path by replacing only "gs" with "GSDL", nothing else. I installed ImageMagick (ImageMagick-6.3.1-7-Q16-windows-dll.exe; my OS is Windows XP) in its default path. I don't know whether I've set this up correctly, because when I hover the mouse over "complex" (a PDFPlug option) I see a tooltip that I don't understand.

Whether I use that option or not, I get the same result in the Greenstone collection (I've used only PDFPlug -convert_to html -complex): a single column with almost all of the text, and all the images stripped out of the files. When I changed the PDFPlug convert_to option to one of the image types, e.g. pagedimg_jpg, and followed the tutorial's advice ("Switch off the use_sections option, as it is not used with image conversion."), 6 documents were processed, but in the Greenstone collection they appear without any content except their titles. This happened with all 6 documents, including pdf05-notext. I've attached my collect.cfg to this message. Thank you again for your help.




public             true


buildtype         mgpp


#indexes         document:text document:Title document:Source

indexes                       text Title Source

defaultindex    text


levels  document


indexoptions   accentfold casefold stem


defaultlevel      document


plugin              GAPlug

plugin              PDFPlug -convert_to html -complex

plugin              ZIPPlug

plugin              TEXTPlug

plugin              HTMLPlug -smart_block

plugin              EMAILPlug

plugin              RTFPlug

plugin              WordPlug

plugin              PSPlug

plugin              ImagePlug

plugin              ISISPlug

plugin              NULPlug

plugin              MetadataXMLPlug

plugin              ArcPlug

plugin              RecPlug


classify            AZList -metadata Title

classify            AZList -metadata Source


format VList "<td valign="top">[link][icon][/link]</td>

<td valign="top">[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]</td>

<td valign="top">[highlight]




format HList "[link][highlight][ex.Title][/highlight][/link]"


format DocumentHeading "{Or}{[parent(Top):Title],[Title],untitled}<br>"


format DocumentText "[Text]"


format DocumentButtons "Detach|Highlight"


format SearchTypes "plain,form"


collectionmeta collectionname [l=es] "Coleccion de PDF´s"

collectionmeta .document:text [l=es] "text"

collectionmeta .document:Title [l=es] "titles"

collectionmeta .document:Source [l=es] "filenames"

collectionmeta .text [l=es] "text"

collectionmeta .Title [l=es] "titles"

collectionmeta .Source [l=es] "filenames"
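The PDFPlug line is the part of this collect.cfg that the tutorial advice applies to. As a small aid, a Python sketch (the helper pdfplug_warnings is hypothetical, not a Greenstone tool) can scan collect.cfg text and flag PDFPlug lines that set -convert_to to a pagedimg_* type while still passing -use_sections, which the tutorial quoted above says is not used with image conversion. It checks nothing else; other options such as -complex are left alone.

```python
def pdfplug_warnings(cfg_text):
    """Return warnings for PDFPlug lines that combine image conversion with -use_sections.

    Hypothetical helper: it encodes only the tutorial advice that
    -use_sections is not used with image (pagedimg_*) conversion.
    """
    warnings = []
    for lineno, raw in enumerate(cfg_text.splitlines(), start=1):
        parts = raw.split()
        # Skip blank lines, comments and anything that is not a PDFPlug line.
        if parts[:2] != ["plugin", "PDFPlug"]:
            continue
        opts = parts[2:]
        convert_to = None
        if "-convert_to" in opts:
            i = opts.index("-convert_to")
            if i + 1 < len(opts):
                convert_to = opts[i + 1]
        if convert_to and convert_to.startswith("pagedimg") and "-use_sections" in opts:
            warnings.append(f"line {lineno}: -use_sections is not used with {convert_to}")
    return warnings


if __name__ == "__main__":
    sample = (
        "plugin PDFPlug -convert_to html -complex\n"
        "plugin PDFPlug -use_sections -convert_to pagedimg_png\n"
    )
    for w in pdfplug_warnings(sample):
        print(w)
```

Run as a script, it prints "line 2: -use_sections is not used with pagedimg_png" for the sample; passing it the text of a real collect.cfg checks that file instead.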

