Re: [greenstone-users] Re: PDF files

From Katherine Don
Date Fri, 25 Aug 2006 11:46:26 +1200
Subject Re: [greenstone-users] Re: PDF files
In-Reply-To (1156423857-21765-89-camel-sevenofnine-smtl-co-uk)
Hi Chuck

From one of your emails it looked like the PDFs were not importing
because the text could not be extracted. If you set the -verbosity
argument to import.pl to 4, you should see a message to that effect.

Have a look at the PDF tutorial to get an idea about what you can do
with PDFs whose text cannot be extracted.
http://greenstone.sourceforge.net/wiki/gsdoc/tutorial/en/enhanced_pdf.htm

Alternatively, you can process these failed PDFs with UnknownPlug:
Add

UnknownPlug -process_ext pdf -file_format PDF -mime_type application/pdf -srcicon iconpdf

to your plugin list after PDFPlug. Any PDF files that fail to be
processed by PDFPlug will be captured by UnknownPlug. UnknownPlug will
not do any processing on the files, so no metadata/text will be extracted.
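
As a rough sketch, the relevant part of the collection's etc/collect.cfg
might then look like the fragment below. Only the UnknownPlug line comes
from the suggestion above; the neighbouring plugin lines are assumed from
a typical Greenstone 2.x configuration and will differ per collection.

```
# etc/collect.cfg (fragment; the surrounding plugin lines are assumptions)
plugin PDFPlug
plugin UnknownPlug -process_ext pdf -file_format PDF -mime_type application/pdf -srcicon iconpdf
plugin ArcPlug
plugin RecPlug
```

The order matters: with UnknownPlug listed after PDFPlug, it only picks up
the PDFs that PDFPlug could not convert.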

Regards,
Katherine

Chuck Amadi Systems Administrator wrote:
> Hi, me again
>
> Got one of my PDFs to work, but the rest all failed.
>
> less /local/sw/gsdl/gsdl-2.70/collect/smtl/etc/fail.log
>
> BS_ISO_15378_2006packagingmaterialsformedicinalproducts.pdf: PDFPlug
> failed to convert to HTML
> BS_ISO_15378_2006-packaging-materials-for-medicinal-products.pdf: no
> plugin could process this file
> BS_EN_ISO_21649_2006needlefreeinjectors.pdf: PDFPlug failed to convert
> to HTML
> BS_EN_ISO_21649_2006-needle-free-injectors.pdf: no plugin could process
> this file
> BS_EN_ISO_21647_2004respiratorygasmonitors.pdf: PDFPlug failed to
> convert to HTML
> BS_EN_ISO_21647_2004-respiratory-gas-monitors.pdf: no plugin could
> process this file
> BS_EN_ISO_22870_2006pointofcaretesting.pdf: PDFPlug failed to convert to
> HTML
> BS_EN_ISO_22870_2006-point-of-care-testing.pdf: no plugin could process
> this file
> BS_EN_ISO_21171_2006glovepowdertesting.pdf: PDFPlug failed to convert to
> HTML
> BS_EN_ISO_21171_2006-glove-powder-testing.pdf: no plugin could process
> this file
>
> Any ideas how to resolve this? I have plenty more PDFs that I need to
> add to my SMTL collection.
>
> Cheers for your help list.
>
> Thanks.
>
> On Thu, 2006-08-24 at 13:12 +0100, Chuck Amadi Systems Administrator
> wrote:
>
>>Hi Pete
>>
>>I have got Greenstone Digital Software working.
>>
>>He He
>>
>>I had to set up another Apache Alias and restart; all other apps are
>>running OK.
>>
>># Greenstone Digital Library Conf file.
>>
>>ScriptAlias /gsdl/cgi-bin "/local/sw/gsdl/gsdl-2.70/cgi-bin"
>><Directory "/local/sw/gsdl/gsdl-2.70/cgi-bin">
>> Options None
>> AllowOverride None
>></Directory>
>>
>>Alias /gsdl "/local/sw/gsdl/gsdl-2.70"
>><Directory "/local/sw/gsdl/gsdl-2.70">
>> Options Indexes MultiViews FollowSymLinks
>> AllowOverride None
>> Order allow,deny
>> Allow from all
>></Directory>
>>
>>Alias /collect "/local/sw/gsdl/gsdl-2.70/collect"
>><Directory "/local/sw/gsdl/gsdl-2.70/collect">
>> Options Indexes MultiViews FollowSymLinks
>> AllowOverride None
>> Order allow,deny
>> Allow from all
>></Directory>
>>
>>
>>Here is the URL:
>>
>>http://intranet.smtl.co.uk/gsdl/cgi-bin/library
>>
>>Click on SMTL from the selection of collections of import pdf's you told
>>me to use.
>>
>>Then search for draft, click on the PDF icon, and voilà.
>>
>>Cheers
>>
>>Chuck
>>