Re: [greenstone-users] Collector PDF Encoding Error

From: Katherine Don
Date: Fri, 25 Jun 2004 12:28:26 +1200
Subject: Re: [greenstone-users] Collector PDF Encoding Error
In-Reply-To: (009d01c457fa$ac1081e0$6112010a-danpc)

It sounds like you have specified an empty directory as your source data.

I have tried using the Collector to build a collection of PDF files, and
it worked fine.
The warnings about PDF files not being encoded in UTF-8 do not affect
the Collector. If it can find the files but can't process them, the
build still completes, and the build log contains messages like the following:

Build summary for collector2 collection

* 5 documents were considered for processing
* 0 were processed and included in the collection
* 5 were unrecognised
See /research/kjdon/home/gsdl/collect/collecv1/etc/fail.log for a list
of unrecognised and/or rejected documents
Fail log for collector2 collection

beatles_georgeob.pdf: no plugin could recognise this file
beatles-1.pdf: no plugin could recognise this file
beatles_tab.pdf: no plugin could recognise this file
beatles.pdf: no plugin could recognise this file
beatles_review.pdf: no plugin could recognise this file

So please check your source URLs. Please note that you can only retrieve
files using file:// from the computer that Greenstone is installed on.
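If you want a quick way to check your source URLs before running the Collector, a rough pre-flight script along these lines might help. It is a sketch under stated assumptions, not part of Greenstone: the helper `check_source` and the demo file names are hypothetical, it only handles file:// URLs, and it only looks for files ending in .pdf. It checks the two things discussed above: that the URL points at this machine and exists, and that it is not empty of PDFs. It cannot tell you whether Greenstone's plugins will actually accept a given PDF (see the fail log above), so run it on the computer Greenstone is installed on and treat the result as a hint only.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def check_source(url: str) -> tuple[bool, list[str]]:
    """Hypothetical pre-flight check for one Collector source URL.

    Returns (usable, pdf_names). usable is False if the URL is not a
    local file:// URL or the path does not exist on this machine.
    pdf_names lists the .pdf files found, searching subdirectories too.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        # Assumption: only file:// URLs are handled in this sketch.
        return False, []
    if parsed.netloc not in ("", "localhost"):
        # A file:// URL naming another host cannot be read from here.
        return False, []
    path = Path(url2pathname(parsed.path))
    if path.is_file():
        return True, ([path.name] if path.suffix.lower() == ".pdf" else [])
    if path.is_dir():
        pdfs = sorted(
            p.name
            for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
        return True, pdfs
    # The path does not exist on this machine.
    return False, []


def _demo() -> None:
    # Hypothetical demo: a throwaway folder with one PDF and one empty subfolder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "beatles.pdf").write_bytes(b"%PDF-1.4\n")
        (root / "empty").mkdir()
        for url in (root.as_uri(), (root / "empty").as_uri()):
            usable, pdfs = check_source(url)
            if not usable:
                print(f"{url}: not found or not local")
            elif not pdfs:
                print(f"{url}: exists but contains no PDF files")
            else:
                print(f"{url}: {len(pdfs)} PDF file(s)")


if __name__ == "__main__":
    _demo()
```

An "exists but contains no PDF files" result for a directory you expected to be full matches the "contains no data" message you are seeing.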

Katherine Don

Pesserl Dan wrote:
> Hey Guys,
> I've been trying to create a collection using the Collector but keep
> getting this message:
> The collection could not be built as it contains no data. Make sure
> that at least one of the directories or files you specified on the
> source data page exists and is of a type or (in the case of a
> directory) contains files of a type, that Greenstone can process.
> When I try to create the collection through the command line I am able
> to, but get encoding warnings saying that the PDF files I'm using are
> not encoded in UTF-8, however they are fine to be processed.
> I've been trying to find out how I can get things to work through the
> Collector by using switches for the plugins but there's really nothing
> of help out there.
> The directory has PDF files in it, and GSDL can read it without a
> problem since I can see the files being read by the status page.
> I have to get the Collector working to let my clients update collections.
> Any ideas? Thanks!
> -Dan