[greenstone-users] Excluding documents from full-text indexing

From schild
Date Fri, 18 Mar 2005 17:35:49 +0100
Subject [greenstone-users] Excluding documents from full-text indexing
Hi list,

Does anybody know whether it is possible to exclude specific files from
the full-text indexing process in the build phase of a collection, while
still indexing the metadata for those documents? In particular,
what I want to do is this:

Certain documents in my collection should be accessible only via a web
link (those files must not be included directly in the repository of
the digital library), whereas others are included directly. For those
that should be accessed via a web link, I included a dummy file instead
(just an empty file with the same name as the original file). When I run
the build process, all plugins exit with an error (since there is no
text to be indexed or the file does not conform to the normal file
structure). As an example, here is what the Librarian Interface prints
in expert mode:

import.pl> Converting paxson96endtoend.pdf to HTML format
import.pl> Error: May not be a PDF file (continuing anyway)
import.pl> Error (0): PDF file is damaged - attempting to reconstruct xref table...
import.pl> Error: Couldn't find trailer dictionary
import.pl> Error: Couldn't read xref table
import.pl> Error executing pdftohtml.pl
import.pl> Could not convert paxson96endtoend.pdf to HTML format

Obviously, an empty .pdf file does not conform to the PDF file
structure. By the way, this need arises for me from copyright law.
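
To make the setup concrete, here is a minimal sketch in Python (not part of
Greenstone) of how the dummy files could be told apart from real documents
before a build. It assumes the placeholders are zero-byte files and relies on
the fact that a real PDF starts with the bytes "%PDF-". The function names
find_empty_placeholders and looks_like_pdf are hypothetical, as is whatever
import folder is passed in.

```python
import os


def find_empty_placeholders(import_dir):
    """Return sorted relative paths of zero-byte files under import_dir.

    Assumption: placeholder (dummy) documents are exactly the empty files.
    """
    placeholders = []
    for dirpath, _dirnames, filenames in os.walk(import_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                placeholders.append(os.path.relpath(path, import_dir))
    return sorted(placeholders)


def looks_like_pdf(path):
    """Return True if the file begins with the PDF header marker."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"
```

A list produced this way only identifies the files that trigger the errors
above; how to make the build skip their text while keeping their metadata is
still the open question.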

Does anybody have a clue on this one?




