If you don't have any HTML created from PDF, Word, Excel etc. documents
then there is no text for Greenstone to search -- is this what you want?
If so, you can use UnknownPlug to include these file types in the
collection. You'll need to set its "-process_exp" option to a regular
expression matching the file types you don't want processed, and remove
PDFPlug, WordPlug etc. from the plugin list.
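As a rough sketch of what that might look like in collect.cfg (the
extensions here are only an example, so adjust the expression to the
files you actually have, and check the quoting against your Greenstone
version):

```
# Assumed example: PDFPlug, WordPlug and ExcelPlug lines removed,
# UnknownPlug added to pick up those files without converting them.
plugin UnknownPlug -process_exp "\.(pdf|doc|xls)$"
```

The expression is matched against filenames, so anything it doesn't
match is still left to the other plugins in the list.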
Another approach is to keep the plugins in place and simply edit the
format statements to remove the icon that links to the "Greenstone
version" of the document from search results and classifier lists.
Users can then only open the original source documents, yet they can
still search on any text extracted from them.
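For illustration only, a format statement along these lines leaves out
the usual [link][icon][/link] cell and keeps just the link to the
source file. The default statement differs between Greenstone versions,
so treat the table cells and metadata names below as assumptions and
start from whatever your collection currently uses:

```
format SearchVList "<td valign=top>[srclink][srcicon][/srclink]</td><td valign=top>[highlight][Title][/highlight]</td>"
```

The same change would need to be made to the format statements for any
classifiers that list documents.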
Hope this makes sense,
Doug Carter wrote:
> Hi all,
> Is there a way to build a collection without transformation? That is,
> I don't want any html created from pdf, word or excel docs. I tried
> removing the associated plugins, but then nothing built.
> Any ideas?
> Doug Carter
> Mercy Corps
> greenstone-devel mailing list