[greenstone-users] PDF metadata and the build

From: William Hursthouse
Date: Sat, 26 Mar 2005 09:06:56 +1200
Subject: [greenstone-users] PDF metadata and the build
Hi,
I am trying to put together a collection (actually, a series of
collections) of PDF files for a technical library. Almost none of them
have any metadata that can be extracted, and some are very large. So I
want to restrict the build process to the metadata I have entered
manually for each file, while still having it recognise each file as a
PDF and make it available to click on afterwards.
I presume the build has to extract the "Source" metadata from the
original file, but can I restrict it to just that? At the moment the
build process also extracts garbage text and displays it. It also
sometimes damages files it fails to read, so they won't open at all
once the build has finished.
I am working with just a few files while I experiment, but the
collection(s) will probably hold several thousand such scanned PDF
files, so I need to know whether what I am aiming for is possible, and
would welcome a little guidance.
Thanks very much
William