Searching the list archives, I came across a few similar projects, but
I'm not yet terribly familiar with Greenstone and therefore could not
quite grasp the suggested solutions.
I have a bibliography database that is linked to PDF files. Each file
has the ID of the matching database record at the end of its name, so it
was easy to pair records and filenames with a regular expression scan of
the directory. Currently there are about 2000 records and matching
files.
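
To illustrate the matching step, here is a minimal sketch in Python. The
naming scheme (a numeric ID directly before ".pdf") and the function
names are assumptions made up for this example, not my actual setup:

```python
import re
from pathlib import Path

# Hypothetical naming scheme: the numeric record ID sits directly
# before the ".pdf" extension, e.g. "kobayashi_0042.pdf" -> 42.
ID_AT_END = re.compile(r"(\d+)\.pdf$", re.IGNORECASE)


def match_files_to_records(filenames, record_ids):
    """Return {record_id: filename} for files whose trailing ID is known."""
    matched = {}
    for name in filenames:
        m = ID_AT_END.search(name)
        if m and int(m.group(1)) in record_ids:
            matched[int(m.group(1))] = name
    return matched


def scan_directory(directory, record_ids):
    """Apply the same matching to every regular file in a directory."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return match_files_to_records(names, record_ids)
```

Files without a trailing ID, or with an ID that is not in the database,
are simply skipped.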
The goal is to build a collection that relies primarily on the
bibliographical data, for browsing by title and author (keywording
might be added at a later stage) and for field searches using mgpp. I'm
not sure whether the quality of the PDF files and their language - many
of them are in Japanese - allow for text indexing. But at any rate, for
now, working with the bibliographical data for searches is sufficient
and all I currently need.
I thought of structuring metadata.xml like this:
<Metadata name="Title">A Study of Something</Metadata>
<Metadata name="Author">Kobayashi, Akira</Metadata>
<Metadata name="Journal Title">Nani-ka no Kenkyu</Metadata>
My assumption is: if I place all PDF files in the import directory,
together with this metadata.xml containing FileSet tags for all of
them, I could import the collection with RecPlug. Is this correct?
If so, am I correct in assuming that, if I use the use_metadata_files
option, RecPlug will then *not* extract titles from the PDF files?
(That is exactly what I want, because the database has correct titles in
all cases anyway, so the information in metadata.xml should be given
preference.) Would I even need to use PDFPlug in addition if I just
wanted to make use of the metadata, and not build full-text indices?
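
To make the metadata.xml idea concrete, this is roughly how I would
generate the file from the database: one FileSet per PDF. It is only a
sketch. The DirectoryMetadata/FileSet/FileName/Description layout is my
understanding of the format, I am assuming that FileName is treated as
a regular expression (hence the escaping), and build_metadata_xml is a
name invented for this example:

```python
import re
from xml.sax.saxutils import escape, quoteattr


def build_metadata_xml(entries):
    """Build metadata.xml text from (filename, {field: value}) pairs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<DirectoryMetadata>"]
    for filename, fields in entries:
        lines.append("  <FileSet>")
        # Assumption: FileName is matched as a regular expression, so the
        # literal filename is regex-escaped, then XML-escaped.
        lines.append(f"    <FileName>{escape(re.escape(filename))}</FileName>")
        lines.append("    <Description>")
        for name, value in fields.items():
            # quoteattr() adds the surrounding quotes and escapes the value.
            lines.append(
                f"      <Metadata name={quoteattr(name)}>{escape(value)}</Metadata>"
            )
        lines.append("    </Description>")
        lines.append("  </FileSet>")
    lines.append("</DirectoryMetadata>")
    return "\n".join(lines) + "\n"
```

Calling it with ("study_42.pdf", {"Title": "A Study of Something"}) would
give one FileSet whose Description holds a single Title entry.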
I'm not sure whether my approach to the metadata fields is correct or
practical. In the Greenstone DTD (which my XML editor currently reports
as "temporarily moved", by the way, so I can't validate against it), the
"name" attribute doesn't seem to have a fixed set of values. Or are
these restricted somewhere else? Should I use a different DTD? Or write
my own metadata set? Or do I need both a DTD for validating metadata.xml
and a special metadata set for the Librarian Interface to be able to
And: if I use metadata name attributes to specify the
bibliographical fields, are these also the names I can use to configure
the display, with format statements in collect.cfg as it is done in the
Apologies if these questions are all too newbie-ish, and best regards,