[greenstone-devel] getting Greenstone to "see" DC metadata

From: Holly Hendricks
Date: Sat, 31 Jul 2004 14:43:52 -0400
Subject: [greenstone-devel] getting Greenstone to "see" DC metadata
I have been unable to get Greenstone to see my metadata.xml file
containing Dublin Core metadata, so I built a small test library to
try to debug this. I created a new library with 3 html documents
containing 1 image each.

I have gone through all the documentation and "how to build a digital
library" in great detail, made flowcharts and lists of the steps, and
believe I am following every step.

Contents - everything in one folder. No documents with sections in
this test example.
- 3 simple html documents in the import directory
- Each has one associated .jpg image file also in the import directory
- A metadata.xml file is in the import directory containing 3 sets of
Dublin Core metadata following the prescribed format. See the excerpt
at the bottom of this message.

Question - should the metadata.xml be trying to use the remote
greenstone.org metadata.dtd, or a local one? That is the only thing I
have not tried doing differently.

RecPlug is configured to "use_metadata_files"
There are no html <meta> tags in the files

My classifiers are
classify AZList -metadata dc.subject
classify AZList -metadata dc.title
classify DateList -metadata dc.date

Yet the output from buildcol.pl ends with repetitions of
WARNING: AZList: HASH5d32454163871c7a719c57 metadata is empty - not classifying
once for every "doc.xml" it tries to process in a hash directory.

The only metadata that shows up in the Enrich section of the librarian
interface is ex.[name] metadata, never the Dublin Core.

At what step (gather, enrich, design, create) or make-import-build
does the metadata.xml actually get assigned? According to the book,
it should happen in the import step. I have never seen it show up in
the enrich step - should it? Or would it only show there if entered
by hand? I think I am running with a minimal set of plugins, but is
one of these conflicting with the metadata I am trying to assign?

GAPlug
HTMLPlug
TEXTPlug
ZIPPlug
ArcPlug
RecPlug -use_metadata_files

When I look at metadata.xml after running my build procedure, it does
not contain the same information that was originally there, but
different metadata for language, encoding, plugin, source, and title.
Where does this other (unwanted) metadata come from?

I have been trying to make this work for 3 months. It seems like
others are using Dublin Core successfully. What am I missing?

Thank you.
Holly

part of metadata.xml (the one I wrote)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DirectoryMetadata SYSTEM "http://greenstone.org/dtd/DirectoryMetadata/1.0/DirectoryMetadata.dtd">
<DirectoryMetadata>
  <FileSet>
    <FileName></FileName>
    <Description>
      <Metadata name="dc.title">March to Birmingham</Metadata>
      <Metadata mode="accumulate" name="dc.language">English</Metadata>
      <Metadata mode="accumulate" name="dc.subject">Rev. Dana M. Greeley and colleagues in Birmingham, Alabama</Metadata>
      <Metadata mode="accumulate" name="dc.description">Rev. Dana M. Greeley and colleagues in Birmingham, Alabama </Metadata>
      <Metadata mode="accumulate" name="dc.publisher">Arlington Street Church Archives Committee, Boston, Mass. </Metadata>
      <Metadata mode="accumulate" name="dc.contributor">Holly Hendricks, Arlington Street Church Archives Committee </Metadata>
      <Metadata mode="accumulate" name="dc.date">1965</Metadata>
      <Metadata mode="accumulate" name="dc.type">Image</Metadata>
      <Metadata mode="accumulate" name="dc.format">image/jpeg </Metadata>
      <Metadata mode="accumulate" name="dc.identifier">ASC0001nnn </Metadata>
      <Metadata mode="accumulate" name="dc.coverage">Boston, Mass </Metadata>
      <Metadata mode="accumulate" name="dc.coverage">1965</Metadata>
      <Metadata mode="accumulate" name="dc.rights">Property of Arlington Street Church Archives Committee</Metadata>
    </Description>
  </FileSet>

...repeat for 2 other file sets....

</DirectoryMetadata>
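
To rule out a problem in the file itself, separate from Greenstone, the excerpt above can be checked with a small standalone script. This is only a rough sketch in Python using the standard library XML parser. It is not part of Greenstone and does not reproduce what RecPlug does. The function name `summarize_filesets` and the shortened sample string are made up for illustration, and the DOCTYPE line is left out of the sample to keep it short. The script only confirms that the XML parses and lists, for each FileSet, the FileName text and the metadata names that follow it.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the excerpt above (illustration only; DOCTYPE omitted).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<DirectoryMetadata>
  <FileSet>
    <FileName></FileName>
    <Description>
      <Metadata name="dc.title">March to Birmingham</Metadata>
      <Metadata mode="accumulate" name="dc.subject">Rev. Dana M. Greeley and colleagues in Birmingham, Alabama</Metadata>
      <Metadata mode="accumulate" name="dc.date">1965</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>
"""


def summarize_filesets(xml_text):
    """Return one dict per FileSet: its FileName text and its metadata names.

    Hypothetical helper for inspecting the file by hand; raises
    xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    summary = []
    for fileset in root.iter("FileSet"):
        # findtext returns "" for an empty element and None if it is missing.
        filename = (fileset.findtext("FileName") or "").strip()
        names = [m.get("name") for m in fileset.iter("Metadata")]
        summary.append({"filename": filename, "names": names})
    return summary


if __name__ == "__main__":
    for entry in summarize_filesets(SAMPLE):
        label = entry["filename"] or "(empty FileName)"
        print(label, "->", ", ".join(entry["names"]))
```

Running it against the full metadata.xml (read from disk instead of the sample string) would show at a glance whether each of the 3 file sets has a FileName value and the dc.* names expected by the classifiers.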