Re: [greenstone-devel] getting Greenstone to "see" DC metadata

From Katherine Don
Date Mon, 02 Aug 2004 09:47:54 +1200
Subject Re: [greenstone-devel] getting Greenstone to "see" DC metadata
In-Reply-To (EFEPLPIPLCDFJGDKFDPLOEJPGOAA-hollyh-mail-uri-edu)
Hi Holly

Your problem is to do with how the Librarian Interface finds metadata.
If you put the documents, images and metadata.xml file in the import
directory, and had a suitable collect.cfg file in the etc directory,
then you could build your collection fine using command line building
(running the import and build scripts in a terminal).

If you are using the Librarian, you can't just put a metadata file into
the import directory, because it won't pick it up properly.

There are a couple of options.
1. Add the metadata manually using the Enrich pane. This may not be
suitable for large numbers of documents.
2. Put the html files and metadata.xml file into a directory somewhere
(they should be in the same directory). Create a new collection using
the Librarian, remembering to include the Dublin Core metadata set. Drag
the html files into the new collection, and the metadata should be
dragged in with them. If successful, you should immediately see it in the
Enrich pane.

Note that our Dublin Core schema uses metadata element names with a
capital letter, e.g. dc.Title, dc.Subject etc., not dc.title, dc.subject.
If you change your metadata.xml file to use the proper names, the metadata
should come over automatically. If you leave the names lowercase, you will
get a prompt asking what element to map each one to; you then choose
dc.Title for dc.title, etc.
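
As a rough sketch of what that change looks like, here is a cut-down
metadata.xml using the capitalised names. Only dc.Title and dc.Subject
are shown, since those are the two names mentioned above. The file name
page1.html is made up for illustration, and the DOCTYPE line is left out
on purpose (keep whatever your own file already has there).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- DOCTYPE line omitted in this sketch; keep the one from your own file -->
<DirectoryMetadata>
  <FileSet>
    <!-- page1.html is a hypothetical file name, not taken from the original -->
    <FileName>page1.html</FileName>
    <Description>
      <!-- capitalised element names, matching the Dublin Core set in the Librarian -->
      <Metadata name="dc.Title">March to Birmingham</Metadata>
      <Metadata mode="accumulate" name="dc.Subject">Rev. Dana M. Greeley and colleagues in Birmingham, Alabama</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>
```

The other dc.* entries would be renamed in the same way, using whatever
names the Dublin Core set in the Librarian shows for them.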

The other metadata that you see after building is metadata that
Greenstone has extracted from the documents.

If you are not already using Greenstone version 2.51, I would recommend
upgrading to that, as it has a lot of improvements to the Librarian.
Hope this helps and you can get your metadata working :-)
Katherine Don

PS We are happy to answer questions on the mailing list, so you don't
need to wait three months before you ask for help when you are stuck :-)

Holly Hendricks URI wrote:
> I have been unable to get Greenstone to see my metadata.xml file
> containing Dublin Core metadata, so I built a small test library to
> try to debug this. I created a new library with 3 html documents
> containing 1 image each.
> I have gone through all the doc and "how to build a digital library"
> in great detail, made flowcharts and lists of the steps, and believe I
> am following every step.
> Contents - everything in one folder. No documents with sections in
> this test example.
> - 3 simple html documents in the import directory
> - Each has one associated .jpg image file also in the import directory
> - A metadata.xml file is in the import directory containing 3 sets of
> dublin core metadata following the prescribed format. See below at
> the bottom of the file for an excerpt.
> Question - should the metadata.xml be trying to use the remote
> metadata.dtd, or a local one? That is the only thing I
> have not tried doing differently.
> RecPlug is configured to "use_metadata_files"
> There are no html <meta> tags in the files
> My classifiers are
> classify AZList -metadata dc.subject
> classify AZList -metadata dc.title
> classify DateList -metadata
> Yet the display from at the end is repetitions of
> WARNING: AZList: HASH5d32454163871c7a719c57 metadata is empty - not
> classifying
> every time it tries to process "doc.xml" for the hash directory
> The only metadata that shows up in the Enrich section of the librarian
> interface is ex.[name] metadata, never the Dublin Core.
> At what step (gather, enrich, design, create) or make-import-build
> does the metadata.xml actually get assigned? According to the book,
> it should happen in the import step. I have never seen it show up in
> the enrich step - should it? Or would it only show there if entered
> by hand? I think I am running with a minimal set of plugins, but is
> one of these conflicting with the metadata I am trying to assign?
> GAPlug
> HTMLPlug
> TEXTPlug
> ZIPPlug
> ArcPlug
> RecPlug -use_metadata_files
> When I look at metadata.xml after running my build procedure, it does
> not contain the same information that was originally there, but
> different metadata for language, encoding, plugin, source, and title.
> Where does this other (unwanted) metadata come from?
> I have been trying to make this work for 3 months. It seems like
> others are using Dublin Core successfully. What am I missing?
> Thank you.
> Holly
> part of metadata.xml (the one I wrote)
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE DirectoryMetadata SYSTEM
> "
> ">
> <DirectoryMetadata>
> <FileSet>
> <FileName></FileName>
> <Description>
> <Metadata name="dc.title">March to Birmingham</Metadata>
> <Metadata mode="accumulate"
> name="dc.language">English</Metadata>
> <Metadata mode="accumulate" name="dc.subject">Rev. Dana M.
> Greeley and colleagues in Birmingham, Alabama</Metadata>
> <Metadata mode="accumulate" name="dc.description">Rev.
> Dana M. Greeley and colleagues in Birmingham, Alabama </Metadata>
> <Metadata mode="accumulate" name="dc.publisher">Arlington
> Street Church Archives Committee, Boston, Mass. </Metadata>
> <Metadata mode="accumulate" name="dc.contributor">Holly
> Hendricks, Arlington Street Church Archives Committee </Metadata>
> <Metadata mode="accumulate" name="">1965</Metadata>
> <Metadata mode="accumulate"
> name="dc.type">Image</Metadata>
> <Metadata mode="accumulate" name="dc.format">image/jpeg
> </Metadata>
> <Metadata mode="accumulate"
> name="dc.identifier">ASC0001nnn </Metadata>
> <Metadata mode="accumulate" name="dc.coverage">Boston,
> Mass </Metadata>
> <Metadata mode="accumulate"
> name="dc.coverage">1965</Metadata>
> <Metadata mode="accumulate" name="dc.rights">Property of
> Arlington Street Church Archives Committee</Metadata>
> </Description>
> </FileSet>
> ...repeat for 2 other file sets....
> </DirectoryMetadata>