Re: [greenstone-users] Success and More questions

From Michael Dewsnip
Date Mon, 23 Feb 2004 11:37:24 +1300
Subject Re: [greenstone-users] Success and More questions
In-Reply-To (402D389C-3060506-chicagonet-net)
Dear Petra,

> I have successfully built a trivial collection using the Greenstone
> conversion, however, it is clear that if I wish to really use this
> software to catalog files I'll need to use the Dublin Core.
> Therefore, several questions:

Good to hear you've had some initial success with Greenstone. I'll do my
best to answer your questions.

> 1. How do I automate populating the metadata for each file?

In Greenstone, metadata comes in two forms: automatically extracted
metadata and manually assigned metadata. Greenstone stores some basic
metadata with every file (such as the source document's filename), and
depending on the type of file and the plugin used to process it, other
metadata is extracted (e.g. width and height for images, Title for HTML
documents).

It is also possible to manually assign metadata to documents using
"metadata.xml" files. This is described in the "Assigning metadata from
a file" section in section 2.1 of the Greenstone Developer's Guide. You
can look at an example metadata.xml file in the import directory of the
demo collection.

These metadata.xml files allow you to assign metadata to documents in
quite powerful ways. Filenames are matched using regular expressions, so
you can easily assign metadata to groups of files, or to all files in a
directory. The files are easy to edit, so you can write them yourself,
and if you know a bit of programming you can easily generate them from
metadata you already have in other formats.
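
As a rough illustration of generating such a file from existing metadata,
here is a minimal Python sketch. It assumes the metadata.xml layout is a
DirectoryMetadata root containing FileSet entries, each with a FileName
pattern and a Description holding Metadata elements; check this against
the example file in the demo collection, and copy that file's XML header
(including any DOCTYPE line), which the sketch does not try to reproduce.
The CSV column names (filename, element, value) and the function names
are hypothetical, chosen only for this example.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def read_rows(csv_text):
    """Parse CSV text with the (hypothetical) columns filename, element, value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(r["filename"], r["element"], r["value"]) for r in reader]


def build_metadata_xml(rows):
    """Build a DirectoryMetadata tree from (filename, element, value) rows.

    Assumed layout: one FileSet per file, holding a FileName pattern and a
    Description with one Metadata element per value.
    """
    root = ET.Element("DirectoryMetadata")
    descriptions = {}  # filename pattern -> its <Description> element
    for filename, name, value in rows:
        # FileName is treated as a regular expression, so escape the literal
        # filename; otherwise "." would match any character.
        pattern = re.escape(filename)
        desc = descriptions.get(pattern)
        if desc is None:
            fileset = ET.SubElement(root, "FileSet")
            ET.SubElement(fileset, "FileName").text = pattern
            desc = ET.SubElement(fileset, "Description")
            descriptions[pattern] = desc
        ET.SubElement(desc, "Metadata", name=name).text = value
    return root


def to_string(root):
    """Serialise the tree to text (without the header from the demo file)."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "filename,element,value\n"
        "photo1.jpg,Title,Harbour at dusk\n"
        "photo1.jpg,Creator,A. Photographer\n"
        "photo2.jpg,Title,Hillside\n"
    )
    print(to_string(build_metadata_xml(read_rows(sample))))
```

To cover a whole group of files with one entry, you would pass a regular
expression as the pattern instead of an escaped filename. The metadata
element names used here (Title, Creator) are only placeholders; use
whatever names your chosen metadata set defines.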

Perhaps the easiest way to assign large amounts of metadata, however, is
to use the Greenstone Librarian Interface (GLI). This makes assigning
metadata a "point and click" process, and generates the metadata.xml
files for you. Although the GLI has not really been streamlined for data
entry, it should still be faster than editing the files by hand. It also
allows you to create your own metadata sets (if Dublin Core doesn't
quite do what you need).

> 2. Will all my document files be converted to html?

That depends on what your document files are, but typically yes. The
reason Greenstone converts most document formats into HTML is so that it
can 1) easily get the document text for full-text searching, and 2)
easily display (a form of) the document.

The conversion process means that the HTML documents often don't look
much like the originals, but you can always show the original documents
to the user rather than the HTML versions.

> 3. I have many image files I would like to put into a collection, is
> there any particular metadata requirements other than filename and
> extension?

Greenstone doesn't require any metadata: what you assign is up to you.
For an image collection you would probably want to assign some, though;
otherwise the collection can only be browsed (not searched), like this
one:

> 4. And last, where can I go to learn about all the mind boggling
> choices in Greenstone?

The manuals (especially the User's and Developer's Guides) are the best
reference for the core functionality of Greenstone:

To get a feel for what Greenstone can do, a more interesting place to
visit might be the Greenstone Examples page
( and

Hope this helps,