[greenstone-devel] Classifiers and metadata/Organizing Documents

From Gregory S. Williamson
Date Sun, 27 Jul 2003 15:25:04 -0700
Subject [greenstone-devel] Classifiers and metadata/Organizing Documents
Thanks to some help from this list I have been able to make some progress on extracting metadata from HTML documents. Alas, I now seem to have run into my own stupidity again.

I am not enamored of long lists of titles, and it is a dead certainty that no one will find anything of any use in our site if that is the best we can do. Few of our visitors (we purport to document San Francisco history) will know beforehand what they are looking for, and those who do are easily served by Greenstone's searching capabilities. We need to find a way to break down the paltry few thousand documents and images that we have into more easily browsed chunks that can "seduce" the user.

I had thought to do that with a "Subject" metadata field, which would at least dump documents into some "buckets" that would be more easily browsed. Alas, I seem utterly stymied. I have a "Subjects" button, but under it are all the articles that have a Subject tag, in alphabetical order by title, with no subject shown, let alone any grouping of documents.

According to "How to Build a Digital Library" (Witten & Bainbridge, 2003), page 344, "Formatting Lists", a classifier's default format statement can be overridden to provide a more tailored appearance. Hence, I have the following bit of a config file:

indexes document:text document:Title document:Source
...
indexes document:text document:Title document:Source
...
classify AZList -metadata Subject
classify AZList -metadata Title
classify AZList -metadata Source

format CL1Vlist "<br>[subject] [link][Title][/link]"
...

This does extract the "Subject" metadata, but the "Subject" value never appears in the list, and the above format statement doesn't seem to do anything (if I take it out, everything looks the same). I would settle for a listing that has something like:
1934 General Strike <Origins>
1934 General Strike <Media Distortion>
...
African-American <...>

I am willing to settle for redundant text, although obviously it would be nicer without it ...

===============

What I really need is a "Subjects" button that then has a hierarchy:

1934 General Strike --> click on this and get a list of titles about the strike
African-Americans --> click on this and get a list of titles about blacks in the city
...
Irish
...
Performing Arts
...
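
To make the structure I am after concrete, here is a minimal sketch in Python. It is not Greenstone code and says nothing about how a classifier would do this; it only shows the grouping I would like to see under the "Subjects" button. The first two titles come from the listing above; the other entries and the name group_by_subject are made up for illustration.

```python
from collections import defaultdict

# (subject, title) pairs. Only the two "1934 General Strike" titles come
# from the listing above; the remaining entries are hypothetical.
DOCUMENTS = [
    ("1934 General Strike", "Origins"),
    ("1934 General Strike", "Media Distortion"),
    ("Performing Arts", "Example Theatre Article"),  # hypothetical
    ("Irish", "Example Irish Article"),              # hypothetical
]


def group_by_subject(documents):
    """Return {subject: [titles]}, subjects and titles both sorted."""
    buckets = defaultdict(list)
    for subject, title in documents:
        buckets[subject].append(title)
    return {subject: sorted(titles) for subject, titles in sorted(buckets.items())}


if __name__ == "__main__":
    # Top level: one entry per subject; second level: the titles in it.
    for subject, titles in group_by_subject(DOCUMENTS).items():
        print(subject)
        for title in titles:
            print("    " + title)
```

Each key is one clickable subject, and each value is the list of titles that should appear after clicking it.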
===================

Am I trying to use the wrong structures? Can anyone point me to the error of my ways? (The small question is about the stupid formatting tricks; the big question is about the organization of the data.)

I can spend hours retooling the export function from our existing authoring tool if need be, but dammit, I need to know where I am going. (Remember that this is not a job, i.e. I don't get paid for this, and once I am done with that work I still have apparently countless hours ahead of me trying to get Greenstone to deliver what we need.) If I have to create external files with hierarchy data I can do it, but at this point I have spent so much time with GSDL for such little result that I do not want to "experiment."

Perhaps I am using the wrong tool? I guess that Greenstone is a very powerful tool, because I can't get it to do the simplest things without wasting huge amounts of time.

TIA for whatever illumination can be shed...

Greg Williamson
gsw@globexplorer.com