Re: Changing metadata after collection?

From sjboddie
Date Fri, 22 Feb 2002 11:05:13 +1300
Subject Re: Changing metadata after collection?
In-Reply-To (200202220353-EAA19790-geri-narc-com)

There's no easy answer to your question, so sorry if the following gets
a little technical. There's also no existing solution, and any future
solution would require some work to implement.

> Is it possible to change the metadata of an item after it has already
> been imported and built, without re-importing and re-building the
> collection?
> In particular I want to use greenstone to help organize documents whose
> contents I have not yet fully classified. I want to import the document
> into a "general category" in greenstone, then later, after having understood
> the document and all of its implications, update its metadata and reclassify
> it.
> Indeed, in general, after months or years, I will certainly be changing
> and updating metadata of my documents, as I realize their relevance in
> new contexts. But years down the road, I do not anticipate having the
> original source from which I imported the documents.
> So, what's the best way to update metadata on already imported and built
> items for which you no longer have the original source?

The easiest way is to keep the archives directory lying about, alter
the metadata in the doc.xml files and rebuild when required. If that's
not an option (i.e. if you only intend to store the collection's index
directory) then read on.
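
For the archives route, here is a minimal Python sketch of changing one
metadata value in a doc.xml file. The element layout it assumes
(Archive > Section > Description > Metadata name="...") is my assumption
about the archive format, not something spelled out in this message, and
set_metadata is a hypothetical helper, so check it against a real
doc.xml before relying on it.

```python
import xml.etree.ElementTree as ET


def set_metadata(doc_xml: str, name: str, value: str) -> str:
    """Set one metadata value on the top-level section of a doc.xml string.

    Assumed layout (an assumption, verify against your own archives):
    <Archive><Section><Description><Metadata name="...">value</Metadata>...
    """
    root = ET.fromstring(doc_xml)
    description = root.find("./Section/Description")
    if description is None:
        raise ValueError("no top-level Section/Description element found")

    # Overwrite the first matching entry, or add a new one if none exists.
    for metadata in description.findall("Metadata"):
        if metadata.get("name") == name:
            metadata.text = value
            break
    else:
        metadata = ET.SubElement(description, "Metadata", {"name": name})
        metadata.text = value

    return ET.tostring(root, encoding="unicode")
```

Note that serializing with ElementTree drops any DOCTYPE line and XML
declaration from the original file, so a real tool would need to put
those back before rebuilding.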

It's not possible to incrementally update MG's index files (MG is the
search engine under Greenstone). It is possible however to alter the
GDBM database that stores the document metadata and all the info used to
generate the classifier browse structures (which is what you'd need to
do for your project I think).

So, if you've got a "search by title" index there's no way to change the
title metadata of a given document and update the index appropriately
(aside from rebuilding the index entirely).

If all you want to do is change the display value of a document's
metadata and/or change the document's position in a classifier hierarchy
then it's technically possible to alter the GDBM database to do that.
The GDBM database is the *.ldb (or *.bdb on big-endian machines) file in
your collection's index/text directory. You can view it using
Greenstone's db2txt utility and update it using "txt2db -append".
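
As a rough illustration of the db2txt / "txt2db -append" round trip,
here is a Python sketch that edits one field in a text dump of the
database. It assumes each record in the dump looks like a "[key]" line,
then "<field>value" lines, then a line of dashes. That layout is an
assumption on my part, and the function names are hypothetical, so
compare against real db2txt output first.

```python
import re

SEPARATOR = "-" * 70  # assumed record separator; check real db2txt output


def parse_records(dump: str) -> dict[str, list[tuple[str, str]]]:
    """Parse an assumed db2txt-style dump into {key: [(field, value), ...]}."""
    records: dict[str, list[tuple[str, str]]] = {}
    key = None
    for line in dump.splitlines():
        if re.fullmatch(r"-{10,}", line.strip()):
            key = None  # a run of dashes ends the current record
            continue
        if key is None:
            match = re.fullmatch(r"\[(.+)\]", line)
            if match:
                key = match.group(1)
                records[key] = []
            continue
        match = re.match(r"<([^>]+)>(.*)", line)
        if match:
            records[key].append((match.group(1), match.group(2)))
    return records


def set_field(fields, name, value):
    """Return a copy of fields with `name` set to `value` (added if absent)."""
    updated, found = [], False
    for field, old in fields:
        if field == name and not found:
            updated.append((field, value))
            found = True
        elif field != name:
            updated.append((field, old))
    if not found:
        updated.append((name, value))
    return updated


def format_record(key, fields):
    """Format one record in the same assumed text layout."""
    lines = [f"[{key}]"] + [f"<{f}>{v}" for f, v in fields] + [SEPARATOR]
    return "\n".join(lines) + "\n"
```

The idea would be to dump the database with db2txt, pick out the record
for the document, change the field, and hand the full rewritten record
to "txt2db -append". I'd write out the whole record rather than just the
changed field, on the assumption that the stored entry for that key gets
replaced as a unit.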

--- Note that since all document metadata is stored in the GDBM database
it's actually possible to rebuild metadata indexes (like a "search by
title" index) from the data stored there. Taking this approach one step
further you can create a process to turn built indexes back into XML
archives, change the required metadata then rebuild everything (at
gsdl/bin/script/ you'll find an old, out-of-date script to do
exactly that).

Creating utilities for altering collections that exist only in built
form might be an interesting project if someone had some time on their