Re: [greenstone-devel] a simple patch to allow collection builders to assign a document identifier (OID)

Date: Tue, 22 Jul 2003 09:00:46 +0930
Subject: Re: [greenstone-devel] a simple patch to allow collection builders to assign a document identifier (OID)

Thanks for the tip;

Here is the code;

I thought about adding it as a flag to 'RecPlug', as our case involves getting the metadata from the metadata.xml file, but it seems to me that it would be more flexible if it was in and had the ability to specify which metadata field to draw the OID from.



Stephen De Gabrielle
Digitisation Officer
AraDA Project

Northern Territory University Library
Tel: (08) 8946 7009 from overseas: 61 8 8946 7009
Postal address: P.O.Box 41246, Casuarina, NT, 0811, Australia
CRICOS Provider No: 00300K

"John R. McPherson" <>
21/07/2003 07:43 PM ZE12

Subject: Re: [greenstone-devel] a simple patch to allow collection builders to assign a document identifier (OID)

On Mon, Jul 21, 2003 at 04:37:30PM +0930, wrote:
> Hi,
> We needed the ability to assign our own unique identifiers to Greenstone
> documents. In our case the 'Hash' and 'Incremental' methods of assigning
> identifiers were not suitable, as we would like Greenstone to use the
> identifiers created by our other systems.
> We have put together a simple patch to allow a new value for the -OIDtype
> flag when using
> to call it simply type:
> -OIDtype barcode -removeold ntlier
> and include a <Metadata name="Barcode">C10001</Metadata> for each document

If you have a custom plugin, it can override the identifier for a
document: you just define your own set_OID() in the plugin. If the plugin
doesn't have one, the default BasPlug function is called. Some of the
plugins do this: e.g. the SplitPlug adds a section identifier onto the end
of the hash, the BibTex plugin uses the reference name, and the Database
plugin can use any field out of a database.
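
A minimal sketch of that override pattern, written in Python purely for
illustration rather than as real Greenstone plugin code. All names here
(Document, BasePlugin, MetadataOidPlugin, set_oid) are hypothetical
stand-ins, and the hash-based default is only an assumed placeholder for
whatever the base plugin does:

```python
import hashlib


class Document:
    """Toy stand-in for a document object (hypothetical, not Greenstone's)."""

    def __init__(self, text, metadata=None):
        self.text = text
        self.metadata = metadata or {}
        self.oid = None


class BasePlugin:
    """Hypothetical base plugin: the default identifier is a content hash."""

    def set_oid(self, doc):
        digest = hashlib.md5(doc.text.encode("utf-8")).hexdigest()
        doc.oid = "HASH" + digest[:24]
        return doc.oid


class MetadataOidPlugin(BasePlugin):
    """Overrides set_oid to take the identifier from a metadata field."""

    def __init__(self, field="Barcode"):
        # The field name is configurable, so any metadata field can be used.
        self.field = field

    def set_oid(self, doc):
        value = doc.metadata.get(self.field)
        if not value:
            # No usable value: fall back to the inherited default.
            return super().set_oid(doc)
        doc.oid = value
        return doc.oid


doc = Document("some text", {"Barcode": "C10001"})
print(MetadataOidPlugin().set_oid(doc))
```

A plugin that does not define its own set_oid would simply inherit the
base version, which is the fallback behaviour described above.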

But the above is a good idea. Someone will probably add it in :p

> One problem: we have included our 'Barcode' metadata in the default index,
> but when we tried to search it weirdly split the search term "C10001":
> >Word count: C1000: 2, 1: 13
> >2 documents matched the query.
> We used the double quotes but it still split the term into 'C1000' and '1'.
> Any ideas what went wrong here? Does MG have problems with large numbers or
> other nontext characters?

Greenstone does this on purpose when indexing numbers: it breaks them up
into 4-digit groups, otherwise page numbers etc. could greatly increase
the size of the dictionary and lead to poorer compression.
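
As a rough illustration of that behaviour, here is a toy model in Python.
It is not the actual indexing code; the function name split_terms and the
exact rule (a term is cut once it already holds four digits) are
assumptions chosen so that the sketch reproduces the split reported above:

```python
MAX_DIGITS = 4  # assumed limit on digits per indexed term


def split_terms(text, max_digits=MAX_DIGITS):
    """Split text into alphanumeric terms holding at most max_digits digits."""
    terms = []
    current = ""
    digits = 0
    for ch in text:
        if not ch.isalnum():
            # Any non-alphanumeric character ends the current term.
            if current:
                terms.append(current)
            current, digits = "", 0
            continue
        if ch.isdigit():
            if digits == max_digits:
                # The term already has its quota of digits: start a new one.
                terms.append(current)
                current, digits = "", 0
            digits += 1
        current += ch
    if current:
        terms.append(current)
    return terms


print(split_terms("C10001"))  # ['C1000', '1']
```

Under this model the quoted query "C10001" is still looked up as the two
terms 'C1000' and '1', which matches the word counts in the search result.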

Unfortunately I couldn't find where this is done in the C++ code, so
hopefully someone who knows the code better than I do can tell you where
this happens.