Re: [greenstone-devel] a simple patch to allow collection builders to assign a document identifier (OID)

From Stefan Boddie
Date Mon, 21 Jul 2003 15:27:10 -0600
Subject Re: [greenstone-devel] a simple patch to allow collection builders to assign a document identifier (OID)
In-Reply-To (3F1C58EB-8B03663F-cs-waikato-ac-nz)
Yup, set maxnumeric to some suitably large value in your collect.cfg,
then rebuild the collection.

Stefan.
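
For readers of the archive, here is a rough Python sketch of the splitting behaviour that maxnumeric controls, as described in this thread. It is a simplified model only, not MG's actual code: the function name split_terms is made up for illustration, and it assumes a term is a run of alphanumeric characters that is cut whenever it would contain more than maxnumeric digits.

```python
def split_terms(text, maxnumeric=4):
    """Split text into alphanumeric terms with at most `maxnumeric` digits each.

    Hypothetical helper: a simplified model of the behaviour described in
    the thread, not the real MG parser.
    """
    if maxnumeric < 1:
        raise ValueError("maxnumeric must be at least 1")
    terms = []
    current = []   # characters of the term being built
    digits = 0     # number of digits already in the current term
    for ch in text:
        if ch.isalnum():
            if ch.isdigit() and digits >= maxnumeric:
                # Digit limit reached: close this term and start a new one.
                terms.append("".join(current))
                current, digits = [], 0
            current.append(ch)
            if ch.isdigit():
                digits += 1
        else:
            # Any non-alphanumeric character ends the current term.
            if current:
                terms.append("".join(current))
            current, digits = [], 0
    if current:
        terms.append("".join(current))
    return terms


print(split_terms("C10001"))                # ['C1000', '1']
print(split_terms("C10001", maxnumeric=6))  # ['C10001']
```

With the default of 4 this reproduces the "C1000" and "1" split reported in the quoted message, and with a limit of 6 the barcode stays in one piece.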

Katherine Don wrote:
> Hi Stephen,
>
> This is done by MG: there's a maxnumeric variable, which defaults to 4. This was
> originally a #define, but has been changed so that it can be specified during
> indexing. There is now a -M option to mg_passes, which specifies the maximum
> number of digits allowed in a word.
> This can be used in your collection by adding
> maxnumeric 6 (or whatever)
> to the collect.cfg file.
>
> Stefan B, is that all that needs to be done? Will querying of the collection use
> this maxnumeric thingy too?
>
> Note, I think this is currently not available with mgpp (due to an oversight). I'll
> try and stick it in at some stage - if it's urgent, let me know.
>
> Katherine Don
>
>
>
>>>One problem: we have included our 'Barcode' metadata in the default index,
>>>but when we tried to search it, it weirdly split the search term "C10001":
>>>
>>>>Word count: C1000: 2, 1: 13
>>>>2 documents matched the query.
>>>
>>>We used the double quotes but it still split the term into 'C1000' and '1'.
>>>Any ideas what went wrong here? Does MG have problems with large numbers or
>>>other nontext characters?
>>
>>Greenstone does this on purpose for indexing numbers - it breaks them up
>>into 4-digit groups, otherwise page numbers etc could greatly increase
>>the size of the dictionary and lead to not-so-good compression.
>>
>>Unfortunately I couldn't find where this is done in the C++ code, so
>>hopefully someone who knows the code better than I do can tell you where
>>this happens.
>>
>>John
>>