Yup, set maxnumeric to some suitably large value in your collect.cfg
then rebuild the collection.
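To illustrate why the barcode in the quoted report was split, here is a rough Python sketch of the behaviour described in the thread. It is not MG's actual code: the function name split_terms and its exact rules are assumptions made for illustration, based only on the description that a term may hold at most maxnumeric digits.

```python
def split_terms(text, max_numeric=4):
    """Split text into index terms, allowing at most max_numeric digits per term.

    Illustrative only: this mimics the behaviour described in the thread,
    not the real MG parsing code.
    """
    if max_numeric < 1:
        raise ValueError("max_numeric must be at least 1")

    terms = []
    current = []   # characters of the term being built
    digits = 0     # number of digits already in the current term

    for ch in text:
        if not ch.isalnum():
            # Any non-alphanumeric character ends the current term.
            if current:
                terms.append("".join(current))
            current, digits = [], 0
            continue
        if ch.isdigit() and digits >= max_numeric:
            # Digit limit reached: close this term and start a new one.
            terms.append("".join(current))
            current, digits = [], 0
        current.append(ch)
        if ch.isdigit():
            digits += 1

    if current:
        terms.append("".join(current))
    return terms


if __name__ == "__main__":
    print(split_terms("C10001"))                  # default limit of 4 digits
    print(split_terms("C10001", max_numeric=6))   # raised limit
```

With the default limit of 4 the sketch splits "C10001" into "C1000" and "1", which matches the word count in the report below; with a limit of 6 the term stays whole.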
Katherine Don wrote:
> hi Stephen,
> this is done by MG - there's a maxnumeric variable, which defaults to 4. This was
> originally a #define, but has been changed so that it can be specified during
> indexing. There is now a -M option to mg_passes, which specifies the max number of
> digits allowed in a word.
> this can be used in your collection by adding
> maxnumeric 6 (or whatever)
> to the collect.cfg file.
> Stefan B, is that all that needs to be done? Will querying of the collection use
> this maxnumeric setting too?
> Note, I think this is currently not available with mgpp (due to an oversight). I'll
> try and stick it in at some stage - if it's urgent, let me know.
> Katherine Don
>>>One problem: we have included our 'Barcode' metadata in the default index,
>>>but when we tried to search it, it weirdly split the search term "C10001":
>>>>Word count: C1000: 2, 1: 13
>>>>2 documents matched the query.
>>>We used the double quotes but it still split the term into 'C1000' and '1'.
>>>Any ideas what went wrong here? Does MG have problems with large numbers or
>>>other nontext characters?
>>Greenstone does this on purpose for indexing numbers - it breaks them up
>>into 4-digit groups, otherwise page numbers etc could greatly increase
>>the size of the dictionary and lead to not-so-good compression.
>>Unfortunately I couldn't find where this is done in the C++ code, so
>>hopefully someone who knows the code better than I do can tell you where