RE: [greenstone-devel] MGPP (2.71 build problem)

From Emanuel Dejanu
Date Fri, 12 Jan 2007 10:01:52 +0200
Subject RE: [greenstone-devel] MGPP (2.71 build problem)
In-Reply-To (45A57FAC-1090608-cs-waikato-ac-nz)
Hi Michael,

The collection has 2283 documents encoded in UTF-8.
The total size of all doc.xml files is 235.5 MB.
The total size of the archives directory (which contains PDF files
and images) is 1.97 GB.

I do not have this problem when building a smaller collection
of about 300 to 400 documents.

My system:
- Windows XP SP2 (32bit)
- Pentium IV (HT enabled) 2.80GHz, 1.5GB of RAM
- 5 GB of free space on a SATA 150 HDD.

I can put the archives directory online (with the PDF files trimmed
to reduce its size), but I must have your word that the files will
not be made public or used for any purpose other than debugging
this problem.



searchtype form
indexes Title Language text allfields

levels document section

-----Original Message-----
From: Michael Dewsnip
Sent: Thursday, January 11, 2007 2:07 AM
To: Emanuel Dejanu
Subject: Re: [greenstone-devel] MGPP (2.71 build problem)

Hi Emanuel, Jens,

I've tried to reproduce this problem here but have been unable to do so.
How big are your collections, and what type of documents do they contain?
Any unusual encodings? Does the problem go away if you make the collection

All the best,


Emanuel Dejanu wrote:

>After upgrading greenstone from 2.53 to 2.71 I get the following error
>when building:
>numDocs: 485
>numChunkDocs: 202
>numDocsInChunk: 217
>numFrags: 5965381
>numFragsInChunk: 3230603
>chunkStartFragNum: 3228063
>num: 141838
>[num].start: 21245041
>[num].here: 21245127
>[num+1].start: 21245107
>mgpp_passes.exe : Bit buffer overrun
>and after that:
> create the weights file
>mgpp_weights_build.exe : The invf file contains skips. Unable to create
> creating 'on-disk' stemmed dictionary
>mgpp_invf_dict.exe : Unable to open
>"C:\Program Files\Greenstone\collect\unhcr\building\idx\unhcr.ii"
>At first I thought the problem was caused by my modifications to
>Greenstone, but I get the same error when I build with the
>Greenstone 2.71 release.
>So there is a problem with the changes made to mgpp between 2.53
>and 2.71.
>I build on Windows XP SP2 with ActivePerl 5.8.8.
>Can somebody take a look at my problem?
>Best regards,
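
For reference, the three offsets printed just before "Bit buffer overrun" can be compared directly. The sketch below is only an illustration. It assumes that [num].start, [num].here and [num+1].start are positions in the same units within one buffer, and that "here" is the current write position. The log does not confirm either assumption, and the variable and function names are made up for this example.

```python
# Values copied from the quoted mgpp_passes.exe output.
entry_start = 21245041       # [num].start
write_position = 21245127    # [num].here
next_entry_start = 21245107  # [num+1].start


def overrun_amount(here, next_start):
    """Return how far `here` lies past `next_start` (0 if it does not)."""
    return max(0, here - next_start)


# Assumption: the space reserved for entry `num` ends where entry num+1 starts.
reserved = next_entry_start - entry_start
written = write_position - entry_start
excess = overrun_amount(write_position, next_entry_start)

print(f"reserved={reserved} written={written} excess={excess}")
```

Under those assumptions, entry 141838 was given 66 units but 86 were written, so the write position ended 20 units inside the next entry.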
