Re: [greenstone-users] problem with building "large" collections

From Michael Dewsnip
Date Fri, 04 Aug 2006 10:36:30 +1200
Subject Re: [greenstone-users] problem with building "large" collections
In-Reply-To (44D26194-5010100-gmx-net)
Hi Jens,

There is a 2GB limit for GDBM files (i.e. the info database) -- are you
nearing this limit? (You mention 2GB, but I assume that is the total
size of the import documents, and the amount of metadata should be
somewhat less.)
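
If it helps, here is a rough way to check how close a database file is
to that limit. It is only a sketch: the 2 GiB threshold is the limit
mentioned above, and the function names and the 90% warning level are
placeholders of my own, not anything Greenstone provides. Point it at
whichever file holds the info database for your collection.

```python
import os

# Assumption: "2GB" is read as 2 GiB (2 * 1024**3 bytes).
GDBM_LIMIT_BYTES = 2 * 1024**3


def fraction_of_limit(size_bytes, limit=GDBM_LIMIT_BYTES):
    """Return the file size as a fraction of the limit (1.0 = at the limit)."""
    return size_bytes / limit


def classify(size_bytes, warn_at=0.9):
    """Label a size as 'ok' or 'near or over limit' (warn_at is arbitrary)."""
    if fraction_of_limit(size_bytes) >= warn_at:
        return "near or over limit"
    return "ok"


def check_db(path):
    """Look up the size of the file at `path` and report it against the limit."""
    size = os.path.getsize(path)
    return size, fraction_of_limit(size), classify(size)
```

Calling check_db() on the database file after a build that does succeed
(say, with five volumes) should give an idea of whether a sixth volume
would plausibly push it past the limit.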

It might not make any difference but it would be worth trying Greenstone
2.70w, just in case this is a bug we've fixed.

Regards,

Michael

jens wille wrote:

>hi there!
>
>i got a quite intricate problem here, hope somebody can help...
>
>here's what we're doing: from a digitised multi-volume encyclopedia
>(html + jpg) we build a collection using HTMLPlug under greenstone
>v2.62 primarily on a linux machine, but also tested on w2k3. until
>now eight volumes have been digitised, and of course all (and any
>further volumes) shall be contained in one greenstone collection.
>
>this is where the problem comes in: building a collection of up to
>five volumes usually works fine on the linux box, but adding only
>one more to the list and trying to build will fail during the infodb
>phase of buildcol.pl - without giving any error message, it just
>backs out! and that's what i don't have any explanation for (i just
>read on the mailing list today that there definitely exist
>greenstone collections of umpteen gigabytes, and ours isn't that
>large at all [~2GB for five volumes]) - does anybody have *any* idea?
>
>of course, i can give more details if desired. our current
>collection is available at [1] and log and configuration files can
>be found there, too [2]. in case this might help, i can also give
>access to my perl script that wraps up all the action.
>
>[1]
><http://linux2.fbi.fh-koeln.de/gsdl/cgi-bin/library?a=p&p=about&c=rdk-dev2&ct=1&qto=3&qt=1&l=en>
>[2] <http://linux2.fbi.fh-koeln.de/gsdl/collect/rdk-dev2/>
>
>finally, these are my observations so far:
>
>- linux [P4 2.4GHz, 1GB RAM]:
> * 1-5 volumes (in any combination) OK
> * 6 volumes (tried several combinations) *NOK*
> - no error messages (just returns with an error code indicating
> failure)
> - strace has shown a SIGSEGV in /some/ cases (but i don't know
> of any commonalities which _all_ failures might have shared -
> apart from the aforementioned fact that all crashes happened
> at the end of the build process when creating the info db)
> - top shows that buildcol.pl soaks up all the memory (3GB in
> total) while kswapd (consequently) utilizes most of the CPU
> power
>
>- w2k3 [Xeon 3.6GHz, 512 MB RAM; virtual machine]:
> * 1 volume OK
> * 4-6 volumes *NOK*
> - same result as above (except that i don't have an strace at
> hand ;-)
>
>TIA for any help, i'm pretty lost by now! (i've been struggling with
>this issue for several months now...)
>
>cheers
>jens