[greenstone-users] problem with building "large" collections

From: jens wille
Date: Thu, 03 Aug 2006 22:50:28 +0200
Subject: [greenstone-users] problem with building "large" collections

hi there!

i got a quite intricate problem here, hope somebody can help...

here's what we're doing: from a digitised multi-volume encyclopedia
(html + jpg) we build a collection using HTMLPlug under greenstone
v2.62 primarily on a linux machine, but also tested on w2k3. until
now eight volumes have been digitised, and of course all (and any
further volumes) shall be contained in one greenstone collection.

this is where the problem comes in: building a collection of up to
five volumes usually works fine on the linux box, but adding only
one more to the list and trying to build will fail during the infodb
phase of buildcol.pl - without giving any error message, it just
backs out! and that's what i don't have any explanation for (i just
read on the mailing list today that there definitely exist
greenstone collections of umpteen gigabytes, and ours isn't that
large at all [~2GB for five volumes]) - does anybody have *any* idea?

of course, i can give more details if desired. our current
collection is available at [1] and log and configuration files can
be found there, too [2]. in case this might help, i can also give
access to my perl script that wraps up all the action.

[1]
<http://linux2.fbi.fh-koeln.de/gsdl/cgi-bin/library?a=p&p=about&c=rdk-dev2&ct=1&qto=3&qt=1&l=en>
[2] <http://linux2.fbi.fh-koeln.de/gsdl/collect/rdk-dev2/>

finally, these are my observations so far:

- linux [P4 2.4GHz, 1GB RAM]:
  * 1-5 volumes (in any combination) OK
  * 6 volumes (tried several combinations) *NOK*
    - no error messages (just returns with an error code indicating
      failure)
    - strace has shown a SIGSEGV in /some/ cases (but i don't know
      of any commonalities which _all_ failures might have shared -
      apart from the aforementioned fact that all crashes happened
      at the end of the build process when creating the info db)
    - top shows that buildcol.pl soaks up all the memory (3GB in
      total) while kswapd (consequently) utilizes most of the CPU
      power

- w2k3 [Xeon 3.6GHz, 512 MB RAM; virtual machine]:
  * 1 volume OK
  * 4-6 volumes *NOK*
    - same result as above (except that i don't have an strace at
      hand ;-)
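
to tell a plain failure code apart from a crash by signal (such as the
SIGSEGV seen under strace), the build could be wrapped along these
lines. this is only a rough sketch in python, not the perl wrapper
mentioned above; the helper names describe_exit and run_and_describe
are made up for illustration, and it assumes a POSIX system, where
python reports death-by-signal as a negative return code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


def run_and_describe(cmd):
    """Run cmd (a list of arguments), let its output pass through
    unchanged, and describe how the process ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

for example, run_and_describe(["buildcol.pl", "rdk-dev2"]) would
report "killed by SIGSEGV" for a segfault, assuming buildcol.pl is on
the PATH and the greenstone environment has already been set up.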

TIA for any help, i'm pretty lost by now! (i've been struggling with
this issue for several months now...)

cheers
jens