[greenstone-users] Out of Memory error in buildcol.pl

From: Arthur R. Belanger
Date: Thu Feb 28 02:52:41 2008
Subject: [greenstone-users] Out of Memory error in buildcol.pl

I am building a collection of MARCXML records. I designed and built the
collection with a small number of records (186) to make sure everything worked
before loading a much larger set of 500,000. I have been rebuilding the
collection with progressively larger numbers of records. My last rebuild, with
162,315 records, failed during the "creating the info database and processing
associated files" phase of buildcol.pl.

I had seen this error with a larger set of 286,000 records, which is why I
reduced the number and started building it back up. I am running GS 2.75 on
Red Hat Linux AS4. I originally had just 1 GB of RAM and 2 GB of swap space. I
increased the swap space to 6 GB and got the same error. I then added another
4 GB of RAM and still get the error.

I had been using the mgpp buildtype and switched to lucene to see whether the
-incremental option to buildcol.pl would help; it didn't. I suspect the problem
has something to do with the way the txt2db program works.

Is there anything I can do short of splitting the records into several smaller
collections, grouped as a super collection with cross-collection searching?
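If splitting does turn out to be the fallback, the records first have to be
divided into batches. The following is only a minimal sketch, not a Greenstone
tool: `split_marcxml` and `write_batch` are hypothetical helpers, and it assumes
the records sit in one MARCXML file with a single <collection> root holding
<record> elements (normally in the MARC21 "slim" namespace). It streams the file
with ElementTree's iterparse so that only one batch is in memory at a time. It
does not address the buildcol.pl memory error itself.

```python
import os
import xml.etree.ElementTree as ET

# Standard MARCXML namespace (MARC 21 "slim" schema).
MARC_NS = "http://www.loc.gov/MARC21/slim"
# Serialize MARC elements with a default xmlns rather than an "ns0:" prefix.
ET.register_namespace("", MARC_NS)


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def write_batch(root_tag, records, path):
    """Hypothetical helper: write <record> elements under a fresh root element."""
    collection = ET.Element(root_tag)
    collection.extend(records)
    ET.ElementTree(collection).write(path, encoding="utf-8", xml_declaration=True)


def split_marcxml(src_path, out_dir, batch_size):
    """Hypothetical helper: stream src_path into files of <= batch_size records.

    Assumes a single <collection> root whose children are <record> elements.
    Returns the list of files written, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    batch = []
    root = None
    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the outer <collection> element
            continue
        if local_name(elem.tag) != "record":
            continue
        batch.append(elem)
        # Detach finished records from the source tree; the batch list keeps
        # the only remaining references, so memory stays bounded per batch.
        root.clear()
        if len(batch) == batch_size:
            path = os.path.join(out_dir, f"batch_{len(written) + 1:04d}.xml")
            write_batch(root.tag, batch, path)
            written.append(path)
            batch = []
    if batch:  # leftover records that did not fill a whole batch
        path = os.path.join(out_dir, f"batch_{len(written) + 1:04d}.xml")
        write_batch(root.tag, batch, path)
        written.append(path)
    return written
```

Each output file could then go into the import directory of a separate
collection; whether Greenstone's MARCXMLPlug accepts these files unchanged is an
assumption that would need checking.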

If it is of any help, here is part of the collect.cfg file:

buildtype lucene

indexes Title Subject Creator Keyword
defaultindex Title
indexoptions accentfold casefold stem

levels document
defaultlevel document

classify AZCompactList -metadata Subject -sort Title -buttonname "Subjects A-Z"
classify Hierarchy -hlist_at_top -metadata Subject -sort Title -buttonname "Subject Trees"
classify AZCompactList -metadata Title -mingroup 2
classify AZCompactList -metadata Creator -buttonname Authors

plugin MARCXMLPlug
plugin GAPlug
plugin ArcPlug
plugin RecPlug


Here are the last few lines of output from buildcol.pl:
GAPlug: processing HASH0141.dir/doc.xml
GAPlug: processing HASH8fc0.dir/doc.xml
GAPlug: processing HASH79ad.dir/doc.xml
GAPlug: processing HASH01e9.dir/doc.xml
Out of memory!
Out of memory!
Out of memory!
Callback called exit.
END failed--call queue aborted.
Out of memory!

Thanks in advance for any light you can shed on this problem.
Arthur Belanger
Medical Library System Manager
ITS Academic Media & Technology
Yale University
PO Box 208065
New Haven, CT 06520-8065

(203) 785-6928
(203) 737-2859, fax