I am building a collection of MARCXML records. I've designed and built the
collection with a small number of records, 186, to make sure everything works
before loading a much larger set, 500,000. I've been rebuilding the collection
with larger and larger numbers of records. My last rebuild had 162,315 records
and failed during the "creating the info database and processing associated
files" phase of buildcol.pl.
I had seen this with a larger set, 286,000, which is why I reduced the number
and started building it up. I am running GS 2.75 on Red Hat Linux AS4. I
originally had just 1 GB of RAM and 2 GB of swap space. I increased the swap
space to 6 GB and got the same error. I added another 4 GB of RAM and still get
the same error.
I had been using the mgpp buildtype and switched to lucene to see if the
-incremental option to buildcol.pl would help; it didn't. I suspect this has
something to do with the way the txt2db program works.
Is there anything I can do short of making several smaller collections as part
of a super collection and doing cross-collection searching?
If it is of any help, here is part of the collect.cfg file:
indexes Title Subject Creator Keyword
indexoptions accentfold casefold stem
classify AZCompactList -metadata Subject -sort Title -buttonname
classify Hierarchy -hlist_at_top -metadata Subject -sort Title -buttonname "Subject Trees"
classify AZCompactList -metadata Title -mingroup 2
classify AZCompactList -metadata Creator -buttonname Authors
Thanks in advance for any light you can shed on this problem.
Medical Library System Manager
ITS Academic Media & Technology
PO Box 208065
New Haven, CT 06520-8065
(203) 737-2859, fax