Re: [greenstone-users] problem with building "large" collections [SOLVED! - not really]

From jens wille
Date Wed, 16 Aug 2006 08:38:25 +0200
Subject Re: [greenstone-users] problem with building "large" collections [SOLVED! - not really]
In-Reply-To (EE1AF18EB3686C459A38EBECE3BD51988D674D-ex1-its-waikato-ac-nz)
hi james!

James Brunskill [15.08.2006 23:04]:
> I'm not sure if this helps, but I am building my 44GB collection
> on a machine with 2GB real ram + 2GB Swap, I don't run out of
> memory at all...
good to hear that ;-)

> The files I import are OCR'd PDFs so they are basically a high
> resolution image + some text hidden behind. This means that the
> files are a lot larger than the text content they contain, if
> your 2GB collection is mainly text you could potentially have a
> lot more index data than me.
well, our collection contains far more text: the html and
metadata.xml files that get imported amount to 550M + 150M for six
volumes - nearly pure text! i wonder if there are other collections
with a comparable amount of imported metadata.

here's a more detailed listing (still for six volumes, gs v2.62):

- import: 550M (~90-100M per volume) + 150M (~25-30M per volume)

- archives: 700M (nearly the same amount as the import directory)

- index: 400M
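the per-volume averages in that listing can be sanity-checked with a
quick bit of shell arithmetic (just a sketch using the totals quoted
above; the variable names are my own):

```shell
# totals from the listing: 550M of html + 150M of metadata.xml
# spread across six volumes
volumes=6
html_total=550      # MB
metadata_total=150  # MB
echo "html:     $((html_total / volumes))M per volume"
echo "metadata: $((metadata_total / volumes))M per volume"
```

this gives roughly 91M of html and 25M of metadata per volume, which
matches the ~90-100M figure and puts the metadata at about 25-30M.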

> Are you running the build from within GLI? I have found the
> collection building happens much faster if I run it from the
> commandline. I'm guessing it is because there is more "real" ram
> available.
sorry, i didn't mention that, but i *always* build on the
command-line ;-)
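for anyone following along, a command-line build in greenstone 2
typically looks something like the sketch below ("mycol" is a
placeholder collection name; it assumes you have sourced the
greenstone setup script first so that the perl scripts are on your
path):

```shell
# set up the greenstone environment (from $GSDLHOME)
source setup.bash

# convert the source documents into the archives directory,
# discarding any previous import results
import.pl -removeold mycol

# build the indexes from the archives directory
buildcol.pl mycol
```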

thanks & cheers