Re: [greenstone-users] Database limit

From: Stefan Boddie
Date: Tue, 14 Oct 2003 11:09:47 +1300
Subject: Re: [greenstone-users] Database limit
In-Reply-To: (Pine-LNX-4-33-0310140137150-1132-100000-mmsl-serc-iisc-ernet-in)
Hi Anandh,


>
> I want to build 1.5 TB data (more than 29,000 books and thesis) into
> greenstone. Is it possible? If so, how long will it take to import and
> build.? System details i've given below.
>

Do you really have 1.5 TB of text, or is it 1.5 TB of PostScript, PDF, or
some such?

> i have tried for 100 MB data (Text files), it took 340 minutes and 19
> seconds to finish.
>
> I have tried it in windows 2000 professional OS, 512MB RAM,and Pentium 4.
> I used gsdl 2.4 greenstone version.
>
> If it is not enough then what all resources i need to build that
> much huge amount of data, into greenstone?
>

Firstly, there's a discussion of some of the theoretical limits on the size
of a Greenstone collection at
http://www.sadl.uleth.ca/nz/cgi-bin/library?a=d&c=gsarch&d=HASHf26df1aa702a90bc16ae37s95.

If you really do have 1.5 TB of text you will undoubtedly run up against one
or more of these limitations. The way to proceed may well be to split your
data up and create multiple collections from it.
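
As a rough illustration of splitting the data up, here is a minimal Python
sketch that groups files into batches, one batch per collection. It is not
part of Greenstone: the function name, the (name, size) input format and the
size cap are all hypothetical, and the cap would have to come from whichever
of the limits discussed at the link above applies to you.

```python
def split_into_collections(files, max_bytes):
    """Greedily group (name, size_in_bytes) pairs into batches.

    Each batch stays at or under max_bytes, except that a single file
    larger than max_bytes ends up in a batch of its own.
    Hypothetical helper, not a Greenstone function.
    """
    groups, current, current_size = [], [], 0
    for name, size in files:
        # Start a new batch if adding this file would exceed the cap.
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Made-up file names and sizes, purely for illustration.
example_files = [("a.txt", 400), ("b.txt", 500), ("c.txt", 300), ("d.txt", 900)]
print(split_into_collections(example_files, max_bytes=1000))
```

Each resulting batch would then be imported and built as its own collection.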

Generally speaking, Greenstone isn't heavy on RAM, so you could build a
collection as big as you like using your existing system (ignoring for the
moment the limitations mentioned above). It could take a long, long time
though. If it took you 6 hours to build 100 MB of text, you could make a rough
guess at 60 hours to build 1 GB, or nearly 7 years to build 1 TB! You might
therefore consider getting the fastest machine you can lay your hands on and
building on Linux instead of Windows.
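
The rough guess above can be reproduced with a few lines of Python. This is
only a sketch: it assumes build time scales linearly with the amount of text
(which may well not hold at this size), it uses decimal units (1 GB = 1000 MB),
and the function name is made up for illustration.

```python
def estimate_build_hours(sample_mb, sample_minutes, target_mb):
    """Linearly scale a measured build time to a larger amount of data.

    Assumes build time is proportional to data size, which is only a
    rough guess and not a measured property of Greenstone.
    """
    minutes_per_mb = sample_minutes / sample_mb
    return minutes_per_mb * target_mb / 60.0


# Measured run reported in this thread: 100 MB in 340 minutes 19 seconds.
SAMPLE_MB = 100
SAMPLE_MINUTES = 340 + 19 / 60

MB_PER_GB = 1000              # decimal units, an assumption
MB_PER_TB = 1000 * MB_PER_GB

for label, size_mb in [("1 GB", MB_PER_GB),
                       ("1 TB", MB_PER_TB),
                       ("1.5 TB", 1.5 * MB_PER_TB)]:
    hours = estimate_build_hours(SAMPLE_MB, SAMPLE_MINUTES, size_mb)
    years = hours / (24 * 365)
    print(f"{label}: {hours:,.0f} hours (about {years:.1f} years)")
```

Using the exact measured time rather than a rounded 6 hours gives slightly
lower figures, but the conclusion is the same: years, not days.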

We've not previously built collections any bigger than a few GB of text, so
you're in uncharted territory here. All you can really do is try to build
it and see if it breaks. Let us know how you get on.

Stefan.