[greenstone-devel] large collections

From Richard Managh
Date Tue Dec 11 11:58:39 2007
Subject [greenstone-devel] large collections
In-Reply-To (f659bc6d46fb-475572a1-ucy-ac-cy)
Hi Marios Zervas,

I believe that one of the GDBM files created by the collection building
process has a 2GB file size limit (GDBM is the database Greenstone uses
to store a collection's metadata). So it really depends on how much
metadata you have in your collection. It is quite possible to have a
collection with 4TB of data but far less than 2GB of metadata. For
example, you might have video files in your collection, which are
massive compared with the small amount of metadata associated with
them. Even large image files take up a lot of space compared to the
amount of metadata associated with them.
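
One rough way to see how close a built collection is to that limit is to
look at the size of the database file on disk. The sketch below is only
an illustration: it reads "2GB" as 2**31 bytes, and the file path and
the 90% warning threshold are my own assumptions, not anything
Greenstone defines.

```python
import os

# Assumed limit: "2GB" read as 2**31 bytes (an assumption, not a documented constant).
GDBM_LIMIT_BYTES = 2**31


def limit_fraction_used(size_bytes, limit_bytes=GDBM_LIMIT_BYTES):
    """Return the fraction of the assumed limit that size_bytes occupies."""
    return size_bytes / limit_bytes


def check_database_file(path, warn_fraction=0.9):
    """Classify the file at `path` against the assumed limit.

    `path` is hypothetical: point it at the collection's GDBM file.
    `warn_fraction` is an arbitrary threshold chosen for illustration.
    """
    used = limit_fraction_used(os.path.getsize(path))
    if used >= 1.0:
        return "over limit"
    if used >= warn_fraction:
        return "near limit"
    return "ok"
```

Running check_database_file on the metadata database after a build
would give an early warning before the limit is actually reached.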

As far as indexed text goes, I believe there is a limit of 2GB of
compressed indexed text if you are building your collection with the
mg or mgpp indexer. If this is an issue, you can switch to the lucene
indexer, which has no limit on indexed text that I'm aware of. I
believe a collection with 13 million documents and 87GB of text has
been built with the lucene indexer.
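
The choice of indexer described above comes down to a simple
comparison. This is a minimal sketch, assuming you already have an
estimate of the compressed indexed text size; the function name and the
reading of "2GB" as 2**31 bytes are my own assumptions.

```python
# Assumed limit for mg/mgpp: "2GB" of compressed indexed text, read as 2**31 bytes.
MG_TEXT_LIMIT_BYTES = 2**31


def suggest_indexer(compressed_text_bytes, limit_bytes=MG_TEXT_LIMIT_BYTES):
    """Suggest an indexer from an estimate of compressed indexed text size.

    Hypothetical helper: returns "lucene" once the estimate reaches the
    assumed mg/mgpp limit, otherwise "mg or mgpp".
    """
    if compressed_text_bytes >= limit_bytes:
        return "lucene"
    return "mg or mgpp"


# Example estimates (made-up sizes, for illustration only).
small = suggest_indexer(500 * 1024**2)   # about 500MB of compressed text
large = suggest_indexer(5 * 1024**3)     # about 5GB of compressed text
```

Here small comes out as "mg or mgpp" and large as "lucene".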

For more information see:
http://wiki.greenstone.org/wiki/index.php/Building_Greenstone_collections#Are_there_any_limits_to_the_size_of_collections.3F



DL Consulting
Greenstone Digital Library and Digitisation Specialists

Marios Zervas wrote:

>Dear all,
>We have a large collection of 4 million PDF documents. Do you think it will be possible to create a collection with 4TB of data?
