Re: [greenstone-users] restriction on number of collections?

From: Michael Dewsnip
Date: Fri, 22 Sep 2006 12:16:44 +1200
Subject: Re: [greenstone-users] restriction on number of collections?
In-Reply-To: (1158288004-2795-27-camel-develop-sytes-net)

Hi Colin,

>Is there a restriction on the number of collections you can service from
>one Greenstone installation? Does anyone have experience with servicing
>a large number of collections, say 30?
Having many collections will have a small effect on the speed of page
responses, because Greenstone reads the collect.cfg file of each
collection. I doubt that this effect would be noticeable unless you had
hundreds of collections, however.
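To make the per-collection cost concrete, here is a minimal Python sketch of that kind of work: walking a collect directory and reading each collection's etc/collect.cfg. The function name, the directory layout, and the simple "directive value" parsing are assumptions for illustration, not Greenstone's actual code; the point is only that the work grows linearly with the number of collections.

```python
import os


def load_collection_configs(collect_dir):
    """Read <collection>/etc/collect.cfg for every collection in collect_dir.

    Hypothetical sketch: one file is opened and parsed per collection, so
    the total work grows linearly with the number of collections.
    """
    configs = {}
    for name in sorted(os.listdir(collect_dir)):
        cfg_path = os.path.join(collect_dir, name, "etc", "collect.cfg")
        if not os.path.isfile(cfg_path):
            continue  # not a collection directory, skip it
        settings = {}
        with open(cfg_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                parts = line.split(None, 1)  # "directive value", space or tab
                value = parts[1] if len(parts) > 1 else ""
                # Directives such as plugin lines can repeat, so keep a list.
                settings.setdefault(parts[0], []).append(value)
        configs[name] = settings
    return configs
```

With 30 or so collections this is 30 small file reads, which is why the overhead is unlikely to be noticeable at that scale.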

>Does having multiple collections
>rather than one consolidated collection dramatically slow searching?
I assume you are talking about cross-collection searching here. The way
this works is that each collection is searched individually, and then
the results are merged to get the final set of search results. This will
certainly be slower than searching one complete collection, but I'm not
sure by how much.
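Roughly, the search-then-merge approach looks like the following Python sketch. The search-function interface and the assumption that scores from different collections are directly comparable are hypothetical; this is not Greenstone's actual implementation. It shows where the extra cost comes from: every collection is queried, and the merge runs on top of that.

```python
import heapq
from itertools import islice


def cross_collection_search(collections, query, limit=10):
    """Search each collection separately, then merge the hits by score.

    `collections` maps a collection name to a (hypothetical) search function
    returning (score, doc_id) pairs already sorted by descending score.
    """
    per_collection = []
    for name, search in collections.items():
        hits = search(query)  # one full search per collection
        per_collection.append([(score, name, doc_id) for score, doc_id in hits])
    # Each input list is sorted high-to-low, so a lazy k-way merge suffices.
    merged = heapq.merge(*per_collection, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))
```

heapq.merge only needs each per-collection list to be sorted already, so the merge itself is cheap; most of the extra time goes into the individual searches.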

Having one collection will be faster and a lot simpler to maintain, so
this is the ideal situation if you can manage it (see below).

>I read the archives and there is mention here and there of multiple
>collections, but not in this way.
>I am looking to set up one collection per day, six days per week with
>all data from before the Current Month being in one 'master' collection.
>At the end of the Current Month, the Current Month's 30-odd collections
>would be added to the master, and we start again with a new Current
>month. So I would have 32 collections maximum in theory.
>This will stop re-building the total collection on a daily basis to
>reduce file transfer size and time to the server; time is important as
>we have a so-so link and the rebuild makes all collections off-line for
>the duration.
Your collection shouldn't be off-line while it is building. When a
collection is rebuilt, the new version is built in the collection's
"building" directory; the live "index" directory is not touched. To
install the new version, the old index directory is moved out of the
way and the building directory is renamed to "index". I wouldn't expect
your collection to be off-line for more than a couple of seconds.
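The install step amounts to two directory renames. Here is a minimal Python sketch, assuming the build has already finished writing into building/; the install_new_index function and the index.old name are hypothetical, not what Greenstone itself uses.

```python
import os
import shutil


def install_new_index(collection_dir):
    """Swap a finished build into place (hypothetical sketch).

    Assumes the build wrote its output to <collection_dir>/building while
    <collection_dir>/index stayed live and searchable.
    """
    building = os.path.join(collection_dir, "building")
    index = os.path.join(collection_dir, "index")
    old = os.path.join(collection_dir, "index.old")  # hypothetical name

    if not os.path.isdir(building):
        raise FileNotFoundError(f"no finished build in {building}")

    if os.path.exists(old):
        shutil.rmtree(old)  # clear leftovers from an earlier swap
    if os.path.isdir(index):
        os.rename(index, old)  # move the live index out of the way
    os.rename(building, index)  # the new build goes live
    shutil.rmtree(old, ignore_errors=True)  # discard the previous version
```

The collection is only without an index in the instant between the two rename calls, which is why the downtime should be seconds at most.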

I assume that each day you are only uploading the new files for the day?
And the collection takes less than a day to rebuild? If these are both
true then I think daily rebuilds should be possible.
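If the daily upload should contain only the new files, one simple approach is to select files modified since the previous upload. This Python sketch assumes modification times are a reliable signal and that the time of the last upload is recorded somewhere; files_changed_since is a hypothetical helper, not a Greenstone tool.

```python
import os


def files_changed_since(source_dir, since_timestamp):
    """Return sorted paths under source_dir modified after since_timestamp.

    since_timestamp is seconds since the epoch, e.g. when the last upload ran.
    """
    changed = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > since_timestamp:
                changed.append(path)
    return sorted(changed)
```

Daily rebuilds then stay feasible as long as the upload time plus the rebuild time fits comfortably inside one day.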

All the best,