Re: [greenstone-users] AZLists in very large collections

From Rene Schrama
Date Tue, 02 Dec 2003 11:34:29 +0100
Subject Re: [greenstone-users] AZLists in very large collections

Hello Stefan,

I'm afraid it's not going to work at all. I tested 12.5% of the
collection and the build process completed in less than half an hour, so
the entire collection should complete in about 4 hours, right? Wrong! It
was still running after 24 hours and had displayed an "Out of memory!"
message (though the process kept running). I had a fixed paging file
(Windows XP) of 1 GB, so I changed the maximum size to 4096 MB and
rebuilt the collection. Next day: out of memory, still running, and a
Windows message saying the minimum paging file size was too small.
Windows increased it, but during that process some memory requests were
denied (according to the message). The paging file was 1.2 GB at the
time. The out-of-memory error occurred during list processing, just
after the final index phase but before the auxiliary files processing
(which was never reached). It would seem that the thesaurus (hierarchy
classifier) was the main cause of the problem, because processing time
increased sharply after it was added, so I don't think adding another
hierarchy classifier will do any good.
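
The gap between the 4-hour estimate and the 24+ hours observed is what
one would expect if build time grows faster than linearly with the
number of documents. A minimal sketch of that arithmetic follows; the
function name and the quadratic exponent are assumptions chosen purely
for illustration, not measured properties of the Greenstone build.

```python
def extrapolate_hours(sample_hours, sample_fraction, exponent):
    """Scale a measured build time up to the full collection.

    exponent=1 assumes time grows linearly with document count;
    exponent=2 assumes quadratic growth. Both are illustrative
    assumptions, not measurements.
    """
    scale = 1.0 / sample_fraction  # 12.5% sample -> factor of 8
    return sample_hours * scale ** exponent


# Figures from the message: 12.5% of the collection in about half an hour.
linear = extrapolate_hours(0.5, 0.125, 1)
quadratic = extrapolate_hours(0.5, 0.125, 2)
print(f"linear: {linear:.1f} h, quadratic: {quadratic:.1f} h")
```

Under the linear assumption the estimate is 4 hours; under the
quadratic one it is 32 hours, which is at least in the range of a build
that is still running after a day. Running out of physical memory and
falling back on the paging file could slow things down further.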

As for the solution, I am considering the following options:
1. Increase physical memory from 256 MB to 512 MB or even 1 GB and use
subcollections (will this also split up the AZLists?)
2. Create separate collections (e.g. Dutch, English, French, German)
and use cross-collection searching (problem: export to CD)
3. Drop some of the AZLists

Any ideas, comments, advice?


>>> "Stefan Boddie" <> 01-12-2003 21:28:37
Hi Rene,

You're right that AZLists and most other classifiers don't scale well
to large collections. You might need to resort to setting up a Hierarchy
classifier to nest your documents in a more complex structure.
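
The idea of nesting documents more deeply so that no single page grows
too large can be sketched as follows. This is only an illustration of
the general technique, not Greenstone's classifier code; the function
name `bucket_titles` and the `max_per_page` parameter are hypothetical.

```python
from collections import defaultdict


def bucket_titles(titles, max_per_page=200, depth=1):
    """Group titles by their first `depth` characters (case-insensitive).

    Any group larger than max_per_page is subdivided again using one
    more character of the prefix, so crowded letters get sub-pages.
    """
    groups = defaultdict(list)
    for title in titles:
        groups[title[:depth].upper()].append(title)

    pages = {}
    for prefix, members in sorted(groups.items()):
        # Stop when the group fits on a page, or when no title is long
        # enough for a longer prefix to split the group any further.
        if len(members) <= max_per_page or all(len(t) <= depth for t in members):
            pages[prefix] = sorted(members)
        else:
            pages.update(bucket_titles(members, max_per_page, depth + 1))
    return pages


titles = ["Apple", "Apricot", "Avocado", "Banana", "Blueberry"]
pages = bucket_titles(titles, max_per_page=2)
for prefix, members in pages.items():
    print(prefix, members)
```

With a page limit of 2, the three "A" titles are split into "AP" and
"AV" sub-pages, while the two "B" titles stay on a single page.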


----- Original Message -----
From: "Rene Schrama" <>
To: <>
Sent: Monday, December 01, 2003 10:49 PM
Subject: [greenstone-users] AZLists in very large collections

> Hi,
> I just built a collection of about 6500 documents, which is about
> of the entire collection which consists of about 52000 documents.
> problem is that the pages of the AZLists are already a bit chunky
> after the entire collection is built they will be huge, e.g. the
> list will have about 2000 titles on one page. Did anyone try this
> before, and are there any known solutions for this problem?
> Rene