Re: [greenstone-users] AZLists in very large collections

From Katherine Don
Date Wed, 03 Dec 2003 08:49:09 +1300
Subject Re: [greenstone-users] AZLists in very large collections
In-Reply-To (sfcc78ec-007-apps-niwi-knaw-nl)
Hi Rene

I can only help with a tiny part of this. In the next release of
Greenstone, which should be ready by next week, the export to CD facility
will work with more than one collection, so your suggestion no. 2 of using
cross-collection searching will be fine for CD.

regards,
Katherine Don

Rene Schrama wrote:

> Hello Stefan,
>
> I'm afraid it's not going to work at all. I tested 12.5% of the
> collection and the build process completed in less than half an hour, so
> the entire collection should complete in 4 hours, right? Wrong! It was
> still running after 24 hours and displayed an "Out of memory!" message
> (but still running). I had a fixed paging file (Windows XP) of 1 Gb so I
> changed the max size to 4096 and rebuilt the collection. Next day: out
> of memory, still running and a Windows message about the min paging file
> size being too small. It was increased by Windows but during that
> process some memory requests were denied (according to the message). The
> size of the paging file was 1.2 Gb at the time. The out of memory error
> occurred during the list processing, just after the final index phase but
> before the auxiliary files processing (which was never reached). It
> would seem that the thesaurus (hierarchy classifier) was the main cause
> of the problem because processing time increased exponentially after it
> was added, so I don't think adding another hierarchy classifier will do
> any good.
>
> As for the solution, I am considering the following options:
> 1. Increase physical memory from 256 Mb to 512 Mb or even 1 Gb and use
> subcollections (will this also split up the AZLists??)
> 2. Create separate collections (e.g. Dutch, English, French, German)
> and use cross-collection searching (problem: export to CD)
> 3. Drop some of the AZLists
>
> Any ideas, comments, advice?
>
> Rene
>
> >>> "Stefan Boddie" <sjboddie@cs.waikato.ac.nz> 01-12-2003 21:28:37
> >>>
> Hi Rene,
>
> You're right that AZLists and most other classifiers don't scale well to
> large collections. You might need to resort to setting up a Hierarchy
> classifier to nest your documents in a more complex structure.
>
> Stefan.
>
> ----- Original Message -----
> From: "Rene Schrama" <Rene.Schrama@niwi.knaw.nl>
> To: <greenstone-users@list.scms.waikato.ac.nz>
> Sent: Monday, December 01, 2003 10:49 PM
> Subject: [greenstone-users] AZLists in very large collections
>
> > Hi,
> >
> > I just built a collection of about 6500 documents, which is about 12%
> > of the entire collection which consists of about 52000 documents. The
> > problem is that the pages of the AZLists are already a bit chunky but
> > after the entire collection is built they will be huge, e.g. the title
> > list will have about 2000 titles on one page. Did anyone try this
> > before, and are there any known solutions for this problem?
> >
> > Rene
> >
> >
> > _______________________________________________
> > greenstone-users mailing list
> > greenstone-users@list.scms.waikato.ac.nz
> > https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
> >
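
The page-size problem Rene describes (about 2000 titles on one page of the
title list) comes from putting every title that shares an initial letter on
a single page. One general way around it is to cut the sorted list into
pages with a fixed upper bound and label each page by the range it covers.
The sketch below only illustrates that idea. It is not Greenstone's AZList
code, and the function name split_into_pages and the max_per_page parameter
are hypothetical names chosen for this example.

```python
def split_into_pages(titles, max_per_page):
    """Sort titles case-insensitively and cut them into consecutive pages.

    Each page holds at most max_per_page titles and is labelled with the
    first two characters of its first and last title (e.g. "AP-AV").
    Hypothetical helper for illustration; not part of Greenstone.
    """
    if max_per_page < 1:
        raise ValueError("max_per_page must be at least 1")

    ordered = sorted(titles, key=str.casefold)
    pages = []
    for start in range(0, len(ordered), max_per_page):
        chunk = ordered[start:start + max_per_page]
        label = f"{chunk[0][:2].upper()}-{chunk[-1][:2].upper()}"
        pages.append((label, chunk))
    return pages


if __name__ == "__main__":
    sample = ["banana", "apple", "cherry", "avocado", "blueberry"]
    for label, chunk in split_into_pages(sample, 2):
        print(label, chunk)
```

With a cap of 200 titles per page, a 52000-document collection would be
spread over 260 pages instead of roughly one page per letter. The trade-off
is that readers navigate by range labels rather than by single letters.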