[greenstone-devel] Mega test - Statistics!!!

From Diego Spano
Date Fri, 11 Feb 2005 14:20:26 -0300
Subject [greenstone-devel] Mega test - Statistics!!!
In-Reply-To (41EDB23F-2090505-cs-waikato-ac-nz)
We have finally completed the test, indexing one million pages into a
single collection. Perhaps this information will help someone.

Greenstone version: 2.52
Server: Pentium IV, 512 MB RAM, Windows XP Professional
Number of indexed documents: 17655
Number of images (TIFF format): 980,000
Total size of text files: 3.2 GB
Built indexes: section:text document:Title
Plugin used: PagedImgPlug
Classifiers: 5

** Some statistics **
Time to build the collection (the import was done in several steps):
almost 24 hours
Time to open a hierarchy node that contains 908 objects: 23 seconds
Average time to search for a single word in the text index: 2 to 5 seconds
Average time to search for 3 words in the text index: 2 to 5 seconds
Average time to search for exact phrases (of 4, 5 and 6 words): 30
seconds
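
As a rough sanity check on the figures above, the following Python sketch
derives the average number of pages per document and a lower bound on build
throughput. It assumes that each TIFF image is one page and that the build
took a full 24 hours (the message only says "almost 24 hours", and it is not
clear whether that figure includes the import steps), so the results are
approximations rather than measured values.

```python
# Figures reported in the message.
PAGES = 980_000        # TIFF images; assumed to be one page each
DOCUMENTS = 17_655     # indexed documents

# Assumption: "almost 24 hours" is treated as a full 24 hours, so the
# throughput below is a lower bound.
BUILD_HOURS = 24


def pages_per_document(pages: int, documents: int) -> float:
    """Average number of page images per indexed document."""
    return pages / documents


def pages_per_second(pages: int, hours: float) -> float:
    """Average build throughput in pages per second."""
    return pages / (hours * 3600)


if __name__ == "__main__":
    print(f"{pages_per_document(PAGES, DOCUMENTS):.1f} pages per document")
    print(f"{pages_per_second(PAGES, BUILD_HOURS):.1f} pages per second")
```

Under these assumptions the collection averages about 55.5 pages per
document, and the build processed at least about 11.3 pages per second.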


Greenstone had no problems managing all the documents. Responses to
queries were very fast, and so was navigation through the classifiers.

Diego Spano
Archivo Digital
Secretaria de DD. HH.
Ministerio de Justicia y DD. HH.
Tel: 4382-6404
djspano@jus.gov.ar


-----Original Message-----
From: Katherine Don [mailto:kjdon@cs.waikato.ac.nz]
Sent: Tuesday, 18 January 2005 10:05 p.m.
To: Diego Spano
CC: greenstone-users@list.scms.waikato.ac.nz; Greenstone (Devel)
Subject: Re: [greenstone-users] Mega test !!!


Hi Diego

Greenstone expects the index files to be in the collection directory. So
my suggestion would be to put the import dir on a different disk, and
build into the c disk.

i.e.
e:\import
d:\archives
c:\program files\gsdl\collect\tucuman\building

Then you can rename the building dir to index and it will be in the
right place.

You will need to use the -importdir and -archivedir options to import.pl,
and the -archivedir option to buildcol.pl.
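
The suggestion above can be sketched in Python as a small helper that only
assembles the two command lines and the rename, without running anything.
The option names (-importdir, -archivedir) and the "perl -S" invocation come
from this thread; the function names and the exact drive layout (e:\import,
d:\archives, collection under c:\program files\gsdl\collect) are
illustrative assumptions, so treat this as a sketch rather than a tested
recipe.

```python
from pathlib import PureWindowsPath


def build_commands(collection, import_dir, archive_dir):
    """Return (import_cmd, build_cmd) as argument lists.

    No -builddir is passed to buildcol.pl, so the build goes to the
    collection's own "building" directory, as suggested in the reply.
    """
    import_cmd = [
        "perl", "-S", "import.pl",
        "-importdir", str(import_dir),
        "-archivedir", str(archive_dir),
        collection,
    ]
    build_cmd = [
        "perl", "-S", "buildcol.pl",
        "-archivedir", str(archive_dir),
        collection,
    ]
    return import_cmd, build_cmd


def index_rename(collect_root, collection):
    """Return (source, target) for renaming "building" to "index"."""
    col = PureWindowsPath(collect_root) / collection
    return col / "building", col / "index"


if __name__ == "__main__":
    # Hypothetical layout, matching the example directories in this thread.
    imp, bld = build_commands(
        "tucuman",
        PureWindowsPath(r"e:\import"),
        PureWindowsPath(r"d:\archives"),
    )
    print(" ".join(imp))
    print(" ".join(bld))
    src, dst = index_rename(r"c:\program files\gsdl\collect", "tucuman")
    print(f'rename "{src}" -> "{dst}"')
```

PureWindowsPath is used so that the paths are formatted with backslashes
regardless of the machine the sketch runs on. The rename itself would be done
after buildcol.pl finishes, once any existing index directory has been moved
out of the way.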

Cheers,
Katherine Don

Diego Spano wrote:
> Hi list, I need to test Greenstone with one million pages using
> PagedImgPlug, so I have the following scenario:
>
> - a windows xp machine with 3 disks
> - greenstone installed on c:\program files\gsdl
> - a collection named "tucuman" on c:\program files\gsdl\collect\tucuman
> - an import dir in c:\program files\gsdl\collect\tucuman\import
> - an archive dir in d:\archives
> - a building folder in e:\building
>
>
> I build the collection from the command line, so the import process is:
>
> "perl -S import.pl -archivedir d:\archives -verbosity 0 tucuman"
>
> and the build process is:
>
> "perl -S buildcol.pl -builddir e:\building -archivedir d:\archives
> -verbosity 1 tucuman"
>
> This works ok.
>
> The question is that when I load Greenstone, it tries to read indexes
> from "c:\program files\gsdl\collect\tucuman\index". Where do I have to
> "tell" greenstone that it has to read indexes from "e:\index"?
>
> Thanks a lot.
>
> *Diego Spano*
> Archivo Digital
> Secretaria de DD. HH.
> Ministerio de Justicia y DD. HH.
> Tel: 4382-6404
> djspano@jus.gov.ar