[greenstone-users] Basic questions about Greenstone

From Mike
Date Wed, 09 Feb 2005 15:45:22 +0000
Subject [greenstone-users] Basic questions about Greenstone

We are planning to make a collection of approximately 600,000 to
900,000 pages of scanned documents available both on-line and off-line.
I was hoping to find out the answers to some basic questions about
Greenstone by drawing on your experiences.

The likely OS for hosting the collection is Linux.

1. We plan to digitise the documents to produce TIFF files. We want
users to be able to search the full text of the documents and get a
list of hits, each of which takes them to the relevant page – what
they will see is the image (.gif or .png?) of that page. Users should
also be able to navigate through the page images one at a time,
forwards and backwards.

I assume we can implement this via Greenstone, but will it handle
searching up to 900,000 pages?

Would the built-in Greenstone web server handle this volume?

Also, I assume the full-text searching could be done on the OCR output
(we have no budget for re-keying the text)?

2. As well as making this collection available on-line, some users will
need to be able to access it off-line, since they will have little or no
Internet access. Obviously a CD-ROM will be too small to hold 900,000
pages – can Greenstone export a collection to DVD-ROM?
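
To put rough numbers on question 2, here is a minimal sketch of the size arithmetic. The 50 kB average page image size is purely an assumption for illustration (the real figure depends on scan resolution and image format), and the disc capacities are the usual nominal values.

```python
# Rough storage estimate for the scanned collection.
# ASSUMPTIONS (not from the original message): the average size of one
# page image and the nominal disc capacities are illustrative values.

AVG_PAGE_BYTES = 50 * 1000        # assumed 50 kB per web-sized page image
CD_BYTES = 700 * 1000**2          # nominal 700 MB CD-ROM
DVD_BYTES = 4_700 * 1000**2       # nominal 4.7 GB single-layer DVD


def discs_needed(pages: int, disc_bytes: int,
                 page_bytes: int = AVG_PAGE_BYTES) -> int:
    """Return how many discs are needed to hold `pages` page images."""
    total = pages * page_bytes
    return -(-total // disc_bytes)  # integer ceiling division


for pages in (600_000, 900_000):
    total_gb = pages * AVG_PAGE_BYTES / 1000**3
    print(f"{pages:,} pages: about {total_gb:.0f} GB, "
          f"{discs_needed(pages, CD_BYTES)} CDs or "
          f"{discs_needed(pages, DVD_BYTES)} DVDs")
```

Under these assumed figures the images alone would span several DVDs, so any off-line copy would probably need to be split across discs or shipped on a hard drive. Changing `AVG_PAGE_BYTES` to a measured value gives a more realistic estimate.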

3. If not, the other possibility would be to place the collection on a
server at, say, three or more different institutions. Most are based in
Africa and would have slow or no access to the Internet, but users at
the institutions could access the collection over their local network.
This would be fine until new documents were added to the collection – is
there any way to manage the updating of a number of identical
collections so that they remain identical?
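
One generic way to check that copies stay identical, independent of Greenstone itself, is to compare checksum manifests of the collection directories. The sketch below is only an illustration of that idea under stated assumptions: it is not a Greenstone feature, and the function names and any directory paths passed to them are hypothetical.

```python
# Compare two copies of a collection directory by content checksum.
# This is a generic file-level check, not a Greenstone feature; the
# function names and directory paths are hypothetical.
import hashlib
import os


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large image files fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[rel] = digest.hexdigest()
    return manifest


def diff_manifests(master: dict[str, str],
                   mirror: dict[str, str]) -> dict[str, list[str]]:
    """List files missing from, extra in, or changed in the mirror."""
    shared = master.keys() & mirror.keys()
    return {
        "missing": sorted(set(master) - set(mirror)),
        "extra": sorted(set(mirror) - set(master)),
        "changed": sorted(p for p in shared if master[p] != mirror[p]),
    }
```

A manifest built at the master site is small enough to send over a slow link or on disc, so each institution could run `diff_manifests` locally and request only the files listed as missing or changed.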

4. We plan to collect metadata at the time of scanning (perhaps in
simple text files). Would there be any automated way to load this into
the collection once it is created, or would it be a matter of a very
large cut-and-paste exercise using the Librarian interface?
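
As an illustration of what "simple text files" might look like in practice for question 4, the sketch below gathers per-document metadata into a single table that a bulk import step could start from. The "Key: value" layout, the field names and the sample file name are all assumptions made for this example, and nothing here presumes a particular Greenstone import format.

```python
# Gather per-document metadata from simple "Key: value" text files into
# one CSV table.
# ASSUMPTIONS: the "Key: value" layout, the field names and the sample
# file name are hypothetical, not taken from the original message.
import csv
import io


def parse_metadata(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines; lines without a key and ':' are skipped."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


def to_csv(records: dict[str, dict[str, str]]) -> str:
    """Write one row per document, with a column for every field seen."""
    fields = sorted({k for rec in records.values() for k in rec})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["file"] + fields,
                            lineterminator="\n")
    writer.writeheader()
    for filename, rec in sorted(records.items()):
        writer.writerow({"file": filename, **rec})
    return out.getvalue()


sample = "Title: Camp survey report\nDate: 1994\n"
print(to_csv({"doc0001.txt": parse_metadata(sample)}))
```

Keeping the scanning-time metadata in a regular, machine-readable layout like this is what makes any later automated loading feasible, whatever the target format turns out to be.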

Mike Cave
Technical Development Manager
Refugee Studies Centre
University of Oxford
Queen Elizabeth House
21 St Giles
Oxford OX1 3LA
E-mail: mike.cave@qeh.ox.ac.uk
Tel: +44-1865-270262
Fax: +44-1865-270297