[greenstone-users] Migration of Existing Digital Library to Greenstone

From Mike Cave
Date Mon, 30 Oct 2006 16:42:21 +0000
Subject [greenstone-users] Migration of Existing Digital Library to Greenstone

At Forced Migration Online we are about to make a funding application to
expand our Forced Migration Online Digital Library
(http://fmo.qeh.ox.ac.uk/fmo/). This currently consists of 9715 items
(total of approx. 220,000 pages). These items comprise:

a) 4066 full text documents
b) 5649 full text articles from 5 Journals

Searching is possible in both the full text of the items, and in their
metadata. The source material for the current library was a mixture of
born-digital PDFs and scanned, OCR'd paper originals.

The Digital Library is currently driven by an application from a company
called Olive Software (http://www.olivesoftware.com/) and is a
proprietary solution. To enhance / expand our Library for the long term
(10 years +) we first want to migrate it to an Open Source solution,
and we think Greenstone is that solution.

We don't know a great deal about Greenstone, and I was wondering whether
I could ask some questions about it so that we are better informed
before making our funding application.


1. The big question is how we perform the migration.

Currently in our Digital Library, each page of content (whether part of
a Document or Journal Article) consists of:

.png file (page image)
full text of the page (XML) derived from the OCR
metadata (XML) describing the structure of the page

Each document / journal article has structural metadata, plus keywords
for browsing / searching, again all XML.
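
As a rough illustration of the per-page layout described above, the
following Python sketch joins the OCR text of individual pages into one
full-text string per item. The element names (page, text) and the number
attribute are hypothetical stand-ins, not the actual Olive Software
schema, so any real migration script would need to be adapted.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-page OCR XML; element and attribute names are
# illustrative only and do not reflect the real Olive Software format.
PAGE_1 = "<page number='1'><text>Forced migration</text></page>"
PAGE_2 = "<page number='2'><text>is a global issue.</text></page>"


def page_text(xml_string):
    """Return (page number, text) parsed from one hypothetical page file."""
    root = ET.fromstring(xml_string)
    return int(root.get("number")), (root.findtext("text") or "").strip()


def document_text(pages):
    """Join per-page text in page order to form one searchable string."""
    ordered = sorted(page_text(p) for p in pages)
    return "\n".join(text for _, text in ordered)


# Pages may arrive in any order; they are sorted by page number.
full_text = document_text([PAGE_2, PAGE_1])
print(full_text)
```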

Because of the size of the migration task, we were wondering whether you
know of any third parties that undertake this kind of work?


2. Can Greenstone handle searching in both full-text and metadata?


3. Would the full text searching be done on the OCR (or would we have to
budget for re-keying the text)?


4. As well as making this library available on-line, some users will
need to be able to access subsets of it off-line, since they will have
little or no Internet access. Can Greenstone export a subset of a
library to CD-ROM / DVD-ROM? And would such a portable version have
full-text / metadata searching functionality available?


5. Once the migration stage was complete we would add new material,
either scanned + OCR'd or in many cases from PDF originals. We create
metadata at the time of scanning in XML. Would there be any automated
way to add this to the uploaded items, or would it be a matter of a
large cut and paste exercise using the Librarian interface?


6. We can host the future library under either Linux or Windows. Are
there any particular advantages to either platform?

Mike Cave

Technical Development Manager
Forced Migration Online
Refugee Studies Centre
University of Oxford
Department of International Development
Mansfield Road, Oxford OX1 3TB
United Kingdom

Tel: +44 (0)1865 270262
Fax: +44 (0)1865 270297
E-mail: mike.cave@qeh.ox.ac.uk