From | Stefan Boddie |
Date | Tue, 31 Oct 2006 10:47:40 +1300 |
Subject | Re: [greenstone-users] Migration of Existing Digital Library to Greenstone |
In-Reply-To | (45462B6D-9020709-qeh-ox-ac-uk) |
Hi Mike,

At Forced Migration Online we are about to make a funding application to expand our Forced Migration Online Digital Library (http://fmo.qeh.ox.ac.uk/fmo/). This currently consists of 9715 items (total of approx. 220,000 pages). These items are comprised of:

Um, yes, we do (http://www.dlconsulting.com). Please email me off list if you're interested in having us help you with this. To answer the question though, the best approach for migrating your data is probably to write a customized Greenstone plugin for importing the PNG images and XML files that make up your data.

-----------------------------------------------------------------------------------------

Yes.

-----------------------------------------------------------------------------------------

Yes, Greenstone can certainly search your OCR'ed text. Keep in mind though that the search is only as good as the text you index, so the more accurate the text is the better. Having said that, we've built many collections using text from OCR and searching is quite reasonable (and presumably the existing Olive-based system searches the OCR'ed text, right?).

-----------------------------------------------------------------------------------------

We've done something similar to this for another client, and we effectively just build a version of the collection that includes only the subset of documents that they require on the CD-ROM. When run from CD-ROM/DVD, Greenstone looks and operates exactly as it does on the web, including full-text and metadata searching.

-----------------------------------------------------------------------------------------

I'd recommend developing a plugin for importing the metadata directly from your XML files, as mentioned above. To add new content you'd then simply drop the new data in the source directory, along with the old data, then rebuild the collection.
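
As a rough illustration of the grouping step such an import would need, here is a minimal Python sketch that walks a source directory and pairs each XML metadata file with its page images. Greenstone plugins themselves are Perl modules, so this is only a way to check the data before writing one, not plugin code. The file layout is an assumption made for the example (`item.xml` sitting beside `item_0001.png`, `item_0002.png`, ...), and the function name `group_items` is hypothetical; the real export may well be organised differently.

```python
from collections import defaultdict
from pathlib import Path


def group_items(source_dir):
    """Map each XML metadata file to the sorted list of its page images.

    Assumed (hypothetical) layout: item.xml sits beside item_0001.png,
    item_0002.png, and so on, in the same directory.
    """
    source = Path(source_dir)

    # Index every PNG by (directory, item key), where the item key is the
    # file stem with its trailing "_<page number>" removed.
    pages = defaultdict(list)
    for png in source.rglob("*.png"):
        item_key = png.stem.rsplit("_", 1)[0]
        pages[(png.parent, item_key)].append(png)

    # Attach the page images to the XML file with the matching stem.
    # An XML file with no images gets an empty list, which makes gaps
    # in the source data easy to spot before a rebuild.
    items = {}
    for xml in sorted(source.rglob("*.xml")):
        items[xml] = sorted(pages.get((xml.parent, xml.stem), []))
    return items
```

Running `group_items` over the source directory before each rebuild gives a quick list of items and page counts, so newly dropped-in data can be sanity-checked against the old data.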
-----------------------------------------------------------------------------------------

Greenstone is a simple CGI application, so it'll run fine under most any operating system and web server. My own preference is for Linux and Apache, as they're nice and stable, and in the past Windows has perhaps not been quite so stable. These days I don't think there's much between them as web server platforms, but you might take some of the following into account when making your decision.

(a) Who will be installing Greenstone, building/rebuilding the collection, etc., and how? We typically build Greenstone collections from the command line, and a Linux-based server makes it nice and easy to log in remotely to do that.

(b) What existing IT systems do you have? If you have primarily a Windows-based system, the benefits of a Linux server may be outweighed by the additional headache of maintaining it.

I hope some of this helps.

Regards,
Stefan Boddie
DL Consulting Ltd. |