[greenstone-users] newspapers

From Enrico Silterra
Date Mon, 13 Dec 2004 16:35:17 -0500
Subject [greenstone-users] newspapers
So,
we are trying to load a collection of digitized newspapers into
Greenstone.
The files provided by our vendor give us:
1) PDF files per page
2) PDF files per article
3) XML files which relate these things

However, we have the following problems.
a) We need useful forward and backward navigation. How do people implement
it? I am unable to find any documentation on forward/backward navigation.
Are there good examples somewhere we could follow?
b) Our indexes will cover over 100 years' worth of material, and we
cannot rebuild the whole index just to add an issue. How do
we implement cross-"collection" searching? I think we could treat a
year or decade as one collection, and batch things up that way.
c) Our indexing takes a phenomenal amount of time. How does mgpp index PDF
documents? Is some sort of OCR happening?
d) Have other people solved similar problems with their newspaper collections?
How?
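
To make the batching idea in (b) concrete, here is a rough sketch of how
issues might be assigned to per-decade collections, so that adding one issue
only touches one collection. It is plain Python and does not call any
Greenstone API. The collection naming scheme ("news1900s") and the function
names are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def decade_collection(issue_date: date, prefix: str = "news") -> str:
    """Return a hypothetical collection name, e.g. 'news1900s', for an issue date."""
    decade_start = issue_date.year - issue_date.year % 10
    return f"{prefix}{decade_start}s"


def group_issues(issue_dates):
    """Map each (hypothetical) collection name to its sorted issue dates."""
    groups = defaultdict(list)
    for d in issue_dates:
        groups[decade_collection(d)].append(d)
    return {name: sorted(dates) for name, dates in groups.items()}


def collections_to_rebuild(new_issue_dates):
    """List the collections that would need rebuilding for a batch of new issues."""
    return sorted({decade_collection(d) for d in new_issue_dates})


if __name__ == "__main__":
    existing = [date(1904, 12, 13), date(1909, 1, 2), date(1910, 5, 6)]
    print(group_issues(existing))
    # Adding two issues from December 2004 touches only one collection.
    print(collections_to_rebuild([date(2004, 12, 13), date(2004, 12, 14)]))
```

With this kind of split, only the most recent decade would be rebuilt when a
new issue arrives, but searching across decades would still need some form
of cross-collection search, which is the part we do not know how to set up.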
Thanks in advance for hints, information or suggestions.
Rick Silterra


******************************
Enrico Silterra
Meta Data Engineer
107-E Olin Library
Cornell University
Ithaca NY 14853

Voice: 607-255-6851
Fax: 607-255-6110
E-mail: es287@cornell.edu
http://www.library.cornell.edu/cts/
******************************