Re: Searching non-ASCII text

From: Stefan Boddie
Date: Wed, 5 Feb 2003 20:40:07 +1300
Subject: Re: Searching non-ASCII text
In-Reply-To: (3E36A3D1-602-reltech-org)

Hi Tim,

> I have a number of Greek and Latin manuscript texts with English
> commentary. I would like to use Greenstone to build a digital library
> that can search and display the documents along with images of the
> manuscripts.
> Does anyone have any idea how I should do this? I would like to use
> Unicode for the Greek but I don't know how a Unicode search string can
> be entered using the Greenstone search interface. I can use a font that
> maps ASCII characters to Greek ones, but then whoever uses the library
> will need to install the font.

Greenstone can output to the browser in UTF-8, ISO-8859-7, or Windows code
page 1253 (amongst many other encodings), any of which will allow you to
display your Greek documents correctly (you'll need to uncomment the
appropriate "Encoding" lines in gsdl\etc\main.cfg to enable the latter two).
This doesn't help you much with the problem of how your users will enter
Greek characters into the search box, though.

I had a similar problem recently with a collection containing Hawaiian
documents. Hawaiian text contains six characters for which most of us don't
have a key on our keyboards (five vowels with macrons and the glottal stop
character). For that we came up with a system where there were six buttons
above the search box, each showing an image of one of the problem
characters. When the user clicked one of the buttons, some simple JavaScript
code inserted the appropriate character into the search box. For the six
Hawaiian characters it worked quite well, particularly since most of the
text could be typed in as plain old ASCII, with just a few characters
requiring a mouse click on the appropriate button. I'm not sure whether this
approach is viable for Greek, with so many more characters to worry about.
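
To give a rough idea of what the button code does, here is a minimal sketch
in TypeScript. It is not the code from the Hawaiian collection: the names
TextFieldLike, HAWAIIAN_CHARS and insertAtCursor are made up for
illustration, and the character list assumes lowercase letters only. The
function inserts a character at the caret (or over the current selection) and
moves the caret to just after it.

```typescript
// Hypothetical minimal shape of a text box. A browser's HTMLInputElement
// has these three properties, so it can be passed in directly.
interface TextFieldLike {
  value: string;
  selectionStart: number | null;
  selectionEnd: number | null;
}

// Assumed character set for illustration: five macron vowels plus the
// glottal stop (okina, U+02BB). Lowercase only.
const HAWAIIAN_CHARS: string[] = ["ā", "ē", "ī", "ō", "ū", "ʻ"];

// Insert `ch` at the caret, replacing any selected text, then place the
// caret immediately after the inserted character.
function insertAtCursor(field: TextFieldLike, ch: string): void {
  // If the caret position is unknown, fall back to appending at the end.
  const start =
    field.selectionStart === null ? field.value.length : field.selectionStart;
  const end = field.selectionEnd === null ? start : field.selectionEnd;

  field.value = field.value.slice(0, start) + ch + field.value.slice(end);

  const caret = start + ch.length;
  field.selectionStart = caret;
  field.selectionEnd = caret;
}
```

In a page, each button's click handler would call insertAtCursor with the
search box element and that button's character, then return focus to the box
so the user can keep typing. For Greek the same function would work, but the
six buttons would have to become something like a full on-screen keyboard.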

> Also, the primary documents are in XML. I can use a stylesheet to
> convert everything to HTML before importing the documents into
> Greenstone. However, I would like to be able to use XML. Is Greenstone
> ever going to be able to work directly with XML encoded documents? I
> know that it uses XML internally, but I would like to use XML from start
> to finish, with stylesheets being used for display. It would be great to
> be able to index the contents of individual elements (e.g. bibliographic
> references, names, authors, etc.)

Doing this would still require writing a new plugin, I'm afraid.