Re: UNICODE ISO-10646

From John R. McPherson
Date Tue, 07 Jan 2003 16:28:04 +1300
Subject Re: UNICODE ISO-10646
In-Reply-To (000201c2b65d$096dcdc0$6802a8c0-kiemcm)
Cao Minh Kiem wrote:
>
> Hello everybody,
> I am new to the list and have no experience with Greenstone yet.
> I know that Greenstone supports Unicode UTF-8. However, I would like to know:
> - Does Greenstone support the Unicode (ISO 10646) encoding?
> - If yes, how do I specify it in main.cfg?
> - Where can I get the .ump map file?
>
> Thank you

Hmmmm. We handle "plain" Unicode fine during import (as well as using
it internally). However, it looks like you can't get Greenstone to
output 16-bit Unicode.
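
For reference, UTF-8 and 16-bit Unicode are two byte serializations of the same ISO 10646 code points. A minimal Python sketch of the difference follows; it is an illustration only, not Greenstone code, and the sample string is an arbitrary choice:

```python
# Illustration only: the same code points serialized as UTF-8 and as
# 16-bit Unicode (UTF-16, big-endian, no byte-order mark).
text = "Vi\u1ec7t"  # "Việt"; U+1EC7 lies outside ASCII

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")

print(utf8_bytes.hex(" "))   # 56 69 e1 bb 87 74
print(utf16_bytes.hex(" "))  # 00 56 00 69 1e c7 00 74

# Both byte strings decode back to the identical sequence of code points.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be") == text
```

Only the byte layout differs, so a collection imported as 16-bit Unicode loses nothing by being stored or served as UTF-8.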

The relevant function appears to be configure_encoding() in the
src/recpt/receptionist.cpp file - "utf-8" is handled specially while
everything else needs a map file.
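
The behaviour described above can be sketched roughly as follows. This is a hypothetical Python outline of the logic as I read it, not a transcription of configure_encoding(); the function name, the return values, and the example file name are all assumptions made for illustration:

```python
# Hypothetical sketch: none of these names come from the Greenstone source.
def choose_converter(encoding_name, map_files):
    """Decide how output text would be converted for a given encoding.

    map_files is a dict from lower-case encoding name to a map file name.
    """
    name = encoding_name.lower()
    if name == "utf-8":
        # Special case: handled directly, no map file involved.
        return ("builtin", None)
    if name in map_files:
        # Every other encoding is converted through its map file.
        return ("mapped", map_files[name])
    raise LookupError(f"no map file available for encoding {encoding_name!r}")


# Example with a made-up map file name:
maps = {"windows-1251": "example.ump"}
print(choose_converter("UTF-8", maps))         # ('builtin', None)
print(choose_converter("windows-1251", maps))  # ('mapped', 'example.ump')
```

Under this reading, an encoding with neither the special case nor a map file simply cannot be selected, which would explain why 16-bit Unicode output is not available.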

A specially crafted .ump file might be able to trick Greenstone into
converting Unicode into Unicode (!!), but I'm not entirely sure how
that part works. Hopefully someone else can expand on what I've said.

John McPherson