Re: UNICODE ISO-10646

From: John R. McPherson
Date: Tue, 07 Jan 2003 17:27:42 +1300
Subject: Re: UNICODE ISO-10646
In-Reply-To: (3E1A4944-2ADFC6E8-cs-waikato-ac-nz)
"John R. McPherson" wrote:
>
> Cao Minh Kiem wrote:
> >
> > Hello everybody,
> > I am new to the list. I have no experience with Greenstone yet.
> > I know that Greenstone supports Unicode UTF-8. However, I would like to know:
> > - Does Greenstone support the Unicode (ISO 10646) encoding?
> > - If yes, how do I specify it in main.cfg?
> > - Where can I get the .ump map file?
> >


> A specially crafted .ump file might be able to trick Greenstone into
> converting Unicode into Unicode (!!), but I'm not entirely sure how
> that bit works. Hopefully someone else will be able to expand on
> what I've said.

Actually it's not as hard as I thought. In both the gsdl/mappings/from_uc
and gsdl/mappings/to_uc directories, create an empty file called
unicode.ump, then add an entry to main.cfg such as:

Encoding shortname=10646-1 "longname=Unicode (10646)" map=unicode.ump
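The two steps above (an empty unicode.ump in both mapping directories, plus an Encoding line in main.cfg) could be scripted roughly as follows. This is a minimal sketch, not something shipped with Greenstone: the function name add_unicode_encoding is hypothetical, and it assumes main.cfg lives at gsdl/etc/main.cfg, which may not match every installation (pass the real path if it differs).

```python
from pathlib import Path

# The Encoding entry exactly as given in the post.
ENCODING_LINE = 'Encoding shortname=10646-1 "longname=Unicode (10646)" map=unicode.ump'


def add_unicode_encoding(gsdlhome, main_cfg=None):
    """Hypothetical helper: create empty unicode.ump map files and register
    the encoding in main.cfg.

    gsdlhome -- path to the Greenstone "gsdl" directory.
    main_cfg -- path to main.cfg; defaults to gsdlhome/etc/main.cfg
                (an assumption about the install layout).
    """
    home = Path(gsdlhome)

    # Step 1: an empty unicode.ump in both mappings/from_uc and mappings/to_uc.
    for sub in ("from_uc", "to_uc"):
        mapdir = home / "mappings" / sub
        mapdir.mkdir(parents=True, exist_ok=True)
        (mapdir / "unicode.ump").write_bytes(b"")  # create or truncate to empty

    # Step 2: append the Encoding line to main.cfg unless it is already there.
    cfg = Path(main_cfg) if main_cfg else home / "etc" / "main.cfg"
    cfg.parent.mkdir(parents=True, exist_ok=True)
    existing = cfg.read_text() if cfg.exists() else ""
    if ENCODING_LINE not in existing.splitlines():
        with cfg.open("a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")  # keep the new entry on its own line
            f.write(ENCODING_LINE + "\n")
    return cfg
```

Running it twice is harmless: the map files stay empty and the Encoding line is only appended once. Greenstone would still need to re-read main.cfg (for example by restarting the server) before the new encoding shows up.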


The catch is that, as far as I can tell, Greenstone then turns
non-Western characters into question marks ("?").

Hope this helps.

John McPherson