Re: [greenstone-devel] Rendering special characters

From John R. McPherson
Date Wed, 24 Mar 2004 07:19:12 +1200
Subject Re: [greenstone-devel] Rendering special characters
In-Reply-To (BC85C16A-B28F%james-elmborg-uiowa-edu)
On Tue, Mar 23, 2004 at 10:26:50AM -0600, Jim Elmborg wrote:
> Hello All,
> After multiple builds, I'm still losing all the diacritics from non English
> names when building a library. I've searched the archives and nobody else
> seems to have had a similar problem, so I fear I'm missing something simple
> and perhaps obvious.

> On the Redhat 9 and the iMac, any character with a diacritical mark
> is removed from the library. So Bildøen becomes Bilden.
> Göritz becomes Gritz. Gábór becomes Gbr. Oddly
> enough, using the same HTML source files, the laptop renders the
> special characters without a problem. Unfortunately, I can't use
> the laptop as the library server.

> One error message repeats itself in the problem builds:
>
> Buildcol.pl> Wide character in print at /usr/local/gsdl
> perllib/mgbuildproc.pm line 519
>
> I suspect this message points me to the problem, but can't interpret it.

That warning is a useful hint. What version of Perl is in use?
You can type "perl -v" at a command line to see the version.

You can check how Greenstone's import process has marked up the files
by looking at the .xml files created in the <collectionname>/archive
directory. If those .xml files don't have the diacritics, then the
Perl import process is at fault.
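
As a rough way to do that check across many files, here is a minimal
sketch in Python (not part of Greenstone, which is written in Perl). The
function name files_without_non_ascii is made up for illustration, and
the sketch assumes the archive files are UTF-8 text. It only looks for
literal non-ASCII characters, so a file that stores diacritics as
numeric entities such as &#246; would still be reported.

```python
from pathlib import Path


def files_without_non_ascii(archive_dir):
    """Return the .xml files under archive_dir that contain only ASCII.

    Hypothetical helper: a file listed here has no literal accented
    characters, which suggests the diacritics were lost before or
    during import (or were never in the source document).
    """
    ascii_only = []
    for xml_path in sorted(Path(archive_dir).rglob("*.xml")):
        # errors="replace" keeps the scan going if a file is not valid
        # UTF-8; the replacement character itself counts as non-ASCII.
        text = xml_path.read_text(encoding="utf-8", errors="replace")
        if all(ord(ch) < 128 for ch in text):
            ascii_only.append(xml_path)
    return ascii_only
```

Calling it on the collection's archive directory and printing the result
lists the files worth opening by hand. A source document known to
contain names like Bildøen should not show up in that list.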

Greenstone's HTML plugin converts named entities such as &Eacute; into
Unicode characters; otherwise you wouldn't be able to search for words
that contain them.
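
To illustrate what that conversion means, here is a small sketch using
Python's standard html module. This is not Greenstone's code (the plugin
is Perl), and decode_entities is a name made up for this example. It
only shows the general idea of turning named entities into the
characters they stand for.

```python
import html


def decode_entities(text):
    """Replace HTML character references with Unicode characters.

    html.unescape handles named entities (&Eacute;) as well as
    numeric ones (&#201;).
    """
    return html.unescape(text)


# Named entities for the names mentioned in this thread.
samples = ["Bild&oslash;en", "G&ouml;ritz", "G&aacute;b&oacute;r"]
decoded = [decode_entities(s) for s in samples]
# decoded is ["Bildøen", "Göritz", "Gábór"]
```

If the source HTML uses entities like these and the archived .xml files
still lack the accented letters, the characters are being dropped
somewhere after this conversion step.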

Also, you didn't say which version of Greenstone you are running, or
whether it is the binary distribution or the source distribution.

John McPherson