Re: [greenstone-users] Problem Unicode

From John R. McPherson
Date Mon, 29 Nov 2004 09:43:36 +1300
Subject Re: [greenstone-users] Problem Unicode
In-Reply-To (41A6FAEC-1040203-reltech-org)
On Fri, 2004-11-26 at 22:44, Tim Finney wrote:
> I would like to use names that include diacritics in the metadata.xml
> files for a collection.
> Here is an example:
> <?xml version="1.0" encoding="UTF-8"?>
> <DirectoryMetadata>
> <FileSet>
> <FileName>.*.html</FileName>
> <Description>
> <Metadata name="EdID">P.Herc.208 col. 12b</Metadata>
> <Metadata name="EdTitle">In Platonis Lysin</Metadata>
> <Metadata name="EdCreator">W. Crönert</Metadata>
> </Description>
> </FileSet>
> </DirectoryMetadata>
> When I build the collection, any HTML file(s) associated with metadata
> files that include characters like ö (LATIN SMALL LETTER O WITH
> DIAERESIS) fail to appear.
> This happens with Fedora Core 1 + Greenstone 2.50 and Fedora Core 2 +
> Greenstone 2.51.
> Any ideas what might be wrong?

This should definitely work. All I can suggest is to double-check
that your metadata file really is encoded in UTF-8 and not
ISO-8859-1 (Latin-1) or something else.
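
To make the difference concrete, here is a small Python sketch (Python is only my choice for illustration; it has nothing to do with Greenstone itself) showing how the ö in "Crönert" is stored under each encoding. A file saved as ISO-8859-1 holds a lone 0xF6 byte there, which is not a valid UTF-8 sequence, so anything that trusts the encoding="UTF-8" declaration is likely to choke on it.

```python
# "\u00f6" is LATIN SMALL LETTER O WITH DIAERESIS (the character in question).
name = "W. Cr\u00f6nert"

utf8_bytes = name.encode("utf-8")      # the letter becomes two bytes: 0xC3 0xB6
latin1_bytes = name.encode("latin-1")  # the letter becomes one byte: 0xF6

print(utf8_bytes)
print(latin1_bytes)

# The Latin-1 bytes are not valid UTF-8, so decoding them as UTF-8 fails.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(latin1_is_valid_utf8)
```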

One way to check that a file is valid UTF-8 is to use iconv, which
stops with an error at the first invalid byte sequence:
$ iconv -f utf-8 -t utf-8 < metadata.xml
It may also be worth checking the generated Greenstone archives file:
$ iconv -f utf-8 -t utf-8 < (collectiondir)/archives/.../doc.xml
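
If iconv is not to hand, roughly the same check can be sketched in Python. This is only an illustration under my own assumptions: the function name first_invalid_utf8_offset is made up for this example, and the script reads whatever file paths you pass on the command line.

```python
import sys
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(path: str) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence,
    or None if the whole file decodes cleanly as UTF-8."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


if __name__ == "__main__":
    # Check every file named on the command line, e.g. metadata.xml.
    for file_name in sys.argv[1:]:
        offset = first_invalid_utf8_offset(file_name)
        if offset is None:
            print(f"{file_name}: valid UTF-8")
        else:
            print(f"{file_name}: invalid UTF-8 at byte offset {offset}")
```

A reported offset tells you where to look in the file; if the byte there is a single value such as 0xF6, the file was most likely saved as ISO-8859-1 rather than UTF-8.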

John McPherson