Re: [greenstone-users] Problem Unicode

From John R. McPherson
Date Tue, 14 Dec 2004 12:27:13 +1300
Subject Re: [greenstone-users] Problem Unicode
In-Reply-To (41BD8EC6-2010601-reltech-org)
On Tue, 2004-12-14 at 01:44, Tim Finney wrote:
> Dear John
>
> I am convinced that the metadata.xml file is UTF-8. When I spell Cronert
> with the umlaut, the corresponding HTML page fails to appear in the
> resultant collection.

> >>When I build the collection, any HTML file(s) associated with metadata
> >>files that include characters like ö (LATIN SMALL LETTER O WITH
> >>DIAERESIS) fail to appear.
> >>
> >>This happens with Fedora Core 1 + Greenstone 2.50 and Fedora Core 2 +
> >>Greenstone 2.51.
> >>
> >>Any ideas what might be wrong?

Hi,
your files worked fine for me on Gentoo Linux with both Greenstone
2.51 and Greenstone 2.52.

So, things to check:
1) check that the generated file in the collection's archives/ directory
is valid UTF-8. The build process will skip any archive files that
the XML parser can't parse (which includes badly encoded files).

2) check the collection's etc/fail.log file, which logs any files that
couldn't be imported. Files listed there were not marked up or
stored in the archives/ directory.
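
For check 1, a rough way to test the archive files is a short Python script along these lines. It is only a sketch, not part of Greenstone: the function name find_invalid_utf8 is made up for illustration, and it assumes the generated archive files end in .xml (other files under archives/, such as images, are skipped because they are not expected to be UTF-8).

```python
import os


def find_invalid_utf8(root, suffix=".xml"):
    """Return (path, byte_offset) for each file under root that is not valid UTF-8.

    Hypothetical helper: only files whose names end in `suffix` are checked,
    on the assumption that the archive documents are .xml files.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first byte that failed to decode
                bad.append((path, err.start))
    return bad
```

Calling find_invalid_utf8() on the collection's archives/ directory and printing the result should list any badly encoded file together with the offset of the first offending byte. An empty list would suggest the encoding is fine and the problem lies elsewhere.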

Once we've narrowed down where the problem is occurring, we can track
down why :)

John