Re: [greenstone-users] encoding problem under linux

From John R. McPherson
Date Mon, 3 May 2004 19:40:30 +1200
Subject Re: [greenstone-users] encoding problem under linux
In-Reply-To (4095E818-30304-gmx-net)
On Mon, May 03, 2004 at 08:35:04AM +0200, jens wille wrote:
> John R. McPherson wrote:

> >in Greenstone the metadata.xml files must be encoded using UTF-8.

> well, if i convert the metadata.xml to utf-8 after creating it, the
> collection builds, but for almost every doc.xml i get "no plugin
> could handle this file". i suppose that the doc.xml's are not
> properly encoded ('file -i doc.xml' yields charset "unknown").

That sounds like a different problem.

You can use the "iconv" program on Linux to change the encoding of a file.
I also often use it to test whether a file is utf-8, by converting
"from" utf-8 "to" utf-8. (The conversion fails if it reads a byte sequence
that isn't valid in the source encoding, or a character that can't be
represented in the destination encoding.)

$ iconv -f utf-8 -t utf-8 metadata.xml > /dev/null

If it doesn't complain, then the file is valid utf-8. Otherwise it will
say something like:

iconv: illegal input sequence at position 11
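
The same check can be run over many files at once. This is only a minimal
sketch: the function name check_utf8 is made up for illustration, and it
assumes the files of interest are all named doc.xml somewhere below the
directory you pass in.

```shell
# Report every doc.xml under a directory that is not valid utf-8.
# check_utf8 is a hypothetical helper name; the directory argument and
# the file name "doc.xml" are assumptions, adjust them to your layout.
check_utf8() {
    find "$1" -type f -name 'doc.xml' | while IFS= read -r f; do
        # Converting utf-8 to utf-8 fails on any invalid byte sequence.
        if ! iconv -f utf-8 -t utf-8 "$f" > /dev/null 2>&1; then
            echo "not valid utf-8: $f"
        fi
    done
}
```

Calling "check_utf8 ." from the top of the collection directory prints one
line per failing file and nothing if every file passes. A file that fails
can then be converted with iconv's -f and -t options, provided you know
what encoding it is really in.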

If you are having problems with our XML Archives plugin on Red Hat, try
reading our mailing list archives, for example: