Re: [greenstone-users] doc::add_utf8_metadata: warning: 'gsdlassocfile' wasn't utf8

From: John R. McPherson
Date: Mon, 28 Aug 2006 20:22:25 +1200
Subject: Re: [greenstone-users] doc::add_utf8_metadata: warning: 'gsdlassocfile' wasn't utf8
In-Reply-To: (7-0-1-0-2-20060825234353-049d0100-namweb-com-na)
On Mon, Aug 28, 2006 at 08:38:02AM +0100, Renate Morgenstern wrote:
> Hi Michael,
>
> These are JPG files, and in the metadata I have German umlauts (ü, ö, etc.).
> The error message only comes up for those
> images where these characters appear in the dc.Title.
> I have changed the input and default encodings to
> various options, but I still get this message.
> Below is part of the log file - you can see that
> it only occurs where the title contains diacritics.
> In the A-Z Compact lists on the display the
> diacritics are displayed as follows:
> GÃ¼nter GÃ¶ttling (1) (should be Günter Göttling)

What has happened here is that the ü (which is 2 bytes in UTF-8) has
been treated as if it were Latin-1 (ISO-8859-1) and converted to UTF-8
a second time, turning it into 4 bytes.
This might be a bug in the ImagePlug plugin.
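To illustrate the conversion described above, here is a minimal Python
sketch. It is not Greenstone code (Greenstone's import is written in
Perl), and the function names double_encode and repair are
hypothetical; it only shows how UTF-8 bytes misread as Latin-1 grow
from 2 bytes to 4, and how the damage can be reversed.

```python
# Simulate the double-encoding described above: UTF-8 bytes are read
# as if they were Latin-1 (ISO-8859-1) characters, then re-encoded.

def double_encode(text: str) -> str:
    """Hypothetical helper: misread the UTF-8 bytes of text as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def repair(garbled: str) -> str:
    """Hypothetical helper: undo double_encode by reversing both steps."""
    return garbled.encode("latin-1").decode("utf-8")

name = "Günter Göttling"
garbled = double_encode(name)

print(garbled == "GÃ¼nter GÃ¶ttling")           # True
print(len("ü".encode("utf-8")))                   # 2 (correct UTF-8)
print(len(double_encode("ü").encode("utf-8")))    # 4 (double-encoded)
print(repair(garbled) == name)                    # True
```

The repair only works if the text was damaged exactly once in this
way; fixing the metadata at the source is the better option.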

> What input and default encoding should I use?

That depends entirely on what encoding your files are using. Note that
this setting only tells Greenstone what encoding the file's *contents*
are in; it does not apply to the filename as read in off disk.


> link. Attempting to copy file: C:\Program
> Files\Greenstone\collect\damals\import\1800 bis
> 1916\0001 Alter Baaiweg nach L?deritzbucht.jpg ->
> C:\Program
> Files\Greenstone\collect\damals\archives\HASH018a.dir\0001
> Alter Baaiweg nach L?deritzbucht.jpg
> import.pl> doc::add_utf8_metadata: warning: 'gsdlassocfile' wasn't utf8

This warning isn't serious. It means that the filename isn't in UTF-8,
so it has been converted into UTF-8 for Greenstone by assuming that
the original filename is encoded as Latin-1 (ISO-8859-1).
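As a rough illustration of that fallback, here is a minimal Python
sketch. It is not Greenstone's actual implementation, the function
name to_utf8 is hypothetical, and it assumes the "?" in the log above
stands for a Latin-1 "ü" (as in Lüderitzbucht).

```python
from typing import Tuple

def to_utf8(raw_name: bytes) -> Tuple[bytes, bool]:
    """Hypothetical helper: return (utf8_bytes, was_converted).

    If the filename bytes are already valid UTF-8 they are kept as-is;
    otherwise they are assumed to be Latin-1 and converted to UTF-8.
    """
    try:
        raw_name.decode("utf-8")
        return raw_name, False       # already UTF-8, no warning needed
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return raw_name.decode("latin-1").encode("utf-8"), True

latin1_name = "Alter Baaiweg nach Lüderitzbucht.jpg".encode("latin-1")
converted, warned = to_utf8(latin1_name)
print(warned)       # True
print(converted)    # b'Alter Baaiweg nach L\xc3\xbcderitzbucht.jpg'
```

Because any byte sequence decodes as Latin-1, this fallback never
fails, but it is only a guess: a filename in some other encoding would
be converted into the wrong characters without any error.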

John McPherson