Re: [greenstone-users] Problem Unicode

From Tim Finney
Date Mon, 13 Dec 2004 20:44:54 +0800
Subject Re: [greenstone-users] Problem Unicode
In-Reply-To (1101674616-9192-8-camel-puriri-cs-waikato-ac-nz)
Dear John

I am convinced that the metadata.xml file is UTF-8. When I spell Cronert
with the umlaut, the corresponding HTML page fails to appear in the
resultant collection.

Please find attached the offending metadata.xml file and the config file
that I use. I have a separate directory for each HTML document -- the
directory contains the HTML document, associated image files, and the
metadata.xml file.
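
One way to narrow this down is to parse each metadata.xml the way an XML parser would and list the values that contain non-ASCII characters. The following is only a minimal sketch, assuming Python 3.7 or later and the one-directory-per-document layout described above; the function names scan_bytes and scan_tree are hypothetical and are not part of Greenstone.

```python
import os
import xml.etree.ElementTree as ET


def scan_bytes(data):
    """Return (name, value) pairs for Metadata elements with non-ASCII text.

    `data` is the raw content of a metadata.xml file. Parsing raises
    ET.ParseError if the bytes do not match the encoding declared in
    the XML declaration (for example Latin-1 bytes declared as UTF-8).
    """
    root = ET.fromstring(data)
    found = []
    for elem in root.iter("Metadata"):
        text = elem.text or ""
        if not text.isascii():
            found.append((elem.get("name"), text))
    return found


def scan_tree(root_dir):
    """Walk root_dir and report on every metadata.xml found below it."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        if "metadata.xml" not in filenames:
            continue
        path = os.path.join(dirpath, "metadata.xml")
        with open(path, "rb") as handle:
            data = handle.read()
        try:
            for name, value in scan_bytes(data):
                print(f"{path}: {name} = {value!r}")
        except ET.ParseError as err:
            print(f"{path}: not well-formed ({err})")
```

Calling scan_tree on the collection's import directory (the exact path is an assumption) prints each metadata value that contains a diacritic, or a parse error for any file whose bytes are not valid in the declared encoding.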

Best

Tim Finney

John R. McPherson wrote:

>On Fri, 2004-11-26 at 22:44, Tim Finney wrote:
>
>
>>I would like to use names that include diacritics in the metadata.xml
>>files for a collection.
>>
>>Here is an example:
>>
>><?xml version="1.0" encoding="UTF-8"?>
>><DirectoryMetadata>
>> <FileSet>
>> <FileName>.*.html</FileName>
>> <Description>
>> <Metadata name="EdID">P.Herc.208 col. 12b</Metadata>
>> <Metadata name="EdTitle">In Platonis Lysin</Metadata>
>> <Metadata name="EdCreator">W. Crönert</Metadata>
>> </Description>
>> </FileSet>
>></DirectoryMetadata>
>>
>>When I build the collection, any HTML file(s) associated with metadata
>>files that include characters like ö (LATIN SMALL LETTER O WITH
>>DIAERESIS) fail to appear.
>>
>>This happens with Fedora Core 1 + Greenstone 2.50 and Fedora Core 2 +
>>Greenstone 2.51.
>>
>>Any ideas what might be wrong?
>>
>>
>
>Hi,
>this should definitely work. All I can suggest is to double-check
>that your metadata file is really encoded in UTF-8 and not in
>ISO-8859-1 (Latin-1) or something else.
>
>One way to check that a file is valid UTF-8 is to use iconv,
>e.g.:
> $ iconv -f utf-8 -t utf-8 < metadata.xml
>and maybe also check the generated greenstone archives file:
> $ iconv -f utf-8 -t utf-8 < (collectiondir)/archives/.../doc.xml
>
>
>John McPherson
>
>
>
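
The iconv round trip suggested above can also be approximated in Python when iconv is not to hand. This is a minimal sketch rather than a definitive tool: the function names first_invalid_utf8 and check_file are hypothetical, and it only tests whether the bytes decode as UTF-8, not whether the XML is otherwise well-formed.

```python
from pathlib import Path


def first_invalid_utf8(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path):
    """Print whether the file at `path` is valid UTF-8."""
    data = Path(path).read_bytes()
    offset = first_invalid_utf8(data)
    if offset is None:
        print(f"{path}: valid UTF-8")
    else:
        # Show the offending byte in hex, e.g. f6 for a Latin-1 o-diaeresis.
        print(f"{path}: invalid byte {data[offset]:02x} at offset {offset}")
```

For reference, LATIN SMALL LETTER O WITH DIAERESIS is the two bytes c3 b6 in UTF-8 but the single byte f6 in ISO-8859-1, so a Latin-1 file declared as UTF-8 would be reported as invalid at the position of that character.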


<<attachment>>
Type: text/xml
Filename: metadata.xml

.*.html P.Herc.208 In Platonis Lysin W. Crönert 1906 1804-6 Colotes Herculaneum Bodleian Library, Oxford Philosophical treatise grc Roll Description of papyrus 21.7 x 32.6 cm MS Gr. class. c. 2 0261


<<attachment>>
Type: text/html
Filename: herc.v002.0261.main.html

P.Herc.208


Images

Low resolution High resolution
thumbnail thumbnail

Transcription

See the printed edition for a transcription.



<<attachment>>
Type: text/plain
Filename: collect.cfg

creator tfinney@reltech.org
maintainer tfinney@reltech.org
public true

indexes document:text document:EdID document:EdTitle document:EdCreator document:RefTitle document:RefCreator document:MsID document:MsDateText document:MsProvenance document:MsGenre document:OtherIDs

defaultindex document:text

plugin GAPlug
plugin HTMLPlug
plugin ArcPlug
plugin RecPlug -use_metadata_files

classify AZList -buttonname "Creator" -metadata RefCreator
classify AZList -buttonname "Title" -metadata EdTitle
classify AZList -buttonname "Genre" -metadata MsGenre
classify AZList -buttonname "Date" -metadata MsDateNum
classify Hierarchy -buttonname "EdID" -hfile P.Herc.EdID.hier.txt -metadata EdID
classify List -buttonname "OtherIDs" -metadata OtherIDs

collectionmeta collectionname "P.Herc."
collectionmeta collectionextra "The Herculaneum Papyri"
collectionmeta .document:text "papyrus texts"
collectionmeta .document:EdID "papyrus IDs"
collectionmeta .document:EdTitle "editorial titles"
collectionmeta .document:EdCreator "editors"
collectionmeta .document:RefTitle "canonical titles"
collectionmeta .document:RefCreator "authors"
collectionmeta .document:MsID "other IDs"
collectionmeta .document:MsDateText "dates"
collectionmeta .document:MsProvenance "provenances"
collectionmeta .document:MsGenre "genres"
collectionmeta .document:OtherIDs "other IDs"

format CL1VList '
<td>[link][icon][/link]</td>
<td><strong>
{If}{[EdID],[EdID],[Title]}</strong>
{If}{[RefCreator],<br/>[RefCreator]}
</td>
'

format CL2VList '
<td>[link][icon][/link]</td>
<td><strong>
{If}{[EdID],[EdID],[Title]}</strong>
{If}{[EdTitle],<br/>[EdTitle]}
</td>
'

format CL3VList '
<td>[link][icon][/link]</td>
<td><strong>
{If}{[EdID],[EdID],[Title]}</strong>
{If}{[MsGenre],<br/>[MsGenre]}
</td>
'

format CL4VList '
<td>[link][icon][/link]</td>
<td><strong>
{If}{[EdID],[EdID],[Title]}</strong>
{If}{[MsDateText],<br/>[MsDateText]}
</td>
'

format CL5VList '
<td>[link][icon][/link]</td>
<td><strong>
{If}{[EdID],[EdID],[Title]}</strong>
{If}{[EdTitle],<br/>[EdTitle]}
</td>
'

format CL6VList '
<td>[link][icon][/link]</td>
<td><strong>
{If}{[OtherIDs],[OtherIDs],[Title]}</strong>
{If}{[EdTitle],<br/>[EdTitle]}
</td>
'

format DocumentHeading ''

format DocumentButtons ''

format DocumentText '
<table width="537" border="0" align="center"> <tr> <th> [EdID] </th> </tr> <tr> <th> [EdTitle] </th> </tr>
<tr> <th> [EdCreator] </th> </tr> </table> <hr width="537" align="center"/>
<table width="537" border="0" align="center"> <tr> <td width="134"></td> <td width="402"></td> </tr> {If}{[OtherIDs],<tr> <td valign="top">Alternative ID</td> <td valign="top">[OtherIDs]</td> </tr>} {If}{[EdDate],<tr> <td valign="top">Publication date</td> <td valign="top">[EdDate]</td> </tr>}
{If}{[RefTitle],<tr> <td valign="top">Canonical title</td> <td valign="top">[RefTitle]</td> </tr>}
{If}{[RefCreator],<tr> <td valign="top">Author</td> <td valign="top">[RefCreator]</td> </tr>}
{If}{[MsID],<tr> <td valign="top">Other IDs</td> <td valign="top">[MsID]</td> </tr>}
{If}{[MsDateText],<tr> <td valign="top">Date</td> <td valign="top">[MsDateText]</td> </tr>}
{If}{[MsProvenance],<tr> <td valign="top">Provenance</td> <td valign="top">[MsProvenance]</td> </tr>}
{If}{[MsLocation],<tr> <td valign="top">Location</td> <td valign="top">[MsLocation]</td> </tr>}
{If}{[MsDescription],<tr> <td valign="top">Description</td> <td valign="top">[MsDescription]</td> </tr>}
{If}{[MsSubject],<tr> <td valign="top">Subject</td> <td valign="top">[MsSubject]</td> </tr>}
{If}{[MsGenre],<tr> <td valign="top">Genre</td> <td valign="top">[MsGenre]</td> </tr>}
{If}{[MsFormat],<tr> <td valign="top">Format</td> <td valign="top">[MsFormat]</td> </tr>}
{If}{[MsMaterial],<tr> <td valign="top">Material</td> <td valign="top">[MsMaterial]</td> </tr>}

</table> [Text]'

collectionmeta iconcollection /gsdl/collect/P.Herc./images/P.Herc..gif
collectionmeta iconcollectionsmall /gsdl/collect/P.Herc./images/P.Herc.sm.gif