Re: [greenstone-users] metadata.xml and UTF-8

From graeme @ gmail
Date Thu, 29 Jun 2006 01:45:18 -0400
Subject Re: [greenstone-users] metadata.xml and UTF-8
In-Reply-To (44A363E9-1060707-cs-waikato-ac-nz)

Many thanks for your time. I eventually found out that the files I was generating were not UTF-8 but UCS-2 (effectively UTF-16 restricted to the Basic Multilingual Plane). This is the default file format in VBA, which I was using to extract the data. I couldn't work out how to save as UTF-8 from VBA, so I wrote a quick script to convert the files from UTF-16 to UTF-8, and they now import correctly.
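
For anyone who runs into the same problem, here is a minimal sketch of such a conversion in Python. It is not the script mentioned above, only an illustration. The function name `convert_utf16_to_utf8` is hypothetical, and the sketch assumes the source file starts with a byte-order mark (without one, Python's "utf-16" codec falls back to the machine's native byte order). It also rewrites the encoding named in the XML declaration, if there is one, so that the declaration matches the new bytes.

```python
import re
from pathlib import Path

# Matches the encoding attribute inside an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-16"?>
_DECL_ENCODING = re.compile(r"""(<\?xml[^>]*?encoding=["'])[^"']+(["'])""")


def convert_utf16_to_utf8(src, dst):
    """Re-encode one UTF-16 text file as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec reads the byte-order mark to choose the byte order
    # and drops it from the decoded text.
    text = Path(src).read_text(encoding="utf-16")
    # Keep the XML declaration, if present, consistent with the new encoding.
    text = _DECL_ENCODING.sub(r"\g<1>UTF-8\g<2>", text, count=1)
    # Write raw bytes so that no further newline translation takes place.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Writing to a separate destination file keeps the original intact, so the two can be compared if an import still fails.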

*** General Comment ***
I would like to point out, though, that I believe the error reporting could be improved. The conversion above produced a few XML files that couldn't be read (fractionally over 1%). I didn't find any log record, and the "DOS window", whilst displaying the errors, took something like 30 lines per error, so I was not able to identify all the errors by looking at that window. More importantly, the Java interface didn't report that anything was wrong. By running a script I wrote, I was able to find out which folders were missing the metadata.xml file.
*** General Comment ***
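
A check of the kind described in the comment above might look like the following Python sketch. It is not the original script. The function name `audit_metadata` is hypothetical, and the sketch assumes that every folder under the import directory is expected to contain its own metadata.xml, which may not hold for every collection. It also tests only that each file is well-formed XML; a file that passes is not guaranteed to import into Greenstone.

```python
import os
import xml.etree.ElementTree as ET


def audit_metadata(import_root, filename="metadata.xml"):
    """Return (folders lacking the file, files that fail to parse)."""
    missing = []
    unreadable = []
    for folder, _subdirs, files in os.walk(import_root):
        if filename not in files:
            # Assumption: each folder should carry its own metadata file.
            missing.append(folder)
            continue
        path = os.path.join(folder, filename)
        try:
            # Well-formedness check only; no DTD validation is performed.
            ET.parse(path)
        except ET.ParseError as err:
            unreadable.append((path, str(err)))
    return missing, unreadable
```

Printing one line per entry in the two returned lists gives a compact report, rather than some 30 lines per error in a console window.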



Michael Dewsnip wrote:
Hi Graeme,

Greenstone's metadata.xml files do support UTF-8, and the GLI will write
Unicode metadata into them as UTF-8. Try copying some Unicode text from
your web browser and paste it into the GLI -- it should be saved into
the metadata.xml file correctly, and Greenstone shouldn't have any
trouble importing it.



graeme @ gmail wrote:


I'm trying to set up a collection that will use the metadata.xml file,
similar to the demo collection. I have been using the dls metadata
set. After an afternoon of struggling with this I found out that the
file only works in ASCII format, but I need to add Unicode text.
Is there a metadata set that works with Unicode and the UTF-8
encoding, and if so, which one or ones?

