Re: [greenstone-users] metadata.xml and UTF-8

From Michael Dewsnip
Date Wed, 12 Jul 2006 12:05:41 +1200
Subject Re: [greenstone-users] metadata.xml and UTF-8
In-Reply-To (44A368EE-1050703-gmail-com)
Dear Graeme,

> Many thanks for your time. I eventually found out that the file format
> I was generating was not UTF-8 but UCS-2 (a subset of UTF-16) - this
> is the default file format in VBA which I was using to extract the
> data. I couldn't work out how to save with VBA in UTF-8 but wrote a
> quick script to convert it from UTF-16 to UTF-8 and now it is
> importing correctly.

Glad to hear you got it working.
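
For anyone else who runs into the same thing, a conversion along the lines Graeme describes might look like the Python sketch below. This is not his actual script, which isn't shown in this thread. The function name utf16_to_utf8 and the rewriting of the XML declaration are assumptions made for illustration, and the input is assumed to start with a byte order mark so the byte order can be detected.

```python
import re
from pathlib import Path

# Matches the encoding attribute inside an XML declaration, if one is present.
_XML_DECL_ENCODING = re.compile(r'(<\?xml[^>]*encoding=["\'])[^"\']+(["\'])')


def utf16_to_utf8(src, dst):
    """Re-encode a UTF-16 text file as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec reads the byte order mark to pick the byte order
    # and drops it from the decoded text.
    text = Path(src).read_bytes().decode("utf-16")
    # Assumption: if the file declares its encoding, the declaration should
    # name UTF-8 once the bytes are UTF-8, or XML parsers may reject it.
    text = _XML_DECL_ENCODING.sub(r"\g<1>UTF-8\g<2>", text, count=1)
    # Writing bytes rather than text leaves the line endings untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Working on bytes instead of opening the files in text mode keeps the original line endings, which matters if the files are compared before and after.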

> *** General Comment ***
> I would like to point out though that I believe that the error
> reporting could be improved. The above conversion created a few XML
> files that couldn't be read (fractionally over 1%). I didn't find any
> log record, and the "DOS window", whilst displaying the errors, took
> something like 30 lines per error, hence I was not able to determine
> all the errors by looking at that window. More importantly the Java
> interface didn't report that anything was wrong. By running a script I
> wrote I was able to find out which folders were missing the
> metadata.xml file.
> *** General Comment ***

Point taken, and we are always trying to improve things like this.
However, if you're advanced enough to be generating your own
metadata.xml files then you should probably have the GLI in a higher
mode than the default Librarian! If you had switched the GLI into Expert
mode I think you would have seen the errors.
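
A check like the one Graeme mentions, listing the folders that have no metadata.xml, could be sketched in Python as follows. It is only an illustration under some assumptions: the function name check_metadata is made up, every immediate subfolder of the given directory is assumed to need its own metadata.xml, and the extra well-formedness test is an addition that his script may not have had.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_metadata(import_dir, name="metadata.xml"):
    """Return (missing, unreadable) lists of subfolder names (hypothetical helper)."""
    missing, unreadable = [], []
    # Assumption: each immediate subfolder should hold its own metadata.xml.
    for folder in sorted(p for p in Path(import_dir).iterdir() if p.is_dir()):
        candidate = folder / name
        if not candidate.is_file():
            missing.append(folder.name)
            continue
        try:
            # Only checks that the file is well-formed XML in its declared
            # encoding; it does not validate against any DTD.
            ET.parse(str(candidate))
        except ET.ParseError:
            unreadable.append(folder.name)
    return missing, unreadable
```

Printing the two lists gives one line per problem folder, which is easier to scan than the console output during an import.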

All the best,

Michael

> Michael Dewsnip wrote:
>
>>Hi Graeme,
>>
>>Greenstone's metadata.xml files do support UTF-8, and the GLI will write
>>Unicode metadata into them as UTF-8. Try copying some Unicode text from
>>your web browser and paste it into the GLI -- it should be saved into
>>the metadata.xml file correctly, and Greenstone shouldn't have any
>>trouble importing it.
>>
>>Regards,
>>
>>Michael
>>
>>graeme @ gmail wrote:
>>
>>>Greetings,
>>>
>>>I'm trying to set up a collection that will use the metadata.xml file
>>>similar to the demo collection. I have been using the dls metadata
>>>set. After an afternoon of struggling with this I found out that the
>>>file only works in ASCII format, but I have a need to add UNICODE
>>>text, is there a metadata set that works with UNICODE and the UTF-8
>>>encoding, and if so which one, or ones?
>>>
>>>graeme.
>>>
>>>_______________________________________________
>>>greenstone-users mailing list
>>>greenstone-users@list.scms.waikato.ac.nz
>>>https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users