I had a similar problem with my CD-ROM collection. I solved it by
changing the XML header as follows:
<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
I also set "-input_encoding iso_8859_1" on all plugins, which tells
Greenstone that the source files are encoded in ISO-8859-1.
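
To illustrate why the header matters, here is a minimal Python sketch (not
part of Greenstone; the sample element and the helper names is_well_formed
and to_utf8 are made up for the example). It shows that a generic XML parser
rejects ISO-8859-1 bytes when no encoding is declared, accepts them once the
header above is present, and also accepts the same content transcoded to
UTF-8. It only demonstrates XML well-formedness; whether Greenstone itself
accepts a non-UTF-8 metadata.xml is a separate question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample body: "café" encoded as ISO-8859-1 (0xE9 is "é").
BODY = b"<Metadata>caf\xe9</Metadata>"


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the XML parser accepts the given bytes."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


def to_utf8(xml_body: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Transcode a declaration-less XML body to UTF-8 and prepend a header."""
    text = xml_body.decode(source_encoding)
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + text.encode("utf-8")


# No declaration: the parser assumes UTF-8 and the lone 0xE9 byte is invalid.
no_header = BODY

# Declared as ISO-8859-1, as in the header shown above.
latin1_header = (
    b'<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>\n' + BODY
)

# Same content converted to UTF-8.
utf8_doc = to_utf8(BODY)

print(is_well_formed(no_header))
print(is_well_formed(latin1_header))
print(is_well_formed(utf8_doc))
```

Either declaring the real encoding or converting the bytes to UTF-8 makes
the file well formed; mixing the two (UTF-8 header, ISO-8859-1 bytes) does
not.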
>>> jens wille <firstname.lastname@example.org> 03-05-2004 08:35:04 >>>
John R. McPherson wrote:
> On Sun, May 02, 2004 at 03:23:15PM +0200, jens wille wrote:
>>i have to build a collection from plain text files which contain
>>non-ascii characters - originally they are encoded in ISO-8859-1
>>the problem now is that i use these files to create a metadata.xml
>>by extracting text and inserting it into meta tags. as a consequence
>>this yields a "not well formed" metadata.xml!
> in Greenstone the metadata.xml files must be encoded using UTF-8.
well, if i convert the metadata.xml to utf-8 after creating it, the
collection builds, but for almost every doc.xml i get "no plugin
could handle this file". i suppose that the doc.xml's are not
properly encoded ('file -i doc.xml' yields charset "unknown").
thank you anyway, but i'm afraid it's not as easy as that :-(