Notepad is notorious for making XML files invalid. For example, if you save a
metadata.xml file as UTF-8 in Notepad, it adds a byte order mark (BOM, the
bytes EF BB BF) to the start of the file, which breaks a lot of XML parsers.
On the rare occasions I use Notepad, I always open the file in Emacs afterwards
and remove that character. That said, your problem may be a slightly
different encoding problem. It's a little confusing that you saved the file as
UTF-8 and still get that message: valid UTF-8 shouldn't trigger it, so my
guess is that the encoding got scrambled somewhere and the text isn't
actually valid UTF-8.
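If you would rather check the file than open it in Emacs, here is a minimal
Python 3 sketch (not part of Greenstone; the function names are hypothetical)
that removes the BOM Notepad writes and reports the first byte that is not
valid UTF-8. It assumes the file fits in memory and overwrites it in place
when a BOM is found, so keep a backup.

```python
import codecs
import sys
from pathlib import Path

# Notepad's "UTF-8" option prepends this byte order mark: b"\xef\xbb\xbf".
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def clean_metadata_file(path: str) -> None:
    """Hypothetical helper: strip a BOM from `path` in place, then report UTF-8 validity."""
    p = Path(path)
    raw = p.read_bytes()
    data = strip_utf8_bom(raw)
    if data != raw:
        p.write_bytes(data)
        print(f"{path}: removed UTF-8 BOM")
    bad = first_invalid_utf8(data)
    if bad is None:
        print(f"{path}: valid UTF-8")
    else:
        # Often a single Windows-1252 byte, e.g. 0xE9 for an unencoded "é".
        print(f"{path}: invalid UTF-8 at byte {bad}: {data[bad:bad + 4]!r}")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        clean_metadata_file(name)
```

For example, `python check_utf8.py metadata.xml` (script name hypothetical).
An offset pointing at a lone accented byte such as `0xE9` usually means the
text was saved in a Windows code page rather than UTF-8, which would be
consistent with the "wasn't utf8" warnings.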
In any case, if at all possible, try using the GLI (Greenstone Librarian
Interface). It is strongly recommended because it handles UTF-8 text
correctly. Alternatively, you could use another editor, such as Emacs or
Hope this helps,
Greenstone Digital Library and Digitization Specialists
On Jan 10, 2008 8:34 AM, <Pablo.MORETE@cepal.org> wrote:
> Hello all:
> I am using Greenstone 2.74 on a PC (Windows 2000).
> I created a collection of PDFs, putting Dublin Core metadata in a
> metadata.xml file in a folder for a series of chapters.
> I created the metadata.xml in Notepad and saved it as UTF-8.
> When I run import.pl, I get messages like these:
> *doc::add_utf8_metadata: warning: 'dc.Creator' wasn't utf8*
> *doc::add_utf8_metadata: warning: 'dc.Title' wasn't utf8*
> After building, I can browse the collection by classifiers, but
> when I search for words with Spanish accents, Greenstone doesn't find them.
> I tried rebuilding with Lucene, MGPP and MG, but the problem remains.
> I appreciate any help.
> greenstone-users mailing list