[greenstone-users] encoding problem

From Richard Managh
Date Fri Jan 18 09:42:44 2008
Subject [greenstone-users] encoding problem
In-Reply-To (OF14C29A18-A71AB894-ON842573CB-006ABB08-842573CB-006B8DD2-cepal-org)
Hi Pablo,

Notepad is notorious for producing invalid XML. For example, if you save a
metadata.xml file as UTF-8 in Notepad, it adds a byte order mark (BOM) to
the start of the file, which breaks a lot of XML parsers. On the rare
occasions when I use Notepad, I always open the file in Emacs afterwards and
remove that character. Your problem, though, seems to be a slightly
different encoding problem. It's a little confusing that you saved the file
as UTF-8 and still get that message, since valid UTF-8 shouldn't be able to
cause it. I would have thought your text was valid UTF-8, but with scrambled
encoding.
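
If you would rather not clean the file by hand, something like the
following Python sketch could do the same job: it strips a leading UTF-8
BOM and then checks whether what remains decodes as UTF-8. The function
names are my own, not part of Greenstone, and I am assuming the file is
small enough to read into memory in one go.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def clean_file(path: str) -> bool:
    """Strip a BOM from the file in place (hypothetical helper).

    Returns True if the remaining bytes are valid UTF-8.
    """
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        # Only rewrite the file when a BOM was actually removed.
        with open(path, "wb") as f:
            f.write(cleaned)
    return is_valid_utf8(cleaned)
```

Calling clean_file("metadata.xml") and getting False back would suggest the
file was not really saved as UTF-8 (for instance, it may be Latin-1), which
might explain accented characters going missing.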

In any case, you could try using the GLI (Greenstone Librarian Interface) if
at all possible. This is strongly recommended, as it deals appropriately
with UTF-8 text. Alternatively, you could use another editor, like Emacs or
TextPad 4.

Hope this helps,

Richard.
--
DL Consulting
Greenstone Digital Library and Digitization Specialists
contact@dlconsulting.com
www.dlconsulting.com

On Jan 10, 2008 8:34 AM, <Pablo.MORETE@cepal.org> wrote:

>
> Hello all:
> I am using Greenstone 2.74 on a PC (Windows 2000).
> I created a collection of PDFs, putting Dublin Core metadata in a
> metadata.xml in a folder for a series of chapters.
> I create the metadata.xml in Notepad and save it as UTF-8.
> When I run import.pl I get this kind of message:
>
> *doc::add_utf8_metadata: warning: 'dc.Creator' wasn't utf8*
> *doc::add_utf8_metadata: warning: 'dc.Title' wasn't utf8*
>
> And then after building I can browse the collection by classifiers but
> when I search for words with Spanish accents Greenstone doesn't find them.
> I tried rebuilding with Lucene, MGPP and MG, but the problem remains.
> I appreciate any help.
>
> Cheers
> Pablo
> _______________________________________________
> greenstone-users mailing list
> greenstone-users@list.scms.waikato.ac.nz
> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>
>