Re: [greenstone-users] Explode cds/isis metadata database: problem with diacritics

From: Michael Dewsnip
Date: Fri, 16 Dec 2005 14:30:14 +1300
Subject: Re: [greenstone-users] Explode cds/isis metadata database: problem with diacritics
In-Reply-To: (OF9F22C513-7BC437A5-ON842570D7-005753A3-842570D7-00584144-cepal-org)
Dear Pablo,

You need to specify Latin 1 as the input encoding when exploding the
metadata database. Unfortunately, due to an oversight on our part, the
only options for "input_encoding" are "auto", "ascii", "utf8" and "unicode".

I have updated the explode_metadata_database.pl script to include all of
the encodings supported by Greenstone, including Latin 1 (iso_8859_1).
You can download the updated script from
http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp/explode_metadata_database.pl.
You should put this in your Greenstone "bin/script" directory,
overwriting the existing file. Then, re-explode the database and specify
"iso_8859_1" as the input encoding, and everything should be fine.
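
To illustrate why the input encoding matters, here is a minimal Python
sketch. It is not part of Greenstone and does not reproduce what the explode
script does internally; the sample string is hypothetical. It only shows that
Latin 1 bytes read under the wrong encoding lose their accented characters,
while the same bytes read as "iso_8859_1" come back intact.

```python
# Hypothetical sample value; not taken from the actual CDS/ISIS database.
original = "información"

# The bytes as a Latin 1 (ISO 8859-1) file would store them.
latin1_bytes = original.encode("iso_8859_1")

# Reading those bytes as UTF-8: the accented byte is not valid UTF-8,
# so a lenient decoder substitutes the replacement character U+FFFD.
wrong = latin1_bytes.decode("utf-8", errors="replace")

# Reading the same bytes with the correct encoding restores the text.
right = latin1_bytes.decode("iso_8859_1")

print(repr(wrong))
print(repr(right))
```

The exact "strange symbols" depend on which encoding the reader guesses, but
the cause is the same: the bytes are fine, the declared encoding is not.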

Thanks for pointing out this problem.

All the best,

Michael

Pablo.MORETE@cepal.org wrote:

>Dear all:
> I am trying to implement the "explode" feature for
>my cds/isis database in order to index the full text of the pdf files
>whose location is provided in one of the database fields.
>I first import the cds/isis database using Latin 1 as the input encoding in
>the ISISPlug configuration, and the accent marks used in Spanish are displayed
>correctly. But when I explode the metadata database and then process it with
>NULPlug, all diacritics are replaced by strange symbols.
>I would appreciate any ideas on how to solve this problem.
>Thanks,
>
>Pablo