Re: [greenstone-users] accents in metadata

From Dominique Babini
Date Mon, 06 Dec 2004 13:55:52 -0300
Subject Re: [greenstone-users] accents in metadata
In-Reply-To (20041206-104521-846947295-tao-lib-uchicago-edu)
Thank you VERY much, Tod, for your kind help. We will start working on it.

Tod Olson <> writes:
>>>>>> "DB" == Dominique Babini <> writes:
>DB> What change do we have to make so that our users can search for a
>DB> word either with or without accents in the metadata fields and get
>DB> the same results (Spanish has a lot of accents!!!!)?
>We had the same problem in Chopin Early Editions: lots of diacritics,
>but lots of users with US keyboards. The problem was addressed by
>processing the
>metadata in the GSAF files. Not ideal, but it required no
>modifications to the indexing engine.
>The idea is that from any metadata field, you can create a parallel
>version of the field that is modified to suit your search criteria.
>The modified version of the metadata is used for indexing, and the
>original is used for display. Here's some real title metadata:
> <Metadata name="Title">Prélude en ut dièse mineur, op. 45</Metadata>
>After the GSAF files are created, we run them through a filter which
>looks for certain metadata fields, like Title, and creates a new field
>with "Idx" appended to the name and all the diacritics stripped out:
> <Metadata name="TitleIdx">Prelude en ut diese mineur, op. 45</Metadata>
>So if you title search the collection for the string "Prelude en ut
>diese mineur", the engine will match on the TitleIdx field above, but
>the search results format string displays the Title field.
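A minimal sketch of the kind of filter Tod describes, in Python. This is not the Chopin project's actual script. The element layout (a parent element holding `Metadata` children), the `INDEXED_FIELDS` set, and the function names are assumptions made for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Assumption: which fields get a parallel "Idx" copy. Only Title is
# named in the message above.
INDEXED_FIELDS = {"Title"}


def strip_diacritics(text):
    """Return text with combining marks removed (e.g. é -> e, è -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def add_index_fields(parent):
    """Append a <Metadata name="<Field>Idx"> element for each indexed field."""
    for meta in list(parent.findall("Metadata")):
        name = meta.get("name")
        if name in INDEXED_FIELDS and meta.text:
            idx = ET.SubElement(parent, "Metadata", {"name": name + "Idx"})
            idx.text = strip_diacritics(meta.text)


# Hypothetical fragment; a real archive file has more structure around it.
section = ET.fromstring(
    '<Description><Metadata name="Title">'
    "Prélude en ut dièse mineur, op. 45</Metadata></Description>"
)
add_index_fields(section)
print(ET.tostring(section, encoding="unicode"))
```

The sketch relies on Unicode decomposition, so letters with no decomposed form (for example "ø" or "ß") pass through unchanged and would need an explicit mapping. Running it twice on the same file would add duplicate Idx fields.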
>In your case, you might consider putting both forms of the Title, with
>and without diacritics, into the TitleIdx:
> <Metadata name="TitleIdx">Prélude en ut dièse mineur, op. 45
> Prelude en ut diese mineur, op. 45</Metadata>
>This idea adapts pretty broadly to various forms of metadata. For
>example, in the same collection, someone searching for scores
>published in London should also match Londres, so we do something
>similar for the place of publication, providing all spellings of the
>city that occur in the collection.
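The place-name case might be sketched as below, again in Python. Only the London/Londres pair comes from the message; the `PLACE_VARIANTS` table and the `expand_place` function are hypothetical, and a real table would have to be built from the spellings that occur in the collection.

```python
# Hypothetical variant table: each set groups spellings of one city.
PLACE_VARIANTS = [
    {"London", "Londres"},
]


def expand_place(value):
    """Return the value followed by every other known spelling of it."""
    for group in PLACE_VARIANTS:
        if value in group:
            others = sorted(group - {value})
            return " ".join([value] + others)
    # Unknown places are indexed as they are.
    return value


print(expand_place("Londres"))  # Londres London
```

The expanded string would go into the indexed copy of the field, while the original place of publication stays in the display field.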
>There are two flaws to this approach: it may mess up the relevance
>rankings, and it does not work for the full-text index.
>Tod A. Olson <>
>Sr. Programmer / Analyst
>The University of Chicago Library
>
>"How do you know I'm mad?" said Alice.
>"If you weren't mad, you wouldn't have come here," said the Cat.

Dra. Dominique Babini
Coordinadora Area Información

Consejo Latinoamericano de Ciencias Sociales - CLACSO
Biblioteca Virtual de Ciencias Sociales de América Latina
y el Caribe de la red de centros miembros de CLACSO
Av. Callao 875, 3ro. E
(C1023 AAB) Buenos Aires, Argentina
Tel.: (54-11) 4814-2301/4811-6588
Fax: (54-11) 4812-8459