Re: [greenstone-users] problems with accent marks in words

From Flor
Date Fri, 13 May 2005 08:40:20 -0300
Subject Re: [greenstone-users] problems with accent marks in words
In-Reply-To (20050512155149N-tao-lib-uchicago-edu)
Thank you, Tod, for your help. I am going to make a parallel version of the field.

Regards

Florencia Vergara Rossi
Biblioteca - Clacso
vergara@clacso.edu.ar
http://www.clacso.org.ar/biblioteca

At 15:51 12/05/2005 -0500, you wrote:
> >>>>> "F" == Flor <vergara@clacso.edu.ar> writes:
>
>F> In Spanish and Portuguese we use a lot of accent marks in words.
>F> When searching in our virtual library, our users sometimes include in
>F> their search the accent marks, and sometimes they do not.
>F> And in the input sometimes our members include the accent marks and
>F> sometimes they do not.
>
>F> How can we make Greenstone ignore accent marks in the
>F> input and in the output?
>
>Chopin Early Editions (http://chopin.lib.uchicago.edu/) has the same
>issue. The problem of accents in metadata was addressed, but not
>accents in the full text.
>
>With the metadata, we would use two forms: one for indexing and one
>for display. The display version of a field has the text with
>accents. The index version can have any variants you want to match.
>
>This might be easier to illustrate first with the different spellings of
>cities. Take Leipzig. There are more spellings than what's shown below:
>
> <Metadata name="PubPlace">Leipzig</Metadata>
> <Metadata name="PubPlaceIdx">Leipsic Leipzig</Metadata>
>
>The CEE publication place index is built from PubPlaceIdx, so it
>matches any spelling, but any display is built from the version that
>actually appears on the printed score, stored in PubPlace.
>
>You can apply this idea to accents, treating them as variant spellings.
>There's an example of accents in the title in this previous email:
>
>http://puka.cs.waikato.ac.nz/cgi-bin/library?a=d&c=gsarch&cl=CL2.18.11&d=20041206.104521.846947295.tao-lib.uchicago.edu
>
>This example assumes that the user will type in without accents, but
>could be adapted to accommodate either form of query.
>
>I never figured out how to do this for full-text searching.
>
>-Tod
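
The display/index pairing described above could be generated by a small script rather than typed by hand. What follows is only a minimal sketch, not part of the CEE setup: it assumes the index field is named by appending "Idx" to the display field name (as with PubPlace and PubPlaceIdx), and the helper names strip_accents, index_value and metadata_pair are hypothetical. It puts both the accented and the unaccented form in the index field so that either form of query can match.

```python
import unicodedata
from xml.sax.saxutils import escape


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


def index_value(display: str) -> str:
    """Build the index text: the display form plus its unaccented variant."""
    plain = strip_accents(display)
    # Avoid repeating the value when it has no accents at all.
    return display if plain == display else f"{display} {plain}"


def metadata_pair(name: str, display: str) -> str:
    """Format a display element and its index twin (hypothetical "<name>Idx")."""
    return (
        f'<Metadata name="{name}">{escape(display)}</Metadata>\n'
        f'<Metadata name="{name}Idx">{escape(index_value(display))}</Metadata>'
    )


if __name__ == "__main__":
    print(metadata_pair("Title", "Educación y democracia"))
```

Note that this treats every diacritic as optional, so "ñ" is indexed as "n" as well. If that is too loose for Spanish, the letter could be excluded before stripping. As with the approach above, this covers metadata fields only, not the full text.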