Re: [greenstone-users] problems with accent marks in words

From Flor
Date Fri, 13 May 2005 08:40:20 -0300
Subject Re: [greenstone-users] problems with accent marks in words
In-Reply-To (20050512155149N-tao-lib-uchicago-edu)
Thank you, Tod, for your help. I am going to make a parallel version of the field.


Florencia Vergara Rossi
Biblioteca - Clacso

At 15:51 12/05/2005 -0500, you wrote:
> >>>>> "F" == Flor <> writes:
>F> In Spanish and Portuguese we use a lot of accent marks in words.
>F> When searching in our virtual library, our users sometimes include
>F> the accent marks in their search, and sometimes they do not.
>F> And in the input sometimes our members include the accent marks and
>F> sometimes they do not.
>F> How can we make Greenstone ignore accent marks in the
>F> input and in the output?
>Chopin Early Editions ( has the same
>issue. The problem of accents in metadata was addressed, but not
>accents in the full text.
>With the metadata, we would use two forms: one for indexing and one
>for display. The display version of a field has the text with
>accents. The index version can have any variants you want to match.
>This might be easier to illustrate first with the different spellings of
>cities. Take Leipzig. There are more spellings than what's shown below:
> <Metadata name="PubPlace">Leipzig</Metadata>
> <Metadata name="PubPlaceIdx">Leipsic Leipzig</Metadata>
>The CEE publication place index is built from PubPlaceIdx, so it
>matches any spelling, but any display is built from the version that
>actually appears on the printed score, stored in PubPlace.
>You can apply this idea to accents, treating them as variant spellings.
>There's an example of accents in the title in this previous email:
>This example assumes that the user will type the query without accents,
>but could be adapted to accommodate either form of query.
>I never figured out how to do this for full-text searching.
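
For anyone wanting to generate the parallel index field automatically rather than by hand, here is a minimal sketch in Python. It is not part of Greenstone and not the method used for CEE. It only assumes the display/index naming convention from the Leipzig example above (PubPlace and PubPlaceIdx). The function names strip_accents, index_variants and metadata_pair, and the field name Title, are hypothetical and chosen for illustration. The index value holds both the unaccented and the accented form, so a query typed either way can match.

```python
import unicodedata
from xml.sax.saxutils import escape


def strip_accents(text: str) -> str:
    """Return text with combining marks (accents, tildes, cedillas) removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_variants(text: str) -> str:
    """Build the index form: unaccented variant plus the original, if they differ."""
    plain = strip_accents(text)
    return text if plain == text else f"{plain} {text}"


def metadata_pair(name: str, value: str) -> list[str]:
    """Return the display line and the index line for one metadata field.

    The "<name>Idx" suffix mirrors the PubPlace/PubPlaceIdx example;
    it is a naming assumption, not something Greenstone requires.
    """
    display = f'<Metadata name="{name}">{escape(value)}</Metadata>'
    index = f'<Metadata name="{name}Idx">{escape(index_variants(value))}</Metadata>'
    return [display, index]


if __name__ == "__main__":
    # Hypothetical field name and value, for illustration only.
    for line in metadata_pair("Title", "Educación"):
        print(line)
```

Note that this also turns ñ into n and ç into c. If those should stay distinct in your collection, the function would need an exception list. The same strip_accents step could be applied to a query before searching, but this sketch does not show how to hook that into Greenstone.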