Another way to achieve this is to implement a stemmer.
Standard stemming reduces words to their root; e.g., computer, computing
and computation might all stem to comput. The index stores the stemmed
version. When a query is run, the search terms are stemmed too, so they
match all variants.
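To make the index-time/query-time symmetry concrete, here is a minimal C++ sketch. `toy_stem` is a hypothetical, deliberately crude suffix stripper (not Porter, and not Greenstone's English stemmer), and `StemmedIndex` is a hypothetical in-memory inverted index, not a Greenstone class. The only point is that the same function is applied to document words and to query terms.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy stemmer, for illustration only (NOT Porter or Lovins):
// removes the first matching suffix if a stem of at least 3 letters remains.
std::string toy_stem(const std::string& word) {
    static const std::vector<std::string> kSuffixes = {"ation", "ing", "er", "s"};
    for (const std::string& suf : kSuffixes) {
        if (word.size() >= suf.size() + 3 &&
            word.compare(word.size() - suf.size(), suf.size(), suf) == 0) {
            return word.substr(0, word.size() - suf.size());
        }
    }
    return word;
}

// Hypothetical inverted index keyed by stem: stem -> ids of documents.
class StemmedIndex {
public:
    void add_document(int doc_id, const std::string& text) {
        std::istringstream words(text);
        std::string w;
        while (words >> w) {
            postings_[toy_stem(w)].insert(doc_id);  // stem at index time
        }
    }

    std::set<int> search(const std::string& term) const {
        auto it = postings_.find(toy_stem(term));  // stem at query time too
        return it == postings_.end() ? std::set<int>{} : it->second;
    }

private:
    std::map<std::string, std::set<int>> postings_;
};
```

For example, after indexing "computer science" as document 1 and "parallel computation" as document 2, a search for "computing" returns both documents, because all three words reduce to "comput". A real stemmer needs many more rules; this one strips at most one suffix.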
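You could implement a stemmer that simply removes the accents, so words
with accents would be mapped to their unaccented forms in the index.
Query terms would pass through the same stemmer, so accented and
unaccented searches would match the same entries.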
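A minimal C++ sketch of such an accent-removing "stemmer", assuming the text is UTF-8 and that folding the Latin-1 Supplement letters (À to ÿ) is enough for Spanish and Portuguese. `remove_accents` is a hypothetical name; how it would be plugged into Greenstone's stemmer selection is not shown, since that interface isn't described in this thread.

```cpp
#include <string>

// Fold Latin-1 Supplement letters (U+00C0..U+00FF), which UTF-8 encodes as
// the byte 0xC3 followed by 0x80..0xBF, to their unaccented base letters.
// '*' marks characters with no single-letter base (Æ, Ø, ß, ×, ...):
// those are copied through unchanged.
std::string remove_accents(const std::string& in) {
    static const char kFold[] =
        "AAAAAA*CEEEEIIII*NOOOOO**UUUUY**"   // U+00C0..U+00DF
        "aaaaaa*ceeeeiiii*nooooo**uuuuy*y";  // U+00E0..U+00FF
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == 0xC3 && i + 1 < in.size()) {
            unsigned char next = static_cast<unsigned char>(in[i + 1]);
            if (next >= 0x80 && next <= 0xBF) {
                char base = kFold[next - 0x80];
                if (base != '*') {
                    out += base;  // accented letter -> plain ASCII letter
                } else {
                    out += in[i];  // keep both bytes of the original character
                    out += in[i + 1];
                }
                ++i;  // both bytes of the sequence have been consumed
                continue;
            }
        }
        out += in[i];  // ASCII and all other bytes pass through
    }
    return out;
}
```

Both `remove_accents("información")` and `remove_accents("informacion")` yield "informacion", so once documents and queries go through the same function, either spelling finds the other. A fuller version might use a Unicode normalization library such as ICU instead of a hand-written table.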
This would need to be written in C/C++. Greenstone already has two
stemmers (English and a simple French one), and there is a mechanism to
choose which one to use.
If you are interested in doing this, I can give you more information
about how to add a new stemmer to Greenstone.
> Thank you, Tod, for your help. I am going to do a parallel version to the
> Florencia Vergara Rossi
> Biblioteca - Clacso
> At 15:51 12/05/2005 -0500, you wrote:
>> >>>>> "F" == Flor <email@example.com> writes:
>> F> In Spanish and Portuguese we use a lot of accent marks in words.
>> F> When searching in our virtual library, our users sometimes include
>> F> the accent marks in their search, and sometimes they do not.
>> F> Likewise, when entering records, our members sometimes include the
>> F> accent marks and sometimes they do not.
>> F> How can we make Greenstone ignore accent marks in both the input
>> F> and the output?
>> Chopin Early Editions (http://chopin.lib.uchicago.edu/) has the same
>> issue. The problem of accents in metadata was addressed, but not
>> accents in the full text.
>> For the metadata, we use two forms: one for indexing and one for
>> display. The display version of a field has the text with accents.
>> The index version can contain any variants you want to match.
>> This is easier to illustrate first with variant spellings of city
>> names. Take Leipzig; there are more spellings than those shown below:
>> <Metadata name="PubPlace">Leipzig</Metadata>
>> <Metadata name="PubPlaceIdx">Leipsic Leipzig</Metadata>
>> The CEE publication place index is built from PubPlaceIdx, so it
>> matches any spelling, but every display is built from the version that
>> actually appears on the printed score, stored in PubPlace.
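A small C++ model of the two-form idea, for illustration only: in CEE the matching is done by Greenstone's own index built from PubPlaceIdx, not by code like this. `PubPlaceField`, `index_matches` and `search` are hypothetical names. Matching consults only the index form, while results show only the display form.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record holding one metadata field in two forms.
struct PubPlaceField {
    std::string display;  // spelling printed on the score (like PubPlace)
    std::string index;    // every spelling to match (like PubPlaceIdx)
};

// Matching looks only at the whitespace-separated tokens of the index form.
bool index_matches(const PubPlaceField& f, const std::string& term) {
    std::istringstream in(f.index);
    for (std::string tok; in >> tok;) {
        if (tok == term) return true;
    }
    return false;
}

// Returns the display form of every record whose index form matches.
std::vector<std::string> search(const std::vector<PubPlaceField>& records,
                                const std::string& term) {
    std::vector<std::string> shown;
    for (const PubPlaceField& r : records) {
        if (index_matches(r, term)) shown.push_back(r.display);
    }
    return shown;
}
```

With the records {"Leipzig", "Leipsic Leipzig"} and {"Paris", "Paris"}, searching for "Leipsic" shows "Leipzig", the spelling that appears on the score.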
>> You can apply this idea to accents, treating them as variant spellings.
>> There's an example of accents in the title in this previous email:
>> This example assumes that the user will type queries without accents,
>> but it could be adapted to accommodate either form of query.
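One way to adapt this so that either form of query matches is to put both the accented text and its unaccented variant into the index form, just as PubPlaceIdx lists both Leipsic and Leipzig. The C++ sketch below assumes UTF-8 text; `fold_es_pt`, `make_index_form` and `matches` are hypothetical helpers, and the fold table covers only a subset of Spanish and Portuguese letters.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: fold some accented letters common in Spanish and
// Portuguese (two UTF-8 bytes each) to plain ASCII; other bytes pass through.
std::string fold_es_pt(const std::string& s) {
    static const std::map<std::string, char> kFold = {
        {"á", 'a'}, {"à", 'a'}, {"â", 'a'}, {"ã", 'a'}, {"é", 'e'}, {"ê", 'e'},
        {"í", 'i'}, {"ó", 'o'}, {"ô", 'o'}, {"õ", 'o'}, {"ú", 'u'}, {"ü", 'u'},
        {"ç", 'c'}, {"ñ", 'n'}, {"Á", 'A'}, {"É", 'E'}, {"Í", 'I'}, {"Ó", 'O'},
        {"Ú", 'U'}, {"Ç", 'C'}, {"Ñ", 'N'}};
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i + 1 < s.size()) {
            auto it = kFold.find(s.substr(i, 2));
            if (it != kFold.end()) {
                out += it->second;
                ++i;  // skip the second byte of the accented letter
                continue;
            }
        }
        out += s[i];
    }
    return out;
}

// Index form = display text plus its unaccented variant (if different).
std::string make_index_form(const std::string& display) {
    std::string folded = fold_es_pt(display);
    return folded == display ? display : display + " " + folded;
}

// True if every whitespace-separated query word occurs in the index form.
bool matches(const std::string& index_form, const std::string& query) {
    std::set<std::string> tokens;
    std::istringstream in(index_form);
    for (std::string t; in >> t;) tokens.insert(t);
    std::istringstream q(query);
    for (std::string w; q >> w;) {
        if (tokens.count(w) == 0) return false;
    }
    return true;
}
```

For a title "Ação e informação", the index form becomes "Ação e informação Acao e informacao", so queries typed as "informação" or "informacao" both match, while the display field keeps the accents.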
>> We never figured out how to do this for full-text searching.
> greenstone-users mailing list