I have developed a patch that adds accent folding
to mgpp in Greenstone 2.62. I will soon extend
it to mg and to Greenstone 3.
The approach was to add another bit flag to the stem method,
which gives 4 extra stem methods, namely:
4 accent folding
5 accent folding and case folding
6 accent folding and word stemming
7 accent folding, case folding and word stemming
This roughly doubles the number of stem indexes.
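To make the numbering concrete, here is a minimal C++ sketch of how the
method numbers above fall out of a bit mask. It assumes that case folding
is bit 1 and word stemming is bit 2 (the existing methods 1 to 3), with
accent folding as the new bit 4. The constant and function names are
illustrative only, not the identifiers used in the patch.

```cpp
// Illustrative names only; the patch itself may use different identifiers.
constexpr unsigned STEM_CASE_FOLD   = 1u;  // existing bit: case folding
constexpr unsigned STEM_WORD_STEM   = 2u;  // existing bit: word stemming
constexpr unsigned STEM_ACCENT_FOLD = 4u;  // new bit: accent folding

// Combine the three options into a stem method number in the range 0..7.
constexpr unsigned stem_method(bool case_fold, bool word_stem, bool accent_fold) {
    return (case_fold   ? STEM_CASE_FOLD   : 0u)
         | (word_stem   ? STEM_WORD_STEM   : 0u)
         | (accent_fold ? STEM_ACCENT_FOLD : 0u);
}

// True if the given stem method number requests accent folding.
constexpr bool uses_accent_folding(unsigned method) {
    return (method & STEM_ACCENT_FOLD) != 0u;
}
```

With this layout, any method number of 4 or above has the accent folding
bit set, and the lower two bits keep their existing meaning.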
The patch includes changes to the builder in
src/mgpp/text (the files mgpp_stem_idx, stemmer.cpp and mg_files.h)
and to the mgppbuilder.pm Perl module. On the colserver
and query side, it changes GSDLQueryTerm.cpp, IndexData.cpp
and a few others. The receptionist gains a new option on
the preferences page.
I would like to contribute the patch for inclusion in
future Greenstone releases, if you find it useful. Both
CLACSO (my employer) and I are happy to release the
code under the GPL, the LGPL or any other arrangement.
I would also like to discuss the changes a bit. First,
whether you like the approach of adding stem methods.
Then, whether you would agree to link against another library
(libunac), or whether I should instead reimplement Unicode unaccenting
within Greenstone (libunac is GPL).
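For a sense of scale, a reimplementation could start from something as
small as the sketch below. This is a hypothetical helper, not libunac's
API and not code from the patch. It works on already-decoded Unicode code
points and only covers a handful of lower-case Latin-1 letters, so a real
version would need a much larger table.

```cpp
// Hypothetical helper, not libunac's API: map a few accented lower-case
// Latin-1 letters (given as Unicode code points) to their base letter.
constexpr char32_t fold_accent(char32_t c) {
    return (c >= 0x00E0 && c <= 0x00E5) ? U'a'   // a with grave .. ring
         : (c == 0x00E7)                ? U'c'   // c with cedilla
         : (c >= 0x00E8 && c <= 0x00EB) ? U'e'   // e with grave .. diaeresis
         : (c >= 0x00EC && c <= 0x00EF) ? U'i'   // i with grave .. diaeresis
         : (c == 0x00F1)                ? U'n'   // n with tilde
         : (c >= 0x00F2 && c <= 0x00F6) ? U'o'   // o with grave .. diaeresis
         : (c >= 0x00F9 && c <= 0x00FC) ? U'u'   // u with grave .. diaeresis
         : c;                                    // everything else unchanged
}
```

Whether n with tilde should fold to plain n for Spanish collections is a
design choice; the sketch folds it only to keep the example simple.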
In the coming months we also plan to add Spanish
and Portuguese stemming, so I am open to suggestions there
as well (I have been looking at the Snowball library).
I have also made a rudimentary Debian package, which
needs further work to comply with Debian policy.
Red de Bibliotecas Virtuales de Ciencias Sociales
de America Latina y el Caribe de la Red Clacso