[greenstone-devel] character mapping in index building

From jens wille
Date Mon, 08 Aug 2005 14:10:24 +0200
hi there!

i'd like to map certain characters (diacritics etc.) onto
corresponding basic forms, e.g.: 'ç' ('c' with cedilla) -> 'c', 'ä'
('a' with diaeresis) -> 'ae' and/or 'a'.
this should apply to indexes (search) and browsing lists, not to text
display (thus the filter_text method in mgbuildproc.pm is
inappropriate - besides, it doesn't affect browsing lists).
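
a minimal sketch of the kind of mapping i mean, in python purely for
illustration. the names SPECIAL and fold are made up, and nothing here
is greenstone code or api: explicit replacements come first (e.g. 'ä' ->
'ae'), everything else is decomposed and stripped of its combining marks
(e.g. 'ç' -> 'c').

```python
import unicodedata

# hypothetical explicit replacements, checked before the generic fallback
SPECIAL = {
    "\u00e4": "ae",  # 'a' with diaeresis
    "\u00c4": "Ae",  # 'A' with diaeresis
}

def fold(text, special=SPECIAL):
    """Map characters with diacritics onto their basic forms."""
    out = []
    # compose first so that the keys in `special` match single characters
    for ch in unicodedata.normalize("NFC", text):
        if ch in special:
            out.append(special[ch])
            continue
        # generic fallback: decompose, then drop the combining marks
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)

print(fold("Fa\u00e7ade"))           # 'c' with cedilla becomes 'c'
print(fold("\u00e4"))                # explicit mapping: 'ae'
print(fold("\u00e4", special={}))    # fallback only: 'a'
```

passing an empty `special` table gives the plain 'ä' -> 'a' variant, so
both of the forms mentioned above can come from the same routine.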

to give an illustration:
a collection contains documents with 'Façade' and 'Facade'. a search
for 'Facade' only returns documents with this spelling and not
documents with 'Façade' - but it should! and in a browsing list
containing these words they are sorted at quite different
positions, thus separating the respective documents instead of
bringing them together, e.g. 'Facade (11)' and 'Façade (3)' instead
of 'Facade (14)'.
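
to make the desired behaviour concrete, here is a small sketch (again
python for illustration only; fold, browse_list and search are
hypothetical names, not greenstone functions). the idea is that the
folded form is used as the key for both the browsing list and the search
index, while the stored text stays untouched for display.

```python
import unicodedata
from collections import Counter

def fold(text):
    """Strip diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def browse_list(terms):
    """Group terms under their folded form and count them."""
    counts = Counter(fold(t) for t in terms)
    return [f"{key} ({n})" for key, n in sorted(counts.items())]

def search(query, docs):
    """Return ids of documents containing the query word, ignoring
    diacritics and case; the document text itself is not modified."""
    wanted = fold(query).lower()
    return [doc_id for doc_id, text in docs.items()
            if wanted in fold(text).lower().split()]

# 11 entries spelled 'Facade' and 3 spelled with 'c' with cedilla
terms = ["Facade"] * 11 + ["Fa\u00e7ade"] * 3
print(browse_list(terms))

docs = {
    "doc1": "the Facade of the house",
    "doc2": "a Fa\u00e7ade in stone",
}
print(search("Facade", docs))
```

with this, the two spellings end up in one browsing entry and a search
for either spelling finds both documents.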

i hope i made myself clear enough for someone to point me to the
appropriate place ;-)