Re: [greenstone-devel] character mapping in index building

From Katherine Don
Date Mon, 15 Aug 2005 13:36:15 +1200
Subject Re: [greenstone-devel] character mapping in index building
In-Reply-To (42F74BB0-4030708-gmx-net)
Hi Jens

You will need to make a change in two places:
For searching, add it to filter_text in mgbuildproc, but only do it if
you are indexing. Depending on which version of Greenstone you are
using, you could instead add filter_text to basebuildproc and remove the
empty ones from mg/mgppbuildprocs, so that all build types can use it.


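sub filter_text {
    my $self = shift (@_);
    my ($field, $text) = @_;

    # only map characters while indexing, not for display
    if ($self->{'indexing_text'}) {
        $text = filter_characters($text);   # your own routine
    }
    return $text;   # assumed: filter_text hands back the text
}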

In, implement your filter_characters($text);
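
As a rough illustration only, a filter_characters along these lines might
do the mapping. The table contents and the name %char_map are made up for
this sketch, and it assumes $text holds decoded characters; if your text
is raw UTF-8 bytes, the keys would need to be the matching byte sequences.

```perl
use strict;
use warnings;

# Hypothetical mapping table; extend it for your own collection.
my %char_map = (
    "\x{E7}" => 'c',     # c with cedilla
    "\x{E4}" => 'ae',    # a with diaeresis
    "\x{F6}" => 'oe',    # o with diaeresis
    "\x{FC}" => 'ue',    # u with diaeresis
);

# Replace every character that has an entry in the table and
# leave all other characters untouched.
sub filter_characters {
    my ($text) = @_;
    $text =~ s/(.)/exists $char_map{$1} ? $char_map{$1} : $1/ges;
    return $text;
}
```

Keeping the mapping in one table means the same routine can be shared by
the indexing and the browsing side.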

For browsing, add it to format_metadata_for_sorting in

This should be called on all metadata used for sorting in classifiers.
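
To show the effect on a browsing list, here is a small stand-alone sketch.
It does not use the real format_metadata_for_sorting; group_by_sort_key
and the cut-down filter_characters are hypothetical names for this
example only. Values that map to the same key end up in one bucket.

```perl
use strict;
use warnings;

# Hypothetical stand-in for your own mapping routine.
sub filter_characters {
    my ($text) = @_;
    $text =~ s/\x{E7}/c/g;     # c with cedilla -> c
    $text =~ s/\x{E4}/ae/g;    # a with diaeresis -> ae
    return $text;
}

# Count entries per mapped key, so spelling variants share one bucket.
sub group_by_sort_key {
    my (@values) = @_;
    my %count;
    $count{ filter_characters($_) }++ for @values;
    return \%count;
}

my $groups = group_by_sort_key("Facade", "Fa\x{E7}ade", "Facade");
print "$_ ($groups->{$_})\n" for sort keys %$groups;
```

Both spellings are counted under the single key 'Facade' here, which is
the kind of merged entry the original question asks for.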

Hope this helps,

jens wille wrote:
> hi there!
> i'd like to map certain characters (diacritics etc.) onto
> corresponding basic forms, e.g.: 'ç' ('c' with cedilla) -> 'c', 'ä'
> ('a' with diaeresis) -> 'ae' and/or 'a'.
> this shall apply to indexes (search) and browsing lists, not to text
> display (thus the filter_text method in is
> inappropriate - besides, this doesn't affect browsing lists).
> to give an illustration:
> a collection contains documents with 'Façade' and 'Facade'. a search
> for 'Facade' only returns documents with this spelling and not
> documents with 'Façade' - but it should! and considering a browsing
> list consisting of these words they will be sorted at quite
> different positions thus separating the respective documents -
> instead of bringing them together, e.g. 'Facade (11)' and 'Façade
> (3)' instead of 'Facade (14)'.
> i hope i made myself clear enough to have someone point me at the
> appropriate place ;-)
> tia
> jens