[greenstone-users] encoding again

From jens wille
Date Wed, 15 Mar 2006 21:30:02 +0100
Subject [greenstone-users] encoding again
hi there!

it's time for me to ask for help with some encoding problems again ;-)

i'm building a collection using mgpp (v2.62, same with v2.63), the
source files are in utf8 and are processed with HTMLPlug. however,
i'm unable to search for terms containing umlauts :-( (the archives
files are correct utf8, so what goes wrong here has to be during
build phase)
a bit of examination led me to the assumption that my metadata are
"decoded" (where? to what encoding? why?) and then encoded to utf8
_twice_ (!) - oddly enough, this used to work with mg (though i
couldn't find any difference between mg and mgpp in this regard).
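
to illustrate what "encoded to utf8 twice" looks like at the byte level,
here is a minimal python sketch. it is not greenstone code; the function
names are made up for this example, and latin-1 as the intermediate
(wrong) decoding is only an assumption, since the actual encoding involved
is exactly what is unknown here.

```python
# sketch only: reproduces and detects "utf8 encoded twice".
# assumption: the wrong intermediate decoding step is latin-1.

def double_encode(text: str) -> bytes:
    """encode to utf8, misread those bytes as latin-1, encode to utf8 again."""
    once = text.encode("utf-8")
    misread = once.decode("latin-1")  # the hypothetical faulty "decode" step
    return misread.encode("utf-8")


def looks_double_encoded(data: bytes) -> bool:
    """true if undoing one latin-1/utf8 round trip yields different, valid text."""
    try:
        text = data.decode("utf-8")
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # not valid utf8, or the round trip fails: not double-encoded in this sense
        return False
    return repaired != text


if __name__ == "__main__":
    print(double_encode("\u00e4"))  # a single a-umlaut becomes four bytes
    print(looks_double_encoded(double_encode("k\u00e4se")))
    print(looks_double_encoded("k\u00e4se".encode("utf-8")))
```

a search term that is encoded only once would never match an index entry
that went through double_encode, which would be consistent with getting
no results for umlauts.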

now i wanted to break my umlauts (ä => ae, ...) which i'm doing for
other diacritics (c with a diacritic => c, ...) all along (using the filter_text
function; and which worked and still does - apparently!), but no
change for the umlauts: still no results.
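
the kind of replacement meant here can be sketched in python as follows.
this is not the actual filter_text function from greenstone (which is
perl); fold_diacritics and the UMLAUTS table are hypothetical names used
only for this illustration.

```python
import unicodedata

# hypothetical mapping for german umlauts; extend as needed
UMLAUTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def fold_diacritics(text: str) -> str:
    """replace umlauts by two-letter forms, then strip other diacritics."""
    # compose first, so "a" + combining diaeresis is also seen as one umlaut
    text = unicodedata.normalize("NFC", text)
    for src, dst in UMLAUTS.items():
        text = text.replace(src, dst)
    # decompose and drop the remaining combining marks (cedilla, accents, ...)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(fold_diacritics("Käse"))
    print(fold_diacritics("garçon"))
```

note that the sketch works on decoded text. if the input had already been
encoded twice, an umlaut would show up as two other characters and the
UMLAUTS table would never match, while the rest of the text would look
untouched.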

my question now is (apart from general help regarding this problem)
how i could have a look into the mg(pp)-index, to see what mg(pp)
actually has in there. there's db2txt for the text db, but this
doesn't seem to work for the index db. (besides, the Queryer shows
the same behaviour and isn't of much help here - at least not that i
know of)

again, any help would be greatly appreciated ;-)