Re: building mgpp collection from non-ascii

From John R. McPherson
Date Thu, 20 Feb 2003 10:16:19 +1300
Subject Re: building mgpp collection from non-ascii
In-Reply-To (4a92919485abcec425681774a5aeca84-www3-mail-post-cz)
r c wrote:
> Hi all,
> I’m trying to build an MGPP collection from pages that contain
> non-ascii characters (encoding Windows 1250). Import as well as
> building went smoothly but displayed results contain "cabalistic
> characters" (as someone pointed out before).
> I’m sure that
> - import was OK (I made a 2nd mg collection from the same archive
>   files).
> - encoding preferences for the receptionist were set (Windows
>   1250, UTF-8 later on; the mg collection is displayed right).
> The metadata is the only thing I can see well, i.e. when I get
> search results the First200 metadata may be displayed, and it
> contains all non-ascii characters (I read that mgpp doesn’t
> compress metadata by default). I can’t search for strings
> that contain non-ascii characters at all (zero hits).
> Is there anyone who has built an *mgpp* collection from non-ascii
> encoded files? Please, let me know what I am doing wrong. Here
> are both collects.cfg and some building messages (interesting
> is the different number of reported bytes for both collections;
> don't be put off by the amount of remaining text).

I use mgpp successfully on non-ascii collections. I recall there
was a problem where mgpp was returning text as utf-8 instead of
plain unicode, and greenstone ended up mangling some text. I think
this was fixed *after* 2.38 was released, so you can either follow
the instructions on to compile greenstone from
CVS sources, or if that sounds too daunting you will have to
wait for the next stable release, which is currently going through
testing and translation and hopefully isn't too far off.
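
As a rough illustration of the kind of mismatch involved (this is not
Greenstone or mgpp code; the sample string and the helper name
`raw_search` are made up for the example), the Python sketch below shows
how the same text has different byte counts in Windows 1250 and UTF-8,
how UTF-8 bytes shown as Windows 1250 turn into garbage, and why a
byte-level search can return zero hits when the query and the stored
text use different encodings. It only demonstrates the general effect,
not the actual cause inside mgpp.

```python
# Illustration only: a generic encoding mismatch, not Greenstone/mgpp code.
text = "Příliš žluťoučký kůň"  # hypothetical sample Czech text

cp1250_bytes = text.encode("cp1250")  # Windows 1250: one byte per character
utf8_bytes = text.encode("utf-8")     # UTF-8: two bytes for each accented letter here

# The two encodings give different byte counts for the same text.
print(len(cp1250_bytes), len(utf8_bytes))

# UTF-8 bytes interpreted as Windows 1250 come out mangled.
# errors="replace" is needed because some byte values are undefined in cp1250.
mangled = utf8_bytes.decode("cp1250", errors="replace")
print(mangled)


def raw_search(stored: bytes, query: str, query_encoding: str) -> bool:
    """Hypothetical helper: byte-level substring search with an encoded query."""
    return query.encode(query_encoding) in stored


# Query encoded differently from the stored text: no match.
print(raw_search(utf8_bytes, "kůň", "cp1250"))
# Query encoded the same way as the stored text: match.
print(raw_search(utf8_bytes, "kůň", "utf-8"))
```

Pure-ascii queries still match in either case, which fits a picture
where only strings with accented characters fail.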

Hope this helps

John McPherson