Re: [greenstone-users] searches with special characters

From James R. Adair
Date Thu, 05 Jun 2003 18:22:08 -0400
Subject Re: [greenstone-users] searches with special characters
In-Reply-To (20030529212632-GD30800-wesson-cs-waikato-ac-nz)
My HTML file has this: W&ouml;rtern f&uuml;r

After Greenstone processes it, it looks like this: W├Ârtern f├¼r

This doesn't look like UTF-8 to me, and it certainly doesn't display
correctly. I came up with a hack that forces the correct display by
bypassing the supposed entity-to-UTF-8 conversion, which didn't appear
to be working right, but then I ran into the searching problem
described below. I've restored the default values, so the characters
are again displaying incorrectly, as above. Is there something I need
to set or configure to make HTML character entities display properly,
so that I can eventually search for them? Do I need to upgrade to a
more recent version of Greenstone?
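
For reference, here is a minimal Python sketch of what an entity-to-UTF-8
conversion produces and how the result looks when the bytes are shown in
the wrong character set. It is not Greenstone's code; the sample string
assumes the source uses the entities &ouml; and &uuml;, and the choice of
Latin-1 and CP850 as the "wrong" charsets is only a guess at what a
browser or terminal might apply.

```python
import html

# Hypothetical sample mirroring the text quoted in this message.
source = "W&ouml;rtern f&uuml;r"

# Step 1: resolve the HTML character entities to Unicode characters.
text = html.unescape(source)

# Step 2: encode as UTF-8; each umlaut becomes a two-byte sequence
# (0xC3 0xB6 for o-umlaut, 0xC3 0xBC for u-umlaut).
utf8_bytes = text.encode("utf-8")

# Step 3: decode the same bytes with the wrong charset to see how
# correct UTF-8 can look like garbage on screen.
as_latin1 = utf8_bytes.decode("latin-1")
as_cp850 = utf8_bytes.decode("cp850")

print(text)
print(utf8_bytes)
print(as_latin1)
print(as_cp850)
```

If the CP850 line starts with the same "W├Ârtern" as the output above,
the bytes may well be valid UTF-8 that is simply being displayed with a
different charset, in which case the thing to check would be the charset
the page declares rather than the conversion itself.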


On Thursday, May 29, 2003, at 05:26 PM, John R. McPherson wrote:

> On Thu, May 29, 2003 at 02:30:19PM -0400, James R. Adair wrote:
>> I have a collection of HTML documents in which character entities like
>> &auml; (a-umlaut) are used. Is there a way to configure Greenstone to
>> search for a word that has one of these characters? If not, does
>> anyone know a hack?
> Greenstone handles non-ASCII characters just fine - you can search
> and retrieve accented and non-Western characters. If you are using
> mgpp instead of mg (mg is the default) for the backend of your
> collections, then I think there is a slight problem with the search
> form when using the Microsoft Internet Explorer browser, although an
> update to one of the macro config files fixes this. However, most
> people will be using mg collections anyway.
> John McPherson