Re: [greenstone-users] searches with special characters

From: Stefan Boddie
Date: Fri, 6 Jun 2003 11:13:51 +1200
Subject: Re: [greenstone-users] searches with special characters
In-Reply-To: (24F54BFE-97A4-11D7-8F10-0050E4D04E27-reltech-org)

Hi Jimmy,

Have you got gsdl-2.39? Earlier versions of Greenstone had lots of encoding


----- Original Message -----
From: "James R. Adair" <>
To: <>
Sent: Friday, June 06, 2003 10:22 AM
Subject: Re: [greenstone-users] searches with special characters

My HTML file has this: W&ouml;rtern f&uuml;r

After Greenstone processes it, it looks like this: Wörtern €1?4r

This doesn't look like UTF-8 to me, and it certainly doesn't display
correctly. I came up with a hack that forces the correct display by
bypassing the supposed entity-to-UTF-8 conversion, which didn't appear
to be working properly. But then I ran into a problem with searching,
described below. I've since restored the default settings, so the
characters again display incorrectly, as shown above. Is there something
I need to set or configure to make HTML character entities display
properly, so that I can eventually search for them? Do I need to
upgrade to a more recent version of Greenstone?
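For reference, here is a short Python sketch (not part of the original thread) of what a correct entity-to-UTF-8 conversion of the example string produces, using the standard library's `html.unescape`. It also shows one common way such text gets garbled: UTF-8 bytes being read back as Latin-1. Whether that is exactly what happened in the Greenstone build above is an assumption, since the garbled output quoted in the message differs. The function names are hypothetical.

```python
import html

def entities_to_utf8(raw: str) -> bytes:
    """Resolve HTML character entities, then encode the result as UTF-8."""
    return html.unescape(raw).encode("utf-8")

def misread_as_latin1(utf8_bytes: bytes) -> str:
    """Decode UTF-8 bytes with the wrong codec to show typical mojibake."""
    return utf8_bytes.decode("latin-1")

if __name__ == "__main__":
    raw = "W&ouml;rtern f&uuml;r"  # the HTML from the message
    encoded = entities_to_utf8(raw)
    print(encoded.decode("utf-8"))     # Wörtern für
    print(encoded)                     # b'W\xc3\xb6rtern f\xc3\xbcr'
    print(misread_as_latin1(encoded))  # WÃ¶rtern fÃ¼r
```

Each umlaut becomes two bytes in UTF-8 (ö is C3 B6, ü is C3 BC), so any layer that assumes a single-byte encoding shows two wrong characters per umlaut.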


On Thursday, May 29, 2003, at 05:26 PM, John R. McPherson wrote:

> On Thu, May 29, 2003 at 02:30:19PM -0400, James R. Adair wrote:
>> I have a collection of HTML documents in which character entities like
>> &auml; (a-umlaut) are used. Is there a way to configure Greenstone to
>> search for a word that has one of these characters? If not, does
>> anyone know a hack?
> Greenstone handles non-ascii characters just fine - you can search
> and retrieve accented and non-Western characters. If you are using
> mgpp instead of mg (mg is the default) for the backend of your
> collections, then I think there is a slight problem with the search
> form when using the Microsoft Internet Explorer browser, although an
> update to one of the macro config files fixes this. However, most
> people will be using mg collections anyway.
> John McPherson
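As an illustration of the kind of "hack" asked about above, here is a hypothetical Python sketch, not how Greenstone, mg, or mgpp actually index text: decode HTML entities before indexing, and normalise the query the same way (Unicode NFC plus case folding) so that `W&ouml;rtern` in the source matches a typed `wörtern`. The optional accent folding is an extra design choice, assumed here, that lets unaccented queries match too. All names are illustrative, and HTML tags are not stripped.

```python
import html
import re
import unicodedata

def normalise(text: str, fold_accents: bool = False) -> str:
    """Decode HTML entities, apply Unicode NFC, and casefold.

    If fold_accents is True, also strip combining marks so that
    'wörtern' and 'wortern' compare equal.
    """
    text = html.unescape(text)
    text = unicodedata.normalize("NFC", text).casefold()
    if fold_accents:
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text

def tokens(text: str, fold_accents: bool = False) -> set[str]:
    # \w is Unicode-aware for str patterns, so 'ö' counts as a word character.
    return set(re.findall(r"\w+", normalise(text, fold_accents)))

def matches(document_html: str, query: str, fold_accents: bool = False) -> bool:
    """True if every query word occurs in the document (a simple AND search)."""
    return tokens(query, fold_accents) <= tokens(document_html, fold_accents)

if __name__ == "__main__":
    doc = "W&ouml;rtern f&uuml;r"
    print(matches(doc, "wörtern"))                     # True
    print(matches(doc, "wortern"))                     # False
    print(matches(doc, "wortern", fold_accents=True))  # True
```

Normalising to NFC matters because "ö" can arrive either as one precomposed code point or as "o" followed by a combining diaeresis; without it the two forms would not match.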
