Re: [greenstone-users] searches with special characters

From Stefan Boddie
Date Fri, 30 May 2003 10:16:34 +1200
Subject Re: [greenstone-users] searches with special characters
In-Reply-To (20030529212632-GD30800-wesson-cs-waikato-ac-nz)
Just to clarify, greenstone converts entities like &auml; to their raw utf-8
equivalent before indexing them. This means that the actual a-umlaut
character is indexed, not the &auml; entity.
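
As a rough illustration of that conversion (this is not Greenstone's own code, just a sketch using Python's standard library, and the helper name entity_to_utf8 is made up for the example), decoding the entity and encoding the result as utf-8 gives the bytes that would end up in the index:

```python
import html

def entity_to_utf8(text: str) -> bytes:
    """Decode HTML character entities, then encode the result as UTF-8.

    Hypothetical helper for illustration only; Greenstone's actual
    import code is not shown in this thread.
    """
    return html.unescape(text).encode("utf-8")

# "&auml;" becomes the two-byte UTF-8 sequence for a-umlaut (0xC3 0xA4).
indexed = entity_to_utf8("M&auml;dchen")
print(indexed)  # b'M\xc3\xa4dchen'
```

So a search has to supply the a-umlaut character itself; typing the literal text "&auml;" would not match what was indexed.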

To search for a word containing non-ascii characters you'll need to enter it
in the search box in the correct encoding. If you go to the greenstone
preferences page you can alter the encoding in which greenstone expects
input (and produces output). The default for recent versions of greenstone
is utf-8.
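
To see why the input encoding matters, here is a small sketch (again plain Python, not Greenstone code, and assuming for the sake of the example that matching comes down to comparing encoded bytes). The same word encoded as utf-8 and as ISO-8859-1 produces different byte sequences, so a query sent in the wrong encoding will not line up with a utf-8 index:

```python
# The search term, written with an escape so the source file's own
# encoding does not matter: "M" + a-umlaut + "dchen".
term = "M\u00e4dchen"

# What the index holds, assuming the default utf-8 encoding.
indexed = term.encode("utf-8")

# The same term as a browser might send it in two different encodings.
query_utf8 = term.encode("utf-8")
query_latin1 = term.encode("latin-1")

print(query_utf8 == indexed)    # True
print(query_latin1 == indexed)  # False
print(query_latin1)             # b'M\xe4dchen'
```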


----- Original Message -----
From: "John R. McPherson" <>
To: "James R. Adair" <>
Cc: <>
Sent: Friday, May 30, 2003 9:26 AM
Subject: Re: [greenstone-users] searches with special characters

> On Thu, May 29, 2003 at 02:30:19PM -0400, James R. Adair wrote:
> > I have a collection of HTML documents in which character entities like
> > &auml; (a-umlaut) are used. Is there a way to configure Greenstone to
> > search for a word that has one of these characters? If not, does
> > anyone know a hack?
> Greenstone handles non-ascii characters just fine - you can search
> and retrieve accented and non-Western characters. If you are using
> mgpp instead of mg (mg is the default) for the backend of your collections
> then I think there is a slight problem with the search form
> when using Microsoft Internet Explorer browser, although an update
> to one of the macro config files fixes this. However most people
> will be using mg collections anyway.
> John McPherson