Re: [greenstone-devel] AZList multi-language problem

From: Michael Dewsnip
Date: Tue, 06 Jan 2004 15:06:07 +1300
Subject: Re: [greenstone-devel] AZList multi-language problem
In-Reply-To: (004601c3c893$e0445c30$8207dbd9-hama)

Hello Yousef,

John McPherson answered Robert's original posting just a few minutes after it was received. In case you missed it, I've included it below.

Looking to the future, Greenstone 3 should have proper Unicode sorting, which will help with this problem. However, it is still a few months away from being seriously usable.




As mentioned last week on one of the lists, the A-Z list is called
that because it sorts entries based on the first letter found within
the range A to Z.
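
To illustrate the behaviour described above, here is a minimal Python sketch. It is not Greenstone's actual classifier code; the function name az_bucket and the sample titles are made up for illustration. It assumes only what is stated in this thread: the entry is filed under the first letter found in the range A to Z, and a title with no such letter cannot be classified.

```python
import string

def az_bucket(title):
    """Return the first ASCII letter in the title, upper-cased.

    Returns None when the title has no letter in A-Z/a-z, which
    corresponds to the "metadata is empty - not classifying" case.
    (Hypothetical helper, not part of Greenstone.)
    """
    for ch in title:
        if ch in string.ascii_letters:
            return ch.upper()
    return None

examples = [
    "Moby Dick",
    "Война и мир",                  # no A-Z letter at all
    "Война и мир (War and Peace)",  # first A-Z letter is "W"
    "Álvaro",                       # accented "Á" is skipped
]
buckets = {title: az_bucket(title) for title in examples}
```

Under these assumptions a purely Cyrillic title gets no bucket, while adding any English letter to it produces one, which matches what Robert observed.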

The problem with sorting non-ASCII characters is that different languages
sort the same characters in different orders - for example, some put
accented "a" after "z", although I doubt Cyrillic would have this problem.
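
As a rough illustration of why a single sort order cannot serve every language, the Python sketch below sorts the same three words with two different keys. Both key functions are assumptions made for this example: swedish_key follows the Swedish alphabet, where å, ä and ö come after z, and accent_folded_key simply strips accents so that "ä" sorts with "a". Neither is taken from Greenstone.

```python
import unicodedata

# Swedish alphabet: the accented letters come after "z".
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    """Sort key using Swedish alphabet order (illustrative only)."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after everything else.
    return [
        SWEDISH_ALPHABET.index(ch) if ch in SWEDISH_ALPHABET
        else len(SWEDISH_ALPHABET)
        for ch in composed
    ]

def accent_folded_key(word):
    """Sort key that ignores accents, so "ä" is treated as "a"."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

words = ["zebra", "äpple", "apa"]
swedish_order = sorted(words, key=swedish_key)       # "äpple" lands after "zebra"
folded_order = sorted(words, key=accent_folded_key)  # "äpple" lands next to "apa"
```

The same list comes out in two different orders, so a classifier has to know which language's rules to apply before it can sort.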

If someone wrote a classifier that worked well for non-A-Z characters,
then I'm sure it would be included in the base Greenstone distribution.
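
A very rough Python sketch of what such a classifier might do is shown below. It is only an illustration under stated assumptions, not a Greenstone plugin: the names unicode_bucket and classify are hypothetical, the bucket is the first alphabetic character of any script, and buckets are ordered by code point, which is not a correct language-specific sort.

```python
from collections import defaultdict

def unicode_bucket(title):
    """Return the first alphabetic character of any script, upper-cased.

    Returns None if the title contains no alphabetic character.
    (Hypothetical helper, not part of Greenstone.)
    """
    for ch in title:
        if ch.isalpha():
            return ch.upper()
    return None

def classify(titles):
    """Group titles by bucket; buckets are ordered by code point."""
    groups = defaultdict(list)
    for title in titles:
        bucket = unicode_bucket(title)
        if bucket is not None:
            groups[bucket].append(title)
    return {bucket: sorted(groups[bucket]) for bucket in sorted(groups)}

result = classify(["Moby Dick", "Война и мир", "Middlemarch", "Анна Каренина"])
```

With this approach the Cyrillic titles get their own buckets instead of being dropped, although the order within and between buckets would still need language-aware collation.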

John McPherson


"Y.Torabi" wrote:

Hello all,
I have this problem with the Arabic language. Please help me so that I can sort my
Thanks in advance,
Yousef Torabi

----- Original Message -----
From: "Robert Sleator" <>
To: <>
Sent: Thursday, December 11, 2003 4:29 AM
Subject: [greenstone-devel] AZList multi-language problem

> Hi,
> I'm trying to build a collection containing English,
> Spanish, and Russian documents.  I have one Russian
> document in my collection, which has a Russian
> language "Title" field in the metadata.
> At the end of the build I get the following message:
> WARNING: AZList: HASH0173b0b704384094d09af62c
> metadata is empty - not classifying
> I have the following line in my main.cfg file:
> classify AZList -metadata Title
> When I view my built collection, the Russian document
> is missing from the A-Z list.  I can search and find
> it, and if I put this line in my main.cfg file:
> classify DateList -metadata Date
> the document shows up there.
> If I add an English character anywhere in the metadata
> "Title" field the document reappears.  The
> alphabetical sort also appears to ignore accented
> characters for sorting.  What this suggests is that it
> is ignoring all characters outside of a certain range,
> and if all the characters in your title happen to be
> outside of that range (e.g., they're Cyrillic), you're SOL.
> So my question is, is there a way to make this sort
> include non-ASCII characters?  Presumably with a
> Russian interface it would sort a collection of
> Russian documents correctly, but I don't want a
> Russian interface.
> Environment:
> GSDL 2.4.0
> Red Hat 9
> Thanks for any light anyone can shed on this.
> Robert Sleator
> _______________________________________________
> greenstone-devel mailing list
