John McPherson answered Robert's original posting just a few minutes
after it was received. In case you missed it, I've included it below.
Looking to the future, Greenstone 3 should have proper Unicode sorting,
which will help with this problem. But, it is still a few months away from
being seriously usable...
As mentioned last week on one of the lists, the A-Z list is called
that because it sorts entries based on the first letter found within
the range A to Z.
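To make that behaviour concrete, here is a minimal sketch in Python. It is not Greenstone's actual classifier code; the helper name `az_bucket` is hypothetical. It assumes only what is described above: an entry is filed under the first letter in the range A to Z found in its metadata value, and an entry with no such letter is not classified.

```python
import re

# Hypothetical helper for illustration only; not taken from Greenstone.
def az_bucket(title):
    """Return the first ASCII letter found in title (upper-cased), or None."""
    match = re.search(r"[A-Za-z]", title)
    return match.group(0).upper() if match else None

titles = [
    "Don Quijote",
    "Árbol de la vida",             # accented first letter is skipped
    "Война и мир",                  # no A-Z letter at all
    "Война и мир (War and Peace)",  # an English letter appears later
]

for title in titles:
    print(repr(title), "->", az_bucket(title))
```

Under these assumptions the all-Cyrillic title gets no bucket, the accented title is filed under "R" rather than "A", and adding any English text to the title makes the entry appear again, which matches what Robert reported.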
The problem with sorting non-ASCII characters is that different languages
sort the same characters in different orders - for example, some put
accented "a" after "z", although I doubt Cyrillic would have this problem.
If someone wrote a classifier that worked well for non-A-Z characters
then I'm sure it would be included in the base Greenstone distribution.
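As a rough idea of what such a classifier might do, here is a sketch in Python (Greenstone classifiers themselves are not written this way, and the names `unicode_bucket` and `classify` are hypothetical). It assumes entries are grouped under the first alphabetic character of any script, with accents folded away by Unicode decomposition, and that buckets are ordered by plain code point rather than by any language's real collation rules.

```python
import unicodedata
from collections import defaultdict

# Hypothetical helpers for illustration only; not part of Greenstone.
def unicode_bucket(title):
    """Return the first alphabetic character of any script, accent-folded."""
    for ch in title:
        if ch.isalpha():
            # NFD splits an accented letter into base letter + combining marks;
            # keeping element 0 keeps only the base letter.
            base = unicodedata.normalize("NFD", ch)[0]
            return base.upper()
    return None

def classify(titles):
    """Group titles by bucket; buckets and entries sorted by code point."""
    buckets = defaultdict(list)
    for title in titles:
        key = unicode_bucket(title)
        if key is not None:
            buckets[key].append(title)
    return {key: sorted(items) for key, items in sorted(buckets.items())}

print(classify(["Война и мир", "Árbol de la vida", "Don Quijote"]))
```

The accent folding is exactly the kind of language-dependent choice mentioned above: it files "Á" under "A", which suits Spanish, but it would also fold the Russian letter "Й" into "И", which a Russian reader would probably not want. Code point order is likewise only a stand-in for a proper per-language sort.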
I have this problem with the Arabic language. Please help me so that I can sort.
Thanks in advance.
----- Original Message -----
From: "Robert Sleator" <firstname.lastname@example.org>
Sent: Thursday, December 11, 2003 4:29 AM
Subject: [greenstone-devel] AZList multi-language problem
> I'm trying to build a collection containing English,
> Spanish, and Russian documents. I have one Russian
> document in my collection, which has a Russian
> language "Title" field in the metadata.
> At the end of the build I get the following message:
> WARNING: AZList: HASH0173b0b704384094d09af62c
> metadata is empty - not classifying
> I have the following line in my main.cfg file:
> classify AZList -metadata Title
> When I view my built collection, the Russian document
> is missing from the A-Z list. I can search and find
> it, and if I put this line in my main.cfg file:
> classify DateList -metadata Date
> the document shows up there.
> If I add an English character anywhere in the metadata
> "Title" field the document reappears. The
> alphabetical sort also appears to ignore accented
> characters for sorting. What this suggests is that it
> is ignoring all characters outside of a certain range,
> and if all the characters in your title happen to be
> outside of that range (e.g., they're Cyrillic), you're SOL.
> So my question is, is there a way to make this sort
> include non-ASCII characters? Presumably with a
> Russian interface it would sort a collection of
> Russian documents correctly, but I don't want a
> Russian interface.
> GSDL 2.4.0
> Red Hat 9
> Thanks for any light anyone can shed on this.
> Robert Sleator
> greenstone-devel mailing list