Re: [greenstone-devel] AZList multi-language problem

From John R. McPherson
Date Thu, 11 Dec 2003 14:06:20 +1300
Subject Re: [greenstone-devel] AZList multi-language problem
In-Reply-To (20031211005947-44204-qmail-web41803-mail-yahoo-com)

On Wed, Dec 10, 2003 at 04:59:47PM -0800, Robert Sleator wrote:
> Hi,
>
> I'm trying to build a collection containing English,
> Spanish, and Russian documents. I have one Russian
> document in my collection, which has a Russian
> language "Title" field in the metadata.

[snip]

> If I add an English character anywhere in the metadata
> "Title" field the document reappears. The
> alphabetical sort also appears to ignore accented
> characters for sorting. What this suggests is that it
> is ignoring all characters outside of a certain range,
> and if all the characters in your title happen to be
> outside of that range (e.g., they're Cyrillic), you're SOL.
>
> So my question is, is there a way to make this sort
> include non-ASCII characters? Presumably with a
> Russian interface it would sort a collection of
> Russian documents correctly, but I don't want a
> Russian interface.

As mentioned last week on one of the lists, the A-Z list is called
that because it sorts entries based on the first letter found within
the range A to Z.
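
To make that behaviour concrete, here is a minimal sketch in Python. It is
not Greenstone's actual classifier code (which is written in Perl); the
names az_bucket and classify are hypothetical, and the sketch only assumes
the rule described above: use the first character found in the range A to Z,
and drop the entry if there is none.

```python
import re
from collections import defaultdict


def az_bucket(title):
    """Return the first A-Z letter in the title (upper-cased), or None.

    Hypothetical helper for illustration only, not Greenstone code.
    """
    match = re.search(r"[A-Za-z]", title)
    return match.group(0).upper() if match else None


def classify(titles):
    """Group titles under their A-Z bucket, dropping titles with no bucket."""
    buckets = defaultdict(list)
    for title in titles:
        key = az_bucket(title)
        if key is not None:  # a title with no A-Z letter never gets listed
            buckets[key].append(title)
    return dict(buckets)


titles = ["Éxodo", "Война и мир", "Война и мир (War and Peace)", "Zebra"]
result = classify(titles)
```

Under these assumptions the purely Cyrillic title is dropped, the same title
with an English phrase added is listed under "W", and "Éxodo" is listed
under "X" because the accented first letter is skipped.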

The problem with sorting non-ASCII characters is that different languages
sort the same characters in different orders - for example, some put
accented "a" after "z", although I doubt Cyrillic would have this
problem.
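
As a rough illustration of why no single order works, the Python sketch
below sorts the same three words with two different keys. Both key
functions are simplified assumptions for this example, not real collation
rules: swedish_key treats å, ä and ö as separate letters after z (as the
Swedish alphabet does), while strip_accents_key files an accented letter
with its base letter.

```python
import unicodedata

# Assumption: a simplified Swedish alphabet, with å, ä, ö placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def swedish_key(word):
    """Sort key using positions in the simplified Swedish alphabet.

    Raises ValueError for characters outside that alphabet.
    """
    composed = unicodedata.normalize("NFC", word.lower())
    return [SWEDISH_ALPHABET.index(ch) for ch in composed]


def strip_accents_key(word):
    """Sort key that treats an accented letter like its base letter."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zon", "äpple", "apa"]
by_swedish = sorted(words, key=swedish_key)
by_base_letter = sorted(words, key=strip_accents_key)
```

With the first key "äpple" sorts after "zon"; with the second it sorts
between "apa" and "zon". A classifier has to pick one of these orders, or
know which language it is sorting for.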

If someone wrote a classifier that worked well for non-A-Z characters
then I'm sure it would be included in the base Greenstone distribution.
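
One possible starting point, sketched in Python rather than as a real
Greenstone classifier: bucket each title by its first alphabetic character
in any script. The function names and the "#" bucket for titles with no
letters are assumptions made for this example, and the buckets are ordered
by code point, which is not a proper language-aware order.

```python
from collections import defaultdict


def first_letter_bucket(title):
    """Return the first alphabetic character of any script, upper-cased."""
    for ch in title:
        if ch.isalpha():
            return ch.upper()
    return None


def classify_any_script(titles):
    """Group titles by first letter; titles with no letters go under '#'."""
    buckets = defaultdict(list)
    for title in titles:
        key = first_letter_bucket(title)
        buckets[key if key is not None else "#"].append(title)
    # Assumption: plain code-point order, not a language-specific order.
    return {key: sorted(buckets[key]) for key in sorted(buckets)}


grouped = classify_any_script(["Война и мир", "Éxodo", "Zebra", "1984"])
```

Here the Russian title gets its own bucket instead of disappearing, but
choosing the order of the buckets still runs into the per-language sorting
problem.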

John McPherson