Hello Yousef,
John McPherson answered Robert's original posting just a few minutes
after it was received. In case you missed it, I've included it below.
Looking to the future, Greenstone 3 should have proper Unicode sorting,
which will help with this problem. But, it is still a few months away from
being seriously usable...
Regards,
Michael
---------------------
As mentioned last week on one of the lists, the A-Z list is called
that because it sorts entries based on the first letter found within
the range A to Z.
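
To illustrate the behaviour described above, here is a rough Python sketch. It is not Greenstone's actual Perl code; the function names az_bucket and classify are made up for this example, and the real AZList classifier may differ in detail. It only shows why a title with no A-Z letter at all ends up unclassified.

```python
import re
from collections import defaultdict


def az_bucket(title):
    """Return the first A-Z letter found in title (uppercased), or None."""
    # The explicit range matches only unaccented Latin letters, so accented
    # and non-Latin characters are skipped over.
    match = re.search(r"[A-Za-z]", title)
    return match.group(0).upper() if match else None


def classify(titles):
    """Group titles by az_bucket; titles with no A-Z letter are skipped."""
    buckets = defaultdict(list)
    skipped = []
    for title in titles:
        key = az_bucket(title)
        if key is None:
            # Comparable to the "metadata is empty - not classifying" case.
            skipped.append(title)
        else:
            buckets[key].append(title)
    return dict(buckets), skipped


buckets, skipped = classify(["Война и мир", "Écoute", "Don Quijote"])
```

In this sketch the Russian title lands in skipped, while "Écoute" is filed under C because the accented first letter is passed over.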
The problem with sorting non-ASCII characters is that different languages
sort the same characters in different orders - for example, some put
accented "a" after "z", although I doubt Cyrillic would have this
problem.
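
As a small illustration of that point, the Python sketch below sorts the same three words under two simplified rules: a Swedish-style alphabet, where å, ä and ö come after z, and a German-dictionary-style rule, where ä is treated like a. Both rules are toy approximations written for this example (the names make_alphabet_key, SWEDISH and german_key are hypothetical); real collation needs proper locale data.

```python
def make_alphabet_key(alphabet):
    """Build a sort key that ranks characters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Characters not in the alphabet sort after everything else.
        return [rank.get(ch, len(alphabet)) for ch in word.lower()]

    return key


# Simplified Swedish alphabet: the accented vowels follow z.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(word):
    """Simplified German dictionary order: umlauts sort as the base vowel."""
    return word.lower().replace("ä", "a").replace("ö", "o").replace("ü", "u")


words = ["zebra", "ärger", "apfel"]
swedish_order = sorted(words, key=make_alphabet_key(SWEDISH))
german_order = sorted(words, key=german_key)
```

Under the Swedish-style rule "ärger" sorts last; under the German-style rule it sorts between "apfel" and "zebra". The input is identical, only the language convention changes.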
If someone wrote a classifier that worked well for non-A-Z characters
then I'm sure it would be included in the base Greenstone distribution.
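
One possible starting point for such a classifier, sketched in Python rather than Greenstone's Perl: bucket each title by its first alphabetic character in any script, and put titles with no letters under "#". The names first_letter and classify_any_script are invented for this sketch, and the buckets are ordered by Unicode code point, which is only a stand-in for real language-aware sorting.

```python
from collections import defaultdict


def first_letter(title):
    """Return the first alphabetic character of any script, uppercased."""
    for ch in title:
        if ch.isalpha():
            # upper() leaves caseless scripts such as Arabic unchanged.
            return ch.upper()
    return None


def classify_any_script(titles):
    """Group titles by first letter; titles without letters go under '#'."""
    buckets = defaultdict(list)
    for title in titles:
        key = first_letter(title) or "#"
        buckets[key].append(title)
    # Code point order: a placeholder, not a proper collation.
    return {key: sorted(items) for key, items in sorted(buckets.items())}


result = classify_any_script(["Война и мир", "Don Quijote", "1984", "كتاب"])
```

With this approach the Russian and Arabic titles each get a bucket of their own instead of being dropped.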
John McPherson
"Y.Torabi" wrote:
Hello all,
I have this same problem with the Arabic language. Please help me so
that I can sort my documents.
Thanks in advance.
Yousef Torabi
----- Original Message -----
From: "Robert Sleator" <r3l7s@yahoo.com>
To: <greenstone-devel@list.scms.waikato.ac.nz>
Sent: Thursday, December 11, 2003 4:29 AM
Subject: [greenstone-devel] AZList multi-language problem
> Hi,
>
> I'm trying to build a collection containing English,
> Spanish, and Russian documents. I have one Russian
> document in my collection, which has a Russian
> language "Title" field in the metadata.
>
> At the end of the build I get the following message:
>
> WARNING: AZList: HASH0173b0b704384094d09af62c metadata is empty - not classifying
>
> I have the following line in my main.cfg file:
>
> classify AZList -metadata Title
>
> When I view my built collection, the Russian document
> is missing from the A-Z list. I can search and find
> it, and if I put this line in my main.cfg file:
>
> classify DateList -metadata Date
>
> the document shows up there.
>
> If I add an English character anywhere in the metadata
> "Title" field the document reappears. The
> alphabetical sort also appears to ignore accented
> characters for sorting. What this suggests is that it
> is ignoring all characters outside of a certain range,
> and if all the characters in your title happen to be
> outside of that range (e.g., they're Cyrillic), you're SOL.
>
> So my question is, is there a way to make this sort
> include non-ASCII characters? Presumably with a
> Russian interface it would sort a collection of
> Russian documents correctly, but I don't want a
> Russian interface.
>
>
> Environment:
> GSDL 2.4.0
> Red Hat 9
>
> Thanks for any light anyone can shed on this.
>
> Robert Sleator
>
> _______________________________________________
> greenstone-devel mailing list
> greenstone-devel@list.scms.waikato.ac.nz
> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-devel
>
>