[greenstone-users] Unicode data in classifiers

From Michael Dewsnip
Date Tue Nov 27 15:38:41 2007
Subject [greenstone-users] Unicode data in classifiers
In-Reply-To (4746D96D-1080102-etfbl-net)
Hi Vladimir,

Try removing the "-sort_using_unicode_collation" option from GenericList,
as it is not required for basic Unicode classifiers.

Once you've got the classifier working, check whether the metadata values
are sorted correctly. If they aren't, you will need to try the
"-sort_using_unicode_collation" option. To use it, you first need to
download the allkeys.txt file from
http://www.unicode.org/Public/UCA/latest/allkeys.txt and put it in the
Perl "Unicode/Collate" directory.
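
As a rough illustration of what "sorted correctly" means here, the
Python sketch below compares a plain code-point sort with a sort that
ignores diacritics. It is not Greenstone code and not the Unicode
Collation Algorithm that allkeys.txt drives. The creator names and the
strip_marks helper are made up for the example, and stripping accents is
only a crude stand-in for real collation.

```python
import unicodedata

# Hypothetical dc.Creator values (not from the actual collection).
# Escapes are used so the precomposed characters are unambiguous:
# \u0106 is "C with acute" (capital), \u0107 is "c with acute".
creators = ["Zori\u0107", "\u0106osi\u0107", "Andri\u0107"]


def strip_marks(text):
    """Drop combining marks after NFD decomposition.

    This is an assumed, simplified sort key, not real collation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# A plain sort compares code points, so U+0106 lands after "Z" (U+005A).
by_code_point = sorted(creators)

# Sorting on the stripped form keeps the accented C name among the C names.
by_base_letter = sorted(creators, key=strip_marks)

print(by_code_point)
print(by_base_letter)
```

If your classifier shows accented names grouped after "Z", as in the
first list, that is the symptom the collation option is meant to address.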

If you get errors when building the collection, try switching the GLI to
Expert mode (File -> Preferences) and see whether the underlying error
message is shown.

Regards,

Michael

--
DL Consulting
Greenstone Digital Library and Digitisation Specialists
contact@dlconsulting.com
www.dlconsulting.com


Vladimir Risojevic wrote:
> Dear all,
>
> I have a problem using classifiers with UTF-8 data. From earlier posts
> I learned that it is a known problem and can be solved using the
> GenericList classifier. However, I didn't have much success with that.
> What I have is a collection of scanned magazines. I have created item
> files like the one below.
> All metadata is UTF-8 encoded. For titles I'm using
> AZCompactSectionList -metadata ex.Series -sort Number
> and everything works fine. However, I would like to have a classifier
> based on dc.Creator data. I tried various things but to no avail.
> For example,
> AZCompactSectionList -metadata dc.Creator -doclevel section
> gives a classifier with ASCII names only.
> Following suggestions from the mailing list, I also tried
> GenericList -metadata dc.Creator -classify_sections -sort_using_unicode_collation
> However, this one results in the following error message:
> An error has occured which will prevent the collection being previewed.
>
> What is going on here? Is there a way to build a UTF-8 classifier?
> Please send me any suggestions; I haven't had much luck with this
> list so far.
>
> Below is an example of my item file:
>
> <PagedDocument>
> <Metadata name="Series">AAA</Metadata>
> <Metadata name="Date">19100101</Metadata>
> <Metadata name="Volume">1</Metadata>
> <Metadata name="Number">1</Metadata>
> <PageGroup>
> <Metadata name="dc.Title">xxx</Metadata>
> <Metadata name="dc.Creator">abc</Metadata>
> <Page pagenum="1" imgfile="001.tif"/>
> <Page pagenum="2" imgfile="002.tif"/>
> <Page pagenum="3" imgfile="003.tif"/>
> <Page pagenum="4" imgfile="004.tif"/>
> </PageGroup>
> <PageGroup>
> <Metadata name="dc.Title">yyy</Metadata>
> <Metadata name="dc.Creator">def</Metadata>
> <Page pagenum="5" imgfile="005.tif"/>
> </PageGroup>
> <PageGroup>
> <Metadata name="dc.Title">zzz</Metadata>
> <Metadata name="dc.Creator">xyz</Metadata>
> <Page pagenum="5" imgfile="005.tif"/>
> <Page pagenum="6" imgfile="006.tif"/>
> <Page pagenum="7" imgfile="007.tif"/>
> <Page pagenum="8" imgfile="008.tif"/>
> </PageGroup>
> </PagedDocument>
>
> Best regards,
>
> Vladimir
>
>