[greenstone-users] Unicode data in classifiers

From: Vladimir Risojevic
Date: Sat Nov 24 02:44:59 2007
Subject: [greenstone-users] Unicode data in classifiers
Dear all,

I have a problem using classifiers with UTF-8 data. From earlier posts
I learned that this is a known problem and that it can be solved with the
GenericList classifier. However, I haven't had much success with that.
What I have is a collection of scanned magazines. I have created item files
like the one below.
All metadata is UTF-8 encoded. For titles I'm using
AZCompactSectionList -metadata ex.Series -sort Number
and everything works fine. However, I would like to have a classifier based
on dc.Creator data. I tried various things but to no avail.
For example,
AZCompactSectionList -metadata dc.Creator -doclevel section
gives a classifier that lists only the names written in plain ASCII.
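
To make that symptom concrete, here is a hypothetical Python sketch. It is not Greenstone's code and not a confirmed diagnosis of the bug; it only shows how an A-Z bucketing scheme that recognizes ASCII initials alone would silently drop names such as "Čolić" or "Šarić", while a Unicode-aware scheme keeps them. The function names and the sample names are illustrative assumptions.

```python
import unicodedata
from typing import Callable, Dict, Iterable, List, Optional


def naive_az_bucket(name: str) -> Optional[str]:
    # Hypothetical ASCII-only scheme: only initials A-Z get a bucket,
    # so a name starting with a non-ASCII letter gets no bucket at all.
    first = name[:1].upper()
    return first if "A" <= first <= "Z" else None


def unicode_bucket(name: str) -> Optional[str]:
    # Normalize to NFC first so that "C" + combining caron and the
    # precomposed "Č" end up in the same bucket.
    first = unicodedata.normalize("NFC", name)[:1].upper()
    return first or None


def group(names: Iterable[str],
          key: Callable[[str], Optional[str]]) -> Dict[str, List[str]]:
    buckets: Dict[str, List[str]] = {}
    for name in names:
        bucket = key(name)
        if bucket is None:
            continue  # the entry is dropped and never shows up in the listing
        buckets.setdefault(bucket, []).append(name)
    return buckets


if __name__ == "__main__":
    creators = ["Andrić", "Čolić", "Šarić", "Babić"]
    print(group(creators, naive_az_bucket))
    print(group(creators, unicode_bucket))
```

Only the second grouping contains the Č and Š entries. The NFC step matters because the same letter can be stored either precomposed or as a base letter followed by a combining mark.
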
Following suggestions from the mailing list, I also tried
GenericList -metadata dc.Creator -classify_sections
However, this results in the following error message:
An error has occured which will prevent the collection being previewed.

What is going on here? Is there a way to build a UTF-8 classifier?
Please send me any suggestions; I haven't had much luck with this list
so far.

Below is an example of one of my item files:

<Metadata name="Series">AAA</Metadata>
<Metadata name="Date">19100101</Metadata>
<Metadata name="Volume">1</Metadata>
<Metadata name="Number">1</Metadata>
<Metadata name="dc.Title">xxx</Metadata>
<Metadata name="dc.Creator">abc</Metadata>
<Page pagenum="1" imgfile="001.tif"/>
<Page pagenum="2" imgfile="002.tif"/>
<Page pagenum="3" imgfile="003.tif"/>
<Page pagenum="4" imgfile="004.tif"/>
<Metadata name="dc.Title">yyy</Metadata>
<Metadata name="dc.Creator">def</Metadata>
<Page pagenum="5" imgfile="005.tif"/>
<Metadata name="dc.Title">zzz</Metadata>
<Metadata name="dc.Creator">xyz</Metadata>
<Page pagenum="5" imgfile="005.tif"/>
<Page pagenum="6" imgfile="006.tif"/>
<Page pagenum="7" imgfile="007.tif"/>
<Page pagenum="8" imgfile="008.tif"/>
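
Since the post relies on all metadata being UTF-8 encoded, one way to confirm that before blaming the classifier is a quick check like the following. This is a hypothetical Python script, not part of Greenstone: it strictly decodes an item file as UTF-8 and lists each dc.Creator value, marking those that contain non-ASCII characters. It assumes the metadata elements look exactly like the excerpt above (`<Metadata name="dc.Creator">...</Metadata>`), and it uses a regular expression rather than an XML parser because the excerpt has no root element. The script name and file names are placeholders.

```python
import re
import sys
from typing import List, Tuple

# Matches metadata elements of the form <Metadata name="dc.Creator">...</Metadata>
CREATOR_RE = re.compile(r'<Metadata name="dc\.Creator">(.*?)</Metadata>')


def creators_from_item_bytes(data: bytes) -> List[Tuple[str, bool]]:
    """Return (value, has_non_ascii) for every dc.Creator entry.

    Decoding is strict, so UnicodeDecodeError is raised if the file is not
    valid UTF-8 (for example, if an editor saved it as Windows-1250).
    The "utf-8-sig" codec also strips a leading byte-order mark, if present.
    """
    text = data.decode("utf-8-sig")
    return [(value, not value.isascii()) for value in CREATOR_RE.findall(text)]


def report(path: str) -> None:
    with open(path, "rb") as f:
        data = f.read()
    try:
        creators = creators_from_item_bytes(data)
    except UnicodeDecodeError as err:
        print(f"{path}: not valid UTF-8 ({err})")
        return
    for value, non_ascii in creators:
        label = "non-ASCII" if non_ascii else "ASCII"
        print(f"{path}: {label:9}  {value}")


if __name__ == "__main__":
    # Usage (placeholder names): python check_item.py magazine1.item
    for item_path in sys.argv[1:]:
        report(item_path)
```

If every file decodes cleanly and the non-ASCII creators are listed correctly, the encoding of the item files is probably not the cause, which points back at the classifier configuration.
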

Best regards,


Vladimir Risojevic
Teaching Assistant
Faculty of Electrical Engineering
University of Banjaluka
Patre 5
78000 Banjaluka
Bosnia and Herzegovina

Phone: +387 51 221 847, +387 51 221 876
Fax: +387 51 211 408
Email: vlado@etfbl.net
WWW: http://www.etfbl.net