Re: [greenstone-users] Search Problem

From: Katherine Don
Date: Thu, 14 Sep 2006 09:52:41 +1200
Subject: Re: [greenstone-users] Search Problem
In-Reply-To: (45078CBF-6030407-cs-waikato-ac-nz)

Hi

I have added in support for Mongolian unicode to Greenstone, so this
will be in the next release.
If you would like to try it in your own version, here is what you need
to do.
Note that this applies to adding support for any unicode characters that
Greenstone doesn't currently support.

Edit greenstone/packages/mg/lib/unitool.c and
greenstone/src/mgpp/lib/unitool.cpp.

Look for the line
#define NUM_LETTER_INFO 274
and change it to 275.
(Increase it by the number of ranges you are adding; here that is one.)

Then look for
static const unirange letter_info[NUM_LETTER_INFO]

This is a list of the Unicode ranges that count as letters. Add in the
range that you are interested in, e.g. for Mongolian, it's {0x1800,0x18a9}.
I'm not sure whether these ranges need to be in order or not.
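
To make the edit concrete, here is a minimal sketch of what the change
looks like. It is not the Greenstone source: only NUM_LETTER_INFO,
letter_info, the unirange type name and the range {0x1800,0x18a9} come
from the files mentioned above. The field names of unirange, the
shortened three-entry starting table and the is_letter helper are
assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the unirange type in unitool.c:
   an inclusive [first, last] range of code points. */
typedef struct {
    unsigned short first;
    unsigned short last;
} unirange;

/* Shortened illustrative table. In the real file the count goes
   from 274 to 275; here it goes from 3 to 4 for the same reason:
   one range was added. */
#define NUM_LETTER_INFO 4

static const unirange letter_info[NUM_LETTER_INFO] = {
    {0x0041, 0x005a},  /* A-Z */
    {0x0061, 0x007a},  /* a-z */
    {0x0410, 0x044f},  /* basic Cyrillic letters (illustrative entry) */
    {0x1800, 0x18a9}   /* Mongolian: the newly added range */
};

/* Assumed lookup: a linear scan, so the order of the ranges does
   not matter for this sketch. Returns 1 if c is inside any range. */
int is_letter(unsigned short c)
{
    size_t i;
    for (i = 0; i < NUM_LETTER_INFO; i++) {
        if (c >= letter_info[i].first && c <= letter_info[i].last) {
            return 1;
        }
    }
    return 0;
}
```

With this table, is_letter(0x1820) returns 1 and is_letter(0x18aa)
returns 0. The sketch scans linearly, so order is irrelevant to it. If
the real lookup turned out to use a binary search, order would matter,
so inserting the new range in its sorted position is the safe choice.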

This is the only change you'll need to make (but make it in both
files). Then, in each of the greenstone/src/mgpp and greenstone/packages/mg
directories, do a make clean, make and make install (or the Windows
equivalent).
Then in greenstone/src/colservr, do a make clean and a make.
Then in greenstone/src/recpt, do a make clean, make and make install.

Another point to note is that searching didn't work when the Mongolian was
represented as numeric character entities (e.g. &#x1886; instead of the
character ᢆ itself). The characters need to be in UTF-8 in the source files.
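
If your source documents contain entities, they would need converting
before import. The following is a minimal sketch of such a conversion,
not part of Greenstone: utf8_encode and decode_numeric_entities are
hypothetical helper names, and the code handles only numeric entities
(&#DDDD; and &#xHHHH;), leaving everything else untouched. It does not
reject surrogate code points.

```c
#include <stddef.h>

/* Encode one code point (up to 0x10FFFF) as UTF-8.
   Returns the number of bytes written to out (1 to 4). */
size_t utf8_encode(unsigned long cp, char *out)
{
    unsigned char *u = (unsigned char *)out;
    if (cp < 0x80) {
        u[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        u[0] = (unsigned char)(0xC0 | (cp >> 6));
        u[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        u[0] = (unsigned char)(0xE0 | (cp >> 12));
        u[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        u[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    u[0] = (unsigned char)(0xF0 | (cp >> 18));
    u[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    u[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    u[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Value of digit c in the given base (10 or 16), or -1 if invalid. */
static int digit_value(char c, int base)
{
    int d;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else return -1;
    return d < base ? d : -1;
}

/* Copy src to dst, replacing numeric entities with UTF-8 bytes.
   The output is never longer than the input, so dst needs room
   for strlen(src) + 1 bytes. Malformed entities are copied as-is. */
void decode_numeric_entities(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '&' && src[1] == '#') {
            const char *p = src + 2;
            unsigned long cp = 0;
            int base = 10, digits = 0, d;
            if (*p == 'x' || *p == 'X') {
                base = 16;
                p++;
            }
            /* At most 8 digits, so cp cannot overflow. */
            while (digits < 8 && (d = digit_value(*p, base)) >= 0) {
                cp = cp * (unsigned long)base + (unsigned long)d;
                p++;
                digits++;
            }
            if (digits > 0 && *p == ';' && cp > 0 && cp <= 0x10FFFFUL) {
                dst += utf8_encode(cp, dst);
                src = p + 1;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

For example, decoding "&#x1820;" or "&#6176;" yields the three bytes
E1 A0 A0, which is the UTF-8 form of U+1820 and falls inside the
{0x1800,0x18a9} range added above.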

Regards,
Katherine

Michael Dewsnip wrote:
> Hi,
>
> You're right that there is a problem here. The MG and MGPP components of
> Greenstone have a file called unitool.c/unitool.cpp, which includes
> lists of Unicode characters that are considered letters, digits, spaces,
> etc. Unfortunately these lists haven't been updated for some time, and
> don't include the Traditional Mongolian characters. Because of this,
> Greenstone will throw away your search terms because it can't find any
> letters or digits in them.
>
> Unfortunately we're on the verge of releasing Greenstone v2.71, and
> probably won't have time to fix this problem prior to the release. We'll
> try to fix it for Greenstone v2.72; in the meantime, you can always
> edit the code yourself if you have the programming knowledge.
>
> Regards,
>
> Michael
>
>
>
> Garmaa Kh wrote:
>
>
>>Dear all,
>>
>>I am Garmaabazar Khaltarkhuu, a research student at Ritsumeikan
>>University, Japan (http://www.ritsumei.ac.jp/eng/), building a
>>Traditional Mongolian Script DL based on GSDL. I had search problems
>>after building a Traditional Mongolian Script collection from
>>Unicode data (Traditional Mongolian text, Unicode chart U+1800).
>>My searches were unsuccessful even for a single character,
>>although I entered the Unicode data from MS Word and HTML documents. I
>>think GSDL search should work with any Unicode character. In this
>>regard, could you advise me how to check and track down the error,
>>since I'm not familiar with the GSDL source code?
>>
>>Thank you in advance,
>>
>>Garmaabazar Khaltarkhuu,
>>Digital Library Laboratory
>>Ritsumeikan University, Japan
>>http://www.dl.is.ritsumei.ac.jp/
>>
>>------------------------------------------------------------------------
>>
>>[Sample Traditional Mongolian text; characters garbled in the archived copy]
>>
>>------------------------------------------------------------------------
>>
>>_______________________________________________
>>greenstone-users mailing list
>>greenstone-users@list.scms.waikato.ac.nz
>>https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>>
>>