[greenstone-users] How the word separator character is defined?

From Biligsaikhan B.
Date Tue Nov 10 15:24:21 2009
Subject [greenstone-users] How the word separator character is defined?
Dear list and Katherine,

Where can I define my own "word separator character", or exclude some
characters from the word separator functions, in Greenstone?

It seems that my Greenstone collection treats some Unicode special
control characters as spaces. For example, according to the Unicode
standard, Mongolian text uses four special control characters to
change glyph shapes: 1. Free Variation Selector One (FVS1) (U+180B),
2. Free Variation Selector Two (FVS2) (U+180C), 3. Free Variation
Selector Three (FVS3) (U+180D), and 4. Mongolian Vowel Separator (MVS)
(U+180E). These control characters must be treated as part of the word,
whether they occur in the middle, at the beginning, or at the end of
it. For example, abc'MVS'defg is a single word, not two words 'abc'
and 'defg'. I have failed to retrieve such words in Greenstone.
Greenstone treats Mongolian words containing control characters as two
or more separate words (several control characters can be used in a
single word).
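
The behaviour described above can be illustrated with a minimal Python
sketch. It is not Greenstone code and says nothing about how the
Greenstone indexers are implemented; the function names and the regular
expressions are assumptions made for illustration only. It contrasts a
tokenizer that breaks on anything that is not a letter, digit, or
underscore with one that also accepts U+180B to U+180E inside a word.

```python
import re

# Hypothetical tokenizer: a word is a run of letters, digits, or "_".
# U+180B..U+180E are not letters, so they end up acting as separators.
NAIVE_WORD_RE = re.compile(r"\w+")

# Hypothetical alternative: additionally allow the Mongolian control
# characters U+180B..U+180E (the three free variation selectors and the
# Mongolian vowel separator) as part of a word.
MONGOLIAN_WORD_RE = re.compile(r"[\w\u180B-\u180E]+")


def naive_tokenize(text):
    """Split text into words, breaking at the control characters."""
    return NAIVE_WORD_RE.findall(text)


def mongolian_aware_tokenize(text):
    """Split text into words, keeping the control characters in the word."""
    return MONGOLIAN_WORD_RE.findall(text)


if __name__ == "__main__":
    sample = "abc\u180edefg"  # 'abc', U+180E, 'defg' as in the example above
    print(len(naive_tokenize(sample)))            # 2 words: 'abc' and 'defg'
    print(len(mongolian_aware_tokenize(sample)))  # 1 word
```

With the first pattern the sample is split into 'abc' and 'defg', which
matches the unwanted behaviour; with the second it stays a single word.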

So how can I make Greenstone treat the Mongolian control characters as
part of a word?

Any advice or tips would be appreciated.

Thank you,

Biligsaikhan B.

Biligsaikhan Batjargal, M.Eng.
Ph.D. Candidate, Graduate School of Science and Engineering, Ritsumeikan University
Research Assistant, Digital Humanities Center for Japanese Arts and Cultures,
Ritsumeikan University
Biwako-Kusatsu Campus
1-1-1 Noji-Higashi,
Kusatsu, Shiga 525-8577, JAPAN
Phone: +81+77-561-5202 (ex. 6731)