Re: [greenstone-devel] Cannot search and index

From: John R. McPherson
Date: Sat, 19 May 2007 22:06:59 +1200
Subject: Re: [greenstone-devel] Cannot search and index
In-Reply-To: (e309d9180705181903q63a9303bk86ce7748f26d07a5-mail-gmail-com)
On Sat, May 19, 2007 at 10:03:43AM +0800, Sandar Win wrote:

> At present,
> Greenstone cannot search Myanmar documents. Please kindly test with a
> Myanmar document; the attachment is one. You can get a Myanmar Unicode font
> at http://www.myanmarnlp.net.mm/downloads/Myanmar2_Unicode.zip

> Here is my doc.xml after importing.
> Please kindly test it for me to make sure that indexing works and that
> Myanmar Unicode documents are supported.

> On 5/18/07, Michael Dewsnip <mdewsnip@cs.waikato.ac.nz> wrote:

> >You should check that the Myanmar text has been imported properly by
> >looking at some of the "doc.xml" files in the "archives" directory of your
> >collection.
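A quick way to do the check Michael suggests across many files is a small script that counts characters from the Unicode Myanmar block (U+1000 to U+109F) in every "doc.xml" under the "archives" directory. This is a minimal sketch, not a Greenstone tool: `count_myanmar_chars` and `scan_archives` are hypothetical helpers, and it assumes the archive files are UTF-8 encoded. A count of zero for a file that should contain Myanmar text suggests the import step lost or mis-decoded it.

```python
from pathlib import Path

# Unicode "Myanmar" block: U+1000..U+109F
MYANMAR_BLOCK = range(0x1000, 0x10A0)


def count_myanmar_chars(text: str) -> int:
    """Count the characters in `text` that fall in the Myanmar block."""
    return sum(1 for ch in text if ord(ch) in MYANMAR_BLOCK)


def scan_archives(archives_dir: str) -> dict:
    """Map every doc.xml found (recursively) under `archives_dir` to its
    Myanmar character count. Assumes the files are UTF-8 (an assumption,
    not something stated in this thread); undecodable bytes are replaced
    rather than raising an error."""
    results = {}
    for path in sorted(Path(archives_dir).rglob("doc.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[str(path)] = count_myanmar_chars(text)
    return results
```

For example, `scan_archives("collect/mycoll/archives")` (a hypothetical collection path) returns one entry per imported document.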

Hi,
this document is in 16-bit Unicode (big-endian), and Greenstone
successfully imports and builds a collection containing this file.
After installing the right fonts, I get what I presume is Myanmar text
in Greenstone. I think the problem is only with searching: it looks
to me like this text has no spaces, so some sort of segmentation is
required to break the text up into separate words. I think Greenstone
does this automatically for Chinese text, but not for other scripts
such as Myanmar.
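The following Python sketch illustrates the kind of segmentation step described above: decode the 16-bit big-endian text, then insert spaces so that a space-delimited indexer sees separate tokens. It is not Greenstone's code. `decode_utf16` and `segment_myanmar` are hypothetical helpers, and the break rule is a deliberately simplified syllable rule (start a new unit before a consonant U+1000 to U+1021, unless that consonant is stacked under the virama U+1039 or is followed by asat U+103A or U+1039). Syllables are not words, so a real segmenter would usually need a dictionary or statistical model on top of this.

```python
import re

CONSONANTS = "\u1000-\u1021"  # basic Myanmar consonants (simplified range)
ASAT = "\u103A"               # "killer" mark: consonant closes the syllable
STACKER = "\u1039"            # virama used to stack consonants

# Zero-width break point: before a consonant that is not stacked under the
# previous one and is not itself followed by asat or the stacker.
_BREAK = re.compile(rf"(?<!{STACKER})(?=[{CONSONANTS}](?![{ASAT}{STACKER}]))")


def decode_utf16(data: bytes) -> str:
    """Decode 16-bit Unicode, honouring a byte-order mark if present and
    otherwise assuming big-endian."""
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return data.decode("utf-16")  # the codec reads and strips the BOM
    return data.decode("utf-16-be")


def segment_myanmar(text: str) -> str:
    """Insert a space at each simplified syllable break, then collapse
    runs of whitespace to single spaces."""
    spaced = _BREAK.sub(" ", text)
    return " ".join(spaced.split())
```

For example, `segment_myanmar("မြန်မာ")` returns `"မြန် မာ"`: two tokens that an indexer splitting on spaces could index and match separately.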

John McPherson