| On Sat, May 19, 2007 at 10:03:43AM +0800, Sandar Win wrote:
> At present,
> Greenstone cannot search Myanmar documents. Please kindly test with a Myanmar
> document. The attachment is a Myanmar document. You can get a Myanmar Unicode font
> at http://www.myanmarnlp.net.mm/downloads/Myanmar2_Unicode.zip
> Here is my doc.xml after importing.
> Please kindly test this for me, to make sure that Greenstone indexes and
> supports Myanmar Unicode documents.
> On 5/18/07, Michael Dewsnip <mdewsnip@cs.waikato.ac.nz> wrote:
> >You should check that the Myanmar text has been imported properly by
> >looking at some of the "doc.xml" files in the "archives" directory of your
> >collection.
Hi,
This document is in 16-bit Unicode (big endian), and Greenstone
successfully imports and builds a collection containing this file.
After installing the right fonts, I get what I presume is Myanmar text
in Greenstone. I think the problem is only with searching: it looks
to me like this text has no spaces between words, so some sort of
segmentation is required to break the text up into separate words
before it can be indexed. I think Greenstone does this automatically
for Chinese text, but not for other scripts such as Myanmar.
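
To illustrate why unsegmented text defeats a word-based index, here is a
minimal Python sketch. It is not how Greenstone works internally, and the
function names are hypothetical. It compares plain whitespace tokenization
with a crude fallback that indexes overlapping two-character sequences
(bigrams) for runs of characters in the Unicode Myanmar block
(U+1000 to U+109F). Bigrams are only a stand-in for real word segmentation,
which would need a dictionary or syllable rules.

```python
import re

# Unicode Myanmar block: U+1000..U+109F
MYANMAR_RUN = re.compile(r"[\u1000-\u109F]+")


def whitespace_tokens(text):
    """Split on whitespace only, as a simple word-based indexer might."""
    return text.split()


def bigram_tokens(text):
    """Hypothetical fallback: emit overlapping bigrams for Myanmar-only chunks.

    Chunks that are not purely Myanmar script are kept as whole tokens.
    """
    tokens = []
    for chunk in text.split():
        if MYANMAR_RUN.fullmatch(chunk) and len(chunk) > 1:
            tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
        else:
            tokens.append(chunk)
    return tokens


# Eight Myanmar code points written with no spaces between words.
sample = "\u1019\u103c\u1014\u103a\u1019\u102c\u1005\u102c"
query = "\u1005\u102c"  # the last two code points of the sample

# The whole run is a single token, so the query never matches a token.
print(query in whitespace_tokens(sample))
# With bigrams, the query matches one of the indexed tokens.
print(query in bigram_tokens(sample))
```

With whitespace splitting, the entire run becomes one index term, so a
search for any shorter word inside it finds nothing. That matches the
behaviour described above.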
John McPherson |