[greenstone-users] Hindi collection

From ak19@cs.waikato.ac.nz
Date Thu Mar 13 10:06:17 2008
Subject [greenstone-users] Hindi collection
In-Reply-To (383670-85114-qm-web62412-mail-re1-yahoo-com)
Do I understand your situation correctly: each document in your collection
could potentially contain text in three languages (Bengali, Hindi, and English)?

If so, then I need to know what encoding your html documents are in and
what indexer you're using. Better yet, if you could either
(1) send me one or two sample documents from your collection, OR
(2) create one or two documents containing random words from all
three languages (as might occur in a genuine document in your
collection),
then I can try to build a collection with them here and see how the
searching goes.

HTML allows you to create documents where the primary encoding and
language is set to one thing (for instance English) and wherein different
sections can be marked up with html tags to indicate a different language
and encoding.
Greenstone does not treat such markup specially, *but* if the documents
are encoded in UTF-8 then you should be able to search for exact terms in
any of the languages.
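
As a rough illustration of the points above, here is a minimal Python
sketch (not part of Greenstone) that writes a small test page with
per-section language markup and checks whether a file decodes as UTF-8.
The function names, the file name, and the sample words are all
hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Illustrative test page: the primary language is English, and individual
# paragraphs are marked as Hindi ("hi") or Bengali ("bn") via the lang
# attribute. The sample words are placeholders, not from a real collection.
SAMPLE_HTML = """<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Multilingual test</title>
</head>
<body>
<p>library</p>
<p lang="hi">पुस्तकालय</p>
<p lang="bn">গ্রন্থাগার</p>
</body>
</html>
"""


def write_sample(path):
    """Write the test page to disk, explicitly encoded as UTF-8."""
    Path(path).write_text(SAMPLE_HTML, encoding="utf-8")


def is_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    write_sample("multilingual_test.html")
    print(is_utf8("multilingual_test.html"))
```

Running is_utf8() over your existing html documents would be a quick way
to tell whether they are already in UTF-8 before importing them.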

Can you send me one or two demo documents that are representative of the
multilingual content in your actual documents?


> Oh, sorry. I have a collection in English only. Now, using HTML
> tags, I have prepared a document which contains sentences in Bengali.
> As suggested, I went to enrich the document. At dc.Title I am
> not able to type in Bengali. Searching in text, I typed the
> word in Bengali, but nothing is retrieved. What do I have to do
> now?
> Regards