I am an Information Science student at the National Centre for Science
Information (NCSI), Indian Institute of Science, Bangalore, India.
As part of my project work, I am experimenting with building digital
libraries with Indian-language content. I am trying this with two Indian
languages, Hindi and Kannada, and I am using GSDL for this purpose. I am
working on the Windows XP platform with the IE web browser.
I have successfully created the GSDL interface in these languages by
creating the required .dm files using Office XP and appropriate Unicode
fonts for the two languages (the "Mangal" font for Hindi and the "Tunga"
font for Kannada, both of which are available in Windows XP). Thus, I do
not have any problems in creating the GSDL interface for these languages.
I created two collections: one containing Word DOC files created using
Word/XP, and another containing the same files saved as HTML from within
Word/XP. The content was entered using the fonts mentioned above. (Note
that Windows uses the Windows-1252 charset.)
I indexed both collections using GSDL.
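
One way to check which charset the HTML files saved from Word actually
declare is sketched below. This is only a diagnostic sketch in Python and
is not part of GSDL; the function name `declared_charset` and the sample
markup are hypothetical, and it assumes the charset is declared in a
`<meta>` tag near the top of the file.

```python
import re

# Matches a charset declaration inside a <meta> tag, in either the
# http-equiv form (content="text/html; charset=...") or the short
# form (<meta charset="...">).
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def declared_charset(html_bytes):
    """Return the charset named in the first <meta> declaration, or None."""
    # Only the start of the file is scanned (an assumption of this sketch).
    match = META_CHARSET.search(html_bytes[:4096])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Hypothetical sample resembling the head of a saved web page.
sample = (
    b"<html><head>"
    b'<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
    b"</head><body></body></html>"
)
print(declared_charset(sample))
```

For a real file, the bytes can be read with `open(path, "rb").read()` and
passed to the function. If the declared charset does not match the bytes
actually stored in the file, a display problem would be expected.
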
Now, I am able to search, browse and properly display the contents of the
first collection (DOC files). I am able to view both the HTML equivalent
and the native DOC file in the local language. I am also able to search by
entering text in the local language in the search box.
However, the HTML files are not displayed correctly: junk characters
appear instead of the local-language text. Interestingly, if I copy a
portion of this junk text into the search box, GSDL searches for it
correctly!
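
One possible explanation for both symptoms, offered only as an assumption
since the cause is not confirmed, is that the HTML pages contain UTF-8
bytes that are being decoded as Windows-1252. The Python sketch below
(hypothetical function names, no GSDL code involved) shows how such junk
arises and why the junk would still map back to the same bytes, which
might be why searching with it works.

```python
def to_junk(text):
    """Encode text as UTF-8, then wrongly decode the bytes as Windows-1252."""
    # Assumption: this mimics a page whose UTF-8 bytes are shown with the
    # wrong charset. Python's cp1252 codec rejects a few bytes (0x81, 0x8D,
    # 0x8F, 0x90, 0x9D), so the sample word below was chosen to avoid them.
    return text.encode("utf-8").decode("cp1252")


def recover(junk):
    """Reverse the mistake: re-encode as Windows-1252, then decode as UTF-8."""
    return junk.encode("cp1252").decode("utf-8")


original = "भारत"  # a Hindi word used only as sample data
junk = to_junk(original)

# The junk looks nothing like the original text on screen...
print(junk != original)
# ...but it corresponds to exactly the same underlying bytes,
print(junk.encode("cp1252") == original.encode("utf-8"))
# so the original text can be recovered from it.
print(recover(junk) == original)
```

If this is what is happening, the indexed content itself is intact and
only the charset used for display is wrong.
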
I would be glad to get some help in resolving this problem of improper
display.
Thanking you in advance.