Re: [greenstone-users] Help - again! Thai this time

From Richard Managh
Date Tue, 04 Sep 2007 09:38:32 +1200
Subject Re: [greenstone-users] Help - again! Thai this time
In-Reply-To (46D48930-2030903-sdb-org)
Hi Julian,

Julian Fox wrote:
A little help, list please:
I've been sent a batch of files in Thai script.  With Japanese, Russian, and Korean, and a bit of coaching, I got a collection up and running.  With the Thai files, I can read them if I open them inside GS (they are .doc files; I'm using OpenOffice to open them, and my Ubuntu machine is enabled for Thai), but both in the workspace and when transferred into the collection the file names show up as gibberish instead of in Thai script.
What do I need to enable in GS to be able to see that script properly?  The main config file seems to have the 'th' languagename enabled, and it uses utf-8 encoding, which is enabled, obviously, but I guess that's to ensure the interface works in Thai.
The associated problem is that the two test files I ran through the 'create' process weren't being converted into html, so I guess it's associated with the basic issue of recognising Thai script?
Are your source files Microsoft Word documents? If so, the process works like this:

During the import process, wvWare converts your Word document to HTML in "some" encoding that wvWare detects.
Greenstone converts this document from that "some" encoding to UTF-8 and writes it into its archive files.
Greenstone always stores information as UTF-8, both in its archive files and internally.
Greenstone then displays the imported document as UTF-8, or in any other encoding you choose from those available on the preferences page. When Greenstone is asked to display your document in, say, an Arabic encoding (using the option on the preferences page), it converts the text on the fly from UTF-8 to Arabic (Windows-1256) and then sends this text to the user's browser.
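
The steps above can be sketched in a few lines of Python. This is only an illustration of the decode-to-UTF-8 and convert-on-the-fly idea, not Greenstone or wvWare code; the function names are made up, and the sample text and the choice of Windows-1256 as the converter's output encoding are assumptions for the example.

```python
# Illustrative stand-ins only; these are not Greenstone functions.

def to_archive_utf8(converter_output: bytes, detected_encoding: str) -> bytes:
    """Decode the converter's HTML from its detected encoding and store it as UTF-8."""
    return converter_output.decode(detected_encoding).encode("utf-8")


def serve_as(archive_bytes: bytes, display_encoding: str) -> bytes:
    """Convert stored UTF-8 text to the encoding chosen on the preferences page."""
    text = archive_bytes.decode("utf-8")
    # Characters the target encoding cannot represent are replaced with "?".
    return text.encode(display_encoding, errors="replace")


sample = "مرحبا"                             # assumed sample text
converter_output = sample.encode("cp1256")   # assume the converter emitted Windows-1256
archive = to_archive_utf8(converter_output, "cp1256")
page = serve_as(archive, "cp1256")           # what would be sent to the browser
```

The point of the sketch is that everything hinges on the detected encoding passed to the first step: the archive is only correct UTF-8 text if that guess matches the real encoding of the converter's output.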

It's possible that wvWare is getting confused by the encoding of your source documents and is detecting the wrong one, so the text doesn't end up as correct UTF-8 in Greenstone when the document is imported.
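
To show what a wrong guess does, here is a small Python sketch (again, not Greenstone or wvWare code). It assumes, purely for illustration, that the source text is in Windows-874 (Thai) and that the converter wrongly treats it as Latin-1; the encodings actually involved in your files are not known yet.

```python
thai = "ภาษาไทย"                      # assumed sample text
source_bytes = thai.encode("cp874")   # assumption: the source is Windows-874 (Thai)

# Correct detection: decode with the real encoding, then store as UTF-8.
right = source_bytes.decode("cp874").encode("utf-8")

# Wrong detection: the same bytes decoded as Latin-1, then stored as UTF-8.
wrong = source_bytes.decode("latin-1").encode("utf-8")

print(right.decode("utf-8"))   # the original Thai text
print(wrong.decode("utf-8"))   # accented Latin characters instead of Thai
```

Note that the wrongly converted result is still valid UTF-8, so nothing fails outright. The text simply shows up as gibberish, which matches the symptom you describe.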

If the filenames of your source documents are in Thai, that could also be confusing wvWare. Perhaps you could try renaming your files with ASCII filenames.
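
If there are many files, a short script can do the renaming. The following is a minimal Python sketch, not a Greenstone tool: the `doc_001.doc` naming scheme and both function names are made up for the example. It returns a mapping from old to new names so the original Thai filenames are not lost. Try it on a copy of the files first.

```python
from pathlib import Path


def plan_ascii_names(names, prefix="doc"):
    """Map each non-ASCII filename to prefix_NNN plus its extension (if ASCII)."""
    taken = set(names)
    mapping = {}
    counter = 1
    for name in sorted(names):
        if name.isascii():
            continue  # already safe, leave it alone
        suffix = Path(name).suffix
        if not suffix.isascii():
            suffix = ""
        new_name = f"{prefix}_{counter:03d}{suffix}"
        while new_name in taken:  # avoid clashing with an existing file
            counter += 1
            new_name = f"{prefix}_{counter:03d}{suffix}"
        taken.add(new_name)
        mapping[name] = new_name
        counter += 1
    return mapping


def rename_to_ascii(folder, prefix="doc"):
    """Rename the files in folder according to plan_ascii_names; return the mapping."""
    root = Path(folder)
    names = [p.name for p in root.iterdir() if p.is_file()]
    mapping = plan_ascii_names(names, prefix)
    for old, new in mapping.items():
        (root / old).rename(root / new)
    return mapping
```

For example, `rename_to_ascii("thai_docs")` would rename the non-ASCII files in a (hypothetical) `thai_docs` folder and leave files that already have ASCII names untouched.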

Please send us an import file so we can determine what encoding your files are in, and perhaps what is happening.


Hope this is helpful,


Richard.

-- 
DL Consulting
Greenstone Digital Library and Digitisation Specialists
contact@dlconsulting.com
www.dlconsulting.com