Re: [greenstone-users] UTF-8?

From Katherine Don
Date Mon, 19 Feb 2007 12:34:09 +1300
Subject Re: [greenstone-users] UTF-8?
In-Reply-To (dfb4b74f0702180555k3ae86c8ax690f640c60c7032c-mail-gmail-com)
Hi Graeme and Julian

Setting the language in main.cfg will set it for the whole library.
You can do this on a per collection basis by adding something like the
following to the collect.cfg file of a collection:

receptionist "/gsdlmod?a=p&p=about&c=collname&l=it"

This sets the link that is used to access the collection from the home page.

You need the first three arguments to get to the about page of the
collection, then any other args for which you want to change the default
value can go after these.

Note, the cgi argument w specifies the encoding.
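
As a rough sketch of how such a link is put together, the following Python snippet assembles the query string in the order described above. The helper name receptionist_line is made up for illustration and is not part of Greenstone, and the value utf-8 for the w argument is an assumption; check which encoding names your installation accepts.

```python
from urllib.parse import urlencode

def receptionist_line(collection, **overrides):
    """Build a collect.cfg receptionist line (hypothetical helper, not a Greenstone API)."""
    # The first three arguments lead to the collection's about page.
    args = [("a", "p"), ("p", "about"), ("c", collection)]
    # Any other arguments whose defaults should change go after these.
    args.extend(overrides.items())
    return 'receptionist "/gsdlmod?' + urlencode(args) + '"'

# Language only, as in the example above.
print(receptionist_line("collname", l="it"))
# Language plus encoding (the "utf-8" value is an assumption).
print(receptionist_line("collname", l="it", w="utf-8"))
```

Keeping a, p and c first mirrors the rule that the about-page arguments come before any overrides.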

Julian, what version of Greenstone are you using? In my version, if I
start at the home page (the one that lists all the collections), go to
preferences and change the language, then click the home link at the
top, my home page is now in that new language. I am using greenstone
from CVS, but it should be pretty close to what happens in 2.72.
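
On the encoding question, the bit-level walk-through in Graeme's message quoted below can be checked with a few lines of Python. This is only a sketch: the function name utf8_two_byte is invented for illustration, and it handles just the two-byte range (U+0080 to U+07FF).

```python
def utf8_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes (illustrative helper)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("not in the two-byte UTF-8 range")
    y = code_point >> 6          # upper 5 of the 11 payload bits (yyyyy)
    z = code_point & 0b111111    # lower 6 payload bits (zzzzzz)
    return bytes([0b11000000 | y, 0b10000000 | z])  # 110yyyyy 10zzzzzz

a_grave = "\u00e0"  # the letter a with a grave accent

print(a_grave.encode("iso-8859-1").hex())  # the single ISO-8859-1 byte
print(utf8_two_byte(ord(a_grave)).hex())   # the two UTF-8 bytes

# The hand-built bytes agree with Python's own UTF-8 encoder.
assert utf8_two_byte(ord(a_grave)) == a_grave.encode("utf-8")

# The lone ISO-8859-1 byte is not a valid UTF-8 sequence on its own, which is
# one way accented characters can end up displayed wrongly.
try:
    b"\xe0".decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```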


graeme wrote:
> Just stumbled upon a setting in main.cfg, right at the end...
> cgiarg shortname=l argdefault=en
> Change that and the default language will change but I don't like it
> because there should be a way to control it at the collection level.
> On 2/18/07, *Julian Fox* wrote:
> Graeme,
> Thanks for the fuller explanation of ASCII-encoding which has always
> fascinated me but which I have never really understood. The
> explanation is satisfying enough. Now I hope there is someone who
> can help me with the next bit - how I get Greenstone to do what I
> want it to do! I have a feeling it's just a setting somewhere in
> the works that I have failed to indicate. Problem is where.
> Julian
> graeme wrote:
>> As I understand it, UTF-8 is not a superset of ISO-8859-1. UTF-8 maps
>> directly onto 7-bit ASCII, but not onto the values with the 8th bit
>> set, which is where ISO-8859-1 keeps the accented characters of the
>> Latin alphabet. This is because of the design of UTF-8: the leading
>> bit(s) of the first byte of a UTF-8 character always describe how
>> many bytes are required to fully describe the Unicode character.
>> For a one-byte character the first bit is always 0. For a two-byte
>> character the first three bits are 110, and the first two bits of
>> the subsequent byte are 10. That means 5 of the 16 available bits
>> are used to identify the type and for simple error checking, leaving
>> 11 bits. Excluding the 128 values already covered by the one-byte
>> encoding, that allows 1920 characters to be encoded using the
>> two-byte scheme. UTF-8 continues into three- and four-byte encodings.
>> In ISO-8859-1 à has the encoding E0,
>> whereas in UTF-8 it is C3 A0:
>> E0 = 11100000
>> E0 = 00011 100000 (as 11 bits)
>> E0 = yyyyy zzzzzz
>> The two-byte UTF-8 form is given by 110yyyyy 10zzzzzz
>> Thus it equals:
>> 11000011 10100000 = C3 A0
>> 110yyyyy 10zzzzzz
>> Sorry if my explanation is too much, but in short UTF-8 and
>> ISO-8859-1 are different animals (when it gets to the higher bit,
>> which is where the accented characters reside).
>> Graeme.
>> On 2/17/07, *Julian Fox* wrote:
>> Dear List,
>> Strange, but I discover that since all my documents are in Italian, and
>> UTF-8 is the default encoding, accented characters are not showing up
>> correctly, yet if I choose ISO-8859-1 from the 'about' page, all is ok.
>> UTF-8 should not present this problem, as it is a superset. Why could
>> it be? I am currently viewing the document result on Ubuntu (firefox)
>> via VNC rather than on my Windows machine, since the server is sitting
>> in a room far away at the moment in a very large building. Is it
>> perhaps to do with a setting in Firefox....but then why does ISO-8859-1
>> work?
>> Other question (though to be honest I could probably fossick this one
>> out from the documentation somewhere) - if I want the home page to show
>> up in Italian, I can get that at the moment by going to 'about',
>> changing language preference, then clicking 'pagina principale' to get
>> back to home where my translation shows up in all its glory. But if I
>> want it to show up first off? How do I alter that default? Or - even
>> better - why, when I select preference from the home page and change
>> language, does it not immediately change the home page? It stays in
>> English - I seem to have to go to 'about', make the change, THEN back to
>> 'home'. Is that too, not odd?
>> Julian
>> _______________________________________________
>> greenstone-users mailing list