Re: [greenstone-users] UTF-8?

From graeme
Date Sun, 18 Feb 2007 09:40:21 +0430
Subject Re: [greenstone-users] UTF-8?
In-Reply-To (45D75791-6090502-sdb-org)
As I understand it, UTF-8 is not a superset of ISO-8859-1 at the byte level. UTF-8 maps directly onto 7-bit ASCII, but not onto the byte values with the 8th bit set, which is where ISO-8859-1 keeps the accented characters of the Latin alphabet.

This follows from the design of UTF-8: the leading bit(s) of the first byte of a UTF-8 character always describe how many bytes are needed to encode the Unicode character. For a one-byte character the first bit is always 0. For a two-byte character the first three bits are 110 and the first two bits of the following byte are 10. That means 5 of the 16 bits are used to identify the type and to provide simple error checking, leaving 11 bits for the character itself. Those 11 bits give 2048 values; excluding the 128 values already covered by the one-byte encoding, that allows 1920 characters to be encoded using the two-byte scheme. UTF-8 continues into three- and four-byte encodings.

In ISO-8859-1, à is encoded as E0,
whereas in UTF-8 it is C3 A0.

E0 = 11100000
E0 = 00011 100000 (as 11 bits)
E0 = yyyyy zzzzzz

The conversion to two-byte UTF-8 is given by 110yyyyy 10zzzzzz
Thus it equals:
11000011 10100000 = C3 A0
110yyyyy 10zzzzzz
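
The conversion worked through by hand above can also be checked in a few lines of Python. This is only a minimal sketch: the helper name encode_two_byte is made up for illustration, and it handles the two-byte range only, not the one-, three- or four-byte forms.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Pack a code point in U+0080..U+07FF into 110yyyyy 10zzzzzz.

    Hypothetical helper for illustration; real code would just call str.encode.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers only U+0080 to U+07FF")
    y = code_point >> 6        # top 5 of the 11 payload bits (yyyyy)
    z = code_point & 0b111111  # low 6 payload bits (zzzzzz)
    return bytes([0b11000000 | y, 0b10000000 | z])


a_grave = "\u00e0"                        # à
latin1 = a_grave.encode("iso-8859-1")     # single byte E0
utf8 = a_grave.encode("utf-8")            # two bytes C3 A0
manual = encode_two_byte(ord(a_grave))    # same two bytes, packed by hand

# Size of the two-byte range: 2048 eleven-bit values minus the 128 one-byte ones.
two_byte_count = 0x7FF - 0x80 + 1

print(latin1.hex(), utf8.hex(), manual.hex(), two_byte_count)
# e0 c3a0 c3a0 1920
```

The hand-packed bytes agree with Python's built-in UTF-8 codec, and the range size matches the 1920 figure given earlier.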

Sorry if my explanation is too much, but in short UTF-8 and ISO-8859-1 are different animals (once the high bit is set, which is where the accented characters reside).
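
To see what "different animals" means in practice, here is a small Python sketch that reads the bytes of one encoding as if they were the other. The sample word is my own choice, and whether this is what is actually happening with the documents in question is only an assumption; it simply shows the general kind of breakage you get when the declared encoding does not match the bytes.

```python
text = "citt\u00e0"                          # "città", sample word chosen for illustration

latin1_bytes = text.encode("iso-8859-1")     # b'citt\xe0'
utf8_bytes = text.encode("utf-8")            # b'citt\xc3\xa0'

# UTF-8 bytes read as ISO-8859-1: each of the two bytes becomes its own character.
misread = utf8_bytes.decode("iso-8859-1")    # 'citt' + 'Ã' + a no-break space

# ISO-8859-1 bytes read as UTF-8: E0 looks like the start of a three-byte
# sequence with nothing after it, so a strict decoder rejects it.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD, the replacement character.
replaced = latin1_bytes.decode("utf-8", errors="replace")

print(repr(misread), strict_ok, repr(replaced))
```

Bytes that are fine in one encoding are either invalid or a different character in the other, so the viewer has to be told the encoding the documents were really written in.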


On 2/17/07, Julian Fox wrote:
Dear List,
Strange, but I discover that since all my documents are in Italian, and
UTF-8 is the default encoding, accented characters are not showing up
correctly, yet if I choose ISO-8859-1 from the 'about' page, all is ok.
UTF-8 should not present this problem, as it is a superset.  Why could
it be?  I am currently viewing the document result on Ubuntu  (firefox)
via VNC rather than on my Windows machine, since the server is sitting
in a room far away at the moment in a very large building.  Is it
perhaps to do with a setting in Firefox....but then why does ISO-8859-1
Other question (though to be honest I could probably fossick this one
out from the documentation somewhere) - if I want the home page to show
up in Italian, I can get  that at the moment by going to 'about',
changing language preference, then clicking 'pagina principale' to get
back to home where my translation shows up in all its glory.  But if I
want it to show up first off? How do I alter that default?  Or - even
better - why, when I select preference from the home page and change
language, does it not immediately change the home page?  It stays in
English - I seem to have to go to 'about', make the change, THEN back to
'home'.  Is that, too, not odd?
