Codecs for ISO 8859-11 (Thai) and 8859-16 (Romanian)

"Martin v. Löwis" martin at
Thu Jul 29 00:08:41 CEST 2004

Richard Brodie wrote:
>>ISO-8859-11 is actually very difficult to implement, as it is unclear
>>whether the characters \x80..\x9F are assigned in this character set
>>or not. In fact, it is unclear whether the character set contains
>>even C0.
> That seems like a very fine distinction to me; the Unicode mapping tables
> are the same for those points as in ISO-8859-1, so what's the difference?

For ISO-8859-1, I believe the standard actually says that those code
points are C1. For ISO-8859-11, you can find various statements on the
net, some claiming that it includes C1, and some claiming that it
doesn't. Somebody would actually have to take a look at ISO-8859-11 to
find out what is the case.
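
As far as the Unicode mapping tables go, the question can at least be
checked empirically for any codec that is installed. The following is
only a sketch: the helper names c1_mapping and maps_c1_like_latin1 are
made up for illustration, and it assumes a Python where bytes objects
have a decode() method. It shows what a given mapping table does with
\x80..\x9F; it cannot settle what the standard itself assigns there.

```python
def c1_mapping(codec_name):
    """Return {byte value: decoded character or None} for 0x80..0x9F.

    None means the codec refuses to decode that byte, i.e. its mapping
    table leaves the code point unassigned.
    """
    result = {}
    for b in range(0x80, 0xA0):
        try:
            result[b] = bytes([b]).decode(codec_name)
        except UnicodeDecodeError:
            result[b] = None
    return result


def maps_c1_like_latin1(codec_name):
    """True if the codec decodes 0x80..0x9F exactly as ISO-8859-1 does."""
    return c1_mapping(codec_name) == c1_mapping("iso8859_1")
```

Calling maps_c1_like_latin1() with the name of a Thai codec, if one is
available, reports whether its table agrees with ISO-8859-1 in that
range. An unknown codec name raises LookupError, which is deliberately
not caught here.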

The issue is complicated by two facts:
- many sources indicate that ISO-8859-11 is derived by taking TIS-620,
   and adding NBSP into 0xa0. Now, it seems quite clear that TIS-620 does
   *not* include C1.
- some sources indicate certain restrictions with respect to control
   functions, e.g. in

   which says "control functions are not used to create composite graphic
   symbols from two or more graphic characters (see 6)."
   I don't know what this means, especially as section 6 does not talk
   about control functions. Section 7 says that any control functions
   are out of scope of ISO 8859, which I believe is factually incorrect.
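
The "TIS-620 plus NBSP at 0xa0" claim from the first point is
something one could test against whatever mapping tables are at hand.
Below is a rough sketch; decode_or_none and differing_bytes are
hypothetical helper names, not part of any library. It lists the byte
values on which two single-byte codecs disagree, treating a byte the
codec cannot decode as unassigned.

```python
def decode_or_none(byte_value, codec_name):
    """Decode a single byte; return None if the codec leaves it unassigned."""
    try:
        return bytes([byte_value]).decode(codec_name)
    except UnicodeDecodeError:
        return None


def differing_bytes(codec_a, codec_b):
    """Return the sorted byte values that the two codecs decode differently."""
    return [
        b
        for b in range(256)
        if decode_or_none(b, codec_a) != decode_or_none(b, codec_b)
    ]
```

If the description above is accurate, comparing a TIS-620 table with an
ISO-8859-11 table should report 0xa0, plus \x80..\x9F only if the two
tables treat C1 differently. Again, this reflects the mapping tables,
not necessarily the text of the standards.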

