recycling internationalized garbage
"Martin v. Löwis"
martin at v.loewis.de
Thu Mar 16 08:43:49 CET 2006
Ross Ridge wrote:
> It should be obvious that any 8-bit single-byte character set can
> produce byte sequences that are valid in UTF-8.
It is certainly possible to interpret UTF-8 data as if they were
in a specific single-byte encoding. However, the text you then
obtain is not meaningful in any language of the world.
So "valid" yes; "meaningful" no. Therefore, for all practical
purposes, 8-bit single-byte character sets *will not* produce
byte sequences that are valid in UTF-8 (although they could -
it just won't happen).
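
The distinction between "valid" and "meaningful" can be illustrated with a small Python 3 sketch. The helper name is_valid_utf8 and the sample strings are assumptions chosen for illustration, and Latin-1 stands in for "a specific single-byte encoding"; none of this is from the original discussion.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without error (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Meaningful text with non-ASCII letters, encoded in a single-byte encoding.
latin1_bytes = "Löwis grüßt".encode("latin-1")
# The high bytes here are each followed by ASCII, which is not well-formed UTF-8.
print(is_valid_utf8(latin1_bytes))

# UTF-8 data interpreted as Latin-1 decodes without error, but the result
# is not meaningful text in any language.
utf8_bytes = "Löwis".encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)

# A Latin-1 string can be valid UTF-8, but only if it already looks like
# the garbage above; real text does not contain such character pairs.
accidental = mojibake.encode("latin-1")
print(is_valid_utf8(accidental))
```

Pure ASCII text is the trivial exception: it is byte-for-byte identical in Latin-1 and UTF-8, so the argument only concerns text that uses bytes above 0x7F.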
> In fact I can't think of any multi-byte encoding that can't produce
> valid UTF-8 byte sequences.
The same reasoning applies for them.
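
As a rough check of the multi-byte case, the following Python 3 sketch encodes short, meaningful samples in two legacy multi-byte encodings and tests whether the bytes happen to be valid UTF-8. The sample strings, the choice of codecs, and the helper name is_valid_utf8 are assumptions made for illustration only; two samples demonstrate the point, they do not prove it in general.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without error (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# (codec name, meaningful sample text); both chosen arbitrarily.
samples = [
    ("shift_jis", "日本語"),
    ("gbk", "中文"),
]

# Encode each sample in its legacy codec, then test the raw bytes as UTF-8.
results = {
    codec: is_valid_utf8(text.encode(codec))
    for codec, text in samples
}

for codec, valid in results.items():
    print(f"{codec}: valid UTF-8? {valid}")
```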
More information about the Python-list