recycling internationalized garbage

"Martin v. Löwis" martin at v.loewis.de
Thu Mar 16 08:43:49 CET 2006


Ross Ridge wrote:
> It should be obvious that any 8-bit single-byte character set can 
> produce byte sequences that are valid in UTF-8.

It is certainly possible to interpret UTF-8 data as if they were
in a specific single-byte encoding. However, the text you then
obtain is not meaningful in any language of the world.
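As a rough illustration of that point, here is a minimal sketch that
reads UTF-8 bytes as Latin-1. The sample string and the choice of
Latin-1 are assumptions made for the example; any other single-byte
encoding would show the same effect.

```python
# Assumed sample text; any string with a non-ASCII character would do.
text = "L\u00f6wis"                      # "Löwis"
utf8_bytes = text.encode("utf-8")        # b'L\xc3\xb6wis'

# Latin-1 assigns a character to every byte value, so this decode
# cannot fail, whatever the bytes really were.
misread = utf8_bytes.decode("latin-1")

# The decode "worked", but the two bytes of the UTF-8 o-umlaut came out
# as two unrelated characters, so the result is not the original word.
print(repr(text), "->", repr(misread))
```

The decode succeeds, but the output is not a word anybody would write.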

So "valid" yes; "meaningful" no. Therefore, for all practical
purposes, 8-bit single-byte character sets *will not* produce
byte sequences that are valid in UTF-8 (although they could -
it just won't happen).
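The opposite direction can be sketched the same way. The helper name
is_valid_utf8 and both sample strings are made up for this example;
it only tries a strict UTF-8 decode and reports whether that worked.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Ordinary Latin-1 text: the o-umlaut is the single byte 0xF6, which
# cannot start a UTF-8 sequence.
real_text = "L\u00f6wis".encode("latin-1")        # b'L\xf6wis'

# Contrived Latin-1 text: A-tilde followed by a pilcrow happens to be
# the byte pair 0xC3 0xB6, which is a well-formed UTF-8 sequence.
contrived = "L\u00c3\u00b6wis".encode("latin-1")  # b'L\xc3\xb6wis'

print(is_valid_utf8(real_text))
print(is_valid_utf8(contrived))
```

Only the contrived string passes the check, and it is not text anyone
would type on purpose.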

> In fact I can't think of any multi-byte encoding that can't produce
> valid UTF-8 byte sequence.

The same reasoning applies to them.
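For a multi-byte case, a similar sketch using EUC-JP (picked only as an
example of a multi-byte encoding; the sample word is likewise an
assumption):

```python
# Assumed sample: the Japanese word for "Japanese language".
sample = "\u65e5\u672c\u8a9e"
eucjp_bytes = sample.encode("euc_jp")

# Try to read the EUC-JP bytes as strict UTF-8.
try:
    eucjp_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)
```

In EUC-JP both bytes of a two-byte character are 0xA1 or above, while a
UTF-8 continuation byte must lie in 0x80-0xBF, so ordinary text in this
encoding fails the strict decode almost immediately.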

Regards,
Martin


