recycling internationalized garbage

Ross Ridge rridge at csclub.uwaterloo.ca
Wed Mar 15 09:06:21 CET 2006


Martin v. Löwis wrote:
> The point is that you can tell UTF-8 reliably. If the data decodes
> as UTF-8, it *is* UTF-8, because no other encoding in the world
> produces the same byte sequences (except for ASCII, which is
> an UTF-8 subset).

It should be obvious that any 8-bit single-byte character set can
produce byte sequences that are valid in UTF-8. In fact, I can't think
of any multi-byte encoding that can't produce valid UTF-8 byte
sequences.
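
To make that concrete, here is a minimal Python 3 sketch. The helper name
decodes_as, the sample bytes and the list of codecs are my own choices for
illustration, not anything from the thread. It takes the two bytes 0xC3 0xA9,
which are the UTF-8 encoding of U+00E9, and checks whether several legacy
encodings also accept them as valid input.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Illustrative sample: the UTF-8 encoding of U+00E9 (e with acute accent).
sample = b"\xc3\xa9"

# Codecs picked for illustration: two single-byte and three multi-byte
# legacy encodings, plus UTF-8 itself.
CANDIDATES = ("utf-8", "latin-1", "cp1252", "gb2312", "euc_jp", "shift_jis")

for name in CANDIDATES:
    # Every codec listed here accepts the same two bytes as valid input.
    print(name, decodes_as(sample, name))
```

Each codec decodes the bytes without error, though to different text: Latin-1
reads them as the two characters U+00C3 U+00A9, while UTF-8 reads them as the
single character U+00E9. So a successful UTF-8 decode shows only that the
bytes are valid UTF-8, not that no other encoding could have produced them.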

                         Ross Ridge



