recycling internationalized garbage
Ross Ridge
rridge at csclub.uwaterloo.ca
Wed Mar 15 03:06:21 EST 2006
Martin v. Löwis wrote:
> The point is that you can tell UTF-8 reliably. If the data decodes
> as UTF-8, it *is* UTF-8, because no other encoding in the world
> produces the same byte sequences (except for ASCII, which is
> an UTF-8 subset).
It should be obvious that any 8-bit single-byte character set can
produce byte sequences that are valid in UTF-8. In fact, I can't think
of any multi-byte encoding that can't produce valid UTF-8 byte
sequences.
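
A minimal sketch of the point, assuming Python 3 and its standard codecs.
The helper name decodes_as_utf8 and the sample string are made up for
illustration. The same two bytes are legal text in Latin-1, legal text in
Shift_JIS, and also well-formed UTF-8, so a successful UTF-8 decode does
not by itself prove which encoding the author used.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two characters that are perfectly legal in Latin-1 (ISO 8859-1).
latin1_text = "\u00c3\u00a9"            # A-with-tilde, copyright sign
raw = latin1_text.encode("latin-1")     # the two bytes 0xC3 0xA9

# The same bytes are also well-formed UTF-8 (they spell U+00E9).
as_utf8 = raw.decode("utf-8")

# And they are valid in a multi-byte encoding too: in Shift_JIS both
# bytes fall in the single-byte half-width katakana range.
as_sjis = raw.decode("shift_jis")

# By contrast, U+00E9 encoded as Latin-1 is the lone byte 0xE9,
# which is not well-formed UTF-8.
lone_byte = "\u00e9".encode("latin-1")

print(decodes_as_utf8(raw), decodes_as_utf8(lone_byte))
```

So not every Latin-1 byte string passes a UTF-8 check, but some do, and
those are the ones a decode-based test cannot tell apart.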
Ross Ridge