recycling internationalized garbage
fredrik at pythonware.com
Wed Mar 15 10:04:36 CET 2006
> > The point is that you can tell UTF-8 reliably.
RFC 3629 says "fairly reliably" rather than "reliably", but they mean
the same thing...
> > If the data decodes
> > as UTF-8, it *is* UTF-8, because no other encoding in the world
> > produces the same byte sequences (except for ASCII, which is
> > an UTF-8 subset).
or as the RFC puts it,
"the probability that a string of characters in any other encoding
appears as valid UTF-8 is low, diminishing with increasing string
length."
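
A minimal sketch of the "try to decode it" test discussed above, assuming
a strict UTF-8 decode is an acceptable detector. The helper name
looks_like_utf8 and the sample strings are made up for illustration and
are not from the thread. It also shows why the RFC hedges: a very short
string in another encoding can happen to be valid UTF-8, which is the
"diminishing with increasing string length" part.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "cafe" with an acute accent on the e, encoded two ways.
utf8_text = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'
latin1_text = "caf\u00e9".encode("latin-1")  # b'caf\xe9'

# A contrived two-character Latin-1 string (A-tilde, copyright sign)
# whose bytes happen to form one valid UTF-8 sequence.
contrived = "\u00c3\u00a9".encode("latin-1")  # b'\xc3\xa9'

print(looks_like_utf8(utf8_text))    # real UTF-8 passes
print(looks_like_utf8(latin1_text))  # ordinary Latin-1 text fails
print(looks_like_utf8(contrived))    # short accidental match passes
print(looks_like_utf8(b"plain ascii"))  # ASCII is a UTF-8 subset
```

Ordinary Latin-1 text fails because a lone byte such as 0xE9 starts a
multi-byte sequence that is never completed; an accidental match needs
every non-ASCII byte to fall into a valid lead/continuation pattern.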
Ross Ridge wrote:
> It should be obvious that any 8-bit single-byte character set can
> produce byte sequences that are valid in UTF-8.
it should be fairly obvious that you don't know much about UTF-8...