[Python-checkins] r72061 - peps/trunk/pep-0383.txt
martin.v.loewis
python-checkins at python.org
Tue Apr 28 19:08:15 CEST 2009
Author: martin.v.loewis
Date: Tue Apr 28 19:08:14 2009
New Revision: 72061
Log:
Clarify what invalid bytes are in UTF-8.
Modified:
peps/trunk/pep-0383.txt
Modified: peps/trunk/pep-0383.txt
==============================================================================
--- peps/trunk/pep-0383.txt (original)
+++ peps/trunk/pep-0383.txt Tue Apr 28 19:08:14 2009
@@ -82,8 +82,11 @@
If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding "utf-8b", as the regular UTF-8 codec would not
re-encode half surrogates as single bytes. The UTF-8b codec decodes
-non-decodable bytes (which must be >= 0x80) into half surrogate codes
-U+DC80..U+DCFF.
+invalid bytes (which must be >= 0x80) into half surrogate codes
+U+DC80..U+DCFF. Unlike the utf-8 codec, the utf-8b codec follows the
+strict definition of UTF-8 to determine what an invalid byte is
+(which, among other restrictions, disallows encoding surrogate codes
+in UTF-8).
Discussion
==========
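
The mapping described in the hunk above can be tried out with a short
sketch. It assumes a Python 3.1+ interpreter and uses the
"surrogateescape" error handler on the regular utf-8 codec as a stand-in
for the "utf-8b" codec named in this revision; that substitution is an
assumption, not part of this checkin. The sample byte strings are made up
for illustration.

```python
# Sketch only: "surrogateescape" stands in for the "utf-8b" codec named in
# the diff (an assumption; requires Python 3.1 or later).

# A file name holding a lone 0xE9 byte (Latin-1 'e' with acute accent),
# which is not valid UTF-8 on its own.
raw = b"caf\xe9.txt"

# Each invalid byte 0xNN (always >= 0x80) becomes the half surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the half surrogate back into one byte.
roundtrip = name.encode("utf-8", "surrogateescape")

# Strict UTF-8 does not allow encoded surrogates: ED B2 80 is what a lax
# encoder would emit for U+DC80, so all three bytes count as invalid here
# and are escaped one by one.
lax = b"\xed\xb2\x80"
decoded = lax.decode("utf-8", "surrogateescape")

# Without the handler, a half surrogate cannot be encoded at all.
try:
    name.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

Because every escaped byte maps to its own code point in U+DC80..U+DCFF,
the original byte string can always be recovered, including input that
itself looks like an encoded surrogate.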