[Python-checkins] r72061 - peps/trunk/pep-0383.txt

martin.v.loewis python-checkins at python.org
Tue Apr 28 19:08:15 CEST 2009


Author: martin.v.loewis
Date: Tue Apr 28 19:08:14 2009
New Revision: 72061

Log:
Clarify what invalid bytes are in UTF-8.


Modified:
   peps/trunk/pep-0383.txt

Modified: peps/trunk/pep-0383.txt
==============================================================================
--- peps/trunk/pep-0383.txt	(original)
+++ peps/trunk/pep-0383.txt	Tue Apr 28 19:08:14 2009
@@ -82,8 +82,11 @@
 If the locale's encoding is UTF-8, the file system encoding is set to
 a new encoding "utf-8b", as the regular UTF-8 codec would not
 re-encode half surrogates as single bytes. The UTF-8b codec decodes
-non-decodable bytes (which must be >= 0x80) into half surrogate codes
-U+DC80..U+DCFF.
+invalid bytes (which must be >= 0x80) into half surrogate codes
+U+DC80..U+DCFF. Unlike the utf-8 codec, the utf-8b codec follows the
+strict definition of UTF-8 to determine what an invalid byte is
+(which, among other restrictions, disallows encoding surrogate codes
+in UTF-8).
 
 Discussion
 ==========
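The decoding rule in the hunk above can be illustrated with a minimal sketch. It assumes a modern Python 3, where the built-in utf-8 codec already follows the strict definition of UTF-8 and rejects encoded surrogates. Instead of a separate "utf-8b" codec, it registers an error handler through codecs.register_error. The handler name "utf8b-sketch" and the helpers utf8b_decode/utf8b_encode are hypothetical and used only for illustration. Python 3.1 and later ship equivalent behaviour as the built-in "surrogateescape" error handler.

```python
import codecs

# Hypothetical handler name for this sketch; Python 3.1+ provides the
# equivalent built-in handler "surrogateescape".
HANDLER = "utf8b-sketch"


def _utf8b_handler(exc):
    """Map invalid bytes to U+DC80..U+DCFF on decode, and back on encode."""
    if isinstance(exc, UnicodeDecodeError):
        # Escape only the first offending byte; decoding resumes right after
        # it, so any further invalid bytes call this handler again.
        byte = exc.object[exc.start]
        if byte < 0x80:
            raise exc  # ASCII bytes are always valid UTF-8
        return chr(0xDC00 + byte), exc.start + 1
    if isinstance(exc, UnicodeEncodeError):
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            code = ord(ch)
            if not 0xDC80 <= code <= 0xDCFF:
                raise exc  # a genuine lone surrogate, not an escaped byte
            out.append(code - 0xDC00)
        return bytes(out), exc.end
    raise exc


codecs.register_error(HANDLER, _utf8b_handler)


def utf8b_decode(data: bytes) -> str:
    """Strict UTF-8 decoding; invalid bytes become half surrogates."""
    return data.decode("utf-8", HANDLER)


def utf8b_encode(text: str) -> bytes:
    """Strict UTF-8 encoding; half surrogates U+DC80..U+DCFF become bytes."""
    return text.encode("utf-8", HANDLER)


# A Latin-1 file name is not valid UTF-8 but still round-trips.
name = b"caf\xe9.txt"
print(ascii(utf8b_decode(name)))
print(utf8b_encode(utf8b_decode(name)) == name)
```

Because the underlying decoder is strict, a UTF-8 encoding of a surrogate such as b"\xed\xb2\x80" is treated as three invalid bytes and escaped byte by byte, rather than being decoded to U+DC80.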

