[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Glenn Linderman v+python at g.nevcal.com
Tue Apr 28 20:48:37 CEST 2009

On approximately 4/28/2009 10:00 AM, came the following characters from 
the keyboard of Martin v. Löwis:

> An alternative that doesn't suffer from the risk of not being able to
> store decoded strings would have been the use of PUA characters, but
> people rejected it because of the potential ambiguities. So they clearly
> dislike one risk more than the other. UTF-8b is primarily meant as
> an in-memory representation.

The UTF-8b representation suffers from the same potential ambiguities as 
the PUA characters. They are perhaps slightly less likely in practice, 
because UTF-8b uses code points that are illegal in Unicode, but the 
theoretical likelihood is exactly the same within the space of character 
codes that Python accepts.
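
To make the ambiguity concrete, here is a minimal sketch. It assumes the
error handler name "surrogateescape" under which PEP 383 later shipped in
Python 3.1; the variable names are only illustrative. An undecodable byte
and a lone surrogate that was already present in a string end up as the
same Python string, so the two cases cannot be told apart afterwards.

```python
# Assumption: Python 3.1 or later, where the PEP 383 handler is
# registered under the name "surrogateescape".

# Byte 0x80 is not valid UTF-8 on its own, so the handler maps it to
# the lone surrogate U+DC80.
escaped = b"\x80".decode("utf-8", "surrogateescape")

# A string that contains the same lone surrogate, built directly and
# never decoded from bytes.
literal = "\udc80"

# Both are the same one-character string.
same_string = escaped == literal

# Encoding the directly built string yields the raw byte 0x80.
round_trip = literal.encode("utf-8", "surrogateescape")

print(same_string, round_trip)
```

Python accepts lone surrogates in str objects, so nothing in the string
itself records which of the two origins it had.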

Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
