[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Glenn Linderman v+python at g.nevcal.com
Wed Apr 29 13:47:00 CEST 2009

On approximately 4/29/2009 4:07 AM, came the following characters from 
the keyboard of R. David Murray:
> On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote:
>> On approximately 4/28/2009 7:40 PM, came the following characters from 
>> the keyboard of R. David Murray:
>>>  On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
>>> >  C. File on disk with the invalid surrogate code, accessed via the
>>> >  str interface, no decoding happens, matches in memory the file on
>>> >  disk with the byte that translates to the same surrogate, accessed
>>> >  via the bytes interface. Ambiguity.
>>>  Unless I'm missing something, one of these is type str, and the
>>>  other is type bytes, so no ambiguity.
>> You are missing that the bytes value would get decoded to a str; thus 
>> both are str; so ambiguity is possible.
> Only if you as the programmer decode it.  Now, I don't understand the
> subtleties of Unicode enough to know if Martin has already successfully
> addressed this concern in another fashion, but personally I think that
> if you as a programmer are comparing funnydecoded-str strings gotten
> via a string interface with normal-decoded strings gotten via a bytes
> interface, that we could claim that your program has a bug.

Hopefully Martin will clarify the PEP as I suggested in another branch 
of this thread.  Through email discussion he eventually convinced me 
that this ambiguity is not possible, but the PEP does not explain this 
well enough to make it obvious.
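
A minimal sketch of why the two cases do not collide, assuming a Python 3
interpreter that provides the "surrogateescape" error handler proposed in
PEP 383 and a UTF-8 file system encoding. The byte strings are hypothetical
file names made up for illustration; they do not come from the PEP.

```python
# Hypothetical file name containing a byte that is not valid UTF-8.
raw = b"caf\x80"

# The surrogateescape handler maps the undecodable byte 0x80 to U+DC80.
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udc80"

# Encoding with the same handler restores the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# Hypothetical second file name: the bytes ED B2 80, which are what a
# lenient encoder would emit for U+DC80. The strict UTF-8 decoder rejects
# them, so each byte is escaped on its own rather than becoming U+DC80.
lookalike = b"caf\xed\xb2\x80"
decoded = lookalike.decode("utf-8", "surrogateescape")

# The two file names therefore decode to different str values.
assert decoded == "caf\udced\udcb2\udc80"
assert decoded != name
assert decoded.encode("utf-8", "surrogateescape") == lookalike
```

Because the UTF-8 decoder refuses encoded surrogates, distinct byte
sequences decode to distinct strings in this sketch, which is the property
the PEP text could state more explicitly.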

Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
