[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Mon Apr 27 19:45:15 CEST 2009

Paul Moore writes:
 > 2009/4/27 Stephen J. Turnbull <stephen at xemacs.org>:
 > > I believe there are solutions that don't have that problem.
 > > Specifically, the return values could be bytes or (better for 2.x,
 > > where bytes are strings as far as most programmers are concerned) a
 > > new data type, to indicate that they're not text until the client
 > > acknowledges them as such.  EIBTI (explicit is better than implicit).
 > 
 > I think you're ignoring the fact that under Windows, it's the *bytes*
 > APIs that are lossy.

The *Windows* bytes APIs may be lossy.  Python's bytes, on the other
hand, can represent anything that UTF-16 can, just encoded as UTF-8.
The point is that in Python 3 "bytes" means it's *your*
responsibility, not Python's, to decode that data.  The advantage of a
new data type is that Python can provide ways to do it and hide the
internal representation (in theory, it could even differ between
platforms).
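
To make the representation claim concrete, here is a minimal sketch.
It assumes well-formed UTF-16 (no unpaired surrogates, which the
strict UTF-8 codec refuses to encode); the sample name is made up for
illustration.

```python
# A made-up name mixing ASCII, a Latin-1 character, a CJK character,
# and a non-BMP character (which UTF-16 stores as a surrogate pair).
name = "caf\u00e9-\u6587-\U0001F40D.txt"

# Roughly what a wide-character API would hand over.
as_utf16 = name.encode("utf-16-le")

# The same text held in a Python bytes object, encoded as UTF-8.
as_utf8 = name.encode("utf-8")

# Nothing is lost in either form; decoding is an explicit step that
# the application takes, not something done behind its back.
assert as_utf16.decode("utf-16-le") == name
assert as_utf8.decode("utf-8") == name
```

The bytes object itself carries no claim about being text; that claim
is made only at the point where the application calls decode().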

 > Can I at least assume that you aren't recommending that only the bytes
 > API exists on Unix, and only the Unicode API on Windows?

I'm agnostic about the underlying APIs used to talk to the OS; people
who actually use that OS should decide that.  I'm just recommending
that the return values of the getters not be of a "character string"
type until converted explicitly by the application.
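
A rough sketch of what I mean by that, not a proposal for an actual
API.  The names UndecodedName and listdir_undecoded are hypothetical;
only os.listdir and os.fsencode are real, and the choice to store
bytes internally is just one possible representation.

```python
import os


class UndecodedName:
    """Hypothetical wrapper: a name from the OS that is not yet text."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def decode(self, encoding="utf-8", errors="strict") -> str:
        # The application acknowledges the data as text here, and
        # chooses the encoding and error policy itself.
        return self._raw.decode(encoding, errors)

    def raw(self) -> bytes:
        # Escape hatch for passing the name back to the OS unchanged.
        return self._raw

    def __repr__(self):
        return f"UndecodedName({self._raw!r})"


def listdir_undecoded(path="."):
    # Passing a bytes path makes os.listdir return bytes names.
    return [UndecodedName(n) for n in os.listdir(os.fsencode(path))]


entry = UndecodedName(b"caf\xc3\xa9.txt")
text = entry.decode("utf-8")        # explicit conversion to str
```

Because the wrapper defines no string operations, something like
"prefix" + entry raises TypeError until the application decodes,
which is the whole point.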

 > The *only* "robust" solution is to completely separate the 2
 > platforms.

I'm not so pessimistic, unless you're referring to Microsoft's
penchant for forking any solution they don't own.

 > People *want* a solution that doesn't require every application
 > developer to sweat blood to write working code, simply to cover
 > corner cases that they don't believe will happen.  The rest of us
 > don't want to be made to care.

Well, yes, I wrote pretty much the same thing in the post you're
replying to.  But do you really think PEP 383 as written is the unique
solution to those requirements?