[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Stephen J. Turnbull stephen at xemacs.org
Mon Apr 27 20:04:44 CEST 2009


Antoine Pitrou writes:

 > > or (better for 2.x, where bytes are strings as far as most
 > > programmers are concerned) as a new data type,
 > 
 > I'm -1 on any new string-like type (for file paths or whatever
 > else) with custom encoding/decoding semantics. It's the best way to
 > ruin the clean str/bytes separation that 3.x introduced.

Excuse me, but I can't regard a scheme that encodes bytes as Unicode
only some of the time as a "clean separation".  It's a dirty hack that
makes life a lot easier for Windows programmers and a little easier
for many Unix programmers.  Practicality beats purity, true, but at
the cost of the purity.
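To make the "only sometimes" point concrete, here is a minimal sketch. It assumes the `surrogateescape` error handler as it exists in modern Python 3 (the name is not used in this thread) and an invented example filename. Bytes that decode as UTF-8 become ordinary characters, and each byte that doesn't becomes a lone surrogate in the range U+DC80..U+DCFF, so the resulting str is part text and part smuggled bytes:

```python
# Invented example: bytes as they might come from a POSIX filename.
# b"\xc3\xa9" is valid UTF-8 for "e acute"; b"\xff" is not valid UTF-8.
raw = b"caf\xc3\xa9-\xff.txt"

# Decode with surrogateescape: the valid UTF-8 decodes normally, while the
# undecodable 0xFF byte is carried through as the lone surrogate U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\xe9-\udcff.txt'

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```

The round trip is lossless, but only for code that knows to use the same handler on the way back out.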

 > Besides, the goal is also to make things easier for the
 > programmer. Otherwise, we'll have the same situation as in 2.x
 > where many English-centric programmers produced code that was
 > incapable of dealing with non-ASCII input, because they didn't care
 > about the distinction between str and unicode.

So what you'll get here, AFAICS, is a new situation where many
Windows-centric programmers will produce code that's incapable of
dealing with non-Unicode input because they don't have to care about
the distinction between Unicode and bytes.
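A sketch of the failure mode described above, again assuming the modern `surrogateescape` handler; both helper functions are hypothetical. Code written on the assumption that every str is clean Unicode breaks as soon as it meets an escaped byte, while code that knows about the escaping round-trips it:

```python
def to_utf8_naive(name: str) -> bytes:
    # Hypothetical helper: assumes every str is valid Unicode text
    # and encodes strictly.
    return name.encode("utf-8")


def to_utf8_careful(name: str) -> bytes:
    # Hypothetical helper: undoes PEP 383-style escaping by encoding
    # with the same error handler used for decoding.
    return name.encode("utf-8", "surrogateescape")


# An undecodable byte (0xFF) smuggled into a str as U+DCFF.
name = b"bad-\xff.txt".decode("utf-8", "surrogateescape")

print(to_utf8_careful(name))  # b'bad-\xff.txt'
try:
    to_utf8_naive(name)
except UnicodeEncodeError as exc:
    # Strict UTF-8 refuses to encode lone surrogates.
    print("naive encode failed:", exc.reason)
```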

That's an improvement, but we can do still better and not at huge
expense to programmers.

