[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

James Y Knight foom at fuhm.net
Tue Feb 14 08:09:55 CET 2006


On Feb 14, 2006, at 12:20 AM, Phillip J. Eby wrote:
>      bytes(map(ord, str_or_unicode))
>
> In other words, without an encoding, bytes() should simply treat  
> str and
> unicode objects *as if they were a sequence of integers*, and  
> produce an
> error when an integer is out of range.  This is a logical and  
> consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.


If you're talking about "raw data", then make bytes(unicodestring)  
produce what buffer(unicodestring) currently does -- something  
completely and utterly worthless. :) [it depends on how you compiled  
python and what endianness your system has.]

There really is no case where you don't care about the  
encoding...there is always a specific desired output encoding, and  
you have to think about what encoding that is. The argument that  
latin-1 is a sensible default just because you can convert to latin-1  
by chopping off the upper 3 bytes of a unicode character's ordinal  
position is not convincing; you're still doing an encoding operation,  
it just happens to be computationally easy. That Jython programs have  
to pretend that unicode strings are an appropriate way to store  
bytes, and thus often have to do fake "latin-1" conversions which are  
really no such thing, doesn't make a convincing argument either.  
Using unicode strings to store bytes read from or written to a socket  
is really just broken.

Actually having any default encoding at all is IMO a poor idea, but  
as python has one at the moment (ascii), might as well keep using it  
for consistency until it's eliminated (sys.setdefaultencoding 
('undefined') is my friend.)

James


More information about the Python-Dev mailing list