[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
James Y Knight
foom at fuhm.net
Tue Feb 14 08:09:55 CET 2006
On Feb 14, 2006, at 12:20 AM, Phillip J. Eby wrote:
> bytes(map(ord, str_or_unicode))
>
> In other words, without an encoding, bytes() should simply treat str and
> unicode objects *as if they were a sequence of integers*, and produce an
> error when an integer is out of range. This is a logical and consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.
If you're talking about "raw data", then make bytes(unicodestring)
produce what buffer(unicodestring) currently does -- something
completely and utterly worthless. :) [It depends on how you compiled
Python and what endianness your system has.]
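To make the build and endianness dependence concrete, here is a rough sketch. It assumes a Python 3 interpreter, where buffer() no longer exists, so it uses the UTF-16 and UTF-32 codecs only to imitate the in-memory layouts that narrow and wide builds would have exposed. The sample string and the `layouts` name are illustrative and not from the original thread.

```python
# Sketch only: emulates the raw memory layouts of a unicode string on
# narrow (2-byte) and wide (4-byte) builds, little- and big-endian.
# This is an approximation using codecs, not what buffer() itself runs.
text = "A\u20ac"  # 'A' followed by the euro sign

layouts = {
    name: text.encode(name)
    for name in ("utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
}

for name, raw in layouts.items():
    # Same two characters, four different byte strings.
    print(name, raw.hex())
```

All four byte strings differ, so "the raw data" of a unicode string is not a single well-defined thing.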
There really is no case where you don't care about the
encoding...there is always a specific desired output encoding, and
you have to think about what encoding that is. The argument that
latin-1 is a sensible default just because you can convert to latin-1
by chopping off the upper 3 bytes of a unicode character's ordinal
position is not convincing; you're still doing an encoding operation,
it just happens to be computationally easy. That Jython programs have
to pretend that unicode strings are an appropriate way to store
bytes, and thus often have to do fake "latin-1" conversions which are
really no such thing, doesn't make a convincing argument either.
Using unicode strings to store bytes read from or written to a socket
is really just broken.
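As a small illustration of why the "sequence of integers" reading is still an encoding operation, the sketch below assumes Python 3 semantics, where bytes() accepts an iterable of integers. The helper names `raw_bytes` and `fails` are hypothetical and used only for this example.

```python
def raw_bytes(text):
    """The proposed behaviour: treat each character as an integer."""
    return bytes(map(ord, text))


def fails(func, text):
    """Return True if func(text) raises ValueError (or a subclass)."""
    try:
        func(text)
    except ValueError:
        return True
    return False


sample = "caf\u00e9"  # every code point is below 256

# For code points below 256 the result is exactly the latin-1 encoding...
assert raw_bytes(sample) == sample.encode("latin-1")
# ...and it differs from other encodings of the same text.
assert raw_bytes(sample) != sample.encode("utf-8")

# Above 255 both approaches refuse, just as an encoder would.
assert fails(raw_bytes, "\u20ac")
assert fails(lambda s: s.encode("latin-1"), "\u20ac")
```

In other words, omitting the encoding argument here does not avoid choosing an encoding; it silently picks latin-1.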
Actually having any default encoding at all is IMO a poor idea, but
as Python has one at the moment (ASCII), it might as well keep using it
for consistency until it's eliminated
(sys.setdefaultencoding('undefined') is my friend.)
James