[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
mal at egenix.com
Tue Feb 14 00:03:35 CET 2006
Phillip J. Eby wrote:
>>>> Why not just have the constructor be:
>>>> bytes(initializer [,encoding])
>>>> Where initializer must be either an iterable of suitable integers, or a
>>>> unicode/string object. If the latter (i.e., it's a basestring), the
>>>> encoding argument would then be required. Then, there's no need for
>>>> special codec support for the bytes type, since you call bytes on the
>>>> thing to be encoded. And of course, no need for a 'b' literal.
>>> It'd be cruel and unusual punishment though to have to write
>>> bytes("abc", "Latin-1")
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
> Actually, I thought we were talking about adding bytes() in 2.5.
Then we'd need to make the "ascii" encoding assumption
again, just like Guido proposed.
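
To make the proposal concrete, here is a minimal sketch of the constructor being discussed. It is written as a hypothetical helper, make_bytes, in Python 3 syntax (where str is Unicode). The helper name and the "ascii" default are assumptions for illustration only, not a settled API.

```python
def make_bytes(initializer, encoding="ascii"):
    """Hypothetical stand-in for the proposed bytes(initializer [, encoding])."""
    if isinstance(initializer, str):
        # Text has to be encoded; "ascii" is the assumed default here.
        return initializer.encode(encoding)
    # Anything else is treated as an iterable of integers in range(256).
    return bytes(initializer)


print(make_bytes("abc"))              # text, default encoding
print(make_bytes([97, 98, 99]))       # iterable of integers
print(make_bytes("\xe9", "latin-1"))  # explicit 8-bit encoding
try:
    make_bytes("\xe9")                # non-ASCII text under the ascii default
except UnicodeEncodeError as exc:
    print("ascii default rejects non-ASCII text:", exc.reason)
```

With an "ascii" default, plain ASCII text needs no encoding argument, while anything above 0x7F forces the caller to name an encoding explicitly.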
> However, now that you've brought this up, it actually makes perfect sense
> to just use latin-1 as the effective encoding for both strings and
> unicode. In Python 2.x, strings are byte strings by definition, so it's
> only in 3.0 that an encoding would be required. And again, latin1 is a
> reasonable, roundtrippable default encoding.
It is. However, it's not a reasonable choice for the
default encoding, since there are many encodings out there
that special-case the bytes 0x80-0xFF. Hence the choice
of ASCII as the default encoding in Python.
The conversion from Unicode to bytes is different in this
respect, since you are converting from a "bigger" type to
a "smaller" one. Choosing latin-1 as default for this
conversion would give you all 8 bits, instead of just 7
bits that ASCII provides.
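
The asymmetry between the two directions can be sketched as follows. This is an illustration in Python 3 syntax using the standard "ascii", "latin-1" and "koi8_r" codecs; the variable names are made up for the example.

```python
raw = bytes([0xE9])

# Decoding (bytes -> Unicode): the same byte means different text
# depending on which 8-bit encoding you assume.
as_latin1 = raw.decode("latin-1")
as_koi8r = raw.decode("koi8_r")
print(as_latin1 == as_koi8r)

# ASCII refuses to guess about anything in 0x80-0xFF.
try:
    raw.decode("ascii")
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False
print(ascii_decodes)

# Encoding (Unicode -> bytes): latin-1 maps code points 0-255
# one-to-one onto byte values 0-255, so all 8 bits are usable.
text = "".join(chr(i) for i in range(256))
all_bytes = text.encode("latin-1")
print(all_bytes == bytes(range(256)))

# ASCII stops at 7 bits: code point 0x80 cannot be encoded.
try:
    chr(0x80).encode("ascii")
    ascii_encodes = True
except UnicodeEncodeError:
    ascii_encodes = False
print(ascii_encodes)
```

Guessing latin-1 on the way in can silently produce the wrong text, while using it on the way out only widens what can be represented.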
> So, it sounds like making the encoding default to latin-1 would be a
> reasonably safe approach in both 2.x and 3.x.
Reasonable for bytes(): yes. In general: no.
>> While we're at it: I'd suggest that we remove the auto-conversion
>> from bytes to Unicode in Py3k and the default encoding along with
>> it. In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
> I thought all this was already in the plan for 3.0, but maybe I assume too
> much. :)
Wouldn't want to wait for Py4D :-)