[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

M.-A. Lemburg mal at egenix.com
Tue Feb 14 00:03:35 CET 2006


Phillip J. Eby wrote:
>>>> Why not just have the constructor be:
>>>>
>>>>      bytes(initializer [,encoding])
>>>>
>>>> Where initializer must be either an iterable of suitable integers, or a
>>>> unicode/string object.  If the latter (i.e., it's a basestring), the
>>>> encoding argument would then be required.  Then, there's no need for
>>>> special codec support for the bytes type, since you call bytes on the 
>> thing
>>>> to be encoded.  And of course, no need for a 'b' literal.
>>> It'd be cruel and unusual punishment though to have to write
>>>
>>>   bytes("abc", "Latin-1")
>>>
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
> 
> Actually, I thought we were talking about adding bytes() in 2.5.

Then we'd need to make the "ascii" encoding assumption
again, just like Guido proposed.

> However, now that you've brought this up, it actually makes perfect sense 
> to just use latin-1 as the effective encoding for both strings and 
> unicode.  In Python 2.x, strings are byte strings by definition, so it's 
> only in 3.0 that an encoding would be required.  And again, latin1 is a 
> reasonable, roundtrippable default encoding.

It is. However, it's not a reasonable assumption of the
default encoding since there are many encodings out there
that special case the characters 0x80-0xFF, hence the choice
of using ASCII as default encoding in Python.

The conversion from Unicode to bytes is different in this
respect, since you are converting from a "bigger" type to
a "smaller" one. Choosing latin-1 as default for this
conversion would give you all 8 bits, instead of just 7
bits that ASCII provides.

> So, it sounds like making the encoding default to latin-1 would be a 
> reasonably safe approach in both 2.x and 3.x.

Reasonable for bytes(): yes. In general: no.

>> While we're at it: I'd suggest that we remove the auto-conversion
>>from bytes to Unicode in Py3k and the default encoding along with
>> it. In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
> 
> I thought all this was already in the plan for 3.0, but maybe I assume too 
> much.  :)

Wouldn't want to wait for Py4D :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 13 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::


More information about the Python-Dev mailing list