[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Tue Feb 14 05:07:49 CET 2006

On 2/13/06, Neil Schemenauer <nas at arctrix.com> wrote:
> Guido van Rossum <guido at python.org> wrote:
> >> In py3k, when the str object is eliminated, then what do you have?
> >> Perhaps
> >> - bytes("\x80"), you get an error, encoding is required. There is no
> >> such thing as "default encoding" anymore, as there's no str object.
> >> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
> >> single byte of value 0x80.
> >
> > Yes to both again.
>
> I haven't been following this dicussion about bytes() real closely
> but I don't think that bytes() should do the encoding.  We already
> have a way to spell that:
>
>     "\x80".encode('latin-1')

But in 2.5 we can't change that to return a bytes object without
creating HUGE incompatibilities.

In general I've come to appreciate that there are two ways of
converting an object of type A to an object of type B: ask an A
instance to convert itself to a B, or ask the type B to create a new
instance from an A. Depending on what A and B are, both APIs make
sense; sometimes reasons of decoupling require that A can't know about
B, in which case you have to use the latter approach; sometimes B
can't know about A, in which case you have to use the former. Even
when A == B we sometimes support both APIs: to create a new list from
a list a, you can write a[:] or list(a); to create a new dict from a
dict d, you can write d.copy() or dict(d).

An advantage of the latter API is that there's no confusion about the
resulting type -- dict(d) is definitely a dict, and list(a) is
definitely a list. Not so for d.copy() or a[:] -- if the input type is
another mapping or sequence, it'll probably return an object of that
same type.

Again, it depends on the application which is better.

I think that bytes(s, <encoding>) is fine, especially for expressing a
new type, since it is unambiguous about the result type, and has no
backwards compatibility issues.

> Also, I think it would useful to introduce byte array literals at
> the same time as the bytes object.  That would allow people to use
> byte arrays without having to get involved with all the silly string
> encoding confusion.

You missed the part where I said that introducing the bytes type
*without* a literal seems to be a good first step. A new type, even
built-in, is much less drastic than a new literal (which requires
lexer and parser support in addition to everything else).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)