[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Wed Feb 15 00:14:07 CET 2006

On 2/14/06, M.-A. Lemburg <mal at egenix.com> wrote:
> Guido van Rossum wrote:
> > As Phillip guessed, I was indeed thinking about introducing bytes()
> > sooner than that, perhaps even in 2.5 (though I don't want anything
> > rushed).
>
> Hmm, that is probably going to be too early. As the thread shows
> there are lots of things to take into account, esp. since if you
> plan to introduce bytes() in 2.x, the upgrade path to 3.x would
> have to be carefully planned. Otherwise, we end up introducing
> a feature which is meant to prepare for 3.x and then we end up
> causing breakage when the move is finally implemented.

You make a good point. Someone probably needs to write up a new PEP
summarizing this discussion (or rather, consolidating the agreement
that is slowly emerging, where there is agreement, and summarizing the
key open questions).

> > Even in Py3k though, the encoding issue stands -- what if the file
> > encoding is Unicode? Then using Latin-1 to encode bytes by default
> > might not by what the user expected. Or what if the file encoding is
> > something totally different? (Cyrillic, Greek, Japanese, Klingon.)
> > Anything default but ASCII isn't going to work as expected. ASCII
> > isn't going to work as expected either, but it will complain loudly
> > (by throwing a UnicodeError) whenever you try it, rather than causing
> > subtle bugs later.
>
> I think there's a misunderstanding here: in Py3k, all "string"
> literals will be converted from the source code encoding to
> Unicode. There are no ambiguities - a Klingon character will still
> map to the same ordinal used to create the byte content regardless
> of whether the source file is encoded in UTF-8, UTF-16 or
> some Klingon charset (are there any ?).

OK, so a string (literal or otherwise) containing a Klingon character
won't be acceptable to the bytes() constructor in 3.0. It shouldn't be
in 2.x either then.

I still think that someone who types a file in Latin-1 and enters
non-ASCII Latin-1 characters in a string literal and then passes it to
the bytes() constructor might expect to get bytes encoded in Latin-1,
and someone who types a file in UTF-8 and enters non-ASCII Unicode
characters might expect to get UTF-8-encoded bytes. Since they can't
both get what they want, we should disallow both, and only allow
ASCII.

> Furthermore, by restricting to ASCII you'd also outrule hex escapes
> which seem to be the natural choice for presenting binary data in
> literals - the Unicode representation would then only be an
> implementation detail of the way Python treats "string" literals
> and a user would certainly expect to find e.g. \x88 in the bytes object
> if she writes bytes('\x88').

I guess we'l just have to disappoint her. Too bad for the person who
wrote bytes("\x12\x34\x56\x78\x9a\xbc\xde\xf0") -- they'll have to
write bytes([0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0]). Not so bad IMO
and certainly easier than a *mixture* of hex and ASCII like
'\xabc\xdef'.

> But maybe you have something different in mind... I'm talking
> about ways to create bytes() in Py3k using "string" literals.

I'm not sure that's going to be common practive except for ASCII
characters used in network protocols.

> >> While we're at it: I'd suggest that we remove the auto-conversion
> >> from bytes to Unicode in Py3k and the default encoding along with
> >> it.
> >
> > I'm not sure which auto-conversion you're talking about, since there
> > is no bytes type yet. If you're talking about the auto-conversion from
> > str to unicode: the bytes type should not be assumed to have *any*
> > properties that the current str type has, and that includes
> > auto-conversion.
>
> I was talking about the automatic conversion of 8-bit strings to
> Unicode - which was a key feature to make the introduction of
> Unicode less painful, but will no longer be necessary in Py3k.

OK. The bytes type certainly won't have this property.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)