[Python-Dev] bytes type discussion

Wed Feb 15 00:13:25 CET 2006

I'm about to send 6 or 8 replies to various salient messages in the
PEP 332 revival thread. That's probably a sign that there's still a
lot to be sorted out. In the meantime, to save you reading through
all those responses, here's a summary of where I believe I stand.
Let's continue the discussion in this new thread unless there are
specific hairs to be split in the other thread that aren't addressed
below or by later posts.

Non-controversial (or almost):

- we need a new PEP; PEP 332 won't cut it

- no b"..." literal

- bytes objects are mutable

- bytes objects are composed of ints in range(256)

- you can pass any iterable of ints to the bytes constructor, as long
as they are in range(256)

- longs or anything with an __index__ method should do, too

- when you index a bytes object, you get a plain int

- repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
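
To make the points above concrete, here is a rough sketch of a type with
those properties. The class name Bytes and its whole implementation are
hypothetical, meant only as an illustration and not as a proposed design;
it is written for a current Python 3 interpreter and leaves out slicing
and most sequence methods.

```python
import operator


class Bytes:
    """Hypothetical stand-in for the proposed bytes type (illustration only)."""

    def __init__(self, values=()):
        # Accept any iterable; each item is validated individually.
        self._data = [self._check(v) for v in values]

    @staticmethod
    def _check(value):
        # operator.index() accepts ints and anything with an __index__ method.
        n = operator.index(value)
        if not 0 <= n < 256:
            raise ValueError("byte must be in range(256)")
        return n

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        # Indexing yields a plain int (slices are not handled in this sketch).
        return self._data[operator.index(i)]

    def __setitem__(self, i, value):
        # The object is mutable: items can be replaced in place.
        self._data[operator.index(i)] = self._check(value)

    def __eq__(self, other):
        return isinstance(other, Bytes) and self._data == other._data

    def __repr__(self):
        return "bytes(%r)" % (self._data,)


b = Bytes(iter([10, 20, 30]))   # any iterable of ints in range(256)
assert repr(b) == "bytes([10, 20, 30])"
assert type(b[1]) is int
b[0] = 255                      # mutation in place
assert b == Bytes([255, 20, 30])
```

Out-of-range values such as 256 or -1 raise ValueError in this sketch;
which exception the real type should raise is not settled above.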

Somewhat controversial:

- it's probably too big to attempt to rush this into 2.5

- bytes("abc") == bytes(map(ord, "abc"))

- bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128, 255])

Very controversial:

- bytes("abc", "encoding") == bytes("abc") # ignores the "encoding" argument

- bytes(u"abc") == bytes("abc") # for ASCII at least

- bytes(u"\x80\xff") raises UnicodeError

- bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
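
The string-related rules in the two lists above can be sketched as plain
functions returning lists of ints. The helper names str8_values and
unicode_values are hypothetical, and the sketch assumes a current Python 3
interpreter, where a single str type has to stand in for both the 8-bit
and the Unicode strings of 2.x. The ASCII default for Unicode input is
taken from the "for ASCII at least" remark and is an assumption.

```python
def str8_values(s, encoding=None):
    # 8-bit string rule: each character maps to its ordinal.
    # The encoding argument is accepted but deliberately ignored.
    return list(map(ord, s))


def unicode_values(u, encoding="ascii"):
    # Unicode rule: encode first, defaulting to ASCII (assumed default).
    return list(u.encode(encoding))


assert str8_values("abc") == [97, 98, 99]
assert str8_values("\x80\xff") == [128, 255]
assert str8_values("abc", "encoding") == str8_values("abc")

assert unicode_values(u"abc") == str8_values("abc")
assert unicode_values(u"\x80\xff", "latin-1") == str8_values("\x80\xff")

try:
    unicode_values(u"\x80\xff")      # not representable in ASCII
except UnicodeError:
    pass
else:
    raise AssertionError("expected UnicodeError")
```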

Martin von Loewis's alternative for the "very controversial" set is to
disallow an encoding argument and (I believe) also to disallow Unicode
arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
only way to convert a string (which is always unicode) to bytes. The
problem with this is that there's no code that works in both 2.x and
3.0.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)