[Python-3000] Immutable bytes type and bsddb or other IO

Fri Aug 24 19:26:16 CEST 2007

On 8/23/07, Guido van Rossum <guido at python.org> wrote:
> > > BTW PyUnicode should *not* support the buffer API.

> On 8/23/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Why not? It should set readonly to 1, and format to "u" or "w".

[me again]
> Because the write() method of binary files (and similar places, like
> socket.send() and in the future probably various database objects)
> accepts anything that supports the buffer API, but writing a (text)
> string to these is almost certainly a bug. Not supporting the buffer
> API in PyUnicode is IMO preferable to making explicit exceptions for
> PyUnicode in all those places.
>
> I don't think that the savings possible when writing to a text file
> using the UTF-16 or -32 encoding (whichever matches Py_UNICODE_SIZE)
> in the native byte order are worth leaving that bug unchecked.

I looked at the code, and it's even more complicated than that. The
new buffer API continues to make a distinction between binary and
character data, and there's collusion between the bytes and unicode
types so that this works:

  b = b"abc"
  b[1:2] = "X"

even though these things all fail:

  b.extend("XYZ")
  b += "ZYX"

Unfortunately taking the buffer API away from unicode makes things
fail early (before sys.std{in,out,err} are set), so apparently the I/O
library or something else somehow depends on this.

I'll investigate.
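
For anyone who wants to poke at this on a released interpreter, here is
a minimal sketch of three probes for the behaviour discussed above. It
assumes a current CPython 3.x, where bytearray is the mutable bytes type
from this thread; the helper names are hypothetical, and the results
need not match the 2007 py3k branch.

```python
import io


def accepts_buffer(obj):
    """Return True if obj exports the buffer API (memoryview can wrap it)."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


def slice_assign_works(value):
    """Return True if value can be slice-assigned into a mutable bytes object."""
    b = bytearray(b"abc")
    try:
        b[1:2] = value
    except TypeError:
        return False
    return True


def binary_write_works(value):
    """Return True if an in-memory binary stream accepts value in write()."""
    try:
        io.BytesIO().write(value)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    # Compare a bytes object with a text string in each probe.
    for value in (b"X", "X"):
        print(repr(value),
              accepts_buffer(value),
              slice_assign_works(value),
              binary_write_works(value))
```

Each probe only reports whether the operation raised TypeError, so it
shows whether text is rejected, not where in the I/O stack the check
lives.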

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)