[Python-3000] Immutable bytes type and dbm modules

"Martin v. Löwis" martin at v.loewis.de
Tue Aug 7 08:22:02 CEST 2007


> For low-level I/O code, I totally agree that a mutable buffery object
> is needed.

The code we are looking at right now (dbm interfaces) *is* low-level
I/O code.
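A minimal sketch, assuming the dbm API of the Python 3 that was eventually released rather than the 2007 development branch under discussion: keys and values go in as str or bytes and always come back as immutable bytes. The pure-Python dbm.dumb backend is used here only because it is always available; the function name roundtrip_dbm is hypothetical.

```python
import dbm.dumb
import os
import tempfile


def roundtrip_dbm() -> tuple[bytes, list[bytes]]:
    # Hypothetical helper: store two entries, then read them back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example")
        with dbm.dumb.open(path, "c") as db:
            db["spam"] = "eggs"       # str keys/values are encoded to UTF-8 bytes
            db[b"raw"] = b"\x00\x01"  # bytes keys/values are stored as-is
            value = db[b"spam"]       # lookups return bytes, not str
            keys = sorted(db.keys())  # keys() returns a list of bytes
    return value, keys
```

Calling roundtrip_dbm() gives back b"eggs" and the key list [b"raw", b"spam"].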

> For example, to support re-using bytes buffers, socket.send()
> would need to take start and end offsets into its bytes argument.
> Otherwise, you have to slice the object to select the right data,
> which *because bytes are mutable* requires a copy. PEP 3116's .write()
> method has the same problem. Making those changes is, of course,
> doable, but it seems like something that should be consciously
> committed to.

Sure. There are several ways to do that, including producing view
objects - which would be possible even though the underlying buffer
is mutable; the view would then be just as mutable.
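As a sketch of the view idea, assuming the memoryview type that later Python versions provide (the function name slice_copies_view_does_not is hypothetical): slicing a mutable buffer copies the data, while slicing a view does not, and writes through the view land in the underlying buffer.

```python
def slice_copies_view_does_not() -> tuple[bytes, bytes, bytes]:
    buf = bytearray(b"hello world")  # mutable buffer, like the bytes type under discussion
    copied = bytes(buf[0:5])         # slicing the buffer itself makes a copy
    view = memoryview(buf)[0:5]      # slicing a view shares the same memory
    view[:] = b"HELLO"               # the view is just as mutable as the buffer
    return copied, bytes(view), bytes(buf)
```

Here the copy still holds b"hello", while both the view and the buffer now read b"HELLO" and b"HELLO world". Because socket.send() accepts any bytes-like object, passing memoryview(buf)[start:end] sends a region of a reused buffer without an intermediate copy.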

> Python 2 seems to have gotten away with doing all the buffery stuff in
> C. Is there a reason Python 3 shouldn't do the same?

I think Python 2 has demonstrated that this doesn't really work. People
repeatedly did += on strings (leading to quadratic performance),
invented the buffer interface (which is semantically flawed), added
direct support for mmap, and so on.
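To illustrate the += problem, here is a minimal sketch (function names are hypothetical): repeatedly concatenating onto an immutable bytes object can copy everything accumulated so far on each step, which is quadratic in the worst case, whereas appending to a mutable bytearray grows in place with amortized constant cost per byte.

```python
def build_with_concat(chunks: list[bytes]) -> bytes:
    out = b""
    for chunk in chunks:
        out += chunk  # immutable: each += may copy the whole result so far
    return out


def build_with_buffer(chunks: list[bytes]) -> bytes:
    out = bytearray()
    for chunk in chunks:
        out += chunk  # mutable: extends the buffer in place
    return bytes(out)  # one final copy into an immutable result
```

Both functions return the same bytes; they differ only in how much copying happens along the way. For a plain list of chunks, b"".join(chunks) is the usual idiom.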

> $ ./python.exe -m timeit 'b = bytes(v % 256 for v in range(1000))'
> 1000 loops, best of 3: 272 usec per loop
> $ ./python.exe -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256'
> 1000 loops, best of 3: 298 usec per loop
> 
> which seems to demonstrate that pre-allocating the bytes object is
> slightly _more_ expensive than re-allocating it each time.

The result must depend on other conditions; I get

martin@mira:~/work/3k$ ./python -m timeit 'b = bytes(v % 256 for v in range(1000))'
1000 loops, best of 3: 434 usec per loop
martin@mira:~/work/3k$ ./python -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256'
1000 loops, best of 3: 394 usec per loop

which is the reverse result.
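For anyone wanting to rerun the comparison on a released Python 3, a minimal sketch using the timeit module follows. It assumes bytearray stands in for the mutable bytes type of the 2007 branch, since today's bytes is immutable and rejects b[v] = ...; the function name run_benchmarks is hypothetical, and the numbers will vary by machine and build.

```python
import timeit


def run_benchmarks(number: int = 1000) -> dict[str, float]:
    # Best-of-3 total times in seconds, mirroring "best of 3" in the output above.
    totals = {
        # Build a fresh 1000-byte object on every iteration.
        "build": min(timeit.repeat(
            "b = bytearray(v % 256 for v in range(1000))",
            repeat=3, number=number)),
        # Overwrite the first 1000 bytes of a pre-allocated 2000-byte buffer.
        "reuse": min(timeit.repeat(
            "for v in range(1000): b[v] = v % 256",
            setup="b = bytearray(v % 256 for v in range(2000))",
            repeat=3, number=number)),
    }
    # Convert to microseconds per loop, the unit the timeit CLI reports.
    return {name: t / number * 1e6 for name, t in totals.items()}
```

Which of the two comes out faster depends on the interpreter and hardware, which is exactly why the two runs quoted above disagree.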

> In any case, if people want to use bytes as both the low-level buffery
> I/O thing and the high-level byte string, I think PEP 358 should
> document it, since right now it just asserts that bytes are mutable
> without any reason why.

That point is moot now; the PEP has been accepted. Documenting things
is always good, but the time for objections to the PEP is over now -
that's what the PEP process is for.

Regards,
Martin


