[Python-3000] Immutable bytes type and dbm modules

Tue Aug 7 07:45:34 CEST 2007

On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Apologies if this has been answered before, but why are you waiting
> > for a show-stopper that requires an immutable bytes type rather than
> > one that requires a mutable one?
>
> You mean, the need for a mutable bytes type might not be clear yet?
>
> Code that has been ported to the bytes type probably doesn't use it
> correctly yet, but to me, the need for a buffery thing where you
> can allocate some buffer, and then fill it byte-for-byte is quite
> obvious. It's a standard thing in all kinds of communication
> protocols: in sending, you allocate plenty of memory, fill it, and
> then send the fraction you actually consumed. In receiving, you
> allocate plenty of memory (not knowing yet how much you will receive),
> then only process as much as you needed. You do all that without
> creating new buffers all the time - you use a single one over and
> over again.
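The allocate-once, fill, send-the-used-fraction pattern described above can be sketched in current Python 3. There, the mutable type is `bytearray` (not the mutable `bytes` of the p3yk branch), and `memoryview` gives slices that share memory instead of copying. This is a minimal sketch under those assumptions. `io.BytesIO` stands in for a socket, and `fill_message`, `send_filled`, `receive_all`, and `BUF_SIZE` are hypothetical names.

```python
import io
import zlib

BUF_SIZE = 4096  # assumed capacity: "plenty of memory" for one message


def fill_message(buf: bytearray, payload: bytes) -> int:
    """Copy payload into the start of buf in place; return bytes used."""
    used = len(payload)
    if used > len(buf):
        raise ValueError("payload larger than the preallocated buffer")
    buf[:used] = payload  # equal-length slice assignment: no resize
    return used


def send_filled(out, buf: bytearray, used: int) -> None:
    """Send only the consumed fraction of buf, without slicing a copy."""
    out.write(memoryview(buf)[:used])


def receive_all(inp, buf: bytearray) -> tuple:
    """Read repeatedly into the same buffer, processing each chunk.

    "Processing" here is just a running CRC-32 and a byte count.
    """
    view = memoryview(buf)
    crc, total = 0, 0
    while True:
        n = inp.readinto(view)  # fills at most len(buf) bytes
        if not n:
            break
        crc = zlib.crc32(view[:n], crc)
        total += n
    return crc, total


# Sending: one buffer, allocated once and reused for every message.
buf = bytearray(BUF_SIZE)
out = io.BytesIO()  # stand-in for a socket
for msg in (b"hello ", b"world"):
    send_filled(out, buf, fill_message(buf, msg))

# Receiving: a deliberately small buffer, refilled on every read.
crc, total = receive_all(io.BytesIO(out.getvalue()), bytearray(4))
```

The receiving side never allocates a new buffer per read; each `readinto` call overwrites the same four bytes.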

For low-level I/O code, I totally agree that a mutable buffery object
is needed. What I'm wondering about is why that object needs to bleed
up into the code the struni branch is fixing. The bytes type isn't
even going to serve that function without some significant interface
changes. For example, to support re-using bytes buffers, socket.send()
would need to take start and end offsets into its bytes argument.
Otherwise, you have to slice the object to select the right data,
which *because bytes are mutable* requires a copy. PEP 3116's .write()
method has the same problem. Making those changes is, of course,
doable, but it seems like something that should be consciously
committed to.
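The copy problem can be seen directly in current Python 3, where the mutable type is `bytearray`. Slicing it produces an independent copy, while slicing a `memoryview` of it does not. A `memoryview` slice is one way to get the effect of start/end offsets without changing the signature of `write()`. `socket.send()` also accepts any bytes-like object, so the same approach applies there. This is a minimal sketch: `copy_vs_view` and `write_range` are hypothetical helpers, and `io.BytesIO` stands in for a real stream.

```python
import io


def copy_vs_view(buf: bytearray, start: int, end: int):
    """Return (slice copy, memoryview slice) of the same range of buf."""
    return buf[start:end], memoryview(buf)[start:end]


def write_range(stream, buf, start: int, end: int) -> int:
    """Hypothetical helper: write buf[start:end] to stream without an
    intermediate copy, emulating start/end offsets via a memoryview."""
    return stream.write(memoryview(buf)[start:end])


buf = bytearray(b"abcdef")
copied, shared = copy_vs_view(buf, 1, 4)
buf[1] = ord("X")      # mutate the original after slicing
print(bytes(copied))   # b'bcd': the slice is an independent copy
print(bytes(shared))   # b'Xcd': the view sees the mutation

out = io.BytesIO()
write_range(out, buf, 2, 5)
print(out.getvalue())  # b'cde'
```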

Python 2 seems to have gotten away with doing all the buffery stuff in
C. Is there a reason Python 3 shouldn't do the same? I was about to
wonder if the performance was even worth the nuisance, but then I
realized that I could run my own (naïve) benchmark. Running revision
56747 of the p3yk branch, I get:

$ ./python.exe -m timeit 'b = bytes(v % 256 for v in range(1000))'
1000 loops, best of 3: 272 usec per loop
$ ./python.exe -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256'
1000 loops, best of 3: 298 usec per loop

which seems to demonstrate that filling a pre-allocated bytes object in
place is slightly _more_ expensive than building a new one each time.
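To rerun the comparison on a current Python 3, the two `timeit` command lines above can be expressed with the `timeit` module. In current Python 3 `bytes` is immutable, so `bytearray` takes the place of the mutable bytes object. This is a sketch with hypothetical function names, and it mirrors the command-line runs (1000 loops, best of 3). No timings are claimed here, since they depend on the interpreter and machine.

```python
import timeit


def build_fresh(n: int = 1000) -> bytes:
    # First benchmark: build a new object on every call.
    return bytes(v % 256 for v in range(n))


def fill_preallocated(buf: bytearray, n: int = 1000) -> bytearray:
    # Second benchmark: overwrite the first n items of an existing
    # buffer in place (bytearray stands in for the mutable bytes).
    for v in range(n):
        buf[v] = v % 256
    return buf


if __name__ == "__main__":
    buf = bytearray(v % 256 for v in range(2000))  # the -s setup
    loops = 1000
    fresh = min(timeit.repeat(build_fresh, number=loops, repeat=3))
    filled = min(timeit.repeat(lambda: fill_preallocated(buf),
                               number=loops, repeat=3))
    print(f"fresh:        {fresh / loops * 1e6:.1f} usec per loop")
    print(f"preallocated: {filled / loops * 1e6:.1f} usec per loop")
```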

In any case, if people want to use bytes as both the low-level buffery
I/O thing and the high-level byte string, I think PEP 358 should
document it, since right now it just asserts that bytes are mutable
without giving any reason.

> Code that has been ported to bytes from str8 often tends to still
> follow the immutable pattern, creating a list of bytes objects to
> be joined later - this can be improved in code reviews.
>
> > Taking TOOWTDI as a guideline: If you have immutable bytes and need a
> > mutable object, just use list().
>
> I don't think this is adequate. Too much lower-level API relies on
> having memory blocks, and that couldn't be implemented efficiently
> with a list.
>
> Regards,
> Martin
>
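The three accumulation styles in this exchange can be compared side by side. These are joining immutable pieces, the `list()` suggestion, and a mutable buffer. This is a sketch in current Python 3, with `bytearray` standing in for the mutable bytes type. The function names are hypothetical.

```python
def join_chunks(chunks) -> bytes:
    # Immutable pattern: collect immutable pieces, join once at the end.
    return b"".join(chunks)


def via_int_list(chunks) -> bytes:
    # The "just use list()" suggestion: a list holds one object
    # reference per byte and is not a contiguous memory block, so it
    # must be converted before any buffer-based API can use it.
    items = []
    for chunk in chunks:
        items.extend(chunk)  # iterating bytes yields ints 0..255
    return bytes(items)


def via_bytearray(chunks) -> bytes:
    # Mutable buffer: appends go straight into one contiguous block,
    # which buffer-based APIs can consume directly.
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
    return bytes(buf)  # converted only so all three return bytes
```

All three produce the same bytes. They differ in how much intermediate storage they use and whether the intermediate form is a single memory block.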

-- 
Namasté,
Jeffrey Yasskin