[Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

Wed Jun 8 07:26:45 EDT 2016

Hello,

On Wed, 8 Jun 2016 14:05:19 +0300
Serhiy Storchaka <storchaka at gmail.com> wrote:

> On 08.06.16 13:37, Paul Sokolovsky wrote:
> >> The obvious way to create the bytes object of length n is b'\0' *
> >> n.
> >
> > That's very inefficient: it requires allocating useless b'\0', then
> > a generic function to repeat arbitrary memory block N times. If
> > there's a talk of Python to not be laughed at for being SLOW, there
> > would rather be efficient ways to deal with blocks of binary data.
> 
> Do you have any evidences for this claim?

Yes, it's written above; let me repeat it: bytes(n) is (or can be) a
plain calloc(1, n) underneath, while b"\0" * n has to allocate a
one-byte object first and then run a generic repeat routine over it.
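
To make the comparison concrete, here is a minimal sketch of the two
spellings side by side. The function names are made up for
illustration; only bytes(n) and the * operator are real Python. Both
return the same value, so the only difference is how much work it takes
to get there:

```python
def zeros_by_constructor(n):
    # bytes(n) with an int argument returns n zero bytes; an
    # implementation is free to do this with one zeroing allocation.
    return bytes(n)


def zeros_by_repeat(n):
    # Starts from the 1-byte object b"\0" and repeats it n times.
    return b"\0" * n


n = 10000
a = zeros_by_constructor(n)
b = zeros_by_repeat(n)
print(len(a), a == b)
```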

> 
> $ ./python -m timeit -s 'n = 10000' -- 'bytes(n)'
> 1000000 loops, best of 3: 1.32 usec per loop
> $ ./python -m timeit -s 'n = 10000' -- 'b"\0" * n'
> 1000000 loops, best of 3: 0.858 usec per loop

I don't know how inefficient CPython's bytes(n) is or how efficient its
repetition is (maybe 1-byte repetitions are optimized into memset()?),
but MicroPython (where bytes(n) truly is calloc(1, n)) gives the
expected results:

$ ./run-bench-tests bench/bytealloc*
bench/bytealloc:
    3.333s (+00.00%) bench/bytealloc-1-bytes_n.py
    11.244s (+237.35%) bench/bytealloc-2-repeat.py
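
For anyone who wants to repeat the comparison on their own interpreter,
a rough sketch using the standard timeit module could look like the
following. This is not the content of the bench/bytealloc scripts above;
time_alloc and its defaults are assumptions for illustration, and the
script needs an interpreter that ships timeit (e.g. CPython 3.5+).
Numbers will vary by build and machine.

```python
import timeit


def time_alloc(stmt, n=10000, number=20000, repeat=3):
    # Run `stmt` `number` times per trial, take the best of `repeat`
    # trials, and report seconds per single execution.
    timer = timeit.Timer(stmt, globals={"n": n})
    return min(timer.repeat(repeat=repeat, number=number)) / number


results = {
    "bytes(n)": time_alloc("bytes(n)"),
    'b"\\0" * n': time_alloc('b"\\0" * n'),
}

for label, secs in results.items():
    print("%-12s %.3f usec per call" % (label, secs * 1e6))
```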

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com