[pypy-issue] Issue #2662: Using memoryview to shift bytes is 18x slower than cPython (pypy/pypy)

Fri Sep 22 10:05:55 EDT 2017

New issue 2662: Using memoryview to shift bytes is 18x slower than cPython
https://bitbucket.org/pypy/pypy/issues/2662/using-memoryview-to-shift-bytes-is-18x

Taras Voynarovsky:

I'm building a protocol and have a use case where I prepare a moderately sized buffer (~16 KB) and want to prepend a 4-byte integer holding its size. Using memoryview to shift the buffer forward slowed everything down considerably.
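
For illustration, the use case could look roughly like the sketch below. The function name `prepend_length` and the big-endian unsigned 32-bit prefix are assumptions made for the example, not details of the actual protocol.

```python
import struct


def prepend_length(buf: bytearray) -> None:
    """Shift the payload forward by 4 bytes and write its length in front.

    Hypothetical helper: the big-endian 32-bit prefix is an assumption.
    """
    payload_len = len(buf)
    buf.extend(b"\x00" * 4)  # grow in place to make room for the prefix
    view = memoryview(buf)
    try:
        # Overlapping slice assignment, the same kind of shift the
        # benchmark in this report exercises.
        view[4:] = view[:payload_len]
        struct.pack_into(">I", view, 0, payload_len)
    finally:
        view.release()


buf = bytearray(b"hello")
prepend_length(buf)
print(bytes(buf))
```

The memoryview is released before returning so the bytearray can be resized again afterwards.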

To reproduce, I built a benchmark that moves the last 4 bytes of a random byte string to the beginning.
```
#!/usr/bin/env python3
import perf
import random
import hashlib

MSG_LEN = 100000
md5_collect = hashlib.md5()

def random_bytes(length):
    buffer = bytearray(length)
    for i in range(length):
        buffer[i] = random.randint(0, 255)
    return buffer

def bench_copy(loops: int):
    msg = random_bytes(MSG_LEN)

    # Main benchmark code.
    t0 = perf.perf_counter()
    for _ in range(loops):
        msg = msg[-4:] + msg[:-4]
    res = perf.perf_counter() - t0

    # Just to assert code is not optimized away
    md5_collect.update(msg)

    return res

def bench_memview(loops: int):
    msg = random_bytes(MSG_LEN)

    # Main benchmark code.
    t0 = perf.perf_counter()
    for _ in range(loops):
        memview = memoryview(msg)
        end_bytes = memview[-4:].tobytes()
        memview[4:] = memview[:-4]
        memview[:4] = end_bytes
        memview.release()
    res = perf.perf_counter() - t0

    # Just to assert code is not optimized away
    md5_collect.update(msg)

    return res

runner = perf.Runner()
runner.bench_time_func('batch_bytes', bench_copy)
runner.bench_time_func('batch_memview', bench_memview)
```
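
As a sanity check, the two loop bodies should produce the same rotation. The following is a minimal sketch that extracts them into hypothetical helpers (`rotate_copy` and `rotate_memview` are names introduced here, not part of the benchmark) and compares the results on a small input.

```python
def rotate_copy(msg: bytearray) -> bytearray:
    # Same expression as the bench_copy loop body: builds a new bytearray.
    return msg[-4:] + msg[:-4]


def rotate_memview(msg: bytearray) -> bytearray:
    # Same steps as the bench_memview loop body: rotates msg in place.
    with memoryview(msg) as memview:
        end_bytes = memview[-4:].tobytes()
        memview[4:] = memview[:-4]
        memview[:4] = end_bytes
    return msg


data = bytearray(range(16))
expected = rotate_copy(bytearray(data))
actual = rotate_memview(bytearray(data))
assert expected == actual
```

The `with` block releases the memoryview on exit, which is equivalent to the explicit `memview.release()` call in the benchmark.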

CPython results:

```
(.aiokafka) vagrant@my-dev:/workspace/aiokafka$ python benchmark/test_pypy_memview.py
.....................
batch_bytes: Mean +- std dev: 11.5 us +- 0.7 us
.....................
batch_memview: Mean +- std dev: 4.64 us +- 0.28 us
```

PyPy results:
```
(.aiokafka-pypy3) vagrant@my-dev:/workspace/aiokafka$ python benchmark/test_pypy_memview.py
.........
batch_bytes: Mean +- std dev: 15.9 us +- 1.1 us
.........
batch_memview: Mean +- std dev: 83.2 us +- 4.9 us
```
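
The 18x figure in the title follows from the mean timings above. A short sketch of the arithmetic, using only the reported means and ignoring the standard deviations:

```python
# Mean timings in microseconds, copied from the two result listings above.
cpython_us = {"batch_bytes": 11.5, "batch_memview": 4.64}
pypy_us = {"batch_bytes": 15.9, "batch_memview": 83.2}

slowdown = {name: pypy_us[name] / cpython_us[name] for name in cpython_us}
for name in sorted(slowdown):
    print("{}: PyPy is {:.1f}x slower than CPython".format(name, slowdown[name]))
# batch_bytes: PyPy is 1.4x slower than CPython
# batch_memview: PyPy is 17.9x slower than CPython
```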

To run the script, install `perf` first: `pip install perf`

Versions:
```
PyPy 5.8.0-beta0 with GCC 6.2.0
CPython 3.5.2 [GCC 4.8.4]
```