[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor

Sat Oct 22 06:26:30 EDT 2016

On 22 October 2016 at 07:57, Chris Barker <chris.barker at noaa.gov> wrote:
> I'm still confused about the "io" in "iobuffers" -- I've used buffers a lot
> -- for passing data around between various C libs -- numpy, image
> processing, etc... I never really thought of it as IO though. which is why a
> simple frombuffer() seems to make a lot of sense to me, without any other
> stuff. (to be honest, I reach for Cython these days for that sort of thing
> though)

That's the essence of my point though: if you care enough about the
performance of a piece of code for the hidden copy in
"bytes(mydata[start:stop])" to be deemed unacceptable, and also can't
afford the lazy cleanup of the view in
"bytes(memoryview(mydata)[start:stop])", then it seems likely that
you're writing specialist, high performance, low overhead, data
manipulation code that probably shouldn't be written in Python.

In such cases, an extension module written in something like Cython, C
or Rust would be a better fit, as using the more appropriate tool will
give you a range of additional performance improvements (near)
automatically, such as getting to avoid the runtime overhead of
Python's dynamic type system.

At that point, having to write the lowest-available-overhead version
explicitly in Python as:

    with memoryview(mydata) as view:
        return bytes(view[start:stop])

is a sign that someone is insisting on staying in pure Python code
when they're doing sufficiently low level bit bashing that it probably
isn't the best idea to continue down that path.
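
For comparison, here is a rough sketch of the three spellings discussed
above, side by side. The function names are made up purely for
illustration; only bytes, bytearray and memoryview are real builtins, and
the comments describe the bytearray case on CPython.

```python
def copy_via_slice(mydata, start, stop):
    # For a bytearray, the slice builds a temporary bytearray and
    # bytes() then copies it again: this is the "hidden copy".
    return bytes(mydata[start:stop])


def copy_via_view(mydata, start, stop):
    # Slicing a memoryview does not copy the data, so bytes() makes the
    # only copy. The views are released whenever they get collected.
    return bytes(memoryview(mydata)[start:stop])


def copy_via_managed_view(mydata, start, stop):
    # Same single copy, but the outer view is released explicitly when
    # the with block exits rather than at some later cleanup.
    with memoryview(mydata) as view:
        return bytes(view[start:stop])


if __name__ == "__main__":
    data = bytearray(b"hello world")
    print(copy_via_slice(data, 0, 5))
    print(copy_via_view(data, 0, 5))
    print(copy_via_managed_view(data, 0, 5))
```

All three return an equal bytes object; they differ only in how many
intermediate copies are made and in when the view on the underlying
buffer is released.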

From that perspective, adding "[bytes/bytearray].frombuffer" is adding
complexity to the core language for the sake of giving people one
small additional piece of incremental performance improvement that
they can eke out before they admit to themselves "OK, I'm probably not
using the right language for this part of my application".

By contrast, a library that provided better low level data buffer
manipulation that was suitable for asyncio's needs is *much* easier to
emulate on older versions, and provides more scope for extracting
efficient data manipulation patterns beyond this one very specific
case of more efficiently snapshotting a subset of an existing buffer.

Cheers,
Nick.

P.S. I bring up Rust and the runtime overhead of the type system
specifically here, as Armin Ronacher recently wrote an excellent post
about that in relation to some performance improvement work they were
doing at Sentry:
https://blog.sentry.io/2016/10/19/fixing-python-performance-with-rust.html

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia