[issue10181] Problems with Py_buffer management in memoryobject.c (and elsewhere?)

Sun Jun 26 14:31:35 CEST 2011

Nick Coghlan <ncoghlan at gmail.com> added the comment:

The idea of PyManagedBuffer is for it to be an almost completely passive object that *just* acts as a refcounted wrapper around the Py_buffer structure, so it doesn't care about the actual contents. The only supplemental functionality I think it should provide is to disallow explicitly releasing the buffer while the reference count is greater than 1. I'm OK with my example cited above being unreliable. The correct way to write such code would then be:

  with memoryview(obj) as m:
    with m[:] as m2:
      ...

I think separating the concerns this way will make the whole arrangement easier to handle: PyManagedBuffer worries about the lifecycle of the underlying buffer reference, while PyMemoryView deals with the *interpretation* of the buffer description (such as by providing useful slicing functionality). When a memoryview is sliced, it would create a new memoryview that has a reference to the same PyManagedBuffer object, but different internal state that affects how that buffer is accessed. This is better than requiring every implementer of the buffer API to worry about the slicing logic - we can do it right in memoryview, and then implementers of producer objects don't have to worry about it.
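As a rough illustration of the nested form, here is how it looks against the existing memoryview type, using a bytearray as the producer. The helper resize_is_blocked() and the sample data are made up for this sketch. It relies on bytearray refusing to resize while a buffer export is outstanding, and it assumes a reasonably recent Python 3:

```python
def resize_is_blocked(buf):
    """Return True if the bytearray currently refuses to be resized."""
    try:
        buf.extend(b"!")
    except BufferError:
        return True
    del buf[-1:]  # undo the probe so the caller's data is unchanged
    return False


data = bytearray(b"spam and eggs")

with memoryview(data) as m:
    with m[:4] as m2:
        # The slice is a second view onto the same memory, not a copy:
        # writing through it changes the underlying bytearray.
        m2[:] = b"SPAM"
        inner_bytes = m2.tobytes()
        blocked_inside = resize_is_blocked(data)
    # m2 has been released here, but m still holds the buffer.
    blocked_between = resize_is_blocked(data)

# Both views have been released, so the producer is free again.
blocked_after = resize_is_blocked(data)
```

Because the inner view is released before the outer one, the point at which the producer becomes resizable again does not depend on when the garbage collector gets around to the slice.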

Currently, however, memoryview gets tied up in knots since it is trying to do everything itself in a way that makes it unclear what is going on. The semantics of copying the Py_buffer struct or of accessing the PEP 3118 API on the underlying object when slicing or copying views are demonstrably broken. If we try to shoehorn reference counting semantics into the current object model, we would end up with two distinct modes of operation for memoryview:

  Direct: the view is directly accessing an underlying object via the PEP 3118 API
  Indirect: the view has a reference to another memoryview object that it is using as a data source

That's complicated - hard to implement in the first place and hard to follow when reading the code. Adding the PyManagedBuffer object makes the object model more complex, but simplifies the runtime semantics: every memoryview instance will access a PyManagedBuffer object, which takes care of the underlying PEP 3118 details. Direct use of the PEP 3118 consumer API in third-party code will also be strongly discouraged, with PyManagedBuffer promoted as the preferred alternative (producers, of course, will still need to provide the raw Py_buffer data that PyManagedBuffer exposes).
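To make the proposed split concrete, here is a small pure-Python model of it. This is only a sketch of one reading of the proposal, not the C implementation: the class names ManagedBuffer and View, the explicit "users" counter (standing in for the C reference count), and the choice that releasing a view merely detaches it until the last view lets go are all assumptions made for illustration.

```python
class ManagedBuffer:
    """Hypothetical stand-in for PyManagedBuffer: one export, many users."""

    def __init__(self, obj):
        self.export = memoryview(obj)  # the single acquisition from the producer
        self.users = 0                 # models the C-level reference count
        self.released = False

    def release(self):
        # Explicit release is refused while the buffer is shared.
        if self.users > 1:
            raise BufferError("buffer is still shared by %d views" % self.users)
        if not self.released:
            self.export.release()
            self.released = True


class View:
    """Hypothetical stand-in for memoryview: only interprets the buffer."""

    def __init__(self, managed, start=0, stop=None):
        self.managed = managed
        managed.users += 1
        self.start = start
        self.stop = len(managed.export) if stop is None else stop
        self.detached = False

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slices")
        start, stop, step = index.indices(self.stop - self.start)
        if step != 1:
            raise NotImplementedError("this sketch only supports step 1")
        stop = max(start, stop)
        # Same managed buffer, different offsets: no new export is taken.
        return View(self.managed, self.start + start, self.start + stop)

    def tobytes(self):
        if self.detached:
            raise ValueError("operation forbidden on released view")
        with self.managed.export[self.start:self.stop] as piece:
            return piece.tobytes()

    def release(self):
        if self.detached:
            return
        self.detached = True
        if self.managed.users == 1:
            self.managed.release()  # last user: really give the buffer back
        self.managed.users -= 1

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.release()


data = bytearray(b"spam and eggs")
shared = ManagedBuffer(data)

with View(shared) as m:
    with m[9:] as m2:
        same_buffer = m2.managed is m.managed
        tail = m2.tobytes()
        with m2[1:3] as m3:
            middle = m3.tobytes()
        try:
            shared.release()  # two views are still attached
            refused = False
        except BufferError:
            refused = True
```

There is only one mode of operation in this model: a view never talks to the producer or to another view, only to the shared ManagedBuffer, and slicing is pure offset arithmetic inside View.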

At the Python level, I don't think it is necessary to expose a new object, so we can stick with Antoine's preferred model where memoryview is the only public API. My proposed new PyManagedBuffer object would just be about making life easier at the C level.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10181>
_______________________________________