[Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)

Mon Feb 26 21:28:47 CET 2007

On 2/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> Guido van Rossum wrote:
> > I realized this thinking about the 3.0 bytes object, but the 2.x array
> > object has the same problems, and probably every other object that
> > uses the buffer API and has a mutable size (if there are any).
>
> Yes, the NumPy object has this problem as well (although it has *very*
> conservative checks so that if the reference count on the array is not
> 1, memory is not reallocated).

That would be *too* conservative for me -- just passing it as an
argument to another function increfs it (for the duration of the
call).

> > I agree that getting the pointer and length should be separated from
> > finding out how the bytes should be interpreted. I'd like to propose a
> > simple stack or hierarchy of classes to address (what I think are)
> > Travis's needs:
> >
> > - At the bottom is a redesigned buffer API: add locking, remove
> > segcount and char buffers.
>
> Great.  I have no problem with this.  Is your idea of locking the same
> as mine (i.e. a function in the API for release?)

Right.
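
A rough illustration of the acquire/release idea, not the API proposed in this thread: it assumes the memoryview object found in later Python 3 releases as a stand-in for "get the buffer, then call a release function". While a view is held, the bytearray refuses to change size; after the release call, resizing works again.

```python
# Sketch only: memoryview (from later Python 3 releases) stands in for the
# "acquire, then explicitly release" locking discussed above.
data = bytearray(b"abc")

view = memoryview(data)      # acquiring a view "locks" the buffer's size
try:
    data.extend(b"def")      # resizing while a view is exported is refused
    resized_while_locked = True
except BufferError:
    resized_while_locked = False

view.release()               # the explicit release call unlocks the buffer
data.extend(b"def")          # resizing is allowed again
```

The point of the explicit release is that the lock does not depend on reference counts, so merely passing the object to another function does not block reallocation.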

> > - There is a mixin class (at least conceptually it's a mixin) which
> > takes anything implementing the redesigned buffer API and adds the
> > bytes API (see recently updated PEP 358); operations like .strip() or
> > slicing should return copies (of the same or a different type) or
> > views at the discretion of the underlying object. (Maybe there should
> > be a read-only and read-write version of this; note that read-only is
> > not the same as immutable, since the underlying buffer may be modified
> > by other APIs, if it allows this.)
>
> I'm not sure what this mixin class is.  Is this a base class for the
> bytes object?   I need to understand this better in order to write a PEP.

Yes, that's a good way to describe it.

> > - *Another* API built on top of the redesigned buffer API would be
> > something more aligned with numpy's needs, adding (a) a shape
> > descriptor indicating the size, offset and stride of each dimension,
> > and (b) a record descriptor indicating the interpretation of one
> > element of the array. For (a), a list of 3-tuples of ints would
> > probably be sufficient (constrained so that no valid combination of
> > indexes points outside the buffer); for (b), I propose (with Jim
> > Hugunin who first suggested this at PyCon) to use the same concise but
> > expressive format-string-like notation used by the struct module. (The
> > bytes API is not quite a special case of this, since it provides more
> > string-like operations.)
>
> Great.  NumPy has already adopted the struct standard for its "hidden"
> character codes.

Glad to get agreement.
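
As a minimal sketch of what "a struct-style format string as record descriptor" could look like in practice, assuming a made-up record layout (the names RECORD_FORMAT and read_record are hypothetical and only for illustration): the format string alone tells a consumer how large one element is and how to interpret its bytes.

```python
import struct

# Hypothetical record layout: little-endian int32 id followed by a float64 value.
RECORD_FORMAT = "<id"
record_size = struct.calcsize(RECORD_FORMAT)  # "<" means standard sizes, no padding

# Two records packed back to back in one flat buffer.
buf = struct.pack(RECORD_FORMAT, 1, 2.5) + struct.pack(RECORD_FORMAT, 2, -0.25)

def read_record(buffer, index):
    """Interpret element `index` of `buffer` according to RECORD_FORMAT."""
    return struct.unpack_from(RECORD_FORMAT, buffer, index * record_size)
```

Here read_record(buf, 1) yields the second record as a tuple. The proposed complex and long double codes are not used in this sketch, since they are not part of the struct module.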

> We also need to add some format codes for complex-data ('F','D','G') and
> for long doubles ('g').

No problem. Just make this a separate section in your PEP ("proposed
additions for the struct module").

> I would also propose that we make an
> enumeration in Python so we can refer to these codes in C/C++ as constants:
>
> PYFORMAT_LONG
> PYFORMAT_UINT
>
> etc.

Not sure I follow but sounds fine; hopefully the PEP draft will clarify this.

> a) I would prefer a 3-tuple of lists for the shape descriptor
> (shape list, stride list, offset list)
>
> That way default striding could be given as None and there would not
> have to be any offset as well.

Of course. I don't know much about the traditional way of representing
MD array structure.

> My view on the offset is that it is not necessary as the start of the
> array is already given by the memory pointer.  But, if others see a
> strong need for it, I have no problem with including it.

Well, don't you end up with an offset as soon as you take a rectangular
slice out of a 2d array?
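
To make the question concrete, here is a small sketch using the "list of (size, offset, stride) 3-tuples" form of the shape descriptor, with offsets and strides in bytes. The helper name element and the 3x4 example array are assumptions for illustration, not part of any proposed API.

```python
import struct

ITEM = "<i"                       # each element is a 4-byte int
ITEMSIZE = struct.calcsize(ITEM)

# A 3x4 array stored row by row, holding the values 0..11.
ROWS, COLS = 3, 4
buf = struct.pack("<12i", *range(ROWS * COLS))

def element(buffer, descriptor, indexes):
    """Look up one element; descriptor is a (size, offset, stride) per dimension."""
    pos = 0
    for (size, offset, stride), i in zip(descriptor, indexes):
        if not 0 <= i < size:
            raise IndexError(i)
        pos += offset + i * stride
    return struct.unpack_from(ITEM, buffer, pos)[0]

# The whole array: no offsets needed.
full = [(ROWS, 0, COLS * ITEMSIZE), (COLS, 0, ITEMSIZE)]
# Rows 1..2, columns 2..3: same strides, but non-zero offsets.
sub = [(2, 1 * COLS * ITEMSIZE, COLS * ITEMSIZE), (2, 2 * ITEMSIZE, ITEMSIZE)]
```

With these descriptors, element(buf, sub, (0, 0)) and element(buf, full, (1, 2)) refer to the same element. The per-dimension offsets of the slice simply add up to one starting position, which is why they could also be folded into the memory pointer, provided the consumer is handed a pointer into the middle of the buffer.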

> b) I'm also fine with just returning a string for the record descriptor
> like the struct module uses.

Excellent. Are we all set then?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)