[Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)

Mon Feb 26 21:37:32 CET 2007

Guido van Rossum wrote:
> On 2/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> 
>>Guido van Rossum wrote:
>>
>>>I realized this thinking about the 3.0 bytes object, but the 2.x array
>>>object has the same problems, and probably every other object that
>>>uses the buffer API and has a mutable size (if there are any).
>>
>>Yes, the NumPy object has this problem as well (although it has *very*
>>conservative checks so that if the reference count on the array is not
>>1, memory is not reallocated).
> 
> 
> That would be *too* conservative for me -- just passing it as an
> argument to another function increfs it (for the duration of the
> call).
> 

It's too conservative for us too.  We just don't see any way around it 
without the locking mechanism (right now you can override the ref-count 
checking if you know what you are doing).
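
To make the hazard concrete, here is a minimal sketch of what a lock on an
exported buffer looks like from Python. It assumes a later Python 3 in which
bytearray and memoryview exist and in which resizing is refused while a buffer
is exported. That behavior is used here only as an illustration; it is not
something specified in this thread, and the function name is made up.

```python
def try_resize_while_exported():
    data = bytearray(b"abcd")
    view = memoryview(data)      # a consumer now holds a pointer into data
    try:
        data.extend(b"efgh")     # would require reallocating the memory
        resized_while_locked = True
    except BufferError:
        resized_while_locked = False
    view.release()               # consumer is done; the export is dropped
    data.extend(b"efgh")         # resizing is allowed again
    return resized_while_locked, bytes(data)

result = try_resize_while_exported()
```

Unlike a reference-count check, this kind of lock depends only on whether
someone actually holds the memory, not on how many names refer to the object.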

>>
>>I'm not sure what this mixin class is.  Is this a base class for the
>>bytes object?   I need to understand this better in order to write a PEP.
> 
> 
> Yes, that's a good way to describe it.
> 
> 
>>>- *Another* API built on top of the redesigned buffer API would be
>>>something more aligned with numpy's needs, adding (a) a shape
>>>descriptor indicating the size, offset and stride of each dimension,
>>>and (b) a record descriptor indicating the interpretation of one
>>>element of the array. For (a), a list of 3-tuples of ints would
>>>probably be sufficient (constrained so that no valid combination of
>>>indexes points outside the buffer); for (b), I propose (with Jim
>>>Hugunin who first suggested this at PyCon) to use the same concise but
>>>expressive format-string-like notation used by the struct module. (The
>>>bytes API is not quite a special case of this, since it provides more
>>>string-like operations.)
>>
>>Great.  NumPy has already adopted the struct standard for its "hidden"
>>character codes.
> 
> 
> Glad to get agreement.
> 
> 
>>We also need to add some format codes for complex-data ('F','D','G') and
>>for long doubles ('g').
> 
> 
> No problem. Just make this a separate section in your PEP ("proposed
> additions for the struct module").
> 

O.K. great.

> 
>>I would also propose that we make an
>>enumeration in Python so we can refer to these codes in C/C++ as constants:
>>
>>PYFORMAT_LONG
>>PYFORMAT_UINT
>>
>>etc.
> 
> 
> Not sure I follow but sounds fine; hopefully the PEP draft will clarify this.
> 

This is just some header magic (either defines or an enum statement so 
you don't have to remember character codes in C/C++).
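
A rough sketch of the idea, written in Python only to keep it short (the
actual proposal is for a C header). The two names follow the PYFORMAT_*
pattern above; the character codes assigned to them are the existing
struct-module codes for C long and C unsigned int, and the helper item_size
is hypothetical.

```python
import struct

# Names follow the PYFORMAT_* pattern; the values are assumed to be the
# existing struct-module character codes.
PYFORMAT_LONG = "l"   # C long
PYFORMAT_UINT = "I"   # C unsigned int

def item_size(code):
    """Size in bytes of one element with this code (native size)."""
    return struct.calcsize(code)

sizes = {
    "PYFORMAT_LONG": item_size(PYFORMAT_LONG),
    "PYFORMAT_UINT": item_size(PYFORMAT_UINT),
}
```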

> 
>>a) I would prefer a 3-tuple of lists for the shape descriptor
>>(shape list, stride list, offset list)
>>
>>That way default striding could be given as None and there would not
>>have to be any offset as well.
> 
> 
> Of course. I don't know much about the traditional way of representing
> MD array structure.
> 
> 
>>My view on the offset is that it is not necessary as the start of the
>>array is already given by the memory pointer.  But, if others see a
>>strong need for it, I have no problem with including it.
> 
> 
> Well don't you end up with an offset as soon as you take a rectangular
> slice out of a 2d array?

You can either 1) keep the same base memory pointer and create an offset 
list, or 2) have no offset and change the starting memory pointer.

NumPy uses option 2 (it stores the starting point of the array).
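
The two options can be sketched as follows. This is an illustration only: it
assumes a row-major 3x4 array of one-byte elements, models the memory pointer
as an integer index into a bytes object, and uses a hypothetical helper named
element. None of this is the proposed API.

```python
# A 3x4 array of unsigned bytes stored row-major in one flat buffer.
ROWS, COLS = 3, 4
buf = bytes(range(ROWS * COLS))   # element (i, j) has value i*COLS + j
strides = (COLS, 1)               # bytes to step per dimension

def element(buffer, start, strides, offsets, index):
    """Read one byte at `index`; `start` plays the role of the memory pointer."""
    pos = start + sum((o + i) * s for o, i, s in zip(offsets, index, strides))
    return buffer[pos]

# Rectangular slice: rows 1..2, columns 2..3 (shape 2x2).
# Option 1: keep the base pointer and carry a per-dimension offset list.
opt1 = [[element(buf, 0, strides, (1, 2), (i, j)) for j in range(2)]
        for i in range(2)]

# Option 2: no offsets; move the starting pointer to element (1, 2).
start = 1 * strides[0] + 2 * strides[1]
opt2 = [[element(buf, start, strides, (0, 0), (i, j)) for j in range(2)]
        for i in range(2)]
```

Both options address the same four elements; option 2 simply folds the offsets
into the starting pointer once instead of carrying them along.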

> 
> 
>>b) I'm also fine with just returning a string for the record descriptor
>>like the struct module uses.
> 
> 
> Excellent. Are we all set then?

I think so.  I have some additional ideas about the string format 
description that I will explain in the PEP.   The draft is coming along at

http://wiki.python.org/moin/ArrayInterface

Feel free to make changes there.
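
As a small illustration of a struct-style record descriptor, the sketch below
describes one element as a 32-bit int followed by two 64-bit floats. The
record layout and variable names are made up for the example; only the
struct-module calls and format codes are existing ones.

```python
import struct

# Hypothetical record: a 32-bit int id followed by two 64-bit floats (x, y).
# "<" selects little-endian, standard sizes, no padding.
RECORD = "<idd"
itemsize = struct.calcsize(RECORD)   # 4 + 8 + 8 bytes

# Two records packed back to back, as they would sit in an array's memory.
raw = struct.pack(RECORD, 1, 0.5, 1.5) + struct.pack(RECORD, 2, 2.5, 3.5)

# The format string alone is enough to interpret each element.
records = [struct.unpack_from(RECORD, raw, k * itemsize)
           for k in range(len(raw) // itemsize)]
```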

-Travis