[Python-Dev] Extended Buffer Interface/Protocol

Carl Banks pythondev at aerojockey.com
Tue Mar 27 03:49:01 CEST 2007


Travis Oliphant wrote:
> Carl Banks wrote:
> 
>> ISTM that we are using the word "view" very differently.  Consider 
>> this example:
>>
>> A = zeros((100,100))
>> B = A.transpose()
> 
> 
> You are thinking of NumPy's particular use case.  I'm thinking of a 
> generic use case.  So, yes I'm using the word view in two different 
> contexts.
> 
> In this scenario, NumPy does not even use the buffer interface.  It 
> knows how to transpose its own objects and does so by creating a new 
> NumPy object (with its own shape and strides space) with a data buffer 
> pointed to by "A".

I realized that as soon as I tried a simple Python demonstration of it.
So it's a poor example.  But I hope it's obvious how it would
generalize to a different type.
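
For a type other than NumPy's, the sense of "view" meant here could look roughly like the sketch below.  It is only an illustration: StridedView and its attributes are made-up names, not part of any proposal in this thread, and strides are counted in elements rather than bytes to keep it short.

```python
class StridedView:
    """Hypothetical 2-D view: private shape/strides over a shared buffer."""

    def __init__(self, data, shape, strides, offset=0):
        self.data = data                  # shared with other views, never copied
        self.shape = tuple(shape)         # owned by this view
        self.strides = tuple(strides)     # in elements, not bytes (assumption)
        self.offset = offset

    def __getitem__(self, index):
        i, j = index
        return self.data[self.offset + i * self.strides[0] + j * self.strides[1]]

    def transpose(self):
        # Only the metadata is swapped; the data buffer is reused as-is.
        return StridedView(self.data, self.shape[::-1], self.strides[::-1],
                           self.offset)


data = bytearray(range(6))                # 2 x 3 array stored row-major
A = StridedView(data, (2, 3), (3, 1))
B = A.transpose()                         # 3 x 2 view of the same bytes

data[5] = 99                              # write through the shared buffer
print(A[1, 2], B[2, 1])                   # both views see the new value
```

Both A and B hold their own shape and strides, but neither owns a private copy of the data.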


>>> Having such a thing as a view object would actually be nice because 
>>> it could hold on to a particular view of data with a given set of 
>>> shape and strides (whose memory it owns itself) and then the 
>>> exporting object would be free to change its shape/strides 
>>> information as long as the data did not change.
>>
>>
>> What I don't understand is why it's important for the provider to 
>> retain this data.  The requestor only needs the information when it's 
>> created; it will calculate its own versions of the data, and will not 
>> need the originals again, so there is no need for the provider to keep 
>> them around.
> 
> That is certainly a policy we could enforce (and pretty much what I've 
> been thinking).  We just need to make it explicit that the shape and 
> strides provided are only guaranteed up until a GIL release (i.e. 
> arbitrary Python code could change both the content and the location 
> of these memory areas), so if you need them later, make your own copies.
> 
> If this were the policy, then NumPy could simply pass pointers to its 
> stored shape and strides arrays when the buffer interface is called but 
> then not worry about re-allocating these arrays before the "buffer" lock 
> is released.   Another object could hold on to the memory area of the 
> NumPy array but would have to store shape and strides information if it 
> wanted to keep it.
> NumPy could also just pass a pointer to the char * representation of the 
> format (which in NumPy would be stored in the dtype object) and would 
> not have to worry about the dtype being re-assigned later.

Bingo!  This is my preference.
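
A minimal sketch of that policy, in plain Python rather than C, might look like the following.  Exporter, Consumer, getbuffer and reshape are hypothetical stand-ins, and the mutable lists merely play the role of the shape/strides arrays the provider owns.

```python
class Exporter:
    """Hypothetical provider that may rewrite its shape/strides at any time."""

    def __init__(self, nbytes, shape, strides):
        self.data = bytearray(nbytes)
        self.shape = list(shape)          # stands in for a C array it owns
        self.strides = list(strides)

    def getbuffer(self):
        # Hands out references to its internal metadata.  Under the policy
        # above these are only trustworthy until other Python code runs.
        return self.data, self.shape, self.strides

    def reshape(self, shape, strides):
        # Overwrites the metadata in place, as a reallocation might.
        self.shape[:] = shape
        self.strides[:] = strides


class Consumer:
    """Hypothetical requestor that snapshots the metadata immediately."""

    def __init__(self, exporter):
        data, shape, strides = exporter.getbuffer()
        self.data = data                  # the memory area itself is shared
        self.shape = tuple(shape)         # private copies, safe to keep
        self.strides = tuple(strides)


exp = Exporter(12, (3, 4), (4, 1))
con = Consumer(exp)
exp.reshape((2, 6), (6, 1))               # provider changes its mind later

print(con.shape, exp.shape)               # consumer's snapshot is unaffected
```

The provider never has to remember what it handed out; anything the requestor wants to keep, it copies at acquisition time.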


>>>> The reason I ask is: if things like "buf" and "strides" and "shape" 
>>>> could change when a buffer is re-exported, then it can complicate 
>>>> things for PIL-like buffers.  (How would you account for offsets in 
>>>> a dimension that's in a subarray?)
>>>
>>>
>>> I'm not sure what you mean, offsets are handled by changing the 
>>> starting location of the pointer to the buffer.
>>
>>
>>
>> But to answer your question: you can't just change the starting 
>> location because there are hundreds of buffers.  You'd either have to 
>> change the starting location of each one of them, which is highly 
>> undesirable, or to factor in an offset somehow.  (This was, in fact, 
>> the point of the "derefoff" term in my original suggestion.)
> 
> 
> I understand better now what your derefoff was doing.  I was missing the 
> de-referencing that was going on.   Couldn't you still just store a 
> pointer to the start of the array?  In other words, isn't your **isptr  
> suggestion sufficient?   It seems to be.

No.  The problem arises when slicing.  In a single buffer, you would 
adjust the base pointer to point at the element [0,0] of the slice.  But 
you can't do that with multiple buffers.  Instead, you have to add an 
offset after dereferencing the pointer to the subarray.

Hence my derefoff proposal.  It dereferenced the pointer, then added an 
offset to get you to the 0 position in that dimension.
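
To make the slicing problem concrete, here is a rough Python stand-in for a PIL-style image held as separately allocated rows.  The element function and the variable names are invented for illustration, and a list of bytearrays only approximates a C array of row pointers.

```python
def element(rows, derefoff, i, j):
    """Fetch element [i, j] from an image stored as separate row buffers."""
    row = rows[i]                 # dereference the pointer for dimension 0
    return row[derefoff + j]      # then add the offset within that row


# Three independently allocated rows of five elements each.
rows = [bytearray(range(10 * r, 10 * r + 5)) for r in range(3)]

# Slice [1:3, 2:5].  The row part can be handled by pointing at a
# different part of the pointer table, but the column part cannot be
# handled by moving one base pointer, since every row has its own.
sliced_rows = rows[1:3]           # same row buffers, no data copied
derefoff = 2                      # column 0 of the slice is column 2 of each row

print(element(sliced_rows, derefoff, 0, 0))   # element [1, 2] of the original
print(element(sliced_rows, derefoff, 1, 2))   # element [2, 4] of the original
```

Without the offset term, the only way to express the column slice would be to adjust every row pointer individually.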


>> Anyways, despite the miscommunications so far, I now have a very good 
>> idea of what's going on.  We definitely need to get terms straight.  I 
>> agree that getbuffer should return an object.  I think we need to 
>> think harder about the case when requestors re-export the buffer.  
>> (Maybe it's time to whip up some experimental objects?)
> 
> I'm still not clear what you are concerned about.   If an object 
> consumes the buffer interface and then wants to be able to later export 
> it to another, then from our discussion about the shape/strides and 
> format information, it would have to maintain its own copies of these 
> things, because it could not rely on the original provider (or exporter) 
> to keep them around once the GIL is released.

Right.  So, if someone calls getbuffer on that re-exporting object, it 
would send its own copies of the buffer information, and not the 
original exporter's.  The values 
returned by getbuffer can vary for a given buffer, depending on the 
exporter.  Which means the data returned by getbuffer could reflect 
slicing.  Which means the isptr array is not sufficient for the 
PIL-style multiple buffers.


> This is the reason we would have to be very clear about the guaranteed 
> persistence of the shape/strides and format memory whose pointers are 
> returned through the proposed buffer interface.
> 
> Thanks for the discussion.  It is nice to have someone to talk with 
> about these things.   A conversation always results in a better 
> implementation.

I just hope it wasn't all due to terminological misunderstanding.


Carl Banks

