[Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support
Travis E. Oliphant
oliphant at enthought.com
Tue Sep 11 07:10:48 CEST 2007
Guido van Rossum wrote:
> I'd like to see Travis's response to this. It's setting a precedent
> regarding locking objects in read-only mode; I haven't found other
> examples of objects using LOCKDATA (the only mentions of it seem to be
> rejecting it :). I keep getting confused by the two separate lock
> counts (and I think in this version the comment is inconsistent with
> the code). So I'm hoping Travis has a particular way in mind of
> handling LOCKDATA that can be used as a template.
>
> Travis?
>
The use case I had in mind comes about quite often in NumPy when you
want to modify the data-area of an object which may have a
non-contiguous chunk of memory, but the algorithm being used expects
contiguous data. Imagine, for example, that the exporting object is an
image whose rows are stored in different segments.
The consumer of the buffer interface, however, may be an extension
module that does fast image-processing operations and requires
contiguous data. Because it wants to write the results back into the
memory area when it is done with the algorithm (which may be thread-safe
and may release the GIL), it asks the object to lock its data as
read-only so that other consumers cannot obtain writeable buffers
while it is processing.
When the algorithm is done, the locking consumer alone can write to the
memory area; when it then releases the buffer, the original object
restores itself to being writeable. Of course, the exporting object must
support this kind of operation and not all objects will. I expect the
NumPy array object and PIL, for example, to support it, along with other
media-centric objects.
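
The segmented-image workflow described above can be sketched in pure Python. This is only an illustration under stated assumptions: the rows are modelled as separate bytearray objects, and process_rows_in_place and invert are hypothetical names, not part of any real API. It shows the gather, process, scatter steps only and does not model the locking itself.

```python
def process_rows_in_place(rows, algorithm):
    """Gather segmented rows into one contiguous buffer, run `algorithm`,
    then scatter the result back into the original segments."""
    # Gather: copy every row segment into a single contiguous bytearray.
    contiguous = bytearray().join(rows)
    # The algorithm only ever sees contiguous memory and modifies it in place.
    algorithm(contiguous)
    # Scatter: copy each slice of the result back into its own segment.
    offset = 0
    for row in rows:
        row[:] = contiguous[offset:offset + len(row)]
        offset += len(row)


def invert(pixels):
    # Hypothetical stand-in for a fast image-processing routine.
    for i, value in enumerate(pixels):
        pixels[i] = 255 - value


# Three rows of an 8-bit image, each held in a separate bytearray.
image_rows = [
    bytearray([0, 10, 20]),
    bytearray([30, 40, 50]),
    bytearray([60, 70, 80]),
]
process_rows_in_place(image_rows, invert)
```

Between the gather and the scatter, nothing stops another consumer from writing to the rows and having its changes overwritten. That window is what the read-only lock is meant to close.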
It would probably be useful if the bytes object supported it because
then other objects could use it as the memory area. To do it
correctly, the object exporting the interface must only allow locking if
no other writeable interfaces have been exported (which it must keep
track of) and then on release must check to see if the buffer that is
being released is the one that locked its data.
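
The bookkeeping an exporter needs for this can be sketched as a toy Python model. ToyExporter, get_buffer, release_buffer and the string kinds are all hypothetical names invented for this illustration; the real mechanism would live in the C-level getbuffer and releasebuffer slots. The sketch assumes ordinary read-only buffers remain available while the data is locked.

```python
class ToyExporter:
    """Hypothetical exporter tracking its exports and a single data lock."""

    def __init__(self, size):
        self.data = bytearray(size)
        self._exports = {}        # token -> "read", "write" or "lock"
        self._lock_owner = None   # token of the buffer that locked the data

    def _writable_exports(self):
        return sum(1 for kind in self._exports.values() if kind == "write")

    def get_buffer(self, kind="read"):
        if kind not in ("read", "write", "lock"):
            raise ValueError(kind)
        if kind == "lock":
            # Locking is allowed only if no writable buffer is outstanding
            # and nobody else already holds the lock.
            if self._lock_owner is not None or self._writable_exports():
                raise BufferError("cannot lock: writable export or lock exists")
        elif kind == "write" and self._lock_owner is not None:
            # While locked, the data is read-only for every other consumer.
            raise BufferError("data is locked read-only")
        token = object()          # stands in for the buffer handed to a consumer
        self._exports[token] = kind
        if kind == "lock":
            self._lock_owner = token
        return token

    def release_buffer(self, token):
        self._exports.pop(token)
        # Only the buffer that took the lock restores writability.
        if token is self._lock_owner:
            self._lock_owner = None


exporter = ToyExporter(16)
reader = exporter.get_buffer("read")
locker = exporter.get_buffer("lock")
try:
    exporter.get_buffer("write")
    refused = False
except BufferError:
    refused = True
exporter.release_buffer(reader)            # an ordinary release keeps the lock
still_locked = exporter._lock_owner is not None
exporter.release_buffer(locker)            # the locking buffer releases the lock
writer = exporter.get_buffer("write")      # writable exports are possible again
```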
For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a
slightly different implementation of the concept. When this flag is
set during conversion to an array, then if a copy must be made to
satisfy the requirements, the original array is set as read-only and
this special flag is set on the array. When the copy is deleted, its
memory is automatically copied (and possibly cast, etc.) back into the
original array. It is a nice abstraction of the concept of an output
data area that was borrowed from Numarray and allows many things to be
implemented very quickly in NumPy.
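
The copy-and-write-back idea can be imitated in plain Python without NumPy. This is a rough analogue and not NumPy's implementation: ToyArray, its writeable attribute and contiguous_output are hypothetical names, a strided bytearray stands in for a non-contiguous array, and the flag is only recorded, not enforced.

```python
from contextlib import contextmanager


class ToyArray:
    """Hypothetical stand-in for an array: storage, a stride and a flag."""

    def __init__(self, storage, step=1):
        self.storage = storage    # underlying bytearray
        self.step = step          # step > 1 means the elements are not contiguous
        self.writeable = True

    def is_contiguous(self):
        return self.step == 1


@contextmanager
def contiguous_output(arr):
    """Yield contiguous memory for `arr`, copying back on exit if a copy was made."""
    if arr.is_contiguous():
        yield arr.storage         # no copy needed; the caller works in place
        return
    arr.writeable = False         # the original is read-only while the copy lives
    copy = bytearray(arr.storage[::arr.step])
    try:
        yield copy
        # Write the result back into the strided elements of the original.
        arr.storage[::arr.step] = copy
    finally:
        arr.writeable = True      # the original becomes writeable again


# Every second byte of the storage belongs to the "array".
arr = ToyArray(bytearray(b"a-b-c-"), step=2)
with contiguous_output(arr) as work:
    flag_during = arr.writeable
    work[:] = work.upper()        # the algorithm sees contiguous data only
```

In this sketch the copy is written back only when the block exits normally. Whether to write back after an error is a design choice the sketch makes for simplicity.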
One of the main things people use the NumPy C-API for is to get a
contiguous chunk of memory from an array in order to do processing in
another language (such as C or Fortran). It is nice to be able to
specify that the result gets placed back into another chunk of memory
(which may or may not be contiguous) in a unified fashion. NumPy
handles all the copying for you.
My thinking was that many people will want to be able to get contiguous
chunks of memory, do processing, and then copy the result back into a
segment of memory from a buffer-exporting object which is passed into
the routine as an output object.
I'm not sure if my explanations are helpful. Please let me know if I
can explain further.
-Travis