[Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support
Gregory P. Smith
greg at krypto.org
Tue Sep 11 23:10:58 CEST 2007
On 9/11/07, Guido van Rossum <guido at python.org> wrote:
> On 9/10/07, Travis E. Oliphant <oliphant at enthought.com> wrote:
> > Guido van Rossum wrote:
> > > I'd like to see Travis's response to this. It's setting a precedent
> > > regarding locking objects in read-only mode; I haven't found other
> > > examples of objects using LOCKDATA (the only mentions of it seem to be
> > > rejecting it :). I keep getting confused by the two separate lock
> > > counts (and I think in this version the comment is inconsistent with
> > > the code). So I'm hoping Travis has a particular way in mind of
> > > handling LOCKDATA that can be used as a template.
> > >
> > > Travis?
> > The use case I had in mind comes about quite often in NumPy when you
> > want to modify the data-area of an object which may have a
> > non-contiguous chunk of memory, but the algorithm being used expects
> > contiguous data. Imagine, for example, that the exporting object is an
> > image whose rows are stored in different segments.
> > The consumer of the buffer interface, however, may be an extension
> > module that does fast image-processing operations and requires
> > contiguous data. Because it wants to write the results back into the
> > memory area when it is done with the algorithm (which may be thread-safe
> > and may release the GIL), it requests the object to lock its data to
> > read-only so that other consumers do not try to get writeable buffers
> > while it is processing.
> > When the algorithm is done, it alone can write to the memory area and
> > then when it releases the buffer, the original object will restore
> > itself to being writeable. Of course, the exporting object must support
> > this kind of operation and not all objects will. I expect the NumPy
> > array object and the PIL to support it for example, and other
> > media-centric objects.
> Hm, so this is completely different from what I thought. It seems you
> are describing the following:
> 1. acquire the buffer with LOCK_DATA
> 2. copy the data out of the buffer into a scratch area
> 3. work on the scratch area
> 4. copy the data from the scratch area back into the buffer
> 5. release the buffer
> I would call this an exclusive write lock, which is quite different
> from the read lock interpretation implemented by Greg in his patch.
> Could you add some language to PEP 3118 to clarify this usage? Or is
> it already there? I admit to not having read it in full...
Yes, that is different from how I was using it, based on what the PEP
3118 description said. Perhaps the flag as currently described in PEP 3118
should be renamed from LOCKDATA to READONLY?
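Guido's five steps (acquire with the lock, copy out, work, copy back, release) can be modeled in a few lines of Python. This is a minimal sketch, not CPython or NumPy code: `SegmentedExporter`, `acquire_locked`, `release_locked` and `write_row` are hypothetical names, and rows held in separate `bytearray` objects stand in for Travis's image whose rows are stored in different segments.

```python
class SegmentedExporter:
    """Hypothetical exporter whose rows live in separate memory segments."""

    def __init__(self, rows):
        self.rows = [bytearray(r) for r in rows]
        self.locked = False  # True while a LOCKDATA-style consumer holds the data

    def write_row(self, i, data):
        # Other writers are refused while the data is locked.
        if self.locked:
            raise BufferError("data is locked by another consumer")
        self.rows[i][:] = data

    def acquire_locked(self):
        # Step 1: acquire the buffer with the lock.
        if self.locked:
            raise BufferError("data is already locked")
        self.locked = True
        # Step 2: copy the rows into one contiguous scratch area.
        return bytearray(b"".join(self.rows))

    def release_locked(self, scratch):
        if len(scratch) != sum(len(row) for row in self.rows):
            raise ValueError("scratch area changed size")
        # Step 4: copy the scratch area back, row by row.
        pos = 0
        for row in self.rows:
            row[:] = scratch[pos:pos + len(row)]
            pos += len(row)
        # Step 5: release the lock; the exporter is writable again.
        self.locked = False


img = SegmentedExporter([b"ab", b"cd"])
scratch = img.acquire_locked()
scratch[:] = scratch.upper()  # step 3: work on the scratch area
img.release_locked(scratch)
print(img.rows)  # [bytearray(b'AB'), bytearray(b'CD')]
```

Only the lock holder writes back, which is what makes this an exclusive write lock rather than a read lock.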
> > It would probably be useful if the bytes object supported it because
> > then other objects could use it as the memory area. To do it
> > correctly, the object exporting the interface must only allow locking if
> > no other writeable interfaces have been exported (which it must keep
> > track of) and then on release must check to see if the buffer that is
> > being released is the one that locked its data.
> Right. So it seems you would need a counter of outstanding
> non-data-locked buffer requests and a single bit indicating whether
> there's a data-locked request. (Rather than two counters like Greg's
> patch currently uses.)
> The hacker in me is already exploring the possibility of making the
> count negative if there's a data-locked request; it sounds like the
> valid transitions are:
> 0 -> 1 -> 2 -> ... (SIMPLE or WRITABLE get)
> ... -> 2 -> 1 -> ... (SIMPLE or WRITABLE release)
> 0 -> -1 (LOCKDATA get)
> -1 -> 0 (LOCKDATA release)
> Have I got that right? I think that you should only be able to request
> LOCKDATA if there are no other readers *or* writers, but that SIMPLE
> and WRITABLE clients should be able to coexist (any mess that creates
> would be the requester's own fault). Any nonzero value here would
> indicate that the buffer can't be moved.
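The single signed counter and its transitions can be sketched as follows. `ExportCounter` and its methods are hypothetical names; the sketch also assumes that while the count is -1 any new request is refused, since one counter cannot record a LOCKDATA holder and other readers at the same time.

```python
class ExportCounter:
    """Hypothetical model of the single signed export count.

    0      no buffers outstanding
    n > 0  n SIMPLE or WRITABLE buffers outstanding
    -1     one LOCKDATA buffer outstanding
    """

    def __init__(self):
        self.count = 0

    def get(self, lockdata=False):
        if lockdata:
            # 0 -> -1: only allowed when there are no other readers or writers.
            if self.count != 0:
                raise BufferError("cannot lock data: buffers are outstanding")
            self.count = -1
        else:
            # 0 -> 1 -> 2 -> ...: SIMPLE and WRITABLE requests coexist,
            # but (by assumption) are refused while the data is locked.
            if self.count < 0:
                raise BufferError("data is locked")
            self.count += 1

    def release(self, lockdata=False):
        if lockdata:
            # -1 -> 0: only the LOCKDATA holder can undo the lock.
            if self.count != -1:
                raise BufferError("no LOCKDATA buffer is outstanding")
            self.count = 0
        else:
            # ... -> 2 -> 1 -> ...
            if self.count <= 0:
                raise BufferError("no SIMPLE or WRITABLE buffer is outstanding")
            self.count -= 1

    def can_move(self):
        # Any nonzero value means the memory must not be moved.
        return self.count == 0


c = ExportCounter()
c.get()
c.get()
print(c.count, c.can_move())  # 2 False
```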
> I note that the use case in the bsddb wrapper extension is a bit
> different -- Greg suspects that BerkeleyDB won't like the data
> changing while it is using it (e.g. it might violate its own invariant
> if the key changes between the time its hash is computed and the time
> it is written to disk). To ensure this, currently LOCKDATA is the only
> option; but a classic read lock would allow multiple concurrent
> readers (which is how Greg's patch to bytesobject.c interprets LOCKDATA).
> I think this needs to be clarified. Perhaps we need to separate more
> clearly the type of access (read or write) from the amount of locking
> desired (can others read? can others write?).
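One way to separate the type of access from the amount of sharing is to describe each request with three booleans and grant it only if its policy and every outstanding request's policy allow each other's access. All names here (`Request`, `compatible`, `grant`, and the four modes) are hypothetical illustrations, not PEP 3118 flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical buffer request: what I do, and what I let others do meanwhile."""
    write: bool             # access type: False = read, True = write
    others_may_read: bool   # may other consumers read concurrently?
    others_may_write: bool  # may other consumers write concurrently?


def compatible(a, b):
    """True if requests a and b can be outstanding at the same time."""
    def permits(holder, other):
        # holder's sharing policy must allow other's type of access
        return holder.others_may_write if other.write else holder.others_may_read
    return permits(a, b) and permits(b, a)


def grant(outstanding, new):
    """Grant `new` (and record it) only if it is compatible with every holder."""
    if all(compatible(held, new) for held in outstanding):
        outstanding.append(new)
        return True
    return False


# Illustrative modes (hypothetical names):
SIMPLE = Request(write=False, others_may_read=True, others_may_write=True)
WRITABLE = Request(write=True, others_may_read=True, others_may_write=True)
READ_LOCKED = Request(write=False, others_may_read=True, others_may_write=False)  # classic read lock
EXCLUSIVE_WRITE = Request(write=True, others_may_read=False, others_may_write=False)

held = [READ_LOCKED]
print(grant(held, READ_LOCKED), grant(held, WRITABLE))  # True False
```

In this model SIMPLE and WRITABLE coexist, several read locks coexist, and the exclusive write lock is granted only when nothing else is outstanding.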
bsddb is not alone here; it was just the code I was working on that made me
think it necessary. I am hoping that -all- file/socket/whatever output
operations using the buffer API will get properly read-locked views of the
buffer so that they can release the GIL without having the data changed out
from underneath them by other threads. (This avoids hard-to-debug issues,
which Python has so far been pretty good at avoiding.)
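A single-threaded sketch of such a read-locked output: while any read lock is held, writes through the owning object are refused, so the output call sees stable data. `LockableBytes`, `read_locked` and `write_locked` are hypothetical names; the reader count is not protected against concurrent updates here, and `memoryview.toreadonly()` requires Python 3.8 or later.

```python
import contextlib
import io


class LockableBytes:
    """Hypothetical mutable buffer that refuses writes while read locks are held."""

    def __init__(self, data):
        self._data = bytearray(data)
        self._readers = 0  # number of outstanding read locks

    @contextlib.contextmanager
    def read_locked(self):
        # Any number of readers may hold the lock at once.
        view = memoryview(self._data).toreadonly()
        self._readers += 1
        try:
            yield view
        finally:
            view.release()
            self._readers -= 1

    def __setitem__(self, index, value):
        # Writers are refused while any reader holds the lock.
        if self._readers:
            raise BufferError("buffer is read-locked")
        self._data[index] = value

    def __bytes__(self):
        return bytes(self._data)


def write_locked(buf, stream):
    """Output helper: the data cannot be changed through `buf` during the write."""
    with buf.read_locked() as view:
        return stream.write(view)


buf = LockableBytes(b"hello")
out = io.BytesIO()
write_locked(buf, out)
print(out.getvalue())  # b'hello'
```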
> (BTW, the current implementation in bytesobject.c allows changing the
> size as long as it fits within the allocated size; I think this is
> probably too lenient, and begging for latent bugs.)
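For comparison, current CPython's bytearray (the name the mutable bytes type was later given) refuses to change size at all while any buffer export is outstanding, although in-place modification is still allowed. This uses only the standard `bytearray` and `memoryview` types:

```python
data = bytearray(b"abc")
view = memoryview(data)

data[0] = ord("x")  # in-place modification is still allowed
try:
    data.append(ord("d"))  # resizing while an export is outstanding
except BufferError as exc:
    print("refused:", exc)

view.release()  # drop the export...
data.append(ord("d"))  # ...and resizing works again
print(data)  # bytearray(b'xbcd')
```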
> (Spelling alert: 'writeable' is apparently not an English word. I hope
> it's not too late to rename the flag to PyBUF_WRITABLE. I've opened
> http://bugs.python.org/issue1150 to track this.)
Eek, yes, please let's spell it correctly. :)
> > For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a
> > slightly different implementation of the concept. When this flag is
> > set during conversion to an array, then if a copy must be made to
> > satisfy the requirements, the original array is set as read-only and
> > this special flag is set on the array. When the copy is deleted, its
> > memory is automatically copied (and possibly cast, etc.) back into the
> > original array. It is a nice abstraction of the concept of an output
> > data area that was borrowed from Numarray and allows many things to be
> > implemented very quickly in NumPy.
> So in terms of locks, this effectively sets read *and* write locks on
> the original object (since whatever you might read out of it may be
> invalidated when the modified copy is written back). But how to
> enforce that at the Python level? If we had something like this for
> the bytes object, any *use* of the bytes object from Python (e.g.
> iterating over it or indexing or slicing it) should be prohibited. Is
> this reasonable?
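A sketch of the UPDATEIFCOPY-style behavior in question, where the original refuses both reads and writes until the modified copy is written back. `Original`, `WritebackCopy` and `writeback_copy` are hypothetical names; the write-back happens when a `with` block exits rather than when the copy is deleted as in NumPy, and by assumption it is skipped if the block raised.

```python
class Original:
    """Hypothetical exporter that can hand out an UPDATEIFCOPY-style copy."""

    def __init__(self, data):
        self._data = bytearray(data)
        self._pending = False  # True while a write-back copy is outstanding

    def _check(self):
        # Reads *and* writes are refused: anything read now could be
        # invalidated when the modified copy is written back.
        if self._pending:
            raise BufferError("a write-back copy is outstanding")

    def __getitem__(self, index):
        self._check()
        return self._data[index]

    def __setitem__(self, index, value):
        self._check()
        self._data[index] = value

    def writeback_copy(self):
        self._check()
        self._pending = True
        return WritebackCopy(self)


class WritebackCopy:
    def __init__(self, original):
        self._original = original
        self.data = bytearray(original._data)  # the copy the consumer works on

    def __enter__(self):
        return self.data

    def __exit__(self, exc_type, exc, tb):
        # Write back only if the block succeeded; always unlock the original.
        if exc_type is None:
            self._original._data[:] = self.data
        self._original._pending = False
        return False


orig = Original(b"abc")
with orig.writeback_copy() as scratch:
    scratch[0] = ord("z")  # orig[0] here would raise BufferError
print(orig[0:3])  # bytearray(b'zbc')
```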
> > One of the main things people use the NumPy C-API for is to get a
> > contiguous chunk of memory from an array in order to do processing in
> > another language (such as C or Fortran). It is nice to be able to
> > specify that the result gets placed back into another chunk of memory
> > (which may or may not be contiguous) in a unified fashion. NumPy
> > handles all the copying for you.
> > My thinking was that many people will want to be able to get contiguous
> > chunks of memory, do processing, and then copy the result back into a
> > segment of memory from a buffer-exporting object which is passed into
> > the routine as an output object.
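With today's `memoryview`, the pattern of getting a contiguous chunk, processing it, and copying the result back into a non-contiguous output can be shown on a strided one-dimensional view. The `upper()` call is only a stand-in for the C or Fortran routine.

```python
buf = bytearray(b"a1b2c3")
letters = memoryview(buf)[::2]  # strided view over b"a", b"b", b"c"
assert not letters.contiguous

scratch = letters.tobytes()  # gather into a contiguous chunk: b"abc"
result = scratch.upper()  # stand-in for the C or Fortran routine
letters[:] = result  # scatter back through the strided view
letters.release()
print(buf)  # bytearray(b'A1B2C3')
```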
> This is probably common for numpy; for the bytes object, I expect that
> it's all much simpler, since it's just a contiguous 1D array of bytes.
FWIW, in the bsddb and hashlib code I raise an error if the buffer returned
is not a 1D array.
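A Python-level version of that check, for illustration. `as_1d_bytes` is a hypothetical helper and raising `BufferError` is an assumption; the bsddb and hashlib checks themselves are done in C. It also rejects non-contiguous views, since `memoryview.cast()` requires a C-contiguous source.

```python
def as_1d_bytes(obj):
    """Return a flat unsigned-byte view of `obj`, rejecting other layouts.

    Hypothetical helper modeled on the 1-D check described above.
    """
    view = memoryview(obj)
    if view.ndim != 1 or not view.contiguous:
        view.release()
        raise BufferError("expected a contiguous 1-D buffer")
    # cast() requires a C-contiguous source, which was checked above.
    return view.cast("B")


print(as_1d_bytes(b"abc").tobytes())  # b'abc'
```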
> > I'm not sure if my explanations are helpful. Please let me know if I
> > can explain further.
> --Guido van Rossum (home page: http://www.python.org/~guido/)