[Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support

Tue Sep 11 21:49:11 CEST 2007

Guido van Rossum wrote:
> On 9/10/07, Travis E. Oliphant <oliphant at enthought.com> wrote:
>   
>>
>
> Hm, so this is completely different from what I thought. It seems you
> are describing the following:
>
> 1. acquire the buffer with LOCK_DATA
> 2. copy the data out of the buffer into a scratch area
> 3. work on the scratch area
> 4. copy the data from the scratch area back into the buffer
> 5. release the buffer
>
> I would call this an exclusive write lock, which is quite different
> from the read lock interpretation implemented by Greg in his patch.
> Could you add some language to PEP 3118 to clarify this usage? Or is
> it already there? I admit to not having read it in full...
>   
Yes, you have nailed the usage I was thinking of.  I admit that there 
are other usage variants that I am not thinking of.   These should be 
vetted. 
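
The five steps listed above can be sketched in Python as follows. This is
only an illustration of the usage pattern: LockableBuffer, acquire_locked
and release_locked are hypothetical names, not part of PEP 3118 or of the
C API.

```python
class LockableBuffer:
    """Hypothetical exporter that hands out one exclusive lock at a time."""

    def __init__(self, data):
        self._data = bytearray(data)
        self._locked = False

    def acquire_locked(self):
        # Refuse the request if the data is already locked.
        if self._locked:
            raise BufferError("data is already locked")
        self._locked = True
        return self._data

    def release_locked(self):
        if not self._locked:
            raise BufferError("data is not locked")
        self._locked = False


def uppercase_in_place(exporter):
    buf = exporter.acquire_locked()   # 1. acquire the buffer with the lock
    try:
        scratch = bytes(buf)          # 2. copy the data out to a scratch area
        scratch = scratch.upper()     # 3. work on the scratch area
        buf[:] = scratch              # 4. copy the result back into the buffer
    finally:
        exporter.release_locked()     # 5. release the buffer
```

The try/finally is a choice made for this sketch so that the lock is
released even if the work in step 3 fails.
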
>> It would probably be useful if the bytes object supported it because
>> then other objects could use it as the memory area.    To do it
>> correctly, the object exporting the interface must only allow locking if
>> no other writeable interfaces have been exported (which it must keep
>> track of) and then on release must check to see if the buffer that is
>> being released is the one that locked its data.
>>     
>
> Right. So it seems you would need a counter of outstanding
> non-data-locked buffer requests and a single bit indicating whether
> there's a data-locked request. (Rather than two counters like Greg's
> patch currently uses.)
>
> The hacker in me is already exploring the possibility of making the
> count negative if there's a data-locked request; it sounds like the
> valid transitions are:
>
> 0 -> 1 -> 2 -> ... (SIMPLE or WRITABLE get)
> ... -> 2 -> 1 -> ... (SIMPLE or WRITABLE release)
> 0 -> -1 (LOCKDATA get)
> -1 -> 0 (LOCKDATA release)
>
> Have I got that right? I think that you should only be able to request
> LOCKDATA if there are no other readers *or* writers, but that SIMPLE
> and WRITABLE clients should be able to coexist (any mess that creates
> would be the requester's own fault). Any nonzero value here would
> indicate that the buffer can't be moved.
>   
Your understanding looks fine to me.  A comment I got at SciPy gave me 
the feeling that this has the look of an infrastructure that is 
necessary for shared-memory and thread-safe memory management.  But I 
do not claim to have thought through all of those issues.  However, I 
would welcome any suggestions for improvement that would allow the 
buffer interface to be used to manage memory in thread-safe ways.
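
The signed-counter idea quoted above can be modelled in a few lines of
Python. This is a sketch of the transitions as described in the quoted
text, not the code in bytesobject.c; ExportCounter and its method names
are made up for illustration.

```python
class ExportCounter:
    """Hypothetical bookkeeping for one exporting object.

    count > 0   : that many SIMPLE or WRITABLE exports are outstanding
    count == -1 : one LOCKDATA export is outstanding
    count == 0  : nothing is exported, so the buffer may be moved
    """

    def __init__(self):
        self.count = 0

    def get(self, lockdata=False):
        if lockdata:
            # 0 -> -1 is the only valid LOCKDATA transition.
            if self.count != 0:
                raise BufferError("cannot lock: other exports outstanding")
            self.count = -1
        else:
            # 0 -> 1 -> 2 -> ... unless the data is locked.
            if self.count < 0:
                raise BufferError("data is locked")
            self.count += 1

    def release(self, lockdata=False):
        if lockdata:
            # -1 -> 0
            if self.count != -1:
                raise BufferError("no LOCKDATA export to release")
            self.count = 0
        else:
            # ... -> 2 -> 1 -> 0
            if self.count <= 0:
                raise BufferError("no shared export to release")
            self.count -= 1

    def can_move(self):
        # Any nonzero value means the buffer must stay where it is.
        return self.count == 0
```
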
> I note that the use case in the bsddb wrapper extension is a bit
> different -- Greg suspects that BerkeleyDB won't like the data
> changing while it is using it (e.g. it might violate its own invariant
> if the key changes between the time its hash is computed and the time
> it is written to disk). To ensure this, currently LOCKDATA is the only
> option; but a classic read lock would allow multiple concurrent
> readers (which is how Greg's patch to bytesobject.c interprets
> LOCKDATA).
>   
I'm not sure I understand the difference between a classic read lock and 
the exclusive write lock concept.   Does the classic read-lock just 
prevent writing to the memory area?  In my mind that is a read-only 
memory buffer and the buffer interface would complain if a writeable 
buffer was requested.

> I think this needs to be clarified. Perhaps we need to separate more
> clearly the type of access (read or write) and the amount of locking
> desired (can others read? can others write?).
>   
Yes, I think the clarification is useful.  
> (BTW The current implementation in bytesobject.c allows changing the
> size as long as it fits within the allocated size; I think this is
> probably too lenient, and begging for latent bugs.)
>
> (Spelling alert: 'writeable' is apparently not an English word. I hope
> it's not too late to rename the flag to PyBUF_WRITABLE. I've opened
> http://bugs.python.org/issue1150 to track this.)
>
>   
Actually, writeable is an accepted variant of 'writable' (but it doesn't 
show up in many spell-check dictionaries).  No, it is not too late to 
change it.  Or just define WRITEABLE as WRITABLE.   NumPy uses 
"WRITEABLE" simply because I like that spelling better. 
>> For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a
>> slightly different implementation of the concept.   When this flag is
>> set during conversion to an array, then if a copy must be made to
>> satisfy the requirements, the original array is set as read-only and
>> this special flag is set on the array.  When the copy is deleted, its
>> memory is automatically copied (and possibly cast, etc.) back into the
>> original array.  It is a nice abstraction of the concept of an output
>> data area that was borrowed from Numarray and allows many things to be
>> implemented very quickly in NumPy.
>>     
>
> So in terms of locks, this effectively sets read *and* write locks on
> the original object (since whatever you might read out of it may be
> invalidated when the modified copy is written back). 
Sort of: the object is set as read-only before the UPDATEIFCOPY version 
is made.  Another python thread could technically read the data (but the 
flag would be set on it so that the user could know that another memory 
area was shadowing this one).  Usually these kinds of objects only show 
up as output arguments to functions and the programmer is left 
responsible for not relying on data that may be changing. 
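
The write-back behaviour described above can be imitated in plain Python.
This sketch does not use NumPy and is not how NumPy implements the flag:
Array, WriteBackCopy and the shadowed attribute are hypothetical, and the
copy-back happens in an explicit close() call rather than when the copy is
deleted.

```python
class Array:
    """Toy stand-in for an array object with a writeable flag."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.writeable = True
        self.shadowed = False   # True while a write-back copy exists

    def write(self, index, value):
        if not self.writeable:
            raise ValueError("array is read-only")
        self.data[index] = value


class WriteBackCopy:
    """Working copy whose contents are copied back to the original."""

    def __init__(self, original):
        self.original = original
        self._was_writeable = original.writeable
        original.writeable = False      # original becomes read-only
        original.shadowed = True        # readers can see it is shadowed
        self.data = bytearray(original.data)

    def close(self):
        # Copy the working data back, then restore the original's state.
        self.original.data[:] = self.data
        self.original.writeable = self._was_writeable
        self.original.shadowed = False
```

While the copy is open, the original can still be read, but a reader who
checks the shadowed attribute knows the contents may be replaced later.
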

Perhaps more fine-grained locks are needed. 

>
> This is probably common for numpy; for the bytes object, I expect that
> it's all much simpler, since it's just a contiguous 1D array of
> bytes...
>   
Yes, indeed it is much simpler....

I'm anxious for feedback and help with the locking mechanism, because I 
do not have all use cases in mind.  I have never thought about a lock 
that prevents reading.  In my mind, this would be handled by the object 
itself.  It could refuse buffer requests if its data had been locked or 
it could not. 

On the other hand, there could be two concepts of locking that a 
consumer could request from an object:

1) Lock so that no other reads or writes are possible until the lock is 
released.
2) Lock so that only reads are possible. 

I had only thought of #2 for the current buffer interface.
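
To make the difference between the two concepts concrete, here is a small
Python model of an exporter that accepts both kinds of lock request. The
Lock enum, the Exporter class and its counters are assumptions made for
this sketch, not anything defined in PEP 3118, and release bookkeeping is
left out to keep it short.

```python
from enum import Enum


class Lock(Enum):
    NONE = 0
    EXCLUSIVE = 1   # concept 1: no other reads or writes
    READ_ONLY = 2   # concept 2: other reads allowed, writes refused


class Exporter:
    """Hypothetical exporter tracking outstanding buffer requests."""

    def __init__(self):
        self.readers = 0      # plain read-only exports
        self.writers = 0      # plain writable exports
        self.read_locks = 0   # READ_ONLY locks held
        self.exclusive = False

    def get(self, writable=False, lock=Lock.NONE):
        if self.exclusive:
            raise BufferError("data is exclusively locked")
        if lock is Lock.EXCLUSIVE:
            # Only granted when nothing else is outstanding.
            if self.readers or self.writers or self.read_locks:
                raise BufferError("other exports are outstanding")
            self.exclusive = True
        elif lock is Lock.READ_ONLY:
            # Other readers may coexist; writers may not.
            if writable or self.writers:
                raise BufferError("cannot combine read-only lock and writers")
            self.read_locks += 1
        elif writable:
            if self.read_locks:
                raise BufferError("data is locked read-only")
            self.writers += 1
        else:
            self.readers += 1
```
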

-Travis