[Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support

Thu Sep 13 21:27:33 CEST 2007

Guido van Rossum wrote:
> On 9/11/07, Travis E. Oliphant <oliphant at enthought.com> wrote:
>   
>> I'm not sure I understand the difference between a classic read lock and
>> the exclusive write lock concept.   Does the classic read-lock just
>> prevent writing to the memory area?  In my mind that is a read-only
>> memory buffer and the buffer interface would complain if a writeable
>> buffer was requested.
>>     
>
> There are different notions of reading and writing.  Sometimes an
> object is naturally read-only (e.g. a PyString). In that case
> requesting SIMPLE access should pass but requesting WRITABLE or
> LOCKDATA access should fail. (I think the other flags are orthogonal
> to these, right?). Any number of concurrent SIMPLE accesses can
> coexist since the clients promise they will only read.
>   
Yes, the other flags are orthogonal to this concept.
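
As a rough illustration of the read-only case, here is a minimal sketch
using the Python-level memoryview of present-day CPython rather than the
C-level flags discussed in this thread. An immutable bytes object stands
in for the naturally read-only object and a bytearray for the naturally
writable one; that mapping is an assumption made for the example.

```python
# bytes is naturally read-only; bytearray is naturally writable.
ro_view = memoryview(b"abc")
rw_view = memoryview(bytearray(b"abc"))

print(ro_view.readonly)  # True: only read access is on offer
print(rw_view.readonly)  # False: write access was granted

# Writing through the read-only view is refused by the exporter's view.
write_refused = False
try:
    ro_view[0] = 0x7A
except TypeError:
    write_refused = True

# Writing through the writable view succeeds (0x7A is ord("z")).
rw_view[0] = 0x7A
result = rw_view.tobytes()
```
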
> OTOH suppose we have an object that is naturally writable (e.g. a
> PyBytes). I understood that in this case any number of SIMPLE or
> WRITABLE requests would be allowed to be outstanding simultaneously,
> and any of these would simply prevent the buffer from moving (fixing
> the object's size). But this doesn't sound like it is how you meant it
> -- you seem to say that once any SIMPLE (readonly) requests are
> outstanding, WRITABLE requests should fail. 
Wait a minute.  I want to clarify that normally any number of SIMPLE or 
WRITEABLE requests would be possible for an object that is naturally 
writeable.   That is my thinking. 
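
A minimal sketch of that behavior, again using Python-level memoryview
objects in present-day CPython as a stand-in for C-level buffer
requests (an assumption of the example, not the API being designed
here): several writable views can be outstanding at once, and their
only effect on the exporter is that it cannot be resized.

```python
data = bytearray(b"abcd")

# Two outstanding writable views on the same naturally writable object.
first = memoryview(data)
second = memoryview(data)

# Both may write; neither locks the other out.
first[0] = ord("X")
second[1] = ord("Y")
snapshot = bytes(data)

# While views are exported, the object's size is fixed.
try:
    data.append(0)
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# Once every view is released, resizing works again.
first.release()
second.release()
data.append(0)
```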

The purpose of LOCKDATA is to allow a consumer to request that the object 
not be writeable by others while the consumer holds a view of the object.   
I did not think that this would be the normal behavior, but exceptional.

What seems to be needed is yet another flag that allows a buffer 
requester to insist that the object not allow any buffer accesses read 
or write until its view is done.   So, you would have something like

LOCK_FOR_WRITE
LOCK_FOR_READ

I would want to encourage people not to use the LOCK_FOR_READ unless 
there is an important benefit or need to use it.   On the other hand, 
the argument about DMA mechanisms (like moving memory to a video card 
for processing) needing to make the buffer unavailable temporarily 
sounds like a reasonable one to me.  I can already see applications for it.
> And I suppose that only
> one WRITABLE request ought to be allowed at a time. But then I don't
> know what the difference between WRITABLE and LOCKDATA would be.
>
>
>   
I hope I've clarified the difference between these in my mind.
> Then a "classic read lock" would request read
> access while locking out writers (bsddb would use this);
I did not separate this case in my mind, as I presumed that if something 
wanted to prevent other writers it would itself want to write.  I can 
see what is wanted here now.
>  a "classic
> write lock" would request write access while locking out writers (your
> scratch area example would use this); others who don't really care if
> the data changes underneath them as long as it doesn't move (e.g.
> traditional I/O) could request read access without locking. I'm not
> sure if there's a use case to be made for write access without
> locking, but I wouldn't rule it out -- possibly when two threads share
> a memory area they might have their own protocol for locking it and
> might just both want to be able to write to (parts of) it.
>   
Yes, I would not rule out write-access without locking either.  NumPy 
actually uses that all the time internally where two or more objects 
share the same data and can both write to it (although the community 
warns people about doing this without knowing what you are doing).
> What do you think? Another way to look at this would be to consider
> these 4 cases:
>   
I think I was leaving out the cases

1) requesting a read access with future write locking ('classic read lock')
2) requesting a read or write access with future read locking.

Let me see how my thinking maps to your list below which at first glance 
looks pretty good.
> basic read access (I can read, others can read or write)
> locked read access (I can read, others can only read)
> basic write access (I can read and write, others can read or write)
> exclusive write access (I can read and write, no others can read or write)
>
>   
I guess my original LOCKDATA concept (I can read and write, others can 
only read) is not even in this list as you discuss below.   I'm actually 
wondering if another function should be added to handle the concept of 
locking.  I can imagine that it will want to grow more fine-grained 
locking possibilities.
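
To make the access modes concrete, here is a hypothetical sketch of the
bookkeeping an exporter might do. The names Mode, compatible and
Exporter are invented for illustration and are not part of any proposed
API. It models the four modes from the list above plus the original
LOCKDATA idea, and it assumes the strict variant in which exclusive
write access also locks out basic readers.

```python
from enum import Enum


class Mode(Enum):
    # value = (holder writes, others may read, others may write)
    BASIC_READ = (False, True, True)
    LOCKED_READ = (False, True, False)
    BASIC_WRITE = (True, True, True)
    EXCLUSIVE_WRITE = (True, False, False)
    LOCKED_WRITE = (True, True, False)  # the original LOCKDATA idea


def _tolerates(a, b):
    """True if the limits a places on others permit what b does."""
    _, others_read, others_write = a.value
    b_writes = b.value[0]
    # Every holder reads; some holders also write.
    return others_read and (others_write or not b_writes)


def compatible(held, wanted):
    """Two grants can coexist only if each tolerates the other."""
    return _tolerates(held, wanted) and _tolerates(wanted, held)


class Exporter:
    """Tracks outstanding grants for one exported buffer."""

    def __init__(self):
        self.held = []

    def acquire(self, wanted):
        # Grant only if compatible with every outstanding grant.
        if all(compatible(h, wanted) for h in self.held):
            self.held.append(wanted)
            return True
        return False

    def release(self, mode):
        self.held.remove(mode)


exp = Exporter()
print(exp.acquire(Mode.LOCKED_READ))      # True
print(exp.acquire(Mode.BASIC_READ))       # True: readers coexist
print(exp.acquire(Mode.BASIC_WRITE))      # False: locked read blocks writers
exp.release(Mode.LOCKED_READ)
print(exp.acquire(Mode.BASIC_WRITE))      # True
print(exp.acquire(Mode.EXCLUSIVE_WRITE))  # False: others still hold views
```

Keeping the compatibility rule in one function is only one possible
design; a separate locking function on the object could grow
finer-grained rules without touching the code that hands out views.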

> Except that accessing the object from Python (e.g. iteration or
> indexing) never gets locked out. (Or perhaps it should be? That can
> also be done.)
>   
I think if it doesn't go through the buffer interface it is up to the 
object to decide (i.e. what does the object do with itself when buffers 
are exported --- that will depend on the object).   All it must do is 
support the buffer interface in the correct way (i.e. not move the 
memory that exported buffers rely on, and correctly support the access 
modes that it purports to export).
>> Actually, writeable is an accepted variant of 'writable' (but it doesn't
>> show up in many spell-check dictionaries).  No, it is not too late to
>> change it.  Or just define WRITEABLE as WRITABLE.   NumPy uses
>> "WRITEABLE" simply because I like that spelling better.
>>     
>
> Google found 1.4M occurrences of writeable vs. 3.9M occurrences of
> writable. I guess you represent a strong minority. :-) I'd still like
> to see it changed. We can leave WRITEABLE as an alias for WRITABLE for
> those who are used to seeing it that way in NumPy.
>   
I'm fine with that. 
>
> Well, the scratch area scenario you describe makes it iffy to read
> anything out of the original object since you wouldn't know whether
> you were reading before, during or after the write back from the
> scratch area to the object's buffer. The question is, do we really
> care. If we adopted my 4 access modes above, we could say that basic
> read access will still be granted when someone has exclusive write
> access if we don't care, OR we could say that basic reads are locked
> out by exclusive write access. (And then there's the separate issue of
> whether python-level access counts as basic read access or doesn't
> count at all -- though the more I think about it, I think it should be
> treated the same as basic read access.)
>
>   
>> On the other hand, there could be two concepts of locking that a
>> consumer could request from an object
>>
>> 1) Lock so that no other reads or writes are possible until the lock is
>> released.
>> 2) Lock so that only reads are possible.
>>
>> I had only thought of #2 for the current buffer interface.
>>     
>
> #1 maps to locked read OR exclusive write access in the strict variant.
> #2 maps to locked read in my scheme.
>
>   
Let me think about adding a function for read-write locking that is 
separate from getting a view (which implements memory-location 
locking).  I appreciate the discussion as it is helping me clarify my 
thinking.

-Travis