Store shared/locked state inside of the lock object

Since the introduction of context managers, using the lock object itself has become easier and safer (in that there's very little chance of forgetting to release a lock anymore). A big annoyance remains though: the relation between lock and lockee remains informal, and there is no structured way to indicate:

1. whether a state-set is protected by a lock
2. which lock protects the state-set
3. which state-set is protected by a lock

Some languages (e.g. Java) have tried to solve 2 and 3 using intrinsic locks, however:

* that does not solve 1 (and it becomes impossible to informally look for lock objects lying around and try to find the corresponding state-sets)
* it does not help much when state isn't coalesced in a single object, and for state hierarchies there is no way to express whether the whole hierarchy should be protected under the same lock (the root's) or each leaf should be locked individually. AFAIK intrinsic locks are not hierarchical themselves
* things get very awkward when using alternate concurrency-management strategies such as explicit locks for security reasons[0]: the non-use of intrinsic locks again has to be documented informally

A fairly small technical change I've been considering to improve this situation is to store the state-set inside the lock object, and only yield it through the context manager: that the state-set is protected by a lock is made obvious, and so is the relation between lock and state-set. I was delighted to discover that Rust's sync::Mutex[1] and sync::RWLock[2] follow a very similar strategy of owning the state-set.

It's not a panacea (it doesn't fix issues of lock acquisition ordering, for instance), but I think it would go a fairly long way towards making correct use of locks easier in Python.
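As an illustration only, the idea of a lock that owns its state-set and yields it solely through the context manager could be sketched as follows. The class name `DataLock`, the `counters` example and the `record_hit` helper are hypothetical, and the sketch wraps threading.Lock rather than changing it:

```python
import threading


class DataLock:
    """Hypothetical lock that owns the state it protects."""

    def __init__(self, data):
        self._lock = threading.Lock()
        self._data = data  # by convention reachable only through `with`

    def __enter__(self):
        self._lock.acquire()
        return self._data  # hand the state to the current lock holder

    def __exit__(self, exc_type, exc_value, traceback):
        self._lock.release()
        return False  # never swallow exceptions from the with-body


# The relation between lock and state is now explicit: one object holds both.
counters = DataLock({"hits": 0})


def record_hit():
    with counters as state:
        state["hits"] += 1


threads = [threading.Thread(target=record_hit) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with counters as state:
    total_hits = state["hits"]
```

Nothing here stops a caller from keeping a reference to `state` after the `with` block ends, so the protection is still only as strong as the caller's discipline.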
The basic changes would be:

* threading.Lock and threading.RLock would now take an optional `data` parameter
* the parameter would be stored internally and not directly accessible by users of the lock
* that parameter would be returned by __enter__ and provided to the current "owner" of the lock

These should cause no forward-compatibility issues: Lock() currently takes no arguments, and its __enter__ returns no value.

Possible improvements/questions/issues I can see:

* with Lock, the locked state would not be available unless using it as a context manager. RLock could allow getting the protected state only while locked by the current thread
* as-is, the scheme requires mutable state as it's not possible to swap the internal state entirely. RLock could allow state-replacement when locked
* because Python has no ownership concept, it would be possible for a consumer to keep a reference to the locked state and manipulate it without locking

I don't consider the third issue to be huge; it could be mitigated by yielding a proxy to the internal state only valid for the current lock span. However I do not know if it's possible to create completely transparent proxies in Python.

The first two issues are slightly more troubling and could be mitigated by yielding not the state-set alone but a proxy object living only for the current lock span (or both the state-set and a proxy object). That proxy would allow getting and setting the state-set, and would error out after unlocking.

Lock.acquire() could be altered to return the same proxy (or (state-set, proxy) pair), however it's currently defined as returning either True or False so that'd be a backwards-incompatible change. An alternative would be to add a new acquisition method or a new flag parameter changing the return value from True to these on acquisition.
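A rough sketch of the proxy variant, under the assumption that the proxy is an explicit handle with get/set methods rather than a transparent wrapper. The names `StateProxy`, `ProxyLock`, `get` and `set` are all hypothetical:

```python
import threading


class StateProxy:
    """Hypothetical handle that is only usable while its lock is held."""

    def __init__(self, owner):
        self._owner = owner
        self._valid = True

    def _check(self):
        if not self._valid:
            raise RuntimeError("proxy used outside of its lock span")

    def get(self):
        self._check()
        return self._owner._data

    def set(self, value):
        # Replacing the state wholesale makes immutable state usable too.
        self._check()
        self._owner._data = value


class ProxyLock:
    """Hypothetical lock yielding a short-lived proxy instead of raw state."""

    def __init__(self, data=None):
        self._lock = threading.Lock()
        self._data = data
        self._proxy = None  # the single live proxy, if any

    def __enter__(self):
        self._lock.acquire()
        self._proxy = StateProxy(self)
        return self._proxy

    def __exit__(self, exc_type, exc_value, traceback):
        self._proxy._valid = False  # invalidate before releasing the lock
        self._proxy = None
        self._lock.release()
        return False


balance = ProxyLock(100)

with balance as handle:
    handle.set(handle.get() - 30)  # an int is immutable, so it is swapped
    leaked = handle                # keep the proxy past the lock span

try:
    leaked.get()
except RuntimeError:
    stale_access_rejected = True
else:
    stale_access_rejected = False

with balance as handle:
    final_balance = handle.get()
```

The proxy only guards access made through itself: a mutable object obtained from `get()` can still be kept and modified after unlocking, which is why a transparent proxy would be needed to close that hole.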
A drawback of this additional change is that it would require the lock object to keep track of the current live proxy(/proxies for rlock?), and invalidate it(/them) on unlocking, increasing its complexity much more than just adding a new attribute.

Thoughts?

[0] https://www.securecoding.cert.org/confluence/display/java/LCK00-J.+Use+priva...
[1] http://doc.rust-lang.org/sync/struct.Mutex.html
[2] http://doc.rust-lang.org/sync/struct.RWLock.html

Hi!

On Sat, Nov 08, 2014 at 02:05:07PM +0100, Masklinn <masklinn@masklinn.net> wrote:
> ... a proxy to the internal state only valid for the current lock span. However I do not know if it's possible to create completely transparent proxies in Python.

I think it's possible -- in a C extension. See http://www.egenix.com/products/python/mxBase/mxProxy/ for an example.

(I don't have any opinion on the proposal.)

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.

On Sat, 8 Nov 2014 14:05:07 +0100 Masklinn <masklinn@masklinn.net> wrote:
> A fairly small technical change I've been considering to improve this situation is to store the state-set inside the lock object, and only yield it through the context manager: that the state-set is protected by a lock is made obvious, and so is the relation between lock and state-set. I was delighted to discover that Rust's sync::Mutex[1] and sync::RWLock[2] follow a very similar strategy of owning the state-set
>
> It's not a panacea, it doesn't fix issues of lock acquisition ordering for instance, but I think it would go a fairly long way towards making correct use of locks easier in Python.
>
> The basic changes would be:
> * threading.Lock and threading.RLock would now take an optional `data` parameter
> * the parameter would be stored internally and not directly accessible by users of the lock
> * that parameter would be returned by __enter__ and provided to the current "owner" of the lock

For clarity it should probably be a separate class (or set of classes), e.g. DataLock.

Regards

Antoine.

On 2014-11-08, at 16:04 , Antoine Pitrou <solipsis@pitrou.net> wrote:
> On Sat, 8 Nov 2014 14:05:07 +0100 Masklinn <masklinn@masklinn.net> wrote:
>> A fairly small technical change I've been considering to improve this situation is to store the state-set inside the lock object, and only yield it through the context manager: that the state-set is protected by a lock is made obvious, and so is the relation between lock and state-set. I was delighted to discover that Rust's sync::Mutex[1] and sync::RWLock[2] follow a very similar strategy of owning the state-set
>>
>> It's not a panacea, it doesn't fix issues of lock acquisition ordering for instance, but I think it would go a fairly long way towards making correct use of locks easier in Python.
>>
>> The basic changes would be:
>> * threading.Lock and threading.RLock would now take an optional `data` parameter
>> * the parameter would be stored internally and not directly accessible by users of the lock
>> * that parameter would be returned by __enter__ and provided to the current "owner" of the lock
>
> For clarity it should probably be a separate class (or set of classes), e.g. DataLock.

On the one hand this'd allow completely ignoring backwards-compatibility issues wrt acquire() which is nice, on the other hand it would double the number of lock types and introduce redundancy as DataLock would be pretty much a strict superset of Lock, which is why I thought extending Lock made sense.

On 11/08/2014 10:42 AM, Masklinn wrote:
> On 2014-11-08, at 16:04 , Antoine Pitrou wrote:
>> On Sat, 8 Nov 2014 14:05:07 +0100 Masklinn wrote:
>>> A fairly small technical change I've been considering to improve this situation is to store the state-set inside the lock object, and only yield it through the context manager: that the state-set is protected by a lock is made obvious, and so is the relation between lock and state-set. I was delighted to discover that Rust's sync::Mutex and sync::RWLock follow a very similar strategy of owning the state-set
>>
>> For clarity it should probably be a separate class (or set of classes), e.g. DataLock.
>
> On the one hand this'd allow completely ignoring backwards-compatibility issues wrt acquire() which is nice, on the other hand it would double the number of lock types and introduce redundancy as DataLock would be pretty much a strict superset of Lock, which is why I thought extending Lock made sense.

How does transforming existing locks into this kind of lock benefit existing code? If existing code has to change to take advantage of the new features, said code could just as easily change the name of the lock it was using.

--
~Ethan~

On 2014-11-08, at 21:01 , Ethan Furman <ethan@stoneleaf.us> wrote:
> On 11/08/2014 10:42 AM, Masklinn wrote:
>> On the one hand this'd allow completely ignoring backwards-compatibility issues wrt acquire() which is nice, on the other hand it would double the number of lock types and introduce redundancy as DataLock would be pretty much a strict superset of Lock, which is why I thought extending Lock made sense.
>
> How does transforming existing locks into this kind of lock benefit existing code?

I don't think I claimed that anywhere; as far as I can tell it makes absolutely no difference to existing code.

> If existing code has to change to take advantage of the new features, said code could just as easily change the name of the lock it was using.

Yes?

On 9 November 2014 04:42, Masklinn <masklinn@masklinn.net> wrote:
> On 2014-11-08, at 16:04 , Antoine Pitrou <solipsis@pitrou.net> wrote:
>> On Sat, 8 Nov 2014 14:05:07 +0100 Masklinn <masklinn@masklinn.net> wrote:
>>> A fairly small technical change I've been considering to improve this situation is to store the state-set inside the lock object, and only yield it through the context manager: that the state-set is protected by a lock is made obvious, and so is the relation between lock and state-set. I was delighted to discover that Rust's sync::Mutex[1] and sync::RWLock[2] follow a very similar strategy of owning the state-set
>>>
>>> It's not a panacea, it doesn't fix issues of lock acquisition ordering for instance, but I think it would go a fairly long way towards making correct use of locks easier in Python.
>>>
>>> The basic changes would be:
>>> * threading.Lock and threading.RLock would now take an optional `data` parameter
>>> * the parameter would be stored internally and not directly accessible by users of the lock
>>> * that parameter would be returned by __enter__ and provided to the current "owner" of the lock
>>
>> For clarity it should probably be a separate class (or set of classes), e.g. DataLock.
>
> On the one hand this'd allow completely ignoring backwards-compatibility issues wrt acquire() which is nice, on the other hand it would double the number of lock types and introduce redundancy as DataLock would be pretty much a strict superset of Lock, which is why I thought extending Lock made sense.

Merging it into Lock would make Lock itself harder to learn and use, so the separate DataLock notion sounds better to me - it keeps the documentation separate, so folks that just want a basic Lock or RLock don't need to care that DataLock exists.

It's also worth considering just always making DataLock recursive, and not worrying about the non-recursive variant.

If you'd like to experiment with this as a 3rd party module, Graham Dumpleton's wrapt library makes it possible to write almost completely transparent proxies in pure Python: http://wrapt.readthedocs.org/en/latest/wrappers.html

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
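For anyone experimenting along these lines, an always-recursive variant might look roughly like the sketch below. It assumes a standalone class built on threading.RLock; the names `RecursiveDataLock`, `inventory`, `add_item` and `restock` are hypothetical:

```python
import threading


class RecursiveDataLock:
    """Hypothetical data-owning lock that is always re-entrant."""

    def __init__(self, data):
        self._lock = threading.RLock()  # the owning thread may re-acquire
        self._data = data

    def __enter__(self):
        self._lock.acquire()
        return self._data

    def __exit__(self, exc_type, exc_value, traceback):
        self._lock.release()
        return False  # let exceptions propagate


inventory = RecursiveDataLock({"apples": 3})


def add_item(name, count):
    with inventory as items:
        items[name] = items.get(name, 0) + count


def restock():
    # Helpers that take the lock themselves can be called while it is
    # already held by this thread, which a plain Lock would deadlock on.
    with inventory as items:
        add_item("apples", 2)
        add_item("pears", 5)
        return dict(items)


snapshot = restock()
```

With a recursive lock, helpers such as `add_item` can be composed freely; with a non-recursive one, `restock` would block forever on the nested acquisition.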
Participants (5): Antoine Pitrou, Ethan Furman, Masklinn, Nick Coghlan, Oleg Broytman