
Sigh. Sent this only to Bruce by accident. Sorry about the duplicates, Bruce.

---------- Forwarded message ----------
From: Mike Meyer <mwm@mired.org>
Date: Mon, Oct 31, 2011 at 10:30 PM
Subject: Re: [Python-ideas] Concurrent safety?
To: Bruce Leban <bruce@leapyear.org>

On Mon, 31 Oct 2011 18:07:19 -0700 Bruce Leban <bruce@leapyear.org> wrote:
> On Mon, Oct 31, 2011 at 4:19 PM, Mike Meyer <mwm@mired.org> wrote:
> >> That would mean that all of my current code is broken.
> > Pretty much, yes. It's like adding garbage collection and removing
> > alloc*/free. It's going to break a *lot* of code. That's why I said "not in
> > 3. and possibly never in cPython."
> In order to make concurrent code slightly safer, you're going to break all
> existing programs that don't use concurrency. That seems to me like a new
> language, not Python. You've been on this list long enough to see the
> attention that's paid to backward compatibility.

The way adding automatic memory management just made pointer arithmetic slightly safer. But yeah, the first thing I said was "never in 3.x, possibly never in cPython." I've been on this list long enough to know that, while the community pays a lot of attention to backwards compatibility, it is willing to throw it out when there's enough benefit.

As you point out, this is a hard problem. I know I haven't covered all the issues. That's why the second thing I said was that I'm hoping to get people smarter than me to look at things.

The CPU manufacturers have switched to improving performance by adding more cores instead of cranking clock speeds. As time goes by, more and more programmers are going to want to leverage that in serious ways. I already do, and find that Python makes some design choices unavailable or very fragile(*). I'd like to make those choices available, and help Python get ready for the time when that desire is the norm instead of the exception.
> > Furthermore, merely *reading* an object that isn't locked can cause
> >> problems. This code is not thread-safe:
> >>     if element in dictionary: return dictionary[element]
> >> so you have to decide how much safety you want and what cost we're
> >> willing to pay for this.
> > You're right - it's not thread safe. However, it also doesn't suffer from
> > the problem I'm trying to deal with, where you mutate an object in a way
> > that leaves things broken, but won't be detected at that point. If it
> > breaks because someone mutates the object underneath it, it'll throw an
> > exception at that point. I know you can construct cases where that isn't
> > so.
> I think the cases where non-thread-safe code won't throw an exception are
> numerous, for example, the equally trivial:

Again, I said such cases can be built. I *didn't* say they were exceptions, I proposed a change to deal with them.

> If you're going to tackle thread safety, it should address more of the
> problem. These bugs are in many ways worse than mutating "an object in a
> way that leaves things broken, but won't be detected at that point." The
> above bugs may *never* be detected. I've come across bugs like that that
> were in code for many years before I found them (and I'm sure that's
> happened to others on this list as well).

Like me. That's part of why I want to get the interpreter to help find them.

> The first thing to do is identify the problems you want to solve and make
> sure that the problems are well understood. Then design some solutions.
> Starting with a bad solution to a fraction of the problem isn't a good
> start.

I've identified the problem I want to solve: I want to make concurrent use of python objects "safe by default", so that doing unsafe things causes the programmer to have to do something explicit about making things safe. I believe this can be done at the mutation points (again, clojure shows that it can be done).
I also want to preserve as much of Python's existing code as possible. It may be that Python's existing data structures mean my belief about mutation points is wrong. This may be the wrong solution. It may be that such a change is too large to be acceptable. But the only way to find out is to investigate it. This discussion has already generated significant changes in the original proposal, plus some implementation ideas.

<mike

*) I'll note that some of the alternatives make the choices available in Python unavailable in a similar way.

--
Mike Meyer <mwm@mired.org>  http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
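The check-then-act snippet quoted earlier (`if element in dictionary: return dictionary[element]`) is worth spelling out: the race and its conventional fix fit in a few lines. This is only an illustrative sketch (the names and data are made up), not part of the proposal:

```python
import threading

shared = {"answer": 42}
lock = threading.Lock()

def lookup_unsafe(key):
    # Check-then-act race: another thread may delete the key between
    # the membership test and the subscript, raising KeyError even
    # though the test just succeeded.
    if key in shared:
        return shared[key]
    return None

def lookup_locked(key):
    # Holding a lock across both steps makes the pair atomic with
    # respect to any other code that also takes the lock.
    with lock:
        if key in shared:
            return shared[key]
        return None

print(lookup_locked("answer"))
```

Note that nothing here is detected by the interpreter; the safety depends entirely on every writer and reader agreeing to take the same lock, which is exactly the discipline the thread is arguing about.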

On 11/1/2011 1:32 AM, Mike Meyer wrote:
This is one of the hard problems that keep getting swept under the rug while we do easier things. Well, we have overhauled unicode and packaging for 3.3, so maybe concurrency can get some attention.

I keep thinking that CPython's design of allowing C coded modules either outside or inside the stdlib should allow some progress.

Would it be helpful, for instance, to have truly immutable restricted tuples and frozensets, whose __new__ methods only allowed true immutables (None, booleans, numbers, strings, other restricted tuples and frozensets) as members?

How about a metaclass, say 'immutable', that made the instances of a user class truly immutable? (I don't know how to do this, but let's assume it can be done -- perhaps with a new frozendict.) If such were possible, instances of instances of such a metaclass could be added to the list above.

Could a metaclass automatically add fine-grained locks around attribute mutations?

-- Terry Jan Reedy

On Mon, Oct 31, 2011 at 11:55 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Hey, it worked!
I keep thinking that CPython's design of allowing C coded modules either outside or inside the stdlib should allow some progress.
Would it be helpful, for instance, to have truly immutable restricted tuples and frozensets, whose __new__ methods only allowed true immutables (None, booleans, numbers, strings, other restricted tuples and frozensets) as members?
Possibly. However, so long as the mutations they make don't change the externally visible behavior, then for the purposes of this discussion, they already are immutable. Or is it possible that concurrent updates of that not-externally-visible state could cause things to break?
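Terry's restricted tuple could be prototyped in pure Python today. The sketch below assumes "true immutables" means the fixed type list he gave; the class name and the type tuple are hypothetical, and a real version would presumably live in C:

```python
# Types considered "truly immutable" for membership purposes (an
# assumption based on Terry's list: None, booleans, numbers, strings,
# other restricted containers).
IMMUTABLE_TYPES = (type(None), bool, int, float, complex, str, bytes, frozenset)

class RestrictedTuple(tuple):
    """Tuple whose __new__ only admits 'truly immutable' members."""
    def __new__(cls, iterable=()):
        items = tuple(iterable)
        for item in items:
            # Nested RestrictedTuples are allowed, per Terry's sketch.
            if not isinstance(item, IMMUTABLE_TYPES + (cls,)):
                raise TypeError(f"mutable member not allowed: {item!r}")
        return super().__new__(cls, items)

ok = RestrictedTuple((1, "a", RestrictedTuple((2.0, None))))
try:
    RestrictedTuple(([1, 2],))
except TypeError as exc:
    print("rejected:", exc)
```

(frozenset members are admitted unchecked here, which is a hole: a plain frozenset can't contain mutables anyway, but a frozenset *subclass* could carry mutable attributes.)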
How about a metaclass, say 'immutable', that made the instances of a user class truly immutable? (I don't know how to do this, but lets assume it can be done -- perhaps with a new frozendict.) If such were possible, instances of instances of such a metaclass could be added to the list above.
Well, on the basis that we're all adults, I'm willing to accept that a programmer saying "I want instances of this class to be immutable" means they'll only subvert whatever mechanism is used to do this when it's safe to do so (i.e. - "not externally visible"). So catching casual attempts to do so - assignments to attributes - will do, and we can do this by providing a __setattr__ method that always throws an exception.

Actually, I think that's the key to implementing this efficiently. __setattr__ on objects that aren't locked throws an exception (or triggers locking inside an STM). Locking them changes __setattr__ to something that works appropriately. Builtin types will need more extensive tweaking along those lines. An immutable type doesn't need the working variant of __setattr__.
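A minimal sketch of the always-throwing __setattr__ Mike describes (the class name is hypothetical; construction has to bypass its own guard via object.__setattr__, which is also how a "consenting adult" would subvert it):

```python
class Immutable:
    """Instances reject attribute assignment after construction."""
    def __init__(self, **kwargs):
        # Bypass our own __setattr__ while constructing.
        for name, value in kwargs.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__} instance is immutable")

p = Immutable(x=1, y=2)
print(p.x, p.y)
try:
    p.x = 10
except AttributeError as exc:
    print("blocked:", exc)
```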
Could a metaclass automatically add fine-grained locks around around attribute mutations?
Wouldn't that be another variation on the __setattr__ method, that did:

    locking self.__dict__:
        self.__dict__[name] = value

I can see that that would be useful, but would expect most objects would want to change more than one attribute in a consistent way, so they'd have a method that locked self and made all those changes.

<mike
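Since the `locking` statement above is only proposed syntax, the same idea can be approximated today with a per-instance lock. A sketch with hypothetical names, including the multi-attribute variant Mike expects most objects would want:

```python
import threading

class LockedAttrs:
    """Every attribute assignment happens under a per-instance lock."""
    def __init__(self):
        # Bypass __setattr__ to install the lock itself.
        object.__setattr__(self, "_lock", threading.RLock())

    def __setattr__(self, name, value):
        with self._lock:
            object.__setattr__(self, name, value)

    def update(self, **kwargs):
        # Consistent multi-attribute change under a single lock
        # acquisition, rather than one lock round-trip per attribute.
        with self._lock:
            for name, value in kwargs.items():
                object.__setattr__(self, name, value)

obj = LockedAttrs()
obj.update(x=1, y=2)
print(obj.x, obj.y)
```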

On 11/1/2011 7:49 PM, Mike Meyer wrote:
On Mon, Oct 31, 2011 at 11:55 PM, Terry Reedy<tjreedy@udel.edu> wrote:
Yes. Coroutines, which are a form of in-thread concurrency, are another 'under the rug' subject, which Greg has re-exposed. We need some possibly initially off-the-wall ideas for both.
It would seem to me that a list within a tuple or frozenset is just as 'dangerous' as a list that is exposed directly. Correspondingly, a safe_list that locked all appropriate operations should be equally safe within or without. Tuples containing mutables cannot be a dict key because they cannot be hashed because (truthful) mutables do not have a hash method returning an int. ({}.__hash__() and [].__hash__() are both None.) This suggests that one implementation for a safe(r) tuple class would be to try to hash a tuple upon creation (and might as well stash it away).
I know about that, but there is no way for a safe_tuple class to know what a __setattr__ method does. But my new idea here is just to call hash(). That is not completely safe, but perhaps within 'consenting adults' territory. (In other words, if someone adds a __hash__ method to a class with mutable instances, puts such instances into a safe_tuple shared with multiple threads, mutates from multiple threads, and has a problem, ... too bad. And I hope you would never have to debug such ;-). -- Terry Jan Reedy
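Terry's hash-at-creation idea is easy to sketch: hash() fails with TypeError for tuples containing unhashable (mutable) members, so calling it in __new__ both screens the members and produces a value worth stashing. Purely illustrative; the class name is made up, and as Terry notes a lying __hash__ on a mutable member defeats it:

```python
class SafeTuple(tuple):
    """Tuple that hashes itself at creation, rejecting unhashable
    (i.e. truthfully mutable) members, and stashes the result."""
    def __new__(cls, iterable=()):
        self = super().__new__(cls, iterable)
        # Raises TypeError if any member is unhashable (e.g. a list).
        self._cached_hash = hash(self)
        return self

st = SafeTuple((1, "a", (2, 3)))
print(st._cached_hash == hash(st))
try:
    SafeTuple(([1], 2))
except TypeError:
    print("rejected mutable member")
```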

On Wed, Nov 2, 2011 at 4:01 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I've been avoiding pointing out that connection, but that's one of the reasons I kept saying "thread of execution" instead of just "thread" - at least until recently. There are more ways to get concurrency than threading and processes.
Right. However, what needs to be made safe for concurrency here is the list, not the tuple. As you point out, that doesn't help for things like dict keys. This discussion was more about objects that compute and then cache a value based on their contents - hash values being one of them.
Presumably, there'd be a better way to ask that question. Something like "isinstance(object, Immutable)", where Immutable adds the appropriate __setattr__ method (though that has other problems).
Doesn't sound much worse than most concurrency bugs to me. They tend to crop up in places where it's "obvious" things aren't concurrent. Those have replaced stray pointer bugs as the nastiest bugs to deal with, at least for me. <mike

Mike Meyer writes:
Please, everybody is aware of that. Anybody against improved support for concurrency is probably in favor of criminalizing motherhood and apple pie, too. What you need to justify is the apparently expensive approach you propose, and you need to resolve the apparent contradiction between that expense and the basic argument for threads which is precisely how expensive they *aren't*.
I've identified the problem I want to solve: I want to make concurrent use of python objects "safe by default",
But that's not what you've proposed, AIUI. You've proposed making concurrent use *safer*, but not yet *safe*. That's quite different from the analogy with automatic memory management, where the programmer can't do anything dangerous with pointers (because they can't do anything at all). The analogous model for concurrency is processes, it seems to me. (I don't have a desperate need for high- performance concurrency, so I take no position on processes + message passing vs. threads + shared resources.)
so that doing unsafe things causes the programmer to have to do something explicit about making things safe.
This is un-Pythonic, IMO[1]. Python generally permits dangerous (and even ugly) things when done by "consenting adults", on the theory that the programmer knows more about her problem than Python does. It seems to me that a more Pythonic approach to this would be to provide something like STM as a metaclass, mixin class, or decorator. (Don't ask me how.)
I believe this can be done at the mutation points (again, clojure shows that it can be done).
But clojure is a Lisp-derived language. Lisp was designed as a pure functional language (although AFAIK it pretty much immediately acquired "set"), and a very large number of Lisp algorithms are designed around conses which are (like Python tuples) basically immutable (yes, I know about setcar and setcdr, but use of those functions is generally considered a bug). Whether that orientation toward immutable objects continues in Clojure I don't know, but if it does, the problem of designing a "write barrier" for mutations may be (a) simpler and (b) have less performance impact than the analogous task applied to Python. While Python-the-language does have some immutable objects like tuples and strings, it's really kind of hard to avoid use of containers like lists and dictionaries and classes with mutable objects. Footnotes: [1] But I've been clobbered for expressing my opinion on Pythonicity in the past, so don't put too much weight on that.<wink/>

On Tue, Nov 1, 2011 at 6:01 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Guido and python-dev in general *have* effectively taken a position on that, though (mainly due to Global Interpreter Lock discussions).

1. Even for threads, the recommended approach is to use queue.Queue to avoid the common concurrency issues (such as race conditions and deadlock) associated with explicit locking.
2. In Python 3, concurrent.futures offers an even *safer* and higher level interface for many concurrent workloads.
3. If you use multiple processes and serialised messages, or higher level APIs like concurrent.futures, you can not only scale to multiple cores, but also to multiple *machines*.

This has led to a quite deserved reputation for being intolerant of changes that claim to make multithreaded development "better", but only at the expense of making single-threaded development worse.

That's actually one of the more interesting aspects of PyPy's experiments with software transactional memory - the sophistication of their toolchain means that they can make the STM feature optional at the translation stage, without resorting to ugly #ifdef hackery the way we would need to in order to make such a feature similarly optional in CPython.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
sjt wrote:
Sure, as a matter of "development politics" that's pretty clear. I'm sure Mike understands that, too. (And is frustrated by it!) My point is that Mike's approach of trying to make *everything* safe for concurrency seems to point in the direction of process + message passing, but I don't claim that this proves that processes are in any sense technically superior, just that his approach needs justification beyond "hey, we need to do something about concurrency in Python!"

On Tue, Nov 1, 2011 at 1:31 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I am aware of all this. I've written large systems using queue.Queue and the multiple process/serialized messages model. I've dealt with code that tried to mix the two (*not* a good idea). The process model works really well - if you can use it. The problem is, if you can't, you lose all the protection it provides. That's the area I'm trying to address.

Also, the process model doesn't prevent these concurrency issues, it just moves them to external objects. I figure that's an even harder problem, since it can involve multiple machines. An improvement in the shared storage case might shed some light on it.
I think I've found a way to implement the proposal without having a serious impact on single-threaded code - at least in terms of performance and having to change the code. <mike

On Tue, Nov 1, 2011 at 1:01 AM, Stephen J. Turnbull <stephen@xemacs.org>wrote:
No, the proposal does make things "safe by default". The default behavior disallows all mutation. You have to do something explicit to allow it - because "explicit is better than implicit."
Adding STM would make concurrency easier to deal with, but wouldn't address the fundamental problem. The proposed change doesn't prevent users from doing dangerous (and even ugly) things. It just forces them to *think* about what they're doing beforehand. I can even see allowing immutable objects to change their attributes, with the caveat that this shouldn't change the externally visible behavior of the object.
Um, yeah, I did point out later in the paragraph that preserving Python's data types may make this assumption false.
And I also pointed out that this may be too much of a change to be palatable to Python users. For that matter, if it requires losing Python's primitive container types, it's probably too much of a change to be palatable to me.

<mike

On 1 November 2011 15:38, Mike Meyer <mwm@mired.org> wrote:
I don't know if you've considered this already, but a for-loop in Python creates an iterator and then mutates it (by calling next()) on each run through the loop. I can't see any way this could be a concurrency problem in itself, but you'd likely need to either reimplement the for loop to avoid relying on mutable iterators, or you'd need to add some sort of exclusion for iterators in for loops. It'll be details like this that will be the hardest to thrash out, I suspect... Paul.

On Tue, Nov 1, 2011 at 9:36 AM, Paul Moore <p.f.moore@gmail.com> wrote:
How about a third option? Iterators have to be locked to do a next in general, as they can be bound and thus shared between execution threads. On the other hand, locking & unlocking should be the major performance hit, so you don't want to do that on something that's going to be happening a lot, so the caller should be allowed to do something to indicate that it's not required. Locking the iterator should do that. So the next method needs to add a test to see if self is locked, and if not, lock and then unlock self.

> It'll be details like this that will be the hardest to thrash out, I
> suspect...

Yup. Thanks,

<mike
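Mike's third option - iterators that lock internally on each next() unless the caller has locked them for the whole loop - might look roughly like this sketch. All names are hypothetical, and an ordinary threading lock stands in for the proposed language support:

```python
import threading

class LockingIterator:
    """next() locks internally, unless the caller holds the lock
    for the duration of the loop (the cheap path Mike describes)."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.RLock()
        self._held_by_caller = False

    def acquire(self):
        """Lock for a whole loop; disables per-next() locking."""
        self._lock.acquire()
        self._held_by_caller = True

    def release(self):
        self._held_by_caller = False
        self._lock.release()

    def __iter__(self):
        return self

    def __next__(self):
        if self._held_by_caller:
            return next(self._it)   # caller already holds the lock
        with self._lock:            # per-call lock/unlock: the slow path
            return next(self._it)

it = LockingIterator([1, 2, 3])
it.acquire()
try:
    print(list(it))
finally:
    it.release()
```

Even this toy version shows the cost question: the unlocked path still pays a flag test and, in the general case, a lock round-trip per element.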

On 1 November 2011 21:09, Mike Meyer <mwm@mired.org> wrote:
I'm not sure what you mean here. Suppose I have

    l = [1, 2, 3]
    for i in l:
        print(i)

Here, the thing you need to lock is not l, as it's not being mutated, but the temporary iterator generated by the for loop. That's not exposed to the user, so you can't lock it manually. Should it be locked? It can never be seen from another thread. But how do you code that exception to the rule? What about

    l = iter([1, 2, 3])
    for i in l:
        print(i)

Here the for loop generates iter(l) - which, simply because of the implementation of __iter__ for iterators, returns l. So should I lock l here? It *is* exposed to other threads, potentially. How does the compiler detect the difference between this and the previous example?

This seems to me to be a recipe for having users scatter arbitrary locks around their code "just to shut the interpreter up". It's not at all clear that it helps people think, in that there's no easy mental model people can acquire to help them reason about what is going on. Just a load of exceptions that need to be silenced somehow.

Paul.
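The wrinkle Paul points at is the iterator protocol itself: iter() applied to an iterator returns that same object, so the "temporary" iterator in his second example is the very l the user already holds. This can be checked directly:

```python
l = [1, 2, 3]
list_iter = iter(l)

# A list and its iterator are distinct objects,
print(list_iter is l)

# but calling iter() on an iterator returns the iterator itself; the
# for loop in the second example therefore iterates over the object
# the user (and potentially other threads) can already see.
print(iter(list_iter) is list_iter)
```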

On Wed, Nov 2, 2011 at 2:27 AM, Paul Moore <p.f.moore@gmail.com> wrote:
You don't have to do anything. Iterators need to lock themselves to be safe in concurrent use, so this will work fine, with the temporary iterator doing whatever locking is needed.
This will work the same as the above. However, since you have the iterator in hand, you have the option to lock it before entering the for loop, which will cause it to *not* do its own internal locking. This brings up an interesting point, though - look for mail with the subject "Concurrency survey..."

<mike

On 2 November 2011 18:10, Mike Meyer <mwm@mired.org> wrote:
So all iterators automatically lock themselves? Sounds like potentially quite an overhead. But you've obviously thought about it (even if I don't follow all the details) so I'll leave it at that. Paul.

On Wed, Nov 2, 2011 at 12:31 PM, Paul Moore <p.f.moore@gmail.com> wrote:
No, all iterators are written to be thread safe. This is pretty much a requirement if you want to use them in a threaded environment. Some iterators may be able to do this without locking. I suspect most won't. This makes me wonder about something. Is there a high-performance threading world that 1) doesn't assume that threading is the norm and thus doesn't worry about single-threaded performance (this is the impression I get about Java, but I may well be wrong) and 2) doesn't require knowing at either build (C/C++) or launch (haskell) time that it's going to be threaded? I haven't found such. <mike

On 11/2/2011 5:27 AM, Paul Moore wrote:
If l is exposed to another thread, it can be mutated, and the hidden iterator in the for loop will work, but with indeterminate, or at least unexpected, results. Were you implicitly excluding such exposure? (A list can also be mutated within its iteration loop. There is even a use case for deleting an item while iterating in reverse.)

Dicts *are* locked for iteration because mutating a hash array during iteration could have more drastic effects and there is no good use case. A built-in subclass of list could use the same mechanism as dict for locking during iteration.
So no need to lock *it*. -- Terry Jan Reedy
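Terry's contrast can be demonstrated: CPython detects a size change during dict iteration and raises RuntimeError, while list iteration imposes no such check. A small illustration:

```python
d = {"a": 1, "b": 2}
try:
    for key in d:
        d["c"] = 3   # resizing the hash table mid-iteration is detected
except RuntimeError as exc:
    print("dict iteration:", exc)

lst = [1, 2, 3]
for x in lst:
    if x == 1:
        lst.append(4)   # no error: the loop simply sees the new item
print("list grew to:", lst)
```

(Strictly, the dict check catches size changes on the next call to the iterator rather than locking the dict, but the effect Terry describes - drastic mutation is refused - holds.)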

Mike Meyer writes:
The proposed change doesn't prevent users from doing dangerous (and even ugly things).
I didn't say it did, it "merely" imposes substantial inconvenience in hope that:
It just forces them to *think* about what they're doing beforehand.
which I believe to be un-Pythonic. But you say that you have an approach in mind which is reasonably performant and doesn't change things too much for single-threaded apps, which would make the discussion moot. So let's see how that works out.

On Tue, Nov 1, 2011 at 11:05 AM, Stephen J. Turnbull <stephen@xemacs.org>wrote:
Really? Thinking is unpythonic?
If all you want to do is get the old semantics back in a single-threaded application, you could do something like turning:

    if __name__ == '__main__':
        main()

into:

    if __name__ == '__main__':
        locking:
            main()

Actually, that achieves my goal - you hopefully thought about this long enough to realize that this was safe before doing it.

<mike

Mike Meyer writes:
On Tue, Nov 1, 2011 at 11:05 AM, Stephen J. Turnbull <stephen@xemacs.org>wrote:
No, "forcing" is. Consenting adults and all that.
Anybody who does that is simply shutting off the warning/errors, and clearly is not thinking about their app at all. But this is revealing: you say *your* goal is making *me* think. That's what I consider un-Pythonic. A Pythonic approach would allow me to worry about it when *I* think it necessary. Maybe we don't have that choice, maybe concurrency is too hard to solve without some annoying constraints. But that's not at all clear to me, and I'd rather make gradual progress toward safety in a language that's fun and profitable to use, rather than have safety in a language that is a pain in the neck to use.

On Wed, 02 Nov 2011 10:45:42 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
But you yourself admit that this isn't forcing you to think:
So you admit this doesn't force you to think. It just makes you add a statement to shut up the warnings. Pretty much the same thing as using a bare except clause. Me, I'd think about it long enough to convince myself that the app really was single-threaded.
But this is revealing: you say *your* goal is making *me* think.
Only if I may wind up maintaining the code you wrote. But that's a driving factor in a *lot* of the design decisions when it comes to extending python.
That's what I consider un-Pythonic.
I feel just the opposite. Python doesn't allow errors to silently pass, or guess what the programmer wanted to do, or make inferences about things - it raises exceptions. That forces the programmer to think about the exception and handle it properly. Or they can not think about it, and just use a bare except clause. I think that's very pythonic. In fact, getting tired of chasing down such bugs in Perl code was why I switched from Perl to Python, and then cut my rates in order to convince my clients to let me write in what was then a strange new language.

This proposal builds on that base: it catches errors of a type that are currently ignored and raises an exception. It also adds a new statement for *dealing* with those errors, because handling them with exceptions won't really work. There's even an analog for the bare except if you want to use it. And it comes about for much the same reason: I'm getting tired of chasing down bugs in concurrent code.

There are languages that offer that. Some even run in environments I like, and are fun to write when they're applicable. But I find myself wishing for Python's features when I write in them.
Based on my experience, your second sentence is true. If that were all it were, the Queue module would be most of a solution, and there are STM modules available if that's not good enough. But they only solve half the problem - they make it easier to get things right once you decide the data is shared. People are as likely to miss that data is shared as they are to screw up the locking. In other words, if we do it your way, it'll deal with less than half of what's bugging me.

It may be that Python's data structures will make this unworkable. It may be that a workable solution will suck the performance out of non-concurrent applications. It may be that anything that fixes both of those will be unpalatable for other reasons. There's no way to find out except by trying. And I'd rather try that than start trying to convince people to let me write in some strange new language again.

<mike

--
Mike Meyer <mwm@mired.org>  http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

On Wed, Nov 2, 2011 at 3:53 PM, Mike Meyer <mwm@mired.org> wrote:
It's forcing you to think the way Java's checked exceptions force you to think - they make you think "Gee, it's tedious having to write all this boilerplate to get the compiler/interpreter to STFU and let me get on with doing my job". "safe by default" is an excellent design principle, but so is "stay out of the way". The two are often in tension, and this is one of those times. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Mike Meyer writes:
No, "forcing" is. Consenting adults and all that.
But you yourself admit that this isn't forcing you to think:
Nice try, but I didn't say it forces me to think. It forces me to do something to shut up the language. That's ugly.
It just makes you add a statement to shut up the warnings. Pretty much the same thing as using a bare except clause.
The bare except clause is optional; I can (and often do) simply let the exception terminate the process *if* it ever happens. My understanding is that that isn't good enough for you (because concurrency errors usually lead to silent data corruption rather than a spectacular and immediate crash).
Well, if you want help chasing down bugs in concurrent code, I would think that you would want to focus on concurrent code. First, AFAICS ordinary function calls don't expose additional objects to concurrency (they may access exposed objects, of course, but they were passed in from above by a task, or are globals). So basically every object exposed to concurrency is in either args or kwargs in a call to threading.Thread (or thread.start_new_thread), no? Wouldn't it be possible to wrap those objects (and only those objects) such that the wrapper intercepts attempts to access the wrapped objects, and "does something" (warn, raise, dance on the head of a pin) if the access is unlocked or whatever? Then only concurrent code and the objects exposed to it pay the cost. If it's really feasible to do it via wrapper, you could write a decorator or something that could easily be turned into a no-op for tested code ready to go into production.
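Stephen's wrapper idea might be sketched as a proxy that only yields the real object while its lock is held, and flags any unlocked attribute access. The names and the flagging policy (raising rather than warning) are invented for illustration:

```python
import threading

class SharedProxy:
    """Wrap an object handed to a thread; complain on unlocked access."""
    def __init__(self, target):
        # object.__setattr__ keeps these out of __getattr__'s way.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_lock", threading.RLock())

    def __enter__(self):
        self._lock.acquire()
        return self._target   # locked access yields the real object

    def __exit__(self, *exc_info):
        self._lock.release()
        return False

    def __getattr__(self, name):
        # Any attribute access outside a 'with' block lands here.
        raise RuntimeError(f"unlocked access to shared object: {name!r}")

class Counter:
    def __init__(self):
        self.value = 0

shared = SharedProxy(Counter())
with shared as counter:    # locked access is fine
    counter.value += 1
try:
    shared.value           # unlocked access is flagged
except RuntimeError as exc:
    print("flagged:", exc)
```

As Mike's reply notes, this only covers the wrapped object itself, not objects reachable through it or via globals, which is where a pure-wrapper approach runs out.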
Well, no, it's not about doing it my way; I'm perfectly happy with processes and message-passing in my applications, and aside from wacky ideas like the above, that I don't really know how to implement myself, I don't have a lot of suggestions for concurrency by threading. Rather, it's that my guess is that if you don't make the costs of safe(r) concurrency look more reasonable you won't be getting much help here.

On Wed, Nov 2, 2011 at 1:49 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
No. You missed two things. First, all the objects that can be accessed - however indirectly - through those objects are exposed to concurrency. Also, any globals and anything that can be accessed through them are exposed to concurrency. No, make that three things: a wrapped C library that has callbacks to Python code and uses threads internally can expose anything it's passed (and anything accessible from those objects) to concurrency, without ever using the Python threading code.

I mentioned seeing a bug yesterday. My client's response means I can't in good faith try and fix it (it's in a testing framework, so doesn't affect the product, so they don't care). So this is a guess, but here's what I think is going on:

1) We're using a test framework that creates a mock time module for some reason. At some point, the mock object has the value None.
2) The application being tested uses a database module that uses threads as part of managing a connection pool.

The concurrency unsafe code in the test framework (which is clearly and obviously single-threaded, right?) managed to get the None-valued mock inserted in sys.modules due to a concurrency bug. So when I then use the time module in testing, I get an exception trying to access its attributes.

This does show a bigger problem. Look for my next mail...

<mike

Mike Meyer writes:
No. You missed two things.
Actually, I didn't. Containers contain; I considered that sufficiently obvious that I neglected to mention it. And I did mention globals as "exposed to concurrency". Both just expand the list of objects needing protection, but don't change the basic principle AFAICS. I point this out because I hope to convince you to concentrate on arguing *for* your proposal, instead of wasting time scoring points *against* those who question it. Please understand: *nobody* is against better support for developing concurrent programs, including those using threads to implement concurrency. But nobody is willing to pay arbitrarily high cost for arbitrarily small improvements. As for "wrapped C libraries", I'm having trouble imagining what you're talking about. Wrapped libraries won't be accessing data via Python protocols anyway, so any accesses that your framework could intercept will happen in the marshaling code. If that creates threads, I don't see why it wouldn't use threading.Thread. Ditto for Python modules implemented in C.
Again, I don't understand. In the von Neumann model of computation, all *code* is single-threaded pretty much by definition. With preemptively scheduled threads, the process executing that code is single-threaded if and only if it blocks until all other threads exit. The test framework is "obviously" multiple-threaded in the scenario you describe.
managed to get the None-valued mock inserted in sys.modules due to a concurrency bug.
I don't see your point. You claim your proposal is supposed to help identify which objects are shared. Taking that literally, this example is useless as an argument for your framework -- you already know which object is the problem. Of course, I suppose you actually meant to get more precise information in cases like this, specifically about in what execution contexts the object is mutated. But in this case, you probably already know precisely where in the code the offending mutation happens. The bug is that execution of the framework code resumes "unexpectedly" before the "subordinate" thread restores the problem object's state to status quo ante. Well, these are *threads*, so you can't say anything about when to expect them to resume, you have to expect resumption at any point. *Now* what does your framework buy you? If the answer is "nothing," you need to find better use cases to motivate your proposal.

On Thu, Nov 3, 2011 at 2:35 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
This discussion is actually starting to sound a bit like the Python security model discussions to me - ensuring that only certain kinds of objects are reachable from a given point in the code, and controlling the operations that can be performed on them. Victor Stinner eventually managed to create his pysandbox module based on those discussions, which does a fairly good job of locking things down, even though I'd personally still be inclined to back it up with additional OS level defences. Still, like the GIL vs free threading discussions, there are unlimited numbers of things that *could* be done, the question is what is *worth* doing. And that's not going to be figured out in a mailing list discussion - it's going to take real code and plenty of experimentation, and it's going to have to be done by people that don't already believe the "right" answer is to use message queues and a "no concurrent access to mutable objects" architecture at the application level (i.e. where what threads buy you is the fact that you can hand over an object just by putting the reference in a queue and dropping yours, without any serialisation overhead). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Nov 3, 2011 at 12:35 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
As for "wrapped C libraries", I'm having trouble imagining what you're talking about.
C code (currently) can create or modify python objects using a C pointer, instead of python access. That is the biggest barrier to a tracing (as opposed to reference-counting) garbage collector. It is also a problem for security sandboxes (though they can just ban C extensions). Most relevant here, C extensions can also bypass any locks or protections put in place for concurrency. -jJ

On Fri, Nov 4, 2011 at 5:36 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Jim Jewett writes: > On Thu, Nov 3, 2011 at 12:35 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
> > As for "wrapped C libraries", I'm having trouble imagining what you're > > talking about.
> C code (currently) can create or modify python objects using a C > pointer, instead of python access.
Can it?
Not if it is just an additional compile-time restriction. If the implementation is prepared to enforce the restriction at run-time, that will almost certainly require some changes to memory allocation. If done right, that might also be able to lock out rogue C code, which should also allow a tracing GC, stronger security guarantees, and GIL removal. I don't personally see it happening without a currently unacceptable slowdown for all memory access, but if he is looking 5-10 years out, it is plausible. (I would still bet against it for mainstream CPython, but it is plausible.) -jJ

On Fri, Nov 4, 2011 at 7:38 AM, Jim Jewett <jimjjewett@gmail.com> wrote:
Which is why it calls for raising exceptions at run-time.
I'm not really interested in locking out rogue C code - or at least not malicious code. This is very much in the "we're all adults here" vein, in that if you want to go out of your way to defeat it, it won't stand in your way.
I'm just trying to get the ball rolling. It clearly won't be in Python 3, so I'm looking to at least Python 4. What I expect to come out of this is an information PEP summarizing the discussion, and possibly some code that might be useful now (i.e. - a new library, or an automated "concurrency safe" auditor, or ...). <mike

On 11/2/2011 1:53 AM, Mike Meyer wrote:
On Wed, 02 Nov 2011 10:45:42 +0900 "Stephen J. Turnbull"<stephen@xemacs.org> wrote:
It is true that Python actually does a lot to protect programmers (from their own ignorance and foolishness -- but with ctypes wiping out all protections ;-). One easily overlooked example is the impossibility by design of buffer over-runs, a common security problem. But it does so without feeling like a straitjacket. Breaking a large fraction of existing code is too much of a sledgehammer approach. If Python had been designed from the beginning for multi-threading as the default, with single-threading as the exception, the case would be different. But even now, many are happy with single thread in multiple processes or multiple task objects within a single thread for concurrency. So burdening single-thread code will not be popular. -- Terry Jan Reedy

Here are issues to think about. You wrote:

    'locking' value [',' value]* ':' suite

The list of values are the objects that can be mutated in this lock. An immutable object showing up in the list of values is a TypeError. However:

(1) Greg Ewing gave an example sort of like this:

    new_balance = balance + deposit
    lock(balance)
    balance = new_balance
    unlock(balance)

and pointed out that the locks don't help. He was talking about the race condition on reading balance before writing. But it's worse than that. If these are numbers, then they're immutable and locking them does nothing, and this isn't any better:

    lock(balance)
    new_balance = balance + deposit
    balance = new_balance
    unlock(balance)

Consider this operating on lists:

    locking stuff:  #1
        stuff += added_stuff

    locking stuff:  #2
        stuff = stuff + added_stuff

In #2, locking is completely useless. Sure the values are mutable but I'm not mutating them. Is it obvious that #1 works and #2 doesn't? You don't want to lock the *value* of stuff, you want to lock the *variable*, i.e., locking globals() or wherever the value of stuff is found. Locking globals() seems like a bad idea. Which brings me to...

(2) How does locking something like a dictionary work? Does it lock the dictionary and all the values in it, walking the entire data structure to lock it? You suggest that's the case and that seems like a performance nightmare. Given

    x = [1, 2]
    d = {3: x, 4: 5}

how do I lock d[3]? That is, I want to lock the dictionary slot where the value of d[3] is stored so another thread can't do d[3] = 0. If I write

    locking d[3]:  # do stuff

that's going to lock the value of x. Another thread won't be able to do x.append(0) but they would be able to do x = 0 or d[3] = 0. If I have to write

    locking d:  # do stuff

that hampers concurrency. If I can lock the dictionary slot d[3], can I lock list slots? After all, the compiler doesn't know the type of d. How do I lock just an attribute without locking the whole object?
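The value-vs-variable distinction Bruce describes is easy to demonstrate with plain lists. A small sketch in ordinary Python (not the proposed locking syntax), showing why a lock on the *object* bound to `stuff` can only help in case #1:

```python
# Plain Python (not the proposed syntax), showing Bruce's #1 vs #2:
stuff = [1, 2]
added_stuff = [3]

before = id(stuff)
stuff += added_stuff            # 1: in-place, mutates the existing list
assert id(stuff) == before      # same object -- a lock on it could help

stuff = stuff + added_stuff     # 2: builds a new list and rebinds the name
assert id(stuff) != before      # the "locked" object is no longer involved
```

Case #2 never mutates the original list at all, which is why any object-level scheme misses it.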
(3) What happens if one thread does:

    locking d, x:
        # stuff

and another does

    locking x, d:
        # stuff

I think I know the answer and it's "deadlock". Avoiding deadlock is an important problem and by forcing me to lock every single object before I mutate it the important locks (the ones for objects that are actually shared) will be lost among the many that are unnecessary, making it very difficult to find bad lock ordering.

(4) What if the lock can't be acquired right away? Maybe there's other stuff my thread can do while it's waiting for the lock. You don't consider that. Maybe I have a whole bunch of things all of which can be done in any order.

(5) You haven't thought about read vs. write locks. If I'm passed an iterable or a sequence in a concurrent program, I want to read lock it so no one else changes it while I'm working with it. But you prohibit locking immutable objects, so I first have to find out if it's immutable, which is something else you'll have to add to the language. On Mon, Oct 31, 2011 at 10:32 PM, Mike Meyer <mwm@mired.org> wrote:
I've identified the problem I want to solve: I want to make concurrent use of python objects "safe by default", ...
To me it looks like this proposal deals with a small part of the problem with the equivalent of a sledgehammer. When I said identify the problem, the above issues are more on the lines of what I was talking about, not a general statement "make concurrency better." But as an overall goal, I think this is better: "make it as easy as possible to write error-free concurrent code." I would think that "without making it hard to write non-concurrent code" is implicit but I'll spell that out since I've heard that explicit is better than implicit. And here are some things people writing concurrent programs want to do (in no particular order):

(1) lock an object (even when the object doesn't have any code to support locking)
(2) lock part of an object (a slot in a list or dictionary, or an attribute) - consider that databases support row and range locking because if the only thing you can do is lock entire tables, you're going to get poor performance
(3) lock multiple things at the same time
(4) queue operations to be performed on an object when it can be locked, which requires ...
(5) wait until a queued operation completes, which requires ...
(6) avoid starvation
(7) avoid deadlock
(8) avoid other concurrency bugs
(9) avoid debugging (not avoid it when it's necessary but do things to avoid it being necessary)

... so that doing unsafe things
Clojure is very different from Python. Core data structures are immutable in Clojure so adding to a list creates a new also immutable list. Changing Python's core data structures to be immutable would be a bigger compatibility break than 2 to 3. --- Bruce

On Thu, Nov 3, 2011 at 12:39 AM, Bruce Leban <bruce@leapyear.org> wrote:
Yup. You can't keep people from writing code that's just wrong. But they can't get it right if they don't add any locking at all.
This case would still throw an exception, because what needs to be locked isn't balance, but whatever balance is an attribute of. Unless it's a local variable, in which case it doesn't need to be locked. Given the code as is, balance needing to be locked would make it global, so you'd lock the module. A more realistic implementation would be if balance were self.balance, in which case you'd lock self.
Right. Possibly "value" was the wrong choice for a placeholder in the original description, because what you're actually locking is a container of some sort.
If I suggested that, then it was unintentional. Yes, it's a performance nightmare. Locking a container just locks for changes to set of contained objects. It's analogous to tuples being immutable: you can't change the set of objects in the tuple, but if those objects are mutable, you can change them. If you want to change an object in a list/dictionary/etc - then you have to lock that object, but not the dictionary. Given
Yes, it does. But so does any form of locking. This proposal locks objects. Changes that don't change the object - even if they might change its value by changing a contained object - aren't locked, because of the expense you pointed out. If there's some different level of granularity for locking that makes sense, I don't know what it is.
Attributes are names. You don't lock names, you lock objects. To prevent binding of an attribute, you lock the object it's an attribute of. If you want to prevent mutating the value of the object bound to the attribute, you lock that object.
That one I dealt with in the specification. There's an implementation requirement that if two locking statements share objects, the shared objects have to be locked in the same order. So one of the two will lock things in the opposite of the order they appear in the statement.
I think I know the answer and it's "deadlock".
No, avoiding deadlock is one of the goals. That's why that requirement exists, and there's a further requirement that you can only nest locking statements if the inner one locks a subset of the outer one. I have yet to work out how an automatic STM version (this wasn't in the original proposal) would interact here. <mike

Mike Meyer wrote:
This case would still throw an exception, because what needs to be locked isn't balance, but whatever balance is an attribute of.
All right, suppose we have an "account" object that holds the balance. Now we can do

    locking account:
        account.balance = account.balance + deposit

That's okay. But what if there are two accounts involved, and we want to do

    account1.balance = account1.balance - amount
    account2.balance = account2.balance + amount

These two operations need to be done atomically, but there is no single object to lock, so we have to lock both of them -- and then there is an opportunity for deadlock if two threads do this in different orders.
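Greg's transfer example is the classic lock-ordering problem. A sketch of the standard workaround using today's threading.Lock (the per-object `lock` attribute is an assumption made for illustration, not part of anyone's proposal):

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()   # hypothetical per-object lock

def transfer(src, dst, amount):
    # Acquire both locks in one canonical order (here: ascending id) so
    # that transfer(a, b, ...) and transfer(b, a, ...) can never deadlock.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(50)
transfer(a, b, 30)
transfer(b, a, 10)
```

Both calls lock the same pair of objects in the same global order, regardless of the argument order at the call site.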
and there's a further requirement that you can only nest locking statements if the inner one locks a subset of the outer one.
Then it's *impossible* to solve the above problem, because neither of the objects needing to be locked is contained in the other. The notion of containment is not even well defined in general, because Python objects can form arbitrary graphs. -- Greg

On Thu, Nov 3, 2011 at 2:35 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
That's why the proposal specifies that locking statements determine the order of the locking, and require that any two locking statements lock all objects they have in common in the same order - at least during one run of the application. So the above is done by:

    locking account1, account2:
        # stuff

If a second locking statement does "locking account2, foobar, account1", then account1 and account2 will be locked in the same order by both statements.
I wasn't clear enough. The sets I'm talking about are the objects being locked, not anything they may contain, in order to prevent deadlocks. So if you start by doing:

    locking account1, account2, foobar:

and then later on - but with those locks still held - do

    locking account1, account2:

things work fine (the second locking is a nop). However, if you do (under the same condition):

    locking account1, aardvark:

you get an exception - the outer lock has to lock aardvark as well. And yes, the implications still need to be worked out.
Containment has a very specific meaning for this specification, in that an object only contains things it has a direct reference to. That should be spelled out, or maybe I need to use a better word. <mike

On Thu, Nov 3, 2011 at 9:38 AM, Mike Meyer <mwm@mired.org> wrote:
You conveniently ignore my comment about the importance of database row locking. :-)
You're handwaving. How would that work? There's no way to know at compile time what the correct order is. So locking just got a whole lot more complicated. But it's worse than complicated: it's unworkable. No, avoiding deadlock is one of the goals. That's why that requirement
exists, and there's a further requirement that you can only nest locking statements if the inner one locks a subset of the outer one.
In other words, this code is not allowed:

    def f():
        locking x:
            for y in x:
                locking y:
                    y.append(1)

And neither is this:

    def f(x):
        locking x:
            x.append(1)

if it is called inside code that has something locked. Which virtually all code will do since it's required everywhere. So your proposal is that I must lock everything before I change it but I can't lock it unless it's already locked. That's ridiculous. You might as well have a single semaphore to lock everything.

We're wasting our time. You said above "You don't lock names, you lock objects." Why not? Because that was your original plan and your mind is closed to other ideas? You want to ignore legitimate issues that aren't convenient to your proposal. I hate to say things like "this idea sucks," but sometimes that's the only way to put it. I think that this proposal is a terrible idea. It's pointless to argue about the details of something this bad. A healthy discussion about how to make concurrency better might be interesting but as long as all you want to do is defend your proposal, this thread isn't sparking useful discussion. Sorry to be so blunt but I don't see any other way to get the message across. --- Bruce

On 04/11/2011 01:51, Bruce Leban wrote:
[snip] If they're threads running in the same address space in CPython then they could lock always in order of increasing address. In an implementation of Python which uses references they could lock always in increasing numeric value of reference (if that's possible). This would not, of course, protect against general nested locks.

MRAB writes:
On 04/11/2011 01:51, Bruce Leban wrote:
OK, I think that works.
That doesn't make sense to me; if it's not an address, how do you know it's unique (you know the object is unique, of course, but there may be other references)? If the reference is not unique, how do you know that some other thread hasn't locked it via a different reference? I think for this to be workable you would need to use something like a lock tick that would increase with each lock taken, and store that in the object. Then you would lock the objects in order of their ticks, with unlocked objects coming last. (You'd have to be careful about the wraparound, though, and that could double or triple the performance hit for incrementing the tick and determining lock order.)
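A minimal sketch of the lock-tick idea, ignoring the wraparound problem Stephen mentions, and folding "unlocked objects come last" into "assign a ticket on first use" (all names here are made up for illustration):

```python
import itertools
import threading

_next_tick = itertools.count(1)
_tick_guard = threading.Lock()

def lock_tick(obj):
    # Give each object a ticket the first time it is locked; locks are
    # then always taken in ascending-ticket order.
    with _tick_guard:
        tick = getattr(obj, '_lock_tick', None)
        if tick is None:
            tick = next(_next_tick)
            obj._lock_tick = tick
        return tick

def lock_order(objects):
    # The order in which to acquire the locks for these objects.
    return sorted(objects, key=lock_tick)

class Box:
    pass

first, second = Box(), Box()
lock_tick(first)     # gets ticket 1
lock_tick(second)    # gets ticket 2
ordered = lock_order([second, first])
```

Because tickets are stable once assigned, any two threads that lock an overlapping set of objects acquire the shared ones in the same order.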

On 04/11/2011 03:53, Stephen J. Turnbull wrote:
In CPython the implementation refers to an object by its address, but my understanding is that in an implementation which uses references the reference is actually an index into a table which contains the addresses of the objects, therefore you would be locking in order of increasing index.

MRAB writes:
Ah, OK. So a reference is actually an object in the sense of an identifiable region of memory, but it's not a Python object. When I see "reference" I think of C++ references or git refs, which are just names, and an object can have many references. But the scheme you describe is like a C stdio file descriptor, and presumably is an indirection which allows relocating Python objects in memory. You can even relocate the table in such a scheme.

On Thu, Nov 3, 2011 at 6:51 PM, Bruce Leban <bruce@leapyear.org> wrote:
I assure you it was unintentional, but there's been a lot of text flowing here. I believe I've answered every comment, though I know some of them were answered when first brought up, and not every time thereafter. And some of the answers were "Yes, that's an issue" or "Yes, but we can't keep people from writing bad code". I don't recall anything specifically about database row locking, but if you feel it's important enough to reiterate it, I'll treat it as equally important.
How is a requirement just handwaving? There is no "correct" order. Given two objects, any locking statement will always lock them in the same order. That's sufficient to prevent deadlocks. Sorting the list passed to each value with a key of id will do the trick, but may not be the best way to do it.
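A sketch of what "sort by id" might look like as a context manager standing in for the proposed statement. The global lock table is an assumption made for illustration; a real implementation would need weak references so entries die with their objects, and would have to cope with duplicate arguments:

```python
import threading
from contextlib import contextmanager

# Hypothetical lock table: one Lock per object id.
_locks = {}
_locks_guard = threading.Lock()

@contextmanager
def locking(*objects):
    # Acquire in a single global order (ascending id) so any two locking
    # statements that share objects always lock them in the same order.
    ordered = sorted(objects, key=id)
    with _locks_guard:
        locks = [_locks.setdefault(id(o), threading.Lock()) for o in ordered]
    for lock in locks:
        lock.acquire()
    try:
        yield
    finally:
        for lock in reversed(locks):
            lock.release()

a, b = [], []
with locking(a, b):
    a.append(1)
    b.append(2)
```

The call sites never need to agree on argument order; the sort imposes the ordering.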
Yup, it's pretty bad. For that reason - among others - the proposal has mutated. The original required listing objects, and either started an STM transaction, or did explicit locks (undecided). Someone pointed out that the STM variant couldn't deal with IO, causing me to note a change to the proposal wherein listing objects caused explicit locking, and having an empty list started an STM transaction. This is not only easier to use, but solves several problems - including this one. The STM variant hasn't gotten a lot of attention.
No, because I couldn't think of a way to make locking names work. You presented a problem with locking names. Since the proposal doesn't lock names, I figured I wasn't clear enough in the first place, and tried to fix that. I've repeatedly said this wasn't meant to be a finished idea, and asked for alternatives. If you have a proposal that works by locking names, by all means tell us about it!
You want to ignore legitimate issues that aren't convenient to your proposal.
No, I don't. I want to collect them all. But I'm not perfect - sometimes two comments may look similar when they in fact aren't.
That may well be the case, and wouldn't particularly disappoint me.
I'm doing a lot more than defending the proposal. I'm tweaking this proposal to solve problems people point out, collecting ideas for alternative approaches, implementations, and tools that might help - which are what make this discussion interesting and useful. I do defend the proposal, especially when people point out issues that were dealt with in the original post, which isn't interesting, but the alternative (ignoring them) seems less useful. <mike

On Fri, Nov 4, 2011 at 1:11 PM, Mike Meyer <mwm@mired.org> wrote:
No, because I couldn't think of a way to make locking names work.
Replace dicts (or at least globals and attribute dicts) with something that enforces your policy -- whether that is by a mutex, a mutex per entry, returning only copies of the values, etc... -jJ

On Fri, Nov 4, 2011 at 10:49 AM, Jim Jewett <jimjjewett@gmail.com> wrote:
Sorry, I wasn't sufficiently detailed. I never worried about how to implement locking names, because I never could find a way to let people use it. While I'd like fine-grained locking (i.e. - specific attributes of objects, the value at an index in a list or dictionary, etc.), you still need to be able to lock objects. The ways I came up with for specifying "here's a name we can't rebind" were all very complicated - among other things, what's a "name" is itself complicated - and not worth the gain you got over object-level locking. Especially if you have an STM option available. As always, if you can think of a way to get all this working together, please share! Thanks, <mike

On 11/1/2011 1:32 AM, Mike Meyer wrote:
This is one of the hard problems that keep getting swept under the rug while we do easier things. Well, we have overhauled unicode and packaging for 3.3, so maybe concurrency can get some attention. I keep thinking that CPython's design of allowing C coded modules either outside or inside the stdlib should allow some progress. Would it be helpful, for instance, to have truly immutable restricted tuples and frozensets, whose __new__ methods only allowed true immutables (None, booleans, numbers, strings, other restricted tuples and frozensets) as members? How about a metaclass, say 'immutable', that made the instances of a user class truly immutable? (I don't know how to do this, but let's assume it can be done -- perhaps with a new frozendict.) If such were possible, instances of instances of such a metaclass could be added to the list above. Could a metaclass automatically add fine-grained locks around attribute mutations? -- Terry Jan Reedy

On Mon, Oct 31, 2011 at 11:55 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Hey, it worked!
I keep thinking that CPython's design of allowing C coded modules either outside or inside the stdlib should allow some progress.
Would it be helpful, for instance, to have truly immutable restricted tuples and frozensets, whose __new__ methods only allowed true immutables (None, booleans, numbers, strings, other restricted tuples and frozensets) as members?
Possibly. However, so long as the mutations they make don't change the externally visible behavior, then for the purposes of this discussion, they already are immutable. Or is it possible that concurrent updates of that not-externally-visible state could cause things to break?
How about a metaclass, say 'immutable', that made the instances of a user class truly immutable? (I don't know how to do this, but let's assume it can be done -- perhaps with a new frozendict.) If such were possible, instances of instances of such a metaclass could be added to the list above.
Well, on the basis that we're all adults, I'm willing to accept that a programmer saying "I want instances of this class to be immutable" means they'll only subvert whatever mechanism is used to do this when it's safe to do so (i.e. - "not externally visible"). Catching casual attempts to do so - assignments to attributes - will do, and we can do that by providing a __setattr__ method that always throws an exception. Actually, I think that's the key to implementing this efficiently. __setattr__ on objects that aren't locked throws an exception (or triggers locking inside an STM). Locking them changes __setattr__ to something that works appropriately. Builtin types will need more extensive tweaking along those lines. An immutable type doesn't need the working variant of __setattr__.
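A sketch of the casual-immutability idea in today's Python (the class name and keyword-argument constructor are made up for illustration):

```python
class Immutable:
    # Sketch only: catches casual attribute assignment.  A determined
    # programmer can still call object.__setattr__ directly -- which is
    # exactly the "consenting adults" escape hatch described above.
    def __init__(self, **fields):
        for name, value in fields.items():
            object.__setattr__(self, name, value)   # bypass our own guard

    def __setattr__(self, name, value):
        raise TypeError("%s instances are immutable" % type(self).__name__)

p = Immutable(x=1, y=2)
```

Reads work normally; `p.x = 5` raises TypeError.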
Could a metaclass automatically add fine-grained locks around attribute mutations?
Wouldn't that be another variation on the __setattr__ method, that did:

    locking self.__dict__:
        self.__dict__[name] = value

I can see that that would be useful, but would expect most objects would want to change more than one attribute in a consistent manner, so they'd have a method that locked self and made all those changes. <mike
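A sketch of that variation using today's Python, with a per-instance lock standing in for the proposed locking statement (a base class rather than a metaclass, purely to keep the sketch short):

```python
import threading

class AutoLocked:
    # Sketch of Terry's metaclass idea as a plain base class: every
    # attribute assignment happens under a per-instance lock.
    def __init__(self):
        object.__setattr__(self, '_attr_lock', threading.RLock())

    def __setattr__(self, name, value):
        with self._attr_lock:              # fine-grained: one attribute write
            object.__setattr__(self, name, value)

class Counter(AutoLocked):
    def __init__(self):
        super().__init__()
        self.value = 0                     # goes through the locked path

c = Counter()
c.value = 5
```

As the text notes, this only makes single-attribute writes atomic; multi-attribute invariants still need a method that holds the lock across all the changes.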

On 11/1/2011 7:49 PM, Mike Meyer wrote:
On Mon, Oct 31, 2011 at 11:55 PM, Terry Reedy<tjreedy@udel.edu> wrote:
Yes. Coroutines, which are a form of in-thread concurrency, are another 'under the rug' subject, which Greg has re-exposed. We need some possibly initially off-the-wall ideas for both.
It would seem to me that a list within a tuple or frozenset is just as 'dangerous' as a list that is exposed directly. Correspondingly, a safe_list that locked all appropriate operations should be equally safe within or without. Tuples containing mutables cannot be a dict key because they cannot be hashed because (truthful) mutables do not have a hash method returning an int. ({}.__hash__() and [].__hash__() are both None.) This suggests that one implementation for a safe(r) tuple class would be to try to hash a tuple upon creation (and might as well stash it away).
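A sketch of that safe(r) tuple class: try to hash on creation and stash the result (the class name is illustrative; as noted below, a mutable member with a lying __hash__ defeats it):

```python
class safe_tuple(tuple):
    # Sketch: hashing at creation rejects (truthfully) mutable members,
    # since list.__hash__ and dict.__hash__ are None; the result is
    # stashed so it never needs recomputing.
    def __new__(cls, iterable=()):
        self = super().__new__(cls, iterable)
        self._stashed_hash = tuple.__hash__(self)  # TypeError if unhashable
        return self

    def __hash__(self):
        return self._stashed_hash

t = safe_tuple((1, 2, 'a'))
```

`safe_tuple(([1], 2))` raises TypeError at construction rather than later at dict-insertion time.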
I know about that, but there is no way for a safe_tuple class to know what a __setattr__ method does. But my new idea here is just to call hash(). That is not completely safe, but perhaps within 'consenting adults' territory. (In other words, if someone adds a __hash__ method to a class with mutable instances, puts such instances into a safe_tuple shared with multiple threads, mutates from multiple threads, and has a problem, ... too bad. And I hope you would never have to debug such ;-). -- Terry Jan Reedy

On Wed, Nov 2, 2011 at 4:01 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I've been avoiding pointing out that connection, but that's one of the reasons I kept saying "thread of execution" instead of just "thread" - at least until recently. There are more ways to get concurrency than threading and processes.
Right. However, what needs to be made safe for concurrency here is the list, not the tuple. As you point out, that doesn't help for things like dict keys. This discussion was more about objects that compute and then cache a value based on their contents - hash values being one of them.
Presumably, there'd be a better way to ask that question. Something like "isinstance(object, Immutable)" where Immutable adds the appropriate __setattr__ method (though that has other problems).
Doesn't sound much worse than most concurrency bugs to me. They tend to crop up in places where it's "obvious" things aren't concurrent. Those have replaced stray pointer bugs as the nastiest bugs to deal with, at least for me. <mike

Mike Meyer writes:
Please, everybody is aware of that. Anybody against improved support for concurrency is probably in favor of criminalizing motherhood and apple pie, too. What you need to justify is the apparently expensive approach you propose, and you need to resolve the apparent contradiction between that expense and the basic argument for threads which is precisely how expensive they *aren't*.
I've identified the problem I want to solve: I want to make concurrent use of python objects "safe by default",
But that's not what you've proposed, AIUI. You've proposed making concurrent use *safer*, but not yet *safe*. That's quite different from the analogy with automatic memory management, where the programmer can't do anything dangerous with pointers (because they can't do anything at all). The analogous model for concurrency is processes, it seems to me. (I don't have a desperate need for high- performance concurrency, so I take no position on processes + message passing vs. threads + shared resources.)
so that doing unsafe things causes the programmer to have to do something explicit about making things safe.
This is un-Pythonic, IMO[1]. Python generally permits dangerous (and even ugly) things when done by "consenting adults", on the theory that the programmer knows more about her problem than Python does. It seems to me that a more Pythonic approach to this would be to provide something like STM as a metaclass, mixin class, or decorator. (Don't ask me how.)
I believe this can be done at the mutation points (again, clojure shows that it can be done).
But clojure is a Lisp-derived language. Lisp was designed as a pure functional language (although AFAIK it pretty much immediately acquired "set"), and a very large number of Lisp algorithms are designed around conses which are (like Python tuples) basically immutable (yes, I know about setcar and setcdr, but use of those functions is generally considered a bug). Whether that orientation toward immutable objects continues in Clojure I don't know, but if it does, the problem of designing a "write barrier" for mutations may be (a) simpler and (b) have less performance impact than the analogous task applied to Python. While Python-the-language does have some immutable objects like tuples and strings, it's really kind of hard to avoid use of containers like lists and dictionaries and classes with mutable objects. Footnotes: [1] But I've been clobbered for expressing my opinion on Pythonicity in the past, so don't put too much weight on that.<wink/>

On Tue, Nov 1, 2011 at 6:01 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Guido and python-dev in general *have* effectively taken a position on that, though (mainly due to Global Interpreter Lock discussions). 1. Even for threads, the recommended approach is to use queue.Queue to avoid the common concurrency issues (such as race conditions and deadlock) associated with explicit locking 2. In Python 3, concurrent.futures offers an even *safer* and higher level interface for many concurrent workloads 3. If you use multiple processes and serialised messages, or higher level APIs like concurrent.futures, you can not only scale to multiple cores, but also to multiple *machines*. This has led to a quite deserved reputation for being intolerant of changes that claim to make multithreaded development "better", but only at the expense of making single-threaded development worse. That's actually one of the more interesting aspects of PyPy's experiments with software transactional memory - the sophistication of their toolchain means that they can make the STM feature optional at the translation stage, without resorting to ugly #ifdef hackery the way we would need to in order to make such a feature similarly optional in CPython. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
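For reference, the queue.Queue style Nick mentions looks like this: no object is ever touched by two threads at once, because each item is owned by exactly one thread at a time:

```python
import queue
import threading

# Share nothing, pass messages: items are handed over through queues.
tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:        # sentinel: shut down
            break
        results.put(item * item)   # exclusive owner of item; no locks needed

t = threading.Thread(target=worker)
t.start()
for n in range(5):
    tasks.put(n)
tasks.put(None)
t.join()
squares = sorted(results.get() for _ in range(5))
```

The queues do all the synchronisation; the application code contains no explicit locking at all.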

Nick Coghlan writes:
sjt wrote:
Sure, as a matter of "development politics" that's pretty clear. I'm sure Mike understands that, too. (And is frustrated by it!) My point is that Mike's approach of trying to make *everything* safe for concurrency seems to point in the direction of process + message passing, but I don't claim that this proves that processes are in any sense technically superior, just that his approach needs justification beyond "hey, we need to do something about concurrency in Python!"

On Tue, Nov 1, 2011 at 1:31 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I am aware of all this. I've written large systems using Queue.queue and the multiple process/serialized messages model. I've dealt with code that tried to mix the two (*not* a good idea). The process model works really well - if you can use it. The problem is, if you can't, you lose all the protection it provides. That's the area I'm trying to address. Also, the process model doesn't prevent these concurrency issues, it just moves them to external objects. I figure that's an even harder problem, since it can involve multiple machines. An improvement in the shared storage case might shed some light on it.
I think I've found a way to implement the proposal without having a serious impact on single-threaded code - at least in terms of performance and having to change the code. <mike

On Tue, Nov 1, 2011 at 1:01 AM, Stephen J. Turnbull <stephen@xemacs.org>wrote:
No, the proposal does make things "safe by default". The default behavior disallows all mutation. You have to do something explicit to allow it - because "explicit is better than implicit."
Adding STM would make concurrency easier to deal with, but wouldn't address the fundamental problem. The proposed change doesn't prevent users from doing dangerous (and even ugly) things. It just forces them to *think* about what they're doing beforehand. I can even see allowing immutable objects to change their attributes, with the caveat that this shouldn't change the externally visible behavior of the object.
Um, yeah, I did point out later in the paragraph that preserving Python's data types may make this assumption false.
And I also pointed out that this may be too much of a change to be palatable to Python users. For that matter, if it requires losing Python's primitive container types, it's probably too much of a change to be palatable to me. <mike

On 1 November 2011 15:38, Mike Meyer <mwm@mired.org> wrote:
I don't know if you've considered this already, but a for-loop in Python creates an iterator and then mutates it (by calling next()) on each run through the loop. I can't see any way this could be a concurrency problem in itself, but you'd likely need to either reimplement the for loop to avoid relying on mutable iterators, or you'd need to add some sort of exclusion for iterators in for loops. It'll be details like this that will be the hardest to thrash out, I suspect... Paul.
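Paul's point can be made concrete: a for loop is roughly sugar for explicit calls into the iterator protocol, with a hidden iterator object that is mutated by every next() call (a sketch; `desugared_for` is an invented helper name):

```python
def desugared_for(iterable, body):
    """Roughly what 'for item in iterable: body(item)' expands to."""
    it = iter(iterable)          # the hidden iterator the loop creates
    while True:
        try:
            item = next(it)      # each step *mutates* the iterator's state
        except StopIteration:
            break
        body(item)

seen = []
desugared_for([1, 2, 3], seen.append)
# seen is now [1, 2, 3]
```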

On Tue, Nov 1, 2011 at 9:36 AM, Paul Moore <p.f.moore@gmail.com> wrote:
How about a third option? Iterators generally have to be locked to do a next(), as they can be bound to a name and thus shared between execution threads. On the other hand, locking and unlocking would be the major performance cost, so you don't want to pay it for something that's going to happen a lot; the caller should therefore be allowed to indicate that it's not required. Locking the iterator should do that. So the next method needs to add a test to see if self is locked, and if not, lock and then unlock self.
It'll be details like this that will be the hardest to thrash out, I suspect...
Yup. Thanks, <mike
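Mike's scheme can be sketched with today's threading primitives: next() takes an internal lock unless the caller has already locked the iterator explicitly (the class and method names here are invented for illustration, not part of the proposal):

```python
import threading

class LockableIterator:
    """Sketch: next() locks the iterator internally unless the caller
    has already locked it explicitly, in which case it skips locking."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.RLock()
        self._held_by_caller = False

    def lock(self):
        self._lock.acquire()
        self._held_by_caller = True

    def unlock(self):
        self._held_by_caller = False
        self._lock.release()

    def __iter__(self):
        return self

    def __next__(self):
        if self._held_by_caller:      # caller holds the lock: no overhead
            return next(self._it)
        with self._lock:              # otherwise lock around each step
            return next(self._it)

it = LockableIterator([1, 2])
it.lock()                             # opt out of per-step locking
values = list(it)
it.unlock()
```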

On 1 November 2011 21:09, Mike Meyer <mwm@mired.org> wrote:
I'm not sure what you mean here. Suppose I have

    l = [1, 2, 3]
    for i in l:
        print(i)

Here, the thing you need to lock is not l, as it's not being mutated, but the temporary iterator generated by the for loop. That's not exposed to the user, so you can't lock it manually. Should it be locked? It can never be seen from another thread. But how do you code that exception to the rule? What about

    l = iter([1, 2, 3])
    for i in l:
        print(i)

Here the for loop generates iter(l) - which, simply because of the implementation of __iter__ for iterators, returns l. So should I lock l here? It *is* exposed to other threads, potentially. How does the compiler detect the difference between this and the previous example? This seems to me to be a recipe for having users scatter arbitrary locks around their code "just to shut the interpreter up". It's not at all clear that it helps people think, in that there's no easy mental model people can acquire to help them reason about what is going on. Just a load of exceptions that need to be silenced somehow. Paul.

On Wed, Nov 2, 2011 at 2:27 AM, Paul Moore <p.f.moore@gmail.com> wrote:
You don't have to do anything. Iterators need to lock themselves to be safe in concurrent use, so this will work fine, with the temporary iterator doing whatever locking is needed.
This will work the same as the above. However, since you have the iterator in hand, you have the option to lock it before entering the for loop, which will cause it to *not* do its own internal locking. This brings up an interesting point, though - look for mail with the subject "Concurrency survey..." <mike

On 2 November 2011 18:10, Mike Meyer <mwm@mired.org> wrote:
So all iterators automatically lock themselves? Sounds like potentially quite an overhead. But you've obviously thought about it (even if I don't follow all the details) so I'll leave it at that. Paul.

On Wed, Nov 2, 2011 at 12:31 PM, Paul Moore <p.f.moore@gmail.com> wrote:
No, all iterators are written to be thread safe. This is pretty much a requirement if you want to use them in a threaded environment. Some iterators may be able to do this without locking. I suspect most won't. This makes me wonder about something. Is there a high-performance threading world that 1) doesn't assume that threading is the norm and thus doesn't worry about single-threaded performance (this is the impression I get about Java, but I may well be wrong) and 2) doesn't require knowing at either build (C/C++) or launch (haskell) time that it's going to be threaded? I haven't found such. <mike

On 11/2/2011 5:27 AM, Paul Moore wrote:
If l is exposed to another thread, it can be mutated, and the hidden iterator in the for loop will work, but with indeterminate, or at least unexpected, results. Were you implicitly excluding such exposure? (A list can also be mutated within its iteration loop. There is even a use case for deleting an item while iterating in reverse.) Dicts *are* locked for iteration because mutating a hash array during iteration could have more drastic effects and there is no good use case. A built-in subclass of list could use the same mechanism as dict for locking during iteration.
So no need to lock *it*. -- Terry Jan Reedy
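Terry's point about dicts being "locked" for iteration is visible in CPython today: growing a dict mid-iteration raises RuntimeError, while the reverse-order deletion idiom on lists works fine (a small demonstration relying on CPython behavior):

```python
d = {1: 'a', 2: 'b'}
err = None
try:
    for k in d:
        d[k * 10] = 'new'     # mutating the hash table during iteration
except RuntimeError as exc:
    err = str(exc)            # "dictionary changed size during iteration"

# Deleting from a list while iterating indices in reverse is safe,
# since positions before the current index never shift.
lst = [1, 2, 3, 4]
for i in reversed(range(len(lst))):
    if lst[i] % 2 == 0:
        del lst[i]
# lst is now [1, 3]
```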

Mike Meyer writes:
The proposed change doesn't prevent users from doing dangerous (and even ugly) things.
I didn't say it did, it "merely" imposes substantial inconvenience in hope that:
It just forces them to *think* about what they're doing beforehand.
which I believe to be un-Pythonic. But you say that you have an approach in mind which is reasonably performant and doesn't change things too much for single-threaded apps, which would make the discussion moot. So let's see how that works out.

On Tue, Nov 1, 2011 at 11:05 AM, Stephen J. Turnbull <stephen@xemacs.org>wrote:
Really? Thinking is unpythonic?
If all you want to do is get the old semantics back in a single-threaded application, you could do something like turning:

    if __name__ == '__main__':
        main()

into:

    if __name__ == '__main__':
        locking:
            main()

Actually, that achieves my goal - you hopefully thought about this long enough to realize that this was safe before doing it. <mike

Mike Meyer writes:
On Tue, Nov 1, 2011 at 11:05 AM, Stephen J. Turnbull <stephen@xemacs.org>wrote:
No, "forcing" is. Consenting adults and all that.
Anybody who does that is simply shutting off the warning/errors, and clearly is not thinking about their app at all. But this is revealing: you say *your* goal is making *me* think. That's what I consider un-Pythonic. A Pythonic approach would allow me to worry about it when *I* think it necessary. Maybe we don't have that choice, maybe concurrency is too hard to solve without some annoying constraints. But that's not at all clear to me, and I'd rather make gradual progress toward safety in a language that's fun and profitable to use, rather than have safety in a language that is a pain in the neck to use.

On Wed, 02 Nov 2011 10:45:42 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
But you yourself admit that this isn't forcing you to think:
So you admit this doesn't force you to think. It just makes you add a statement to shut up the warnings. Pretty much the same thing as using a bare except clause. Me, I'd think about it long enough to convince myself that the app really was single-threaded.
But this is revealing: you say *your* goal is making *me* think.
Only if I may wind up maintaining the code you wrote. But that's a driving factor in a *lot* of the design decisions when it comes to extending python.
That's what I consider un-Pythonic.
I feel just the opposite. Python doesn't allow errors to silently pass, or guess what the programmer wanted to do, or make inferences about things - it raises exceptions. That forces the programmer to think about the exception and handle it properly. Or they can not think about it, and just use a bare except clause. I think that's very pythonic. In fact, getting tired of chasing down such bugs in Perl code was why I switched from Perl to Python, and then cut my rates in order to convince my clients to let me write in what was then a strange new language. This proposal builds on that base: it catches errors of a type that are currently ignored and raises an exception. It also adds a new statement for *dealing* with those errors, because handling them with exceptions won't really work. There's even an analog for the bare except if you want to use it. And it comes about for much the same reason: I'm getting tired of chasing down bugs in concurrent code. There are languages that offer that. Some even run in environments I like, and are fun to write when they're applicable. But I find myself wishing for Python's features when I write in them.
Based on my experience, your second sentence is true. If that were all it were, the Queue module would be most of a solution, and there are STM modules available if that's not good enough. But they only solve half the problem - they make it easier to get things right once you decide the data is shared. People are as likely to miss that data is shared as they are to screw up the locking. In other words, if we do it your way, it'll deal with less than half of what's bugging me. It may be that Python's data structures will make this unworkable. It may be that a workable solution will suck the performance out of non-concurrent applications. It may be that anything that fixes both of those will be unpalatable for other reasons. There's no way to find out except by trying. And I'd rather try that than start trying to convince people to let me write in some strange new language again. <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

On Wed, Nov 2, 2011 at 3:53 PM, Mike Meyer <mwm@mired.org> wrote:
It's forcing you to think the way Java's checked exceptions force you to think - they make you think "Gee, it's tedious having to write all this boilerplate to get the compiler/interpreter to STFU and let me get on with doing my job". "safe by default" is an excellent design principle, but so is "stay out of the way". The two are often in tension, and this is one of those times. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Mike Meyer writes:
No, "forcing" is. Consenting adults and all that.
But you yourself admit that this isn't forcing you to think:
Nice try, but I didn't say it forces me to think. It forces me to do something to shut up the language. That's ugly.
It just makes you add a statement to shut up the warnings. Pretty much the same thing as using a bare except clause.
The bare except clause is optional; I can (and often do) simply let the exception terminate the process *if* it ever happens. My understanding is that that isn't good enough for you (because concurrency errors usually lead to silent data corruption rather than a spectacular and immediate crash).
Well, if you want help chasing down bugs in concurrent code, I would think that you would want to focus on concurrent code. First, AFAICS ordinary function calls don't expose additional objects to concurrency (they may access exposed objects, of course, but they were passed in from above by a task, or are globals). So basically every object exposed to concurrency is in either args or kwargs in a call to threading.Thread (or thread.start_new_thread), no? Wouldn't it be possible to wrap those objects (and only those objects) such that the wrapper intercepts attempts to access the wrapped objects, and "does something" (warn, raise, dance on the head of a pin) if the access is unlocked or whatever? Then only concurrent code and the objects exposed to it pay the cost. If it's really feasible to do it via wrapper, you could write a decorator or something that could easily be turned into a no-op for tested code ready to go into production.
Well, no, it's not about doing it my way; I'm perfectly happy with processes and message-passing in my applications, and aside from wacky ideas like the above, that I don't really know how to implement myself, I don't have a lot of suggestions for concurrency by threading. Rather, it's that my guess is that if you don't make the costs of safe(r) concurrency look more reasonable you won't be getting much help here.

On Wed, Nov 2, 2011 at 1:49 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
No. You missed two things. First, all the objects that can be accessed - however indirectly - through those objects are exposed to concurrency. Also, any globals and anything that can be accessed through them are exposed to concurrency. No, make that three things: a wrapped C library that has callbacks to Python code and uses threads internally can expose anything it's passed (and anything accessible from those objects) to concurrency, without ever using the Python threading code. I mentioned seeing a bug yesterday. My client's response means I can't in good faith try and fix it (it's in a testing framework, so doesn't affect the product, so they don't care). So this is a guess, but here's what I think is going on: 1) We're using a test framework that creates a mock time module for some reason. At some point, the mock object has the value None. 2) The application being tested uses a database module that uses threads as part of managing a connection pool. The concurrency-unsafe code in the test framework (which is clearly and obviously single-threaded, right?) managed to get the None-valued mock inserted into sys.modules due to a concurrency bug. So when I then use the time module in testing, I get an exception trying to access its attributes. This does show a bigger problem. Look for my next mail... <mike

Mike Meyer writes:
No. You missed two things.
Actually, I didn't. Containers contain; I considered that sufficiently obvious that I neglected to mention it. And I did mention globals as "exposed to concurrency". Both just expand the list of objects needing protection, but don't change the basic principle AFAICS. I point this out because I hope to convince you to concentrate on arguing *for* your proposal, instead of wasting time scoring points *against* those who question it. Please understand: *nobody* is against better support for developing concurrent programs, including those using threads to implement concurrency. But nobody is willing to pay arbitrarily high cost for arbitrarily small improvements. As for "wrapped C libraries", I'm having trouble imagining what you're talking about. Wrapped libraries won't be accessing data via Python protocols anyway, so any accesses that your framework could intercept will happen in the marshaling code. If that creates threads, I don't see why it wouldn't use threading.Thread. Ditto for Python modules implemented in C.
Again, I don't understand. In the von Neumann model of computation, all *code* is single-threaded pretty much by definition. With preemptively scheduled threads, the process executing that code is single-threaded if and only if it blocks until all other threads exit. The test framework is "obviously" multiple-threaded in the scenario you describe.
managed to get the None-valued mock inserted in sys.modules due to a concurrency bug.
I don't see your point. You claim your proposal is supposed to help identify which objects are shared. Taking that literally, this example is useless as an argument for your framework -- you already know which object is the problem. Of course, I suppose you actually meant to get more precise information in cases like this, specifically about in what execution contexts the object is mutated. But in this case, you probably already know precisely where in the code the offending mutation happens. The bug is that execution of the framework code resumes "unexpectedly" before the "subordinate" thread restores the problem object's state to status quo ante. Well, these are *threads*, so you can't say anything about when to expect them to resume, you have to expect resumption at any point. *Now* what does your framework buy you? If the answer is "nothing," you need to find better use cases to motivate your proposal.

On Thu, Nov 3, 2011 at 2:35 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
This discussion is actually starting to sound a bit like the Python security model discussions to me - ensuring that only certain kinds of objects are reachable from a given point in the code, and controlling the operations that can be performed on them. Victor Stinner eventually managed to create his pysandbox module based on those discussions, which does a fairly good job of locking things down, even though I'd personally still be inclined to back it up with additional OS level defences. Still, like the GIL vs free threading discussions, there are unlimited numbers of things that *could* be done, the question is what is *worth* doing. And that's not going to be figured out in a mailing list discussion - it's going to take real code and plenty of experimentation, and it's going to have be done by people that don't already believe the "right" answer is to use message queues and a "no concurrent access to mutable objects" architecture at the application level (i.e. where what threads buy you is the fact that you can hand over an object just by putting the reference in a queue and dropping yours, without any serialisation overhead). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Nov 3, 2011 at 12:35 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
As for "wrapped C libraries", I'm having trouble imagining what you're talking about.
C code (currently) can create or modify python objects using a C pointer, instead of python access. That is the biggest barrier to a tracing (as opposed to reference-counting) garbage collector. It is also a problem for security sandboxes (though they can just ban C extensions). Most relevant here, C extensions can also bypass any locks or protections put in place for concurrency. -jJ

On Fri, Nov 4, 2011 at 5:36 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Jim Jewett writes: > On Thu, Nov 3, 2011 at 12:35 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
> > As for "wrapped C libraries", I'm having trouble imagining what you're > > talking about.
> C code (currently) can create or modify python objects using a C > pointer, instead of python access.
Can it?
Not if it is just an additional compile-time restriction. If the implementation is prepared to enforce the restriction at run-time, that will almost certainly require some changes to memory allocation. If done right, that might also be able to lock out rogue C code, which should also allow a tracing GC, stronger security guarantees, and GIL removal. I don't personally see it happening without a currently unacceptable slowdown for all memory access, but if he is looking 5-10 years out, it is plausible. (I would still bet against it for mainstream CPython, but it is plausible.) -jJ

On Fri, Nov 4, 2011 at 7:38 AM, Jim Jewett <jimjjewett@gmail.com> wrote:
Which is why it calls for raising exceptions at run-time.
I'm not really interested in locking out rogue C code - or at least not malicious code. This is very much in the "we're all adults here" vein, in that if you want to go out of your way to defeat it, it won't stand in your way.
I'm just trying to get the ball rolling. It clearly won't be in Python 3, so I'm looking to at least Python 4. What I expect to come out of this is an informational PEP summarizing the discussion, and possibly some code that might be useful now (i.e. - a new library, or an automated "concurrency safe" auditor, or ...). <mike

On 11/2/2011 1:53 AM, Mike Meyer wrote:
On Wed, 02 Nov 2011 10:45:42 +0900 "Stephen J. Turnbull"<stephen@xemacs.org> wrote:
It is true that Python actually does a lot to protect programmers (from their own ignorance and foolishness -- but with ctypes wiping out all protections ;-). One easily overlooked example is the impossibility by design of buffer over-runs, a common security problem. But it does so without feeling like a straitjacket. Breaking a large fraction of existing code is too much of a sledgehammer approach. If Python had been designed from the beginning for multi-threading as the default, with single-threading as the exception, the case would be different. But even now, many are happy with a single thread in multiple processes, or multiple task objects within a single thread, for concurrency. So burdening single-threaded code will not be popular. -- Terry Jan Reedy

Here are issues to think about. You wrote:

    'locking' value [',' value]* ':' suite

    The list of values are the objects that can be mutated in this lock. An immutable object showing up in the list of values is a TypeError.

However:

(1) Greg Ewing gave an example sort of like this:

    new_balance = balance + deposit
    lock(balance)
    balance = new_balance
    unlock(balance)

and pointed out that the locks don't help. He was talking about the race condition on reading balance before writing. But it's worse than that. If these are numbers, then they're immutable and locking them does nothing, and this isn't any better:

    lock(balance)
    new_balance = balance + deposit
    balance = new_balance
    unlock(balance)

Consider this operating on lists:

    locking stuff:  #1
        stuff += added_stuff

    locking stuff:  #2
        stuff = stuff + added_stuff

In #2, locking is completely useless. Sure the values are mutable but I'm not mutating them. Is it obvious that #1 works and #2 doesn't? You don't want to lock the *value* of stuff, you want to lock the *variable*, i.e., locking globals() or wherever the value of stuff is found. Locking globals() seems like a bad idea. Which brings me to...

(2) How does locking something like a dictionary work? Does it lock the dictionary and all the values in it, walking the entire data structure to lock it? You suggest that's the case and that seems like a performance nightmare. Given

    x = [1, 2]
    d = {3: x, 4: 5}

how do I lock d[3]? That is, I want to lock the dictionary slot where the value of d[3] is stored so another thread can't do d[3] = 0. If I write

    locking d[3]:
        # do stuff

that's going to lock the value of x. Another thread won't be able to do x.append(0), but they would be able to do x = 0 or d[3] = 0. If I have to write

    locking d:
        # do stuff

that hampers concurrency. If I can lock the dictionary slot d[3], can I lock list slots? After all, the compiler doesn't know the type of d. How do I lock just an attribute without locking the whole object?
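Bruce's #1 vs. #2 distinction is the difference between in-place mutation and rebinding, which object identity makes visible (a minimal sketch):

```python
added_stuff = [3]

stuff = [1, 2]
before = id(stuff)
stuff += added_stuff           # case #1: in-place, the same object is mutated
assert id(stuff) == before

stuff = [1, 2]
before = id(stuff)
stuff = stuff + added_stuff    # case #2: a brand-new list is bound to the name
assert id(stuff) != before     # locking the old value would have guarded nothing
```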
(3) What happens if one thread does:

    locking d, x:
        # stuff

and another does

    locking x, d:
        # stuff

I think I know the answer and it's "deadlock". Avoiding deadlock is an important problem, and by forcing me to lock every single object before I mutate it, the important locks (the ones for objects that are actually shared) will be lost among the many that are unnecessary, making it very difficult to find bad lock ordering.

(4) What if the lock can't be acquired right away? Maybe there's other stuff my thread can do while it's waiting for the lock. You don't consider that. Maybe I have a whole bunch of things all of which can be done in any order.

(5) You haven't thought about read vs. write locks. If I'm passed an iterable or a sequence in a concurrent program, I want to read-lock it so no one else changes it while I'm working with it. But you prohibit locking immutable objects, so I first have to find out if it's immutable, which is something else you'll have to add to the language.

On Mon, Oct 31, 2011 at 10:32 PM, Mike Meyer <mwm@mired.org> wrote:
I've identified the problem I want to solve: I want to make concurrent use of python objects "safe by default", ...
To me it looks like this proposal deals with a small part of the problem with the equivalent of a sledgehammer. When I said identify the problem, the above issues are more on the lines of what I was talking about, not a general statement like "make concurrency better." But as an overall goal, I think this is better: "make it as easy as possible to write error-free concurrent code." I would think that "without making it hard to write non-concurrent code" is implicit, but I'll spell that out since I've heard that explicit is better than implicit. And here are some things people writing concurrent programs want to do (in no particular order):

(1) lock an object (even when the object doesn't have any code to support locking)
(2) lock part of an object (a slot in a list or dictionary, or an attribute) - consider that databases support row and range locking, because if the only thing you can do is lock entire tables, you're going to get poor performance
(3) lock multiple things at the same time
(4) queue operations to be performed on an object when it can be locked, which requires ...
(5) wait until a queued operation completes, which requires ...
(6) avoid starvation
(7) avoid deadlock
(8) avoid other concurrency bugs
(9) avoid debugging (not avoid it when it's necessary but do things to avoid it being necessary)

... so that doing unsafe things
Clojure is very different from Python. Core data structures are immutable in Clojure so adding to a list creates a new also immutable list. Changing Python's core data structures to be immutable would be a bigger compatibility break than 2 to 3. --- Bruce
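Bruce's want (2) above - locking a single slot rather than the whole container, analogous to database row locking - can be sketched as a wrapper that keeps one lock per key. All names here are invented for illustration; this is not a proposed API:

```python
import threading
from collections import defaultdict

class SlotLockedDict:
    """Hypothetical sketch: lock a single slot of a dictionary rather
    than the whole object, like database row locking."""
    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._slot_locks = defaultdict(threading.Lock)
        self._meta = threading.Lock()     # guards the lock table itself

    def locked_slot(self, key):
        with self._meta:
            return self._slot_locks[key]  # use as: with d.locked_slot(k): ...

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

d = SlotLockedDict({3: [1, 2], 4: 5})
with d.locked_slot(3):
    d[3] = 0      # other threads contending on slot 3 would block here
```

Slots not being locked (like key 4) remain freely accessible, which is exactly the concurrency win over locking the whole dict.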

On Thu, Nov 3, 2011 at 12:39 AM, Bruce Leban <bruce@leapyear.org> wrote:
Yup. You can't keep people from writing code that's just wrong. But they can't get it right if they don't add any locking at all.
This case would still throw an exception, because what needs to be locked isn't balance, but whatever balance is an attribute of. Unless it's a local variable, in which case it doesn't need to be locked. Given the code as is, balance needing to be locked would make it global, so you'd lock the module. A more realistic implementation would be if balance were self.balance, in which case you'd lock self.
Right. Possibly "value" was the wrong choice for a placeholder in the original description, because what you're actually locking is a container of some sort.
If I suggested that, then it was unintentional. Yes, it's a performance nightmare. Locking a container just locks for changes to the set of contained objects. It's analogous to tuples being immutable: you can't change the set of objects in the tuple, but if those objects are mutable, you can change them. If you want to change an object in a list/dictionary/etc., then you have to lock that object, but not the dictionary.
Yes, it does. But so does any form of locking. This proposal locks objects. Changes that don't change the object - even if they might change its value by changing a contained object - aren't locked, because of the expense you pointed out. If there's some different level of granularity for locking that makes sense, I don't know what it is.
Attributes are names. You don't lock names, you lock objects. To prevent binding of an attribute, you lock the object it's an attribute of. If you want to prevent mutating the value of the object bound to the attribute, you lock that object.
That one I dealt with in the specification. There's an implementation requirement that if two locking statements share objects, the shared objects have to be locked in the same order. So one of the two will lock things in the opposite of the order they appear in the statement.
I think I know the answer and it's "deadlock".
No, avoiding deadlock is one of the goals. That's why that requirement exists, and there's a further requirement that you can only nest locking statements if the inner one locks a subset of the outer one. I have as yet to work out how an automatic STM version (this wasn't in the original proposal) would interact here. <mike

Mike Meyer wrote:
This case would still throw an exception, because what needs to be locked isn't balance, but whatever balance is an attribute of.
All right, suppose we have an "account" object that holds the balance. Now we can do locking account: account.balance = account.balance + deposit That's okay. But what if there are two accounts involved, and we want to do account1.balance = account1.balance - amount account2.balance = account2.balance + amount These two operations need to be done atomically, but there is no single object to lock, so we have to lock both of them -- and then there is an opportunity for deadlock if two threads do this in different orders.
and there's a further requirement that you can only nest locking statements if the inner one locks a subset of the outer one.
Then it's *impossible* to solve the above problem, because neither of the objects needing to be locked is contained in the other. The notion of containment is not even well defined in general, because Python objects can form arbitrary graphs. -- Greg

On Thu, Nov 3, 2011 at 2:35 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
That's why the proposal specifies that locking statements determine the order of the locking, and require that any two locking statements lock all objects they have in common in the same order - at least during one run of the application. So the above is done by:

    locking account1, account2:
        # stuff

If a second locking statement does "locking account2, foobar, account1", then account1 and account2 will be locked in the same order by both statements.
I wasn't clear enough. The sets I'm talking about are the objects being locked, not anything they may contain, in order to prevent deadlocks. So if you start by doing:

    locking account1, account2, foobar:

and then later on - but with those locks still held - do

    locking account1, account2:

things work fine (the second locking is a nop). However, if you do (under the same condition):

    locking account1, aardvark:

you get an exception - the outer lock has to lock aardvark as well. And yes, the implications still need to be worked out.
Containment has a very specific meaning for this specification, in that an object only contains things it has a direct reference to. That should be spelled out, or maybe I need to use a better word. <mike
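The consistent-ordering requirement Mike describes can be emulated with today's explicit locks by always acquiring in one canonical order; ordering by id() is an assumption of this sketch, and the class is invented for illustration:

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always acquire both locks in a canonical (id-based) order, so two
    # threads transferring in opposite directions cannot deadlock.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(0)
transfer(a, b, 30)
transfer(b, a, 10)   # opposite direction: same lock order is used
# a.balance == 80, b.balance == 20
```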

On Thu, Nov 3, 2011 at 9:38 AM, Mike Meyer <mwm@mired.org> wrote:
You conveniently ignore my comment about the importance of database row locking. :-)
You're handwaving. How would that work? There's no way to know at compile time what the correct order is. So locking just got a whole lot more complicated. But it's worse than complicated: it's unworkable. No, avoiding deadlock is one of the goals. That's why that requirement
exists, and there's a further requirement that you can only nest locking statements if the inner one locks a subset of the outer one.
In other words, this code is not allowed:

    def f():
        locking x:
            for y in x:
                locking y:
                    y.append(1)

And neither is this:

    def f(x):
        locking x:
            x.append(1)

if it is called inside code that has something locked. Which virtually all code will do since it's required everywhere. So your proposal is that I must lock everything before I change it, but I can't lock it unless it's already locked. That's ridiculous. You might as well have a single semaphore to lock everything. We're wasting our time. You said above "You don't lock names, you lock objects." Why not? Because that was your original plan and your mind is closed to other ideas? You want to ignore legitimate issues that aren't convenient to your proposal. I hate to say things like "this idea sucks," but sometimes that's the only way to put it. I think that this proposal is a terrible idea. It's pointless to argue about the details of something this bad. A healthy discussion about how to make concurrency better might be interesting, but as long as all you want to do is defend your proposal, this thread isn't sparking useful discussion. Sorry to be so blunt but I don't see any other way to get the message across. --- Bruce

On 04/11/2011 01:51, Bruce Leban wrote:
[snip] If they're threads running in the same address space in CPython then they could lock always in order of increasing address. In an implementation of Python which uses references they could lock always in increasing numeric value of reference (if that's possible). This would not, of course, protect against general nested locks.
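The address-order idea sketches out in a few lines. This is my illustration of MRAB's suggestion, leaning on the CPython-specific fact that `id()` returns the object's address and is stable for an object's lifetime; the helper names are invented.

```python
import threading

def acquire_in_order(locks):
    # In CPython, id() is the object's address, so sorting by id gives
    # every thread the same global acquisition order: two threads that
    # lock overlapping sets can never deadlock against each other.
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    # Release in reverse acquisition order.
    for lock in reversed(ordered):
        lock.release()
```

Note that, as MRAB says, this only protects a single flat acquisition; it does nothing for locks taken in separately nested statements.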

MRAB writes:
On 04/11/2011 01:51, Bruce Leban wrote:
OK, I think that works.
That doesn't make sense to me; if it's not an address, how do you know it's unique (you know the object is unique, of course, but there may be other references)? If the reference is not unique, how do you know that some other thread hasn't locked it via a different reference? I think for this to be workable you would need to use something like a lock tick that would increase with each lock taken, and store that in the object. Then you would lock the objects in order of their ticks, with unlocked objects coming last. (You'd have to be careful about the wraparound, though, and that could double or triple the performance hit for incrementing the tick and determining lock order.)
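One way to realize the lock-tick idea is to assign each object a ticket, under a global registry lock, the first time it is about to be locked, and then always acquire in ticket order. This simplifies away the "unlocked objects come last" detail by assigning the ticket eagerly; the registry, the `obj.lock` attribute, and the function names are all assumptions of this sketch (and a real implementation would store the tick on the object and worry about wraparound, as noted above).

```python
import itertools
import threading

_registry = threading.Lock()
_next_tick = itertools.count(1)
_ticks = {}                     # id(obj) -> tick, assigned at first lock

def _tick_of(obj):
    # Assign a tick the first time an object is about to be locked.
    # Every thread sees the same tick afterwards, so sorting by tick
    # gives a globally consistent, deadlock-free acquisition order.
    with _registry:
        return _ticks.setdefault(id(obj), next(_next_tick))

def acquire_by_tick(objs):
    ordered = sorted(objs, key=_tick_of)
    for o in ordered:
        o.lock.acquire()        # assumes each object carries a .lock
    return ordered

def release_all(ordered):
    for o in reversed(ordered):
        o.lock.release()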

On 04/11/2011 03:53, Stephen J. Turnbull wrote:
In CPython the implementation refers to an object by its address, but my understanding is that in an implementation which uses references the reference is actually an index into a table which contains the addresses of the objects, therefore you would be locking in order of increasing index.

MRAB writes:
Ah, OK. So a reference is actually an object in the sense of an identifiable region of memory, but it's not a Python object. When I see "reference" I think of C++ references or git refs, which are just names, and an object can have many references. But the scheme you describe is like a C stdio file descriptor, and presumably is an indirection which allows relocating Python objects in memory. You can even relocate the table in such a scheme.
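The reference-as-table-index scheme can be sketched in a few lines; this is purely illustrative (the class and method names are mine), but it shows the two properties being discussed: references survive relocation of the object, and they compare by index, which itself yields a stable lock order.

```python
class ObjectTable:
    """Sketch of the reference-as-index scheme: a 'reference' is an
    index into a table of object slots, like a C stdio file
    descriptor, so the runtime can relocate objects (or the table
    itself) without invalidating any outstanding references."""

    def __init__(self):
        self._slots = []

    def new_ref(self, obj):
        self._slots.append(obj)
        return len(self._slots) - 1   # the reference is just an index

    def deref(self, ref):
        return self._slots[ref]

    def relocate(self, ref, moved_obj):
        # Move/compact the object; every holder of 'ref' is unaffected.
        self._slots[ref] = moved_obj
```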

On Thu, Nov 3, 2011 at 6:51 PM, Bruce Leban <bruce@leapyear.org> wrote:
I assure you it was unintentional, but there's been a lot of text flowing here. I believe I've answered every comment, though I know some of them were answered when first brought up, and not every time thereafter. And some of the answers were "Yes, that's an issue" or "Yes, but we can't keep people from writing bad code". I don't recall anything specifically about database row locking, but if you feel it's important enough to reiterate it, I'll treat it as equally important.
How is a requirement just handwaving? There is no "correct" order. Given two objects, any locking statement will always lock them in the same order. That's sufficient to prevent deadlocks. Sorting the list of objects passed to each locking statement with a key of id will do the trick, but may not be the best way to do it.
Yup, it's pretty bad. For that reason - among others - the proposal has mutated. The original required listing objects, and either started an STM transaction or did explicit locks (undecided). Someone pointed out that the STM variant couldn't deal with IO, causing me to note a change to the proposal wherein listing objects causes explicit locking, and an empty list starts an STM transaction. This is not only easier to use, but solves several problems - including this one. The STM variant hasn't gotten a lot of attention.
No, because I couldn't think of a way to make locking names work. You presented a problem with locking names. Since the proposal doesn't lock names, I figured I wasn't clear enough in the first place, and tried to fix that. I've repeatedly said this wasn't meant to be a finished idea, and asked for alternatives. If you have a proposal that works by locking names, by all means tell us about it!
You want to ignore legitimate issues that aren't convenient to your proposal.
No, I don't. I want to collect them all. But I'm not perfect - sometimes two comments may look similar when they in fact aren't.
That may well be the case, and wouldn't particularly disappoint me.
I'm doing a lot more than defending the proposal. I'm tweaking this proposal to solve problems people point out, collecting ideas for alternative approaches, implementations, and tools that might help - which are what make this discussion interesting and useful. I do defend the proposal, especially when people point out issues that were dealt with in the original post, which isn't interesting, but the alternative (ignoring them) seems less useful. <mike

On Fri, Nov 4, 2011 at 1:11 PM, Mike Meyer <mwm@mired.org> wrote:
No, because I couldn't think of a way to make locking names work.
Replace dicts (or at least globals and attribute dicts) with something that enforces your policy -- whether that is by a mutex, a mutex per entry, returning only copies of the values, etc... -jJ

On Fri, Nov 4, 2011 at 10:49 AM, Jim Jewett <jimjjewett@gmail.com> wrote:
Sorry, I wasn't sufficiently detailed. I never worried about how to implement locking names, because I never could find a way to let people use it. While I'd like fine-grained locking (i.e. specific attributes of objects, the value at an index in a list or dictionary, etc.), you still need to be able to lock objects. The ways I came up with for specifying "here's a name we can't rebind" were all very complicated - among other things, what counts as a "name" is itself complicated - and not worth the gain over object-level locking. Especially if you have an STM option available. As always, if you can think of a way to get all this working together, please share! Thanks, <mike
participants (9)
- Bruce Leban
- Greg Ewing
- Jim Jewett
- Mike Meyer
- MRAB
- Nick Coghlan
- Paul Moore
- Stephen J. Turnbull
- Terry Reedy