[Python-ideas] Fwd: Concurrent safety?
Bruce Leban
bruce at leapyear.org
Thu Nov 3 08:39:17 CET 2011
Here are issues to think about. You wrote:
'locking' value [',' value]*':' suite
The list of values are the objects that can be mutated in this lock. An
immutable object showing up in the list of values is a TypeError.
However:
(1) Greg Ewing gave an example sort of like this:
new_balance = balance + deposit
lock(balance)
balance = new_balance
unlock(balance)
and pointed out that the locks don't help. He was talking about the race
condition on reading balance before writing. But it's worse than that. If
these are numbers, then they're immutable and locking them does nothing and
this isn't any better:
lock(balance)
new_balance = balance + deposit
balance = new_balance
unlock(balance)
Consider this operating on lists:
locking stuff: #1
stuff += added_stuff
locking stuff: #2
stuff = stuff + added_stuff
In #2, locking is completely useless. Sure the values are mutable but I'm
not mutating them. Is it obvious that #1 works and #2 doesn't? You don't
want to lock the *value* of stuff, you want to lock the *variable*, i.e.,
locking globals() or wherever the value of stuff is found. Locking
globals() seems like a bad idea. Which brings me to...
(2) How does locking something like a dictionary work? Does it lock the
dictionary and all the values in it, walking the entire data structure to
lock it? Ypu suggest that's the case and that seems like a performance
nightmare. Given
x = [1, 2]
d = {3: x, 4: 5}
How do I lock d[3]? That is, I want to lock the dictionary slot where the
value of d[3] is stored so another thread can't do d[3] = 0. If I write
locking d[3]: # do stuff
that's going to lock the value of x. Another thread won't be able to do
x.append(0) but they would be able to do x = 0 or d[3] = 0. If I have to
write
locking d: # do stuff
that hampers concurrency. If I can lock the dictionary slot d[3], can I
lock list slots? After all, the compiler doesn't know they type of d. How
do I lock just an attribute without locking the whole object?
(3) What happens if one thread does:
locking d, x: # stuff
and another does
locking x, d: # stuff
I think I know the answer and it's "deadlock". Avoiding deadlock is an
important problem and by forcing me to lock every single object before I
mutate it the important locks (the ones for objects that are actually
shared) will be lost among the many that are unnecessary, making it very
difficult to find bad lock ordering.
(4) What if the lock can't be acquired right away? Maybe there's other
stuff my thread can do while it's waiting for the lock. You don't consider
that. Maybe I have a whole bunch of things all of which can be done in any
order.
(5) You haven't thought about read vs. write locks. If I'm passed an
iterable or a sequence in a concurrent program, I want to read lock it so
no one else changes it while I'm working with it. But you prohibit locking
immutable objects, so I first have to find out if it's immutable which is
something else you'll have to add to the language.
On Mon, Oct 31, 2011 at 10:32 PM, Mike Meyer <mwm at mired.org> wrote:
> I've identified the problem I want to solve: I want to make concurrent
> use of python objects "safe by default", ...
>
To me it looks like this proposal deals with a small part of the problem
with the equivalent of a sledgehammer. When I said identify the problem,
the above issues are more on the lines of what I was talking about, not a
general statement "make concurrency better."
But as an overall goal, I think this is better: "make it as easy as
possible to write error-free concurrent code." I would think that "without
making it hard to write non-concurrent code" is implicit but I'll spell
that out since I've heard that explicit is better than implicit.
And here are some things people writing concurrent programs want to do (in
no particular order):
(1) lock an object (even when the object doesn't have any code to support
locking)
(2) lock part of an object (a slot in list or dictionary, or an attribute)
- consider that databases support row and rang elocking because if the only
thing you can do is lock entire tables, you're going to get poor performance
(3) lock multiple things at the same time
(4) queue operations to be performed on an object when it can be locked,
which requires ...
(5) wait until a queued operation completes, which requires ...
(6) avoid starvation
(7) avoid deadlock
(8) avoid other concurrency bugs
(9) avoid debugging (not avoid it when it's necessary but do things to
avoid it being necessary)
... so that doing unsafe things
> causes the programmer to have to do something explicit about making
> things safe. I believe this can be done at the mutation points
> (again, clojure shows that it can be done). I also want to preserve as
> much of Python's existing code as possible. It may be that Python's
> existing data structures mean my believe about mutation points is
> wrong. This may be the wrong solution. It may be that such a change is
> to large to be acceptable. But the only way to find out is to
> investigate it.
>
Clojure is very different from Python. Core data structures are immutable
in Clojure so adding to a list creates a new also immutable list. Changing
Python's core data structures to be immutable would be a
bigger compatibility break than 2 to 3.
--- Bruce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20111103/c565a359/attachment.html>
More information about the Python-ideas
mailing list