[Python-ideas] fork

Andrew Barnert abarnert at yahoo.com
Wed Aug 12 05:33:08 CEST 2015


On Aug 11, 2015, at 07:33, Sven R. Kunze <srkunze at mail.de> wrote:
> 
> On 05-Aug-2015 16:30:27 +0200, abarnert at yahoo.com wrote:
> 
> > What does that even mean? How would you not allow races? If you let people throw arbitrary tasks at a thread pool, with no restriction on mutable shared state, you've allowed races.
> 
> Let me answer this in a more implicit way.
> 
> Why do we need to mark global variables as such?
> I think the answer is clear: to mark side-effects (quoting the docs).
> 
> Why are all variables thread-shared by default?
> I don't know; maybe efficiency reasons, but those hardly apply to Python in the first place.

First, are you suggesting that your idea doesn't make sense unless Python is first modified to not have shared variables? In that case, it doesn't seem like a very useful proposal, because it applies to some different language that isn't Python. And applying it to Python instead means you're still inviting race conditions. Pointing out that in a different language those races wouldn't exist is not really an answer to that.

Second, the reason for the design is that that's what threads mean, by definition: things that are like processes except that they share the same heap and other global state. What's the point of a proposal that lets people select between threads and processes if its threads aren't actually threads?
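
To be concrete about what that sharing means, here's a minimal sketch (the names are made up for illustration): a thread sees and mutates the same module-level state as the code that started it, while a child process only ever touches its own copy.

    import threading
    import multiprocessing

    value = 0

    def set_value():
        global value
        value = 42

    if __name__ == "__main__":
        # A thread shares the interpreter's heap, so the worker's
        # assignment is visible back in the main thread.
        t = threading.Thread(target=set_value)
        t.start(); t.join()
        print(value)   # 42

        # A child process works on its own interpreter state, so the
        # parent's value is untouched.
        value = 0
        p = multiprocessing.Process(target=set_value)
        p.start(); p.join()
        print(value)   # still 0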

Finally, just making variables thread-local wouldn't help. You'd need a completely separate heap for each thread; otherwise, just passing a list to another thread means it can modify your values. And if you make a separate heap for each thread, what happens when you do x[0]=y if x is local and y shared, or vice-versa? You could build a whole shared-memory API and/or message-passing API a la the multiprocessing module, but if that's an acceptable solution, what's stopping you from using multiprocessing in the first place? (If you're going to say "not every message can be pickled", consider how you could deep-copy an object that can't be pickled.)
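
Here's the kind of thing I mean, as a tiny sketch (the names are invented): nothing gets copied when you hand a list to a thread, so the "other" thread is mutating your object, not its own.

    import threading

    def clobber(items):
        # The worker receives a reference to the caller's list, not a
        # copy, so this write lands directly in the caller's data.
        items[0] = None

    data = [1, 2, 3]
    t = threading.Thread(target=clobber, args=(data,))
    t.start()
    t.join()
    print(data)   # [None, 2, 3] -- the caller's list was changed in place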

Of course there's no reason that you couldn't implement something that's basically a process at the abstract level, but implemented with threads at the OS level. And that could make both explicit shared memory and IPC simpler at least under the covers, and more efficient. And it could lead to a way to eliminate the GIL. And there could be other benefits as well. That's why people are exploring things like the recent subinterpreters thread, PyParallel, PyPy+STM, etc.

If this were an easy problem, it would have been solved by now. (Well, it _has_ been solved for different classes of languages--pure-immutable languages can share with impunity; languages designed from ground up for message passing can get away with only message passing; etc. But that doesn't help for Python.)

> > And that's exactly the problem. What makes concurrent code with shared state hard, more than anything else, is people who don't realize what's hard about it and write code that seems to work but doesn't.
> 
> Precisely because 'shared state' is hard, why is it the default?

The default is to write sequential code. You have to go out of your way to use threads. And when you do, you have to intentionally choose threads over processes or some kind of microthreads. It's only when you've chosen to use shared-memory threading as the design for your app that shared memory becomes the default.

> > Making it easier for such people to write broken code without even realizing they're doing so is not a good thing.
> 
> That argument only applies when the broken code (using shared states) is the default.

But that is the default in Python, so your proposal would make it easier for such people to write broken code without even realizing they're doing so, and that's not a good thing.


