
Thanks everybody for the feedback on 'fork'. Let me address the issues and specify it further:

1) Process vs. Thread vs. Coroutine

From my understanding, the main fallacy here is that the caller would be able to decide which type of pool is best suited. Take create_thumbnail as an example. You do not know whether it is cpu-bound or io-bound; you can just make a guess or try it out. But who knows then? I would say: the callee. create_thumbnail is cpu-bound when doing the work itself on the machine; it is io-bound when delegating the work to, say, a web service. SAME FUNCTIONALITY, SAME NAME, SAME API, DIFFERENT POOLS REQUIRED. This said, I would propose something like a marking solution (see the sketch after this message):

    @cpu_bound
    def create_thumbnail(image):
        # impl

    @io_bound
    def create_thumbnail(image):
        # impl

(coroutines are already marked as such)

From this, the Python interpreter should be able to infer which type of pool is appropriate.

2) Pool size

Do lists have a fixed length? Do I need to define their lengths right from the start? Do I know them in advance? I think the answers to these questions are obvious. I don't understand why it should be different for the size of the pools. They could grow and shrink depending on the workload and the available resources.

3) Pool Management in General

There is a reason why I hesitate to explicitly manage pools. Our code runs on a plethora of platforms ranging from few to many hardware threads. We actually do not want to integrate platform-specific properties right into the source. The point of having parallelism and concurrency is to squeeze more out of the machines and get better response times. Anything else wouldn't be honest in my opinion (aside from research and experimentation). Thus, a practical solution needs to be simple and universal. Explicitly setting the size of the pool is not universal and definitely not easy.

It doesn't need to be perfect. Even if a first draft implementation simply defined pools with exactly 4 processes/threads/coroutines, that would be awesome. Even cutting execution time in half would be an amazing accomplishment.

Maybe even 'fork' is too complicated. It could work without it, given the decorators above. But then we could not decide whether to run things in parallel or sequentially. I think I do not like that.

4) Keyword 'fork'

Well, first shot. If you have a better one, I am all in for it (4 letters or shorter only ;) )... Or maybe something like 'par' for parallel or 'con' for concurrent.

5) Awaiting the Completion of Something

As Andrew proposed, using the return value should result in blocking. What if there is no result to wait for? That one is harder, but I think another keyword like 'wait' or 'await' should work fine here.

    for image in images:
        fork create_thumbnail(image)
    wait
    print(get_size_of_thumbnail_dir())

6) Exceptions

As close to sequential execution as possible. That is, when some function is forked out and raises an exception, it should behave as if it were a normal function call.

    for image in images:
        fork create_thumbnail(image)  # I would like to see that in my stacktrace

Also true for expressions. '+=' might raise an exception because, say, huge_calculation returns 'None'. Although the actual evaluation of the sum needs to take place only at the print statement, I would like to see the exception raised at the highlighted place:

    end_result = 0
    for items in items_list:
        end_result += fork huge_calculation(items)  # stacktrace for '+=' should be here
    print(end_result)  # not here

Best,
Sven
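For concreteness, here is one minimal way the @cpu_bound / @io_bound markers from point 1 could be sketched on top of today's concurrent.futures. The module-level pools and their defaults are assumptions for illustration, not part of the proposal:

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
    from functools import wraps

    _process_pool = ProcessPoolExecutor()   # assumed defaults; sizes are up for debate
    _thread_pool = ThreadPoolExecutor()

    def cpu_bound(func):
        @wraps(func)
        def submit(*args, **kwargs):
            # the callee declared itself cpu-bound, so use processes
            return _process_pool.submit(func, *args, **kwargs)
        return submit

    def io_bound(func):
        @wraps(func)
        def submit(*args, **kwargs):
            # the callee declared itself io-bound, so threads suffice
            return _thread_pool.submit(func, *args, **kwargs)
        return submit

Calls then return futures rather than results. Note that submitting to a process pool requires the function to be picklable, which is exactly the @-syntax pickling problem that comes up later in this thread.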

On Aug 1, 2015, at 10:36, Sven R. Kunze <srkunze@mail.de> wrote:
There's a whole separate thread going on about making it easier to understand the distinctions between coroutine/thread/process, separate tasks/pools/executors, etc. There's really no way to take that away from the programmer, but Python (and, more importantly, the Python docs) could do a lot to make that easier.

Your idea of having a single global "pool manager" object, where you submit tasks and, depending on how they're marked, they get handled differently, might have merit. But that's something you could build pretty easily on top of concurrent.futures (at least for threads vs. processes; you can add in coroutines later, because they're not quite as easy to integrate), upload to PyPI, and start getting experience with before trying to push it into the stdlib, much less the core language.

(Notice that Greg Ewing had a proposal a few years ago that was very similar to the recent async/await change, but he couldn't sell anyone on it. But then, after extensive experience with the asyncio module, first as tulip on PyPI and then added to the stdlib, the need for the new syntax became more obvious to everyone, and people--including Guido--who had rejected Greg's proposal out of hand enthusiastically supported the new proposal.)
The available resources rarely change at runtime. If you're doing CPU-bound work, the number of cores is unlikely to change during a run. (In rare cases, you might want to sometimes count hyperthreads as separate cores and sometimes not, but that would depend on intimate knowledge of the execution characteristics of the tasks you're submitting in two different places.) Similarly, if you're doing threads, the ideal pool size usually depends more on what you're waiting for than on what you're doing--12 threads may be great for submitting URLs to arbitrary servers on the internet, 4 threads may be better for submitting to a specific web service that you've configured to match, 16 threads may be better for a simulation with 2^n bodies, etc.

Sometimes these really do need to grow and shrink configurably--not during a run, but during a deployment. In that case, you should store them in a config file rather than hard coding them. Then your sysadmin/deploy manager/whatever can learn how to test and configure them.

For a real-life example (although not in Python), I know Level3 configured their video servers to use 4 processes of 4 threads per machine, while Akamai used 1 process of 16 threads (actually 2, but the second only for failover, not used live). Why? I have no idea, but presumably they tested the software with their machines and their networks and came to different results, and it's a good thing their software allowed them to configure it so they could each save that 1.3% heat or whatever it was they were trying to optimize.
3) Pool Management in General
There is a reason why I hesitate to explicitly manage pools. Our code runs on a plethora of platforms ranging from few to many hardware threads. We actually do not want to integrate platform-specific properties right into the source. The point of having parallelism and concurrency is to squeeze more out of the machines and get better response times. Anything else wouldn't be honest in my opinion (aside from research and experimentation).
Which is exactly why some apps should expose these details to the sysadmin as configuration variables. Hiding the details inside the interpreter would make that harder, not easier.
Thus, a practical solution needs to be simple and universal. Explicitly setting the size of the pool is not universal and definitely not easy.
If you want universal and easy, the default value is the number of CPUs, which is often the best value to use. When you don't need to manually configure things to squeeze out the last few %, just rely on the defaults. When you do need to, it should be as easy as possible. And that's the way things currently are.
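For illustration, a minimal sketch of that default-plus-configuration pattern (the environment variable name here is an assumption, not an established convention):

    import os
    from concurrent.futures import ProcessPoolExecutor

    # default to one worker per CPU, but let a deployment override it
    workers = int(os.environ.get('POOL_WORKERS', os.cpu_count() or 1))
    pool = ProcessPoolExecutor(max_workers=workers)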
This only allows you to wait on everything to finish, or nothing at all. Very often, you want to wait on things in whatever order they come in. Or wait until the first task has finished. Or wait on them in the order they were submitted (which still allows you to get some pipelining over waiting on all). This is a well-known problem, and the standard solution across many languages is futures.

The concurrent.futures module and the asyncio module are both designed around futures. You can explicitly wait on a future, or chain further operations onto a future--and, more importantly, you can compose futures into various kinds of group-waiting objects (wait for all, wait for any, wait for all or until first error, wait in any order, wait in specified order) that are themselves futures. If you want to try to collapse futures into syntax, you need something that still retains all of the power of futures. A single keyword isn't going to do that.

Also, note that await is already a keyword in Python; it's used to explicitly block until another coroutine is ready. In other words, it's a syntactic form of the very simplest way to use futures (and note that, because futures are composable, anything can ultimately be reduced to "block until this one future is ready"). The reason the thread/process futures don't have such a keyword is that they don't need one; just calling a function blocks on it, and, because threads and processes are preemptive rather than cooperative, that works without blocking any other tasks. So, instead of writing "await futures.wait(iterable_of_futures, return_when=FIRST_EXCEPTION)" you just write the same thing without "await" and it already does what you want.
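A short illustration of those composition options with today's concurrent.futures (the fetch function is a stand-in for real work):

    from concurrent.futures import ThreadPoolExecutor, as_completed, wait, FIRST_EXCEPTION

    def fetch(url):
        return len(url)    # stand-in for real I/O work

    urls = ['http://a.example', 'http://b.example', 'http://c.example']
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch, url) for url in urls]

        # handle results in whatever order they finish
        for fut in as_completed(futures):
            print(fut.result())

        # or block until everything is done, or until the first task raises
        done, pending = wait(futures, return_when=FIRST_EXCEPTION)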
Futures already take care of this. They automatically transport exceptions (with stack traces) across the boundary to reraise where they're waited for.
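For example, with concurrent.futures:

    from concurrent.futures import ThreadPoolExecutor

    def broken_thumbnail(image):
        raise ValueError('cannot read ' + image)

    with ThreadPoolExecutor() as pool:
        fut = pool.submit(broken_thumbnail, '0.jpg')
    fut.result()   # the ValueError is re-raised here, where the result is waited for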
In this code, your += isn't inside a "fork", so there's no way the implementation could know that you want it delayed. What you're asking for here is either implicit lazy evaluation, contagious futures, or dataflow variables, all of which are much more radical changes to the language than just adding syntactic sugar for explicit futures.

On 02.08.2015 02:02, Andrew Barnert wrote:
You mean something like this? https://pypi.python.org/pypi/xfork

On Aug 3, 2015, at 10:11, Sven R. Kunze <srkunze@mail.de> wrote:
Did you just write this today? Then yes, that proves my point about how easy it is to write it. Now you just have to get people using it, get some experience with it, etc., and you can come back with a proposal to put something like this in the stdlib, add syntactic support, etc. that it will be hard for anyone to disagree with. (Or to discover that it has flaws that need to be fixed, or fundamental flaws that can't be fixed, before making the proposal.)

One quick comment: from my experience (mostly with other languages that are very different from Python, so I can't promise how well it applies here...), implicit futures without implicit laziness or even an explicit delay mechanism are not as useful as they look at first glance. Code that forks off 8 Fibonacci calls, but waits for each one's result before forking off the next one, might as well have just stayed sequential. And if you're going to use the result by forking off another job, then it's actually more convenient to use explicit futures like the ones in the stdlib.

One slightly bigger idea: If you really want to pursue your implicit-as-possible design further, you might want to consider making the decorators replace the function with an object whose __call__ method just implicitly submits it to the pool. Then you can use normal function-calling syntax and pretend everything is magic. You can even add operator dunder methods to your future class that do the same thing (so "result * 2" just builds a new future out of "self.get() * 2", either submitted to the pool or, probably better, tacked on as an add_done_callback). I think there's a limit to how far you can push this without some mechanism to mark when you need the actual value (in ML-derived languages and C++, static types make this easier: a cast, implicit or explicit, forces a wait; in Python, that doesn't work), but it might be worth exploring that limit. Or it might be better to just stop at the magic function calls and leave the futures alone.
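A rough sketch of that __call__-plus-dunder idea on top of concurrent.futures (all names here are made up for illustration, not taken from xfork):

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor()

    class LazyResult:
        def __init__(self, future):
            self._future = future
        def __mul__(self, other):
            # build a new lazy result from this one; the wait happens inside the pool
            return LazyResult(_pool.submit(lambda: self._future.result() * other))
        def get(self):
            return self._future.result()

    class io_bound:
        def __init__(self, func):
            self._func = func
        def __call__(self, *args, **kwargs):
            # ordinary call syntax, but the work is submitted to the pool
            return LazyResult(_pool.submit(self._func, *args, **kwargs))

    @io_bound
    def create_thumbnail(image):
        return 100            # stand-in for the real work

    result = create_thumbnail('0.jpg') * 2   # still reads like sequential code
    print(result.get())                      # 200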

On 04.08.2015 05:21, Andrew Barnert wrote:
I presented it today. The team members already showed interest. They also noted they like its simplicity. The missing syntax support seemed like a minor issue compared to the complexity it hides. Others admitted they knew about the existence of concurrent.futures and such but never used it due to:
- its complexity
- AND *drum roll* the '.result()' of the future objects
As it seems, it doesn't feel natural.
I added two new decorators for this. But they don't work with the @ syntax. It seems like a well-known issue of Python: _pickle.PicklingError: Can't pickle <function fib_fork at 0x7f8eaeb09730>: it's not the same object as __main__.fib_fork Would be great if somebody could fix that.
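One common workaround (a sketch, not necessarily how xfork handles it): pickle serializes functions by looking them up via module and qualified name, so keeping the undecorated function at module level under its own name and binding the wrapped version under a different name avoids the "not the same object" mismatch:

    def fib(n):                  # stays picklable: pickle can still find __main__.fib
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    # wrapper bound under a *different* name, no @ syntax
    # (cpu_bound is the hypothetical marker sketched earlier in the thread)
    fib_fork = cpu_bound(fib)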
I actually like the idea of contagious futures and I might outline why this is not an issue with the current Python language. Have a look at the following small interactive Python session:
Question: When was the add operation executed? Answer: Unknown from the programmer's perspective. Only requirement: Exceptions are raised exactly where the operation is supposed to take place in the source code (even if the operation that raises the exception is performed later). Best, Sven

On Aug 4, 2015, at 11:09, Sven R. Kunze <srkunze@mail.de> wrote:
I don't know how to put this nicely, but I think anyone who finds the complexity of concurrent.futures too daunting to even attempt to learn it should not be working on any code that uses less explicit concurrency. I have taught concurrent.futures to rank novices in a brief personal session or a single StackOverflow answer and they responded, "Wow, I didn't realize it could be this simple". Someone who can't grasp it is almost certain to be someone who introduces races all over your code and can't even understand the problem, much less debug it.
Not true. The language clearly defines when each step happens. The a.__add__ method is called, then the result is assigned to a, then the statement finishes. (Then, in the next statement, nothing happens--except, because this is happening in the interactive interpreter, and it's an expression statement, after the statement finishes doing nothing, the value of the expression is assigned to _ and its repr is printed out.) This ordering relationship may be very important if the variable a is shared by multiple threads, especially if more than one thread may modify it, especially if you're using non-atomic operations like += (where another thread can read, use, and assign the variable between the __add__ call and the assignment). If a references a mutable object with an __iadd__ method, the variable doesn't even need to be shared, only the value, for this to matter. The only way to safely ignore these problems is to never share any variables or any mutable values between threads. (This is why concurrency features are easier to design in pure functional languages.) Hiding this fact when you or the people you're hiding it from don't even understand the issue is exactly how you create races.
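A tiny demonstration of why that ordering matters (purely illustrative; the exact count varies from run to run):

    import threading

    counter = 0

    def bump(n):
        global counter
        for _ in range(n):
            counter += 1    # read, add, store: another thread can run in between

    threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)          # frequently less than 400000 because updates get lost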
Only requirement: Exceptions are raised exactly where the operation is supposed to take place in the source code (even if the operation that raises the exception is performed later).

On 04.08.2015 21:38, Andrew Barnert wrote:
I am sorry, but I disagree with you here.
Nobody says that concurrent.futures is not a vast improvement over previous approaches. But it is still not the end of the line of simplifications.
Nobody wants races, yet everybody still talks about them. Don't allow races in the first place and be done with it.
Where can I find this definition in the docs? To me, we are talking about class customization as described on reference/datamodel.html. Seems like an implementation detail, not a language detail. I am not saying CPython doesn't do it like that, but I am saying the Python language could support lazy evaluation without disagreeing with the docs.
Shared variables are global variables. And these have gone out of style quite some time ago. Btw. this is races again and I thought we agreed on not having them because nobody really can/wants to debug them.

On Aug 4, 2015, at 14:03, Sven R. Kunze <srkunze@mail.de> wrote:
What does that even mean? How would you not allow races? If you let people throw arbitrary tasks at a thread pool, with no restriction on mutable shared state, you've allowed races.
No, the data model is a feature of the language, not one specific implementation. The fact that you can define classes that work the same way as builtin types like int is a fundamental feature. It's something Guido and others worked very hard on making true back in Python 2.2-2.3. It's one of the things that makes Python or C++ more pleasant to use than Tcl or Java. Any implementation that didn't do the same would not be Python, and would not run a good deal of Python code.
No. Shared values include global variables, nonlocal variables used by two closures from the same scope, attributes of objects passed to both functions, members of collections passed to both functions, etc. The existence of all of these other things is why global variables are not necessary. They have many advantages over globals, allowing you to better control how state is shared, to share it reentrantly, to make it more explicit in the code, etc. But because they have all the same benefits, they also have the exact same race problem when used to share state between threads.
Btw. this is races again and I thought we agreed on not having them because nobody really can/wants to debug them.
And how do you propose "not having them"? It's not impossible to write purely functional code that doesn't use any mutable state, in which case it doesn't matter whether your state is shared. But the fact that your example uses += proves that this isn't your intention. If you take the code from your example and run it in two threads simultaneously, you have a race. The fact that you didn't intend to create a race because you don't understand that doesn't mean the problem isn't there, it just means you have no idea you've just written buggy code and no idea how to test for it or debug it. And that's exactly the problem. What makes concurrent code with shared state hard, more than anything else, is people who don't realize what's hard about it and write code that seems to work but doesn't. Making it easier for such people to write broken code without even realizing they're doing so is not a good thing.

Hi everybody,

I finally managed to implement all the tiny little details of fork that were important from my perspective (cf. https://pypi.python.org/pypi/xfork). An interesting piece of code is the iterative evaluation of OperationFuture using generators to avoid stack overflows (see the sketch after this message).

The only thing I am not satisfied with is exception handling. In spite of preserving the original traceback, when the ResultEvaluationError is thrown is unfortunately up to the evaluator. Maybe somebody here has a better idea or compromise. Co-workers proposed using function scopes as the ultimate evaluation scope. That is, when a function returns a ResultProxy, it gets evaluated. However, I have absolutely no idea how to do this, as I couldn't find any __returned__ hook or something like it.

I learned a lot from writing this module, and there are some key insights I would like to share:
1) Pickle does not work with decorated functions.
2) One 'traceback' is not like another. There are different concepts in Python with the same name.
3) Tracebacks are not really first-class, thus customizing them is hard/impossible.
4) contextlib.contextmanager only creates decorators/context managers with parameters, but what if you have none? @decorator() looks weird.
5) Generators can be used for operation evaluation to avoid the stack limit.
6) Python is awesome: despite the above obstacles, I managed to hammer out a short and comprehensible implementation for fork.

It would be great if experts here could fix 1) - 4). 1) - 3) have corresponding StackOverflow threads.

@_Andrew_ I am going to address your questions shortly after this.

Best,
Sven
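For readers who haven't seen the trick mentioned in point 5, here is a minimal, self-contained sketch of generator-driven evaluation. This shows the general technique, not necessarily xfork's actual code; Op, _eval and evaluate are made-up names:

    import operator

    class Op:
        # an unevaluated binary operation whose operands may themselves be Ops
        def __init__(self, fn, left, right):
            self.fn, self.left, self.right = fn, left, right

    def _eval(node):
        # generator: yields operands back to the trampoline instead of recursing
        if not isinstance(node, Op):
            return node
        left = yield node.left
        right = yield node.right
        return node.fn(left, right)

    def evaluate(node):
        # drive nested generators with an explicit stack, so nesting depth
        # is limited by memory rather than the interpreter's recursion limit
        stack, result = [_eval(node)], None
        while stack:
            try:
                child = stack[-1].send(result)
            except StopIteration as stop:
                stack.pop()
                result = stop.value
            else:
                stack.append(_eval(child))
                result = None
        return result

    expr = 0
    for _ in range(50000):        # far deeper than the default recursion limit
        expr = Op(operator.add, expr, 1)
    print(evaluate(expr))         # 50000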

On Aug 11, 2015, at 06:54, Sven R. Kunze <srkunze@mail.de> wrote:
Co-workers proposed using function scopes as the ultimate evaluation scope. That is when a function returns a ResultProxy, it gets evaluated. However, I have absolutely no idea how to do this as I couldn't find any __returned__ hook or something.
I'm not sure I completely understand what you're looking for here. If you just want a hook that gets called whenever a function returns, just write a decorator that calls the real function then does the hook thing:

    from functools import wraps

    def hookify(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            do_hook_stuff()
            return result
        return wrapper

(Or, if you want to hook both raising and returning, use a finally.)

But I'm not sure what good that would do anyway. If you unwrap futures every time they're returned, they're not doing anything useful as futures in the first place; you might as well just return the values directly.

On 12.08.2015 05:06, Andrew Barnert wrote:
I think I found a better solution. Functions should not be the boundaries; try: blocks should be. Why? Because they mark the boundaries for exception handling, and that is what the problem is about. I started another thread here: https://mail.python.org/pipermail/python-list/2015-August/695313.html

If an exception is raised within a try: block that is not supposed to be handled there, weird things might happen (wrong handling, superfluous handling, no handling, etc.). Confining the evaluation of result proxies to the try: blocks they are created in would basically retain all sequential properties. So, plugging in 'fork' and removing it would basically change nothing (at least if you don't try anything really insane, which at least is disallowed by our coding standards. ;) )

Some example ('function' here means the stack frame of a function):

    def b():
        return 'a string'

    try:
        function:
            a = fork(b)
            a += 3
        function:
            b = 5
            b *= 4 * a
    except TypeError:
        print('damn, I mixed strings and numbers')

The given try: block needs to make sure it eventually collects all exceptions that would have been raised in the sequential case.

Conclusion: the approach is a compromise between:
1) deferred evaluation (later is better)
2) proper exception handling (early is better)

Best, Sven

On 05-Aug-2015 16:30:27 +0200, abarnert@yahoo.com wrote:
What does that even mean? How would you not allow races? If you let people throw arbitrary tasks at a thread pool, with no restriction on mutable shared state, you've allowed races.
Let me answer this in a more implicit way. Why do we need to mark global variables as such? I think the answer is clear: to mark side-effects (quoting the docs). Why are all variables thread-shared by default? I don't know; maybe efficiency reasons, but those hardly apply to Python in the first place.
And how do you propose "not having them"?
What would happen if all shared variables were thread-local by default and needed to be marked as shared if desired? I think the answer would also be very clear: to mark side-effects and to have people think about them explicitly.
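For comparison, today the opt-in goes the other way: module globals are shared by every thread, while threading.local gives each thread its own attribute namespace:

    import threading

    counter = 0                     # a module global: shared by every thread
    state = threading.local()       # one attribute namespace per thread

    def worker(n):
        state.value = n             # private to this thread
        global counter
        counter += n                # shared, and racy without a lock

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()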
And that's exactly the problem. What makes concurrent code with shared state hard, more than anything else, is people who don't realize what's hard about it and write code that seems to work but doesn't.
Precisely because 'shared state' is hard, why is it the default?
Making it easier for such people to write broken code without even realizing they're doing so is not a good thing.
That argument only applies when the broken code (using shared states) is the default. As you can see, this thought experiment assumes that there could be another way to approach that situation. How and when this can be done and if at all is a completely different matter. As usual, I leave that to the experts like you to figure out. Best, Sven

On Aug 11, 2015, at 07:33, Sven R. Kunze <srkunze@mail.de> wrote:
First, are you suggesting that your idea doesn't make sense unless Python is first modified to not have shared variables? In that case, it doesn't seem like a very useful proposal, because it applies to some different language that isn't Python. And applying it to Python instead means you're still inviting race conditions. Pointing out that in a different language those races wouldn't exist is not really an answer to that.

Second, the reason for the design is that that's what threads mean, by definition: things that are like processes except that they share the same heap and other global state. What's the point of a proposal that lets people select between threads and processes if its threads aren't actually threads?

Finally, just making variables thread-local wouldn't help. You'd need a completely separate heap for each thread; otherwise, just passing a list to another thread means it can modify your values. And if you make a separate heap for each thread, what happens when you do x[0]=y if x is local and y shared, or vice-versa? You could build a whole shared-memory API and/or message-passing API a la the multiprocessing module, but if that's an acceptable solution, what's stopping you from using multiprocessing in the first place? (If you're going to say "not every message can be pickled", consider how you could deep-copy an object that can't be pickled.)

Of course there's no reason that you couldn't implement something that's basically a process at the abstract level, but implemented with threads at the OS level. And that could make both explicit shared memory and IPC simpler at least under the covers, and more efficient. And it could lead to a way to eliminate the GIL. And there could be other benefits as well. That's why people are exploring things like the recent subinterpreters thread, PyParallel, PyPy+STM, etc. If this were an easy problem, it would have been solved by now. (Well, it _has_ been solved for different classes of languages--pure-immutable languages can share with impunity; languages designed from the ground up for message passing can get away with only message passing; etc. But that doesn't help for Python.)
And that's exactly the problem. What makes concurrent code with shared state hard, more than anything else, is people who don't realize what's hard about it and write code that seems to work but doesn't.
Precisely because 'shared state' is hard, why is it the default?
The default is to write sequential code. You have to go out of your way to use threads. And when you do, you have to intentionally choose threads over processes or some kind of microthreads. It's only when you've chosen to use shared-memory threading as the design for your app that shared memory becomes the default.
Making it easier for such people to write broken code without even realizing they're doing so is not a good thing.
That argument only applies when the broken code (using shared states) is the default.
But that is the default in Python, so your proposal would make it easier for such people to write broken code without even realizing they're doing so, so it's not a good thing.

On 12.08.2015 05:33, Andrew Barnert wrote:
My point was:
1) processes are fine (more or less)
2) threads aren't, because they are hard to manage, so let's make them easier
Finally, just making variables thread-local wouldn't help. You'd need a completely separate heap for each thread;
So?
At this point, talking about internal implementation hardly seems relevant. Not exactly sure what you mean by heap here, but I could imagine more of an overlay approach. As long as I only read the original variable, we are fine. But setting it would require me to store the thread-local value somewhere else.

I am uncertain why you are so averse to making threading easier to handle and to maintain. If 'easier' bothers you, let's call it 'code works more reliably', 'code is more readable', 'code has fewer side-effects', 'code produces fewer races'. I am not talking about 100%. I am talking about 80% fewer places in your code where you need to worry about thread-safety, which leaves 20% where you really do. Btw. the stdlib would also benefit from this, in order to provide thread-safe modules out of the box. Not every maintainer needs to re-implement the desired thread-safety from scratch over and over again.
otherwise, just passing a list to another thread means it can modify your values.
It just depends on what you want here. I would rather see Python assuming thread-safe behavior by default, whereas the programmer can actively choose a more flexible/dangerous model if needed for some small areas.
[...implementation...] what happens when you do x[0]=y if x is local and y shared, or vice-versa?
Now, we are talking.

A) As soon as a single variable (x or/and y) is shared, all expressions using/writing such variables are basically unsafe. It's dangerous; you might need some locks and so forth to get it running properly. You might need extra thought to handle some weird corner cases and so forth.

B) If all variables of an expression are local, everything is fine. No additional work needed.

I regard case B) as the common case, where you DON'T want others to mess around with your variables and can't do anything about it if they do. Case A) is more like the data communication channel where threads communicate with each other, aggregate results in a common list, and so forth. I can only imagine this taking place at the end of the threading part of a program, where the results need to be propagated back to the MainThread.
Of course there's no reason that you couldn't implement something that's basically a process at the abstract level, but implemented with threads at the OS level. And that could make both explicit shared memory and IPC simpler at least under the covers, and more efficient. And it could lead to a way to eliminate the GIL. And there could be other benefits as well. That's why people are exploring things like the recent subinterpreters thread, PyParallel, PyPy+STM, etc.
Yes, transactional memory would basically be the term that covers that. A thread basically gets a snapshot of the world right from the start, and after it finishes, the variables get merged back. However, I am unsure whether I would want that for all variables ("shared vs local" exists here as well; and I would prefer an explicit way to declare it).
The default is to write sequential code. You have to go out of your way to use threads. And when you do, you have to intentionally choose threads over processes or some kind of microthreads.
We are talking about threading all along. There is no point in going back to sequential.
It's only when you've chosen to use shared-memory threading as the design for your app that shared memory becomes the default.
I am not sure I can follow here. If I look at the threading API of the Python standard lib, it is shared-memory. So, it is the default, like it or not.
But that is the default in Python, so your proposal would make it easier for such people to write broken code without even realizing they're doing so, so it's not a good thing.
I am sorry? Because shared-memory is the default in Python, my proposal would make it easier for such people to write broken code? We must be talking about different proposals. Maybe, you could give an example.

Just for the record, my proposal:
1) processes are almost fine
2) threads aren't, so let's make it easier to work with them

Best, Sven

On Wed, Aug 19, 2015 at 2:28 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Python has two completely distinct concepts that, together, make up the whole variable pool:

a) Names, which live in scopes and are usually bound to objects
b) Objects, which are always global and may refer to other objects

Names may be built-ins, module globals, class attributes, or function locals. The latter exist on a stack, where you can access only the current function call, and all others are shadowed; also, tighter forms shadow broader forms (eg a function-local 'str' will shadow the built-in type of that name).

Objects exist independently of all scopes. Names in multiple scopes can simultaneously be bound to the same object, and objects can reference other objects. Objects can never reference names (though some name bindings are implemented with dictionary lookups, cf globals() for example).

So far, I think everyone on this list understands everything I've said. Nothing surprising here; and nothing that depends on a particular implementation.

The notion of "a completely separate heap for each thread" is talking about part B - you need a completely separate pile of objects. And if you're going to do that, you may as well make them separate processes. There's no way to make module globals thread-local without making the module object itself thread-local, and if you do that, you probably need to make every object it references thread-local too, etc, etc, etc.

Does that answer the questions? Apart from "heap" being perhaps a term of implementation, this isn't about the internals - it's about the script-visible behaviour.

ChrisA
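A tiny illustration of the names/objects split described above (nothing here is thread-specific):

    a = [1, 2, 3]        # the name 'a' (a module global) is bound to a list object
    b = a                # 'b' is a second name bound to the *same* object

    def extend(seq):     # 'seq' is a function-local name for that same object
        seq.append(4)    # mutates the object itself

    extend(a)
    print(b)             # [1, 2, 3, 4] -- the change is visible through every name
    b = [0]              # rebinding 'b' changes only the name binding
    print(a)             # [1, 2, 3, 4] -- the object is untouched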

On 18.08.2015 18:55, Chris Angelico wrote:
The notion of "a completely separate heap for each thread" is talking about part B - you need a completely separate pile of objects.
Right. However, only if needed. As long as threads only read from a common variable, there is no need to interfere. I would expect the same behavior as with class attributes and instance attributes: the latter overlay the former once they are assigned; otherwise, the former can be read without an issue.
And if you're going to do that, you may as well make them separate processes.
Consulting the table elaborated in the other thread "Concurrency Modules", that is not entirely true. I agree, behavior-wise, processes behave almost as desired (relevant data is copied over and there are no shared variables). However, the cpu/memory/communication footprint of a new process (using spawn) is enormous compared to a thread. So, threading still has its merits (IMHO).
Something wrong with that? Shouldn't matter as long as there is only a single thread.
Yep, thanks a lot, Chris. :) Best, Sven

On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Sure, but as soon as you change something, you have to thread-local-ify it. So I suppose what you would have is three separate pools of objects:

1) Thread-specific objects, which are referenced only from one thread. These can be read and written easily and cheaply.
2) Global objects which have never been changed in any way since threading began. These can be read easily, but if written, must be transformed into...
3) Thread-local objects, which exist for all threads, but are different. The id() of such an object depends on which thread is asking.

Conceptually, all three types behave the same way - changes are visible only within the thread that made them. But the implementation could have these magic "instanced" objects for only those ones which have actually been changed, and save a whole lot of memory for the others.
So really, you're asking for process semantics, with some optimizations to take advantage of the fact that most of the processes are going to be just reading and not writing. That may well be possible, using something like the above three-way-split, but I'd want the opinion of someone who's actually implemented something like this - from me, it's just "hey, wouldn't this be cool".
As long as there's only a single thread, there's no difference between process-wide and thread-local. Once you start a second thread, something needs to know what objects belong where. That's all. ChrisA

On 18.08.2015 19:27, Chris Angelico wrote:
> On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze@mail.de> wrote:
>> On 18.08.2015 18:55, Chris Angelico wrote:
>>> The notion of "a completely separate heap for each thread" is talking
>>> about part B - you need a completely separate pile of objects.
>>
>> Right. However, only if needed. As long as threads only read from a common
>> variable, there is no need to interfere.
> Sure, but as soon as you change something, you have to
> thread-local-ify it. So I suppose what you would have is three
> separate pools of objects:
>
> 1) Thread-specific objects, which are referenced only from one thread.
> These can be read and written easily and cheaply.
> 2) Global objects which have never been changed in any way since
> threading began. These can be read easily, but if written, must be
> transformed into...
> 3) Thread-local objects, which exist for all threads, but are
> different. The id() of such an object depends on which thread is
> asking.
>
> Conceptually, all three types behave the same way - changes are
> visible only within the thread that made them. But the implementation
> could have these magic "instanced" objects for only those ones which
> have actually been changed, and save a whole lot of memory for the
> others.

Indeed. I think that is a sensible approach here. Speaking of an implementation though, I don't know where I would start when looking at CPython.

Thinking more about id(): consider a complex object like an instance of a class. Is it really necessary to deep copy it? It seems to me that we actually just need to hide the atomic/immutable values (e.g. strings, integers etc.) of that object. The object itself can remain the same.

    # first thread
    class X:
        a = 0
    class Y:
        x = X

    # thread spawned by first thread
    Y.x.a = 3  # should leave id(X) and id(Y) alone

Maybe that example is too simple, but I cannot think of an issue here. As long as the current thread is the only one able to change the values of its variables, all is fine.

>> I agree, behavior-wise, processes behave almost as desired (relevant data is
>> copied over and there are no shared variables).
>>
>> However, the cpu/memory/communication footprint of a new process
>> (using spawn) is enormous compared to a thread. So, threading still has its
>> merits (IMHO).
> So really, you're asking for process semantics, with some
> optimizations to take advantage of the fact that most of the processes
> are going to be just reading and not writing. That may well be
> possible, using something like the above three-way-split, but I'd want
> the opinion of someone who's actually implemented something like this
> - from me, it's just "hey, wouldn't this be cool".

If you put it this way, maybe yes. I also look forward to more feedback on this.

To me, a process/thread or any other concurrency solution is basically a function that I can call but that runs in the background. Later, when I am ready, I can collect its result. In the meantime, the main thread continues. (Again) to me, that is the only sensible way to approach concurrency. When I recall the details of locks, semaphores etc. and compare them to what real-world applications really need... You can create huge tables of all the possible cases that might happen, just in order to find out that you missed an important one. Even worse, as soon as you change something about your program, you are doomed to redo the complete case analysis, find a dead/live-lock-free solution and so forth.
It's a time sink; costly and dangerous from a company's point of view. Best, Sven

On 18 August 2015 at 21:32, Sven R. Kunze <srkunze@mail.de> wrote:
It seems to me that this is accurate, but glosses over all of the issues that result in multiple solutions being needed. Sure, all concurrency solutions provide this. But the difference lies in the environment of the function you're calling. Does it have access to the same non-local name bindings as it would if run in the foreground? To the same objects? Is it able to write to those objects safely, or must it treat them as read only? Or can it write, but only if it follows a particular protocol (semaphores, locks, etc fit here)?

If you reduce the functionality you're considering to the lowest common denominator, then all solutions look the same, in essence by definition (that's basically what lowest common denominator means). But you haven't actually solved any real-world problems by doing so. Conversely, it *is* true that a lot of problems that benefit from concurrency can work with the minimal guarantees of a lowest common denominator solution (no shared state, pure functions). Functional programming has shown us that. For those problems, any of the options are fine, and the decision gets made on other factors (most likely performance, as each solution makes different performance trade-offs in the process of providing whatever extra guarantees they make).

I'm confused as to what your point is. "People should write concurrent code in a no shared state, pure function manner" seems to be what your comment "the only sensible way" implies. If so, then fine, that's your opinion, but others differ and Python caters for those people as well. If, on the other hand, you accept the need for shared state (even if it's just I/O) then discounting the constraints that such shared state implies seems either naive or simply wrong. Or I'm missing something, but I can't see what it is.

Paul

On 18.08.2015 23:53, Paul Moore wrote:
I can identify 2 common patterns I label as jobs and servers. Jobs are things that get delegated out to some background process/thread/coroutine/subinterpreter. They come back when the job is done. No shared state necessary. Servers are more like while-true loops running in some separate process/thread/coroutine/subinterpreter. They only return on shutdown or so. Shared state à la queues for input/output could come in handy. Maybe there is more to discover, but that's more like research than production.
I am uncertain how to approach this in the correct way. However, my approach here would simply be to imitate sequential behavior. I would always ask: "What would happen if this were executed sequentially?" The answer would then be retrofitted to the parallel scenario.
I hope eventually, the interpreter will take the decision burden away from the developers and make educated guesses to achieve the best performance.
I think you are referring to that statement only. It's just the underlying motivation for me to engage in this discussion, and it's born of observing real-world code development and maintenance. If I were to ask people around the globe what they use in production, I guess I would get answers like this (that'll be an awesome survey btw.):

30% declaration - you say what you want (cfg files, decorators, sql, css, html, etc.)
20% imperative - you say what to do (function call hierarchies)
25% object oriented - you think in "objects" (object relationship tree)
15% packages/modules - you cluster functionality (file/directory hierarchies)
10% magic stuff - you never get it right (generators, concurrency, meta-classes, import hooks, AST, etc.)

Just look at big projects. Look at different programming languages. All the same. Maybe my observation is wrong; that could be. Maybe the observation is true and people are just dumb, lazy and unable to appreciate a fine 10-hour live-lock bug-hunting session (but I doubt that). Most professionals I am working with are highly intelligent people working on very complex problems and with very few resources. Thus, they don't appreciate tools making their lives more difficult by changing the problem's domain from complex to complicated. As a result, they are not going to use such tools at all. Make it like the top 90% of what people are used to and they are going to use it.

The point is not that people should do this or that. The point is, the tools should make it stupidly easy to get things done. People then will follow and do this or that automatically. Let's improve the tools.

Best, Sven

On Thu, Aug 20, 2015 at 12:01:12AM +0200, Sven R. Kunze wrote:
Point is, the tools should make it stupidly easy to get things done.
"The problem with this is that we will have done what humans often do, which is to use technology to make things easier while missing an opportunity to make them significantly better." -- Rory Sutherland and Glen Weyl If Python becomes everything that you want from your proposal, how will it be *better* rather than just easier? -- Steve

On 20.08.2015 02:18, Steven D'Aprano wrote:
"I am uncertain why you are so averse about making threading easier to handle and to maintain. If you bother about 'easier', let's call it 'code works more reliably', 'code is more readable', 'code has lesser side-effects', 'code produces lesser races'." -- Sven R. Kunze This quoted: what is your definition of *better*? Best, Sven PS: there is no such thing as **better* (at least from my perspective). It all comes down to a personal definition. Watching https://vimeo.com/79539317 , I can tell that there much more potential for improvements under the hoods. However, the public API for the "end developers" should be made and stay as simple as possible. Just for the sake of "getting things done".

On Aug 18, 2015, at 13:32, Sven R. Kunze <srkunze@mail.de> wrote:
>
>> On 18.08.2015 19:27, Chris Angelico wrote:
>>> On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze@mail.de> wrote:
>>>> On 18.08.2015 18:55, Chris Angelico wrote:
>>>>> The notion of "a completely separate heap for each thread" is talking
>>>>> about part B - you need a completely separate pile of objects.
>>>>
>>>> Right. However, only if needed. As long as threads only read from a common
>>>> variable, there is no need to interfere.
>>> Sure, but as soon as you change something, you have to
>>> thread-local-ify it. So I suppose what you would have is three
>>> separate pools of objects:
>>>
>>> 1) Thread-specific objects, which are referenced only from one thread.
>>> These can be read and written easily and cheaply.
>>> 2) Global objects which have never been changed in any way since
>>> threading began. These can be read easily, but if written, must be
>>> transformed into...
>>> 3) Thread-local objects, which exist for all threads, but are
>>> different. The id() of such an object depends on which thread is
>>> asking.
>>>
>>> Conceptually, all three types behave the same way - changes are
>>> visible only within the thread that made them. But the implementation
>>> could have these magic "instanced" objects for only those ones which
>>> have actually been changed, and save a whole lot of memory for the
>>> others.
>
> Indeed. I think that is a sensible approach here. Speaking of an implementation though, I don't know where I would start when looking at CPython.
>
> Thinking more about id(): consider a complex object like an instance of a class. Is it really necessary to deep copy it? It seems to me that we actually just need to hide the atomic/immutable values (e.g. strings, integers etc.) of that object.

Why wouldn't hiding the mutable members be just as necessary? In your example, if I can replace Y.x, isn't that even worse than replacing Y.x.a?

> The object itself can remain the same.

What does it mean for an object to be "the same" if it potentially holds different values in different threads?

> # first thread
> class X:
>     a = 0
> class Y:
>     x = X
>
> # thread spawned by first thread
> Y.x.a = 3  # should leave id(X) and id(Y) alone

OK, but does the second thread see 0 or 3? If the former, then these aren't shared objects at all. If the latter, then that's how things already work.

> Maybe that example is too simple, but I cannot think of an issue here. As long as the current thread is the only one able to change the values of its variables, all is fine.

No. If other threads can see those changes, it's still a problem. They can see things happening out of order, see objects in inconsistent intermediate states, etc.--all the problems caused by races are still there.

>>>> I agree, behavior-wise, processes behave almost as desired (relevant data is
>>>> copied over and there are no shared variables).
>>>>
>>>> However, the cpu/memory/communication footprint of a new process
>>>> (using spawn) is enormous compared to a thread. So, threading still has its
>>>> merits (IMHO).
>>> So really, you're asking for process semantics, with some
>>> optimizations to take advantage of the fact that most of the processes
>>> are going to be just reading and not writing. That may well be
>>> possible, using something like the above three-way-split, but I'd want
>>> the opinion of someone who's actually implemented something like this
>>> - from me, it's just "hey, wouldn't this be cool".
>
> If you put it this way, maybe yes. I also look forward to more feedback on this.

Have you looked into the subinterpreters project, the PyParallel project, or the PyPy-STM project, all of which, as I mentioned earlier, are possible ways of getting some of the advantages of process semantics without all of the performance costs? (Although none of them are exactly that, of course.)

> To me, a process/thread or any other concurrency solution is basically a function that I can call but that runs in the background. Later, when I am ready, I can collect its result. In the meantime, the main thread continues. (Again) to me, that is the only sensible way to approach concurrency. When I recall the details of locks, semaphores etc. and compare them to what real-world applications really need... You can create huge tables of all the possible cases that might happen, just in order to find out that you missed an important one.

Yes, that is the problem that makes multithreading hard in the first place (except in pure functional languages). If the same value is visible in two threads, and can be changed by either of those threads, you have to start thinking either about lock discipline, or about ordering atomic operations; either way, things get very complicated very fast.

A compromise solution is to allow local mutable objects, but not allow them to be shared between threads; instead, you provide a way to (deep-)copy them between threads, and/or to (destructively) move them between threads. You can do that syntactically, as with the channel operators used by Erlang and the languages it inspired, or you can do it purely at a semantic level, as with Python's multiprocessing library; the effect is the same: process semantics, or message-passing semantics, or whatever you want to call it, gives you the advantages of immutable threading in a language with mutability.

> Even worse, as soon as you change something about your program, you are doomed to redo the complete case analysis, find a dead/live-lock-free solution and so forth. It's a time sink; costly and dangerous from a company's point of view.

This is an argument for companies to share as little mutable state as possible across threads. If you don't have any shared state at all, you don't need locks or other synchronization mechanisms at all. If you only have very limited and specific shared state, you have very limited and hopefully simple locking, which is a lot easier to keep track of.

And you can already do this today, using multiprocessing. It's an obvious and explicit way to ask for process semantics. If you're not using it, you have to explain why you can't use it, and why you think rebuilding the same semantics on top of threads would solve your problem. There are possible answers to that. Some projects need a better "unsafe escape hatch" for sharing than either raw shared memory or proxy-manager protocols can provide; for some, there may be a specific performance bottleneck that could in theory be avoided but in practice the current design makes it impossible; etc. None of these are very common, but they do exist. If you're running into a specific one, we should be looking for ways to characterize and then solve that specific problem, not trying to rebuild what we already have and hope that this time the problem doesn't come up.

On 19.08.2015 04:09, Andrew Barnert wrote:
Not sure what you mean about 'worse'. Replacing Y.x just replaces a pointer to some value. So, if I replace it with something else, it's not different/worse/better than replacing Y.x.a, right?
The object itself can remain the same.
What does it mean for an object to be "the same" if it potentially holds different values in different threads?
I was talking about the id(...) and deep copying.
before (*):
    first thread sees: Y.x.a == 0
    thread spawned by first thread sees: Y.x.a == 0

after (*):
    first thread sees: Y.x.a == 0
    thread spawned by first thread sees: Y.x.a == 3

If that wasn't clear, we were talking about the preferred 'process-like' semantics. Just assume for a moment that, by default, Python would wrap up variables (as soon as they are shared across 2 or more threads) like this (self is the variable):

    import threading

    class ProxyObject:
        def __init__(self, variable):
            self.__original__ = variable
            self.__threaded__ = threading.local()
        def __proxy_get__(self):
            return getattr(self.__threaded__, 'value', self.__original__)
        def __proxy_set__(self, value):
            self.__threaded__.value = value

I think you get the idea; it should work like descriptors. Basically, descriptors for general access on a variable and not for classes only => proxy objects. Is there something like that in Python? That would vastly simplify the implementation of xfork, btw.

So, to give you an example (still assuming the behavior described above), I abuse our venerable thumbnails. Let's calculate the total sum of the thumbnail bytes created:

     1: images = ['0.jpg', '1.jpg', '2.jpg', '3.jpg', '4.jpg']
     2: sizes = []
     3: for image in images:
     4:     fork create_thumbnails(image, sizes)
     5: wait  # for all forks to come back
     6: shared sizes
     7: print('sum:', sum(sizes))
     8:
     9: @io_bound
    10: def create_thumbnails(image, sizes):
    11:     with open(image) as image_file:
    12:         # and so forth
    13:     shared sizes
    14:     sizes.append(100)

Here, you can see what I meant by explicitly stating that we enter dangerous space: the keyword "shared" in lines 6 and 13. It basically removes the wrapper described above and reveals the dangerous/shared state of the object (like 'global'). So, both functions need to agree to remove the veil and thus be able to read/modify the shared state. shared x translates to: x = x.__original__
Have you looked into the subinterpreters project, the PyParallel project, or the PyPy-STM project, all of which, as I mentioned earlier, are possible ways of getting some of the advantages of process semantics without all of the performance costs? (Although none of them are exactly that, of course.)
Yes, I did. STM is nice as a proof of concept, waiting for HTM. However, as I mentioned earlier, I am not sure whether I would really want that within the semantics of multiple threads. Trent Nelson (PyParallel) seems to agree on this. It's kind of weird and would be followed by all sorts of workarounds in case of a transaction failure. The general intention of PyParallel seems interesting. It is also all about "built-in thread-safety", which is very nice. Trent also agrees on 'never share state'.
I am glad we agree on this. However, just saying it's hard and keeping the status quo does not help, I suppose.

On Aug 19, 2015, at 14:10, Sven R. Kunze <srkunze@mail.de> wrote:
The reason you need to deep-copy things to avoid shared data is that if you only shallow-copy things, any mutable members and elements of the thing are still shared, and therefore you still have races. Trying to change things so that immutable members and elements are "hidden" (which sounds more like copy-on-write than hiding) doesn't help anything. Except in the very simple case, where you only have one level of mutability (e.g., a list of ints), you still end up sharing mutable members and elements, and therefore you still have races. And if the only cases you care about are the simple ones, you could just shallow-copy in the first place instead of inventing something more complicated that doesn't add anything. Think about it this way: You have a tree. You want to let me work on this tree, but you want to make sure that there's no way I could change any subtree you're working on or vice-versa. Copying the whole tree solves that problem. Copying just the leaves, which is what you're effectively proposing, doesn't.
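In Python terms, this is the difference between copy.copy and copy.deepcopy:

    import copy

    tree = {'child': {'leaves': [3]}}
    shallow = copy.copy(tree)          # new outer dict, but the same inner objects
    deep = copy.deepcopy(tree)         # new objects at every level

    shallow['child']['leaves'].append(99)
    print(tree)    # {'child': {'leaves': [3, 99]}} -- the mutation leaked through
    deep['child']['leaves'].append(7)
    print(tree)    # still {'child': {'leaves': [3, 99]}} -- the deep copy is independent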
Not sure what you mean about 'worse'. Replacing Y.x is just pointer to some value. So, if I replace it with something else, it's not different/worse/better than replacing Y.x.a, right?
Well, I suppose the fact that they're both race conditions means neither one is really worse, it's just that replacing the pointer in Y.x can potentially break more code than replacing the pointer in Y.x.a. At any rate, whether you call it worse, or equivalently bad, any solution that doesn't help with Y.x isn't a solution.
So in effect, you want to preserve "a is b" even though "a == b" will likely be false, and every operation on a or one of its members will do something different from an operation on b or one of its members? Why would you want that? Implicitly deep-copying values but then hiding that fact to lie to code that tries to check whether it's sharing a value is just going to make code harder to understand.
Then these aren't shared. What's the point of pretending they are? Why not just give the second thread a deep copy of Y in the first place? You still have yet to explain why you can't just use processes when you want process semantics. The most common good reason for that is to avoid the performance cost of serializing/deep-copying large object trees. Going through and instead deep-wrapping those trees in proxies will take just as long, and then add an extra indirection cost on every single lookup, so it doesn't solve the problem at all. If you have a different problem that you think this would solve, you'll have to explain what that problem is.
OK, so you had to explicitly mark sizes as shared to create a race here, so it should be obvious where you need to add the locks. But with a tiny change, either you create implicit races, or you can't mark them, because there is no actual variable that's being shared, just a value. What happens here:

 1: images = ['0.jpg', '1.jpg', '2.jpg', '3.jpg', '4.jpg']
 2: stats = {'sizes': []}
 3: for image in images:
 4:     fork create_thumbnails(image, stats['sizes'])
 5: wait  # for all forks to come back
 6: # no shared here
 7: print('sum:', sum(stats['sizes']))
 8:
 9: @io_bound
10: def create_thumbnails(image, sizes):
11:     with open(image) as image_file:
12:         # and so forth
13:     # no shared here
14:     sizes.append(100)

The fact that stats is a "proxy object" wrapping a dict instead of a dict doesn't matter; the list still ends up shared, and mutated in multiple threads at the same time. To fix this, you'd need to make the proxy not just proxy stats, but also wrap its __getattr__ and __getitem__ methods in something that recursively wraps the elements they return in proxy objects and wraps their __getattr__ and __getitem__. This is what everyone has been trying to explain to you: shared variables are not an issue, shared values are.
There would be no visible difference between STM and HTM from within Python. Everything in Python is just a reference, so an atomic compare-and-swap of a pointer and counter (which we already have) is sufficient to implement STM perfectly and transparently. More powerful HTM than that might allow the PyPy JIT to optimize STM code more efficiently, but beyond seeing things speed up, you wouldn't see any difference. So I don't see why you feel you need to wait.
However, as I mentioned earlier, I am not sure whether I would really want that within the semantics of multiple threads.
Trent Nelson (PyParallel) seems to agree on this. It's kind of weird and would be followed by all sorts of workarounds in case of a transaction failure.
Those workarounds are only when you want to use STM explicitly, from within your code, to explicitly allow races, but have a simpler way of dealing with them than locks. I don't actually know if that's what you want because, again, you still haven't explained what's wrong with just using multiprocessing for your use case.
But it's only hard if you choose thread semantics (sharing) instead of process semantics (copying) in the first place. The status quo is that you can use multiprocessing when you want process semantics, but if your app really needs to be written in terms of extensive mutable shared data, you can use threading and thread semantics. Unless you can explain why that's a problem, I don't see why you think anyone should change anything. As I've said before, different people have identified different specific cases where it is a problem, and people are working on at least three different solutions to those specific problems. So, if your problem is covered by one of them, you're in luck. If it isn't, you have to explain what your problem is, and why none of their solutions will work for it, and why yours will work better. Otherwise, you're just saying "instead of a choice between thread semantics and process semantics, let's have a choice between sort-of process semantics and process semantics because this seems neat". Going back to the venerable thumbnail example, a developer today has two choices for correct code: change the code to return the size instead of appending it to an array, in which case you're no longer sharing anything, or put an explicit lock around the shared value. Your proposal doesn't affect the first solution in any way, and it makes the second solution harder to write, without buying anything for anyone. People who don't want to figure out how to do the locking already have an answer, and what's the point in trying to build something equivalent to the answer we already have without any intended improvements?
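For what it's worth, both of today's correct answers to the thumbnail example are short. A minimal sketch (create_thumbnail here is just a stand-in for the real work):

from concurrent.futures import ThreadPoolExecutor
from threading import Lock

def create_thumbnail(image):
    return 100  # stand-in: return the size instead of appending it anywhere

images = ['0.jpg', '1.jpg', '2.jpg']

# Option 1: don't share anything -- collect return values instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    sizes = list(pool.map(create_thumbnail, images))
print('sum:', sum(sizes))

# Option 2: share the list, but guard every append with an explicit lock.
sizes = []
lock = Lock()

def create_and_record(image):
    size = create_thumbnail(image)
    with lock:
        sizes.append(size)

with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in pool.map(create_and_record, images):
        pass
print('sum:', sum(sizes))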

On 08/18/2015 01:27 PM, Chris Angelico wrote:
Just a thought... What if a new name type, co_mutables, is added to function objects, and a new keyword to go with it, "mutable", to be used like "global" and "nonlocal". Then raise an error on an attempt to bind a mutable object to a name not in co_mutables. Also, don't include co_mutables names in nested function scopes. That would require mutable objects to be passed as function arguments to be seen. Then only mutable objects/names explicitly passed to a new thread need to be tracked. It might be possible to turn that on with a compiler directive at the top of a module, so normal python code would work normally, and thread safe python code would be limited. Would something in this direction simplify the problem? Cheers, Ron

On Aug 19, 2015, at 09:57, Ron Adam <ron3200@gmail.com> wrote:
Well, the problem can be solved with a strong enough static typing system, as multiple ML-derived languages that add mutability and/or unsafety prove. But what you're suggesting isn't nearly strong enough, or static enough, to accomplish that. First, in this code, is i mutable or not:

def spam(a):
    for i in a:
        eggs(i)

And if eggs is imported from a module not marked "thread-safe", is the call illegal, or assumed to do something that mutates i, or assumed to be safe? Also, whether eggs is from a "thread-safe" module or a normal one, how does the compiler know whether it's passing i to a new thread? And what happens if eggs stores i in a list or other mutable object and some other code mutates it later? Finally, if you only track mutability at the top level, how can you tell the compiler that a (mutable) queue of ints is thread-safe, but a queue of lists is not? And how can the compiler know which one it's looking at without doing a complete whole-program type inference?

On 08/19/2015 04:46 PM, Andrew Barnert via Python-ideas wrote:
I'll try to explain what I'm thinking, and see where this goes. The general idea is to keep mutable objects in function-local-only names. It's not a complete solution by itself. Some objects will still need to be managed as shared objects, but they will be easy(er) to identify.
In this case 'i' is immutable as it's not marked as 'mutable'. That determines what byte code is used (or how the byte code that is used acts).
  3          13 LOAD_GLOBAL              0 (eggs)
             16 LOAD_FAST                1 (i)
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 JUMP_ABSOLUTE            7
        >>   26 POP_BLOCK
        >>   27 LOAD_CONST               0 (None)
             30 RETURN_VALUE

If a compiler directive/flag for threading was set, then the interpreter/compiler may use different LOAD_FAST and GET_ITER routines that will raise an exception if "i" or "a" is mutable (by checking a flag on the object, or other means). If however...

def eggs(i):
    mutable i       # mutable objects clearly marked
    i.scramble()

def spam(a):
    mutable a, i
    for i in a:
        eggs(i)     # eggs can't access 'a' or 'i' in its scope,
                    # so i needs to be passed.

Instead of LOAD_FAST and STORE_FAST, it would have LOAD_MUTABLE and STORE_MUTABLE. That doesn't mean they are mutable, but that they may be, and are stored as co_mutable names that can't be accessed by a closure or non-local scope. It's not as extreme as requiring all objects to be immutable.
And if eggs is imported from a module not marked "thread-safe", is the call illegal, or assumed to do something that mutates i, or assumed to be safe?
The module could be local to only that thread. That may be one way a thread can use non thread safe code safely.
Also, whether eggs is from a "thread-safe" module or a normal one, how does the compiler know whether it's passing i to a new thread?
The only objects that are both shared and mutable with this model are those passed through a thread API call explicitly.

    future_result = newthread(foo, ...)   # <-- only these "foo, ..." need attention.
And what happens if eggs stores i in a list or other mutable object and some other code mutates it later?
If it's all in the same thread, no problem. If it's in another thread, then it may be an issue, and those details will still need to be worked out.

def main():
    mutable a, f
    new_thread(foo, a, b, [e, f, g])
    new_thread(bar, e, f, g)
    ...

Only a and f are both shared and mutable. Or:

def main():
    mutable persons, people
    people = get_people()
    places = get_places()
    new_thread(foo, persons)
    new_thread(bar, persons)
    ...

The list of persons would need all mutable items in it managed in some way. places is not mutable, and so it will be visible to foo and bar, but needs no special handling. But we don't have to worry about the list of people as it's not passed or visible to new_threads. So the problem is reduced to a smaller set of relatively easy to identify objects.
The problem of nested mutable objects will still be a concern. Cheers, Ron

On 20.08.2015 04:20, Ron Adam wrote:
I had absolutely no idea what you meant when saying "co_mutables". Reading the byte code and the examples above though, I can just agree. Thanks a lot for this. Yesterday, I used "shared" in my posts to illustrate that. "mutable" is another word for it. Best, Sven

On 08/20/2015 11:45 AM, Sven R. Kunze wrote:
When a bytecode to load an object is executed, such as LOAD_FAST, it gets its reference to the object from the function's list of names in its code object.
  3           6 LOAD_FAST                0 (x)
              9 LOAD_FAST                1 (y)
             12 BINARY_ADD
             13 RETURN_VALUE
>>> foo.__code__.co_varnames
('x', 'y')
LOAD_FAST 0 reads __code__.co_varnames[0]; LOAD_FAST 1 reads __code__.co_varnames[1]. Adding a co_mutables name list to the __code__ attribute, along with new bytecodes to access them, would create a way to keep private local names without changing how the other bytecodes work. LOAD_MUTABLE 0 would get the first reference in __code__.co_mutables. Look at co_names and co_freevars here to see how they relate to other byte codes. https://docs.python.org/3.5/library/dis.html#python-bytecode-instructions Hope that helps. Cheers, Ron
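To make today's machinery concrete (this only shows the attributes that exist now; co_mutables is the hypothetical addition being discussed):

import dis

def foo(x, y):
    return x + y

dis.dis(foo)                      # shows the *_FAST loads for x and y (exact opcodes vary by version)
print(foo.__code__.co_varnames)   # ('x', 'y') -- what LOAD_FAST indexes into
print(foo.__code__.co_names)      # () -- globals/attributes referenced by name
print(foo.__code__.co_freevars)   # () -- names closed over from an outer scope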
Yesterday, I used "shared" in my posts to illustrate that. "mutable" is another word for it.

On Thu, Aug 20, 2015 at 12:27:55PM -0400, Ron Adam wrote:
Byte codes are implementation, not semantics: there is no part of the Python language that promises that retrieving local variables will use a byte code LOAD_FAST. That's not how IronPython or Jython work, and there is no stability guarantee that CPython will always use LOAD_FAST.
What difference would it make to have a different implementation for retrieving and setting the object? How would having this byte code "keep private local names"? -- Steve

On 08/20/2015 12:51 PM, Steven D'Aprano wrote:
Yes, but semantics needs a workable implementation at some point. The semantics is to have a way to make names (for mutable objects) in outer scopes not be visible to functions defined in inner scopes.

def foo(x):
    """ Example of hiding a mutable object from inner scopes. """
    mutable items
    items = [1, 2, 3]
    def bar(y):
        """ can't see items here. So can't mutate it. """
        return -y
    return [bar(y)+x for y in items]

So foo is able to protect items from being mutated by functions defined in its scope. We could use localonly instead of mutable, but in the context of threading mutable may be more appropriate. (Way too soon to decide what color to paint this bike.) It may seem like it isn't needed, because you have control over what a function has access to... ie... just don't do that. But when you have many programmers working on large projects, things can get messy. This helps with that, but also helps in the case of threads.
If someone wanted to abuse it, they could. But that is true for many other areas of Python. Just as declaring a name as global or nonlocal changes which co_xxxx attribute a name reference is in, declaring it mutable would do the same. And just as the compiler generates different bytecode for global and nonlocal, it would generate different bytecode in this case too. That bytecode, LOAD_MUTABLE (and STORE_MUTABLE), would always look in the co_mutables list for its references, just as the other bytecodes look in certain lists for their references. So it is using an implementation consistent with how the other names are referenced. The important point of that is the name *won't* be in co_names, co_freevars, or co_cellvars. While it may be possible to do without using a new name list in __code__, that would require keeping track of which names are local only, and which ones are local but visible in the scope for functions defined under that function, in some other way. As I said, it's a partial solution. Shared & mutable names passed as function arguments will still need to be protected by locks or some other means in threads, but they will be easy to identify, and other mutable objects that are only used locally in a function won't need additional protections, as they won't be visible outside of that function (without abusing the code object, or introspecting it). But I thought the idea was interesting enough to post. To describe this further would probably require me to actually attempt to write a patch. I'm not sure I'm up to that on my own right now. Cheers, Ron

On Aug 20, 2015, at 13:02, Ron Adam <ron3200@gmail.com> wrote:
The only case you're helping with here is the case where the race is entirely local to one function and the functions it defines--a relatively uncommon case that also gets the least messy and is the easiest to spot and debug. Also, the "dangerous" cases are already marked today: the local function has to explicitly declare the variable nonlocal or it can't assign to it.
As a side note, closure variables aren't accessed by LOAD_FAST and STORE_FAST from either side; that's what cellvars and freevars are for. So, your details don't actually work. But it's not hard to s/fast/cell/ and similar in your details and understand what you mean. But I don't see why you couldn't just implement the "mutable" keyword to mean that the variable must be in varnames rather than cellvars or freevars (raising a compile-time error if that's not possible) and just continue using *_FAST on them. That would be a lot simpler to implement. It's also a lot simpler to explain: declaring a variable mutable means it can't participate in closures.
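A small check of that detail against today's behaviour: a variable that participates in a closure moves out of the plain fast locals and into cellvars/freevars.

def outer():
    a = 1            # plain local: stays in co_varnames, accessed via *_FAST
    b = 2            # closed over: becomes a cell variable, accessed via *_DEREF
    def inner():
        return b
    return inner

print('b' in outer.__code__.co_varnames)   # False -- it moved out of the fast locals
print(outer.__code__.co_cellvars)          # ('b',)
print(outer().__code__.co_freevars)        # ('b',)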
As I said it's a partial solution. Shared & mutable names passed as function arguments will still need to be protected
The problem here is the same as in Sven's proposal: the problem is shared values, not shared variables, so any solution that just tries to limit shared variables is only a vanishingly tiny piece of the solution. It doesn't do anything for mutable values passed to functions, or returned or yielded, or stored on self, or stored in any other object's attributes or in any container, or even in globals. It also doesn't prevent you from mutating them by calling a method (including __setattr__ and __setitem__ and the various __i*__ methods as well as explicit method calls). And, even if you managed to solve all of those problems, it still wouldn't be useful, because it doesn't do anything for any case where you share a member or element of the object rather than the object itself--e.g., if I have a dict mapping sockets to Connection objects, marking the dict unshareable doesn't protect the Connection objects in any way.

On 08/20/2015 05:26 PM, Andrew Barnert via Python-ideas wrote:
But you can mutate it, so it's not already marked today.
Yes... thats what I mean. ;-)
Sounds good to me.
The problem is shared mutable values. One thing I've been wondering is what is considered a mutable object in a practical sense. For example, a tuple is not mutable, but its items may be. But it seems to me it should still be considered a mutable object if it has any mutable sub-items in it. So a test for mutability needs to test content. It probably needs another word or description. Completely immutable? (?) It seems like there should already be a word for an object that has no mutable sub-items or parts. It also seems like there should be an easy way to test an object for that.
Returned mutable values shouldn't be a problem unless the function reuses the same object over again. But yes it doesn't solve those cases.
Nonlocal doesn't prevent that. Not being able to get to it does. For small programs that aren't nested deeply it's usually not a problem. Just don't reuse names and be sure to initialize values locally.
That would be the part that this won't address; you would still need to either deep copy or lock all the shared mutable locations within that dict, including the items and any attributes. Not only the items passed and returned, but all mutable values that can be seen by any functions in threads. This last part was what I was thinking could be made easier... is it possible to reduce how much is seen, and so reduce the number of locks? I think it was an interesting discussion and will help me think about these things in the future, but I'm not sure it would do what I was thinking at the start. Cheers, Ron

On Aug 20, 2015, at 16:57, Ron Adam <ron3200@gmail.com> wrote:
But your solution does nothing to stop values from being mutated, only to stop variables from being reassigned, so you haven't fixed anything.
Yes, that problem already exists for hashability (which, in Python, is strongly connected to immutability). Calling hash() on a tuple may still raise an exception if one of its elements is mutable. The reason this is rarely a problem is that it's not hard to just avoid storing mutable values in tuples that are used as dict keys. But technically, the problem is there. In your case, the same doesn't help, because it _is_ hard to avoid storing mutable values in everything in the world except specially-marked local variables.
But it seems to me it should still be considered a mutable object if it has any mutable sub items in it. So a test for mutability needs to test content. It probably needs another word or description. Completely immutable? (?) It seems like there should already be a word for an object that has no mutable sub items or parts. It also seems like there should be an easy way to test an object for that.
Only by recursing through the entire thing, as calling hash() does. In a statically-typed language, it's a different story, because if you declare, or the compiler infers, that you've got, say, a 3-tuple of nothing but ints or other 3-tuples of the same type or frozensets of the same type, then any value that it holds must be recursively immutable. (Although that can get tricky in practice, of course. Imagine writing that ADT manually, or reading it.) But that doesn't help for Python. The only obvious efficient way I can think of to solve this is in Python what I said before: having an entirely separate heap of objects, so you know that the objects in that thread can't be part of any objects in another thread. In other words, process semantics, which we already have. Maybe there are other ways to solve it, but ignoring it definitely doesn't count as solving it.
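For the simple "is this thing hashable all the way down?" question, hash() itself is the closest existing test, precisely because it recurses (a minimal sketch; note that hashability is only a rough proxy for deep immutability, since custom objects can be hashable yet mutable):

def deeply_hashable(obj):
    """Rough test: hashing recurses into tuple/frozenset elements."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

print(deeply_hashable((1, 2, (3, 4))))   # True
print(deeply_hashable((1, 2, [3, 4])))   # False -- the inner list is unhashable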
Sure it would:

connections.append(connect())
with Pool() as pool:
    for _ in range(10):
        pool.apply_async(broadcast, (connections, "warning"))

Now you've got 10 threads that all want to mutate the same object returned by connect(), even though it isn't stored in a variable.
But yes it doesn't solve those cases.
Right, it only solves an uncommon case, which also happens to be the simplest to detect and debug.
That's the whole problem: reusing names isn't the issue, it's different parts of the program having the same values under different names (or just using them without binding them to names at all). That's where races come from in real-life programs. And most complex programs are not deeply nested (they're wide, not tall), so that isn't the problem either.
If you want to take this idea further, look at what you could do with a two-level store. For example, in Oz (slightly oversimplifying and distorting things), names are bound to variables, and lists hold variables, and so on, while variables hold values. You can also implement this in something like C++, if you only store shared_ptr<T> values rather than storing T values directly (at least whenever T is mutable). I think you could build something around declaring the sharedness of the variables, rather than the names. Would that be sufficient without transitive static typing? I'm not sure, but it might be worth thinking through. But I don't think that would lead to anything useful for Python. Another possibility is to look at ways to move rather than copy values. That doesn't solve everything, but a few years of experience with C++11 seems to show that it can solve many real-world problems. (Of course this assumes the existence of collections that you can only move, not copy, things into. Or maybe just using a move-to-queue API and not directly accessing the collection underneath it?) There might be something doable for Python there.
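The "move it onto a queue and stop touching it" discipline can at least be approximated today with queue.Queue; nothing enforces the hand-off, it's purely a convention (a minimal sketch):

import queue, threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()
        if item is None:          # sentinel: no more work
            break
        item.append('processed')  # the worker now "owns" the item
        q.task_done()

t = threading.Thread(target=worker)
t.start()

data = ['raw']
q.put(data)      # hand the object over...
data = None      # ...and drop our reference so we can't race on it
q.put(None)
t.join()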

On Thu, Aug 20, 2015 at 04:02:52PM -0400, Ron Adam wrote:
The semantics is to have a way to make names (for mutable objects) in outer scopes not be visible to functions defined in inner scopes.
Why would you want that? This is a serious question -- there is already an easy way to ensure that two functions can't see the other's variables: *don't* nest them. The reason for nesting them is to ensure that the inner function *can* see the outer's variables. That's the whole point of nesting. So instead of this:
we can write this, with no new keyword, and it will work today:

def foo(x):
    items = [1, 2, 3]
    return [bar(y)+x for y in items]

def bar(y):
    """can't see items here. So can't mutate it."""
    return -y

The only reason to nest bar inside foo is if you want bar to have access to foo's namespace.
So foo is able to protect items from being mutated by functions defined in its scope.
But functions defined inside foo are under foo's control. They are part of foo. foo can mutate items, say by calling items.append(1); why do you think it matters whether the call to append comes from inside a subfunction or not? Either way, it is still inside foo and part of foo's responsibility. The danger comes, not from the inside of foo, but from the outside of foo. foo has no way of knowing whether some other function, let's call it spam, has access to the *object* items (not the name!) and is mutating it. That is a real danger in threaded programming, but your proposal does nothing to protect against it. The danger comes from shared mutable state, not nested namespaces.
At the point that you're worried about a single function being so complicated or big that developers might accidentally mutate a value inside that function, worrying about nested functions is superfluous:

def spam(obj):
    obj.mutate()

def foo():
    obj = something_mutable()  # Don't mutate it!
    def inner():
        obj.mutate()
    # masses of code
    # more masses of code
    # even more code still
    obj.mutate()  # No protection offered against this
    spam(obj)     # or this
    inner()       # but this is protected against

Why bother singling out such an unlikely and specific source of problems? -- Steve

On 08/20/2015 10:57 PM, Steven D'Aprano wrote:
Ok, I'm convinced it wouldn't do what I was initially thinking. It could possibly offer some benefits to catch some programming errors, but not enough, and it would not help with threads. Hmmm... I think maybe I mixed up some dynamic scope behaviour with static scope in my initial thoughts. That would be quite different, but not Python. (No, don't explain further, it was a mistake on my part, as I know the difference.) Oh, and thanks to you and Andrew for the feedback, even though it didn't go anywhere. Cheers, Ron

On 8/4/2015 5:03 PM, Sven R. Kunze wrote:
Not true. The language clearly defines when each step happens. The a.__add__ method is called,
a.__iadd__, if it exists. https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types
https://docs.python.org/3/reference/simple_stmts.html#augmented-assignment-s... -- Terry Jan Reedy

On Aug 1, 2015, at 10:36, Sven R. Kunze <srkunze@mail.de> wrote:
There's a whole separate thread going on about making it easier to understand the distinctions between coroutine/thread/process, separate tasks/pools/executors, etc. There's really no way to take that away from the programmer, but Python (and, more importantly, the Python docs) could do a lot to make that easier. Your idea of having a single global "pool manager" object, where you could submit tasks and, depending on how they're marked, they get handled differently might have merit. But that's something you could build pretty easily on top of concurrent.futures (at least for threads vs. processes; you can add in coroutines later, because they're not quite as easy to integrate), upload to PyPI, and start getting experience with before trying to push it into the stdlib, much less the core language. (Notice that Greg Ewing had a proposal a few years ago that was very similar to the recent async/await change, but he couldn't sell anyone on it. But then, after extensive experience with the asyncio module, first as tulip on PyPI and then added to the stdlib, the need for the new syntax became more obvious to everyone, and people--including Guido--who had rejected Greg's proposal out of hand enthusiastically supported the new proposal.)
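To make that concrete, here is a very rough sketch of such a "pool manager" layered on concurrent.futures; the decorator names come from the earlier proposal in this thread, and the pool setup and everything else here are illustrative assumptions, not an existing API:

import concurrent.futures as cf
import functools

_thread_pool = cf.ThreadPoolExecutor(max_workers=8)

def io_bound(func):
    """Run the decorated function on the shared thread pool."""
    @functools.wraps(func)
    def submit(*args, **kwargs):
        return _thread_pool.submit(func, *args, **kwargs)  # returns a Future
    return submit

# A cpu_bound twin would submit to a ProcessPoolExecutor instead; note that a
# process pool has to pickle the callable, which is exactly where the
# decorator/pickling trouble mentioned later in this thread shows up.

@io_bound
def create_thumbnail(image):
    return 100  # stand-in for real work

future = create_thumbnail('0.jpg')
print(future.result())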
The available resources rarely change at runtime. If you're doing CPU-bound work, the number of cores is unlikely to change during a run. (In rare cases, you might want to sometimes count hyperthreads as separate cores and sometimes not, but that would depend on intimate knowledge of the execution characteristics of the tasks you're submitting in two different places.) Similarly, if you're doing threads, the ideal pool size usually depends more on what you're waiting for than on what you're doing--12 threads may be great for submitting URLs to arbitrary servers on the internet, 4 threads may be better for submitting to a specific web service that you've configured to match, 16 threads may be better for a simulation with 2^n bodies, etc. Sometimes these really do need to grow and shrink configurably--not during a run, but during a deployment. In that case, you should store them in a config file rather than hard coding them. Then your sysadmin/deploy manager/whatever can learn how to test and configure them. For a real-life example (although not in Python), I know Level3 configured their video servers to use 4 processes of 4 threads per machine, while Akamai used 1 process of 16 threads (actually 2, but the second only for failover, not used live). Why? I have no idea, but presumably they tested the software with their machines and their networks and came to different results, and it's a good thing their software allowed them to configure it so they could each save that 1.3% heat or whatever it was they were trying to optimize.
3) Pool Management in General
There is a reason why I hesitate to explicitly manage pools. Our code runs on a plethora of platforms ranging from few to many hardware threads. We actually do not want to integrate platform-specific properties right into the source. The point of having parallelism and concurrency is to squeeze out more of the machines and get better response times. Anything else wouldn't be honest in my opinion (besides from researching and experimenting).
Which is exactly why some apps should expose these details to the sysadmin as configuration variables. Hiding the details inside the interpreter would make that harder, not easier.
Thus, a practical solution needs to be simple and universal. Explicitly setting the size of the pool is not universal and definitely not easy.
If you want universal and easy, the default value is the number of CPUs, which is often the best value to use. When you don't need to manually configure things to squeeze out the last few %, just rely on the defaults. When you do need to, it should be as easy as possible. And that's the way things currently are.
This only allows you to wait on everything to finish, or nothing at all. Very often, you want to wait on things in whatever order they come in. Or wait until the first task has finished. Or wait on them in the order they were submitted (which still allows you to get some pipelining over waiting on all). This is a well-known problem, and the standard solution across many languages is futures. The concurrent.futures module and the asyncio module are both designed around futures. You can explicitly wait on a future, or chain further operations onto a future--and, more importantly, you can compose futures into various kinds of group-waiting objects (wait for all, wait for any, wait for all or until first error, wait in any order, wait in specified order) that are themselves futures. If you want to try to collapse futures into syntax, you need something that still retains all of the power of futures. A single keyword isn't going to do that. Also, note that await is already a keyword in Python; it's used to explicitly block until another coroutine is ready. In other words, it's a syntactic form of the very simplest way to use futures (and note that, because futures are composable, anything can ultimately be reduced to "block until this one future is ready"). The reason the thread/process futures don't have such a keyword is that they don't need one; just calling a function blocks on it, and, because threads and processes are preemptive rather than cooperative, that works without blocking any other tasks. So, instead of writing "await futures.wait(iterable_of_futures, return_when=FIRST_EXCEPTION)" you just write the same thing without "await" and it already does what you want.
Futures already take care of this. They automatically transport exceptions (with stack traces) across the boundary to reraise where they're waited for.
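Concretely, with the stdlib as it is today (a minimal sketch; fetch() is just a placeholder):

import concurrent.futures as cf

def fetch(n):
    if n == 3:
        raise ValueError('boom')
    return n * n

with cf.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, n) for n in range(5)]

    done, not_done = cf.wait(futures, return_when=cf.FIRST_EXCEPTION)
    print(len(done), 'finished by the time the first error was seen')

    for fut in cf.as_completed(futures):   # handle results in completion order
        try:
            print(fut.result())            # reraises the worker's exception here
        except ValueError as exc:
            print('failed:', exc)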
In this code, your += isn't inside a "fork", so there's no way the implementation could know that you want it delayed. What you're asking for here is either implicit lazy evaluation, contagious futures, or dataflow variables, all of which are much more radical changes to the language than just adding syntactic sugar for explicit futures.

On 02.08.2015 02:02, Andrew Barnert wrote:
You mean something like this? https://pypi.python.org/pypi/xfork

On Aug 3, 2015, at 10:11, Sven R. Kunze <srkunze@mail.de> wrote:
Did you just write this today? Then yes, that proves my point about how easy it is to write it. Now you just have to get people using it, get some experience with it, etc. and you can come back with a proposal to put something like this in the stdlib, add syntactic support, etc. that it will be hard for anyone to disagree with. (Or to discover that it has flaws that need to be fixed, or fundamental flaws that can't be fixed, before making the proposal.) One quick comment: from my experience (mostly with other languages that are very different from Python, so I can't promise how well it applies here...), implicit futures without implicit laziness or even an explicit delay mechanism are not as useful as they look at first glance. Code that forks off 8 Fibonacci calls, but waits for each one's result before forking off the next one, might as well have just stayed sequential. And if you're going to use the result by forking off another job, then it's actually more convenient to use explicit futures like the ones in the stdlib. One slightly bigger idea: If you really want to pursue your implicit-as-possible design further, you might want to consider making the decorators replace the function with an object whose __call__ method just implicitly submits it to the pool. Then you can use normal function-calling syntax and pretend everything is magic. You can even add operator dunder methods to your future class that do the same thing (so "result * 2" just builds a new future out of "self.get() * 2", either submitted to the pool or, probably better, tacked on as an add_done_callback). I think there's a limit to how far you can push this without some mechanism to mark when you need the actual value (in ML-derived languages and C++, static types make this easier: a cast, implicit or explicit, forces a wait; in Python, that doesn't work), but it might be worth exploring that limit. Or it might be better to just stop at the magic function calls and leave the futures alone.
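A very rough sketch of that direction (all names here are made up for illustration; this is not xfork's actual implementation):

import concurrent.futures as cf

_pool = cf.ThreadPoolExecutor(max_workers=4)

class ResultProxy:
    """Future-ish object that lets you keep computing without calling .result()."""
    def __init__(self, future):
        self._future = future
    def __mul__(self, other):
        # build a new proxy whose value is derived from this one
        return ResultProxy(_pool.submit(lambda: self._future.result() * other))
    def get(self):
        return self._future.result()   # the one explicit "wait" point

class forked:
    """Decorator: calling the function submits it and returns a proxy."""
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwargs):
        return ResultProxy(_pool.submit(self.func, *args, **kwargs))

@forked
def slow_square(n):
    return n * n

result = slow_square(7) * 2     # ordinary-looking call and expression
print(result.get())             # 98 -- blocking happens only here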

On 04.08.2015 05:21, Andrew Barnert wrote:
I presented it today. The team members already showed interest. They also noted they like its simplicity. The missing syntax support seemed like a minor issue compared to the complexity that is hidden. Others admitted they knew about the existence of concurrent.futures and such but never used it due to - its complexity - AND *drum roll* the '.result()' of the future objects. As it seems, it doesn't feel natural.
I added two new decorators for this. But they don't work with the @ syntax. It seems like a well-known issue of Python:

_pickle.PicklingError: Can't pickle <function fib_fork at 0x7f8eaeb09730>: it's not the same object as __main__.fib_fork

Would be great if somebody could fix that.
I actually like the idea of contagious futures and I might outline why this is not an issue with the current Python language. Have a look at the following small interactive Python session:
Question: When has the add operation been executed? Answer: Unknown from the programmer's perspective. Only requirement: exceptions are raised exactly where the operation is supposed to take place in the source code (even if the operation that raises the exception is performed later). Best, Sven

On Aug 4, 2015, at 11:09, Sven R. Kunze <srkunze@mail.de> wrote:
I don't know how to put this nicely, but I think anyone who finds the complexity of concurrent.futures too daunting to even attempt to learn it should not be working on any code that uses less explicit concurrency. I have taught concurrent.futures to rank novices in a brief personal session or a single StackOverflow answer and they responded, "Wow, I didn't realize it could be this simple". Someone who can't grasp it is almost certain to be someone who introduces races all over your code and can't even understand the problem, much less debug it.
Not true. The language clearly defines when each step happens. The a.__add__ method is called, then the result is assigned to a, then the statement finishes. (Then, in the next statement, nothing happens--except, because this is happening in the interactive interpreter, and it's an expression statement, after the statement finishes doing nothing, the value of the expression is assigned to _ and its repr is printed out.) This ordering relationship may be very important if the variable a is shared by multiple threads, especially if more than one thread may modify it, especially if you're using non-atomic operations like += (where another thread can read, use, and assign the variable between the __add__ call and the assignment). If a references a mutable object with an __iadd__ method, the variable doesn't even need to be shared, only the value, for this to matter. The only way to safely ignore these problems is to never share any variables or any mutable values between threads. (This is why concurrency features are easier to design in pure functional languages.) Hiding this fact when you or the people you're hiding it from don't even understand the issue is exactly how you create races.
Only requirement: Exceptions are raised exactly where the operation is supposed to take place in the source code (even if the operation that raises the exception is performed later).

On 04.08.2015 21:38, Andrew Barnert wrote:
I am sorry because I disagree here with you.
Nobody says that concurrent.futures is not a vast improvement over previous approaches. But it is still not the end of the line of simplifications.
Nobody wants races, yet everybody still talks about them. Don't allow races in the first place and be done with it.
Where can I find this definition in the docs? To me, we are talking about class customization as described on reference/datamodel.html. Seems like an implementation detail, not a language detail. I am not saying CPython doesn't do it like that, but I am saying the Python language could support lazy evaluation without disagreeing with the docs.
Mutual (shared) variables are global variables. And these have gone out of style quite some time ago. Btw. this is races again and I thought we agreed on not having them because nobody really can/wants to debug them.

On Aug 4, 2015, at 14:03, Sven R. Kunze <srkunze@mail.de> wrote:
What does that even mean? How would you not allow races? If you let people throw arbitrary tasks at a thread pool, with no restriction on mutable shared state, you've allowed races.
No, the data model is a feature of the language, not one specific implementation. The fact that you can define classes that work the same way as builtin types like int is a fundamental feature. It's something Guido and others worked very hard on making true back in Python 2.2-2.3. It's one of the things that makes Python or C++ more pleasant to use than Tcl or Java. Any implementation that didn't do the same would not be Python, and would not run a good deal of Python code.
No. Shared values include global variables, nonlocal variables used by two closures from the same scope, attributes of objects passed to both functions, members of collections passed to both functions, etc. The existence of all of these other things is why global variables are not necessary. They have many advantages over globals, allowing you to better control how state is shared, to share it reentrantly, to make it more explicit in the code, etc. But because they have all the same benefits, they also have the exact same race problem when used to share state between threads.
Btw. this is races again and I thought we agreed on not having them because nobody really can/wants to debug them.
And how do you propose "not having them"? It's not impossible to write purely functional code that doesn't use any mutable state, in which case it doesn't matter whether your state is shared. But the fact that your example uses += proves that this isn't your intention. If you take the code from your example and run it in two threads simultaneously, you have a race. The fact that you didn't intend to create a race because you don't understand that doesn't mean the problem isn't there, it just means you have no idea you've just written buggy code and no idea how to test for it or debug it. And that's exactly the problem. What makes concurrent code with shared state hard, more than anything else, is people who don't realize what's hard about it and write code that seems to work but doesn't. Making it easier for such people to write broken code without even realizing they're doing so is not a good thing.

Hi everybody, I finally managed to implement all the tiny little details of fork that were important from my perspective (cf. https://pypi.python.org/pypi/xfork). An interesting piece of code is the iterative evaluation of OperationFuture using generators to avoid stack overflows. The only thing I am not satisfied with is exception handling. In spite of preserving the original traceback, when the ResultEvaluationError is thrown is unfortunately up to the evaluator. Maybe somebody here has a better idea or compromise. Co-workers proposed using function scopes as the ultimate evaluation scope. That is, when a function returns a ResultProxy, it gets evaluated. However, I have absolutely no idea how to do this as I couldn't find any __returned__ hook or something. I learned some key insights from writing this module that I would like to share:

1) Pickle does not work with decorated functions.
2) One 'traceback' is not like another. There are different concepts in Python with the same name.
3) Tracebacks are not really first-class, thus customizing them is hard/impossible.
4) contextlib.contextmanager only creates decorators/context managers with parameters, but what if you have none? @decorator() looks weird.
5) Generators can be used for operation evaluation to avoid the stack limit.
6) Python is awesome: despite the above obstacles, I managed to hammer out a short and comprehensible implementation for fork.

It would be great if experts here could fix 1) - 4). 1) - 3) have corresponding StackOverflow threads.

@_Andrew_ I am going to address your questions shortly after this.

Best, Sven
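For the curious, the stack-overflow-avoidance trick in 5) can be sketched independently of xfork (hypothetical names; this is not the package's actual code): instead of evaluating a long chain of pending operations recursively, walk it with an explicit loop.

class Op:
    """One deferred binary operation in a chain, e.g. built up by repeated '+'."""
    def __init__(self, left, right):
        self.left = left     # either a plain value or another Op
        self.right = right

def evaluate(node):
    # Recursive evaluation would blow the stack on a long chain;
    # an explicit loop (or a generator-based trampoline) does not.
    total = 0
    while isinstance(node, Op):
        total += node.right
        node = node.left
    return total + node

chain = 0
for i in range(100000):          # far beyond the default recursion limit
    chain = Op(chain, i)
print(evaluate(chain))           # 4999950000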

On Aug 11, 2015, at 06:54, Sven R. Kunze <srkunze@mail.de> wrote:
Co-workers proposed using function scopes as the ultimate evaluation scope. That is when a function returns a ResultProxy, it gets evaluated. However, I have absolutely no idea how to do this as I couldn't find any __returned__ hook or something.
I'm not sure I completely understand what you're looking for here. If you just want a hook that gets called whenever a function returns, just write a decorator that calls the real function then does the hook thing:

from functools import wraps

def hookify(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        do_hook_stuff()
        return result
    return wrapper

(Or, if you want to hook both raising and returning, use a finally.) But I'm not sure what good that would do anyway. If you unwrap futures every time they're returned, they're not doing anything useful as futures in the first place; you might as well just return the values directly.
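The "both raising and returning" variant would just be (do_hook_stuff() is the same placeholder as above):

from functools import wraps

def hookify(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            do_hook_stuff()   # runs on return *and* on exception
    return wrapper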

On 12.08.2015 05:06, Andrew Barnert wrote:
I think I found a better solution. Functions should not be the boundaries; try: blocks should. Why? Because they mark the boundaries for exception handling, and this is what the problem is about. I started another thread here: https://mail.python.org/pipermail/python-list/2015-August/695313.html If an exception is raised within a try: block that is not supposed to be handled there, weird things might happen (wrong handling, superfluous handling, no handling, etc.). Confining the evaluation of result proxies within the try: blocks they are created in would basically retain all sequential properties. So, plugging in 'fork' and removing it would basically change nothing (at least if you don't try anything really insane, which at least is disallowed by our coding standards. ;) ) Some example ('function' here means the stack frame of a function):

def b():
    return 'a string'

try:
    function:
        a = fork(b)
        a += 3
    function:
        b = 5
        b *= 4 * a
except TypeError:
    print('damn, I mixed strings and numbers')

The given try: block needs to make sure it eventually collects all exceptions that would have been raised in the sequential case. Conclusion: the approach is a compromise between: 1) deferred evaluation (later is better) 2) proper exception handling (early is better). Best, Sven
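One way to sketch "the try: block is the evaluation boundary" with today's Python is a context manager that collects forked work and forces it before the block is left, so worker exceptions still surface inside the enclosing try. This is purely hypothetical illustration, not part of xfork:

import concurrent.futures as cf

_pool = cf.ThreadPoolExecutor(max_workers=4)

class EvaluationScope:
    """Collects forked work and forces it when the block is left."""
    def __init__(self):
        self.pending = []
    def fork(self, func, *args):
        fut = _pool.submit(func, *args)
        self.pending.append(fut)
        return fut
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        for fut in self.pending:
            fut.result()      # reraises any worker exception at the block boundary
        return False          # never swallow exceptions from the block body

def b():
    return 'a string' + 3     # raises TypeError in the worker

try:
    with EvaluationScope() as scope:
        scope.fork(b)
        # ...other work continues here without waiting...
    # leaving the with block forces all pending results, so b's TypeError
    # surfaces here -- still inside the enclosing try
except TypeError:
    print('damn, I mixed strings and numbers')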

On 05-Aug-2015 16:30:27 +0200, abarnert@yahoo.com wrote:
What does that even mean? How would you not allow races? If you let people throw arbitrary tasks at a thread pool, with no restriction on mutable shared state, you've allowed races.
Let me answer this in a more implicit way. Why do we need to mark global variables as such? I think the answer is clear: to mark side-effects (quoting the docs). Why are all variables thread-shared by default? I don't know; maybe efficiency reasons, but those hardly apply to Python in the first place.
And how do you propose "not having them"?
What would happen if all variables were thread-local by default and needed to be marked as shared if desired? I think the answer would also be very clear: to mark side-effects and to have people think about them explicitly.
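Today the polarity is the opposite: sharing is the default, and thread-locality is what you ask for explicitly, via threading.local. A minimal example of the existing opt-in:

import threading

context = threading.local()

def worker(name):
    context.value = name               # each thread sees only its own .value
    print(threading.current_thread().name, context.value)

threads = [threading.Thread(target=worker, args=(n,)) for n in ('a', 'b')]
for t in threads:
    t.start()
for t in threads:
    t.join()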
And that's exactly the problem. What makes concurrent code with shared state hard, more than anything else, is people who don't realize what's hard about it and write code that seems to work but doesn't.
Precisely because 'shared state' is hard, why is it the default?
Making it easier for such people to write broken code without even realizing they're doing so is not a good thing.
That argument only applies when the broken code (using shared state) is the default. As you can see, this thought experiment assumes that there could be another way to approach that situation. How and when this can be done, and if at all, is a completely different matter. As usual, I leave that to the experts like you to figure out. Best, Sven

On Aug 11, 2015, at 07:33, Sven R. Kunze <srkunze@mail.de> wrote:
First, are you suggesting that your idea doesn't make sense unless Python is first modified to not have shared variables? In that case, it doesn't seem like a very useful proposal, because it applies to some different language that isn't Python. And applying it to Python instead means you're still inviting race conditions. Pointing out that in a different language those races wouldn't exist is not really an answer to that. Second, the reason for the design is that that's what threads mean, by definition: things that are like processes except that they share the same heap and other global state. What's the point of a proposal that lets people select between threads and processes if its threads aren't actually processes? Finally, just making variables thread-local wouldn't help. You'd need a completely separate heap for each thread; otherwise, just passing a list to another thread means it can modify your values. And if you make a separate heap for each thread, what happens when you do x[0]=y if x is local and y shared, or vice-versa? You could build a whole shared-memory API and/or message-passing API a la the multiprocessing module, but if that's an acceptable solution, what's stopping you from using multiprocessing in the first place? (If you're going to say "not every message can be pickled", consider how you could deep-copy an object that can't be pickled.) Of course there's no reason that you couldn't implement something that's basically a process at the abstract level, but implemented with threads at the OS level. And that could make both explicit shared memory and IPC simpler at least under the covers, and more efficient. And it could lead to a way to eliminate the GIL. And there could be other benefits as well. That's why people are exploring things like the recent subinterpreters thread, PyParallel, PyPy+STM, etc. If this were an easy problem, it would have been solved by now. (Well, it _has_ been solved for different classes of languages--pure-immutable languages can share with impunity; languages designed from ground up for message passing can get away with only message passing; etc. But that doesn't help for Python.)
And that's exactly the problem. What makes concurrent code with shared state hard, more than anything else, is people who don't realize what's hard about it and write code that seems to work but doesn't.
Precisely because 'shared state' is hard, why is it the default?
The default is to write sequential code. You have to go out of your way to use threads. And when you do, you have to intentionally choose threads over processes or some kind of microthreads. It's only when you've chosen to use shared-memory threading as the design for your app that shared memory becomes the default.
Making it easier for such people to write broken code without even realizing they're doing so is not a good thing.
That argument only applies when the broken code (using shared states) is the default.
But that is the default in Python, so your proposal would make it easier for such people to write broken code without even realizing they're doing so, so it's not a good thing.

On 12.08.2015 05:33, Andrew Barnert wrote:
My point was: 1) processes are fine (more or less) 2) threads aren't, because they are hard to manage, so let's make them easier
Finally, just making variables thread-local wouldn't help. You'd need a completely separate heap for each thread;

So?
At this point, talking about internal implementation hardly seems relevant. Not exactly sure what you mean by heap here, but I could imagine more of an overlay approach. As long as I only read the original variable, we are fine. But setting it would require me to store the thread-local value somewhere else. I am uncertain why you are so averse to making threading easier to handle and to maintain. If you bother about 'easier', let's call it 'code works more reliably', 'code is more readable', 'code has fewer side-effects', 'code produces fewer races'. I am not talking about 100%. I am talking about 80% fewer places in your code where you need to worry about thread-safety. That leaves 20% of places where you really need to. Btw. the stdlib would also benefit from this, in order to provide thread-safe modules out of the box. Not every maintainer needs to re-implement the desired thread-safety from scratch over and over again.
otherwise, just passing a list to another thread means it can modify your values.
Just depends on what you want here. I would rather see Python assuming thread-safe behavior by default, whereas the programmer can actively choose a more flexible/dangerous model if needed for some small areas.
[...implementation...] what happens when you do x[0]=y if x is local and y shared, or vice-versa?
Now, we are talking. A) As soon as a single variable (x or/and y) is shared, all expressions using/writing such variables are basically unsafe. It's dangerous; you might need some locks and so forth to get it running properly. You might need extra thought to handle some weird corner cases and so forth. B) If all variables of an expression are local, everything is fine. No additional work needed. I regard case B) as the common case, where you DON'T want others to mess around with your variables and they can't do anything about it anyway. Case A) is more like the data communication channel where threads could communicate with each other, aggregate results in a common list, and so forth. I can only imagine this taking place at the end of the threading part of a program, where the results need to be propagated back to the MainThread.
Of course there's no reason that you couldn't implement something that's basically a process at the abstract level, but implemented with threads at the OS level. And that could make both explicit shared memory and IPC simpler at least under the covers, and more efficient. And it could lead to a way to eliminate the GIL. And there could be other benefits as well. That's why people are exploring things like the recent subinterpreters thread, PyParallel, PyPy+STM, etc.
Yes, transactional memory would basically be the term that covers that. A thread basically gets a snapshot of the world right from the start, and after it finishes, the variables get merged back. However, I am unsure whether I would want that for all variables ("shared vs local" exists here as well; and I would prefer an explicit way to declare it).
The default is to write sequential code. You have to go out of your way to use threads. And when you do, you have to intentionally choose threads over processes or some kind of microthreads.
We are talking about threading all the way. There is no point in going back to sequential.

It's only when you've chosen to use shared-memory threading as the design for your app that shared memory becomes the default.

I am not sure if I can follow here. If I look at the threading API of the Python standard lib, it is shared-memory. So, it is the default, like it or not.
But that is the default in Python, so your proposal would make it easier for such people to write broken code without even realizing they're doing so, so it's not a good thing.
I am sorry? Because shared-memory is the default in Python, my proposal would make it easier for such people to write broken code? We must be talking about different proposals. Maybe, you could give an example. Just for the record, my proposal: 1) processes are almost fine 2) threads aren't, so let's make it easier to work with them Best, Sven

On Wed, Aug 19, 2015 at 2:28 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Python has two completely distinct concepts that, together, make up the whole variable pool:

a) Names, which live in scopes and are usually bound to objects
b) Objects, which are always global and may refer to other objects

Names may be built-ins, module globals, class attributes, or function locals. The latter exist on a stack, where you can access only the current function call, and all others are shadowed; also, tighter forms shadow broader forms (eg a function-local 'str' will shadow the built-in type of that name). Objects exist independently of all scopes. Names in multiple scopes can simultaneously be bound to the same object, and objects can reference other objects. Objects can never reference names (though some name bindings are implemented with dictionary lookups, cf globals() for example). So far, I think everyone on this list understands everything I've said. Nothing surprising here; and nothing that depends on a particular implementation. The notion of "a completely separate heap for each thread" is talking about part B - you need a completely separate pile of objects. And if you're going to do that, you may as well make them separate processes. There's no way to make module globals thread-local without making the module object itself thread-local, and if you do that, you probably need to make every object it references thread-local too, etc, etc, etc. Does that answer the questions? Apart from "heap" being perhaps a term of implementation, this isn't about the internals - it's about the script-visible behaviour. ChrisA
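The names-versus-objects distinction in a few lines (today's semantics):

a = [1, 2, 3]     # the name 'a' is bound to a list object
b = a             # a second name bound to the *same* object, not a copy
b.append(4)
print(a)          # [1, 2, 3, 4] -- there is only one list
print(a is b)     # True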

On 18.08.2015 18:55, Chris Angelico wrote:
The notion of "a completely separate heap for each thread" is talking about part B - you need a completely separate pile of objects.
Right. However, only if needed. As long as threads only read from a common variable, there is no need to interfere. I would expect the same behavior as with class attributes and instance attributes. The latter overlay the former once they are assigned. Otherwise, the former can be read without an issue.
And if you're going to do that, you may as well make them separate processes.
Consulting the table elaborated in the other thread "Concurrency Modules", that is not entirely true. I agree, behavior-wise, processes behave almost as desired (relevant data is copied over and there are no shared variables). However, the cpu/memory/communication footprint of a new process (using spawn) is enormous compared to a thread. So, threading still has its merits (IMHO).
Something wrong with that? Shouldn't matter as long as there is only a single thread.
Yep, thanks a lot, Chris. :) Best, Sven

On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Sure, but as soon as you change something, you have to thread-local-ify it. So I suppose what you would have is three separate pools of objects:

1) Thread-specific objects, which are referenced only from one thread. These can be read and written easily and cheaply.
2) Global objects which have never been changed in any way since threading began. These can be read easily, but if written, must be transformed into...
3) Thread-local objects, which exist for all threads, but are different. The id() of such an object depends on which thread is asking.

Conceptually, all three types behave the same way - changes are visible only within the thread that made them. But the implementation could have these magic "instanced" objects for only those ones which have actually been changed, and save a whole lot of memory for the others.
So really, you're asking for process semantics, with some optimizations to take advantage of the fact that most of the processes are going to be just reading and not writing. That may well be possible, using something like the above three-way-split, but I'd want the opinion of someone who's actually implemented something like this - from me, it's just "hey, wouldn't this be cool".
As long as there's only a single thread, there's no difference between process-wide and thread-local. Once you start a second thread, something needs to know what objects belong where. That's all. ChrisA
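A toy sketch of that "copy on first write" idea built from threading.local (COWProxy is invented here purely for illustration; it is not how CPython would actually implement this):

import copy
import threading

class COWProxy:
    """Reads fall through to the shared object; the first write in a
    thread gives that thread its own deep copy (copy-on-write)."""
    def __init__(self, shared):
        self._shared = shared
        self._local = threading.local()

    def _target(self):
        return getattr(self._local, 'copy', self._shared)

    def get(self, key):
        return self._target()[key]

    def set(self, key, value):
        if not hasattr(self._local, 'copy'):
            self._local.copy = copy.deepcopy(self._shared)
        self._local.copy[key] = value

shared = COWProxy({'a': 0})

def worker():
    shared.set('a', 3)                       # this thread now mutates only its copy
    print('worker sees', shared.get('a'))    # 3

t = threading.Thread(target=worker)
t.start()
t.join()
print('main thread sees', shared.get('a'))   # still 0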

On 18.08.2015 19:27, Chris Angelico wrote:
> On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze@mail.de> wrote:
>> On 18.08.2015 18:55, Chris Angelico wrote:
>>> The notion of "a completely separate heap for each thread" is talking
>>> about part B - you need a completely separate pile of objects.
>>
>> Right. However, only if needed. As long as threads only read from a common
>> variable, there is no need to interfere.
> Sure, but as soon as you change something, you have to
> thread-local-ify it. So I suppose what you would have is three
> separate pools of objects:
>
> 1) Thread-specific objects, which are referenced only from one thread.
> These can be read and written easily and cheaply.
> 2) Global objects which have never been changed in any way since
> threading began. These can be read easily, but if written, must be
> transformed into...
> 3) Thread-local objects, which exist for all threads, but are
> different. The id() of such an object depends on which thread is
> asking.
>
> Conceptually, all three types behave the same way - changes are
> visible only within the thread that made them. But the implementation
> could have these magic "instanced" objects for only those ones which
> have actually been changed, and save a whole lot of memory for the
> others.

Indeed. I think that is a sensible approach here. Speaking of an implementation though, I don't know where I would start when looking at CPython.

Thinking more about id(). Consider a complex object like an instance of a class. Is it really necessary to deep copy it? It seems to me that we actually just need to hide the atomic/immutable values (e.g. strings, integers etc.) of that object. The object itself can remain the same.

# first thread
class X:
    a = 0
class Y:
    x = X

# thread spawned by first thread
Y.x.a = 3  # should leave id(X) and id(Y) alone

Maybe, that example is too simple, but I cannot think of an issue here. As long as the current thread is the only one being able to change the values of its variables, all is fine.

>> I agree, behavior-wise, processes behave almost as desired (relevant data is
>> copied over and there are no shared variables).
>>
>> However, the cpu/memory/communication footprint of a new process
>> (using spawn) is enormous compared to a thread. So, threading still has its
>> merits (IMHO).
> So really, you're asking for process semantics, with some
> optimizations to take advantage of the fact that most of the processes
> are going to be just reading and not writing. That may well be
> possible, using something like the above three-way-split, but I'd want
> the opinion of someone who's actually implemented something like this
> - from me, it's just "hey, wouldn't this be cool".

If you put it this way, maybe yes. I also look forward to more feedback on this.

To me, a process/thread or any other concurrency solution is basically a function that I can call but that runs in the background. Later, when I am ready, I can collect its result. In the meantime, the main thread continues. (Again) to me, that is the only sensible way to approach concurrency. When I recall the details of locks, semaphores etc. and compare them to what real-world applications really need... You can create huge tables of all the possible cases that might happen just in order to find out that you missed an important one. Even worse, as soon as you change something about your program, you are doomed to redo the complete case analysis, find a dead/live-lock-free solution and so forth.
It's a time sink; costly and dangerous from a company's point of view. Best, Sven
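For reference, the "call it, let it run in the background, collect the result later" model described above maps closely onto what concurrent.futures already provides; a minimal sketch (create_thumbnail is a stand-in for the real work):

from concurrent.futures import ThreadPoolExecutor

def create_thumbnail(image):
    # stand-in: pretend this returns the thumbnail size
    return len(image)

images = ['0.jpg', '1.jpg', '2.jpg']

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(create_thumbnail, img) for img in images]
    # the main thread continues here; collect results when ready
    sizes = [f.result() for f in futures]

print(sum(sizes))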

On 18 August 2015 at 21:32, Sven R. Kunze <srkunze@mail.de> wrote:
It seems to me that this is accurate, but glosses over all of the issues that result in multiple solutions being needed. Sure, all concurrency solutions provide this. But the difference lies in the environment of the function you're calling. Does it have access to the same non-local name bindings as it would if run in the foreground? To the same objects? Is it able to write to those objects safely, or must it treat them as read-only? Or can it write, but only if it follows a particular protocol (semaphores, locks, etc. fit here)?

If you reduce the functionality you're considering to the lowest common denominator, then all solutions look the same, in essence by definition (that's basically what lowest common denominator means). But you haven't actually solved any real-world problems by doing so. Conversely, it *is* true that a lot of problems that benefit from concurrency can work with the minimal guarantees of a lowest-common-denominator solution (no shared state, pure functions). Functional programming has shown us that. For those problems, any of the options are fine, and the decision gets made on other factors (most likely performance, as each solution makes different performance trade-offs in the process of providing whatever extra guarantees they make).

I'm confused as to what your point is. "People should write concurrent code in a no-shared-state, pure-function manner" seems to be what your comment "the only sensible way" implies. If so, then fine, that's your opinion, but others differ and Python caters for those people as well. If, on the other hand, you accept the need for shared state (even if it's just I/O) then discounting the constraints that such shared state implies seems either naive or simply wrong. Or I'm missing something, but I can't see what it is.

Paul
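A sketch of that "no shared state, pure functions" style, where the lowest-common-denominator guarantees are enough and the pool choice becomes a pure performance decision (square is just a placeholder workload):

from multiprocessing import Pool

def square(n):
    # pure function: no shared state, the result depends only on the input
    return n * n

if __name__ == '__main__':
    with Pool() as pool:
        print(pool.map(square, range(10)))
    # multiprocessing.pool.ThreadPool (or concurrent.futures) accepts the
    # same call shape, so swapping the backend does not change the code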

On 18.08.2015 23:53, Paul Moore wrote:
I can identify 2 common patterns I label as jobs and servers. Jobs are things that get delegated out to some background process/thread/coroutine/subinterpreter. They come back when the job is done. No shared state necessary. Servers are more like while-True loops running in some separate process/thread/coroutine/subinterpreter. They only return on shutdown or so. Shared state à la queues for input/output could come in handy. Maybe there is more to discover, but that is more like research than production.
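A rough sketch of the two patterns with today's stdlib (the names and the toy workload are made up):

import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# job: delegated out, comes back when done, no shared state needed
with ThreadPoolExecutor() as executor:
    job = executor.submit(pow, 2, 32)
    print(job.result())

# server: a while-True loop in its own thread, shared state limited to a queue
tasks = queue.Queue()

def server():
    while True:
        item = tasks.get()
        if item is None:      # shutdown sentinel
            break
        print('handled', item)

t = threading.Thread(target=server)
t.start()
tasks.put('request-1')
tasks.put(None)
t.join()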
I am uncertain of how to approach this in the correct way. However, my approach here would simply be to imitate sequential behavior. I always would ask: "What would happen if that is executed sequentially?" The answer would then be retrofitted to the parallel scenario.
I hope eventually, the interpreter will take the decision burden away from the developers and make educated guesses to achieve the best performance.
I think you are referring to that statement only. It's just the underlying motivation for me to engage in this discussion, and it's born from observing real-world code development and maintenance. If I were to ask people around the globe what they use in production, I guess I would get answers like this (that'll be an awesome survey btw.):

30% declaration - you say what you want (cfg files, decorators, sql, css, html, etc.)
20% imperative - you say what to do (function call hierarchies)
25% object oriented - you think in "objects" (object relationship tree)
15% packages/modules - you cluster functionality (file/directory hierarchies)
10% magic stuff - you never get it right (generators, concurrency, meta-classes, import hooks, AST, etc.)

Just look at big projects. Look at different programming languages. All the same. Maybe my observation is wrong; that could be. Maybe the observation is true and people are just dumb, lazy and unable to appreciate a fine 10-hour live-lock bug hunting session (but I doubt that).

Most professionals I am working with are highly intelligent people working on very complex problems and with very few resources. Thus, they don't appreciate tools making their lives more difficult by changing the problem's domain from complex to complicated. As a result, they are not going to use them at all. Make it like the top 90% of what people are used to and they are going to use it.

Point is not that people should do this or that. Point is, the tools should make it stupidly easy to get things done. People then will follow and do this or that automatically. Let's improve the tools.

Best,
Sven

On Thu, Aug 20, 2015 at 12:01:12AM +0200, Sven R. Kunze wrote:
Point is, the tools should make it stupidly easy to get things done.
"The problem with this is that we will have done what humans often do, which is to use technology to make things easier while missing an opportunity to make them significantly better." -- Rory Sutherland and Glen Weyl If Python becomes everything that you want from your proposal, how will it be *better* rather than just easier? -- Steve

On 20.08.2015 02:18, Steven D'Aprano wrote:
"I am uncertain why you are so averse about making threading easier to handle and to maintain. If you bother about 'easier', let's call it 'code works more reliably', 'code is more readable', 'code has lesser side-effects', 'code produces lesser races'." -- Sven R. Kunze This quoted: what is your definition of *better*? Best, Sven PS: there is no such thing as **better* (at least from my perspective). It all comes down to a personal definition. Watching https://vimeo.com/79539317 , I can tell that there much more potential for improvements under the hoods. However, the public API for the "end developers" should be made and stay as simple as possible. Just for the sake of "getting things done".

On Aug 18, 2015, at 13:32, Sven R. Kunze <srkunze@mail.de> wrote:
>
>> On 18.08.2015 19:27, Chris Angelico wrote:
>>> On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze@mail.de> wrote:
>>>> On 18.08.2015 18:55, Chris Angelico wrote:
>>>> The notion of "a completely separate heap for each thread" is talking
>>>> about part B - you need a completely separate pile of objects.
>>>
>>> Right. However, only if needed. As long as threads only read from a common
>>> variable, there is no need to interfere.
>> Sure, but as soon as you change something, you have to
>> thread-local-ify it. So I suppose what you would have is three
>> separate pools of objects:
>>
>> 1) Thread-specific objects, which are referenced only from one thread.
>> These can be read and written easily and cheaply.
>> 2) Global objects which have never been changed in any way since
>> threading began. These can be read easily, but if written, must be
>> transformed into...
>> 3) Thread-local objects, which exist for all threads, but are
>> different. The id() of such an object depends on which thread is
>> asking.
>>
>> Conceptually, all three types behave the same way - changes are
>> visible only within the thread that made them. But the implementation
>> could have these magic "instanced" objects for only those ones which
>> have actually been changed, and save a whole lot of memory for the
>> others.
>
> Indeed. I think that is a sensible approach here. Speaking of an implementation though, I don't know where I would start when looking at CPython.
>
> Thinking more about id(). Consider a complex object like an instance of a class. Is it really necessary to deep copy it? It seems to me that we actually just need to hide the atomic/immutable values (e.g. strings, integers etc.) of that object.

Why wouldn't hiding the mutable members be just as necessary? In your example, if I can replace Y.x, isn't that even worse than replacing Y.x.a?

> The object itself can remain the same.

What does it mean for an object to be "the same" if it potentially holds different values in different threads?

> # first thread
> class X:
>     a = 0
> class Y:
>     x = X
>
> # thread spawned by first thread
> Y.x.a = 3  # should leave id(X) and id(Y) alone

OK, but does the second thread see 0 or 3? If the former, then these aren't shared objects at all. If the latter, then that's how things already work.

> Maybe, that example is too simple, but I cannot think of an issue here. As long as the current thread is the only one being able to change the values of its variables, all is fine.

No. If other threads can see those changes, it's still a problem. They can see things happening out of order, see objects in inconsistent intermediate states, etc.--all the problems caused by races are still there.

>>> I agree, behavior-wise, processes behave almost as desired (relevant data is
>>> copied over and there are no shared variables).
>>>
>>> However, the cpu/memory/communication footprint of a new process
>>> (using spawn) is enormous compared to a thread. So, threading still has its
>>> merits (IMHO).
>> So really, you're asking for process semantics, with some
>> optimizations to take advantage of the fact that most of the processes
>> are going to be just reading and not writing. That may well be
>> possible, using something like the above three-way-split, but I'd want
>> the opinion of someone who's actually implemented something like this
>> - from me, it's just "hey, wouldn't this be cool".
>
> If you put it this way, maybe yes.
> I also look forward to more feedback on this.

Have you looked into the subinterpreters project, the PyParallel project, or the PyPy-STM project, all of which, as I mentioned earlier, are possible ways of getting some of the advantages of process semantics without all of the performance costs? (Although none of them are exactly that, of course.)

> To me, a process/thread or any other concurrency solution, is basically a function that I can call but runs in the background. Later, when I am ready, I can collect its result. In the meantime, the main thread continues. (Again) to me, that is the only sensible way to approach concurrency. When I recall the details of locks, semaphores etc. and compare it to what real-world applications really need... You can create huge tables of all the possible cases that might happen just in order to find out that you missed an important one.

Yes, that is the problem that makes multithreading hard in the first place (except in pure functional languages). If the same value is visible in two threads, and can be changed by either of those threads, you have to start thinking either about lock discipline, or about ordering atomic operations; either way, things get very complicated very fast.

A compromise solution is to allow local mutable objects, but not allow them to be shared between threads; instead, you provide a way to (deep-)copy them between threads, and/or to (destructively) move them between threads. You can do that syntactically, as with the channel operators used by Erlang and the languages it inspired, or you can do it purely at a semantic level, as with Python's multiprocessing library; the effect is the same: process semantics, or message-passing semantics, or whatever you want to call it, gives you the advantages of immutable threading in a language with mutability.

> Even worse, as soon as you change something about your program, you are doomed to redo the complete case analysis, find a dead/live-lock-free solution and so forth. It's a time sink; costly and dangerous from a company's point of view.

This is an argument for companies to share as little mutable state as possible across threads. If you don't have any shared state at all, you don't need locks or other synchronization mechanisms at all. If you only have very limited and specific shared state, you have very limited and hopefully simple locking, which is a lot easier to keep track of.

And you can already do this today, using multiprocessing. It's an obvious and explicit way to ask for process semantics. If you're not using it, you have to explain why you can't use it, and why you think rebuilding the same semantics on top of threads would solve your problem.

There are possible answers to that. Some projects need a better "unsafe escape hatch" for sharing than either raw shared memory or proxy-manager protocols can provide; for some, there may be a specific performance bottleneck that could in theory be avoided but in practice the current design makes it impossible; etc. None of these are very common, but they do exist. If you're running into a specific one, we should be looking for ways to characterize and then solve that specific problem, not trying to rebuild what we already have and hope that this time the problem doesn't come up.
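A small demonstration of those process/message-passing semantics as they exist today: arguments sent to a worker process are pickled, i.e. copied, so mutations in the child cannot race with the parent (worker is a toy function):

from multiprocessing import Process, Queue

def worker(data, results):
    data.append(99)         # mutates only the child's copy
    results.put(data)

if __name__ == '__main__':
    data = [1, 2, 3]
    results = Queue()
    p = Process(target=worker, args=(data, results))
    p.start()
    print(results.get())    # [1, 2, 3, 99] -- a copy that travelled back
    p.join()
    print(data)             # [1, 2, 3] -- the parent's object is untouched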

On 19.08.2015 04:09, Andrew Barnert wrote:
Not sure what you mean about 'worse'. Replacing Y.x is just a pointer to some value. So, if I replace it with something else, it's not different/worse/better than replacing Y.x.a, right?
The object itself can remain the same. What does it mean for an object to be "the same" if it potentially holds different values in different threads?
I was talking about the id(...) and deep copying.
before (*)
first thread sees: Y.x.a == 0
thread spawned by first thread sees: Y.x.a == 0

after (*)
first thread sees: Y.x.a == 0
thread spawned by first thread sees: Y.x.a == 3

If that wasn't clear, we were talking about the preferred 'process-like' semantics. Just assume for a moment that, by default, Python would wrap up variables (as soon as they are shared across 2 or more threads) like this (self is the variable):

class ProxyObject:
    def __init__(self, variable):
        self.__original__ = variable
        self.__threaded__ = threading.local()
    def __proxy_get__(self):
        return getattr(self.__threaded__, 'value', self.__original__)
    def __proxy_set__(self, value):
        self.__threaded__.value = value

I think you get the idea; it should work like descriptors: basically, descriptors for general access on a variable and not for classes only => proxy objects. Is there something like that in Python? That would vastly simplify the implementation of xfork, btw.

So, to give you an example (still assuming the behavior described above), I abuse our venerable thumbnails. Let's calculate the total sum of the thumbnail bytes created:

 1: images = ['0.jpg', '1.jpg', '2.jpg', '3.jpg', '4.jpg']
 2: sizes = []
 3: for image in images:
 4:     fork create_thumbnails(image, sizes)
 5: wait  # for all forks to come back
 6: shared sizes
 7: print('sum:', sum(sizes))
 8:
 9: @io_bound
10: def create_thumbnails(image, sizes):
11:     with open(image) as image_file:
12:         # and so forth
13:     shared sizes
14:     sizes.append(100)

Here, you can see what I meant by explicitly stating that we enter dangerous space: the keyword "shared" in lines 6 and 13. It basically removes the wrapper described above and reveals the dangerous/shared state of the object (like 'global'). So, both functions need to agree to remove the veil and thus to be able to read/modify the shared state.

shared x

translates to:

x = x.__original__
Have you looked into the subinterpreters project, the PyParallel project, or the PyPy-STM project, all of which, as I mentioned earlier, are possible ways of getting some of the advantages of process semantics without all of the performance costs? (Although none of them are exactly that, of course.)
Yes, I did. STM is nice as a proof of concept, waiting for HTM. However, as I mentioned earlier, I am not sure whether I would really want that within the semantics of multiple threads. Trent Nelson (PyParallel) seems to agree on this. It's kind of weird and would be followed by all sorts of workarounds in case of a transaction failure. The general intention of PyParallel seems to be interesting. It is also all about "built-in thread-safety", which is very nice. Trent also agrees on 'never share state'.
I am glad we agree on this. However, just saying it's hard and keeping the status quo does not help, I suppose.

On Aug 19, 2015, at 14:10, Sven R. Kunze <srkunze@mail.de> wrote:
The reason you need to deep-copy things to avoid shared data is that if you only shallow-copy things, any mutable members and elements of the thing are still shared, and therefore you still have races. Trying to change things so that immutable members and elements are "hidden" (which sounds more like copy-on-write than hiding) doesn't help anything. Except in the very simple case, where you only have one level of mutability (e.g., a list of ints), you still end up sharing mutable members and elements, and therefore you still have races. And if the only cases you care about are the simple ones, you could just shallow-copy in the first place instead of inventing something more complicated that doesn't add anything. Think about it this way: You have a tree. You want to let me work on this tree, but you want to make sure that there's no way I could change any subtree you're working on or vice-versa. Copying the whole tree solves that problem. Copying just the leaves, which is what you're effectively proposing, doesn't.
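The copy/deepcopy distinction behind that argument, shown with the stdlib copy module:

import copy

tree = {'leaves': [1, 2, 3]}
shallow = copy.copy(tree)
deep = copy.deepcopy(tree)

shallow['leaves'].append(4)   # still the same inner list: the original sees the change
print(tree['leaves'])         # [1, 2, 3, 4]

deep['leaves'].append(5)      # fully separate tree: the original is unaffected
print(tree['leaves'])         # [1, 2, 3, 4]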
Not sure what you mean about 'worse'. Replacing Y.x is just pointer to some value. So, if I replace it with something else, it's not different/worse/better than replacing Y.x.a, right?
Well, I suppose the fact that they're both race conditions means neither one is really worse, it's just that replacing the pointer in Y.x can potentially break more code than replacing the pointer in Y.x.a. At any rate, whether you call it worse, or equivalently bad, any solution that doesn't help with Y.x isn't a solution.
So in effect, you want to preserve "a is b" even though "a == b" will likely be false, and every operation on a or one of its members will do something different from an operation on b or one of its members? Why would you want that? Implicitly deep-copying values but then hiding that fact to lie to code that tries to check whether it's sharing a value is just going to make code harder to understand.
Then these aren't shared. What's the point of pretending they are? Why not just give the second thread a deep copy of Y in the first place? You still have yet to explain why you can't just use processes when you want process semantics. The most common good reason for that is to avoid the performance cost of serializing/deep-copying large object trees. Going through and instead deep-wrapping those trees in proxies will take just as long, and then add an extra indirection cost on every single lookup, so it doesn't solve the problem at all. If you have a different problem that you think this would solve, you'll have to explain what that problem is.
OK, so you had to explicitly mark sizes as shared to create a race here, so it should be obvious where you need to add the locks. But with a tiny change, either you create implicit races, or you can't mark them, because there is no actual variable that's being shared, just a value. What happens here:

 1: images = ['0.jpg', '1.jpg', '2.jpg', '3.jpg', '4.jpg']
 2: stats = {'sizes': []}
 3: for image in images:
 4:     fork create_thumbnails(image, stats['sizes'])
 5: wait  # for all forks to come back
 6: # no shared here
 7: print('sum:', sum(stats['sizes']))
 8:
 9: @io_bound
10: def create_thumbnails(image, sizes):
11:     with open(image) as image_file:
12:         # and so forth
13:     # no shared here
14:     sizes.append(100)

The fact that stats is a "proxy object" wrapping a dict instead of a dict doesn't matter; the list still ends up shared, and mutated in multiple threads at the same time. To fix this, you'd need to make the proxy not just proxy stats, but also wrap its __getattr__ and __getitem__ methods in something that recursively wraps the elements they return in proxy objects and wraps their __getattr__ and __getitem__. This is what everyone has been trying to explain to you: shared variables are not an issue, shared values are.
There would be no visible difference between STM and HTM from within Python. Everything in Python is just a reference, so an atomic compare-and-swap of a pointer and counter (which we already have) is sufficient to implement STM perfectly and transparently. More powerful HTM than that might allow the PyPy JIT to optimize STM code more efficiently, but beyond seeing things speed up, you wouldn't see any difference. So I don't see why you feel you need to wait.
However, as I mentioned it earlier I am not sure whether I would really want that within the semantics of multiple threads.
Trent Nelson (PyParallel) seems to agree on this. It's kind of weird and would be followed by all sorts of workarounds in case of a transaction failure.
Those workarounds are only when you want to use STM explicitly, from within your code, to explicitly allow races, but have a simpler way of dealing with them than locks. I don't actually know if that's what you want because, again, you still haven't explained what's wrong with just using multiprocessing for your use case.
But it's only hard if you choose thread semantics (sharing) instead of process semantics (copying) in the first place. The status quo is that you can use multiprocessing when you want process semantics, but if your app really needs to be written in terms of extensive mutable shared data, you can use threading and thread semantics. Unless you can explain why that's a problem, I don't see why you think anyone should change anything.

As I've said before, different people have identified different specific cases where it is a problem, and people are working on at least three different solutions to those specific problems. So, if your problem is covered by one of them, you're in luck. If it isn't, you have to explain what your problem is, and why none of their solutions will work for it, and why yours will work better. Otherwise, you're just saying "instead of a choice between thread semantics and process semantics, let's have a choice between sort-of process semantics and process semantics because this seems neat".

Going back to the venerable thumbnail example, a developer today has two choices for correct code: change the code to return the size instead of appending it to an array, in which case you're no longer sharing anything, or put an explicit lock around the shared value. Your proposal doesn't affect the first solution in any way, and it makes the second solution harder to write, without buying anything for anyone. People who don't want to figure out how to do the locking already have an answer, and what's the point in trying to build something equivalent to the answer we already have without any intended improvements?
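Both of those "correct code today" options, sketched with the stdlib (create_thumbnail is again a stand-in that just reports a size):

import threading
from concurrent.futures import ThreadPoolExecutor

def create_thumbnail(image):
    return 100   # stand-in: pretend this is the thumbnail size

images = ['0.jpg', '1.jpg', '2.jpg']

# Option 1: return the size instead of appending -- nothing is shared
with ThreadPoolExecutor() as executor:
    sizes = list(executor.map(create_thumbnail, images))
print('sum:', sum(sizes))

# Option 2: keep the shared list, but guard it with an explicit lock
sizes = []
lock = threading.Lock()

def create_and_record(image):
    size = create_thumbnail(image)
    with lock:
        sizes.append(size)

with ThreadPoolExecutor() as executor:
    list(executor.map(create_and_record, images))
print('sum:', sum(sizes))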

On 08/18/2015 01:27 PM, Chris Angelico wrote:
Just a thought... What if a new name type, co_mutables, is added to function objects, and a new keyword to go with it, "mutable", to be used like "global" and "nonlocal"? And then raise an error on an attempt to bind a mutable object to a name not in co_mutables. Also, don't include co_mutables in functions' scopes. That would require mutable objects to be passed as function arguments to be seen. Then only mutable objects/names explicitly passed to a new thread need to be tracked.

It might be possible to turn that on with a compiler directive at the top of a module, so normal Python code would work normally, and thread-safe Python code would be limited. Would something in this direction simplify the problem?

Cheers,
Ron

On Aug 19, 2015, at 09:57, Ron Adam <ron3200@gmail.com> wrote:
Well, the problem can be solved with a strong enough static typing system, as multiple ML-derived languages that add mutability and/or unsafety prove. But what you're suggesting isn't nearly strong enough, or static enough, to accomplish that.

First, in this code, is i mutable or not:

def spam(a):
    for i in a:
        eggs(i)

And if eggs is imported from a module not marked "thread-safe", is the call illegal, or assumed to do something that mutates i, or assumed to be safe? Also, whether eggs is from a "thread-safe" module or a normal one, how does the compiler know whether it's passing i to a new thread? And what happens if eggs stores i in a list or other mutable object and some other code mutates it later?

Finally, if you only track mutability at the top level, how can you tell the compiler that a (mutable) queue of ints is thread-safe, but a queue of lists is not? And how can the compiler know which one it's looking at without doing a complete whole-program type inference?

On 08/19/2015 04:46 PM, Andrew Barnert via Python-ideas wrote:
I'll try to explain what I'm thinking, and see where this goes. The general idea is to keep mutable objects in function-local-only names. It's not a complete solution by itself. Some objects will still need to be managed as shared objects, but they will be easi(er) to identify.
In this case 'i' is immutable, as it's not marked as 'mutable'. That determines what byte code is used (or how the byte code that is used acts).
  3          13 LOAD_GLOBAL              0 (eggs)
             16 LOAD_FAST                1 (i)
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 JUMP_ABSOLUTE            7
        >>   26 POP_BLOCK
        >>   27 LOAD_CONST               0 (None)
             30 RETURN_VALUE

If a compiler directive/flag for threading was set, then the interpreter/compiler may use different LOAD_FAST and GET_ITER routines that will raise an exception if "i" or "a" is mutable (by checking a flag on the object, or by other means). If however...

def eggs(i):
    mutable i      # mutable objects clearly marked
    i.scramble()

def spam(a):
    mutable a, i
    for i in a:
        eggs(i)    # eggs can't access 'a' or 'i' in its scope,
                   # so i needs to be passed

Instead of LOAD_FAST and STORE_FAST, it would have LOAD_MUTABLE and STORE_MUTABLE. That doesn't mean they are mutable, but that they may be, and are stored as co_mutable names that can't be accessed by a closure or non-local scope. It's not as extreme as requiring all objects to be immutable.
And if eggs is imported from a module not marked "thread-safe", does is the call illegal, or assumed to do something that mutates i, or assumed to be safe?
The module could be local to only that thread. That may be one way a thread can use non-thread-safe code safely.
Also, whether eggs is from a "thread-safe" module or a normal one, how does the compiler know whether it's passing i to a new thread?
The only objects that are both shared and mutable with this model are those passed through a thread API call explicitly.

future_result = newthread(foo, ...)   # <-- only these "foo, ..." need attention
And what happens if eggs stores i in a list or other mutable object and some other code mutates it later?
If it's all in the same thread, no problem. If it's in another thread, then it may be an issue, and those details will still need to be worked out.

def main():
    mutable a, f
    new_thread(foo, a, b, [e, f, g])
    new_thread(bar, e, f, g)
    ...

Only a and f are both shared and mutable. Or:

def main():
    mutable persons, people
    people = get_people()
    places = get_places()
    new_thread(foo, persons)
    new_thread(bar, persons)
    ...

The list of persons would need all mutable items in it managed in some way. places is not mutable, and so it will be visible to foo and bar, but needs no special handling. And we don't have to worry about the list of people, as it's not passed or visible to the new threads. So the problem is reduced to a smaller set of relatively easy-to-identify objects.
The problem of nested mutable objects will still be a concern. Cheers, Ron

On 20.08.2015 04:20, Ron Adam wrote:
I had absolutely no idea what you meant when saying "co_mutables". Reading the byte code and the examples above though, I can just agree. Thanks a lot for this. Yesterday, I used "shared" in my posts to illustrate that. "mutable" is another word for it. Best, Sven

On 08/20/2015 11:45 AM, Sven R. Kunze wrote:
When a bytecode to load an object is executed, such as LOAD_FAST, it gets its reference to the object from the function's list of names in its code object.
  3           6 LOAD_FAST                0 (x)
              9 LOAD_FAST                1 (y)
             12 BINARY_ADD
             13 RETURN_VALUE
foo.__code__.co_varnames
('x', 'y')
LOAD_FAST 0 reads __code__.co_varnames[0]
LOAD_FAST 1 reads __code__.co_varnames[1]

Adding a co_mutables name list to the __code__ attribute, along with new bytecodes to access them, would create a way to keep private local names without changing how the other bytecodes work. LOAD_MUTABLE 0 would get the first reference in __code__.co_mutables.

Look for co_names and co_freevars here to see how they relate to other byte codes: https://docs.python.org/3.5/library/dis.html#python-bytecode-instructions

Hope that helps.

Cheers,
Ron

On Thu, Aug 20, 2015 at 12:27:55PM -0400, Ron Adam wrote:
Byte codes are implementation, not semantics: there is no part of the Python language that promises that retrieving local variables will use a byte code LOAD_FAST. That's not how IronPython or Jython work, and there is no stability guarantee that CPython will always use LOAD_FAST.
What difference would it make to have a different implementation for retrieving and setting the object? How will having this byte code "keep private local names"? -- Steve

On 08/20/2015 12:51 PM, Steven D'Aprano wrote:
Yes, but semantics needs a workable implementation at some point. The semantics is to have a way to make names (for mutable objects) in outer scopes not be visible to functions defined in inner scopes.

def foo(x):
    """ Example of hiding a mutable object from inner scopes. """
    mutable items
    items = [1, 2, 3]
    def bar(y):
        """ can't see items here. So can't mutate it. """
        return -y
    return [bar(y)+x for y in items]

So foo is able to protect items from being mutated by functions defined in its scope. We could use localonly instead of mutable, but in the context of threading mutable may be more appropriate. (Way too soon to decide what color to paint this bike.)

It may seem like it isn't needed, because you have control over what a function has access to... i.e. just don't do that. But when you have many programmers working on large projects, things can get messy. And this helps with that, but it also helps in the case of threads.
If someone wanted to abuse it, they could. But that is true for many other areas of Python.

Just as declaring a name as global or nonlocal changes which co_xxxx attribute a name reference is in, declaring it mutable would do the same. And just as the compiler generates different bytecode for global and nonlocal, it would generate different bytecode in this case too. That bytecode, LOAD_MUTABLE (and STORE_MUTABLE), would always look in the co_mutables list for its references, just as the other bytecodes look in certain lists for their references. So it is using an implementation consistent with how the other names are referenced. The important point of that is the name *won't* be in co_names, co_freevars, or co_cellvars.

While it may be possible to do without using a new name list in __code__, that would require keeping track of which names are local only, and which ones are local but visible in the scope for functions defined under that function, in some other way.

As I said, it's a partial solution. Shared & mutable names passed as function arguments will still need to be protected by locks or some other means in threads, but they will be easy to identify, and other mutable objects that are only used locally in a function won't need additional protections, as they won't be visible outside of that function (without abusing the code object, or introspecting it). But I thought the idea was interesting enough to post.

To describe this further would probably require me to actually attempt to write a patch. I'm not sure I'm up to that on my own right now.

Cheers,
Ron

On Aug 20, 2015, at 13:02, Ron Adam <ron3200@gmail.com> wrote:
The only case you're helping with here is the case where the race is entirely local to one function and the functions it defines--a relatively uncommon case that also gets the least messy and is the easiest to spot and debug. Also, the "dangerous" cases are already marked today: the local function has to explicitly declare the variable nonlocal or it can't assign to it.
As a side note, closure variables aren't accessed by LOAD_FAST and STORE_FAST from either side; that's what cellvars and freevars are for. So, your details don't actually work. But it's not hard to s/fast/cell/ and similar in your details and understand what you mean. But I don't see why you couldn't just implement the "mutable" keyword to mean that the variable must stay a plain local in co_varnames, rather than in co_cellvars or co_freevars (raising a compile-time error if that's not possible), and just continue using *_FAST on them. That would be a lot simpler to implement. It's also a lot simpler to explain: declaring a variable mutable means it can't participate in closures.
As I said it's a partial solution. Shared & mutable names passed as function arguments will still need to be protected
The problem here is the same as in Sven's proposal: the problem is shared values, not shared variables, so any solution that just tries to limit shared variables is only a vanishingly tiny piece of the solution. It doesn't do anything for mutable values passed to functions, or returned or yielded, or stored on self, or stored in any other object's attributes or in any container, or even in globals. It also doesn't prevent you from mutating them by calling a method (including __setattr__ and __setitem__ and the various __i*__ methods as well as explicit method calls). And, even if you managed to solve all of those problems, it still wouldn't be useful, because it doesn't do anything for any case where you share a member or element of the object rather than the object itself--e.g., if I have a dict mapping sockets to Connection objects, marking the dict unshareable doesn't protect the Connection objects in any way.

On 08/20/2015 05:26 PM, Andrew Barnert via Python-ideas wrote:
But you can mutate it, so it's not already marked today.
Yes... that's what I mean. ;-)
Sounds good to me.
The problem is shared mutable values. One thing I've been wondering is what is considered a mutable object in a practical sense. For example a tuple is not mutable, but its items may be. But it seems to me it should still be considered a mutable object if it has any mutable sub items in it. So a test for mutability needs to test content. It probably needs another word or description. Completely immutable? (?) It seems like there should already be a word for an object that has no mutable sub items or parts. It also seems like there should be an easy way to test an object for that.
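A naive sketch of such a "completely immutable" test, assuming one only recurses through the obvious built-in container types (it is illustrative only and is easily fooled by arbitrary user-defined objects):

def deeply_immutable(obj):
    """Rough check: True only if obj and everything reachable from it is immutable."""
    if isinstance(obj, (str, bytes, int, float, complex, bool, type(None))):
        return True
    if isinstance(obj, (tuple, frozenset)):
        return all(deeply_immutable(item) for item in obj)
    return False   # anything else is treated as (potentially) mutable

print(deeply_immutable((1, 'a', (2, 3))))   # True
print(deeply_immutable((1, [2, 3])))        # False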
Returned mutable values shouldn't be a problem unless the function reuses the same object over again. But yes it doesn't solve those cases.
Nonlocal doesn't prevent that. Not being able to get to it does. For small programs that aren't nested deeply it's usually not a problem. Just don't reuse names and be sure to initialize values locally.
That would be the part that this won't address; you would still need to either deep copy or lock all the shared mutable locations within that dict, including the items and any attributes. Not only the items passed and returned, but all mutable values that can be seen by any functions in threads. This last part was what I was thinking could be made easier... is it possible to reduce how much is seen, and so reduce the number of locks? I think it was an interesting discussion and it will help me think about these things in the future, but I'm not sure it would do what I was thinking at the start. Cheers, Ron

On Aug 20, 2015, at 16:57, Ron Adam <ron3200@gmail.com> wrote:
But your solution does nothing to stop values from being mutated, only to stop variables from being reassigned, so you haven't fixed anything.
Yes, that problem already exists for hashability (which, in Python, is strongly connected to immutability). Calling hash() on a tuple may still raise an exception if one of its elements is mutable. The reason this is rarely a problem is that it's not hard to just avoid storing mutable values in tuples that are used as dict keys. But technically, the problem is there. In your case, the same doesn't help, because it _is_ hard to avoid storing mutable values in everything in the world except specially-marked local variables.
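Concretely:

key = (1, [2, 3])      # the tuple itself is immutable, but one element is not
try:
    hash(key)
except TypeError as e:
    print(e)           # unhashable type: 'list'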
But it seems to me it should still be considered a mutable object if it has any mutable sub items in it. So a test for mutability needs to test content. It probably needs another word or description. Completely immutable? (?) It seems like there should already be a word for an object that has no mutable sub items or parts. It also seems like there should be an easy way to test an object for that.
Only by recursing through the entire thing, as calling hash() does. In a statically-typed language, it's a different story, because if you declare, or the compiler infers, that you've got, say, a 3-tuple of nothing but ints or other 3-tuples of the same type or frozensets of the same type, then any value that it holds must be recursively immutable. (Although that can get tricky in practice, of course. Imagine writing that ADT manually, or reading it.) But that doesn't help for Python. The only obvious efficient way I can think of to solve this is in Python what I said before: having an entirely separate heap of objects, so you know that the objects in that thread can't be part of any objects in another thread. In other words, process semantics, which we already have. Maybe there are other ways to solve it, but ignoring it definitely doesn't count as solving it.
Sure it would:

connections.append(connect())
with Pool() as pool:
    for _ in range(10):
        pool.apply_async(broadcast, (connections, "warning"))

Now you've got 10 threads that all want to mutate the same object returned by connect(), even though it isn't stored in a variable.
But yes it doesn't solve those cases.
Right, it only solves an uncommon case, which also happens to be the simplest to detect and debug.
That's the whole problem: reusing names isn't the issue, it's different parts of the program having the same values under different names (or just using them without binding them to names at all). That's where races come from in real-life programs. And most complex programs are not deeply nested (they're wide, not tall), so that isn't the problem either.
If you want to take this idea further, look at what you could do with a two-level store. For example, in Oz (slightly oversimplifying and distorting things), names are bound to variables, and lists hold variables, and so on, while variables hold values. You can also implement this in something like C++, if you only store shared_ptr<T> values rather than storing T values directly (at least whenever T is mutable). I think you could build something around declaring the sharedness of the variables, rather than the names. Would that be sufficient without transitive static typing? I'm not sure, but it might be worth thinking through. But I don't think that would lead to anything useful for Python.

Another possibility is to look at ways to move rather than copy values. That doesn't solve everything, but a few years of experience with C++11 seems to show that it can solve many real-world problems. (Of course this assumes the existence of collections that you can only move, not copy, things into. Or maybe just using a move-to-queue API and not directly accessing the collection underneath it?) There might be something doable for Python there.

On Thu, Aug 20, 2015 at 04:02:52PM -0400, Ron Adam wrote:
The semantics is to have a way to make names (for mutable objects) in outer scopes not be visible to function defined in inner scopes.
Why would you want that? This is a serious question -- there is already an easy way to ensure that two functions can't see the other's variables: *don't* nest them. The reason for nesting them is to ensure that the inner function *can* see the outer's variables. That's the whole point of nesting. So instead of this:
we can write this, with no new keyword, and it will work today:

def foo(x):
    items = [1, 2, 3]
    return [bar(y)+x for y in items]

def bar(y):
    """can't see items here. So can't mutate it."""
    return -y

The only reason to nest bar inside foo is if you want bar to have access to foo's namespace.
So foo is able to protect items from being mutated by functions defined in it's scope.
But functions defined inside foo are under foo's control. They are part of foo. foo can mutate items, say by calling items.append(1); why do you think it matters whether the call to append comes from inside a subfunction or not? Either way, it is still inside foo and part of foo's responsibility. The danger comes, not from the inside of foo, but from the outside of foo. foo has no way of knowing whether some other function, let's call it spam, has access to the *object* items (not the name!) and is mutating it. That is a real danger in threaded programming, but your proposal does nothing to protect against it. The danger comes from shared mutable state, not nested namespaces.
At the point that you're worried about a single function being so complicated or big that developers might accidentally mutate a value inside that function, worrying about nested functions is superfluous:

def spam(obj):
    obj.mutate()

def foo():
    obj = something_mutable()  # Don't mutate it!
    def inner():
        obj.mutate()
    # masses of code
    # more masses of code
    # even more code still
    obj.mutate()   # No protection offered against this
    spam(obj)      # or this
    inner()        # but this is protected against

Why bother singling out such an unlikely and specific source of problems?

-- Steve

On 08/20/2015 10:57 PM, Steven D'Aprano wrote:
Ok, I'm convinced it wouldn't do what I was initially thinking. It could possibly offer some benefits to catch some programming errors, but not enough, and it would not help with threads. Hmmm... I think maybe I mixed up some dynamic scope behaviour with static scope in my initial thoughts. That would be quite different, but not Python. (No, don't explain further, it was a mistake on my part, as I know the difference.) Oh, and thanks to you and Andrew for the feedback, even though it didn't go anywhere. Cheers, Ron

On 8/4/2015 5:03 PM, Sven R. Kunze wrote:
Not true. The language clearly defines when each step happens. The a.__add__ method is called -- or rather a.__iadd__, if it exists. https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types
https://docs.python.org/3/reference/simple_stmts.html#augmented-assignment-s... -- Terry Jan Reedy
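A quick illustration of that dispatch, and of where the exception from '+=' surfaces (Acc is a toy class invented for this example):

class Acc:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, other):        # used by '+=' when it exists
        self.value += other
        return self
    def __add__(self, other):         # fallback when __iadd__ is absent
        return Acc(self.value + other)

a = Acc(0)
a += 5                                # calls a.__iadd__(5), mutating in place
print(a.value)                        # 5

x = 0
try:
    x += None                         # int has no __iadd__; __add__ fails
except TypeError as e:
    print('raised at the += line:', e)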