[Python-ideas] fork
Andrew Barnert
abarnert at yahoo.com
Tue Aug 4 21:38:57 CEST 2015
On Aug 4, 2015, at 11:09, Sven R. Kunze <srkunze at mail.de> wrote:
>
>> On 04.08.2015 05:21, Andrew Barnert wrote:
>>> On Aug 3, 2015, at 10:11, Sven R. Kunze <srkunze at mail.de> wrote:
>>>
>>>> On 02.08.2015 02:02, Andrew Barnert wrote:
>>>> Your idea of having a single global "pool manager" object, where you could submit tasks and, depending on how they're marked, they get handled differently might have merit. But that's something you could build pretty easily on top of concurrent.futures (at least for threads vs. processes; you can add in coroutines later, because they're not quite as easy to integrate), upload to PyPI,
>>>
>>> You mean something like this?
>>>
>>> https://pypi.python.org/pypi/xfork
>>
>> Did you just write this today? Then yes, that proves my point about how easy it is to write it. Now you just have to get people using it, get some experience with it, etc. and you can come back with a proposal to put something like this in the stdlib, add syntactic support, etc. that it will be hard for anyone to disagree with. (Or to discover that it has flaws that need to be fixed, or fundamental flaws that can't be fixed, before making the proposal.)
>
> I presented it today. The team members already showed interest. They also noted they like its simplicity. The missing syntax support seemed like a minor issue compared to the complexity it hides.
>
> Others admitted they knew about the existence of concurrent.futures and such but never used it due to
> - its complexity
> - AND *drum roll* the '.result()' of the future objects
> Apparently, it just doesn't feel natural.
I don't know how to put this nicely, but I think anyone who finds the complexity of concurrent.futures too daunting to even attempt to learn it should not be working on any code that uses less explicit concurrency. I have taught concurrent.futures to rank novices in a brief personal session or a single StackOverflow answer and they responded, "Wow, I didn't realize it could be this simple". Someone who can't grasp it is almost certain to be someone who introduces races all over your code and can't even understand the problem, much less debug it.
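To make the point about simplicity concrete, here is the kind of minimal sketch I mean; the fib function and pool size are illustrative, and ProcessPoolExecutor has the same interface if you want real parallelism:

```python
# A minimal concurrent.futures sketch: submit work, get futures back,
# ask each future for its result.  That is the whole API surface a
# novice needs.  (fib and max_workers are illustrative choices.)
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fib, n) for n in range(10)]
    results = [f.result() for f in futures]

print(results)
```

The explicit `.result()` call that was complained about is exactly the point where the programmer declares "I need the value now", which is what makes the ordering understandable.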
>> One quick comment: from my experience (mostly with other languages that are very different from Python, so I can't promise how well it applies here...), implicit futures without implicit laziness or even an explicit delay mechanism are not as useful as they look at first glance. Code that forks off 8 Fibonacci calls, but waits for each one's result before forking off the next one, might as well have just stayed sequential. And if you're going to use the result by forking off another job, then it's actually more convenient to use explicit futures like the ones in the stdlib.
>>
>> One slightly bigger idea: If you really want to pursue your implicit-as-possible design further, you might want to consider making the decorators replace the function with an object whose __call__ method just implicitly submits it to the pool.
>
> I added two new decorators for this. But they don't work with the @ syntax. It seems like a well-known issue of Python:
>
> _pickle.PicklingError: Can't pickle <function fib_fork at 0x7f8eaeb09730>: it's not the same object as __main__.fib_fork
>
> Would be great if somebody could fix that.
>
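That isn't really a bug to fix; it follows from how pickle works. Functions are pickled by qualified name, so a decorator that rebinds the name to a wrapper object breaks pickle's identity check. A sketch (not xfork's actual code; the names are illustrative):

```python
# Why the PicklingError above happens: pickle serializes a function by
# looking up its qualified name; if the @-decorator has rebound that
# name to a wrapper object, the lookup finds the wrapper instead.
import pickle

class Forked:
    """Stand-in for a @fork-style decorator; a real one would submit
    self.fn to a pool instead of calling it directly."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, *args):
        return self.fn(*args)

@Forked
def fib_fork(n):
    return n if n < 2 else fib_fork(n - 1) + fib_fork(n - 2)

# pickle looks up the name "fib_fork", finds the Forked wrapper, and
# refuses because it is not the same object as the function inside it:
try:
    pickle.dumps(fib_fork.fn)
    pickling_failed = False
except pickle.PicklingError:
    pickling_failed = True
```

The usual workaround is to leave the function bound to its own name and bind the wrapper under a different one, so pickle's name lookup still finds the original function.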
>> Then you can use normal function-calling syntax and pretend everything is magic. You can even add operator dunder methods to your future class that do the same thing (so "result * 2" just builds a new future out of "self.get() * 2", either submitted to the pool or, probably better, tacked on as an add_done_callback). I think there's a limit to how far you can push this without some mechanism to mark when you need the actual value (in ML-derived languages and C++, static types make this easier: a cast, implicit or explicit, forces a wait; in Python, that doesn't work), but it might be worth exploring that limit. Or it might be better to just stop at the magic function calls and leave the futures alone.
>
> I actually like the idea of contagious futures and I might outline why this is not an issue with the current Python language.
>
> Have a look at the following small interactive Python session:
>
> >>> 3+4
> 7
> >>> _
> 7
> >>> a=9
> >>> _
> 7
> >>> a+=10
> >>> _
> 7
> >>> a
> 19
> >>> _
> 19
> >>>
>
>
> Question:
> When was the add operation executed?
>
> Answer:
> Unknown from the programmer's perspective.
Not true. The language clearly defines when each step happens. The a.__add__ method is called, then the result is assigned to a, then the statement finishes. (Then, in the next statement, nothing happens--except, because this is happening in the interactive interpreter, and it's an expression statement, after the statement finishes doing nothing, the value of the expression is assigned to _ and its repr is printed out.)
This ordering relationship may be very important if the variable a is shared by multiple threads, especially if more than one thread may modify it, especially if you're using non-atomic operations like += (where another thread can read, use, and assign the variable between the __add__ call and the assignment). If a references a mutable object with an __iadd__ method, the variable doesn't even need to be shared, only the value, for this to matter. The only way to safely ignore these problems is to never share any variables or any mutable values between threads. (This is why concurrency features are easier to design in pure functional languages.) Hiding this fact when you or the people you're hiding it from don't even understand the issue is exactly how you create races.
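The hazard is easy to sketch directly; only the lock restores the ordering guarantee the language gives single-threaded code (the counter and iteration count here are illustrative):

```python
# Two threads incrementing a shared counter.  "counter += 1" is a load,
# an add, and a store; a thread switch between those steps silently
# loses increments.  The lock serializes the read-modify-write.
import threading

counter = 0
lock = threading.Lock()
N = 100_000

def safe_inc():
    global counter
    for _ in range(N):
        with lock:          # remove this, and the total can come up short
            counter += 1

threads = [threading.Thread(target=safe_inc) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Someone who has never called `.result()` explicitly has never had to think about where that boundary sits, which is exactly the problem.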
> Only requirement:
> Exceptions are raised exactly where the operation is supposed to take place in the source code (even if the operation that raises the exception is performed later).
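For what it's worth, that requirement is exactly what concurrent.futures already provides: an exception raised in the worker is captured and re-raised at the call site of .result(), not at the fork site. A minimal sketch (might_fail is an illustrative name):

```python
# An exception raised inside the pool does not escape at submit time;
# the future stores it and re-raises it where the value is used.
from concurrent.futures import ThreadPoolExecutor

def might_fail(n):
    if n < 0:
        raise ValueError("negative input")
    return n * 2

with ThreadPoolExecutor() as pool:
    f = pool.submit(might_fail, -1)   # no exception escapes here

try:
    value = f.result()                # the traceback surfaces here
except ValueError as e:
    caught = str(e)
```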