Re: [Python-ideas] fork - other approaches
Thanks everybody for inspiring me with alternative ways of working with pools. I am very certain that any of them will work as intended. However, they do not zero in 100% on my main intentions:

1) easy to understand
2) exchangeable (seq <-> par)

A) pmap

It originates from map and allows easy exchangeability between sequential and concurrent/parallel execution. However, I have to admit that I have difficulty changing all the 'for loops' to map (mentally as well as for real). The 'for loop' IS the most used loop construct in business applications and I do not see it going away because of something else (such as map).

B) with Pool()

It removes the need to close and join the pool, which removes visual clutter from the source code. That as such is great. However, exchangeability is clearly not given, and the same understandability issue as with pmap arises.

C) apply

Nick's approach of providing a 'call_in_background' solution comes close to solving the issues at hand. However, it reminds me of apply (the deprecated built-in function for calling other functions). So, a better name for it would be 'bg_apply'.

All of these approaches basically rip the function call out of the programmer's view. It is no longer

    function(arg)

but

    apply(function, arg)  # or bg_apply(function, arg) or bg_apply_many(function, args)

I don't see this going well in production and in code reviews. So, an expression keyword like 'fork' would still be better, at least from my perspective. It would tell me: 'it's not my responsibility anymore; delegate this to someone else and get me a handle of the future result'.

Best,
Sven
On Aug 1, 2015, at 10:29, Sven R. Kunze <srkunze@mail.de> wrote:
Thanks everybody for inspiring me with alternative ways of working with pools.
I am very certain that any of them will work as intended. However, they do not zero in 100% on my main intentions:
1) easy to understand 2) exchangeable (seq <-> par)
A) pmap
It originates from map and allows easy exchangeability between sequential and concurrent/parallel execution.
However, I have to admit that I have difficulty changing all the 'for loops' to map (mentally as well as for real).
You probably don't have to--or want to--change all the for loops. It's very rare that you have a huge sequence of separate loops that all contribute equally to performance and are all parallelizable with the same granularity and so on. Usually, there is one loop that you want to parallelize, and that solves the problem for your entire program.
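This point is easy to illustrate with a minimal sketch (the transform function and data here are made up for illustration): only the one hot loop switches to Pool.map, and every other for loop in the program stays as it is.

```python
from multiprocessing import Pool

def transform(record):
    # stand-in for the one expensive, side-effect-free computation
    return record * record

records = list(range(10))

# The ordinary sequential loop, as used everywhere else in the program.
sequential = []
for r in records:
    sequential.append(transform(r))

# Only this one hot loop is parallelized; nothing else changes.
with Pool(4) as pool:
    parallel = pool.map(transform, records)
```

The two result lists are identical; the parallel version simply farms the per-record work out to worker processes.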
The 'for loop' IS the most used loop construct in business applications and I do not see it going away because of something else (such as map).
Of course the for statement isn't going away. Neither are comprehensions. And neither are map and other higher-order functions. They do related but slightly different things, and a language that tried to force them all into the same construct would be an unpleasant language. That's why they've coexisted for decades in Python without any of them going away. But you're the one who's trying to do that. In order to avoid having to learn about any other ways to write flow control, you want to change the language so you can disguise all flow control as the kind you already know how to write.
B) with Pool()
It removes the need to close and join the pool which removes the visual clutter from the source code. That as such is great.
It also means you can't forget to clean up the pool, you can't accidentally try to use the results before they're ready, etc. The with statement is one of the key tools in using Python effectively, and I personally wouldn't trust a developer who didn't understand it to start doing multicore optimizations on my code. Also, if you're learning from the examples at the top of the docs and haven't seen with Pool before, I suspect either you're still using Python 2.x (in which case you need to upgrade to 3.5 before you can start proposing new features for 3.6) or reading the 2.7 docs while using 3.x (in which case, don't do that).
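A short sketch of what the with statement buys you here: the explicit version must remember close() and join() in a finally block, while the context-managed version gets teardown for free (in 3.x, Pool.__exit__ calls terminate()), even if the body raises.

```python
from multiprocessing import Pool

def square(x):
    return x * x

# Without the context manager, cleanup is your job:
pool = Pool(2)
try:
    explicit = pool.map(square, range(5))
finally:
    pool.close()
    pool.join()

# The with statement guarantees teardown automatically,
# even if the body raises an exception:
with Pool(2) as pool:
    managed = pool.map(square, range(5))
```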
However, exchangeability is clearly not given, and the same understandability issue as with pmap arises.
It's still calling map, so if you don't understand even the basics of higher-order functions, I suppose you still won't understand it. But again, that's a pretty basic and key thing, and I wouldn't go assigning multicore optimization tasks to a developer who couldn't grasp the concept.
C) apply
Nick's approach of providing a 'call_in_background' solution comes close to solving the issues at hand.
However, it reminds me of apply (the deprecated built-in function for calling other functions). So, a better name for it would be 'bg_apply'.
The problem with apply is that it's almost always completely unnecessary, and can be written as a one-liner when it is; its presence encouraged people from other languages where it _is_ necessary to overuse it in Python. But unfortunately, there is a bit of a mix between functions that "apply" other functions--including Pool.apply_async--and those that "call" other functions, and even some that "submit" them. There's really no difference, so it would be nice if Python were consistent in the naming. And, since Pool uses the "apply" terminology, I think you may be right here.

I disagree about abbreviating background to "bg", however. You're only going to be writing this a few times in your program, but you'll be reading those few places quite often, and the fact that they're backgrounding code will likely be important to understanding and debugging that code. So I'd stick with the PEP 8 recommendation and spell it out. But of course your mileage may vary. Since this is a function you're writing based on Nick's blog post, you can call it whatever makes sense in your particular app. (And, even if it makes it into the stdlib, there's nothing stopping you from writing "bg_apply = apply_in_background" or "from asyncio import apply_in_background as bg_apply" if you really want to.)
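For concreteness, a hypothetical helper in the spirit of the call_in_background idea might look like this; the name apply_in_background is the spelled-out name suggested above, not an existing stdlib API, and the executor setup is an assumption of this sketch.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical helper; neither this name nor bg_apply exists in the stdlib.
_executor = ThreadPoolExecutor(max_workers=4)

def apply_in_background(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread; return a Future."""
    return _executor.submit(func, *args, **kwargs)

future = apply_in_background(pow, 2, 10)
assert isinstance(future, Future)
result = future.result()   # the 'handle of the future result', resolved
```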
All of these approaches basically rip the function call out of the programmer's view.
It is no longer
function(arg)
but
apply(function, arg) # or bg_apply(function, arg) # or bg_apply_many(function, args)
I don't see this going well in production and in code reviews.
Using a higher-order function when there's no need for it certainly should be rejected in code review--which is why Python no longer has the "apply" function. But using one when it's appropriate--like calling map when you want to map a function over an iterable and get back an iterable of results--is a different story. If you're afraid of doing that because you're afraid it won't pass code reviews, then either you have insufficient faith in your coworkers, or you need to find a new job.
So, an expression keyword like 'fork' would still be better at least from my perspective. It would tell me: 'it's not my responsibility anymore; delegate this to someone else and get me a handle of the future result'.
You still haven't answered any of the issues I or anyone else raised with this: fork strongly implies forking new processes rather than submitting to a pool; there's no obvious or visible way to control what kind of pool you're using or how you're using it; there's nowhere to look up what kind of future-like object you get back or what its API is; it's insufficiently useful as a statement but looks clumsy and unpythonic as an expression; etc. Using Pool.map--or Executor.map, which is what I think you really want here (it provides real composable futures, it lets you switch between threads and processes in one central place, etc., and you appear to have no need for the lower-level features of the pool, like controlling batching)--avoids all of those problems.

It's worth noting that there are some languages where a solution like this could be more appropriate. For example, in a pure immutable functional language, you really could just have the user start up tasks and let the implementation decide how to pool things, how to partition them among green threads/OS threads/processes, etc., because that would be a transparent optimization. For example, an Erlang implementation could use static analysis or runtime tracing to recognize that some processes communicate more heavily than others and partition them into OS processes in a way that minimizes the cost of that communication, and that would be pretty nifty. But a Python implementation couldn't do that, because any of those tasks might write to a shared variable that another task needs, or try to return some unpicklable object, etc.

Of course the hope is that in the long run, something like PyPy's STM will be so universally usable that neither you nor the implementation will ever need to make such decisions. But until then, it has to be you, the user, who makes them.
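The "switch between threads and processes in one central place" point can be sketched briefly (the crunch function and sizes are made up): with concurrent.futures, swapping the executor class is the only change.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def crunch(n):
    # stand-in for a CPU-bound task
    return sum(i * i for i in range(n))

# The pool type is chosen in ONE central place; swapping threads for
# processes touches only this line.
ExecutorClass = ThreadPoolExecutor   # or ProcessPoolExecutor

with ExecutorClass(max_workers=4) as ex:
    results = list(ex.map(crunch, [10, 100, 1000]))
```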
Sven R. Kunze writes:
I am very certain that any of them will work as intended. However, they do not zero in 100% on my main intentions:
1) easy to understand 2) exchangeable (seq <-> par)
Exchangeability is a property of the computational structure, not of the language syntax. In particular, in languages that *have* a for loop, you're also going to have side effects, and exchangeability will fail if those side effects interact between problem components. Therefore you need at least two syntaxes: one to express sequential iteration, and one to express parallelizable computations. Since the "for" syntax in Python has always meant sequential iteration, and the computations allowed in the suite are unrestricted, you'd just be asking for trouble.
So, an expression keyword like 'fork' would still be better at least from my perspective. It would tell me: 'it's not my responsibility anymore; delegate this to someone else and get me a handle of the future result'.
But now you run into the problem that "for" is not an expression in Python (and surely never will be). You need something that (1) takes a "set-ish"[1] of "problems" and a function to map over them, or (2) a set-ish of problem-function pairs applying the functions to the problems, and then (3) *returns* a "set-ish" of results. (That's just a somewhat more computational expression of your words that I quote.) In other words, "fork" can't be a statement, it has to be an expression, and in Python that expression is the function "map". What am I missing? Footnotes: [1] Possibly an iterable.
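The "map is already that expression" point can be made concrete in a small sketch (solve and problems are made up here): the sequential and parallel versions have exactly the same expression shape, a set-ish of problems in and a set-ish of results out.

```python
from multiprocessing.pool import ThreadPool

def solve(problem):
    return problem ** 2

problems = [1, 2, 3, 4]

# map is already an expression: set-ish of problems in, function applied,
# set-ish of results out.
seq_results = list(map(solve, problems))

# The parallel version has exactly the same expression shape.
with ThreadPool(4) as pool:
    par_results = pool.map(solve, problems)
```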
On 02.08.2015 02:23, Stephen J. Turnbull wrote:
Sven R. Kunze writes:
I am very certain that any of them will work as intended. However, they do not zero in 100% on my main intentions:
1) easy to understand 2) exchangeable (seq <-> par)
Exchangeability is a property of the computational structure, not of the language syntax.

It is a property of both, and this thread is about the latter. I am glad ThreadPool and ProcessPool have the same API. That is very helpful.

In particular, in languages that *have* a for loop, you're also going to have side effects, and exchangeability will fail if those side effects interact between problem components. Therefore you need at least two syntaxes: one to express sequential iteration, and one to express parallelizable computations. Since the "for" syntax in Python has always meant sequential iteration, and the computations allowed in the suite are unrestricted, you'd just be asking for trouble.

Sorry?
perhaps something could be done with the "with" statement?

    with someParallelExecutor() as ex:
        # do something
        # the context manager might impose some restrictions on what can be done
        # .. but the context manager needs to get at the code in order to execute it in parallel somehow..
        current = ex.get_next_item_or_something()
        do_something_in_parallel_maybe()
        doSomethingElseWithNastyCapsCuzThisBitWasOriginallyFromJava()  # :P
        ex.poke_at_the_cm()
        ex.current_iteration_variables.x = "i got no real cause for this line but it seems like doing it this way might possibly not be completely useless"
        now_notice_how_i_have_described_my_computation_as_a_list_of_steps_and_if_this_variable_name_was_shorter_itd_even_prolly_be_pythonic = True

using the "for" keyword does feel nice and fuzzy, but "with" is much closer to what we actually want. the problem really is that "with" doesn't have enough power to change the execution of the inner block at the moment.
Sven R. Kunze writes:
Exchangeability is a property of the computational structure, not of the language syntax.
It is a property of both and this thread is about the latter. I am glad ThreadPool and ProcessPool have the same API. That is very helpful.
That's because they *can* have the same API, because the computational structure is mostly the same, and where it isn't, little to no confusion is possible. For example, the fact that a process-oriented task doesn't lock variables when reading or writing them is unlikely to matter because that task can't access global objects of the parent program anyway. In order to take advantage of that aspect of threads, you need to rewrite the task. Perhaps a better way to express what I meant is "Syntax can express exchangeability already present in the computational structure. It cannot impose exchangeability not present in the computational structure."
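The thread/process difference described here can be demonstrated in a few lines, assuming a POSIX platform where multiprocessing uses the fork start method (the task function and list are made up): the two pools expose the same map API, but only the thread pool's workers can mutate the parent's globals.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool

seen = []

def task(i):
    seen.append(i)   # side effect on a global in this module
    return i * i

# Thread pool: workers run in the parent process and share its globals.
with ThreadPool(2) as tp:
    thread_results = tp.map(task, range(4))
seen_by_threads = len(seen)      # the threads mutated our list

seen.clear()

# Process pool: same map API, but each forked worker mutates its own
# copy of the module globals; the parent never sees the appends.
with multiprocessing.Pool(2) as pp:
    process_results = pp.map(task, range(4))
seen_by_processes = len(seen)
```

Both calls return the same results; only the visibility of the side effect differs, which is exactly why the task would need rewriting to exploit shared memory.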
In particular, in languages that *have* a for loop, you're also going to have side effects, and exchangeability will fail if those side effects interact between problem components.
Therefore you need at least two syntaxes: one to express sequential iteration, and one to express parallelizable computations. Since the "for" syntax in Python has always meant sequential iteration, and the computations allowed in the suite are unrestricted, you'd just be asking for trouble.
Sorry?
Exactly what I said: you're trying to change a statement that has always meant sequential iteration of statements containing side effects like assignments, and have it also mean parallel execution where side effects need to be carefully controlled. That will cause trouble for people reading the code (eg, they now have to understand any function calls recursively to understand whether there might be any ambiguities), even if it doesn't necessarily cause trouble for you writing it.
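A tiny made-up example of the kind of loop that causes this trouble: each iteration reads state written by the previous one, so blindly running the iterations in parallel would change the answer, and a reader can only know that by inspecting the suite (and anything it calls).

```python
# A loop whose iterations interact through a side effect: each step
# depends on the running total written by the previous step.
values = [3, 1, 4, 1, 5]
running = []
total = 0
for v in values:
    total += v           # shared state, mutated across iterations
    running.append(total)
# Any reordering of the iterations yields a different 'running' list.
```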
On 04.08.2015 04:46, Stephen J. Turnbull wrote:
Perhaps a better way to express what I meant is "Syntax can express exchangeability already present in the computational structure. It cannot impose exchangeability not present in the computational structure."

I completely agree with this. So, we still need a syntax. ;)
As the table in the 'Concurrency Modules' thread suggests, coroutines aren't that different if they can fit into that matrix alongside processes and threads. There are just internal technical differences and therefore different properties (let me stress that: that is highly desirable), but a common usage still leaves much to be desired.
Exactly what I said: you're trying to change a statement that has always meant sequential iteration of statements containing side effects like assignments, and have it also mean parallel execution where side effects need to be carefully controlled. That will cause trouble for people reading the code (eg, they now have to understand any function calls recursively to understand whether there might be any ambiguities), even if it doesn't necessarily cause trouble for you writing it.
I never said I wanted to change the 'for' loop. Your logic ('you need this, so you need that, and thus you need these') came to that conclusion, but it definitely wasn't me. And I am not sure I agree with that conclusion.
participants (4)
- Andrew Barnert
- Joonas Liik
- Stephen J. Turnbull
- Sven R. Kunze