
This seems to be two separate proposals:

1) Add a new way to create and specify an executor
2) Add a SerialExecutor, which does not use threads or processes

So, I'll respond to each one separately.

*Add a new way to create and specify an executor*

Jonathan Crall wrote:
The library's ThreadPoolExecutor and ProcessPoolExecutor are excellent tools, but there is currently no mechanism for configuring which type of executor you want.
The mechanism for configuring the executor type is instantiating the type of executor you want to use: for IO-bound parallelism you use ``cf.ThreadPoolExecutor()``, and for CPU-bound parallelism you use ``cf.ProcessPoolExecutor()``. So I'm not sure that it would be practically beneficial to provide multiple ways to configure the type of executor to use; that seems to go against the philosophy of preferring "one obvious way to do it" [1].

I think there's a very reasonable argument for a ``cf.Executor.create()`` or ``cf.create_executor()`` that works as a factory, initializing and returning an executor class based on the parameters passed to it, but to me that seems better suited for a different library/alternative interface. I just don't see a practical benefit in having both means of specifying the type of executor in concurrent.futures in the standard library, both from a maintenance perspective and in terms of feature bloat. If a user wants to be able to specify the executor in this manner, it's rather trivial to implement in a few lines of code without having to access any private members, which to me seems to indicate that there's not a whole lot of value in adding it to the standard library.

That being said, if there are others that would like to use an alternative interface for concurrent.futures, it could very well be uploaded as a small package on PyPI. I just personally don't think it has a place in the existing concurrent.futures module.

[1] - One could say that context managers provide an alternative means of creating and using the executors, but context managers provide significant added value in the form of resource cleanup. To me, there doesn't seem to be much real added value in being able to use both the existing ``executor = cf.ThreadPoolExecutor()`` and a new ``executor = cf.create_executor(mode="thread")`` / ``executor = cf.Executor.create(mode="thread")``.

*Add a SerialExecutor, which does not use threads or processes*

Andrew Barnert wrote:
e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously
In the case of C++'s std::async though, it still launches a thread to run the function within, no? This doesn't require the user to explicitly create or interact with the thread in any way, but that seems to go against what OP was looking for: Jonathan Crall wrote:
Often times a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution.
The *concrete* purpose of what that accomplishes (in the context of CPython) isn't clear to me. How exactly are you running the task in parallel without using a thread, process, or coroutine [1]? Without using one of those constructs (directly or indirectly), you're really just executing the tasks one-by-one, not with any form of parallelism, no? That seems to go against the primary practical purpose of using concurrent.futures in the first place. Am I misunderstanding something here? Perhaps it would help to have some form of real-world example where this might be useful, and how it would benefit from using something like SerialExecutor over other alternatives.

Jonathan Crall wrote:

The `set_result` is overloaded because in Python 3.8, the base Future.set_result function asserts that the _state is not FINISHED when it is called. In my proof-of-concept implementation I had to set the SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.

The main purpose of `cf.as_completed()` is to yield the results asynchronously as they're completed (FINISHED or CANCELLED), which is inherently *not* going to be serial. If you want to instead yield each result in the same order they're submitted, but as each one is completed [2], you could do something like this:

```
executor = cf.ThreadPoolExecutor()
futs = []
for item in to_do:
    fut = executor.submit(do_something, item)
    futs.append(fut)
for fut in futs:
    yield fut.result()
```

(The above would presumably be part of some generator function/method where you could pass a function *do_something* and an iterable of IO-bound tasks *to_do*.) This would allow you to execute tasks in parallel, while ensuring the results are yielded serially/synchronously.

[1] - You could also create subinterpreters to run tasks in parallel through the C-API, or through the upcoming subinterpreters module. That's been accepted (PEP 554), but since it's not officially part of the stdlib yet I didn't include it.

[2] - As opposed to waiting for all of the submitted futures to complete with ``cf.wait(futures, return_when=ALL_COMPLETED)`` / ``cf.wait(futures)``.

Well, that turned out quite a bit longer than expected... Hopefully part of it was useful to someone.

On Sat, Feb 15, 2020 at 6:19 PM Jonathan Crall <erotemic@gmail.com> wrote:
This implementation is a proof-of-concept that I've been using for a while <https://gitlab.kitware.com/computer-vision/ndsampler/blob/master/ndsampler/u...>. It's certain that any version that made it into the stdlib would have to be more carefully designed than the implementation I threw together. However, my implementation demonstrates the concept, and there are reasons for the choices I made.
First, the choice to create a SerialFuture object that inherits from the base Future was because I only wanted the submitted function to run if the SerialFuture.result method was called. The most obvious way to do that was to overload the `result` method to execute the function when called. Perhaps there is a better way, but in an effort to KISS I just went with the <100 line version that seemed to work well enough.
The `set_result` is overloaded because in Python 3.8, the base Future.set_result function asserts that the _state is not FINISHED when it is called. In my proof-of-concept implementation I had to set the SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.
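(For concreteness, a rough sketch of the kind of lazily-evaluated SerialFuture described above, not the actual ndsampler code, might look like the following. It assumes it is acceptable to poke at Future's private `_state`, `_condition`, and `_result` attributes, just as the proof-of-concept does:)

```
from concurrent.futures import Future

class SerialFuture(Future):
    """A Future that defers execution until result() is called."""

    def __init__(self, fn, *args, **kwargs):
        super().__init__()
        self._fn = fn
        self._args = args
        self._kwargs = kwargs
        # Pre-mark as FINISHED (a private Future attribute) so that
        # cf.as_completed() will yield this future immediately.
        self._state = 'FINISHED'

    def set_result(self, result):
        # Bypass the 3.8+ check in Future.set_result, which refuses to
        # set a result on a future that is already FINISHED.
        with self._condition:
            self._result = result
            self._state = 'FINISHED'
            self._condition.notify_all()
        self._invoke_callbacks()

    def result(self, timeout=None):
        # Run the deferred function the first time a result is requested.
        if self._fn is not None:
            fn, self._fn = self._fn, None
            self.set_result(fn(*self._args, **self._kwargs))
        return self._result
```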
I was thinking that a factory function might be a good idea, but if I were designing the system I would have put that in the abstract Executor class. Maybe something like:
```
@classmethod
def create(cls, mode, max_workers=0):
    """ Create an instance of a serial, thread, or process-based executor """
    from concurrent import futures
    if mode == 'serial' or max_workers == 0:
        return futures.SerialExecutor()
    elif mode == 'thread':
        return futures.ThreadPoolExecutor(max_workers=max_workers)
    elif mode == 'process':
        return futures.ProcessPoolExecutor(max_workers=max_workers)
    else:
        raise KeyError(mode)
```
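(Purely to illustrate how such a factory might be used, and assuming both the hypothetical `Executor.create()` classmethod and a `SerialExecutor` existed, neither of which is in the current stdlib, calling code could swap backends with a single argument. The `do_work` helper below is just a placeholder:)

```
from concurrent import futures

def do_work(x):
    # Placeholder task.
    return x + 1

def run(items, mode='thread', max_workers=4):
    # NOTE: futures.Executor.create() and SerialExecutor are proposed
    # here, not part of the current stdlib. mode='serial' would run
    # everything in the calling thread (handy for debugging), while
    # 'thread'/'process' give the usual pool-based parallelism.
    executor = futures.Executor.create(mode, max_workers=max_workers)
    with executor:
        futs = [executor.submit(do_work, item) for item in items]
        return [f.result() for f in futs]
```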
I do think that it would improve the standard lib to have something like this --- again, perhaps not this exact version (it does seem a bit weird to give this method to an abstract class), but some common API that makes it easy for the user to swap between the backend Executor implementations. Even though the implementation is "trivial", lots of things in the standard lib are, but they reduce boilerplate that developers would otherwise need, provide examples of good practices to new developers, and provide a de facto way to do something that might otherwise be implemented differently by different people; so it adds value to the stdlib.
That being said, while I will advocate for the inclusion of such a factory method or wrapper class, it would only be a minor annoyance to not have it. On the other hand I think a SerialExecutor is something that is sorely missing from the standard library.
On Sat, Feb 15, 2020 at 5:16 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Feb 15, 2020, at 13:36, Jonathan Crall <erotemic@gmail.com> wrote:
Also, there is no duck-typed class that behaves like an executor, but does its processing in serial. Often times a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially in the same Python thread:
This makes sense. I think most futures-and-executors frameworks in other languages have a serial/synchronous/immediate/blocking executor just like this. (And the ones that don’t, it’s usually because they have a different way to specify the same functionality—e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously.)
And I’ve wanted this, and even built it myself at least once—it’s a great way to get all of the logging in order to make things easier to debug, for example.
However, I think you may have overengineered this.
Why can’t you use the existing Future type as-is? Yes, there’s a bit of unnecessary overhead, but your reimplementation seems to add almost the same unnecessary overhead. And does it make enough difference in practice to be worth worrying about anyway? (It doesn’t for my uses, but maybe yours are different.)
Also, why are you overriding set_result to restore pre-3.8 behavior? The relevant change here seems to be the one where 3.8 prevents executors from finishing already-finished (or canceled) futures; why does your executor need that?
Finally, why do you need a wrapper class that constructs one of the three types at initialization and then just delegates all methods to it? Why not just use a factory function that constructs and returns an instance of one of the three types directly? And, given how trivial that factory function is, does it even need to be in the stdlib?
I may well be missing something that makes some of these choices necessary or desirable. But otherwise, I think we’d be better off adding a SerialExecutor (that works with the existing Future type as-is) but not adding or changing anything else.
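(For illustration, a minimal sketch of the kind of SerialExecutor suggested here, built on the existing Future type as-is and running each callable eagerly in the calling thread at submit() time, could look something like the following. It is one possible shape, not a finished design:)

```
import concurrent.futures as cf

class SerialExecutor(cf.Executor):
    """Executor that runs each submitted callable immediately,
    in the calling thread, using the stock Future type."""

    def submit(self, fn, /, *args, **kwargs):
        future = cf.Future()
        # Mirror what the pool executors' workers do: mark the future
        # as running, call the function, and record the outcome.
        if future.set_running_or_notify_cancel():
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)
        return future

# Example: a drop-in replacement for a pool executor while debugging.
with SerialExecutor() as executor:
    futs = [executor.submit(pow, 2, n) for n in range(5)]
    print([f.result() for f in futs])
```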
--
-Jon