
This seems to be two separate proposals:

1) Add a new way to create and specify an executor
2) Add a SerialExecutor, which does not use threads or processes

So, I'll respond to each one separately.

*Add a new way to create and specify an executor*

Jonathan Crall wrote:
The library's ThreadPoolExecutor and ProcessPoolExecutor are excellent tools, but there is currently no mechanism for configuring which type of executor you want.
The mechanism for configuring the executor type is instantiating the type of executor you want to use: for IO-bound parallelism you use ``cf.ThreadPoolExecutor()``, and for CPU-bound parallelism you use ``cf.ProcessPoolExecutor()``. So I'm not sure that it would be practically beneficial to provide multiple ways to configure the type of executor to use; that seems to go against the philosophy of preferring "one obvious way to do it" [1].

I think there's a very reasonable argument for a ``cf.Executor.create()`` or ``cf.create_executor()`` that works as a factory, initializing and returning an executor class based on the parameters passed to it, but to me that seems better suited for a different library/alternative interface. I just don't see a practical benefit in having both means of specifying the type of executor in concurrent.futures in the standard library, both from a maintenance perspective and in terms of feature bloat. If a user wants to be able to specify the executor in this manner, it's rather trivial to implement in a few lines of code without having to access any private members, which to me seems to indicate that there's not a whole lot of value in adding it to the standard library.

That being said, if there are others that would like to use an alternative interface for concurrent.futures, it could very well be uploaded as a small package on PyPI. I just personally don't think it has a place in the existing concurrent.futures module.

[1] - One could say that context managers provide an alternative means of creating and using the executors, but context managers provide significant added value in the form of resource cleanup. To me, there doesn't seem to be much real added value in being able to use both the existing ``executor = cf.ThreadPoolExecutor()`` and a new ``executor = cf.create_executor(mode="thread")`` / ``executor = cf.Executor.create(mode="thread")``.

*Add a SerialExecutor, which does not use threads or processes*

Andrew Barnert wrote:
e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously
In the case of C++'s std::async though, it still launches a thread to run the function within, no? This doesn't require the user to explicitly create or interact with the thread in any way, but that seems to go against what OP was looking for: Jonathan Crall wrote:
Often times a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution.
The *concrete* purpose of what that accomplishes (in the context of CPython) isn't clear to me. How exactly are you running the task in parallel without using a thread, process, or coroutine [1]? Without using one of those constructs (directly or indirectly), you're really just executing the tasks one-by-one, not with any form of parallelism, no? That seems to go against the primary practical purpose of using concurrent.futures in the first place. Am I misunderstanding something here? Perhaps it would help to have some form of real-world example where this might be useful, and how it would benefit from using something like SerialExecutor over other alternatives.

Jonathan Crall wrote:

The `set_result` is overloaded because in Python 3.8, the base Future.set_result function asserts that the _state is not FINISHED when it is called. In my proof-of-concept implementation I had to set the SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.

The main purpose of `cf.as_completed()` is to yield the results asynchronously as they're completed (FINISHED or CANCELLED), which is inherently *not* going to be serial. If you want to instead yield each result in the same order they're submitted, but as each one is completed [2], you could do something like this:

```
executor = cf.ThreadPoolExecutor()
futs = []
for item in to_do:
    fut = executor.submit(do_something, item)
    futs.append(fut)
for fut in futs:
    yield fut.result()
```

(The above would presumably be part of some generator function/method where you could pass a function *do_something* and an iterable of IO-bound tasks *to_do*.) This would allow you to execute tasks in parallel, while ensuring the results are yielded serially/synchronously.

[1] - You could also create subinterpreters to run tasks in parallel through the C-API, or through the upcoming subinterpreters module. That's been accepted (PEP 554), but since it's not officially part of the stdlib yet I didn't include it.

[2] - As opposed to waiting for all of the submitted futures to complete with ``cf.wait(futures, return_when=ALL_COMPLETED)`` / ``cf.wait(futures)``.

Well, that turned out quite a bit longer than expected... Hopefully part of it was useful to someone.

On Sat, Feb 15, 2020 at 6:19 PM Jonathan Crall <erotemic@gmail.com> wrote:
This implementation is a proof-of-concept that I've been using for a while <https://gitlab.kitware.com/computer-vision/ndsampler/blob/master/ndsampler/u...>. It's certain that any version that made it into the stdlib would have to be more carefully designed than the implementation I threw together. However, my implementation demonstrates the concept, and there are reasons for the choices I made.
First, the choice to create a SerialFuture object that inherits from the base Future was because I only wanted the submitted function to run if the SerialFuture.result method was called. The most obvious way to do that was to overload the `result` method to execute the function when called. Perhaps there is a better way, but in an effort to KISS I just went with the <100 line version that seemed to work well enough.
The `set_result` is overloaded because in Python 3.8, the base Future.set_result function asserts that the _state is not FINISHED when it is called. In my proof-of-concept implementation I had to set the SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.
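(For concreteness, a rough sketch of the kind of lazily-evaluated SerialFuture described above, not the actual ndsampler code, might look like the following. It assumes it is acceptable to poke at Future's private `_state`, `_condition`, and `_result` attributes, just as the proof-of-concept does:)

```
from concurrent.futures import Future

class SerialFuture(Future):
    """A Future that defers execution until result() is called."""

    def __init__(self, fn, *args, **kwargs):
        super().__init__()
        self._fn = fn
        self._args = args
        self._kwargs = kwargs
        # Pre-mark as FINISHED (a private Future attribute) so that
        # cf.as_completed() will yield this future immediately.
        self._state = 'FINISHED'

    def set_result(self, result):
        # Bypass the 3.8+ check in Future.set_result, which refuses to
        # set a result on a future that is already FINISHED.
        with self._condition:
            self._result = result
            self._state = 'FINISHED'
            self._condition.notify_all()
        self._invoke_callbacks()

    def result(self, timeout=None):
        # Run the deferred function the first time a result is requested.
        if self._fn is not None:
            fn, self._fn = self._fn, None
            self.set_result(fn(*self._args, **self._kwargs))
        return self._result
```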
I was thinking that a factory function might be a good idea, but if I were designing the system I would have put that in the abstract Executor class. Maybe something like:
```
@classmethod
def create(cls, mode, max_workers=0):
    """ Create an instance of a serial, thread, or process-based executor """
    from concurrent import futures
    if mode == 'serial' or max_workers == 0:
        return futures.SerialExecutor()
    elif mode == 'thread':
        return futures.ThreadPoolExecutor(max_workers=max_workers)
    elif mode == 'process':
        return futures.ProcessPoolExecutor(max_workers=max_workers)
    else:
        raise KeyError(mode)
```
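(Purely to illustrate how such a factory might be used, and assuming both the hypothetical `Executor.create()` classmethod and a `SerialExecutor` existed, neither of which is in the current stdlib, calling code could swap backends with a single argument. The `do_work` helper below is just a placeholder:)

```
from concurrent import futures

def do_work(x):
    # Placeholder task.
    return x + 1

def run(items, mode='thread', max_workers=4):
    # NOTE: futures.Executor.create() and SerialExecutor are proposed
    # here, not part of the current stdlib. mode='serial' would run
    # everything in the calling thread (handy for debugging), while
    # 'thread'/'process' give the usual pool-based parallelism.
    executor = futures.Executor.create(mode, max_workers=max_workers)
    with executor:
        futs = [executor.submit(do_work, item) for item in items]
        return [f.result() for f in futs]
```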
I do think that it would improve the standard lib to have something like this --- again, perhaps not this exact version (it does seem a bit weird to give this method to an abstract class), but some common API that makes it easy for the user to swap between the backend Executor implementations. Even though the implementation is "trivial", lots of things in the standard lib are, but they reduce boilerplate that developers would otherwise need, provide examples of good practices to new developers, and provide a de facto way to do something that might otherwise be implemented differently by different people; so it adds value to the stdlib.
That being said, while I will advocate for the inclusion of such a factory method or wrapper class, it would only be a minor annoyance to not have it. On the other hand I think a SerialExecutor is something that is sorely missing from the standard library.
On Sat, Feb 15, 2020 at 5:16 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Feb 15, 2020, at 13:36, Jonathan Crall <erotemic@gmail.com> wrote:
Also, there is no duck-typed class that behaves like an executor, but does its processing in serial. Often times a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially in the same Python thread:
This makes sense. I think most futures-and-executors frameworks in other languages have a serial/synchronous/immediate/blocking executor just like this. (And the ones that don’t, it’s usually because they have a different way to specify the same functionality—e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously.)
And I’ve wanted this, and even built it myself at least once—it’s a great way to get all of the logging in order to make things easier to debug, for example.
However, I think you may have overengineered this.
Why can’t you use the existing Future type as-is? Yes, there’s a bit of unnecessary overhead, but your reimplementation seems to add almost the same unnecessary overhead. And does it make enough difference in practice to be worth worrying about anyway? (It doesn’t for my uses, but maybe yours are different.)
Also, why are you overriding set_result to restore pre-3.8 behavior? The relevant change here seems to be the one where 3.8 prevents executors from finishing already-finished (or canceled) futures; why does your executor need that?
Finally, why do you need a wrapper class that constructs one of the three types at initialization and then just delegates all methods to it? Why not just use a factory function that constructs and returns an instance of one of the three types directly? And, given how trivial that factory function is, does it even need to be in the stdlib?
I may well be missing something that makes some of these choices necessary or desirable. But otherwise, I think we’d be better off adding a SerialExecutor (that works with the existing Future type as-is) but not adding or changing anything else.
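(For illustration, a minimal sketch of the kind of SerialExecutor suggested here, built on the existing Future type as-is and running each callable eagerly in the calling thread at submit() time, could look something like the following. It is one possible shape, not a finished design:)

```
import concurrent.futures as cf

class SerialExecutor(cf.Executor):
    """Executor that runs each submitted callable immediately,
    in the calling thread, using the stock Future type."""

    def submit(self, fn, /, *args, **kwargs):
        future = cf.Future()
        # Mirror what the pool executors' workers do: mark the future
        # as running, call the function, and record the outcome.
        if future.set_running_or_notify_cancel():
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)
        return future

# Example: a drop-in replacement for a pool executor while debugging.
with SerialExecutor() as executor:
    futs = [executor.submit(pow, 2, n) for n in range(5)]
    print([f.result() for f in futs])
```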
--
-Jon