Based on the conversation so far, I agree with @Kyle Stanley's breakdown of the proposal. I think shelving the "Add a new way to create and specify executor" and focusing on "Add a SerialExecutor, which does not use threads or processes" is the best way forward.

For context, I'm a machine learning researcher and developer. I've made extensive use of both thread and process based parallelism (and I'm very much looking forward to subinterpreters). I use threads for tasks like downloading files, running background tasks when my GPU computations are the bottleneck, and other IO related tasks. I use processes for image processing and other CPU bound tasks.

@Andrew Barnert's analysis of the use case is spot on. Andrew states:

> I’m pretty sure what he meant is that the developer _usually_ wants the task to run in parallel, but in some specific situation he wants it to _not_ run in parallel.
>
> The concrete use case I’ve run into is this: I’ve got some parallel code that has a bug. I’m pretty sure the bug isn’t actually related to the shared data or the parallelism itself, but I want to be sure. I replace the ThreadPoolExecutor with a SyncExecutor and change nothing else about the code, and the bug still happens. Now I’ve proven that the bug isn’t related to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved into a big mess, so it’s easier to track down the problem.

This is exactly the use case that I run into, but this isn't the only use case for SerialExecutor. @Antoine Pitrou put it nicely:

> Being able to swap a ThreadPoolExecutor or ProcessPoolExecutor with a serial version using the same API can have benefits in various situations. One is easier debugging (in case the problem you have to debug isn't a race condition, of course :-)). Another is writing a command-line tool or library where the final decision of whether to parallelize execution (e.g. through a command-line option for a CLI tool) is up to the user, not the library developer.

Antoine's second point is important in certain multiuser or limited-hardware environments. On my personal machine I use all the compute available, but on a shared system I need to constrain the resources I'm using. Disabling parallelism can also be useful on hardware like the Raspberry Pi. To summarize, the two use cases are:

1) Debugging parallel code: this is the use case stated by @Andrew Barnert. Serial code is easier to debug, and currently the executor API requires restructuring of the code if you want to rule out parallelism as the source of a bug.

2) Some programs run better on one CPU in certain hardware / multiuser environments: depending on the hardware, you may want to disable parallelism in your code. I often check for a `--serial` command-line flag to disable parallelism.
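As a rough illustration of the `--serial` pattern, here is a minimal sketch. The `SerialExecutor` class, the `make_executor` helper, and the `main` function are all hypothetical names made up for this example; nothing like them exists in `concurrent.futures` today, and this eager version is only one possible design.

```python
import argparse
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class SerialExecutor(Executor):
    """Hypothetical stand-in: runs each task immediately in the calling thread."""

    def submit(self, fn, *args, **kwargs):
        # Creating a Future by hand is normally reserved for executor
        # implementations and tests; this class is acting as the former.
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def make_executor(serial, max_workers=4):
    """Pick the executor once; the calling code stays the same either way."""
    if serial:
        return SerialExecutor()
    return ThreadPoolExecutor(max_workers=max_workers)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--serial", action="store_true",
                        help="disable parallelism")
    args = parser.parse_args(argv)

    with make_executor(args.serial) as executor:
        futures = [executor.submit(pow, n, 2) for n in range(5)]
        return [f.result() for f in futures]
```

The body of `main` never needs to know which executor it was given, which is the duck-typing benefit described above.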

This proposal isn't so much about faking parallelism as it is about disabling it when you need to. If you set `max_workers` to 0 in ThreadPoolExecutor or ProcessPoolExecutor, you get a `ValueError`. I don't think that disabling parallelism is an uncommon use case. As previously mentioned, it has uses in debugging and in letting the user control the flow of execution. The second case is useful when your parallel code has a race condition that doesn't appear on your machine but does on your customer's machine. The current futures API does not work if you need to fall back on single-threaded execution, which means that a developer who wants the option to disable parallelism has to maintain two different implementations of the same functionality. A serial executor would allow duck-typing to solve that problem.
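For anyone who wants to check the `max_workers` behavior themselves, here is a small sketch. The helper name `rejects_zero_workers` is made up for this example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def rejects_zero_workers(executor_cls):
    """Return True if executor_cls refuses to be built with max_workers=0."""
    try:
        executor_cls(max_workers=0)
    except ValueError:
        # The pool is rejected at construction time, so "zero workers"
        # cannot be used as a way to ask for serial execution.
        return True
    return False
```

Calling `rejects_zero_workers(ThreadPoolExecutor)` or `rejects_zero_workers(ProcessPoolExecutor)` should return `True` on a typical CPython install, so there is no in-API way to ask the existing pools for "no parallelism".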

> Also, as a side note, I much prefer the term "SyncExecutor" rather than "SerialExecutor". I think the former is a bit more clear at defining its actual purpose.

FWIW I found the term "SyncExecutor" really confusing when I was reading this thread. I thought it was short for Synchronized, but I just realized it's actually short for Synchronous, which makes much more sense. While SynchronousExecutor makes more sense to me, it is also more verbose and difficult to spell.

> It seems there are two possible design decisions for a serial executor:
> - one is to execute the task immediately on `submit()`
> - another is to execute the task lazily on `result()`
>
> This could for example be controlled by a constructor argument to SerialExecutor.

This is a great idea. I think I like the default being lazy execution, but giving the user control over that would increase the usefulness.
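To make the two options concrete, here is a minimal single-threaded sketch. Everything in it is an assumption for illustration: the `lazy` constructor argument, the `_LazyFuture` helper, and its `_run_once` method are names I made up, and this is not the implementation from my PR.

```python
from concurrent.futures import Executor, Future


class _LazyFuture(Future):
    """Hypothetical Future that runs its task on the first result()/exception() call."""

    def __init__(self, fn, args, kwargs):
        super().__init__()
        self._task = (fn, args, kwargs)

    def _run_once(self):
        # Single-threaded sketch: no locking around this check.
        if self._task is None:
            return
        fn, args, kwargs = self._task
        self._task = None
        if not self.set_running_or_notify_cancel():
            return  # cancelled before it ever ran
        try:
            self.set_result(fn(*args, **kwargs))
        except Exception as exc:
            self.set_exception(exc)

    def result(self, timeout=None):
        self._run_once()
        return super().result(timeout)

    def exception(self, timeout=None):
        self._run_once()
        return super().exception(timeout)


class SerialExecutor(Executor):
    """Hypothetical executor: lazy=True defers work until the result is requested."""

    def __init__(self, lazy=True):
        self._lazy = lazy

    def submit(self, fn, *args, **kwargs):
        future = _LazyFuture(fn, args, kwargs)
        if not self._lazy:
            future._run_once()  # eager mode: run right away in the caller
        return future
```

One caveat with the lazy mode: a future that nobody calls `result()` on never completes, so passing lazy futures to `concurrent.futures.as_completed()` or `wait()` would block. That is one reason the eager/lazy choice probably needs to be under the user's control.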

I also see some conversation about a public API to query and set the state of a Future. That's likely because my implementation abuses a private member variable, but I think it might be possible to implement SerialExecutor without exposing state setters/getters. I think @Kyle Stanley's idea makes sense:

> ``submit()`` could potentially "fake" the process of scheduling the execution of the function, but without directly executing it; perhaps with something like this: ``executor.submit()`` => create a pending item => add pending item to dict => add callable to call queue => fut.result() => check if in pending items => get from top of call queue => run work item => pop from pending items => set result/exception => return result (skip last three if fut is not in/associated with a pending item).

I'm not 100% sure that this would work as-is, given the complexity of the futures library, but it seems right to me at face value.

On Mon, Feb 17, 2020 at 3:41 PM Antoine Pitrou <solipsis@pitrou.net> wrote:

On Mon, 17 Feb 2020 12:19:59 -0800
Guido van Rossum <guido@python.org> wrote:
> It's actually really hard to implement your own Future class that works
> well with concurrent.futures.as_completed() -- this is basically what
> complicated the OP's implementation. Maybe it would be useful to look into
> a protocol to allow alternative Future implementations to hook into that?

Ah, I understand the reasons then. Ok, it does sound useful to explore
the space of solutions. But let's decouple it from simply querying the
current Future state.

Regards

Antoine.

-Jon (him)