> I've never felt the need for either of these myself, nor have I observed it in others I worked with. In general I feel the difference between processes and threads is so large that I can't believe a realistic application would work with either.

Also, ThreadPoolExecutor and ProcessPoolExecutor both have their specific purposes in concurrent.futures: TPE for IO-bound parallelism, and PPE for CPU-bound parallelism. What niche would the proposed SerialExecutor fall under? Fake/dummy parallelism? If so, I personally don't see that as being worth the cost of adding it and then maintaining it in the standard library. But that's not to say that it wouldn't have a place on PyPI.

> (Then again I've never had much use for ProcessExecutor period.)

I've also made use of TPE far more often than PPE, but I've definitely seen several interesting and useful real-world applications of PPE, particularly in image processing. I can also imagine it being quite useful for scientific computing, although I've not personally used it for that purpose.

> IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use for it, I'm not sure how many other people would use it, so I doubt it's worth adding it to the stdlib. (The only thing the stdlib might grow could be a public API that makes implementing this feasible without overriding private methods.)

Expanding a bit upon the public API for the cf.Future class would likely allow something like this to be possible without accessing any private members. In particular, I believe there would have to be a public means of accessing the state of the future without having to go through the condition (currently, this can only be done with ``future._state``), and of accessing a constant for each of the possible states: PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.

Since that would actually be quite useful for debugging purposes (I had to access ``future._state`` several times while testing the new *cancel_futures*), I'd be willing to work on implementing something like this.
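
As a rough illustration of what is already reachable without touching ``future._state``, here is a minimal sketch that derives a coarse state name from the existing public methods. The helper name ``public_state`` is made up for this example, and it cannot tell CANCELLED apart from CANCELLED_AND_NOTIFIED, which is part of why a dedicated public API might be needed.

```python
from concurrent.futures import Future


def public_state(future):
    """Hypothetical helper: approximate the state using public methods only."""
    if future.cancelled():
        # Covers both CANCELLED and CANCELLED_AND_NOTIFIED.
        return "CANCELLED"
    if future.done():
        return "FINISHED"
    if future.running():
        return "RUNNING"
    return "PENDING"


# Creating a Future directly is normally only done for testing, as here.
fut = Future()
states = [public_state(fut)]
fut.set_running_or_notify_cancel()  # PENDING -> RUNNING
states.append(public_state(fut))
fut.set_result(42)                  # RUNNING -> FINISHED
states.append(public_state(fut))

cancelled = Future()
cancelled.cancel()                  # PENDING -> CANCELLED
states.append(public_state(cancelled))
```

Unlike ``future._state``, each of these method calls acquires the future's condition, which is the indirection mentioned above.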


On Sat, Feb 15, 2020 at 10:16 PM Guido van Rossum <guido@python.org> wrote:
Having tried my hand at a simpler version for about 15 minutes, I see the reason for the fiddly subclass of Future -- it seems over-engineered because concurrent.futures is complicated.

I've never felt the need for either of these myself, nor have I observed it in others I worked with. In general I feel the difference between processes and threads is so large that I can't believe a realistic application would work with either. (Then again I've never had much use for ProcessExecutor period.)

The "Serial" variants somehow remind me of the "dummy_thread.py" module we had in Python 2. It was removed in Python 3, mostly because we ran out of cases where real threads weren't an option.

IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use for it, I'm not sure how many other people would use it, so I doubt it's worth adding it to the stdlib. (The only thing the stdlib might grow could be a public API that makes implementing this feasible without overriding private methods.)

On Sat, Feb 15, 2020 at 3:16 PM Jonathan Crall <erotemic@gmail.com> wrote:
This implementation is a proof-of-concept that I've been using for a while. It's certain that any version that made it into the stdlib would have to be more carefully designed than the implementation I threw together. However, my implementation demonstrates the concept and there are reasons for the choices I made.

First, the choice to create a SerialFuture object that inherits from the base Future was because I only wanted a process to run if the SerialFuture.result method was called. The most obvious way to do that was to overload the `result` method to execute the function when called. Perhaps there is a better way, but in an effort to KISS I just went with the <100 line version that seemed to work well enough. 

The `set_result` method is overloaded because in Python 3.8, the base Future.set_result function requires that the _state is not already FINISHED when it is called. In my proof-of-concept implementation I had to set SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.
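
To make the 3.8 behavior concrete, here is a small sketch (not part of the proof-of-concept itself) showing the stock Future rejecting a second `set_result` call. It assumes Python 3.8 or later, where `concurrent.futures.InvalidStateError` exists.

```python
from concurrent.futures import Future, InvalidStateError

fut = Future()
fut.set_result("first")  # moves the future to the FINISHED state

try:
    # On Python 3.8+ a finished (or cancelled) future refuses a new result.
    fut.set_result("second")
except InvalidStateError:
    second_call_rejected = True
else:
    second_call_rejected = False
```

The first result stays in place, so `fut.result()` still returns `"first"` afterwards.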

I was thinking that a factory function might be a good idea, but if I was designing the system I would have put that in the abstract Executor class. Maybe something like 


```
@classmethod
def create(cls, mode, max_workers=0):
    """ Create an instance of a serial, thread, or process-based executor """
    from concurrent import futures
    if mode == 'serial' or max_workers == 0:
        return futures.SerialExecutor()
    elif mode == 'thread':
        return futures.ThreadPoolExecutor(max_workers=max_workers)
    elif mode == 'process':
        return futures.ProcessPoolExecutor(max_workers=max_workers)
    else:
        raise KeyError(mode)
```

I do think that it would improve the standard lib to have something like this --- again perhaps not this exact version (it does seem a bit weird to give this method to an abstract class), but some common API that makes it easy for the user to swap between the backend Executor implementations. Even though the implementation is "trivial", lots of things in the standard lib are, but they reduce the boilerplate that developers would otherwise need, provide examples of good practices to new developers, and provide a de facto way to do something that might otherwise be implemented differently by different people, so it adds value to the stdlib.

That being said, while I will advocate for the inclusion of such a factory method or wrapper class, it would only be a minor annoyance to not have it. On the other hand I think a SerialExecutor is something that is sorely missing from the standard library.  

On Sat, Feb 15, 2020 at 5:16 PM Andrew Barnert <abarnert@yahoo.com> wrote:
> On Feb 15, 2020, at 13:36, Jonathan Crall <erotemic@gmail.com> wrote:
>
> Also, there is no duck-typed class that behaves like an executor, but does its processing in serial. Oftentimes a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially in the same Python thread:

This makes sense. I think most futures-and-executors frameworks in other languages have a serial/synchronous/immediate/blocking executor just like this. (And the ones that don’t, it’s usually because they have a different way to specify the same functionality—e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously.)

And I’ve wanted this, and even built it myself at least once—it’s a great way to get all of the logging in order to make things easier to debug, for example.

However, I think you may have overengineered this.

Why can’t you use the existing Future type as-is? Yes, there’s a bit of unnecessary overhead, but your reimplementation seems to add almost the same unnecessary overhead. And does it make enough difference in practice to be worth worrying about anyway? (It doesn’t for my uses, but maybe yours are different.)

Also, why are you overriding set_result to restore pre-3.8 behavior? The relevant change here seems to be the one where 3.8 prevents executors from finishing already-finished (or canceled) futures; why does your executor need that?

Finally, why do you need a wrapper class that constructs one of the three types at initialization and then just delegates all methods to it? Why not just use a factory function that constructs and returns an instance of one of the three types directly? And, given how trivial that factory function is, does it even need to be in the stdlib?

I may well be missing something that makes some of these choices necessary or desirable. But otherwise, I think we’d be better off adding a SerialExecutor (that works with the existing Future type as-is) but not adding or changing anything else.
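
For concreteness, here is a minimal sketch of that simpler shape: an executor that runs each task immediately in the calling thread and fills in a stock Future, plus a plain factory function. Neither `SerialExecutor` nor `make_executor` exists in the stdlib; both names are assumptions for illustration. Note that this version runs the task eagerly at submit time, unlike the lazy run-on-result behavior described elsewhere in the thread.

```python
from concurrent.futures import (
    Executor, Future, ProcessPoolExecutor, ThreadPoolExecutor, as_completed,
)


class SerialExecutor(Executor):
    """Hypothetical executor: runs each task synchronously in the caller's thread."""

    def submit(self, fn, *args, **kwargs):
        future = Future()  # the existing Future type, used as-is
        # Marks the future RUNNING; returns False only if it was cancelled.
        if future.set_running_or_notify_cancel():
            try:
                result = fn(*args, **kwargs)
            except BaseException as exc:
                # Store the error so it is re-raised from future.result().
                future.set_exception(exc)
            else:
                future.set_result(result)
        return future


def make_executor(mode, max_workers=None):
    """Hypothetical factory returning one of the three executor types."""
    if mode == "serial":
        return SerialExecutor()
    if mode == "thread":
        return ThreadPoolExecutor(max_workers=max_workers)
    if mode == "process":
        return ProcessPoolExecutor(max_workers=max_workers)
    raise ValueError(f"unknown mode: {mode!r}")


# The inherited Executor methods (map, shutdown, context manager) work unchanged.
with make_executor("serial") as executor:
    futures = [executor.submit(pow, 2, n) for n in range(4)]
    results = [f.result() for f in futures]
    # as_completed accepts these futures because they are already finished.
    completed = sorted(f.result() for f in as_completed(futures))
```

Since every future is finished before `submit` returns, no `set_result` override or private state is needed here.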




--
-Jon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AG3AXJFU4R2CU6JPWCQ2BYHUPH75MKUM/
Code of Conduct: http://python.org/psf/codeofconduct/


--
--Guido van Rossum (python.org/~guido)
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ICJKHZ4BPIUMOPIT2TDTBIW2EH4CPNCP/