SerialExecutor for concurrent.futures + Convenience constructor
I'd like to propose an improvement to `concurrent.futures`. The library's ThreadPoolExecutor and ProcessPoolExecutor are excellent tools, but there is currently no mechanism for configuring which type of executor you want. Also, there is no duck-typed class that behaves like an executor, but does its processing in serial. Often a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially in the same Python thread:

```python
import concurrent.futures


class SerialFuture(concurrent.futures.Future):
    """
    Non-threading / multiprocessing version of future for drop-in
    compatibility with concurrent.futures.
    """
    def __init__(self, func, *args, **kw):
        super(SerialFuture, self).__init__()
        self.func = func
        self.args = args
        self.kw = kw
        # self._condition = FakeCondition()
        self._run_count = 0
        # fake being finished to cause __get_result to be called
        self._state = concurrent.futures._base.FINISHED

    def _run(self):
        result = self.func(*self.args, **self.kw)
        self.set_result(result)
        self._run_count += 1

    def set_result(self, result):
        """
        Overrides the implementation to revert to pre-Python-3.8 behavior
        """
        with self._condition:
            self._result = result
            self._state = concurrent.futures._base.FINISHED
            for waiter in self._waiters:
                waiter.add_result(self)
            self._condition.notify_all()
        self._invoke_callbacks()

    def _Future__get_result(self):
        # overrides the name-mangled private __get_result method
        if not self._run_count:
            self._run()
        return self._result


class SerialExecutor(object):
    """
    Implements the concurrent.futures API around a single-threaded backend

    Example:
        >>> with SerialExecutor() as executor:
        >>>     futures = []
        >>>     for i in range(100):
        >>>         f = executor.submit(lambda x: x + 1, i)
        >>>         futures.append(f)
        >>>     for f in concurrent.futures.as_completed(futures):
        >>>         assert f.result() > 0
        >>>     for i, f in enumerate(futures):
        >>>         assert i + 1 == f.result()
    """
    def __enter__(self):
        return self

    def __exit__(self, ex_type, ex_value, tb):
        pass

    def submit(self, func, *args, **kw):
        return SerialFuture(func, *args, **kw)

    def shutdown(self):
        pass
```

In order to make it easy to choose the type of parallel (or serial) backend with minimal code changes, I use the following "Executor" wrapper class (although if this were integrated into concurrent.futures the name would need to change to something better):

```python
class Executor(object):
    """
    Wrapper around a specific executor.

    Abstracts Serial, Thread, and Process Executor via arguments.

    Args:
        mode (str, default='thread'): either thread, serial, or process
        max_workers (int, default=0): number of workers. If 0, serial is forced.
    """
    def __init__(self, mode='thread', max_workers=0):
        from concurrent import futures
        if mode == 'serial' or max_workers == 0:
            backend = SerialExecutor()
        elif mode == 'thread':
            backend = futures.ThreadPoolExecutor(max_workers=max_workers)
        elif mode == 'process':
            backend = futures.ProcessPoolExecutor(max_workers=max_workers)
        else:
            raise KeyError(mode)
        self.backend = backend

    def __enter__(self):
        return self.backend.__enter__()

    def __exit__(self, ex_type, ex_value, tb):
        return self.backend.__exit__(ex_type, ex_value, tb)

    def submit(self, func, *args, **kw):
        return self.backend.submit(func, *args, **kw)

    def shutdown(self):
        return self.backend.shutdown()
```

So in summary, I'm proposing to add a SerialExecutor and SerialFuture class as an alternative to the ThreadPool / ProcessPool executors, and I'm also advocating for some sort of "ParametrizedExecutor" that the user can construct in "thread", "process", or "serial" mode.

--
-Jon
On Feb 15, 2020, at 13:36, Jonathan Crall <erotemic@gmail.com> wrote:
Also, there is no duck-typed class that behaves like an executor, but does its processing in serial. Often a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially in the same Python thread:
This makes sense. I think most futures-and-executors frameworks in other languages have a serial/synchronous/immediate/blocking executor just like this. (And the ones that don't, it's usually because they have a different way to specify the same functionality—e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously.)

And I've wanted this, and even built it myself at least once—it's a great way to get all of the logging in order to make things easier to debug, for example.

However, I think you may have overengineered this.

Why can't you use the existing Future type as-is? Yes, there's a bit of unnecessary overhead, but your reimplementation seems to add almost the same unnecessary overhead. And does it make enough difference in practice to be worth worrying about anyway? (It doesn't for my uses, but maybe yours are different.)

Also, why are you overriding set_result to restore pre-3.8 behavior? The relevant change here seems to be the one where 3.8 prevents executors from finishing already-finished (or canceled) futures; why does your executor need that?

Finally, why do you need a wrapper class that constructs one of the three types at initialization and then just delegates all methods to it? Why not just use a factory function that constructs and returns an instance of one of the three types directly? And, given how trivial that factory function is, does it even need to be in the stdlib?

I may well be missing something that makes some of these choices necessary or desirable. But otherwise, I think we'd be better off adding a SerialExecutor (that works with the existing Future type as-is) but not adding or changing anything else.
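[Editor's note: for concreteness, here is a minimal sketch of the kind of serial executor Andrew describes, reusing the stock Future as-is. The class name and the eager-at-submit design are illustrative assumptions, not from the thread.]

```python
import concurrent.futures


class EagerSerialExecutor(concurrent.futures.Executor):
    """Hypothetical serial executor that reuses the stock Future.

    Each task runs immediately inside submit() on the calling thread, so
    the returned future is already finished and no Future subclass or
    private-state manipulation is needed.
    """

    def submit(self, fn, *args, **kwargs):
        future = concurrent.futures.Future()
        if future.set_running_or_notify_cancel():
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)
        return future


# The inherited Executor machinery (map, shutdown, the context manager)
# works unchanged, and as_completed sees the futures as already finished.
with EagerSerialExecutor() as executor:
    futures = [executor.submit(lambda x: x + 1, i) for i in range(5)]
    results = sorted(f.result() for f in concurrent.futures.as_completed(futures))
```

The trade-off, which the thread returns to below, is that this runs each task at submit() time rather than deferring work until result() is called.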
This implementation is a proof-of-concept that I've been using for a while <https://gitlab.kitware.com/computer-vision/ndsampler/blob/master/ndsampler/util_futures.py>. It's certain that any version that made it into the stdlib would have to be more carefully designed than the implementation I threw together. However, my implementation demonstrates the concept, and there are reasons for the choices I made.

First, the choice to create a SerialFuture object that inherits from the base Future was because I only wanted a process to run if the SerialFuture.result method was called. The most obvious way to do that was to overload the `result` method to execute the function when called. Perhaps there is a better way, but in an effort to KISS I just went with the <100 line version that seemed to work well enough.

The `set_result` method is overloaded because in Python 3.8, the base Future.set_result function asserts that the _state is not FINISHED when it is called. In my proof-of-concept implementation I had to set SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.

I was thinking that a factory function might be a good idea, but if I was designing the system I would have put that in the abstract Executor class. Maybe something like

```python
@classmethod
def create(cls, mode, max_workers=0):
    """
    Create an instance of a serial, thread, or process-based executor
    """
    from concurrent import futures
    if mode == 'serial' or max_workers == 0:
        return futures.SerialExecutor()
    elif mode == 'thread':
        return futures.ThreadPoolExecutor(max_workers=max_workers)
    elif mode == 'process':
        return futures.ProcessPoolExecutor(max_workers=max_workers)
    else:
        raise KeyError(mode)
```

I do think that it would improve the standard lib to have something like this --- again, perhaps not this exact version (it does seem a bit weird to give this method to an abstract class), but some common API that makes it easy for the user to swap between the backend Executor implementations. Even though the implementation is "trivial", lots of things in the standard lib are, but they reduce the boilerplate that developers would otherwise need, provide examples of good practices to new developers, and provide a de facto way to do something that might otherwise be implemented differently by different people, so it adds value to the stdlib.

That being said, while I will advocate for the inclusion of such a factory method or wrapper class, it would only be a minor annoyance to not have it. On the other hand, I think a SerialExecutor is something that is sorely missing from the standard library.

On Sat, Feb 15, 2020 at 5:16 PM Andrew Barnert <abarnert@yahoo.com> wrote:
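[Editor's note: a runnable, self-contained sketch of the factory idea using only today's stdlib. The function name is illustrative, and since no SerialExecutor exists in concurrent.futures, the 'serial' branch is approximated here with a single worker thread.]

```python
from concurrent import futures


def make_executor(mode='thread', max_workers=4):
    """Hypothetical factory over the executor backends.

    'serial' is approximated with one worker thread; a real SerialExecutor
    would instead run tasks in the calling thread.
    """
    if mode == 'serial' or max_workers == 0:
        return futures.ThreadPoolExecutor(max_workers=1)
    elif mode == 'thread':
        return futures.ThreadPoolExecutor(max_workers=max_workers)
    elif mode == 'process':
        return futures.ProcessPoolExecutor(max_workers=max_workers)
    else:
        raise KeyError(mode)


# Swapping backends is then a one-argument change:
with make_executor('serial') as executor:
    results = list(executor.map(abs, [-3, -2, -1]))
```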
-- -Jon
Having tried my hand at a simpler version for about 15 minutes, I see the reason for the fiddly subclass of Future -- it seems over-engineered because concurrent.futures is complicated.

I've never felt the need for either of these myself, nor have I observed it in others I worked with. In general I feel the difference between processes and threads is so large that I can't believe a realistic application would work with either. (Then again, I've never had much use for ProcessPoolExecutor, period.)

The "Serial" variants somehow remind me of the "dummy_thread.py" module we had in Python 2. It was removed in Python 3, mostly because we ran out of cases where real threads weren't an option.

IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use for it, I'm not sure how many other people would use it, so I doubt it's worth adding it to the stdlib. (The only thing the stdlib might grow could be a public API that makes implementing this feasible without overriding private methods.)

On Sat, Feb 15, 2020 at 3:16 PM Jonathan Crall <erotemic@gmail.com> wrote:
--
-Jon
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AG3AXJ...
Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
I've never felt the need for either of these myself, nor have I observed it in others I worked with. In general I feel the difference between processes and threads is so large that I can't believe a realistic application would work with either.
Also, ThreadPoolExecutor and ProcessPoolExecutor both have their specific purposes in concurrent.futures: TPE for IO-bound parallelism, and PPE for CPU-bound parallelism. What niche would the proposed SerialExecutor fall under? Fake/dummy parallelism? If so, I personally don't see that as being worth the cost of adding it and then maintaining it in the standard library. But that's not to say that it wouldn't have a place on PyPI.
(Then again, I've never had much use for ProcessPoolExecutor, period.)
I've also made use of TPE far more times than PPE, but I've definitely seen several interesting and useful real-world applications of PPE, particularly with image processing. I can also imagine it being quite useful for scientific computing, although I've not personally used it for that purpose.
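[Editor's note: as an illustration of that kind of CPU-bound use, a toy ProcessPoolExecutor sketch. The per-tile workload is a stand-in invented for this example, not something from the thread.]

```python
from concurrent.futures import ProcessPoolExecutor


def tile_sum(tile):
    # Stand-in for a CPU-bound per-tile image operation.
    return sum(sum(row) for row in tile)


if __name__ == '__main__':
    # Four fake 100x100 "image tiles" processed in parallel worker processes.
    tiles = [[[i + j for j in range(100)] for i in range(100)] for _ in range(4)]
    with ProcessPoolExecutor(max_workers=2) as executor:
        sums = list(executor.map(tile_sum, tiles))
    assert sums == [tile_sum(t) for t in tiles]
```

Note the usual PPE constraints apply: the task function must be picklable (defined at module top level), and the driving code belongs under an `if __name__ == '__main__':` guard for spawn-based platforms.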
IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use for it, I'm not sure how many other people would use it, so I doubt it's worth adding it to the stdlib. (The only thing the stdlib might grow could be a public API that makes implementing this feasible without overriding private methods.)
Expanding a bit upon the public API for the cf.Future class would likely allow something like this to be possible without accessing any private members. In particular, I believe there would have to be a public means of accessing the state of the future without having to go through the condition (currently, this can only be done with ``future._state``), and of accessing a constant for each of the possible states: PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.

Since that would actually be quite useful for debugging purposes (I had to access ``future._state`` several times while testing the new *cancel_futures*), I'd be willing to work on implementing something like this.

On Sat, Feb 15, 2020 at 10:16 PM Guido van Rossum <guido@python.org> wrote:
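[Editor's note: a sketch of the kind of accessor such a public API might enable. The `future_state` helper is hypothetical; today both the state attribute and the state constants live in the private `concurrent.futures._base` module, which is exactly the problem being described.]

```python
import concurrent.futures
from concurrent.futures import _base  # today: private, hence the proposal


def future_state(future):
    """Hypothetical public accessor for a future's state string."""
    with future._condition:  # a public API would hide this lock handling
        return future._state


f = concurrent.futures.Future()
assert future_state(f) == _base.PENDING
f.set_result(42)
assert future_state(f) == _base.FINISHED
```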
On Sat, Feb 15, 2020 at 21:00 Kyle Stanley <aeros167@gmail.com> wrote:
Expanding a bit upon the public API for the cf.Future class would likely allow something like this to be possible without accessing any private members. In particular, I believe there would have to be a public means of accessing the state of the future without having to go through the condition (currently, this can only be done with ``future._state``), and of accessing a constant for each of the possible states: PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.
Since that would actually be quite useful for debugging purposes (I had to access ``future._state`` several times while testing the new *cancel_futures*), I'd be willing to work on implementing something like this.
Excellent!
On Sat, Feb 15, 2020 at 10:16 PM Guido van Rossum <guido@python.org> wrote:
Having tried my hand at a simpler version for about 15 minutes, I see the reason for the fiddly subclass of Future -- it seems over-engineered because concurrent.future is complicated.
I've never felt the need for either of these myself, nor have I observed it in others I worked with. In general I feel the difference between processes and threads is so large that I can't believe a realistic application would work with either. (Then again I've never had much use for ProcessExecutor period.)
The "Serial" variants somehow remind me of the "dummy_thread.py" module we had in Python 2. It was removed in Python 3, mostly because we ran out of cases where real threads weren't an option.
IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use for it, I'm not sure how many other people would use it, so I doubt it's worth adding it to the stdlib. (The only thing the stdlib might grow could be a public API that makes implementing this feasible without overriding private methods.)
On Sat, Feb 15, 2020 at 3:16 PM Jonathan Crall <erotemic@gmail.com> wrote:
This implementation is a proof-of-concept that I've been using for awhile <https://gitlab.kitware.com/computer-vision/ndsampler/blob/master/ndsampler/util_futures.py>. Its certain that any version that made it into the stdlib would have to be more carefully designed than the implementation I threw together. However, my implementation demonstrates the concept and there are reasons for the choices I made.
First, the choice to create a SerialFuture object that inherits from the base Future was because I only wanted a process to run if the SerialFuture.result method was called. The most obvious way to do that was to overload the `result` method to execute the function when called. Perhaps there is a better way, but in an effort to KISS I just went with the <100 line version that seemed to work well enough.
The `set_result` is overloaded because in Python 3.8, the base Future.set_result function asserts that the _state is not FINISHED when it is called. In my proof-of-concept implementation I had to set state of the SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.
I was thinking that a factory function might be a good idea, but if I was designing the system I would have put that in the abstract Executor class. Maybe something like
``` @classmethod def create(cls, mode, max_workers=0): """ Create an instance of a serial, thread, or process-based executor """ from concurrent import futures if mode == 'serial' or max_workers == 0: return futures.SerialExecutor() elif mode == 'thread': return futures.ThreadPoolExecutor(max_workers=max_workers) elif mode == 'process': return futures.ProcessPoolExecutor(max_workers=max_workers) else: raise KeyError(mode) ```
I do think that it would improve the standard lib to have something like this --- again, perhaps not this exact version (it does seem a bit weird to give this method to an abstract class), but some common API that makes it easy for the user to swap between backend Executor implementations. Even though the implementation is "trivial", lots of things in the standard lib are; they reduce boilerplate that developers would otherwise need, provide examples of good practices to new developers, and provide a de facto way to do something that might otherwise be implemented differently by different people, so it adds value to the stdlib.
That being said, while I will advocate for the inclusion of such a factory method or wrapper class, it would only be a minor annoyance to not have it. On the other hand I think a SerialExecutor is something that is sorely missing from the standard library.
On Sat, Feb 15, 2020 at 5:16 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Feb 15, 2020, at 13:36, Jonathan Crall <erotemic@gmail.com> wrote:
Also, there is no duck-typed class that behaves like an executor but does its processing in serial. Oftentimes a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially in the same Python thread:
This makes sense. I think most futures-and-executors frameworks in other languages have a serial/synchronous/immediate/blocking executor just like this. (And the ones that don’t, it’s usually because they have a different way to specify the same functionality—e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously.)
And I’ve wanted this, and even built it myself at least once—it’s a great way to get all of the logging in order to make things easier to debug, for example.
However, I think you may have overengineered this.
Why can’t you use the existing Future type as-is? Yes, there’s a bit of unnecessary overhead, but your reimplementation seems to add almost the same unnecessary overhead. And does it make enough difference in practice to be worth worrying about anyway? (It doesn’t for my uses, but maybe yours are different.)
Also, why are you overriding set_result to restore pre-3.8 behavior? The relevant change here seems to be the one where 3.8 prevents executors from finishing already-finished (or canceled) futures; why does your executor need that?
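For reference, the 3.8 change in question only rejects *re*-finishing a future; a fresh (PENDING) future can still be finished once without any override:

```python
import concurrent.futures

f = concurrent.futures.Future()
f.set_result(1)      # PENDING -> FINISHED: allowed, in 3.8 as before
try:
    f.set_result(2)  # 3.8+ refuses to finish an already-finished future
except concurrent.futures.InvalidStateError:
    print("second set_result rejected")
```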
Finally, why do you need a wrapper class that constructs one of the three types at initialization and then just delegates all methods to it? Why not just use a factory function that constructs and returns an instance of one of the three types directly? And, given how trivial that factory function is, does it even need to be in the stdlib?
I may well be missing something that makes some of these choices necessary or desirable. But otherwise, I think we’d be better off adding a SerialExecutor (that works with the existing Future type as-is) but not adding or changing anything else.
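A sketch of what that could look like, using nothing beyond the public Executor/Future API (the class name matches the proposal for readability, but this is an illustration, not the proposed implementation):

```python
import concurrent.futures

class SerialExecutor(concurrent.futures.Executor):
    """Sketch: runs each submitted callable immediately in the calling thread,
    using the stock Future type unmodified."""

    def submit(self, fn, /, *args, **kwargs):
        future = concurrent.futures.Future()
        # Mimic the real executors: mark RUNNING, then resolve synchronously.
        if future.set_running_or_notify_cancel():
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)
        return future
```

Since every future is already FINISHED by the time `submit` returns, `as_completed`, `wait`, and the inherited `Executor.map` all work unchanged.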
-- -Jon _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AG3AXJ... Code of Conduct: http://python.org/psf/codeofconduct/
--
--Guido van Rossum (python.org/~guido)
--
--Guido (mobile)
I opened a bpo issue to expand upon the public API for cf.Future: https://bugs.python.org/issue39645. Any feedback would be greatly appreciated. (:

On Sun, Feb 16, 2020 at 12:22 AM Guido van Rossum <guido@python.org> wrote:
On Sat, Feb 15, 2020 at 21:00 Kyle Stanley <aeros167@gmail.com> wrote:
I've never felt the need for either of these myself, nor have I observed it in others I worked with. In general I feel the difference between processes and threads is so large that I can't believe a realistic application would work with either.
Also, ThreadPoolExecutor and ProcessPoolExecutor both have their specific purposes in concurrent.futures: TPE for IO-bound parallelism, and PPE for CPU-bound parallelism, what niche would the proposed SerialExecutor fall under? Fake/dummy parallelism? If so, I personally don't see that as being worth the cost of adding it and then maintaining it in the standard library. But, that's not to say that it wouldn't have a place on PyPI.
(Then again I've never had much use for ProcessExecutor period.)
I've also made use of TPE far more times than PPE, but I've definitely seen several interesting and useful real-world applications of PPE. Particularly with image processing. I can also imagine it also being quite useful for scientific computing, although I've not personally used it for that purpose.
IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use for it, I'm not sure how many other people would use it, so I doubt it's worth adding it to the stdlib. (The only thing the stdlib might grow could be a public API that makes implementing this feasible without overriding private methods.)
Expanding a bit upon the public API for the cf.Future class would likely allow something like this to be possible without accessing any private members. In particular, I believe there would have to be a public means of accessing the state of the future without having to go through the condition (currently, this can only be done with ``future._state``), and of accessing a constant for each of the possible states: PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.
Since that would actually be quite useful for debugging purposes (I had to access ``future._state`` several times while testing the new *cancel_futures*), I'd be willing to work on implementing something like this.
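A small illustration of the gap being described here: the coarse predicates are public today, but the fine-grained state is only reachable through the private attribute:

```python
import concurrent.futures

f = concurrent.futures.Future()
# Fine-grained state is currently only visible via a private attribute:
assert f._state == 'PENDING'
# The public API exposes only coarse predicates:
assert not (f.done() or f.running() or f.cancelled())
f.set_result(42)
assert f._state == 'FINISHED' and f.done()
```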
Excellent!
On Sat, Feb 15, 2020 at 10:16 PM Guido van Rossum <guido@python.org> wrote:
Having tried my hand at a simpler version for about 15 minutes, I see the reason for the fiddly subclass of Future -- it seems over-engineered because concurrent.future is complicated.
I've never felt the need for either of these myself, nor have I observed it in others I worked with. In general I feel the difference between processes and threads is so large that I can't believe a realistic application would work with either. (Then again I've never had much use for ProcessExecutor period.)
The "Serial" variants somehow remind me of the "dummy_thread.py" module we had in Python 2. It was removed in Python 3, mostly because we ran out of cases where real threads weren't an option.
This seems to be two separate proposals:

1) Add a new way to create and specify executor
2) Add a SerialExecutor, which does not use threads or processes

So, I'll respond to each one separately.

*Add a new way to create and specify executor*

Jonathan Crall wrote:
The library's ThreadPoolExecutor and ProcessPoolExecutor are excellent tools, but there is currently no mechanism for configuring which type of executor you want.
The mechanism of configuring the executor type is by instantiating the type of executor you want to use. For IO-bound parallelism you use ``cf.ThreadPoolExecutor()``, or for CPU-bound parallelism you use ``cf.ProcessPoolExecutor()``. So I'm not sure that it would be practically beneficial to provide multiple ways to configure the type of executor to use. That seems to go against the philosophy of preferring "one obvious way to do it" [1].

I think there's a very reasonable argument for a ``cf.Executor.create()`` or ``cf.create_executor()`` that works as a factory to initialize and return an executor class based on parameters that are passed to it, but to me, that seems better suited for a different library/alternative interface.

I guess that I just don't see a practical benefit in having both means of specifying the type of executor for concurrent.futures in the standard library, both from a maintenance perspective and in terms of feature bloat. If a user wants to be able to specify the executor used in this manner, it's rather trivial to implement it in a few lines of code without having to access any private members; which to me seems to indicate that there's not a whole lot of value in adding it to the standard library.

That being said, if there are others that would like to use an alternative interface for concurrent.futures, it could very well be uploaded as a small package on PyPI. I just personally don't think it has a place in the existing concurrent.futures module.

[1] - One could say that context managers provide an alternative means of creating and using the executors, but context managers provide a significant added value in the form of resource cleanup. To me, there doesn't seem to be much real added value in being able to both use the existing ``executor = cf.ThreadPoolExecutor()`` and a new ``executor = cf.create_executor(mode="thread")`` / ``executor = cf.Executor.create(mode="thread")``.
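The "few lines of code" point is easy to make concrete; a hypothetical user-side factory (the name `make_executor` is made up, not an existing API):

```python
from concurrent import futures

def make_executor(kind, max_workers=None):
    """Hypothetical few-line factory mapping a name onto an executor type."""
    pools = {
        'thread': futures.ThreadPoolExecutor,
        'process': futures.ProcessPoolExecutor,
    }
    return pools[kind](max_workers=max_workers)
```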
*Add a SerialExecutor, which does not use threads or processes*

Andrew Barnert wrote:
e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously
In the case of C++'s std::async though, it still launches a thread to run the function within, no? This doesn't require the user to explicitly create or interact with the thread in any way, but that seems to go against what OP was looking for:

Jonathan Crall wrote:
Oftentimes a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution.
The *concrete* purpose of what that accomplishes (in the context of CPython) isn't clear to me. How exactly are you running the task in parallel without using a thread, process, or coroutine [1]? Without using one of those constructs (directly or indirectly), you're really just executing the tasks one-by-one, not with any form of parallelism, no? That seems to go against the primary practical purpose of using concurrent.futures in the first place. Am I misunderstanding something here? Perhaps it would help to have some form of real-world example where this might be useful, and how it would benefit from using something like SerialExecutor over other alternatives.

Jonathan Crall wrote:

The `set_result` is overloaded because in Python 3.8, the base Future.set_result function asserts that the _state is not FINISHED when it is called. In my proof-of-concept implementation I had to set SerialFuture._state to FINISHED in order for `as_completed` to yield it. Again, there may be a better way to do this, but I don't claim to know what that is yet.

The main purpose of ``cf.as_completed()`` is to yield the results asynchronously as they're completed (FINISHED or CANCELLED), which is inherently *not* going to be serial. If you want to instead yield each result in the same order they're submitted, but as each one is completed [2], you could do something like this:

```
executor = cf.ThreadPoolExecutor()
futs = []
for item in to_do:
    fut = executor.submit(do_something, item)
    futs.append(fut)
for fut in futs:
    yield fut.result()
```

(The above would presumably be part of some generator function/method where you could pass a function *do_something* and an iterable of IO-bound tasks *to_do*.) This would allow you to execute tasks in parallel, while ensuring the results yielded are serial/synchronous.

[1] - You could also create subinterpreters to run tasks in parallel through the C-API, or through the upcoming subinterpreters module. That's been accepted (PEP 554), but since it's not officially part of the stdlib yet I didn't include it.

[2] - As opposed to waiting for all of the submitted futures to complete with ``cf.wait(futures, return_when=ALL_COMPLETED)`` / ``cf.wait(futures)``.

Well, that turned out quite a bit longer than expected... Hopefully part of it was useful to someone.
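A runnable version of that submission-order-vs-completion-order distinction (the sleep durations are arbitrary stand-ins for IO-bound work):

```python
import concurrent.futures
import time

def work(delay):
    time.sleep(delay)
    return delay

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futs = [executor.submit(work, d) for d in (0.3, 0.1, 0.2)]
    # as_completed yields each future as its task finishes (fastest first here)...
    by_completion = [f.result() for f in concurrent.futures.as_completed(futs)]
    # ...while iterating the list yields results in submission order.
    in_order = [f.result() for f in futs]
```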
On Feb 15, 2020, at 20:29, Kyle Stanley <aeros167@gmail.com> wrote:
Add a SerialExecutor, which does not use threads or processes
Andrew Barnert wrote:
e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously
In the case of C++'s std::async though, it still launches a thread to run the function within, no?
No; the point of launch policies is that you can (without needing an executor object[1]) tell the task to run “async” (on its own thread[2]), “deferred” (serially[3] on first demand), or “immediate” (serially right now)[4]. You can even `or` together multiple policies to let the implementation choose, and IIRC the default is async|deferred.

At any rate, I’m not suggesting that C++ is a design worth looking at, just parenthetically noting it as an example of how when libraries don’t have a serial executor, it’s often because they already have a different way to specify the same thing.
This doesn't require the user to explicitly create or interact with the thread in any way, but that seems to go against what OP was looking for:
Jonathan Crall wrote:
Oftentimes a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution.
The *concrete* purpose of what that accomplishes (in the context of CPython) isn't clear to me. How exactly are you running the task in parallel without using a thread, process, or coroutine [1]?
I’m pretty sure what he meant is that the developer _usually_ wants the task to run in parallel, but in some specific situation he wants it to _not_ run in parallel.

The concrete use case I’ve run into is this: I’ve got some parallel code that has a bug. I’m pretty sure the bug isn’t actually related to the shared data or the parallelism itself, but I want to be sure. I replace the ThreadPoolExecutor with a SyncExecutor and change nothing else about the code, and the bug still happens. Now I’ve proven that the bug isn’t related to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved into a big mess, so it’s easier to track down the problem.

I have no idea if this is Jonathan’s use, but it is the reason I’ve built something similar myself.

—-

[1] Actually, the version that got into C++11 doesn’t even have executors, only launch policies. It also doesn’t have `then` continuation methods, composing functions like all and as_completed, … It’s basically useless. All of those other features got deferred to a tech specification that was supposed to be before C++14 but got pushed back repeatedly until it came out after C++17, and then got withdrawn, and now they’re awaiting proposals for a second TS to come. Which will probably be after the language has first-class coroutines and maybe fibers, and async/await, so they may well have to redesign the whole futures model yet again to make futures awaitable…

[2] Actually “as if on its own thread”. But AFAIK, every implementation handles this by spawning a thread. I think the distinction is for future expansions, either so they can do something like Java’s ForkJoinPool, or so they can use fibers or coroutines that don’t care what thread they’re on.

[3] In C++ futures lingo, “serial” actually means an executor that runs all tasks on a single background thread, with a queue that’s guaranteed to be mutex-locked rather than lock-free. But I mean “serial” in Jonathan’s sense here.
[4] Checking the docs, it looks like the immediate policy didn’t make into C++11 either. But anyway, the deferred policy did, and that’s serial in Jonathan’s sense.
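The swap-to-serial debugging workflow described above doesn't strictly need an executor subclass at all; a minimal sketch with a hypothetical helper (`run_all` is made up for illustration):

```python
import concurrent.futures

def run_all(fn, items, parallel=True):
    """Hypothetical helper: same results either way; parallel=False runs the
    tasks one-by-one in the calling thread, so logs stay un-interleaved."""
    if parallel:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            return list(executor.map(fn, items))
    return [fn(item) for item in items]
```

Flipping the flag (say, from an environment variable) reproduces the "change nothing else about the code" debugging step without touching any Future internals.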
No; the point of launch policies is that you can (without needing an executor object[1]) tell the task to run “async” (on its own thread[2]), “deferred” (serially[3] on first demand), or “immediate” (serially right now)[4]. You can even or together multiple policies to let the implementation choose, and IIRC the default is async|deferred.
[2] Actually “as if on its own thread”. But AFAIK, every implementation handles this by spawning a thread. I think the distinction is for future expansions, either so they can do something like Java’s ForkJoinPool, or so they can use fibers or coroutines that don’t care what thread they’re on.
Ah, so the fact that std::async spawns a separate thread in many implementations is more of an internal detail that could be changed, rather than a guaranteed behavior. Thanks for the clarification and detailed explanation, libstdc++ is definitely not an area of expertise for me. (:

Andrew Barnert wrote:

The concrete use case I’ve run into is this: I’ve got some parallel code that has a bug. I’m pretty sure the bug isn’t actually related to the shared data or the parallelism itself, but I want to be sure. I replace the ThreadPoolExecutor with a SyncExecutor and change nothing else about the code, and the bug still happens. Now I’ve proven that the bug isn’t related to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved into a big mess, so it’s easier to track down the problem.

That sounds like it would be quite a useful utility class for general executor debugging purposes. But, I'm just not convinced that it would see wide enough usage to justify adding it to concurrent.futures. IMO, this makes it a perfect candidate for a decent PyPI package. If that package ends up being significantly popular, it might be worth re-examining its membership in the stdlib once it becomes mature. This would reduce the risk of burdening CPython development time with an underused feature, and gives it far more room for growth/improvement [1].

Also, as a sidenote, I much prefer the term "SyncExecutor" to "SerialExecutor". I think the former is a bit more clear at defining its actual purpose.

[1] - Once something gets added to the standard library, it has to adhere as much as reasonably possible to backwards compatibility, making any changes in behavior and API drastically more difficult. Also, its development time becomes limited by CPython's release cycle rather than having its own.

On Sun, Feb 16, 2020 at 12:43 AM Andrew Barnert <abarnert@yahoo.com> wrote:
On Feb 15, 2020, at 20:29, Kyle Stanley <aeros167@gmail.com> wrote:
*Add a SerialExecutor, which does not use threads or processes*
Andrew Barnert wrote:
e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously
In the case of C++'s std::async though, it still launches a thread to run the function within, no?
No; the point of launch policies is that you can (without needing an executor object[1]) tell the task to run “async” (on its own thread[2]), “deferred” (serially[3] on first demand), or “immediate” (serially right now)[4]. You can even or together multiple policies to let the implementation choose, and IIRC the default is async|deferred.
At any rate, I’m not suggesting that C++ is a design worth looking at, just parenthetically noting it as an example of how when libraries don’t have a serial executor, it’s often because they already have a different way to specify the same thing.
This doesn't require the user to explicitly create or interact with the thread in any way, but that seems to go against what OP was looking for:
Jonathan Crall wrote:
Often times a develop will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution.
The *concrete* purpose of what that accomplishes (in the context of CPython) isn't clear to me. How exactly are you running the task in parallel without using a thread, process, or coroutine [1]?
I’m pretty sure what he meant is that the developer _usually_ wants the task to run in parallel, but in some specific situation he wants it to _not_ run in parallel.
The concrete use case I’ve run into is this: I’ve got some parallel code that has a bug. I’m pretty sure the bug isn’t actually related to the shared data or the parallelism itself, but I want to be sure. I replace the ThreadPoolExecutor with a SyncExecutor and change nothing else about the code, and the bug still happens. Now I’ve proven that the bug isn’t related to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved into a big mess, so it’s easier to track down the problem.
I have no idea if this is Jonathan’s use, but it is the reason I’ve built something similar myself.
—-
[1] Actually, the version that got into C++11 doesn’t even have executors, only launch policies. It also doesn’t have `then` continuation methods, composing functions like all and as_completed, … It’s basically useless. All of those other features got deferred to a tech specification that was supposed to land before C++14 but got pushed back repeatedly until it came out after C++17, and then got withdrawn, and now they’re awaiting proposals for a second TS to come. Which will probably be after the language has first-class coroutines and maybe fibers, and async/await, so they may well have to redesign the whole futures model yet again to make futures awaitable…
[2] Actually “as if on its own thread”. But AFAIK, every implementation handles this by spawning a thread. I think the distinction is for future expansions, either so they can do something like Java’s ForkJoinPool, or so they can use fibers or coroutines that don’t care what thread they’re on.
[3] In C++ futures lingo, “serial” actually means an executor that runs all tasks on a single background thread, with a queue that’s guaranteed to be mutex-locked rather than lock-free. But I mean “serial” in Jonathan’s sense here.
[4] Checking the docs, it looks like the immediate policy didn’t make it into C++11 either. But anyway, the deferred policy did, and that’s serial in Jonathan’s sense.
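The executor swap Andrew describes can be sketched concretely. Here `SyncExecutor` is a hypothetical class (the name is from this thread, not the stdlib) that runs each task eagerly, in the calling thread, at `submit()` time:

```python
import concurrent.futures


class SyncExecutor(concurrent.futures.Executor):
    """Hypothetical drop-in executor that runs each submitted task
    immediately in the calling thread, returning an already-resolved
    Future.  Not part of the stdlib."""

    def submit(self, fn, /, *args, **kwargs):
        future = concurrent.futures.Future()
        # Transition PENDING -> RUNNING, then resolve right away.
        if future.set_running_or_notify_cancel():
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)
        return future


# Debugging workflow: swap the executor class, change nothing else.
# Executor = concurrent.futures.ThreadPoolExecutor  # parallel run
Executor = SyncExecutor                              # serial run

with Executor() as executor:
    futures = [executor.submit(pow, 2, i) for i in range(5)]
    results = [f.result() for f in concurrent.futures.as_completed(futures)]

print(sorted(results))
```

Because the returned futures are real `concurrent.futures.Future` objects, `as_completed()` and `wait()` keep working unchanged, which is what makes the one-line swap possible.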
On Sat, 15 Feb 2020 14:16:39 -0800 Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Feb 15, 2020, at 13:36, Jonathan Crall <erotemic@gmail.com> wrote:
Also, there is no duck-typed class that behaves like an executor, but does its processing in serial. Often a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes tasks sequentially in the same Python thread:
This makes sense. I think most futures-and-executors frameworks in other languages have a serial/synchronous/immediate/blocking executor just like this. (And the ones that don’t, it’s usually because they have a different way to specify the same functionality—e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously.)
FWIW, I agree with Andrew here. Being able to swap a ThreadPoolExecutor or ProcessPoolExecutor with a serial version using the same API can have benefits in various situations. One is easier debugging (in case the problem you have to debug isn't a race condition, of course :-)). Another is writing a command-line tool or library where the final decision of whether to parallelize execution (e.g. through a command-line option for a CLI tool) is up to the user, not the library developer.

It seems there are two possible design decisions for a serial executor:
- one is to execute the task immediately on `submit()`
- another is to execute the task lazily on `result()`

This could for example be controlled by a constructor argument to SerialExecutor.

Regards

Antoine.
FWIW, I agree with Andrew here. Being able to swap a ThreadPoolExecutor or ProcessPoolExecutor with a serial version using the same API can have benefits in various situations. One is easier debugging (in case the problem you have to debug isn't a race condition, of course :-)). Another is writing a command-line tool or library where the final decision of whether to parallelize execution (e.g. through a command-line option for a CLI tool) is up to the user, not the library developer.
After Andrew explained his own use case for it with isolating bugs to ensure that the issue wasn't occurring as a result of parallelism, threads, processes, etc., I certainly can see how it would be useful. I could also see a use case in a CLI tool for a conveniently similar parallel and non-parallel version, although I'd likely prefer just having an entirely separate implementation. Particularly if the parallel version includes dividing a large, computationally intensive task into many sub-tasks (more common for PPE), that seems like it could result in significant additional unneeded overhead for the non-parallel version.

I think at this point, its potential usefulness is clear though. But, IMO, the main question is now the following: would it be better *initially* placed in the standard library or on PyPI (which could eventually transition into stdlib if it sees widespread usage)?
It seems there are two possible design decisions for a serial executor:
- one is to execute the task immediately on `submit()`
- another is to execute the task lazily on `result()`
To me, it seems like the latter would be more useful for debugging purposes, since that would be more similar to how the submitted task/function would actually be executed. ``submit()`` could potentially "fake" the process of scheduling the execution of the function, but without directly executing it; perhaps with something like this:

``executor.submit()`` => create a pending item => add pending item to dict => add callable to call queue => fut.result() => check if in pending items => get from top of call queue => run work item => pop from pending items => set result/exception => return result

(skip the last three if fut is not in/associated with a pending item). IMO, that would be similar enough to the general workflow followed in the executors without any of the parallelization.

On Sun, Feb 16, 2020 at 6:29 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 15 Feb 2020 14:16:39 -0800 Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Feb 15, 2020, at 13:36, Jonathan Crall <erotemic@gmail.com> wrote:
Also, there is no duck-typed class that behaves like an executor, but does its processing in serial. Often a developer will want to run a task in parallel, but depending on the environment they may want to disable threading or process execution. To address this I use a utility called a `SerialExecutor` which shares an API with ThreadPoolExecutor/ProcessPoolExecutor but executes tasks sequentially in the same Python thread:
This makes sense. I think most futures-and-executors frameworks in other
languages have a serial/synchronous/immediate/blocking executor just like this. (And the ones that don’t, it’s usually because they have a different way to specify the same functionality—e.g., in C++, you only use executors via the std::async function, and you can just pass a launch option instead of an executor to run synchronously.)
FWIW, I agree with Andrew here. Being able to swap a ThreadPoolExecutor or ProcessPoolExecutor with a serial version using the same API can have benefits in various situations. One is easier debugging (in case the problem you have to debug isn't a race condition, of course :-)). Another is writing a command-line tool or library where the final decision of whether to parallelize execution (e.g. through a command-line option for a CLI tool) is up to the user, not the library developer.
It seems there are two possible design decisions for a serial executor:
- one is to execute the task immediately on `submit()`
- another is to execute the task lazily on `result()`
This could for example be controlled by a constructor argument to SerialExecutor.
Regards
Antoine.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/PCDN4J... Code of Conduct: http://python.org/psf/codeofconduct/
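Antoine's two design options (eager execution on `submit()` vs lazy execution on `result()`) could indeed both hang off a constructor argument, as he suggests. A rough sketch, assuming an illustrative flag name (`lazy`) and leaning on a private attribute purely for demonstration:

```python
import concurrent.futures
from concurrent.futures._base import PENDING  # private, for illustration


class SerialExecutor(concurrent.futures.Executor):
    """Sketch of a serial executor with a constructor flag choosing
    between eager (at submit) and lazy (at result) execution.
    `lazy` is an illustrative name, not a proposed API."""

    def __init__(self, lazy=False):
        self._lazy = lazy

    def submit(self, fn, /, *args, **kwargs):
        future = concurrent.futures.Future()

        def run():
            if future.set_running_or_notify_cancel():
                try:
                    future.set_result(fn(*args, **kwargs))
                except BaseException as exc:
                    future.set_exception(exc)

        if self._lazy:
            # Defer until result() is first called.  Note: this keeps
            # result() working but NOT as_completed()/wait(), which is
            # exactly the complication discussed in this thread.
            original_result = future.result

            def lazy_result(timeout=None):
                if future._state == PENDING:
                    run()
                return original_result(timeout)

            future.result = lazy_result
        else:
            run()
        return future


eager = SerialExecutor().submit(lambda: 40 + 2)
assert eager.done()                   # already resolved at submit time

deferred = SerialExecutor(lazy=True).submit(lambda: 40 + 2)
assert not deferred.done()            # nothing has run yet
print(eager.result(), deferred.result())
```

The eager variant is trivially correct with the whole `concurrent.futures` toolbox; the lazy variant shown here only resolves through `result()`, which illustrates why the lazy design is the harder one to get right.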
On Sun, 16 Feb 2020 09:29:36 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
After Andrew explained his own use case for it with isolating bugs to ensure that the issue wasn't occurring as a result of parallelism, threads, processes, etc., I certainly can see how it would be useful. I could also see a use case in a CLI tool for a conveniently similar parallel and non-parallel version, although I'd likely prefer just having an entirely separate implementation. Particularly if the parallel version includes dividing a large, computationally intensive task into many sub-tasks (more common for PPE), that seems like it could result in significant additional unneeded overhead for the non-parallel version.
I think at this point, its potential usefulness is clear though. But, IMO, the main question is now the following: would it be better *initially* placed in the standard library or on PyPI (which could eventually transition into stdlib if it sees widespread usage)?
I don't think we need to be dogmatic here. If someone wants to provide it on PyPI, then so be it. But if they'd rather contribute it to the stdlib, we should examine the relevant PR at face value.

Asking it to be exercised first on PyPI is worthwhile if the domain space is complex or there are multiple possible APIs. It's not really the case here: the API is basically constrained (it must be an Executor) and the main unknown seems to be whether execution is lazy or immediate (which may as well be governed by a constructor parameter). And the implementation shouldn't be very hairy either :-)

Regards

Antoine.
I'm happy to defer to Antoine, who is the subject expert here (and Brian Quinlan, the original author). On Sun, Feb 16, 2020 at 6:48 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Feb 2020 09:29:36 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
After Andrew explained his own use case for it with isolating bugs to ensure that the issue wasn't occurring as a result of parallelism, threads, processes, etc., I certainly can see how it would be useful. I could also see a use case in a CLI tool for a conveniently similar parallel and non-parallel version, although I'd likely prefer just having an entirely separate implementation. Particularly if the parallel version includes dividing a large, computationally intensive task into many sub-tasks (more common for PPE), that seems like it could result in significant additional unneeded overhead for the non-parallel version.
I think at this point, its potential usefulness is clear though. But, IMO, the main question is now the following: would it be better *initially* placed in the standard library or on PyPI (which could eventually transition into stdlib if it sees widespread usage)?
I don't think we need to be dogmatic here. If someone wants to provide it on PyPI, then be it. But if they'd rather contribute it to the stdlib, we should examine the relevant PR at face value.
Asking it to be exercised first on PyPI is worthwhile if the domain space is complex or there are multiple possible APIs. It's not really the case here: the API is basically constrained (it must be an Executor) and the main unknown seems to be whether execution is lazy or immediate (which may as well be governed by a constructor parameter). And the implementation shouldn't be very hairy either :-)
Regards
Antoine.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
I don't think we need to be dogmatic here. If someone wants to provide it on PyPI, then be it. But if they'd rather contribute it to the stdlib, we should examine the relevant PR at face value.
Asking it to be exercised first on PyPI is worthwhile if the domain space is complex or there are multiple possible APIs. It's not really the case here: the API is basically constrained (it must be an Executor) and the main unknown seems to be whether execution is lazy or immediate (which may as well be governed by a constructor parameter). And the implementation shouldn't be very hairy either :-)
Alright, fair enough. I suppose that I hadn't adequately considered how constrained the API and how straightforward the implementation would likely be. If you think it would very likely receive widespread enough usage to justify adding and maintaining it in the stdlib, I fully trust your judgement on that. (:

As a side note, are we still interested in expanding the public API for the Future class? Particularly for a public means of accessing the state. The primary motivation for it was this topic, but I could easily see the same issues coming up with custom Future and Executor classes; not to mention the general debugging usefulness of being able to log the current state of the future (without relying on private members).

On Sun, Feb 16, 2020 at 9:49 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Feb 2020 09:29:36 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
After Andrew explained his own use case for it with isolating bugs to ensure that the issue wasn't occurring as a result of parallelism, threads, processes, etc., I certainly can see how it would be useful. I could also see a use case in a CLI tool for a conveniently similar parallel and non-parallel version, although I'd likely prefer just having an entirely separate implementation. Particularly if the parallel version includes dividing a large, computationally intensive task into many sub-tasks (more common for PPE), that seems like it could result in significant additional unneeded overhead for the non-parallel version.
I think at this point, its potential usefulness is clear though. But, IMO, the main question is now the following: would it be better *initially* placed in the standard library or on PyPI (which could eventually transition into stdlib if it sees widespread usage)?
I don't think we need to be dogmatic here. If someone wants to provide it on PyPI, then be it. But if they'd rather contribute it to the stdlib, we should examine the relevant PR at face value.
Asking it to be exercised first on PyPI is worthwhile if the domain space is complex or there are multiple possible APIs. It's not really the case here: the API is basically constrained (it must be an Executor) and the main unknown seems to be whether execution is lazy or immediate (which may as well be governed by a constructor parameter). And the implementation shouldn't be very hairy either :-)
Regards
Antoine.
On Sun, 16 Feb 2020 17:41:36 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
As a side note, are we still interested in expanding the public API for the Future class? Particularly for a public means of accessing the state. The primary motivation for it was this topic, but I could easily see the same issues coming up with custom Future and Executor classes; not to mention the general debugging usefulness of being able to log the current state of the future (without relying on private members).
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory). Regards Antoine.
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory).
Yep, a `Future.state()` method is exactly what I had in mind! I hadn't considered that `Queue.qsize()` was analogous, but that's a perfect example.

Based on the proposal in the OP, I had considered that it might also be necessary to be able to manually set the state of the future through something like a `Future.set_state()`, which would have a parameter for accessing it safely through the condition's RLock, and another without it (in case they want to specify their own, such as in the OP's example code).

Lastly, it also seemed useful to be able to publicly use the future state constants. This isn't necessary for extending them, but IMO it would look better from an API design perspective to use `future.set_state(cf.RUNNING)` instead of `future.set_state(cf._base.RUNNING)` or `future.set_state("running")` [1]. Combining the above, this would look something like `future.set_state(cf.FINISHED)`, instead of the current private means of modifying them with `future._state = cf._base.FINISHED` or `future._state = "finished"`.

Personally, I'm most strongly in favor of adding Future.state(), as it would be personally useful for me (for reasons previously mentioned); but I think that the other two would be useful for properly extending the Future class without having to access private members. This was more formally proposed in https://bugs.python.org/issue39645.

[1] - Setting running was just an example, although normally that would just be done in the executor through `Future.set_running_or_notify_cancel()`.

On Sun, Feb 16, 2020 at 6:00 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Feb 2020 17:41:36 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
As a side note, are we still interested in expanding the public API for the Future class? Particularly for a public means of accessing the state. The primary motivation for it was this topic, but I could easily see the same issues coming up with custom Future and Executor classes; not to mention the general debugging usefulness of being able to log the current state of the future (without relying on private members).
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory).
Regards
Antoine.
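A rough sketch of what the accessors Kyle describes might look like. Both methods are hypothetical (neither exists on `concurrent.futures.Future` today), and `_state`/`_condition` are private implementation details used here only for illustration:

```python
import concurrent.futures


class StatefulFuture(concurrent.futures.Future):
    """Illustration only: neither state() nor set_state() exists on
    concurrent.futures.Future today."""

    def state(self):
        # Advisory snapshot, like Queue.qsize(): read under the
        # condition lock, but possibly stale by the time it's used.
        with self._condition:
            return self._state

    def set_state(self, state, *, locked=True):
        # `locked` mirrors Kyle's idea of a safe (lock-taking) path
        # plus a raw path for callers managing their own condition.
        if locked:
            with self._condition:
                self._state = state
        else:
            self._state = state


f = StatefulFuture()
print(f.state())      # the internal constants are bare strings today
f.set_result(123)
print(f.state())
```

Logging `f.state()` like this is the debugging use Kyle mentions; today the same information is only reachable through the private `f._state`.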
Hm, but doesn't the OP's example require *synchronously* reading and writing the state? On Sun, Feb 16, 2020 at 4:47 PM Kyle Stanley <aeros167@gmail.com> wrote:
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory).
Yep, a `Future.state()` method is exactly what I had in mind! I hadn't considered that `Queue.qsize()` was analogous, but that's a perfect example.
Based on the proposal in the OP, I had considered that it might also be needed to be able to manually set the state of the future through something like a `Future.set_state()`, which would have a parameter for accessing it safely through the condition's RLock, and another without it (in case they want to specify their own, such as in the OP's example code).
Lastly, it also seemed useful to be able to publicly use the future state constants. This isn't necessary for extending them, but IMO it would look better from an API design perspective to use `future.set_state(cf.RUNNING)` instead of `future.set_state(cf._base.RUNNING)` or `future.set_state("running")` [1].
Combining the above, this would look something like `future.set_state(cf.FINISHED)`, instead of the current private means of modifying them with `future._state = cf._base.FINISHED` or `future._state = "finished"`.
Personally, I'm most strongly in favor of adding Future.state(), as it would be personally useful for me (for reasons previously mentioned); but I think that the other two would be useful for properly extending the Future class without having to access private members. This was more formally proposed in https://bugs.python.org/issue39645.
[1] - Setting running was just an example, although normally that would be just done in the executor through `Future.set_running_or_notify_cancel()`.
On Sun, Feb 16, 2020 at 6:00 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Feb 2020 17:41:36 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
As a side note, are we still interested in expanding the public API for the Future class? Particularly for a public means of accessing the state. The primary motivation for it was this topic, but I could easily see the same issues coming up with custom Future and Executor classes; not to mention the general debugging usefulness of being able to log the current state of the future (without relying on private members).
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory).
Regards
Antoine.
Hm, but doesn't the OP's example require *synchronously* reading and writing the state?
Correct. But in the OP's example, they wanted to use their own "FakeCondition" for reading and writing the state, rather than the executor's internal condition (which is bypassed when you directly access or modify the state through future._state instead of the public methods). That's why I proposed to add something like future.state(). In the case of the OP's example, they would presumably access future.state() through "FakeCondition". Or am I misunderstanding something? On Mon, Feb 17, 2020 at 12:03 AM Guido van Rossum <guido@python.org> wrote:
Hm, but doesn't the OP's example require *synchronously* reading and writing the state?
On Sun, Feb 16, 2020 at 4:47 PM Kyle Stanley <aeros167@gmail.com> wrote:
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory).
Yep, a `Future.state()` method is exactly what I had in mind! I hadn't considered that `Queue.qsize()` was analogous, but that's a perfect example.
Based on the proposal in the OP, I had considered that it might also be necessary to be able to manually set the state of the future through something like a `Future.set_state()`, which would have a parameter for accessing it safely through the condition's RLock, and another without it (in case they want to specify their own, such as in the OP's example code).
Lastly, it also seemed useful to be able to publicly use the future state constants. This isn't necessary for extending them, but IMO it would look better from an API design perspective to use `future.set_state(cf.RUNNING)` instead of `future.set_state(cf._base.RUNNING)` or `future.set_state("running")` [1].
Combining the above, this would look something like `future.set_state(cf.FINISHED)`, instead of the current private means of modifying them with `future._state = cf._base.FINISHED` or `future._state = "finished"`.
Personally, I'm most strongly in favor of adding Future.state(), as it would be personally useful for me (for reasons previously mentioned); but I think that the other two would be useful for properly extending the Future class without having to access private members. This was more formally proposed in https://bugs.python.org/issue39645.
[1] - Setting running was just an example, although normally that would just be done in the executor through `Future.set_running_or_notify_cancel()`.
On Sun, Feb 16, 2020 at 6:00 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Feb 2020 17:41:36 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
As a side note, are we still interested in expanding the public API for the Future class? Particularly for a public means of accessing the state. The primary motivation for it was this topic, but I could easily see the same issues coming up with custom Future and Executor classes; not to mention the general debugging usefulness of being able to log the current state of the future (without relying on private members).
That sounds useful to me indeed. I assume you mean something like a state() method? We already have Queue.qsize() which works a bit like this (unlocked and advisory).
Regards
Antoine.
On Sun, 16 Feb 2020 19:46:13 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
Based on the proposal in the OP, I had considered that it might also be needed to be able to manually set the state of the future through something like a `Future.set_state()`, which would have a parameter for accessing it safely through the condition's RLock, and another without it (in case they want to specify their own, such as in the OP's example code).
I'm much more lukewarm on set_state(). How hard is it to reimplement one's own Future if someone wants a different implementation? By allowing people to change the future's internal state, we're also giving them a (small) gun to shoot themselves with.
Lastly, it also seemed useful to be able to publicly use the future state constants. This isn't necessary for extending them, but IMO it would look better from an API design perspective to use `future.set_state(cf.RUNNING)` instead of `future.set_state(cf._base.RUNNING)` or `future.set_state("running")` [1].
No strong opinion on this, but it sounds ok. That means `future.state()` would return an enum value, not a bare string?

Regards

Antoine.
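If the constants were exposed publicly, an enum is one natural shape. This is purely speculative, mirroring the private string constants in `concurrent.futures._base`:

```python
import enum


class FutureState(enum.Enum):
    """Speculative public enum for future states; not an actual
    stdlib API.  The values mirror the private string constants
    in concurrent.futures._base."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    CANCELLED = "CANCELLED"
    CANCELLED_AND_NOTIFIED = "CANCELLED_AND_NOTIFIED"
    FINISHED = "FINISHED"


# A public state() method could then map the internal string onto
# the enum, so callers compare against FutureState.FINISHED rather
# than a bare "FINISHED" string.
state = FutureState("FINISHED")
print(state, state.value)
```

Using an enum keeps the API backward compatible (the `.value` is still the familiar string) while giving users a namespaced, typo-proof constant to compare against.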
It's actually really hard to implement your own Future class that works well with concurrent.futures.as_completed() -- this is basically what complicated the OP's implementation. Maybe it would be useful to look into a protocol to allow alternative Future implementations to hook into that? On Mon, Feb 17, 2020 at 2:07 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Feb 2020 19:46:13 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
Based on the proposal in the OP, I had considered that it might also be needed to be able to manually set the state of the future through something like a `Future.set_state()`, which would have a parameter for accessing it safely through the condition's RLock, and another without it (in case they want to specify their own, such as in the OP's example code).
I'm much more lukewarm on set_state(). How hard is it to reimplement one's own Future if someone wants a different implementation? By allowing people to change the future's internal state, we're also giving them a (small) gun to shoot themselves with.
Lastly, it also seemed useful to be able to publicly use the future state constants. This isn't necessary for extending them, but IMO it would look better from an API design perspective to use `future.set_state(cf.RUNNING)` instead of `future.set_state(cf._base.RUNNING)` or `future.set_state("running")` [1].
No strong opinion on this, but it sounds ok. That means `future.state()` would return an enum value, not a bare string?
Regards
Antoine.
On Mon, 17 Feb 2020 12:19:59 -0800 Guido van Rossum <guido@python.org> wrote:
It's actually really hard to implement your own Future class that works well with concurrent.futures.as_completed() -- this is basically what complicated the OP's implementation. Maybe it would be useful to look into a protocol to allow alternative Future implementations to hook into that?
Ah, I understand the reasons then. Ok, it does sound useful to explore the space of solutions. But let's decouple it from simply querying the current Future state. Regards Antoine.
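The coupling Guido mentions is easy to demonstrate: `as_completed()` acquires each future's private `_condition` lock and installs itself in the future's `_waiters` list, so a duck-typed future that only implements the public methods is rejected outright. A small illustration (`NaiveFuture` is a made-up class for this demo):

```python
import concurrent.futures


class NaiveFuture:
    """A duck-typed 'future' exposing only result() -- deliberately
    missing the private _condition/_waiters machinery that
    as_completed() relies on."""

    def __init__(self, value):
        self._value = value

    def result(self, timeout=None):
        return self._value


try:
    list(concurrent.futures.as_completed([NaiveFuture(1)]))
except AttributeError as exc:
    # as_completed() reaches for fut._condition internally, so the
    # minimal duck type fails here -- this is the coupling that a
    # Future protocol could remove.
    print("rejected:", exc)
```

This is why the OP's SerialFuture has to subclass `concurrent.futures.Future` and override private hooks rather than simply quacking like a future.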
Based on the conversation so far, I agree with @Kyle Stanley's breakdown of the proposal. I think shelving the "*Add a new way to create and specify executor*" and focusing on "*Add a SerialExecutor, which does not use threads or processes*" is the best way forward.

For context, I'm a machine learning researcher and developer. I've made extensive use of both thread and process based parallelism (and I'm very much looking forward to subinterpreters). I use threads for tasks like downloading files, running background tasks when my GPU computations are the bottleneck, and other IO related tasks. I use processes for image processing and other CPU bound tasks.

@Andrew Barnert <abarnert@yahoo.com>'s analysis of the use case is spot on. Andrew states:
I’m pretty sure what he meant is that the developer _usually_ wants the task to run in parallel, but in some specific situation he wants it to _not_ run in parallel.
The concrete use case I’ve run into is this: I’ve got some parallel code that has a bug. I’m pretty sure the bug isn’t actually related to the shared data or the parallelism itself, but I want to be sure. I replace the ThreadPoolExecutor with a SyncExecutor and change nothing else about the code, and the bug still happens. Now I’ve proven that the bug isn’t related to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved into a big mess, so it’s easier to track down the problem.
This is exactly the use case that I run into, but this isn't the only use case for SerialExecutor. @Antoine Pitrou put it nicely:

Being able to swap a ThreadPoolExecutor or ProcessPoolExecutor with a serial version using the same API can have benefits in various situations. One is easier debugging (in case the problem you have to debug isn't a race condition, of course :-)). Another is writing a command-line tool or library where the final decision of whether to parallelize execution (e.g. through a command-line option for a CLI tool) is up to the user, not the library developer.
Antoine's second point is important in certain multiuser or limited-hardware environments. On my personal machine I use all the compute available, but on a shared system I need to constrain the resources I'm using. Disabling parallelism can also be useful on hardware like the Raspberry Pi.

1) Debugging parallel code: this is the use case stated by @Andrew Barnert <abarnert@yahoo.com>. Serial code is easier to debug, and currently the executor API requires restructuring the code if you want to rule out parallelism as the source of a bug.

2) Some programs run better on one CPU in certain hardware / multiuser environments: depending on the hardware you may want to disable parallelism in your code. Many times I check for a `--serial` flag on the command line to disable parallelism.

This proposal isn't so much about faking parallelism as it is about disabling it when you need to. If you set `max_workers` to 0 in ThreadPoolExecutor or ProcessPoolExecutor you get an error. I don't think that disabling parallelism is an uncommon use case. As previously mentioned, it has uses in debugging and in allowing the user to control the flow of execution. This second case is useful when your parallel code has a race condition that doesn't appear on your machine, but does on your customer's machine. The current futures API does not work if you need to fall back on single-threaded execution, which means that if the developer wants the option to disable parallelism they have to maintain two different implementations of the same functionality. A serial executor would allow duck typing to solve that problem.

Also, as a sidenote, I much prefer the term "SyncExecutor" to "SerialExecutor". I think the former is a bit more clear at defining its actual purpose.
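To make the `--serial` pattern concrete, here is a hedged sketch of how a CLI tool might hand the parallelism decision to the user. `SerialExecutor` does not exist yet, so a single-worker thread pool stands in for it; the `make_executor` helper is hypothetical:

```python
import argparse
import concurrent.futures


def make_executor(argv):
    """Choose the executor from a CLI flag, so the user (not the
    library author) decides whether work runs in parallel."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--serial', action='store_true')
    args = parser.parse_args(argv)
    if args.serial:
        # Proposed: return SerialExecutor()
        # Closest stand-in today: a pool with a single worker thread.
        return concurrent.futures.ThreadPoolExecutor(max_workers=1)
    return concurrent.futures.ThreadPoolExecutor()


with make_executor(['--serial']) as executor:
    futures = [executor.submit(lambda x: x + 1, i) for i in range(3)]
    results = [f.result() for f in futures]
print(results)  # [1, 2, 3]
```

Because both branches expose the same Executor API, nothing else in the program changes when the flag flips; that is the duck-typing argument above.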
FWIW I found the term "SyncExecutor" really confusing when I was reading this thread. I thought it was short for Synchronized, but I just realized it's actually short for Synchronous, which makes much more sense. While SynchronousExecutor makes more sense to me, it is also more verbose and difficult to spell. It seems there are two possible design decisions for a serial executor:
- one is to execute the task immediately on `submit()`
- another is to execute the task lazily on `result()`

This could for example be controlled by a constructor argument to SerialExecutor.
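For reference, the eager design is straightforward to sketch against the stock Future; nothing below is stdlib code, just a hypothetical illustration:

```python
import concurrent.futures


class SerialExecutor(concurrent.futures.Executor):
    """Hypothetical sketch of the eager design: submit() runs the
    callable immediately in the calling thread and returns an
    already-finished Future."""

    def submit(self, fn, *args, **kwargs):
        fut = concurrent.futures.Future()
        fut.set_running_or_notify_cancel()  # PENDING -> RUNNING
        try:
            fut.set_result(fn(*args, **kwargs))
        except BaseException as exc:
            fut.set_exception(exc)
        return fut


with SerialExecutor() as executor:
    futures = [executor.submit(lambda x: x + 1, i) for i in range(3)]
    done = [f.result() for f in concurrent.futures.as_completed(futures)]
print(sorted(done))  # [1, 2, 3]
```

Because every returned future is already FINISHED, `as_completed()` and `wait()` work unmodified; it is the lazy design that forces a custom Future subclass.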
This is a great idea. I think I like the default being lazy execution, but giving the user control over that would increase the usefulness. I also see some conversation about a public API to query and get the state of a process. That's likely because my implementation abuses a private member variable, but I think it might be possible to implement "SerialExecutor" without exposing state setters/getters. I think @Kyle Stanley's idea makes sense: ``submit()`` could potentially "fake" the process of scheduling the execution of the function, but without directly executing it; perhaps with something like this:

``executor.submit()`` => create a pending item => add pending item to dict => add callable to call queue

``fut.result()`` => check if in pending items => get from top of call queue => run work item => pop from pending items => set result/exception => return result (skip the last three if fut is not in/associated with a pending item).
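A hedged sketch of that lazy flow, with the pending item consumed on the first `result()` call. All class names here are hypothetical and error handling is simplified:

```python
import concurrent.futures


class _PendingItem:
    """Hypothetical work item: the callable plus its arguments."""

    def __init__(self, fn, args, kwargs):
        self.fn, self.args, self.kwargs = fn, args, kwargs


class LazyFuture(concurrent.futures.Future):
    """Sketch of the lazy design: submit() only records a pending
    item; the work runs the first time result() is requested."""

    def __init__(self, pending):
        super().__init__()
        self._pending = pending

    def result(self, timeout=None):
        if self._pending is not None:
            # Run the work item exactly once, then set result/exception.
            item, self._pending = self._pending, None
            try:
                self.set_result(item.fn(*item.args, **item.kwargs))
            except BaseException as exc:
                self.set_exception(exc)
        return super().result(timeout)


class LazySerialExecutor(concurrent.futures.Executor):
    def submit(self, fn, *args, **kwargs):
        return LazyFuture(_PendingItem(fn, args, kwargs))


with LazySerialExecutor() as executor:
    fut = executor.submit(pow, 2, 10)
    print(fut.result())  # 1024
```

The caveat, as with the OP's implementation: `as_completed()` would block on such futures because nothing marks them FINISHED until `result()` is called, which is precisely the protocol question raised earlier in the thread.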
I'm not 100% sure that this would work as-is, given the complexity of the futures library, but it seems right to me at face value. On Mon, Feb 17, 2020 at 3:41 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 17 Feb 2020 12:19:59 -0800 Guido van Rossum <guido@python.org> wrote:
It's actually really hard to implement your own Future class that works well with concurrent.futures.as_completed() -- this is basically what complicated the OP's implementation. Maybe it would be useful to look into a protocol to allow alternative Future implementations to hook into that?
Ah, I understand the reasons then. Ok, it does sound useful to explore the space of solutions. But let's decouple it from simply querying the current Future state.
Regards
Antoine.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5UJSZP... Code of Conduct: http://python.org/psf/codeofconduct/
-- -Jon (him)
On Feb 17, 2020, at 15:41, Jonathan Crall <erotemic@gmail.com> wrote:
FWIW I found the term "SyncExecutor" really confusing when I was reading this thread. I thought it was short for Synchronized, but I just realized it's actually short for Synchronous, which makes much more sense. While SynchronousExecutor makes more sense to me, it is also more verbose and difficult to spell.
I think that’s my fault: I switched from “serial” to “sync” in the middle of a message without even realizing it, probably borrowed from an ObjC library I used recently.
Anyway, I think the spelled-out “Synchronous” may be a better name, to avoid the (very likely) case of people mistakenly reading “Sync” as short for “Synchronized”. It’s no longer than “ProcessPool”, and, although it is easy to typo, tab-completion or copy-paste helps, and how many times do you need to type it anyway? And there will always be more readers than writers, and it’s more likely the writers will be familiar with the futures module contents than the readers. And IIRC, this is the name Scala uses.
Maybe “Serial” is ok too, but to me that implies serialized on a queue, probably using a single background thread. That’s the naming used in the third-party C++ and ObjC libs I’ve used most recently, and it may be more common than that—but it may not, in which case my reading may be idiosyncratic and not worth worrying about.
FWIW, I'm also in favor of SynchronousExecutor. I find that the term "Serial" has a bit too many definitions depending on the context, whereas "Synchronous" is very clear as to the behavior and purpose of the executor. I'd rather the class name be excessively verbose and immediately obvious as to what it does, rather than shorter to type and a bit ambiguous. On Mon, Feb 17, 2020 at 9:05 PM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Feb 17, 2020, at 15:41, Jonathan Crall <erotemic@gmail.com> wrote:
FWIW I found the term "SyncExecutor" really confusing when I was reading this thread. I thought it was short for Synchronized, but I just realized it's actually short for Synchronous, which makes much more sense. While SynchronousExecutor makes more sense to me, it is also more verbose and difficult to spell.
I think that’s my fault—I switched from “serial” to “sync” in the middle of a message without even realizing It, probably borrowed from an ObjC library I used recently.
Anyway, I think the spelled-out “Synchronous” may be a better name, to avoid the (very likely) case of people mistakenly reading “Sync” as short for “Synchronized”. It’s no longer than “ProcessPool”, and, although it is easy to typo, tab-completion or copy-paste helps, and how many times do you need to type it anyway? And there will always be more readers than writers, and it’s more likely the writers will be familiar with the futures module contents than the readers. And IIRC, this is the name Scala uses.
Maybe “Serial” is ok too, but to me that implies serialized on a queue, probably using a single background thread. That’s the naming used in the third-party C++ and ObjC libs I’ve used most recently, and it may be more common than that—but it may not, in which case my reading may be idiosyncratic and not worth worrying about.
A protocol that other Future implementations could hook into would be great. The Dask distributed library has an API compatible with concurrent.futures, but would never be appropriate for inclusion in the standard library. It'd be perfect if Dask's Future objects would work well with concurrent.futures.as_completed. https://github.com/dask/distributed/issues/3695 has a few more details.
I'm much more lukewarm on set_state(). How hard is it to reimplement one's own Future if someone wants a different implementation? By allowing people to change the future's internal state, we're also giving them a (small) gun to shoot themselves with.
Yeah, I don't feel quite as strongly about future.set_state(). My primary motivation was to work on a complete means of extending the Future class through a public API, starting with the future's state. But it might be too (potentially) detrimental for the average user to be worth the more niche case of being able to extend Future without needing to use the private members. Upon further consideration, I think it would be better to stick with future.state() for now, since it has more of a general-purpose use case. It could be specifically documented in a similar manner to queue.qsize(), stating something along the lines of "Return the approximate state of the future. Note that this state is only advisory, and is not guaranteed."
No strong opinion on this, but it sounds ok. That means `future.state()` would return an enum value, not a bare string?
Yeah, presumably with each using auto() for the value. For simplified review purposes though, I'll likely do these in separate PRs, but attached to the same bpo issue: https://bugs.python.org/issue39645. I'll also update the issue to reduce the scope a bit (mainly removing future.set_state()). On Mon, Feb 17, 2020 at 5:08 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 16 Feb 2020 19:46:13 -0500 Kyle Stanley <aeros167@gmail.com> wrote:
Based on the proposal in the OP, I had considered that it might also be needed to be able to manually set the state of the future through something like a `Future.set_state()`, which would have a parameter for accessing it safely through the condition's RLock, and another without it (in case they want to specify their own, such as in the OP's example code).
I'm much more lukewarm on set_state(). How hard is it to reimplement one's own Future if someone wants a different implementation? By allowing people to change the future's internal state, we're also giving them a (small) gun to shoot themselves with.
Lastly, it seemed also useful to be able to publicly use the future state constants. This isn't necessary for extending them, but IMO it would look better from an API design perspective to use `future.set_state(cf.RUNNING)` instead of `future.set_state(cf._base.RUNNING)` or `future.set_state("running")` [1].
No strong opinion on this, but it sounds ok. That means `future.state()` would return an enum value, not a bare string?
Regards
Antoine.
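For concreteness, the public enum under discussion might look like this. This is a hypothetical sketch, not the actual patch; the member names mirror the private string constants in `concurrent.futures._base`:

```python
import enum


class FutureState(enum.Enum):
    """Hypothetical public future states, mirroring the private
    string constants in concurrent.futures._base."""
    PENDING = enum.auto()
    RUNNING = enum.auto()
    CANCELLED = enum.auto()
    CANCELLED_AND_NOTIFIED = enum.auto()
    FINISHED = enum.auto()


# A hypothetical future.state() would then return an enum member,
# e.g. FutureState.RUNNING, rather than the bare string "RUNNING".
print(FutureState.RUNNING.name)  # RUNNING
```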
participants (6)
- Andrew Barnert
- Antoine Pitrou
- Guido van Rossum
- Jonathan Crall
- Kyle Stanley
- Tom Augspurger