Add an introspection API to Executor
Sometimes I want to take a live executor, like a `ThreadPoolExecutor`, and check up on it. I want to know how many threads there are, how many are handling tasks and which tasks, how many are free, and which tasks are in the queue.

I asked on Stack Overflow: http://stackoverflow.com/questions/25474204/checking-up-on-a-concurrent-futu...

There's an answer there, but it uses private variables and it's not part of the API.

I suggest it become a part of the API. There should be an API for checking on what the executor is currently doing and answering all the questions I raised above.

Thanks,
Ram.
Adding active/idle/total worker counts for both ThreadPoolExecutor and ProcessPoolExecutor is pretty straightforward; I just threw a patch together for both in 30 minutes or so. However, I don't think it's possible to inspect the contents of a ProcessPoolExecutor's queue without actually consuming items from it. While it *is* possible with ThreadPoolExecutor, I don't think we should expose it - the queue.Queue implementation ThreadPoolExecutor relies on doesn't have a public API for inspecting its contents, so ThreadPoolExecutor probably shouldn't expose one, either. Identifying which task each worker is processing is possible, but would perhaps require more work than it's worth, at least for ProcessPoolExecutor.

I do think adding worker count APIs is reasonable, and in line with a TODO item in the ThreadPoolExecutor source:

    # TODO(bquinlan): Should avoid creating new threads if there are more
    # idle threads than items in the work queue.

So, at the very least, there have been plans to internally keep track of active/idle thread counts. If others agree it's a good idea, I'll open an issue on the tracker for this and include my patch (which also addresses that TODO item).

On Sun, Aug 24, 2014 at 5:41 PM, Ram Rachum <ram.rachum@gmail.com> wrote:
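As a rough sketch of the worker counts being discussed, a ThreadPoolExecutor subclass can track busy workers by wrapping each submitted callable. This is illustrative only: `InstrumentedThreadPoolExecutor`, `active_count`, `idle_count`, and `worker_count` are made-up names, and `_threads` is a private CPython attribute, not a public API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class InstrumentedThreadPoolExecutor(ThreadPoolExecutor):
    """Counts busy workers by wrapping each submitted callable."""

    def __init__(self, max_workers):
        super().__init__(max_workers)
        self._active = 0
        self._active_lock = threading.Lock()

    def submit(self, fn, *args, **kwargs):
        def wrapped(*a, **kw):
            with self._active_lock:
                self._active += 1
            try:
                return fn(*a, **kw)
            finally:
                with self._active_lock:
                    self._active -= 1
        return super().submit(wrapped, *args, **kwargs)

    @property
    def active_count(self):
        """Number of workers currently running a task."""
        with self._active_lock:
            return self._active

    @property
    def worker_count(self):
        # _threads is a private CPython detail (the set of worker threads).
        return len(self._threads)

    @property
    def idle_count(self):
        return self.worker_count - self.active_count
```

Since CPython creates worker threads lazily, worker_count only grows as tasks are submitted; all of these counts are advisory snapshots and can change before the caller looks at them.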
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Doesn't queue.Queue also have methods qsize(), empty() and full()? We could easily wrap those. There's always the caveat that the numbers may be out of date as soon as you print them. On Mon, Aug 25, 2014 at 6:44 AM, Dan O'Reilly <oreilldf@gmail.com> wrote:
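The qsize() idea already works today if you are willing to reach into CPython's internals; `_work_queue` is a private attribute (which is exactly the problem this thread is about), so treat this purely as illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ex = ThreadPoolExecutor(max_workers=1)
# Keep the single worker busy so later submissions pile up in the queue.
for _ in range(5):
    ex.submit(time.sleep, 0.05)

# _work_queue is a plain queue.Queue in CPython; qsize() is advisory only,
# since the value may already be stale by the time it is returned.
backlog = ex._work_queue.qsize()
print("roughly", backlog, "tasks waiting")
ex.shutdown()
```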
-- --Guido van Rossum (python.org/~guido)
Le 25/08/2014 09:44, Dan O'Reilly a écrit :
So, at the very least there have been plans to internally keep track active/idle thread counts. If others agree it's a good idea, I'll open an issue on the tracker for this and include my patch (which also addresses that TODO item).
I agree that basic executor parameters could be reflected, and I also agree that some other pieces of runtime state cannot be reliably computed and therefore shouldn't be exposed. Don't hesitate to open an issue with your patch. Regards Antoine.
"some other pieces of runtime state cannot be reliably computed" Can you please specify which ones you mean, and why not reliable? On Mon, Aug 25, 2014 at 8:34 PM, Antoine Pitrou <antoine@python.org> wrote:
Le 25/08/2014 13:37, Ram Rachum a écrit :
"some other pieces of runtime state cannot be reliably computed"
Can you please specify which ones you mean, and why not reliable?
I cannot say for sure without taking a more detailed look at concurrent.futures :-) However, any runtime information such as "the tasks currently being processed" (as opposed to, say, waiting) may not be available to the calling thread or process, or may be unreliable once it is returned to the function's caller (since the actual state may have changed in between).

In the former case (information not available to the main process), we can't expose the information at all; in the latter case, we may still choose to expose it with the usual caveats in the documentation (exactly like Queue.qsize()).

Regards

Antoine.
Maybe I'm missing something, but I don't think that's something that should block implementation.

Information not available? Change the executor code to make that information available. Information could have changed? So what? That is to be expected. When I read a file in Python, by the time the line finishes someone could have written something to that file, so the result of the read may not be current. Even if I read just a simple variable, by the next line it might have been changed by another thread. I really don't see why any of that deserves special consideration.

On Mon, Aug 25, 2014 at 8:57 PM, Antoine Pitrou <antoine@python.org> wrote:
Le 25/08/2014 14:16, Ram Rachum a écrit :
Maybe I'm missing something, but I don't think that's something that should block implementation.
Information not available? Change the executor code to make that information available.
Not if that would make the implementation much more complicated, or significantly slower. Regards Antoine.
It might be worth it to make the implementation somewhat more complicated if it serves a good purpose, for example giving the user of the program insights into how well the executor is performing. Without such insight you may be attempting to tune parameters (like the pool size) without being able to evaluate their effect. On Mon, Aug 25, 2014 at 11:43 AM, Antoine Pitrou <antoine@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)
I'll take a look at this again tonight and see if more detailed information (e.g. which tasks are actually being processed) can be determined without too much added complexity and/or performance penalties. If I can come up with something reasonable for both ProcessPool/ThreadPool, I'll add it to the changes I've already made. Either way, I'll create an issue to track this. On Mon, Aug 25, 2014 at 2:54 PM, Guido van Rossum <guido@python.org> wrote:
I don't think there's any issue with letting people introspect the executor. The problem is that the main thing you get is a queue, and there's a limit to how introspectable a queue can be. In particular, if you want to iterate the waiting tasks, you have to iterate the queue, and there's no safe way to do that.

Since CPython's queue.Queue happens to be just a deque and a mutex, you could make it iterable at the cost of blocking all producers and consumers (which might be fine for many uses, like debugging or exploratory programming), or provide a snapshot API to return a copy of the deque. But do you want to make that a requirement on all subclasses of Queue, and all other implementations' queue modules? Does PriorityQueue have to nondestructively iterate a heap in order? Does Jython have to use a mutex and a deque instead of a more efficient (and possibly lock-free) queue from the Java stdlib? What does multiprocessing.Queue do on each implementation?

I don't think the costs are worth the benefit. And I assume that's why the existing queue API doesn't provide an iteration or snapshot mechanism. But there's an option that might be worth doing:

Provide a queue.IntrospectableQueue type that _is_ defined to have such a mechanism, but otherwise works like a Queue (except maybe less efficiently). Then provide an optional parameter for the Executor that lets you specify an alternate queue constructor in place of the default. So, when exploring or debugging, you could pass queue.IntrospectableQueue (or multiprocessing.IntrospectableQueue for ProcessPoolExecutor).

Whether the interface is "lock_and_return_iterator" or "snapshot", this would be trivial to implement in CPython, and other Pythons could just copy the CPython version instead of extending their native queue types.

Sent from a random iPhone

On Aug 25, 2014, at 11:54, Guido van Rossum <guido@python.org> wrote:
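A sketch of what the "snapshot" flavor of queue.IntrospectableQueue could look like in CPython; the class does not exist anywhere, and both the name and the snapshot() method come from the proposal above. It relies on queue.Queue storing its items in a deque guarded by a mutex, which is a CPython implementation detail.

```python
import queue

class IntrospectableQueue(queue.Queue):
    """queue.Queue plus a snapshot() method (hypothetical API sketch)."""

    def snapshot(self):
        # Briefly take the queue's own mutex, blocking producers and
        # consumers, and return a copy of the underlying deque.
        with self.mutex:
            return list(self.queue)
```

An executor given this class as its queue constructor could then list pending tasks without consuming them, at the cost of momentarily stalling submit() and the workers.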
Sounds good to me. Having to specify `IntrospectableQueue` to the executor is a bit of a chore, but not too bad to get this functionality. I also bet that the performance difference wouldn't be an issue for most uses. On Mon, Aug 25, 2014 at 11:05 PM, 'Andrew Barnert' via python-ideas < python-ideas@googlegroups.com> wrote:
The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar) keyword to the Executor rather than a queue class, though. Adding support for identifying which tasks are active introduces some extra overhead, which I think can reasonably be made optional. If we're going to use a different Queue class to enable introspection, we might as well disable the other stuff that we're doing to make introspection work. It also makes it easier to raise an exception if an API is called that won't work without IntrospectableQueue being used.
Does Jython have to use a mutex and a deque instead of a more efficient (and possibly lock-free) queue from the Java stdlib?
For what it's worth, Jython just uses CPython's queue.Queue implementation, as far as I can tell.
What does multiprocessing.Queue do on each implementation?
In addition to a multiprocessing.Queue, the ProcessPoolExecutor maintains a dict of all submitted work items, so that can be used instead of trying to inspect the queue itself.
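The ThreadPoolExecutor equivalent of that dict could be sketched like this; `TrackingExecutor` and `pending()` are hypothetical names, and only the public Future API is used, so no private executor internals are touched.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TrackingExecutor(ThreadPoolExecutor):
    """Keeps a dict of not-yet-finished futures, keyed by a counter,
    mirroring ProcessPoolExecutor's internal dict of work items."""

    def __init__(self, max_workers):
        super().__init__(max_workers)
        self._pending = {}
        self._plock = threading.Lock()
        self._next_id = 0

    def submit(self, fn, *args, **kwargs):
        fut = super().submit(fn, *args, **kwargs)
        with self._plock:
            wid = self._next_id
            self._next_id += 1
            self._pending[wid] = fut
        # Remove the entry once the future completes; if it is already
        # done, add_done_callback invokes the callback immediately.
        fut.add_done_callback(lambda f, wid=wid: self._discard(wid))
        return fut

    def _discard(self, wid):
        with self._plock:
            self._pending.pop(wid, None)

    def pending(self):
        """Snapshot of futures not yet finished."""
        with self._plock:
            return list(self._pending.values())
```

One caveat: done callbacks can run slightly after a waiter's result() call returns, so pending() may briefly report a future that has in fact completed. That is the same "advisory only" caveat as Queue.qsize().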
On 08/25/2014 07:51 PM, Dan O'Reilly wrote:
The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar) keyword to the Executor rather than a queue class, though.
Passing the class is the better choice -- it means that future needs can be more easily met by designing the queue variant needed and passing it in -- having a keyword to select only one option is unnecessarily limiting. -- ~Ethan~
In that case, what's the best way to disallow use of APIs that require an introspectable queue implementation? Using isinstance(self._work_queue, IntrospectableQueue) would work, but seems nearly as limiting as using an introspectable keyword.

Perhaps IntrospectableQueue could support __iter__ as a way of iterating over a snapshot of enqueued items - the Executor could try iterating over the queue when it needs to inspect its contents, raising an appropriate exception (something like "Provided queue class must be introspectable") if that fails. If people would prefer __iter__ not be used for that purpose, we could do the same thing with whatever public method ends up being used to get the snapshot instead.

On Mon, Aug 25, 2014 at 11:02 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
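The __iter__ approach could be prototyped as below. All names are hypothetical, and the EAFP capability check works precisely because plain queue.Queue is deliberately not iterable.

```python
import queue

class IterableQueue(queue.Queue):
    """Sketch: expose a snapshot of enqueued items via __iter__."""

    def __iter__(self):
        # Copy under the queue's mutex, then iterate the copy, so the
        # iteration itself never races with producers or consumers.
        with self.mutex:
            return iter(list(self.queue))

def snapshot(work_queue):
    # The check described above: try to iterate the queue, and raise a
    # clearer error when the supplied queue class isn't introspectable.
    try:
        return list(work_queue)
    except TypeError:
        raise TypeError("Provided queue class must be introspectable") from None
```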
Le 25/08/2014 23:02, Ethan Furman a écrit :
On 08/25/2014 07:51 PM, Dan O'Reilly wrote:
The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar) keyword to the Executor rather than a queue class, though.
Passing the class is the better choice -- it means that future needs can be more easily met by designing the queue variant needed and passing it in -- having a keyword to select only one option is unnecessarily limiting.
What if an implementation wants to use something other than a queue? It seems you're breaking the abstraction here. Regards Antoine.
On Monday, August 25, 2014 8:44 PM, Antoine Pitrou <antoine@python.org> wrote:
Le 25/08/2014 23:02, Ethan Furman a écrit :
On 08/25/2014 07:51 PM, Dan O'Reilly wrote:
The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar) keyword to the Executor rather than a queue class, though.
Passing the class is the better choice -- it means that future needs can be more easily met by designing the queue variant needed and passing it in -- having a keyword to select only one option is unnecessarily limiting.
What if an implementation wants to use something other than a queue? It seems you're breaking the abstraction here.
A collection of threads and a shared queue is almost the definition of a thread pool. What else would you use? Also, this could make it a lot easier to create variations on ThreadPoolExecutor without subclassing or forking it. For example, if you want your tasks to run in priority order, just give it a priority queue keyed on task.priority. If you want a scheduled executor, just give it a priority queue whose get method blocks until the first task's task.timestamp or a new task is added ahead of the first. And so on. I'm not sure if that's a good idea or not, but it's an interesting possibility at least…
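Andrew's priority-order variation hinges on the replacement queue yielding bare items, so that a consumer loop written against queue.Queue (like the executor's worker loop) needs no changes. A sketch, with all names hypothetical; a real integration would also need a way to attach a priority to each submitted work item.

```python
import itertools
import queue

class PriorityWorkQueue(queue.PriorityQueue):
    """Stores (priority, sequence, item) tuples internally, but get()
    hands back the bare item, keeping the queue.Queue interface."""

    def __init__(self, maxsize=0):
        super().__init__(maxsize)
        # Monotonic tie-breaker: equal priorities dequeue in FIFO order,
        # and the items themselves are never compared.
        self._seq = itertools.count()

    def put(self, item, block=True, timeout=None):
        priority = getattr(item, 'priority', 0)
        super().put((priority, next(self._seq), item), block, timeout)

    def get(self, block=True, timeout=None):
        return super().get(block, timeout)[2]
```

Items without a priority attribute (including the None sentinels an executor uses to wake idle workers at shutdown) default to priority 0.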
Hm. Maybe we should not complicate the API after all. This seems a lot of theorizing without enough of a use case. -- --Guido van Rossum (python.org/~guido)
On 08/25/2014 09:08 PM, Guido van Rossum wrote:
Hm. Maybe we should not complicate the API after all. This seems a lot of theorizing without enough of a use case.
I went in search of docs to see what the API actually was, and while I know the source code is a great place to go look for education and finer points, should we have to go looking there just to see what the __init__ parameters are? I'm going to go out on a limb and say that ThreadPoolExecutor takes a max_workers param, but I only have that because it's in the example. On the up side, having a link to the source is really cool. Having clicked on that I now know that max_workers is the only param taken. ;) -- ~Ethan~
Le 26/08/2014 00:22, Ethan Furman a écrit :
I went in search of docs to see what the API actually was, and while I know the source code is a great place to go look for education and finer points, should we have to go looking there just to see what the __init__ parameters are?
So, you didn't find the docs? https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor """ class concurrent.futures.ThreadPoolExecutor(max_workers) An Executor subclass that uses a pool of at most max_workers threads to execute calls asynchronously. """ https://docs.python.org/3/library/concurrent.futures.html#processpoolexecuto... """ class concurrent.futures.ProcessPoolExecutor(max_workers=None) An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given, it will default to the number of processors on the machine. """ Regards Antoine.
On 08/25/2014 09:37 PM, Antoine Pitrou wrote:
Le 26/08/2014 00:22, Ethan Furman a écrit :
I went in search of docs to see what the API actually was, and while I know the source code is a great place to go look for education and finer points, should we have to go looking there just to see what the __init__ parameters are?
So, you didn't find the docs?
https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor
""" class concurrent.futures.ThreadPoolExecutor(max_workers)
An Executor subclass that uses a pool of at most max_workers threads to execute calls asynchronously. """
https://docs.python.org/3/library/concurrent.futures.html#processpoolexecuto...
""" class concurrent.futures.ProcessPoolExecutor(max_workers=None)
An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given, it will default to the number of processors on the machine. """
I did find the docs, and even with your plain text guide I almost didn't see them when I looked just now. Too much fancy going on there, and all the green examples -- yeah, it's hard to read.

For comparison, here's what help(ThreadPoolExecutor) shows:

    class ThreadPoolExecutor(concurrent.futures._base.Executor)
     |  Method resolution order:
     |      ThreadPoolExecutor
     |      concurrent.futures._base.Executor
     |      builtins.object
     |
     |  Methods defined here:
     |
     |  __init__(self, max_workers)
     |      Initializes a new ThreadPoolExecutor instance.
     |
     |      Args:
     |          max_workers: The maximum number of threads that can be used to
     |              execute the given calls.
     |
     |  shutdown(self, wait=True)
     |      Clean-up the resources associated with the Executor.
     |
     |      It is safe to call this method several times. Otherwise, no other
     |      methods can be called after this one.
     |
     |      Args:
     |          wait: If True then shutdown will not return until all running
     |              futures have finished executing and the resources used by the
     |              executor have been reclaimed.
     |
     |  submit(self, fn, *args, **kwargs)
     |      Submits a callable to be executed with the given arguments.
     |
     |      Schedules the callable to be executed as fn(*args, **kwargs) and returns
     |      a Future instance representing the execution of the callable.
     |
     |      Returns:
     |          A Future representing the given call.

Much easier to understand.

Looking at the docs again, I think the biggest hurdle to finding that line and recognizing it for what it is is the fact that it comes /after/ all the examples. That's backwards. Why would you need examples for something you haven't read yet?

--
~Ethan~
On 26 Aug 2014 16:12, "Ethan Furman" <ethan@stoneleaf.us> wrote:

Looking at the docs again, I think the biggest hurdle to finding that line and recognizing it for what it is is the fact that it comes /after/ all the examples. That's backwards. Why would you need examples for something you haven't read yet?

Many of our module docs serve a dual purpose as a tutorial *and* as an API reference. That's actually a problem, and often a sign of a separate "HOWTO" guide trying to get out. Actually doing the work to split them is rather tedious though, so it tends not to happen very often.

Cheers,
Nick.
On 08/25/2014 09:08 PM, Guido van Rossum wrote:
Hm. Maybe we should not complicate the API after all. This seems a lot of theorizing without enough of a use case.
Introspection (aka debugging) is an important use case. Having looked at the code, and with Antoine's comments in mind, I'd be happy with whatever Dan can get in there without changing the queuing implementation -- if anyone needs that much flexibility, they can take the Python code and massage it to their own desires. -- ~Ethan~
As promised, I've opened issue22281 (http://bugs.python.org/issue22281), and attached a patch that makes an attempt at implementing this. Let's continue any further discussion on this topic there. On Tue, Aug 26, 2014 at 12:27 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Le 25/08/2014 23:56, Andrew Barnert a écrit :
What if an implementation wants to use something other than a queue? It seems you're breaking the abstraction here.
A collection of threads and a shared queue is almost the definition of a thread pool. What else would you use?
Definitions don't necessarily have any relationship with the way a feature is implemented. Perhaps some version of concurrent.futures would like to use some advanced dispatch mechanism provided by the OS (or shared memory, or whatever).

(I'll note that such "flexibility" was chosen for the API of threading.Condition, and it is making it difficult to write an optimized implementation that would use OS-native facilities, such as pthread condition variables.)

We have come from a simple proposal to introspect some runtime properties of an executor to the idea of swapping out a building block with another. That doesn't sound reasonable.

Regards

Antoine.
On Monday, August 25, 2014 7:52 PM, Dan O'Reilly <oreilldf@gmail.com> wrote:
The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar) keyword to the Executor rather than a queue class, though. Adding support for identifying which tasks are active introduces some extra overhead, which I think can reasonably be made optional. If we're going to use a different Queue class to enable introspection, we might as well disable the other stuff that we're doing to make introspection work. It also makes it easier to raise an exception if an API is called that won't work without IntrospectableQueue being used.
Even though this was my suggestion, let me play devil's advocate for a second…

The main reason to use this is for debugging or exploratory programming. In the debugger, of course, it's not necessary, because you can just break and suspend all the threads while you do what you want.

Would it be reasonable to do the same thing outside the debugger, by providing a threading.Thread.suspend API (and of course the pool and executor APIs have a suspend method that suspends all their threads) so you can safely access the queue's internals? Obviously suspending threads in general is a bad thing to do unless you're a big fan of deadlocks, but for debugging and exploration it seems reasonable; if a program occasionally deadlocks or crashes while you're screwing with its threads to see what happens, well, you were screwing with its threads to see what happens…

That might be a horrible attractive nuisance, but if you required an extra flag to be passed in at construction time to make these methods available, and documented that it was unsafe and potentially inefficient, it might be acceptable.

On the other hand, it's hard to think of a case where this is a good answer but "just run it in the debugger" isn't a better answer…
Does Jython have to use a mutex and a deque instead of a more efficient (and possibly lock-free) queue from the Java stdlib?
For what it's worth, Jython just uses CPython's queue.Queue implementation, as far as I can tell.
Now that I think about it, that makes sense; if I really need a lock-free thread pool and queue in Jython, I'm probably going to use the native Java executors, not the Python ones, right?
What does multiprocessing.Queue do on each implementation?
In addition to a multiprocessing.Queue, the ProcessPoolExecutor maintains a dict of all submitted work items, so that can be used instead of trying to inspect the queue itself.
Interesting. This implies that supplying an inspectable queue class may not be the best answer here; instead, we could have an option for an inspectable work dict, which would just expose the existing one for ProcessPoolExecutor, while it would make ThreadPoolExecutor maintain an equivalent dict as a thread-local in the launching thread. (I'm assuming you only need to inspect the jobs from the launching process/thread here… I'm not sure if that's sufficient for the OP's intended use or not.)
participants (8)
- Andrew Barnert
- Antoine Pitrou
- Dan O'Reilly
- Ethan Furman
- Guido van Rossum
- Nick Coghlan
- Ram Rachum
- Ram Rachum