The multi-processing discussion reminded me that I have a few problems I run into every time I try to use Queue objects.

My first problem is finding it:

    Py> from threading import Queue # Nope
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ImportError: cannot import name Queue
    Py> from Queue import Queue # Ah, there it is

What do people think of the idea of adding an alias to Queue into the threading module so that:

a) the first line above works; and
b) Queue can be documented with all of the other threading primitives, rather than being off somewhere else in its own top-level section.

My second problem is with the current signatures of the put() and get() methods. Specifically, the following code blocks forever instead of raising an Empty exception after 500 milliseconds as one might expect:

    from Queue import Queue
    x = Queue()
    x.get(0.5)

I assume the current signature is there for backward compatibility with the original version that didn't support timeouts (considering the difficulty of telling the difference between "x.get(1)" and "True = 1; x.get(True)" from inside the get() method).

However, the need to write "x.get(True, 0.5)" seems seriously redundant, given that a single parameter can actually handle all the options (as is currently the case with Condition.wait()).

The "put_nowait" and "get_nowait" functions are fine, because they serve a useful documentation purpose at the calling point (particularly given the current clumsy timeout signature).

What do people think of the idea of adding "put_wait" and "get_wait" methods with the signatures:

    put_wait(item, [timeout=None])
    get_wait([timeout=None])

Optionally, the existing "put" and "get" methods could be deprecated, with the goal of eventually changing their signature to match the put_wait and get_wait methods above.

If people are amenable to these ideas, I should be able to work up a patch for them this week.

Regards,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
On 10/11/05, Nick Coghlan <ncoghlan@iinet.net.au> wrote:
The multi-processing discussion reminded me that I have a few problems I run into every time I try to use Queue objects.
My first problem is finding it:
    Py> from threading import Queue # Nope
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ImportError: cannot import name Queue
    Py> from Queue import Queue # Ah, there it is
I don't think that's a reason to move it.
    from sys import Queue
    ImportError: cannot import name Queue
    from os import Queue
    ImportError: cannot import name Queue
    # Well where the heck is it?!
What do people think of the idea of adding an alias to Queue into the threading module so that: a) the first line above works; and
I see no need. Code that *doesn't* need Queue but does use threading shouldn't have to pay for loading Queue.py.
b) Queue can be documented with all of the other threading primitives, rather than being off somewhere else in its own top-level section.
Do top-level sections have to limit themselves to a single module? Even if they do, I think it's fine to plant a prominent link to the Queue module. You can't really expect people to learn how to use threads wisely from reading the library reference anyway.
My second problem is with the current signatures of the put() and get() methods. Specifically, the following code blocks forever instead of raising an Empty exception after 500 milliseconds as one might expect:

    from Queue import Queue
    x = Queue()
    x.get(0.5)
I'm not sure if I have much sympathy with a bug due to refusing to read the docs... :)
I assume the current signature is there for backward compatibility with the original version that didn't support timeouts (considering the difficulty of telling the difference between "x.get(1)" and "True = 1; x.get(True)" from inside the get() method)
Huh? What a bizarre idea. Why would you do that? I guess I don't understand where you're coming from.
However, the need to write "x.get(True, 0.5)" seems seriously redundant, given that a single parameter can actually handle all the options (as is currently the case with Condition.wait()).
So write x.get(timeout=0.5). That's clear and unambiguous.
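A minimal sketch of the difference, written against Python 3, where the module is named queue rather than Queue (an assumption about the interpreter; the thread itself concerns Python 2). The first positional parameter of get() is block, so x.get(0.5) passes a truthy block value and no timeout, while the keyword form raises Empty after roughly half a second.

```python
import inspect
import queue  # named "Queue" in Python 2
import time

# The first parameter after self is "block", which is why x.get(0.5)
# is read as block=0.5 (truthy) with timeout=None.
first_positional = list(inspect.signature(queue.Queue.get).parameters)[1]

x = queue.Queue()
start = time.monotonic()
try:
    x.get(timeout=0.5)  # keyword form: wait at most 0.5 seconds
    timed_out = False
except queue.Empty:
    timed_out = True
elapsed = time.monotonic() - start
```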
The "put_nowait" and "get_nowait" functions are fine, because they serve a useful documentation purpose at the calling point (particularly given the current clumsy timeout signature).
What do people think of the idea of adding "put_wait" and "get_wait" methods with the signatures:

    put_wait(item, [timeout=None])
    get_wait([timeout=None])
-1. I'd rather not tweak the current Queue module at all until Python 3000. Then we could force people to use keyword args.
Optionally, the existing "put" and "get" methods could be deprecated, with the goal of eventually changing their signature to match the put_wait and get_wait methods above.
Apart from trying to guess the API without reading the docs (:-), what are the use cases for using put/get with a timeout? I have a feeling it's not that common. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum <guido@python.org> wrote:
Optionally, the existing "put" and "get" methods could be deprecated, with the goal of eventually changing their signature to match the put_wait and get_wait methods above.
Apart from trying to guess the API without reading the docs (:-), what are the use cases for using put/get with a timeout? I have a feeling it's not that common.
With timeout=0, a shared connection/resource pool (perhaps DB, etc., I use one in the tuple space implementation I have for connections to the tuple space). Note that technically speaking, Queue.Queue from Pythons prior to 2.4 is broken: get_nowait() may not get an object even if the Queue is full, this is caused by "elif not self.esema.acquire(0):" being called for non-blocking requests. Tim did more than simplify the structure by rewriting it, he fixed this bug. With block=True, timeout=None, worker threads pulling from a work-to-do queue, and even a thread which handles the output of those threads via a result queue. - Josiah
[Guido]
Apart from trying to guess the API without reading the docs (:-), what are the use cases for using put/get with a timeout? I have a feeling it's not that common.
[Josiah Carlson]
With timeout=0, a shared connection/resource pool (perhaps DB, etc., I use one in the tuple space implementation I have for connections to the tuple space).
Passing timeout=0 is goofy: use {get,put}_nowait() instead. There's no difference in semantics.
Note that technically speaking, Queue.Queue from Pythons prior to 2.4 is broken: get_nowait() may not get an object even if the Queue is full, this is caused by "elif not self.esema.acquire(0):" being called for non-blocking requests. Tim did more than simplify the structure by rewriting it, he fixed this bug.
I don't agree it was a bug, but I did get fatally weary of arguing with people who insisted it was ;-) It's certainly easier to explain (and the code is easier to read) now.
With block=True, timeout=None, worker threads pulling from a work-to-do queue, and even a thread which handles the output of those threads via a result queue.
Guido understands use cases for blocking and non-blocking put/get, and Queue always supported those possibilities. The timeout argument got added later, and it's not really clear _why_ it was added. timeout=0 isn't a sane use case (because the same effect can be gotten with non-blocking put/get).
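For illustration, the resource-pool case can be sketched with get_nowait() alone, as suggested above. This is a rough sketch in Python 3 spelling (module queue); make_new_connection, acquire and release are hypothetical names, and the "connection" is just a placeholder object.

```python
import queue  # named "Queue" in Python 2

pool = queue.Queue()


def make_new_connection():
    # Hypothetical stand-in for opening a real connection.
    return object()


def acquire():
    """Reuse a pooled connection if one is available, else create one."""
    try:
        return pool.get_nowait()  # same effect as get(block=False)
    except queue.Empty:
        return make_new_connection()


def release(connection):
    """Return a connection to the pool without blocking."""
    pool.put_nowait(connection)


first = acquire()    # pool is empty, so a new connection is made
release(first)
second = acquire()   # the pooled connection is handed back
```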
On 10/11/05, Tim Peters <tim.peters@gmail.com> wrote:
Guido understands use cases for blocking and non-blocking put/get, and Queue always supported those possibilities. The timeout argument got added later, and it's not really clear _why_ it was added. timeout=0 isn't a sane use case (because the same effect can be gotten with non-blocking put/get).
In the socket world, a similar bifurcation of the API has happened (also under my supervision, even though the idea and prototype code were contributed by others). The API there is very different because the blocking or timeout is an attribute of the socket, not passed in to every call. But one lesson we can learn from sockets (or perhaps the reason why people kept asking for timeout=0 to be "fixed" :) is that timeout=0 is just a different way to spell blocking=False. The socket module makes sure that the socket ends up in exactly the same state no matter which API is used; and in fact the setblocking() API is redundant. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
Apart from trying to guess the API without reading the docs (:-), what are the use cases for using put/get with a timeout? I have a feeling it's not that common.
[Josiah Carlson]
With timeout=0, a shared connection/resource pool (perhaps DB, etc., I use one in the tuple space implementation I have for connections to the tuple space).
[Tim Peters]
Passing timeout=0 is goofy: use {get,put}_nowait() instead. There's no difference in semantics.
I understand this, as do many others who use it. However, having both manually and automatically tuned timeouts myself in certain applications, the timeout=0 case is useful. Uncommon? Likely, I've not yet seen any examples of anyone using this particular timeout method at koders.com .
Note that technically speaking, Queue.Queue from Pythons prior to 2.4 is broken: get_nowait() may not get an object even if the Queue is full, this is caused by "elif not self.esema.acquire(0):" being called for non-blocking requests. Tim did more than simplify the structure by rewriting it, he fixed this bug.
I don't agree it was a bug, but I did get fatally weary of arguing with people who insisted it was ;-) It's certainly easier to explain (and the code is easier to read) now.
When getting an object from a non-empty queue fails because some other thread already had the lock, and it is a fair assumption that the other thread will release the lock within the next context switch...

Because I still develop on Python 2.3 (I need to support a commercial codebase made with 2.3), I was working around it by using the timeout parameter:

    try:
        connection = connection_queue.get(timeout=.000001)
    except Queue.Empty:
        connection = make_new_connection()

With only get_nowait() calls, by the time I hit 3-4 threads, it was failing to pick up connections even when there were hundreds in the queue, and I quickly ran into the file handle limit for my platform, not to mention that the server I was connecting to used asynchronous sockets and select, which died at the 513th incoming socket.

I have since copied the implementation of 2.4's queue into certain portions of code which make use of get_nowait() and its variants (handling the deque reference as necessary).

Any time one needs to work around a "not buggy feature" with some claimed "unnecessary feature", it tends to smell less than pristine to my nose.
With block=True, timeout=None, worker threads pulling from a work-to-do queue, and even a thread which handles the output of those threads via a result queue.
Guido understands use cases for blocking and non-blocking put/get, and Queue always supported those possibilities. The timeout argument got added later, and it's not really clear _why_ it was added. timeout=0 isn't a sane use case (because the same effect can be gotten with non-blocking put/get).
    def t():
        try:
            #thread state setup...
            while not QUIT:
                try:
                    work = q.get(timeout=5)
                except Queue.Empty:
                    continue
                #handle work
        finally:
            #thread state cleanup...

Could the above be daemonized? Certainly, but then the thread state wouldn't be cleaned up. If you can provide me with a way of doing the above with equivalent behavior, using only get_nowait() and get(), then put it in the documentation. If not, then I'd say that the timeout argument is a necessarily useful feature.

[Guido]
But one lesson we can learn from sockets (or perhaps the reason why people kept asking for timeout=0 to be "fixed" :) is that timeout=0 is just a different way to spell blocking=False. The socket module makes sure that the socket ends up in exactly the same state no matter which API is used; and in fact the setblocking() API is redundant.
This would suggest to me that at least for sockets, setblocking() could be deprecated, as could the block parameter in Queue. I wouldn't vote for either deprecation, but it would seem to make more sense than to remove the timeout arguments from both. - Josiah
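A runnable sketch of the timeout-polling worker described in this message, in Python 3 spelling. The names quit_event, handled and cleaned_up are assumptions made for the example: a threading.Event stands in for the QUIT flag, and the timeout is shortened so the example finishes quickly.

```python
import queue  # named "Queue" in Python 2
import threading

work_queue = queue.Queue()
quit_event = threading.Event()  # plays the role of the QUIT flag
handled = []
cleaned_up = []


def worker():
    try:
        # Wake up periodically so the quit flag is noticed.
        while not quit_event.is_set():
            try:
                item = work_queue.get(timeout=0.05)
            except queue.Empty:
                continue
            handled.append(item)
            work_queue.task_done()
    finally:
        cleaned_up.append(True)  # thread state cleanup always runs


thread = threading.Thread(target=worker)
thread.start()
for n in range(3):
    work_queue.put(n)
work_queue.join()   # wait until every item has been handled
quit_event.set()    # ask the worker to stop
thread.join()
```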
Guido van Rossum wrote:
I see no need. Code that *doesn't* need Queue but does use threading shouldn't have to pay for loading Queue.py.
However, it does seem awkward to have a whole module providing just one small class that logically is so closely related to other threading facilities. What we want in this kind of situation is some sort of autoloading mechanism, so you can import something from a module and have it trigger the loading of another module behind the scenes to provide it. Another place I'd like this is in my PyGUI library, where I want all the commonly-used class names to appear in the top-level package, but ideally not import the code to implement them until they're actually used. There are various ways of hacking up such functionality today, but it would be nice if there were some kind of language or library support for it. Maybe something like a descriptor mechanism for lookups in module namespaces. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+
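Python versions released well after this thread (3.7 and later) support a module-level __getattr__ hook (PEP 562), which is close to the autoloading idea sketched above. The following is only an illustration under that assumption: lazy_threading and the _LAZY table are hypothetical names, and the module is built in memory so the example is self-contained.

```python
import importlib
import types

# Attribute name -> module that really provides it (hypothetical mapping).
_LAZY = {"Queue": "queue"}

lazy_threading = types.ModuleType("lazy_threading")


def _lazy_getattr(name):
    """Called only when normal lookup in the module namespace fails."""
    if name in _LAZY:
        provider = importlib.import_module(_LAZY[name])  # import on demand
        value = getattr(provider, name)
        setattr(lazy_threading, name, value)  # cache for later lookups
        return value
    raise AttributeError("module 'lazy_threading' has no attribute %r" % name)


# A __getattr__ entry in the module namespace is honored by module lookup.
lazy_threading.__getattr__ = _lazy_getattr

QueueClass = lazy_threading.Queue  # triggers the lookup of queue.Queue
```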
On Wed, Oct 12, 2005, Greg Ewing wrote:
Guido van Rossum wrote:
I see no need. Code that *doesn't* need Queue but does use threading shouldn't have to pay for loading Queue.py.
I'd argue that such code is rare enough (given the current emphasis on Queue) that the performance issue doesn't matter.
However, it does seem awkward to have a whole module providing just one small class that logically is so closely related to other threading facilities.
The problem is that historically Queue did not use ``threading``; it was built directly on top of ``thread``, and people were told to use Queue regardless of whether they were using ``thread`` or ``threading``. Obviously, there is no use case for putting Queue into ``thread``, so off it went into its own module. At this point, my opinion is that we should leave reorganizing the thread stuff until Python 3.0. (Python 3.0 should "deprecate" ``thread`` by renaming it to ``_thread``). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair
On 10/12/05, Aahz <aahz@pythoncraft.com> wrote:
(Python 3.0 should "deprecate" ``thread`` by renaming it to ``_thread``).
+1. (We could even start doing this before 3.0.) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Apart from trying to guess the API without reading the docs (:-), what are the use cases for using put/get with a timeout? I have a feeling it's not that common.
Actually, I think wanting to use a timeout is an artifact of a history of dealing with too many C libraries which don't provide a proper event-based or select-style interface (which means the calls have to time out periodically in order to respond gracefully to program shutdown requests). However, because Queues are multi-producer, that isn't a problem - I just have to remember to push the shutdown request in through the Queue. Basically, I'd fallen into the "trying-to-write-C-in-Python" trap and I simply didn't notice until I read the responses in this thread :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
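The approach of pushing the shutdown request in through the Queue can be sketched as follows, in Python 3 spelling. SHUTDOWN is a hypothetical sentinel object chosen for this example; the worker blocks in get() with no timeout and still cleans up on exit.

```python
import queue  # named "Queue" in Python 2
import threading

SHUTDOWN = object()  # hypothetical sentinel meaning "please exit"
work_queue = queue.Queue()
handled = []
cleaned_up = []


def worker():
    try:
        while True:
            item = work_queue.get()  # plain blocking get, no timeout
            if item is SHUTDOWN:
                break
            handled.append(item)
    finally:
        cleaned_up.append(True)  # thread state cleanup


thread = threading.Thread(target=worker)
thread.start()
for n in range(3):
    work_queue.put(n)
work_queue.put(SHUTDOWN)  # any producer may request shutdown
thread.join()
```

With several workers sharing one queue, one sentinel per worker would be needed under this scheme.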
I'd just like to point out that Queue is not quite as useful as people seem to think in this thread. The main problem is that I can't integrate Queue into a select/poll based main loop. The other day I wanted to extend a Python main loop, which uses poll(), to be thread safe, so I could queue idle functions from separate threads. Obviously Queue doesn't work (no file descriptor to poll), so I just ended up creating a pipe, to which I send a single byte when I want to "wake up" the main loop to make it realize changes in its configuration, such as a new callback added. I guess this is partly a Unix problem. There's no system call to say something like "wake me up when one of these descriptors has data OR when this condition variable is set". Windows has WaitForMultipleObjects, which I suspect is quite a bit more powerful.

Regards.
-- Gustavo J. A. M. Carneiro <gjc@inescporto.pt> <gustavo@users.sourceforge.net> The universe is always one step beyond logic.
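The pipe wake-up trick described in this message might look roughly like the sketch below (Python 3 spelling, Unix-like platforms assumed, since select() on pipes is not available on Windows). The names callbacks, call_from_thread and results are hypothetical, and select.select stands in for the poll() loop.

```python
import os
import queue  # named "Queue" in Python 2
import select
import threading

callbacks = queue.Queue()     # functions queued by other threads
wake_r, wake_w = os.pipe()    # pipe used only to wake the main loop
results = []


def call_from_thread(func):
    """Queue func for the main loop, then wake the loop with one byte."""
    callbacks.put(func)
    os.write(wake_w, b"x")


helper = threading.Thread(
    target=call_from_thread, args=(lambda: results.append("ran"),)
)
helper.start()

# One iteration of a select-based main loop, with a 5 second safety timeout.
readable, _, _ = select.select([wake_r], [], [], 5.0)
if wake_r in readable:
    os.read(wake_r, 1)         # consume the wake-up byte
    callbacks.get_nowait()()   # run the callback that was queued

helper.join()
os.close(wake_r)
os.close(wake_w)
```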
On 10/13/05, Gustavo J. A. M. Carneiro <gjc@inescporto.pt> wrote:
I'd just like to point out that Queue is not quite as useful as people seem to think in this thread. The main problem is that I can't integrate Queue into a select/poll based main loop.
Well, you're mixing two incompatible paradigms there, so that's to be expected, right? Either you're using async I/O or you're using threads. Mixing the two causes confusion and bugs no matter what you try. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Participants:
- Aahz
- Greg Ewing
- Guido van Rossum
- Gustavo J. A. M. Carneiro
- Josiah Carlson
- Nick Coghlan
- Tim Peters