[Python-Dev] collections module

Tim Peters tim.one at comcast.net
Sat Jan 10 00:06:14 EST 2004


[Tim]
>> Maybe; it probably doesn't matter; serious threaded apps pass
>> maxsize to Queue, generally equal to the # of worker threads (or a
>> small constant multiple thereof), and the bounded queue does
>> double-duty then as a flow-control throttle too; list.pop(0) isn't
>> really slow when len(list) is that small.

[Josiah Carlson]
> It really all depends on how fast the producers and consumers are.

Mediating differences in speed is what "flow-control throttle" means.  A
"serious threaded app" can't allow speed differences to lead to unbounded
resource consumption, and a bounded queue is a wonderful solution to that.
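For illustration, a bounded Queue used as a flow-control throttle might
be set up like so (a generic sketch, not from the thread; the names are
made up):

    import threading
    import Queue      # the module under discussion ('queue' in Python 3)

    NUM_WORKERS = 4

    # Bounded to a small multiple of the worker count: when consumers
    # fall behind, producers block in put(), so resource consumption
    # stays bounded no matter how fast the producers are.
    jobs = Queue.Queue(maxsize=2 * NUM_WORKERS)

    def worker():
        while True:
            job = jobs.get()   # blocks until work is available
            job()              # process it

    for i in range(NUM_WORKERS):
        t = threading.Thread(target=worker)
        t.setDaemon(True)
        t.start()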

> In the cases where I've used Queue.Queue, I didn't have small Queues.

That doesn't demonstrate you had a sane design <wink>.

>>> and a single conditional,

>> Sorry, I don't know what that might mean.

> oops, I misspelled it, threading.Condition()

Ah, got it.

> ...
> The below is quite fast for most reasonable numbers of threads - it
> does wake all threads up that were waiting, but I find it considerably
> clearer, and can be done with a single synchronization primitive.

I think you mean a single synchronization instance variable.  As I noted
last time, Queue was written before Condition variables -- or even the
threading module -- existed.  (Note that it still imports the ancient
low-level 'thread' module instead.)  There's no question about condvars
being clearer than raw locks for anything fancier than a plain critical
section, and I'd be in favor of rewriting Queue.py to use condvars, provided
nothing important is lost.

> from threading import Condition
> #assuming fifo exists and is fast
>
> class CQueue:
>     def __init__(self, maxsize):
>         if maxsize > 0:
>             self._m = maxsize
>         else:
>             #big enough, but Max would be better
>             self._m = 2**64

The way the code uses it, -1 would work fine -- even on 128-bit machines
<wink>.

>         self._q = fifo()
>         self._l = Condition()
>     def qsize(self):
>         self._l.acquire()
>         l = len(self._q)
>         self._l.release()
>         return l
>     def empty(self):
>         return self.qsize() == 0
>     def full(self):
>         return self.qsize() == self._m
>     def put(self, item, block=1):
>         try:
>             self._l.acquire()

That line should go before the "try:" (if the acquire() attempt raises an
exception for some reason, the "finally:" block will try to release an
unacquired condvar, raising a new exception, and so masking the real
problem).

>             while block and len(self._q) == self._m:
>                 self._l.wait()
>                 self._l.release()
>                 self._l.acquire()

Note that the last two lines there aren't necessary.  Or, if they are, this
is much more obscure than Queue.py <wink>.  wait() releases the associated
lock as part of waiting, and reacquires it as part of waking up.
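For reference, the canonical condvar idiom needs no manual
release/acquire around wait() at all (a generic sketch, not anyone's
posted code):

    import threading

    cond = threading.Condition()
    items = []

    def consumer():
        cond.acquire()            # acquire *before* any try/finally
        try:
            while not items:      # always re-test the condition
                cond.wait()       # releases cond here; reacquired on wakeup
            items.pop(0)
        finally:
            cond.release()

    def producer():
        cond.acquire()
        try:
            items.append("work")
            cond.notify()         # wake (at least) one waiter
        finally:
            cond.release()

    t = threading.Thread(target=consumer)
    t.start()
    producer()
    t.join()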

>             if len(self._q) < self._m:
>                 self._q.put(item)
>                 self._l.notifyAll()

It's not clear why this isn't plain notify().  Note that if self._q.put()
raises an exception (for example, MemoryError if the box is plain out of
memory), no notify is done, and so other threads blocked in put() and/or
get() that *could* make progress aren't woken up.  Queue.py endures
excruciating pain to try to prevent endcase glitches like that.

>             else:
>                 raise Full
>         finally:
>             self._l.release()

> ... [similar stuff for get()] ...

Part of the simplification here is due to the use of a condvar.  A great
deal more comes from simply ignoring some of the bells and whistles
Queue.py has sprouted:  a defined protocol for subclassing so that "the real
queue" can have any concrete representation and semantics (e.g., it's easy
to make a stack-like Queue subclass now, or a priority-queue-like Queue
subclass) without the subclasses needing to know anything about the thread
convolutions; optional timeouts on get and put; and anal attention paid to
the consequences of "unexpected" exceptions.
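For concreteness, here's the posted put() with the fixes above folded
in (a reader's reconstruction, assuming collections.deque as the fast
fifo; it keeps notifyAll(), since putters and getters share one condvar
and a targeted notify() could hand the wakeup to a thread that can't
use it; the bells and whistles are still missing):

    import threading
    from collections import deque

    class Full(Exception):
        pass

    class CQueue:
        def __init__(self, maxsize):
            if maxsize > 0:
                self._m = maxsize
            else:
                self._m = -1             # no real length ever equals -1
            self._q = deque()
            self._l = threading.Condition()

        def put(self, item, block=True):
            self._l.acquire()            # before the try, so a failed
            try:                         # acquire can't mask itself
                while block and len(self._q) == self._m:
                    self._l.wait()       # wait() releases and reacquires
                if len(self._q) != self._m:
                    self._q.append(item)
                    # everyone shares one condvar, so wake them all; a
                    # MemoryError in append() still skips the notify,
                    # which is the endcase Queue.py's contortions guard
                    self._l.notifyAll()
                else:
                    raise Full
            finally:
                self._l.release()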

A possible advantage of the condvar logic is that it doesn't raise Full
unless the queue is actually full, or Empty unless the queue is actually
empty.  The current code can raise those now for another reason, and users
have a hard time understanding this:  self.mutex just doesn't happen to be
instantly available.  OTOH, that's also a feature, taking "don't block" very
seriously, not even waiting for a queue operation that happens to take a
long time.  For example, one thread gets into self._q.put(item) and loses
its timeslice.  Some other thread tries to do put(item, block=False).  Under
Queue.py today, the latter thread raises Full at once, because it can't
instantly acquire the mutex.  Under the condvar scheme, it does block, and
*may* block for a long time (the former thread has to get enough timeslices
to finish its _q.put(item), get through its self._l.release(), and the
latter thread may not acquire the condvar even then (any number of other
threads may get it first)).  The tradeoffs here are murky, but the condvar
scheme would be a change in actual behavior in such cases, so they can't be
ignored.
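Concretely, the two meanings of "don't block" differ roughly like this
(a sketch of the semantics only, not of either module's actual code):

    import threading

    class Full(Exception):
        pass

    # Queue.py-style: "don't block" means "don't wait for anything",
    # not even the mutex guarding the queue.
    def put_nowait_lock_style(mutex, q, item, maxsize):
        if not mutex.acquire(False):   # mutex busy right now?
            raise Full                 # give up, even if q isn't full
        try:
            if len(q) == maxsize:
                raise Full
            q.append(item)
        finally:
            mutex.release()

    # Condvar-style: wait (maybe a long time) for the condvar's lock,
    # and raise Full only if the queue really is full.
    def put_nowait_condvar_style(cond, q, item, maxsize):
        cond.acquire()                 # may block behind a slow put
        try:
            if len(q) == maxsize:
                raise Full
            q.append(item)
            cond.notifyAll()
        finally:
            cond.release()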

> ...
> I agree completely, most locking mechanisms are slow as hell.  There
> are some tricks that can be done with critical sections (or its
> equivalent on multiple platforms); an article was written about
> multithreaded locking performance on IBM at least a year ago.  I
> wonder if any of those performance-tuning semantics can be used with
> the underlying locks in Python.

[and later]
> One thing that bothered me about using the three mutexes as given
> in Queue.Queue is that it relies on the fact that a thread can
> unlock an underlying thread.lock that it didn't acquire.  To me,
> this is strange behavior.

All those are related.  If you port Python to a new thread implementation,
there's only one synchronization object you have to implement in C:
thread.lock.  While it may seem strange to you, it's a "universal"
primitive, meaning that all other synch primitives can be built on top of it
(and, in Python, they are).  When Guido was designing Python, "the Python
lock" was the same synch primitive used in the OS he was working on at the
time, and it didn't seem strange to me either because it was also the only
synch primitive we built into hardware at Kendall Square Research at the
time (on that machine, Python's lock.acquire() and lock.release() were each
one machine instruction!).  So I didn't gripe either at the time, especially
seeing as KSR machines were clearly destined to take over the world <wink>.
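To make "universal" concrete, here's a bare-bones condition variable
built from nothing but raw locks, in the same spirit as threading.py's
real implementation (an illustrative sketch, not threading.py's code).
Note that notify() releases a lock acquired by a *different* thread --
exactly the behavior that seemed strange above:

    import _thread   # just 'thread' at the time of this post

    class MiniCondition:
        def __init__(self):
            self._lock = _thread.allocate_lock()
            self._waiters = []

        def acquire(self):
            self._lock.acquire()

        def release(self):
            self._lock.release()

        def wait(self):
            # Each waiter sleeps on a private lock acquired twice: the
            # second acquire blocks until another thread releases it.
            waiter = _thread.allocate_lock()
            waiter.acquire()
            self._waiters.append(waiter)
            self._lock.release()      # let others run while we sleep
            waiter.acquire()          # blocks until notified
            self._lock.acquire()      # reacquire before returning

        def notify(self):
            # Releasing a lock this thread never acquired -- the
            # "strange" but essential primitive semantics at work.
            if self._waiters:
                self._waiters.pop(0).release()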

Note that the various synch objects supplied by threading.py (like RLocks
and Conditions) are supplied via factory functions, not directly by Python
classes.  This was deliberate, so that someone motivated enough *could*
replace them by faster platform-specific implementations, and not have to
worry about breaking user-defined subclasses.  For example, RLocks are
supplied natively by Windows, so a much faster RLock gimmick *could* be
implemented specific to Windows.
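In other words, something like the following would be possible without
touching user code (a hypothetical sketch; the "fast" implementation is
stubbed out):

    import threading

    _portable_rlock = threading.RLock   # keep the original factory

    def _FastRLock():
        # Hypothetical: return a wrapper over a native platform lock
        # here; stubbed with the portable implementation.
        return _portable_rlock()

    # Rebinding the factory can't break user code: nobody subclasses
    # what threading.RLock() returns, they just call the factory.
    threading.RLock = _FastRLock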

> Oh, by adding a second Condition(), you don't need to notify everyone
> and can replace the 'while' uses with 'if'.  Even faster.

You can't replace 'while' with 'if' then (think about it, and note that the
docs explicitly warn that .notify() *may* wake more than one thread), and
then you're well on the way to reinventing esema and fsema, with the current
self.mutex being the lock shared by your two condvars.  They'd be clearer as
condvars, though, and it's always a good idea to name a "condition variable"
after the *condition* it's modelling (like not_full or not_empty).
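For the record, that two-condvar shape, with each condvar named after
the condition it models, looks roughly like this (a sketch assuming a
fast fifo such as collections.deque; not anyone's posted code):

    import threading
    from collections import deque

    class BoundedQueue:
        def __init__(self, maxsize):
            self.mutex = threading.Lock()
            # two condvars sharing one mutex, each named for the
            # condition it models
            self.not_full = threading.Condition(self.mutex)
            self.not_empty = threading.Condition(self.mutex)
            self.queue = deque()
            self.maxsize = maxsize

        def put(self, item):
            self.not_full.acquire()
            try:
                while len(self.queue) == self.maxsize:  # still 'while'!
                    self.not_full.wait()
                self.queue.append(item)
                # plain notify() is safe now: only getters wait on
                # not_empty, so the wakeup can't be wasted on a putter
                self.not_empty.notify()
            finally:
                self.not_full.release()

        def get(self):
            self.not_empty.acquire()
            try:
                while not self.queue:
                    self.not_empty.wait()
                item = self.queue.popleft()
                self.not_full.notify()
                return item
            finally:
                self.not_empty.release()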



