[Python-ideas] Thread-safe generators

Nick Coghlan ncoghlan at gmail.com
Mon Apr 17 01:08:22 EDT 2017


On 17 April 2017 at 08:00, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15 April 2017 at 10:45, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> So I'd be opposed to trying to make generator objects natively thread
>> aware - as Stephen notes, the GIL is an implementation detail of
>> CPython, so it isn't OK to rely on it when defining changes to
>> language level semantics (in this case, whether or not it's OK to have
>> multiple threads all calling the same generator without some form of
>> external locking).
>>
>> However, it may make sense to explore possible options for offering a
>> queue.AutoQueue type, where the queue always has a defined maximum
>> size (defaulting to 1), disallows explicit calls to put(), and
>> automatically populates itself based on an iterator supplied to the
>> constructor. Once the input iterator raises StopIteration, the
>> queue will start reporting itself as empty.
>
> +1 A generator that can have values pulled from it on different
> threads sounds like a queue to me, so the AutoQueue class that wraps a
> generator seems like a natural abstraction to work with. It also means
> that the cost for thread safety is only paid by those applications
> that need it.
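As a rough illustration only (the name and exact semantics here are purely hypothetical, not an agreed design), such an AutoQueue could be sketched as a wrapper that lazily pulls the next value from the supplied iterator under a lock, so multiple consumer threads can safely drain a single generator:

```python
import queue
import threading

class AutoQueue:
    """Hypothetical sketch of the proposed queue.AutoQueue.

    Consumers call get(); the queue borrows the caller's execution
    time to pull the next value from the iterator under a lock.
    Once the iterator is exhausted, the queue reports itself empty.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()
        self._exhausted = False

    def get(self):
        with self._lock:
            if self._exhausted:
                raise queue.Empty
            try:
                return next(self._it)
            except StopIteration:
                self._exhausted = True
                raise queue.Empty from None

    def empty(self):
        return self._exhausted
```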

If someone did build something like this, it would be interesting to
benchmark it against a more traditional producer thread model, where
one thread is responsible for adding work items to the queue, while
others are responsible for draining them.

The trick is that an auto-queue would borrow execution time from the
consumer threads when new values are needed, so you'd theoretically
get fewer context switches between threads, but at the cost of
changing the nature of the workload in a given thread, and hence
messing with the working set of objects it has active.

It may also pair well with the concurrent.futures.Executor model,
which is already good for "go handle this predefined list of tasks",
but currently less useful as a replacement for a message queue with a
pool of workers.
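For reference, the "predefined list of tasks" case that the Executor model already handles well is just a few lines (the squaring function here is a stand-in for real work):

```python
from concurrent.futures import ThreadPoolExecutor

def work(item):
    # Stand-in for a real task
    return item * item

# Executor shines when the full task list is known up front:
# map() distributes the items across the pool and preserves order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(10)))
```

The awkward part is the other direction: feeding an Executor from an open-ended stream of incoming work rather than a sequence known in advance.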

Setting the latter up yourself is currently still a bit tedious, since:

1. we don't have a standard threading Pool abstraction in the standard
library, just the one tucked away as part of multiprocessing
2. while queue.Queue has native support for worker pools, we don't
provide a pre-assembled version that makes it easy to say "here is the
producer, here are the consumers, wire them together for me"
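By way of illustration, wiring that pattern up by hand today looks roughly like this (sentinel-based shutdown is just one of several conventions you have to pick and implement yourself):

```python
import queue
import threading

def producer(source, q, n_consumers):
    # One dedicated thread feeds the queue, then signals shutdown
    # by sending one sentinel per consumer.
    for item in source:
        q.put(item)
    for _ in range(n_consumers):
        q.put(None)

def consumer(q, results):
    # Each worker drains the queue until it sees the sentinel.
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * item)  # stand-in for real work

q = queue.Queue()
results = []
n_workers = 4
threads = [threading.Thread(target=producer, args=(range(20), q, n_workers))]
threads += [threading.Thread(target=consumer, args=(q, results))
            for _ in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

None of this is hard, but every project ends up re-deriving the same sentinel handling, thread bookkeeping, and shutdown logic.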

There are good reasons for that (mainly that it's hard to come up with
an abstraction that's useful in its own right without becoming so
complex that you're on the verge of reinventing a task manager like
celery or a distributed computation manager like dask), but at the
same time, the notion of "input queue, worker pool, output queue" is
one that comes up a *lot* across different concurrency models, so
there's potential value in providing a low-barrier-to-entry
introduction to that idiom as part of the standard library.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
