[Python-ideas] Proposal: A simple protocol for generator tasks

Calvin Spealman ironfroggy at gmail.com
Mon Oct 15 12:48:20 CEST 2012


On Sun, Oct 14, 2012 at 11:36 PM, Piet Delport <pjdelport at gmail.com> wrote:
> [This is a lengthy mail; I apologize in advance!]

This is what I get for deciding to check up on these threads at 6AM
after a late night.

> Hi,
>
> I've been following this discussion with great interest, and would like
> to put forward a suggestion that might simplify some of the questions
> that are up in the air.
>
> There are several key points being considered: what exactly constitutes a
> "coroutine" or "tasklet", what the precise semantics of "yield" and
> "yield from" should be, how the stdlib can support different event loops
> and reactors, and how exactly Futures, Deferreds, and other APIs fit
> into the whole picture.
>
> This mail is mostly about the first point: I think everyone agrees
> roughly what a coroutine-style generator is, but there's enough
> variation in how they are used, both historically and presently, that
> the concept isn't as precise as it should be. This makes them hard to
> think and reason about (failing the "BDFL gets headaches" test), and
> makes it harder to define the behavior of all the parts that they
> interact with, too.
>
> This is a sketch of an attempt to define what constitutes a
> generator-based task or coroutine more rigorously: I think that the
> essential behavior can be captured in a small protocol, building on the
> generator and iterator protocols. If anyone else thinks this is a good
> idea, maybe something like this could work its way into a PEP?
>
> (For the sake of this mail, I will use the term "generator task" or
> "task" as a straw man term, but feel free to substitute "coroutine", or
> whatever the preferred name ends up being.)

I like that "task" is more general and avoids complaints from some that
these are not "real" coroutines.

> Definition
> ==========
>
> Very informally: A "generator task" is what you get if you take a normal
> Python function and replace its blocking calls with "yield from" calls
> to equivalent subtasks.

"yield" and "yield from", although I'm really disliking the second
being included
at all. More on this later.
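
For reference, I read the informal definition as meaning something like
this (sock_recv() is an imaginary subtask, purely for illustration):

    # Ordinary blocking function:
    def read_chunk(sock):
        return sock.recv(1024)

    # The same thing as a generator task: the blocking call is
    # replaced by delegation to an equivalent subtask.
    def read_chunk(sock):
        return (yield from sock_recv(sock, 1024))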

> More formally, a "generator task" is a generator that implements an
> incremental, multi-step computation, and is intended to be externally
> driven to completion by a runner, or "scheduler", until it delivers a
> final result.
>
> This driving process happens as follows:
>
> 1. A generator task is iterated by its scheduler to yield a series of
>    intermediate "step" values.
>
> 2. Each value yielded as a "step" represents a scheduling instruction,
>    or primitive, to be interpreted by the task's scheduler.
>
>    This scheduling instruction can be None ("just resume this task
>    later"), or a variety of other primitives, such as Futures ("resume
>    this task with the result of this Future"); see below for more.
>
> 3. The scheduler is responsible for interpreting each "step" instruction
>    as appropriate, and sending the instruction's result, if any, back to
>    the task using send() or throw().
>
>    A scheduler may run a single task to completion, or may multiplex
>    execution between many tasks: generator tasks should assume that
>    other tasks may have executed while the task was yielding.
>
> 4. The generator task completes by successfully returning (raising
>    StopIteration), or by raising an exception. The task's caller
>    receives this result.
>
> (For the sake of discussion, I use "the scheduler" to refer to whoever
> calls the generator task's next/send/throw methods, and "the task's
> caller" to refer to whoever receives the task's final result, but this
> is not important to the protocol: a task should not care who drives it
> or consumes its result, just like an iterator should not.)
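
If I'm reading steps 1-4 right, a trivial single-task runner would look
something like this (a sketch only; is_future() and wait_for() are
stand-ins for real machinery, not proposed APIs):

    def run_task(task):
        # Drive one generator task to completion, interpreting each
        # yielded instruction, and return the task's final result.
        resume, value = task.send, None
        while True:
            try:
                instruction = resume(value)
            except StopIteration as stop:
                return getattr(stop, 'value', None)  # final result
            if instruction is None:
                resume, value = task.send, None      # resume later
            elif is_future(instruction):
                # Wait for the future, resume with its result.
                resume, value = task.send, wait_for(instruction)
            else:
                # Unrecognized instruction: raise back into the task.
                resume, value = task.throw, NotImplementedError(
                    repr(instruction))
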
>
>
> Scheduling instructions / primitives
> ====================================
>
> (This could probably use a better name.)
>
> The protocol is intentionally agnostic about the implementation of
> schedulers, event loops, or reactors: as long as they implement the same
> set of scheduling primitives, code should work across them.
>
> There are multiple ways to accomplish this, but one possibility is to
> have a set of common, generic instructions in a standard library module
> such as "tasklib" (which could also contain things like default scheduler
> implementations, helper functions, and so on).
>
> A partial list of possible primitives (the names are all made up, not
> serious suggestions):
>
> 1. None: The most basic "do nothing" instruction. This just instructs
>    the scheduler to resume the yielding task later.
>
> 2. Futures: Instruct the scheduler to resume with the future's result.
>
>    Similar types in third-party libraries, such as Deferreds, could
>    potentially be implemented either natively by a scheduler that
>    supports it, or using a wait_for_deferred(d) helper task, or using
>    the idea of a "adapter" scheduler (see below).
>
> 3. Control primitives: spawn, sleep, etc.
>
>    - Spawn a new (independent) task: yield tasklib.spawn(task())
>    - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
>    - Delay execution: yield tasklib.sleep(seconds)
>    - etc.
>
>    These could be simple marker objects, leaving it up to the underlying
>    scheduler to actually recognize and implement them; some could also
>    be implemented in terms of simpler operations (e.g.  sleep(), in
>    terms of lower-level suspend and resume operations).

What is the difference between the tossed-around "yield from task()"
and this "yield tasklib.spawn(task())"?

And why isn't it simply spelled "yield task()"? You have all these
different types that can be yielded from tasks to the scheduler. Why
isn't a task one of those possible types? If the scheduler gets an
iterator, it should schedule it automatically.
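
That is, the scheduler's dispatch could just special-case generators
directly; a sketch, where scheduler.schedule() is a made-up method:

    import inspect

    def interpret(scheduler, instruction):
        # A yielded generator is itself a scheduling instruction,
        # meaning "run this as a new task".
        if inspect.isgenerator(instruction):
            scheduler.schedule(instruction)
        elif instruction is None:
            pass  # just resume the yielding task later
        # ... other instruction types ...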

> 4. I/O operations
>
>    This could be anything from low-level "yield fd_readable(sock)" style
>    requests to any of the higher-level APIs being discussed elsewhere.
>
>    Whatever the exact API ends up being, the scheduler should implement
>    these primitives by waiting for the I/O (or condition), and resuming
>    the task with the result, if any.
>
> 5. Cooperative concurrency primitives, for working with locks, condition
>    variables, and so on. (If useful?)

I am sure these will come about, but I think they belong in a library
that sits on top of whatever API comes out, not in the API itself.

> 6. Custom, scheduler-specific instructions: Since a generator task can
>    potentially yield anything as a scheduler instruction, it's not
>    inconceivable for specialized schedulers to support specialized
>    instructions. (Code that relies on such special instructions won't
>    work on other schedulers, but that would be the point.)
>
> A question open to debate is what a scheduler should do when faced with
> an unrecognized scheduling instruction.
>
> Raising TypeError or NotImplementedError back into the task is probably
> a reasonable action, and would allow code like:
>
>     def task():
>         try:
>             yield fancy_magic_instruction()
>         except NotImplementedError:
>             yield from boring_fallback()
>         ...

Interesting. Can anyone think of an example of this?

>
> Generator tasks as schedulers, and vice versa
> =============================================
>
> Note that there is a symmetry to the protocol when a generator task
> calls another using "yield from":
>
>     def task():
>         spam = yield from subtask()
>
> Here, task() is both a generator task, and the effective scheduler for
> subtask(): it "implements" subtask()'s scheduling instructions by
> delegating them to its own scheduler.

As raised above, why not simply "yield subtask()"?
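
i.e. the scheduler could keep the task stack itself, roughly like this
(a sketch; exception propagation between tasks is omitted, and
interpret() stands in for handling the other instruction types):

    import inspect

    def run(task):
        # Trampoline: "yield subtask()" pushes the subtask, and its
        # final result is sent back to the caller when it finishes.
        stack, value = [task], None
        while stack:
            try:
                instruction = stack[-1].send(value)
            except StopIteration as stop:
                stack.pop()
                value = getattr(stop, 'value', None)  # subtask result
                continue
            if inspect.isgenerator(instruction):
                stack.append(instruction)  # enter the subtask
                value = None
            else:
                value = interpret(instruction)  # Futures, None, etc.
        return value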

> This is a plain observation on its own; however, it raises one or two
> interesting possibilities for more sophisticated schedulers implemented
> as generator tasks themselves, including:
>
> - Specialized sub-schedulers that run as a normal task within their
>   parent scheduler, but implement for example weighted or priority
>   queuing of their subtasks, or similar features.

I think that is too messy; you could have so many different scheduler
semantics. Maybe this sort of thing is what your scheduler-specific
instructions should be for.

Or, attributes on tasks that schedulers can be known to look for.
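
Since plain generator objects don't accept arbitrary attributes, that
would presumably mean a thin wrapper, something like:

    class Task:
        # Hypothetical wrapper that schedulers can hang metadata on;
        # schedulers that don't know about .priority just ignore it.
        def __init__(self, gen, priority=0):
            self.gen = gen
            self.priority = priority

with schedulers reading it via getattr(task, 'priority', 0).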

> - "Adapter" schedulers that intercept special scheduler instructions
>   (say, Deferreds or other library-specific objects), and implement them
>   using more generic instructions to the underlying scheduler.

I think we can make yielding tasks a direct operation, and still
implement sub-schedulers. They should be more opaque, I think.
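
A generator-based adapter might stay fairly small, too; something like
this sketch, where deferred_to_future() is a made-up conversion helper,
and throw() plumbing is omitted:

    from twisted.internet.defer import Deferred

    def adapt_deferreds(subtask):
        # Drive subtask, rewriting any Deferred it yields into a
        # Future instruction for the underlying scheduler.
        value = None
        while True:
            try:
                instruction = subtask.send(value)
            except StopIteration as stop:
                return getattr(stop, 'value', None)  # final result
            if isinstance(instruction, Deferred):
                instruction = deferred_to_future(instruction)
            value = yield instruction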

> --
> Piet Delport
>



-- 
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy


