[Python-ideas] Proposal: A simple protocol for generator tasks
Ronny.Pfannschmidt at gmx.de
Mon Oct 15 13:39:16 CEST 2012
I like that finally someone is pointing out
how to deal with the *concurrent* part.
I have some further notes:
* greenlet interaction is wanted,
since interacting with greenlets is slightly different:
* they don’t get the function arguments at greenlet creation time,
but on the first `switch`
generator outer use:
    gn = f(*arg, **kwarg)
greenlet outer use:
    gr = greenlet.greenlet(f)
    gr.switch(*arg, **kwarg)
* instead of send/next, they always use switch
* `yield` is a function call
-> there is a need for a lib to manage the local part
of greenlet operations in any case
(so we should just ensure that the scheduler can
handle their way of `yield`ing,
but not actually have support/compat code in
the stdlib for their yielding)
* considering regular classes for interaction
since for some protocol implementations
different means might make sense
(this could also be used for the scheduler part of
result -> a protocol for cooperative concurrency
* considering the upcoming PyPy transaction module/STM,
since using that right could mean "free" parallelism in the future
* alternatives for queues/channels are needed
* pools/rate-limiters and other exercises are needed as well
* some kind of default tools for servers are needed
* the stdlib could have a very simple default scheduler
that just does something basic, like running all the work it can,
and blocking on an I/O reactor when it can't;
we just need something that can run() after everything has been created.
Having an API like scheduler.add(gen) would be a plus
(since it would be just like PyPy's transaction module)
An example I have in mind is something like
If things go as planned on my side,
starting in Jan/Feb 2013 I'll try a prototype implementation
for further comments/actual experimentation.
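A minimal sketch of the simple default scheduler described in the notes above, assuming a hypothetical `Scheduler` class with `add(gen)` and `run()` methods (illustrative names, not an existing stdlib API). It only round-robins generators that yield `None` and has no I/O reactor, so it stops once nothing is runnable instead of blocking:

```python
from collections import deque


class Scheduler:
    """Hypothetical minimal default scheduler: round-robin, no I/O reactor."""

    def __init__(self):
        self._ready = deque()   # generators that can run right now
        self.results = {}       # generator -> its final return value

    def add(self, gen):
        # Register a generator; nothing runs until run() is called.
        self._ready.append(gen)

    def run(self):
        # Run all the work that can be done. A real scheduler would block on
        # an I/O reactor here instead of stopping when the queue is empty.
        while self._ready:
            gen = self._ready.popleft()
            try:
                next(gen)                      # advance the task by one step
            except StopIteration as stop:
                self.results[gen] = stop.value
            else:
                self._ready.append(gen)        # still alive: requeue it


def counter(name, n, log):
    for i in range(n):
        log.append((name, i))
        yield                                  # plain "resume me later"
    return name


log = []
sched = Scheduler()
a, b = counter("a", 2, log), counter("b", 2, log)
sched.add(a)
sched.add(b)
sched.run()
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```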
On 10/15/2012 05:36 AM, Piet Delport wrote:
> [This is a lengthy mail; I apologize in advance!]
> I've been following this discussion with great interest, and would like
> to put forward a suggestion that might simplify some of the questions
> that are up in the air.
> There are several key points being considered: what exactly constitutes a
> "coroutine" or "tasklet", what the precise semantics of "yield" and
> "yield from" should be, how the stdlib can support different event loops
> and reactors, and how exactly Futures, Deferreds, and other APIs fit
> into the whole picture.
> This mail is mostly about the first point: I think everyone agrees
> roughly what a coroutine-style generator is, but there's enough
> variation in how they are used, both historically and presently, that
> the concept isn't as precise as it should be. This makes them hard to
> think and reason about (failing the "BDFL gets headaches" test), and
> makes it harder to define the behavior of all the parts that they
> interact with, too.
> This is a sketch of an attempt to define what constitutes a
> generator-based task or coroutine more rigorously: I think that the
> essential behavior can be captured in a small protocol, building on the
> generator and iterator protocols. If anyone else thinks this is a good
> idea, maybe something like this could work its way into a PEP?
> (For the sake of this mail, I will use the term "generator task" or
> "task" as a straw man term, but feel free to substitute "coroutine", or
> whatever the preferred name ends up being.)
> Very informally: A "generator task" is what you get if you take a normal
> Python function and replace its blocking calls with "yield from" calls
> to equivalent subtasks.
> More formally, a "generator task" is a generator that implements an
> incremental, multi-step computation, and is intended to be externally
> driven to completion by a runner, or "scheduler", until it delivers a
> final result.
> This driving process happens as follows:
> 1. A generator task is iterated by its scheduler to yield a series of
> intermediate "step" values.
> 2. Each value yielded as a "step" represents a scheduling instruction,
> or primitive, to be interpreted by the task's scheduler.
> This scheduling instruction can be None ("just resume this task
> later"), or a variety of other primitives, such as Futures ("resume
> this task with the result of this Future"); see below for more.
> 3. The scheduler is responsible for interpreting each "step" instruction
> as appropriate, and sending the instruction's result, if any, back to
> the task using send() or throw().
> A scheduler may run a single task to completion, or may multiplex
> execution between many tasks: generator tasks should assume that
> other tasks may have executed while the task was yielding.
> 4. The generator task completes by successfully returning (raising
> StopIteration), or by raising an exception. The task's caller
> receives this result.
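The four steps above can be sketched as a single-task runner. This is a hypothetical `run_task()` helper, assuming only two scheduling instructions: `None` and `concurrent.futures.Future`. It resumes the task with `send()`, reports failures with `throw()`, and returns the value carried by `StopIteration`; in the example tasks, `ThreadPoolExecutor.submit()` stands in for a blocking call that has been replaced by a yield:

```python
from concurrent.futures import Future, ThreadPoolExecutor


def run_task(task):
    """Drive one generator task to completion (hypothetical helper).

    Understands two instructions: None ("resume later") and
    concurrent.futures.Future ("resume with this future's result").
    """
    send_value, error = None, None
    while True:
        try:
            if error is not None:
                step = task.throw(error)       # step 3: report a failure
            else:
                step = task.send(send_value)   # step 3: deliver a result
        except StopIteration as stop:
            return stop.value                  # step 4: the final result
        send_value, error = None, None
        if step is None:
            continue                           # nothing to wait for
        if isinstance(step, Future):
            try:
                # Blocking is acceptable here: this runner drives only one task.
                send_value = step.result()
            except Exception as exc:
                error = exc
        else:
            error = TypeError(f"unrecognized instruction: {step!r}")


def add_later(pool, x, y):
    # Stand-in for a blocking call: the work runs in the pool, and the task
    # yields the Future instead of waiting for it.
    total = yield pool.submit(lambda: x + y)
    yield                                      # plain "resume me later"
    return total * 2


def divide_safely(pool):
    try:
        yield pool.submit(lambda: 1 / 0)
    except ZeroDivisionError:                  # thrown in by run_task()
        return "recovered"


with ThreadPoolExecutor(max_workers=1) as pool:
    doubled = run_task(add_later(pool, 2, 3))
    recovered = run_task(divide_safely(pool))

print(doubled, recovered)  # 10 recovered
```

An exception that the task does not catch simply propagates out of `run_task()`, which matches step 4: the caller receives either the return value or the exception.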
> (For the sake of discussion, I use "the scheduler" to refer to whoever
> calls the generator task's next/send/throw methods, and "the task's
> caller" to refer to whoever receives the task's final result, but this
> is not important to the protocol: a task should not care who drives it
> or consumes its result, just like an iterator should not.)
> Scheduling instructions / primitives
> (This could probably use a better name.)
> The protocol is intentionally agnostic about the implementation of
> schedulers, event loops, or reactors: as long as they implement the same
> set of scheduling primitives, code should work across them.
> There are multiple ways to accomplish this, but one possibility is to have a
> set of common, generic instructions in a standard library module such as
> "tasklib" (which could also contain things like default scheduler
> implementations, helper functions, and so on).
> A partial list of possible primitives (the names are all made up, not
> serious suggestions):
> 1. None: The most basic "do nothing" instruction. This just instructs
> the scheduler to resume the yielding task later.
> 2. Futures: Instruct the scheduler to resume with the future's result.
> Similar types in third-party libraries, such as Deferreds, could
> potentially be implemented either natively by a scheduler that
> supports them, or using a wait_for_deferred(d) helper task, or using
> an "adapter" scheduler (see below).
> 3. Control primitives: spawn, sleep, etc.
> - Spawn a new (independent) task: yield tasklib.spawn(task())
> - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
> - Delay execution: yield tasklib.sleep(seconds)
> - etc.
> These could be simple marker objects, leaving it up to the underlying
> scheduler to actually recognize and implement them; some could also
> be implemented in terms of simpler operations (e.g. sleep() in
> terms of lower-level suspend and resume operations).
> 4. I/O operations
> This could be anything from low-level "yield fd_readable(sock)" style
> requests to any of the higher-level APIs being discussed elsewhere.
> Whatever the exact API ends up being, the scheduler should implement
> these primitives by waiting for the I/O (or condition), and resuming
> the task with the result, if any.
> 5. Cooperative concurrency primitives, for working with locks, condition
> variables, and so on. (If useful?)
> 6. Custom, scheduler-specific instructions: Since a generator task can
> potentially yield anything as a scheduler instruction, it's not
> inconceivable for specialized schedulers to support specialized
> instructions. (Code that relies on such special instructions won't
> work on other schedulers, but that would be the point.)
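As a rough illustration of primitives implemented as simple marker objects, here is a sketch with hypothetical `Spawn` and `Sleep` classes (standing in for the made-up `tasklib.spawn()` and `tasklib.sleep()`) and a small multiplexing scheduler that recognizes them. To keep it deterministic it uses a simulated clock that jumps forward when every task is asleep, rather than real time; `par()` and I/O are left out:

```python
import heapq
import itertools
from collections import deque


class Sleep:
    """Hypothetical marker instruction: resume the task after `seconds`."""
    def __init__(self, seconds):
        self.seconds = seconds


class Spawn:
    """Hypothetical marker instruction: start `task` independently."""
    def __init__(self, task):
        self.task = task


class Scheduler:
    """Multiplexes tasks on a simulated clock (sleeping costs no real time)."""

    def __init__(self):
        self.now = 0.0
        self._ready = deque()              # (task, value to send)
        self._sleeping = []                # heap of (wake_time, seq, task)
        self._seq = itertools.count()      # tie-breaker for equal wake times

    def spawn(self, task):
        self._ready.append((task, None))

    def run(self):
        while self._ready or self._sleeping:
            if not self._ready:
                # Nothing runnable: jump the clock to the next wake-up.
                wake, _, task = heapq.heappop(self._sleeping)
                self.now = wake
                self._ready.append((task, None))
                continue
            task, value = self._ready.popleft()
            try:
                step = task.send(value)
            except StopIteration:
                continue                   # task finished
            if step is None:
                self._ready.append((task, None))
            elif isinstance(step, Sleep):
                heapq.heappush(self._sleeping,
                               (self.now + step.seconds, next(self._seq), task))
            elif isinstance(step, Spawn):
                self._ready.append((step.task, None))   # the new task
                self._ready.append((task, None))        # and the spawner
            else:
                # This sketch gives up; raising into the task is an alternative.
                raise TypeError(f"unrecognized instruction: {step!r}")


log = []


def worker(name, delay):
    yield Sleep(delay)
    log.append((sched.now, name))


def main():
    yield Spawn(worker("slow", 2.0))
    yield Spawn(worker("fast", 1.0))
    log.append((sched.now, "main"))


sched = Scheduler()
sched.spawn(main())
sched.run()
print(log)  # [(0.0, 'main'), (1.0, 'fast'), (2.0, 'slow')]
```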
> A question open to debate is what a scheduler should do when faced with
> an unrecognized scheduling instruction.
> Raising TypeError or NotImplementedError back into the task is probably
> a reasonable action, and would allow code like:
> def task():
>     try:
>         yield fancy_magic_instruction()
>     except NotImplementedError:
>         yield from boring_fallback()
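A runnable version of that fallback pattern, assuming a hypothetical runner that understands only `None` and throws `NotImplementedError` into the task for anything else; `FancyMagic`, `fancy_magic_instruction()` and `boring_fallback()` are placeholders:

```python
class FancyMagic:
    """Hypothetical instruction that this particular scheduler doesn't know."""


def fancy_magic_instruction():
    return FancyMagic()


def boring_fallback():
    yield None            # only uses the universally supported "resume later"
    return "fallback"


def task():
    try:
        result = yield fancy_magic_instruction()
    except NotImplementedError:
        result = yield from boring_fallback()
    return result


def run(gen):
    """Single-task runner that knows only None and rejects everything else."""
    value, error = None, None
    while True:
        try:
            step = gen.throw(error) if error is not None else gen.send(value)
        except StopIteration as stop:
            return stop.value
        value, error = None, None
        if step is not None:
            # Unrecognized instruction: raise it back into the task.
            error = NotImplementedError(
                f"{type(step).__name__} is not supported by this scheduler")


outcome = run(task())
print(outcome)  # fallback
```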
> Generator tasks as schedulers, and vice versa
> Note that there is a symmetry to the protocol when a generator task
> calls another using "yield from":
> def task():
>     spam = yield from subtask()
> Here, task() is both a generator task, and the effective scheduler for
> subtask(): it "implements" subtask()'s scheduling instructions by
> delegating them to its own scheduler.
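A small sketch of this symmetry: a hypothetical `recording_scheduler()` drives `task()` and answers each instruction with a canned reply. The instructions it sees are the ones `subtask()` yielded, and its replies reach `subtask()` directly, because `yield from` relays values in both directions. The string instructions are arbitrary placeholders that the scheduler treats as opaque:

```python
def subtask():
    first = yield "instruction-1"     # these go straight to the outer scheduler
    second = yield "instruction-2"
    return first + second


def task():
    spam = yield from subtask()       # task() relays subtask()'s instructions
    return spam * 10


def recording_scheduler(gen, replies):
    """Drive `gen`, answering each instruction with the next canned reply."""
    seen = []
    replies = list(replies)
    reply = None                      # a fresh generator must be sent None
    while True:
        try:
            step = gen.send(reply)
        except StopIteration as stop:
            return seen, stop.value
        seen.append(step)
        reply = replies.pop(0)        # IndexError if the task asks for too many


seen, result = recording_scheduler(task(), [1, 2])
print(seen, result)  # ['instruction-1', 'instruction-2'] 30
```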
> This is a plain observation on its own; however, it raises one or two
> interesting possibilities for more sophisticated schedulers implemented as
> generator tasks themselves, including:
> - Specialized sub-schedulers that run as a normal task within their
> parent scheduler, but implement for example weighted or priority
> queuing of their subtasks, or similar features.
> - "Adapter" schedulers that intercept special scheduler instructions
> (say, Deferreds or other library-specific objects), and implement them
> using more generic instructions to the underlying scheduler.
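An "adapter" scheduler of the second kind might look roughly like this: a generator task that drives an inner task, converts a library-specific object into a `concurrent.futures.Future`, and passes every other instruction through to its own scheduler. `CallbackResult` is a hypothetical, loosely Deferred-like stand-in, and `run_futures_only()` is a hypothetical underlying scheduler that knows only `None` and Futures. (The concurrent.futures documentation says Futures should not be created directly except for testing, so a real adapter would more likely use its scheduler's own future type.)

```python
from concurrent.futures import Future


class CallbackResult:
    """Hypothetical library-specific result object (loosely Deferred-like)."""

    def __init__(self):
        self._callbacks, self._fired, self._value = [], False, None

    def add_callback(self, fn):
        if self._fired:
            fn(self._value)               # already fired: call immediately
        else:
            self._callbacks.append(fn)

    def fire(self, value):
        self._fired, self._value = True, value
        for fn in self._callbacks:
            fn(value)


def adapter(inner):
    """Generator task that drives `inner`, translating CallbackResult to Future."""
    value, error = None, None
    while True:
        try:
            step = inner.throw(error) if error is not None else inner.send(value)
        except StopIteration as stop:
            return stop.value
        if isinstance(step, CallbackResult):
            fut = Future()
            step.add_callback(fut.set_result)
            step = fut                    # generic instruction for our scheduler
        try:
            # Relay the (possibly translated) instruction upward, and relay the
            # reply or exception back down to the inner task.
            value, error = (yield step), None
        except Exception as exc:
            value, error = None, exc


def run_futures_only(task):
    """Hypothetical underlying scheduler: understands only None and Futures."""
    value = None
    while True:
        try:
            step = task.send(value)
        except StopIteration as stop:
            return stop.value
        if step is None:
            value = None
        elif isinstance(step, Future):
            value = step.result()         # blocks until the future is resolved
        else:
            raise TypeError(f"unsupported instruction: {step!r}")


def legacy_task():
    d = CallbackResult()
    d.fire(21)                            # fired up front so nothing blocks
    x = yield d                           # library-specific instruction
    yield                                 # generic instruction, passed through
    return x * 2


answer = run_futures_only(adapter(legacy_task()))
print(answer)  # 42
```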
> Piet Delport