[Python-ideas] Proposal: A simple protocol for generator tasks

Ronny Pfannschmidt Ronny.Pfannschmidt at gmx.de
Mon Oct 15 13:39:16 CEST 2012


Hi Piet,

I like that someone is finally pointing out
how to deal with the *concurrent* part.

I have some further notes:

* greenlet interaction is wanted,
   since interacting with greenlets is slightly different
   from interacting with generators (sketched below)

   * they don't get the function arguments at greenlet creation time,
     but on the first `switch`

     generator outer use:
       gn = f(*args, **kwargs)
       gn.next()

     greenlet outer use:
       gr = greenlet.greenlet(f)
       gr.switch(*args, **kwargs)

   * instead of send/next, they always use switch
   * `yield` is a function call
      -> there is a need for a library to manage the local part
         of greenlet operations in any case

         (so we should just ensure that the scheduler can
          handle their way of `yield`ing,
          but not actually have support/compat code in
          the stdlib for their yielding)
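
   as a rough sketch of the difference (assuming the third-party
   `greenlet` module; the instruction/result values are made up):

     import greenlet

     def greenlet_task(name):
         # arguments arrive on the first switch(), not at creation time
         print("task started with", name)
         # "yielding" is an ordinary function call: switch back to the
         # parent greenlet, handing it a scheduling instruction; the
         # parent's next switch() resumes us here with the result
         result = greenlet.getcurrent().parent.switch("an instruction")
         print("task resumed with", result)

     gr = greenlet.greenlet(greenlet_task)
     instruction = gr.switch("hello")  # runs until the task switches back
     gr.switch("a result")             # resume the task with a result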

* considering regular classes for interaction,
   since for some protocol implementations
   different means might make sense
   (this could also be used for the scheduler part of
    greenlet interaction)

   result -> a protocol for cooperative concurrency
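
   a rough sketch of a class-based task, assuming the scheduler only
   relies on the send/throw/close subset of the generator protocol
   (all names here are made up):

     class CountdownTask:
         """A plain object that speaks the generator protocol, so a
         scheduler can drive it like a generator task."""

         def __init__(self, n):
             self.n = n

         def send(self, value):
             # each send() is one step; None means "just resume me later"
             if self.n <= 0:
                 raise StopIteration(None)  # done, with result None
             self.n -= 1
             return None

         def throw(self, exc_type, exc_value=None, tb=None):
             # no cleanup to do; just let the exception propagate
             if exc_value is None:
                 raise exc_type
             raise exc_value

         def close(self):
             self.n = 0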

* considering the upcoming pypy transaction module/stm,
   since using that right could mean "free" parallelism in the future
* alternatives for queues/channels are needed
* pools/rate-limiters and other such building blocks are needed as well
* some kind of default tools for servers are needed

* the stdlib could have a very simple default scheduler
   that just does something basic, like running all the work it can,
   and blocking on an I/O reactor when it can't

   we just need something that can run() after everything has been created

   having an API like scheduler.add(gen) would be a plus
   (since it would be just like pypy's transaction module)

   an example I have in mind is something like

     scheduler.add(...)
     scheduler.add(...)
     scheduler.run()
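
   a minimal sketch of such a scheduler (round-robin, no I/O reactor;
   none of these names are a serious proposal):

     import collections

     class Scheduler:
         """Run generator tasks round-robin; understands only the
         None ("just resume me later") instruction."""

         def __init__(self):
             self._ready = collections.deque()

         def add(self, task):
             self._ready.append(task)

         def run(self):
             while self._ready:
                 task = self._ready.popleft()
                 try:
                     instruction = task.send(None)
                     if instruction is not None:
                         # unrecognized instruction: throw it back into
                         # the task; if the task handles it and yields,
                         # it just gets resumed later
                         task.throw(NotImplementedError(repr(instruction)))
                 except StopIteration:
                     continue  # task completed
                 self._ready.append(task)

     # usage, as in the example above:
     #   scheduler = Scheduler()
     #   scheduler.add(some_task())
     #   scheduler.run()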




If things go as planned on my side,
starting in Jan/Feb 2013 I'll try a prototype implementation
for further comments/actual experimentation.

-- Ronny

On 10/15/2012 05:36 AM, Piet Delport wrote:
> [This is a lengthy mail; I apologize in advance!]
>
> Hi,
>
> I've been following this discussion with great interest, and would like
> to put forward a suggestion that might simplify some of the questions
> that are up in the air.
>
> There are several key points being considered: what exactly constitutes a
> "coroutine" or "tasklet", what the precise semantics of "yield" and
> "yield from" should be, how the stdlib can support different event loops
> and reactors, and how exactly Futures, Deferreds, and other APIs fit
> into the whole picture.
>
> This mail is mostly about the first point: I think everyone agrees
> roughly what a coroutine-style generator is, but there's enough
> variation in how they are used, both historically and presently, that
> the concept isn't as precise as it should be. This makes them hard to
> think and reason about (failing the "BDFL gets headaches" test), and
> makes it harder to define the behavior of all the parts that they
> interact with, too.
>
> This is a sketch of an attempt to define what constitutes a
> generator-based task or coroutine more rigorously: I think that the
> essential behavior can be captured in a small protocol, building on the
> generator and iterator protocols. If anyone else thinks this is a good
> idea, maybe something like this could work its way into a PEP?
>
> (For the sake of this mail, I will use the term "generator task" or
> "task" as a straw man term, but feel free to substitute "coroutine", or
> whatever the preferred name ends up being.)
>
>
> Definition
> ==========
>
> Very informally: A "generator task" is what you get if you take a normal
> Python function and replace its blocking calls with "yield from" calls
> to equivalent subtasks.
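>
> For example (a rough sketch; sock_recv() is a hypothetical subtask
> that suspends until data is available on the socket):
>
>      # ordinary blocking function
>      def read_length(sock):
>          data = sock.recv(1024)
>          return len(data)
>
>      # the equivalent generator task
>      def read_length(sock):
>          data = yield from sock_recv(sock, 1024)
>          return len(data)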
>
> More formally, a "generator task" is a generator that implements an
> incremental, multi-step computation, and is intended to be externally
> driven to completion by a runner, or "scheduler", until it delivers a
> final result.
>
> This driving process happens as follows:
>
> 1. A generator task is iterated by its scheduler to yield a series of
>     intermediate "step" values.
>
> 2. Each value yielded as a "step" represents a scheduling instruction,
>     or primitive, to be interpreted by the task's scheduler.
>
>     This scheduling instruction can be None ("just resume this task
>     later"), or a variety of other primitives, such as Futures ("resume
>     this task with the result of this Future"); see below for more.
>
> 3. The scheduler is responsible for interpreting each "step" instruction
>     as appropriate, and sending the instruction's result, if any, back to
>     the task using send() or throw().
>
>     A scheduler may run a single task to completion, or may multiplex
>     execution between many tasks: generator tasks should assume that
>     other tasks may have executed while the task was yielding.
>
> 4. The generator task completes by successfully returning (raising
>     StopIteration), or by raising an exception. The task's caller
>     receives this result.
>
> (For the sake of discussion, I use "the scheduler" to refer to whoever
> calls the generator task's next/send/throw methods, and "the task's
> caller" to refer to whoever receives the task's final result, but this
> is not important to the protocol: a task should not care who drives it
> or consumes its result, just like an iterator should not.)
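>
> As a very rough sketch, a trivial scheduler that drives a single task
> to completion might look like this (assuming concurrent.futures-style
> Futures, whose result() blocks until the result is available):
>
>      def run_until_complete(task):
>          value, exc = None, None
>          while True:
>              try:
>                  # resume the task with the previous instruction's result
>                  if exc is not None:
>                      instruction = task.throw(exc)
>                  else:
>                      instruction = task.send(value)
>              except StopIteration as stop:
>                  # the task completed; deliver its final result
>                  return getattr(stop, 'value', None)
>              value, exc = None, None
>              if instruction is None:
>                  continue  # "just resume this task later"
>              try:
>                  value = instruction.result()  # assume it is a Future
>              except Exception as e:
>                  exc = e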
>
>
> Scheduling instructions / primitives
> ====================================
>
> (This could probably use a better name.)
>
> The protocol is intentionally agnostic about the implementation of
> schedulers, event loops, or reactors: as long as they implement the same
> set of scheduling primitives, code should work across them.
>
> There are multiple ways to accomplish this, but one possibility is to have a
> set of common, generic instructions in a standard library module such as
> "tasklib" (which could also contain things like default scheduler
> implementations, helper functions, and so on).
>
> A partial list of possible primitives (the names are all made up, not
> serious suggestions):
>
> 1. None: The most basic "do nothing" instruction. This just instructs
>     the scheduler to resume the yielding task later.
>
> 2. Futures: Instruct the scheduler to resume with the future's result.
>
>     Similar types in third-party libraries, such as Deferreds, could
>     potentially be implemented either natively by a scheduler that
>     supports them, or using a wait_for_deferred(d) helper task, or using
>     the idea of an "adapter" scheduler (see below).
>
> 3. Control primitives: spawn, sleep, etc.
>
>     - Spawn a new (independent) task: yield tasklib.spawn(task())
>     - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
>     - Delay execution: yield tasklib.sleep(seconds)
>     - etc.
>
>     These could be simple marker objects, leaving it up to the underlying
>     scheduler to actually recognize and implement them (see the sketch
>     after this list); some could also be implemented in terms of simpler
>     operations (e.g. sleep(), in terms of lower-level suspend and resume
>     operations).
>
> 4. I/O operations
>
>     This could be anything from low-level "yield fd_readable(sock)" style
>     requests, or any of the higher-level APIs being discussed elsewhere.
>
>     Whatever the exact API ends up being, the scheduler should implement
>     these primitives by waiting for the I/O (or condition), and resuming
>     the task with the result, if any.
>
> 5. Cooperative concurrency primitives, for working with locks, condition
>     variables, and so on. (If useful?)
>
> 6. Custom, scheduler-specific instructions: Since a generator task can
>     potentially yield anything as a scheduler instruction, it's not
>     inconceivable for specialized schedulers to support specialized
>     instructions. (Code that relies on such special instructions won't
>     work on other schedulers, but that would be the point.)
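>
> As a rough sketch of the marker-object idea (the scheduler methods
> used here, resume_later(), add() and call_later(), are made up):
>
>      class spawn:
>          """Marker: run a new, independent task."""
>          def __init__(self, task):
>              self.task = task
>
>      class sleep:
>          """Marker: resume the yielding task after a delay."""
>          def __init__(self, seconds):
>              self.seconds = seconds
>
>      def dispatch(scheduler, task, instruction):
>          # one step of a hypothetical scheduler's main loop
>          if instruction is None:
>              scheduler.resume_later(task)
>          elif isinstance(instruction, spawn):
>              scheduler.add(instruction.task)
>              scheduler.resume_later(task)
>          elif isinstance(instruction, sleep):
>              scheduler.call_later(instruction.seconds,
>                                   scheduler.resume_later, task)
>          else:
>              # throw unrecognized instructions back in (see below)
>              task.throw(NotImplementedError(repr(instruction)))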
>
> A question open to debate is what a scheduler should do when faced with
> an unrecognized scheduling instruction.
>
> Raising TypeError or NotImplementedError back into the task is probably
> a reasonable action, and would allow code like:
>
>      def task():
>          try:
>              yield fancy_magic_instruction()
>          except NotImplementedError:
>              yield from boring_fallback()
>          ...
>
>
> Generator tasks as schedulers, and vice versa
> =============================================
>
> Note that there is a symmetry to the protocol when a generator task
> calls another using "yield from":
>
>      def task():
>          spam = yield from subtask()
>
> Here, task() is both a generator task, and the effective scheduler for
> subtask(): it "implements" subtask()'s scheduling instructions by
> delegating them to its own scheduler.
>
> This is a plain observation on its own; however, it raises one or two
> interesting possibilities for more sophisticated schedulers implemented
> as generator tasks themselves, including:
>
> - Specialized sub-schedulers that run as a normal task within their
>    parent scheduler, but implement for example weighted or priority
>    queuing of their subtasks, or similar features.
>
> - "Adapter" schedulers that intercept special scheduler instructions
>    (say, Deferreds or other library-specific objects), and implement them
>    using more generic instructions to the underlying scheduler.
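>
> As a rough sketch of such an adapter (with Deferred and the
> wait_for_deferred() helper task as described earlier):
>
>      def deferred_adapter(task):
>          # Drive `task` as its effective scheduler, translating
>          # Deferred instructions into generic ones and passing
>          # everything else through to our own scheduler unchanged.
>          value, exc = None, None
>          while True:
>              try:
>                  if exc is not None:
>                      instruction = task.throw(exc)
>                  else:
>                      instruction = task.send(value)
>              except StopIteration as stop:
>                  return getattr(stop, 'value', None)
>              value, exc = None, None
>              try:
>                  if isinstance(instruction, Deferred):
>                      value = yield from wait_for_deferred(instruction)
>                  else:
>                      value = yield instruction
>              except Exception as e:
>                  exc = e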
>
>
> --
> Piet Delport
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



