[Python-ideas] The async API of the future: yield-from

Jim Jewett jimjjewett at gmail.com
Fri Oct 19 22:10:00 CEST 2012

On 10/19/12, Calvin Spealman <ironfroggy at gmail.com> wrote:
> On Thu, Oct 18, 2012 at 11:46 PM, Jim Jewett <jimjjewett at gmail.com> wrote:

>> [I think the yield solutions are (too magic)/(prematurely lock too
>> much policy) to be "The" API, but work fine as "an example API"]

> I think it is important that this is more than convention. ... Our
> focus should be not on providing simple things like "async file read" but
> crafting an environment where people can continue to write wonderfully
> expressive and useful libraries that others can combine to their own needs.

And I think that adding (requirements for generator usage) / (implied
meaning of yield) prevents that.

>> On 10/12/12, Guido van Rossum <guido at python.org> wrote:

>>> But the only use for send() on a generator is when using it as a
>>> coroutine for a concurrent tasks system -- send() really makes no
>>> sense for generators used as iterators.

>> But the data doesn't have to be scheduling information; it can be new
>> data, a seed for an algorithm, a command to switch or reset the state
>> ... locking it to the scheduler is part of what worries me.
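For instance, send() can feed ordinary data into a generator with no scheduler in sight. A sketch (running_average is my own illustrative name, not anything from the thread):

```python
def running_average():
    """Yield the mean of all values sent in so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # send() delivers *data*, not a scheduling directive
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the generator to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```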

> When a coroutine yields, it yields *to the scheduler* so for whom else
> should these values be?

Who says that there has to be a scheduler?  Or at least a single scheduler?

To me, the "obvious" solution is that each co-routine is "scheduled"
only by its own caller, and runs on its own micro-thread.  The caller
thread may or may not wait for a result to be yielded, but would not
normally wait for the entire generator to be exhausted.

The next call to the co-routine may well be from an entirely different
caller, particularly if the co-routine is a generic source or sink.

There may well be several other co-routines (as opposed to a single
scheduler) that enforce policy, and may send messages about things
like "switch to that source of randomness", "start using this other
database instance as a backup", "stop listening on that port".  They
would certainly want to use throw, and perhaps send as well.
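A rough sketch of that kind of policy message, with no central scheduler involved (SwitchOutput and line_sink are made-up names for illustration):

```python
class SwitchOutput(Exception):
    """Control message: start writing to a different destination."""
    def __init__(self, new_buffer):
        self.new_buffer = new_buffer

def line_sink(buffer):
    """Accept lines via send(); change targets when thrown SwitchOutput."""
    while True:
        try:
            line = yield
            buffer.append(line)
        except SwitchOutput as msg:
            # A supervising co-routine, not a scheduler, sent this
            buffer = msg.new_buffer

primary, backup = [], []
sink = line_sink(primary)
next(sink)                        # prime to the first yield
sink.send("to primary")
sink.throw(SwitchOutput(backup))  # "use this other backup" message
sink.send("to backup")
print(primary)  # ['to primary']
print(backup)   # ['to backup']
```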

In practice, creating a full thread for each such co-routine probably
won't work well under current threading systems, because an OS thread
(let alone an OS process) is too heavy-weight.  And without OS
support, python has to do some internal scheduling.  But I'm not
convinced that the current situation will last forever, so I don't
want to muddy up the *abstraction* just to coddle temporary
implementation limitations.

>> But with the yield channel reserved for scheduling overhead, the
>> "generator" can't really generate anything, except through side
>> effects...

> Don't forget that yield-from is an expression, not a statement. The
> value eventually returned from the generator is the result of the
> yield-from, so the generator still produces a final value.

Assuming it terminates, then yes.  But that isn't (conceptually) a
generator; it is an ordinary function call.
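To make that concrete, here is the mechanism as PEP 380 defines it: the subgenerator's return value becomes the value of the yield-from expression, much like an ordinary function-call result.

```python
def subtask():
    yield "step 1"
    yield "step 2"
    return 42  # becomes the value of `yield from subtask()`

def caller():
    result = yield from subtask()
    yield "subtask returned %d" % result

print(list(caller()))  # ['step 1', 'step 2', 'subtask returned 42']
```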

> The fact that these are generators is for their ability to suspend, not to
> iterate.

So "generator" is not really the right term.  Abusing that for one
module is acceptable, but I'm reluctant to bake that change into an
officially sanctioned API, let alone one important enough that it
might eventually be seen as the primary definition.

>> * "mostly", because if my task is willing to wait for the subtask to
>> complete, then why not just use a blocking call in the first place?
>> Is it just because switching to another task is lighter weight than
>> letting a thread block?

> By blocking call do you mean "x = foo()" or "x = yield from foo()"?
> Blocking call usually means the former, so if you mean that, then you
> neglect to think of all the other tasks running which are not willing to wait.

Exactly.  From my own code's perspective, is there any difference
between those two?  (Well, besides the fact that the second is
wordier, and puts more constraints on what I can use for foo.)

So why not just use the first spelling, let the (possibly OS-level)
scheduler notice that I'm blocked (if I happen to be), and let it
suspend my thread waiting on foo?

Is it just that *current* ways to suspend a thread of execution are
expensive, and we hope to do it more cheaply?  If so, that is a
perfectly sensible justification for conventions within a single
stdlib module.  But since the trade-offs may change with time, the
current costs shouldn't drive decisions about the async API, let alone
changes to the meaning of "yield" or "generator".

>> [Questions about generators that do not follow the new constraints]

> I think if the scheduler doesn't know what to do with something, it should
> be an error. That makes it easier to change things in the future.

Those were all things that could reasonably happen simply by reusing
correct existing code.

For a specific implementation, even a stdlib module, it is OK to treat
them as errors;  a specific module can always be viewed as incomplete.

But for "the asynchronous API of the future", undefined behavior just
guarantees warts.  We may eventually decide that the warts are in the
existing legacy code, but there would still be warts.

