On 10/19/12, Calvin Spealman firstname.lastname@example.org wrote:
On Thu, Oct 18, 2012 at 11:46 PM, Jim Jewett email@example.com wrote:
[I think the yield solutions are too magic, or prematurely lock in too much policy, to be "The" API, but they work fine as "an example" API]
I think it is important that this is more than convention. ... Our focus should be not on providing simple things like "async file read" but crafting an environment where people can continue to write wonderfully expressive and useful libraries that others can combine to their own needs.
And I think that adding requirements for generator usage, or an implied meaning for yield, prevents that.
On 10/12/12, Guido van Rossum firstname.lastname@example.org wrote:
But the only use for send() on a generator is when using it as a coroutine for a concurrent tasks system -- send() really makes no sense for generators used as iterators.
But the data doesn't have to be scheduling information; it can be new data, a seed for an algorithm, a command to switch or reset the state ... locking it to the scheduler is part of what worries me.
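As a minimal sketch of send() carrying ordinary data rather than scheduling information (the averager here is just an illustration, not part of any proposed API):

```python
def averager():
    """A coroutine whose send() channel carries data, not scheduling info."""
    total = 0.0
    count = 0
    average = None
    while True:
        # The value sent in is new data; the value yielded out is the result.
        value = yield average
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)            # prime the coroutine to its first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
```

Nothing here knows or cares about a scheduler; the caller is simply feeding data in and reading results out.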
When a coroutine yields, it yields *to the scheduler* so for whom else should these values be?
Who says that there has to be a scheduler? Or at least a single scheduler?
To me, the "obvious" solution is that each co-routine is "scheduled" only by its own caller, and runs on its own micro-thread. The caller thread may or may not wait for a result to be yielded, but would not normally wait for the entire generator to be exhausted (the final "return").
The next call to the co-routine may well be from an entirely different caller, particularly if the co-routine is a generic source or sink.
There may well be several other co-routines (as opposed to a single scheduler) that enforce policy, and may send messages about things like "switch to that source of randomness", "start using this other database instance as a backup", "stop listening on that port". They would certainly want to use throw, and perhaps send as well.
In practice, creating a full thread for each such co-routine probably won't work well under current threading systems, because an OS thread (let alone an OS process) is too heavy-weight. And without OS support, Python has to do some internal scheduling. But I'm not convinced that the current situation will last forever, so I don't want to muddy up the *abstraction* just to coddle temporary limitations.
But with the yield channel reserved for scheduling overhead, the "generator" can't really generate anything, except through side effects...
Don't forget that yield-from is an expression, not a statement. The value eventually returned from the generator is the result of the yield-from, so the generator still produces a final value.
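Concretely, under Python 3.3 yield-from semantics (names invented for illustration):

```python
def subtask():
    yield "suspended"         # a suspension point seen by the outer driver
    return 42                 # becomes the value of the yield-from expression

def task():
    result = yield from subtask()   # suspends, then resumes with 42
    yield result * 2                # the generator still produces a value

t = task()
print(next(t))   # "suspended" (passed through from subtask)
print(next(t))   # 84
```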
Assuming it terminates, then yes. But that isn't (conceptually) a generator; it is an ordinary function call.
The fact that these are generators is for their ability to suspend, not to iterate.
So "generator" is not really the right term. Abusing that for one module is acceptable, but I'm reluctant to bake that change into an officially sanctioned API, let alone one important enough that it might eventually be seen as the primary definition.
- "mostly", because if my task is willing to wait for the subtask to complete, then why not just use a blocking call in the first place? Is it just because switching to another task is lighter weight than letting a thread block?
By blocking call do you mean "x = foo()" or "x = yield from foo()"? Blocking call usually means the former, so if you mean that, then you neglect to think of all the other tasks running which are not willing to wait.
Exactly. From my own code's perspective, is there any difference between those two? (Well, besides the fact that the second is wordier, and puts more constraints on what I can use for foo.)
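To make the comparison concrete (foo here is a stand-in, not from any proposed API):

```python
def foo_blocking():
    return "data"              # ordinary call: may block the whole thread

def foo_async():
    yield                      # cooperative suspension point for a scheduler
    return "data"

def my_task():
    x = yield from foo_async() # same shape as x = foo_blocking(), but my_task
    return x                   # itself must now be a generator, driven by someone

# The first spelling just runs:
x = foo_blocking()

# The second needs a driver to pump the suspension points:
t = my_task()
next(t)                        # runs to foo_async's yield
try:
    next(t)                    # resumes; StopIteration carries the result
except StopIteration as e:
    result = e.value
```

From inside my_task the two spellings look alike, but the yield-from version constrains foo to be a generator and makes my_task one too, all the way up the call chain.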
So why not just use the first spelling, let the (possibly OS-level) scheduler notice that I'm blocked (if I happen to be), and let it suspend my thread waiting on foo?
Is it just that *current* ways to suspend a thread of execution are expensive, and we hope to do it more cheaply? If so, that is a perfectly sensible justification for conventions within a single stdlib module. But since the trade-offs may change with time, the current costs shouldn't drive decisions about the async API, let alone changes to the meaning of "yield" or "generator".
[Questions about generators that do not follow the new constraints]
I think if the scheduler doesn't know what to do with something, it should be an error. That makes it easier to change things in the future.
Those were all things that could reasonably happen simply by reusing correct existing code.
For a specific implementation, even a stdlib module, it is OK to treat them as errors; a specific module can always be viewed as incomplete.
But for "the asynchronous API of the future", undefined behavior just guarantees warts. We may eventually decide that the warts are in the existing legacy code, but there would still be warts.