On Mon, Oct 15, 2012 at 12:48 PM, Calvin Spealman wrote:
What is the difference between the tossed-around "yield from task()" and this "yield tasklib.spawn(task())"?
"yield from task()" is simply the coroutine / task version of a function call: it runs the task to completion, and returns its final result. "yield tasklib.spawn(task())" (or however it ends up being spelled) would be a scheduler primitive to start a task *without* waiting for its result: in other words, it's a request that the scheduler start a new, independent thread of control.
And, why isn't it simply spelled "yield task()"? You have all these different types that can be yielded from tasks to the scheduler. Why isn't a task one of those possible types? If the scheduler gets an iterator, it should schedule it automatically.
This is a good question: I stopped short of discussing it in the original message only to keep it short, and in the hope that the answer is implied.

The short answer is that "yield task()" is the old, hacky, cumbersome, "legacy"[1] way of calling subtasks, and that "yield from" should entirely replace the need to support it.

Before "yield from", "yield task()" was the only way to call subtasks, but this approach has some major disadvantages:

1. In order for it to work, schedulers must manually implement task trampolining, which is ugly at best, and prone to bugs if not all edge cases are handled correctly. (IOW, it effectively places the burden of implementing PEP 380 onto each scheduler.)

2. It obfuscates exception tracebacks by default, requiring schedulers that want readable stack traces to take additional pains to clean up their own non-task frames while propagating exceptions.

3. It requires schedulers to reliably distinguish between tasks and other primitives in the first place. Simply treating all iterators as tasks is not sufficient: to run a task, you need send() and throw(), at least. (Type-checking for GeneratorType would be marginally better, but would unnecessarily preclude, for example, implementing tasks as classes or C extension types, which is otherwise entirely possible with this protocol.)

"yield from" simplifies and solves all these problems in one elegant swoop:

1. No more manual trampolining: a scheduler can treat any task as a single unit, and only needs to worry about the single, combined stream of instructions coming from it.

2. Tracebacks (and return values) take care of themselves, as they should.

3. By separating the concerns of direct scheduler communication ("yield") and subtask delegation ("yield from"), schedulers can limit themselves to just knowing about scheduler primitives when dealing with yielded values, which should be more easily and tightly defined than the full spectrum of tasks in general. (The set of officially-defined scheduler instructions could end up being as small as None and Future, say.)

In summary, it's entirely possible for schedulers to continue supporting the old "yield task()" way of calling subtasks (and this has no problem fitting into the proposed protocol[2]), but there should be no reason to do so, and several good reasons not to: hopefully, it will become a pre-3.3 historical footnote.

[1] For the purposes of this email, interpret "legacy" to mean "older than 17 days". :)

[2] Interpreted as a scheduler instruction, a task value would simply mean "resume the current task with the result of completing the yielded subtask" (modulo the practical question of reliably type-checking tasks, as mentioned).
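For illustration, here is roughly what the trampolining burden looks like. This is only a sketch, not any particular scheduler's code: run_legacy, run_simple, add, legacy_main and modern_main are hypothetical names, tasks are recognized by the GeneratorType check criticized above, and results travel via StopIteration.value (real pre-3.3 code needed some other convention for return values):

```python
import types

def run_legacy(task):
    """Toy trampoline for "yield task()": the scheduler keeps the call stack."""
    stack = [task]
    value = error = None
    while stack:
        current = stack[-1]
        try:
            if error is not None:
                pending, error = error, None
                yielded = current.throw(pending)  # propagate a subtask's failure
            else:
                yielded = current.send(value)
        except StopIteration as stop:
            stack.pop()
            value = stop.value      # "return" into the calling task
            continue
        except Exception as exc:
            stack.pop()
            if not stack:
                raise               # nobody left to handle it
            error = exc             # re-raise inside the calling task
            continue
        if isinstance(yielded, types.GeneratorType):
            stack.append(yielded)   # "call" the subtask
        value = None                # other instructions are ignored in this sketch
    return value

def run_simple(task):
    """With "yield from", the scheduler sees one flat instruction stream."""
    try:
        while True:
            next(task)
    except StopIteration as stop:
        return stop.value

def add(a, b):
    yield  # stand-in for some scheduler instruction
    return a + b

def legacy_main():
    total = yield add(1, 2)        # the scheduler has to trampoline this
    return total * 10

def modern_main():
    total = yield from add(1, 2)   # the language does the delegation
    return total * 10
```

Both run_legacy(legacy_main()) and run_simple(modern_main()) produce the same result, but only the first needs the scheduler to maintain a stack, route return values, and re-raise exceptions by hand.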
Raising TypeError or NotImplementedError back into the task is probably a reasonable action, and would allow code like:
    def task():
        try:
            yield fancy_magic_instruction()
        except NotImplementedError:
            yield from boring_fallback()
        ...
Interesting. Can anyone think of an example of this?
I just want to note for the record that I'm not *encouraging* this kind of thing: I'm just observing that it would be allowed by the protocol. (However, one imaginable use case would be for tasks to send scheduler-specific hints that can safely be ignored when those tasks are running on other scheduler implementations.)
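A minimal sketch of how a scheduler might reject unknown instructions, so that the fallback pattern above works. Hint and the "known" parameter are hypothetical names invented for this example; the driver simply throws NotImplementedError into the task for anything it does not recognize:

```python
class Hint:
    """Hypothetical scheduler-specific instruction."""

def run(task, known=(type(None),)):
    """Toy driver: unrecognized instructions are thrown back as NotImplementedError."""
    try:
        instruction = next(task)
        while True:
            if isinstance(instruction, known):
                instruction = task.send(None)
            else:
                # The task may catch this and yield something else instead.
                instruction = task.throw(NotImplementedError(repr(instruction)))
    except StopIteration as stop:
        return stop.value

def boring_fallback():
    yield  # plain "let others run" instruction
    return "fallback"

def task():
    try:
        yield Hint()
        return "fancy"
    except NotImplementedError:
        return (yield from boring_fallback())

plain_result = run(task())                           # Hint is unknown here
fancy_result = run(task(), known=(type(None), Hint)) # Hint is understood here
```

The same task runs unchanged on both "schedulers": one that understands the hint, and one that does not.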
This is a plain observation on its own; however, it raises one or two interesting possibilities for more interesting schedulers implemented as generator tasks themselves, including:
- Specialized sub-schedulers that run as a normal task within their parent scheduler, but implement for example weighted or priority queuing of their subtasks, or similar features.
I think that is too messy: you could have so many different scheduler semantics. Maybe this sort of thing is what your scheduler-specific instructions should be for.
It shouldn't get messy: the core semantics of any scheduler should always stay within the proposed protocol.

The above is not the best example of a custom scheduler, though. Perhaps a better example would be a generic helper function like the following, which implements throttling of I/O requests made through it:

    def task():
        result = yield from io_throttled(subtask(), rate=foo)

io_throttled() would end up sitting between task() and subtask() in the hierarchy, like so:

    ... -> task() -> io_throttled() -> subtask() -> ...

To recap, each task is implicitly driven by the scheduler above it, and implicitly drives the task(s) below it: the outer scheduler drives task(), which drives io_throttled(), which drives subtask(), and so on. In this picture, "yield from" is the "most default" scheduler: it simply delegates all yielded instructions to the outer scheduler.

However, instead of relying on "yield from", io_throttled() can dip down into the task protocol itself, and drive subtask() directly. This would allow it to inspect and manipulate the underlying instructions and responses flowing back and forth, and, assuming that there's a recognizable standard representation for I/O primitives, it could keep track of the rate of I/O, and insert delay instructions as necessary (or something similar).

The key observations I want to make:

* io_throttled() is not special: it is just a normal task, as far as the tasks above and below it are concerned, and assumes only a recognizable representation of the fundamental I/O and delay instructions used.

* To the extent that said underlying primitives are scheduler-agnostic, io_throttled() can be used or inserted anywhere, without caring how the underlying scheduler or event loop handles I/O, or how its global API looks. It just acts locally, in terms of the task protocol.
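As a rough sketch of what io_throttled() could look like: IORequest and Delay below are hypothetical stand-ins for whatever standard I/O and delay instructions end up being defined, and the throttling policy is deliberately naive (a fixed 1/rate delay before every I/O request after the first, with no clock involved). Forwarding of throw() and close() into the subtask is omitted for brevity.

```python
class IORequest:
    """Hypothetical standard representation of an I/O instruction."""
    def __init__(self, name):
        self.name = name

class Delay:
    """Hypothetical standard delay instruction."""
    def __init__(self, seconds):
        self.seconds = seconds

def io_throttled(subtask, rate):
    """Drive `subtask` directly, spacing out its I/O requests."""
    seen_io = False
    response = None
    while True:
        try:
            instruction = subtask.send(response)
        except StopIteration as stop:
            return stop.value               # pass the subtask's result upwards
        if isinstance(instruction, IORequest):
            if seen_io:
                yield Delay(1.0 / rate)     # extra instruction inserted by us
            seen_io = True
        response = yield instruction        # everything else passes through as-is

def subtask():
    a = yield IORequest("a")
    b = yield IORequest("b")
    return a + b

def task():
    result = yield from io_throttled(subtask(), rate=4)
    return result

def run(task):
    """Toy outer scheduler: records instructions instead of performing them."""
    trace = []
    response = None
    try:
        while True:
            instruction = task.send(response)
            response = None
            if isinstance(instruction, IORequest):
                trace.append("io:" + instruction.name)
                response = instruction.name.upper()  # pretend I/O result
            elif isinstance(instruction, Delay):
                trace.append("delay:%g" % instruction.seconds)
    except StopIteration as stop:
        return stop.value, trace

result, trace = run(task())
```

Neither task() nor subtask() nor the outer run() knows anything about throttling: the outer scheduler just sees one extra Delay instruction in the stream.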
An example where this kind of thing might actually be useful is an application or library that wishes to throttle, say, certain HTTP requests: it could simply internally wrap the tasks that make those requests in io_throttled(), without any special support from the underlying scheduler.

This is of course not the only way to solve this particular problem, but it's an example of how thinking about generator tasks and their schedulers as two sides of the same underlying protocol could be a powerful abstraction, enabling a compositional approach to combining implementations of the protocol that might not be obvious or possible otherwise.

-- 
Piet Delport