Re: [Python-Dev] ActiveState & fork & Perl

Moving back in time ... [GordonM]
Perhaps Christian's stackless Python would enable green threads...
[Guido]
I didn't understand this. If I/O calls are left alone, and a green thread hits one, the whole program just sits there waiting for the call to complete, right? But if the same thing happens using "real threads" today, the same thing happens today anyway <wink>. That is, if a thread doesn't release the global lock before a blocking call today, the whole program just sits there etc. Or do you have some other kind of problem in mind here? unconvincedly y'rs - tim

Still trying to make the brain shift from out-of-town to back-to-work... Tim> [GordonM] >> Perhaps Christian's stackless Python would enable green threads... What's a green thread? Skip

What's a green thread?
a user-level thread (essentially what you can implement yourself by swapping stacks, etc). it's enough to write smoothly running threaded programs, but not enough to support true concurrency on multiple processors. also see: http://www.sun.com/solaris/java/wp-java/4.html </F>
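[A minimal sketch of such user-level threads, using the generators later Pythons grew as the stack-swapping mechanism /F alludes to; the Scheduler, spawn and worker names here are invented for illustration, not from any real library:]

```python
# Each "green thread" is a generator that yields at every switch point;
# a round-robin scheduler resumes them in turn. Cooperative, single OS
# thread -- smooth interleaving, no true concurrency.
from collections import deque

class Scheduler:
    def __init__(self):
        self.ready = deque()

    def spawn(self, gen):
        self.ready.append(gen)

    def run(self):
        while self.ready:
            gen = self.ready.popleft()
            try:
                next(gen)           # run until the thread yields
            except StopIteration:
                continue            # thread finished
            self.ready.append(gen)  # otherwise, reschedule it

def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield                       # voluntary context switch

log = []
sched = Scheduler()
sched.spawn(worker("A", 3, log))
sched.spawn(worker("B", 3, log))
sched.run()
# The two workers interleave: A0 B0 A1 B1 A2 B2
```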

Skip Montanaro wrote:
Nano-Threads. Threadless threads, solely Python driven, no system threads needed but possible. Think of the "big" system threads where each can run any number of tiny Python threads. Powered by snake oil - ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

[Tim]
OK, I'll explain. Suppose there's a wrapper for a read() call whose essential code looks like this:

    Py_BEGIN_ALLOW_THREADS
    n = read(fd, buffer, size);
    Py_END_ALLOW_THREADS

When the read() call is made, other threads can run. However, with green threads (e.g. using Christian's stackless Python, where a thread switcher is easily added) the whole program would block at this point. The way to fix this is to have a way to tell the scheduler "come back to this thread when there's input ready on this fd". The scheduler has to combine such calls from all threads into a single giant select. It gets more complicated when you have blocking I/O wrapped in library functions, e.g. gethostbyname() or fread(). Then you need to have a way to implement sleep() by talking to the thread scheduler (remember, this is the thread scheduler we have to write ourselves). Oh, and of course the thread scheduler must also have a select() lookalike API so I can still implement the select module. Does this help? Or am I misunderstanding your complaint? Or is a <wink> missing? --Guido van Rossum (home page: http://www.python.org/~guido/)
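[Guido's "giant select" idea, sketched with today's generators; every name here is invented, and a real implementation would live inside the interpreter rather than on top of it:]

```python
# Each green thread that would block on a read instead parks itself with
# "wake me when this fd is readable", modeled as a generator yielding the
# fd it waits on. The scheduler folds all parked requests into ONE
# select() call -- the "single giant select" from the message above.
import select, socket

def run(threads):
    waiting = {}                     # fd -> generator parked on it
    runnable = list(threads)
    while runnable or waiting:
        for gen in runnable:
            try:
                fd = next(gen)       # run until it wants to read an fd
                waiting[fd] = gen
            except StopIteration:
                pass
        runnable = []
        if waiting:
            # the single giant select over every parked thread's fd
            ready, _, _ = select.select(list(waiting), [], [])
            runnable = [waiting.pop(fd) for fd in ready]

def reader(sock, out):
    yield sock.fileno()              # park until readable
    out.append(sock.recv(100))

a1, b1 = socket.socketpair()
a2, b2 = socket.socketpair()
out = []
b1.send(b"one"); b2.send(b"two")
run([reader(a1, out), reader(a2, out)])
for s in (a1, b1, a2, b2):
    s.close()
# out now holds b"one" and b"two", in whichever order select reported them
```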

[Tim, claims not to understand Guido's complaint] [Guido replies, sketching an elaborate scheme for making threads that are fake nevertheless act like real threads in the particular case of potentially blocking I/O calls]
No missing wink; I think it hinges on a confusion about the meaning of your original word "useful". Threads can be very useful purely as a means for algorithm structuring, due to independent control flows. Indeed, I use threads in Python most often these days without any hope or even *use* for potential parallelism (overlapped I/O or otherwise). It's the only non-brain-busting way to write code today that requires advanced control flow of the iterator, generator, coroutine, or even independent-agents-in-a-pipeline flavors. Fake threads would allow code like that to run portably, and also likely faster than with the overheads of OS-level threads. For pedagogical and debugging purposes too, fake threads could be very much friendlier than the real thing. Heck, we could even run them on a friendly old Macintosh <wink>. If all fake threads block when any hits an I/O call, waiting for the latter to return, we're no worse off than in a single-threaded program. Being "fake threads", it *is* a single-threaded program, so it's not even a surprise <wink>. Maybe in your

    Py_BEGIN_ALLOW_THREADS
    n = read(fd, buffer, size);
    Py_END_ALLOW_THREADS

you're assuming that some other Python thread needs to run in order for the read implementation to find something to read? Then that's a dead program for sure, as it would be for a single-threaded run today too. I can live with that! I don't expect fake threads to act like real threads in all cases. My assumption was that the BEGIN/END macros would do nothing under fake threads -- since there isn't a real thread backing it up, a fake thread can't yield in the middle of random C code (Python has no way to capture/restore the C state).
I didn't picture fake threads working except as a Python-level feature, with context switches limited to bytecode boundaries (which a stackless ceval can handle with ease; the macro context switch above is "in the middle of" some bytecode's interpretation, and while "green threads" may be interested in simulating that, Tim's "fake threads" aren't). different-threads-for-different-heads-ly y'rs - tim

[Tim responds, explaining that even without this, threads are quite useful.] I guess it's all in the perspective. 99.99% of all thread apps I've ever written use threads primarily to overlap I/O -- if there wasn't I/O to overlap I wouldn't use a thread. I think I share this perspective with most of the thread community (after all, threads originate in the OS world where they were invented as a replacement for I/O completion routines). (And no, I don't use threads to get the use of multiple CPUs, since I almost never have had more than one of those. And no, I wasn't expecting the read() to be fed from another thread.) As far as I can tell, all the examples you give are easily done using coroutines. Can we call whatever you're asking for coroutines instead of fake threads? I think that when you mention threads, green or otherwise colored, most people who are at all familiar with the concept will assume they provide I/O overlapping, except perhaps when they grew up in the parallel machine world. Certainly all examples I give in my never-completed thread tutorial (still available at http://www.python.org/doc/essays/threads.html) use I/O as the primary motivator -- this kind of example appeals to simple souls (e.g. downloading more than one file in parallel, which they have probably already seen in action in their web browser), as opposed to generators or pipelines or coroutines (for which you need some programming theory background to appreciate the powerful abstraction possibilities they give). Another good use of threads (suggested by Sam) is for GUI programming. An old GUI system, NeWS by David Rosenthal at Sun, used threads programmed in PostScript -- very elegant (and it failed for other reasons -- if only he had used Python instead :-). On the other hand, having written lots of GUI code using Tkinter, the event-driven version doesn't feel so bad to me.
Threads would be nice when doing things like rubberbanding, but I generally agree with Ousterhout's premise that event-based GUI programming is more reliable than thread-based. Every time your Netscape freezes you can bet there's a threading bug somewhere in the code. --Guido van Rossum (home page: http://www.python.org/~guido/)
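[The "download more than one file in parallel" example from the tutorial, sketched with the threading module; the network is faked with time.sleep() so the sketch is self-contained, and a lock guards the shared results dict since updates arrive from several threads:]

```python
# One thread per "download"; with urllib in place of fake_fetch this
# becomes the real thing. The URLs are made-up example hosts.
import threading, time

def fake_fetch(url):
    time.sleep(0.1)                 # stand-in for network latency
    return "contents of %s" % url

results = {}
lock = threading.Lock()

def grab(url):
    text = fake_fetch(url)
    with lock:                      # shared dict: serialize the update
        results[url] = len(text)

urls = ["http://a.example/", "http://b.example/", "http://c.example/"]
threads = [threading.Thread(target=grab, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The three 0.1s "downloads" overlap, so the whole run takes about
# 0.1s rather than 0.3s -- the I/O-overlap payoff Guido describes.
```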

Guido van Rossum wrote:
I don't think this would match it. These threads can be implemented by coroutines which always run apart and have some scheduling running. When there is polled I/O available, they can of course give a threaded feeling. If an application polls the kbhit function instead of reading, the other "threads" can run nicely. This can be quite useful for very small computers like CE. Many years ago, I had my own threads under Turbo Pascal (I had no idea that they were called that). Ok, this was DOS, but it was enough of threading to have a "process" which smoothly updated a graphics screen, while another (single! :) "process" wrote data to the disk, a third one handled keyboard input, and a fourth drove a multichannel A/D sampling device. Oops, I just realized that these were *true* threads. The disk process would not run smoothly, I agree. All the rest would be fine with green threads. ...
Right. But with a traceback instead of a machine hang, this could be more attractive to do. Green threads/coroutines are incredibly fast (one C call per switch). And since they have local state, you can save most of the attribute lookups which are needed with event-based programming. (But this is all theory until we've tried it.) ciao - chris

I've been out of town, too (not with Skip), but I'll jump back in here... [Guido]
I suppose, in the best of all possible worlds, this is true. But I'm fairly sure there are a number of well-used green thread implementations which go only part way -- e.g., if this is a "selectable" fd, do a select with a timeout of 0 on this one fd and choose to read/write or swap accordingly. That's a fair amount of bang for the buck, I think... [Tim]
Threads can be very useful purely as a means for algorithm structuring, due to independent control flows.
Spoken like a true schizo, Tim me boyos! Actually, you and Guido are saying almost the same thing - threads are useful when more than one thing is "driving" your processing. It's just that in the real world, that's almost always I/O, not some sick, tortured internal dialogue... I think the real question is: how useful would this be on a Mac? On Win31? (I'll answer that - useful, though I've finally got my last Win31 client to promise to upgrade, RSN <hack, cough>). - Gordon

[Tim]
Threads can be very useful purely as a means for algorithm structuring, due to independent control flows.
FWIW, I've been following the coroutine/continuation/generator bit with 'academic' interest -- the CS part of my brain likes to read about them. Prompted by Tim's latest mention of Demo/threads/Generator.py, I looked at it (again?) and *immediately* grokked it and realized how it'd fit into a tool I'm writing. Nothing to do with concurrency, I/O, etc -- just compartmentalization of stateful iterative processes (details too baroque to go over). More relevantly, that tool would be useful on thread-less Pythons (well, when it reaches usefulness on threaded Pythons =). Consider me pro-generator, and still agnostic on the co* things. --david

[David Ascher]
"stateful iterative process" is a helpful characterization of where these guys can be useful! State captured in variables is the obvious one, but simply "where you are" in a mass of nested loops and conditionals is also "state" -- and a kind of state especially clumsy to encode as data state instead (ever rewrite a hairy recursive routine to use iteration with an explicit stack? it's a transformation that can be mechanized, but the result is usually ugly & often hard to understand). Once it sinks in that it's *possible* to implement a stateful iterative process in this other way, I think you'll find examples popping up all over the place.
More relevantly, that tool would be useful on thread-less Pythons (well, when it reaches usefulness on threaded Pythons =).
As Guido pointed out, the API provided by Generator.py is less restrictive than any that can be built with the "one frame" flavor of generator ("resumable function"). Were you able to make enough sense of the long discussion that ensued to guess whether the particular use you had in mind required Generator.py's full power? If you couldn't tell, post the baroque details & I'll tell you <wink>. not-putting-too-fine-a-point-on-possible-vs-natural-ly y'rs - tim
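[A rough reconstruction of the idea behind Demo/threads/Generator.py -- not its actual code: the producer runs in a real thread, so it can hand back values from arbitrarily deep nested calls, which a one-frame "resumable function" cannot:]

```python
# The Generator class and walk_tree example below are invented for
# illustration. The producer function receives a put() callable and may
# invoke it from any depth of recursion; a size-1 queue provides the
# hand-off between producer thread and consumer.
import threading, queue

class Generator:
    def __init__(self, func):
        self.q = queue.Queue(maxsize=1)   # hand-off channel
        self.thread = threading.Thread(target=self._run, args=(func,))
        self.thread.daemon = True
        self.thread.start()

    def _run(self, func):
        func(self.put)                    # producer may call put() anywhere
        self.q.put((False, None))         # signal exhaustion

    def put(self, value):
        self.q.put((True, value))

    def get(self):
        ok, value = self.q.get()
        if not ok:
            raise EOFError("generator exhausted")
        return value

def walk_tree(put, tree=((1, 2), (3, (4, 5)))):
    # values are produced from inside recursive calls -- the "full
    # power" case a single resumable frame can't express
    def rec(node):
        if isinstance(node, tuple):
            for child in node:
                rec(child)
        else:
            put(node)
    rec(tree)

g = Generator(walk_tree)
out = [g.get() for _ in range(5)]   # the leaves, in left-to-right order
```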

On Sun, 11 Jul 1999, Tim Peters wrote:
I'm pretty sure the use I mentioned would fit in even the simplest version of a generator. As to how much sense I made of the discussion, let's just say I'm glad there's no quiz at the end. I did shudder at the mention of unmentionables (male public displays of affection -- yeaach!), yodel at the mention of Lord Greystoke swinging among stack branches and chuckled at the vision of him being thrown back in a traceback (ouch! ouch! ouch!, "most painful last"...). --david

[Tim]
This is another key description of continuations (maybe not quite worth a hug :). The continuation captures exactly all state that is represented by "position in the program" and no state that is represented by variables. But there are many hairy details. In antiquated assembly, there might not be a call stack, and a continuation could be represented by a single value: the program counter. But now we have a call stack, a value stack, a block stack (in Python) and who knows what else. I'm trying to understand whether we can get away with saving just a pointer to a frame, whether we need to copy the frame, or whether we need to copy the entire frame stack. (In regular Python, the frame stack also contains local variables. These are explicitly exempted from being saved by a continuation. I don't know how Christian does this, but I presume he uses the dictionary which can be shared between frames.) Let's see... Say we have this function:

    def f(x):
        try:
            return 1 + (-x)
        finally:
            print "boo"

The bytecode (simplified) looks like:

        SETUP_FINALLY (L1)
        LOAD_CONST (1)
        LOAD_FAST (x)
        UNARY_NEGATIVE
        BINARY_ADD
        RETURN_VALUE
    L1: LOAD_CONST ("boo")
        PRINT_ITEM
        PRINT_NEWLINE
        END_FINALLY

Now suppose that the unary minus operator saves its continuation (e.g. because x was a type with a __neg__ method). At this point there is an entry on the block stack pointing to L1 as the try-finally block, and the value stack has the value 1 pushed on it. Clearly if that saved continuation is ever invoked (called? used? activated? What do you call what you do to a continuation?) it should substitute whatever value was passed into the continuation for the result of the unary minus, and the program should continue by pushing it on top of the value stack, adding it to 1, and returning the result, executing the block of code at L1 on the way out. So clearly when the continuation is used, 1 should be on the value stack and L1 should be on the block stack.
Assuming that the unary minus function initially returns just fine, the value stack and the block stack of the frame will be popped. So I conclude that saving a continuation must save at least the value and block stack of the frame being saved. Is it safe not to save the value and block stacks of frames further down on the call stack? I don't think so -- these are all destroyed when frames are popped off the call stack (even if the frame is kept alive, its value and block stack are always empty once the function has returned). So I hope that Christian has code that saves the value and block stacks! (It would be fun to try and optimize this by doing it lazily, so that frames which haven't returned yet aren't copied yet.) How does Scheme do this? I don't know if it has something like the block stack, but surely it has a value stack! Still mystified, --Guido van Rossum (home page: http://www.python.org/~guido/)
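[Guido's conclusion as a toy illustration: in a made-up miniature frame (not CPython's), capturing a continuation must snapshot the program counter plus copies of the value and block stacks, because normal execution destroys both:]

```python
# The Frame class and capture_continuation function are invented for
# illustration only; real continuation capture (as in stackless Python)
# works on CPython's C-level frame objects.
import copy

class Frame:
    def __init__(self):
        self.pc = 0
        self.value_stack = []
        self.block_stack = []

def capture_continuation(frame):
    # A bare pointer to the frame is NOT enough: its stacks get popped
    # as execution proceeds, so the snapshot must copy them.
    k = Frame()
    k.pc = frame.pc
    k.value_stack = copy.copy(frame.value_stack)
    k.block_stack = copy.copy(frame.block_stack)
    return k

f = Frame()
f.value_stack.append(1)          # LOAD_CONST 1 has executed
f.block_stack.append("L1")       # the try-finally block is active
f.pc = 3                         # about to do UNARY_NEGATIVE
k = capture_continuation(f)

# normal execution continues and unwinds the frame...
f.value_stack[:] = []
f.block_stack[:] = []

# ...but the continuation still sees 1 on the value stack and L1 on the
# block stack -- exactly what resuming at the BINARY_ADD would need
```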

[Guido wonders about continuations -- must be a bad night for sleep <wink>] Paul Wilson's book-in-progress has a (large) page of HTML that you can digest quickly and that will clear up many mysteries: ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_142.html Scheme may be the most-often implemented language on Earth (ask 100 Schemers what they use Scheme for, persist until you get the truth, and 81 will eventually tell you that mostly they putz around writing their own Scheme interpreter <0.51 wink>); so there are a *lot* of approaches out there. Wilson describes a simple approach for a compiler. A key to understanding it is that continuations aren't "special" in Scheme: they're the norm. Even plain old calls are set up by saving the caller's continuation, then handing control to the callee. In Wilson's approach, "the eval stack" is a globally shared stack, but at any given moment contains only the eval temps relevant to the function currently executing. In preparation for a call, the caller saves away its state in "a continuation", a record which includes:

    - the current program counter
    - a pointer to the continuation record it inherited
    - a pointer to the structure supporting name resolution (locals & beyond)
    - the current eval stack, which gets drained (emptied) at this point

There isn't anything akin to Python's block stack (everything reduces to closures, lambdas and continuations). Note: the continuation is immutable; once constructed, it's never changed. Then the callee's arguments are pushed on the eval stack, a pointer to the continuation as saved above is stored in "the continuation register", and control is transferred to the callee. Then a function return is exactly the same operation as "invoking a continuation": whatever is in the continuation register at the time of the return/invoke is dereferenced, and the PC, continuation register, env pointer and eval stack values are copied out of the continuation record.
The return value is passed back in another "virtual register", and pushed onto the eval stack first thing after the guts of the continuation are restored. So this copies the eval stack all the time, at every call and every return/invoke. Kind of. This is partly why "tail calls" are such a big deal in Scheme: a tail call need not (*must* not, in std Scheme) create a new continuation. The target of a tail call simply inherits the continuation pointer inherited by its caller. Of course many Scheme implementations optimize beyond this.
In the absence of tail calls, the approach above saves the stack on every call and restores it on every return, so there's no "extra" copying needed when capturing, or invoking, a continuation (cold comfort, I agree <wink>). About Christian's code, we'd better let it speak for itself -- I'm not clear on the details of what he's doing today. Generalities:
Yes, but nothing gets copied until a continuation gets captured, and at the start of that I believe only one frame gets cloned.
(It would be fun to try and optimize this by doing it lazily, so that frames which haven't returned yet aren't copied yet.)
He's aware of that <wink>.
How does Scheme do this? I don't know if it has something like the block stack, but surely it has a value stack!
Stacks and registers and such aren't part of the language spec, but, you bet -- however it may be spelled in a given implementation, "a value stack" is there. BTW, many optimizing Schemes define a weaker form of continuation too (call/ec, for "escaping continuation"). Skipping the mumbo jumbo definition <0.9 wink>, you can only invoke one of those if its target is on the path back from the invoker to the root of the call tree (climb up tree like Cheetah, not leap across branches like Tarzan). This amounts to a setjmp/longjmp in C -- and may be implemented that way! i-say-do-it-right-or-not-at-all<wink>-ly y'rs - tim
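[Tim's point that call/ec amounts to setjmp/longjmp can be rendered in Python with an exception, which likewise can only unwind back up the call tree; call_ec and _Escape are invented names:]

```python
# An "escaping continuation": invoking k(v) abandons whatever is between
# the invoker and the call_ec that created k, and makes call_ec return v.
# Like setjmp/longjmp, it only works climbing UP the tree (Cheetah, not
# Tarzan): once call_ec has returned, k is dead.
class _Escape(Exception):
    def __init__(self, tag, value):
        self.tag, self.value = tag, value

def call_ec(func):
    """Call func(k); invoking k(v) unwinds back here and returns v."""
    tag = object()                  # unique marker for THIS call_ec
    def k(value):
        raise _Escape(tag, value)   # the longjmp
    try:
        return func(k)              # the setjmp'd region
    except _Escape as e:
        if e.tag is not tag:
            raise                   # someone else's escape; keep unwinding
        return e.value

def product(nums):
    # escape early with 0 the moment a zero is seen, skipping the rest
    def body(k):
        r = 1
        for n in nums:
            if n == 0:
                k(0)
            r *= n
        return r
    return call_ec(body)

# product([2, 3, 4]) == 24; product([2, 0, 4]) == 0 without multiplying
```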

Tim Peters wrote: ...
Right, maybe this would do enough. We will throw away what's not needed, when we know what we actually need...
i-say-do-it-right-or-not-at-all<wink>-ly y'rs - tim
...and at the moment I think it was right to take it all. just-fixing-continuations-spun-off-in-an-__init__-which-is-quite-hard-since-still-recursive,-and-I-will-ship-it-ly y'rs - chris

Guido van Rossum wrote: ...
You need to preserve the stack and the block stack of a frame if and only if it can be reached twice. I make this dependent on its refcount. Every frame monitors itself before and after every call_function: if a handler field in the frame, "f_callguard", has been set, the callguard is called. Its task is to see whether we must preserve the current state of the frame, and to carry this out. The idea is to create a shadow frame "on demand". When I touch a frame with a refcount > 1, I duplicate it at its f_back pointer. By that it is turned into a "continuation frame" which is nothing more than the stack copy, the IP, and the block stack. By that, the frame stays in place where it was, and all pointers are still fine. The "real" one is now in the back, and the continuation frame's purpose when called is only to restore the state of the "real" one and run it (after doing a new save if necessary). I call this technique "push back frames".
I keep the block stack and a stack copy. All the locals exist only once, and there is only one real frame. Actually it is always a new one (due to push back), but virtually it is "the frame", with multiple continuation frames pointing at it. ...
Clearly if that saved continuation is ever invoked (called? used? activated? What do you call what you do to a continuation?)
I think of throwing. Mine are thrown. The executive of standard frames is "eval_code2_loop(f, passed_retval)", where the executive of a continuation frame is "throw_continuation(f, passed_retval)". ...
:-) I have exactly that, and I do it lazily already. Unless somebody saves a continuation, nothing special happens. But if he does, the push-back process follows his path like a zipper (Reißverschluß) and ensures that the path can be walked again. Tarzan now has the end of this vine in his hand. He might use it to swing over, or he might drop it, and it ripples away and vanishes as if it never existed. Give me some final testing, and you will be able to try it out in a few days. ciao - chris

Backtracking a bit: [Guido]
This is another key description of continuations (maybe not quite worth a hug :).
I suppose a kiss is out of the question, then.
The continuation captures exactly all state that is represented by "position in the program" and no state that is represented by variables.
Right!
As you convinced yourself in following paragraphs, for 1st-class continuations "the entire frame stack" *may* be necessary.
... How does Scheme do this?
I looked up R. Kent Dybvig's doctoral dissertation, at ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/3imp.ps.gz He gives detailed explanations of 3 Scheme implementations there (from whence "3imp", I guess). The first is all heap-based, and looks much like the simple Wilson implementation I summarized yesterday. Dybvig profiled it and discovered it spent half its time in, together, function call overhead and name resolution. So he took a different approach: Scheme is, at heart, just another lexically scoped language, like Algol or Pascal. So how about implementing it with a perfectly conventional shared, contiguous stack? Because that doesn't work: the advanced features (lexical closures with indefinite extent, and user-captured continuations) aren't stack-like. Tough, forget those at the start, and do whatever it takes later to *make* 'em work. So he did. When his stack implementation hit a user's call/cc, it made a physical copy of the entire stack. And everything ran much faster! He points out that "real programs" come in two flavors: 1) Very few, or no, call/cc thingies. Then most calls are no worse than Algol/Pascal/C functions, and the stack implementation runs them at Algol/Pascal/C speed (if we knew of anything faster than a plain stack, the latter would use it). 2) Lots of call/cc thingies. Then "the stack" is likely to be shallow (the program is spending most of its time co-transferring, not recursing deeply), and because the stack is contiguous he can exploit the platform's fastest block-copy operation (no need to chase pointer links, etc). So, in some respects, Dybvig's stack implementation of Scheme was more Pythonic than Python's current implementation <wink -- Dybvig effectively used a Python list at this level, while Python effectively uses a Lispish cons'ed frame list>. His third implementation was for some propeller-head theoretical "string machine", so I won't even mention it. 
worrying-about-the-worst-case-can-hurt-the-normal-cases-ly y'rs - tim

Just so Guido doesn't feel like the question is being ignored <wink>:
... How does Scheme do this? [continuations]
One more reference here. Previously I sketched Wilson's simple heap implementation and Dybvig's simple stack one. They're easy to understand, but are (heap) slow all the time, or (stack) fast most of the time but horribly slow in some cases. For the other extreme end of things, check out: Representing Control in the Presence of First-Class Continuations Robert Hieb, R. Kent Dybvig, and Carl Bruggeman PLDI, June 1990 http://www.cs.indiana.edu/~dyb/papers/stack.ps In part: In this paper we show how stacks can be used to implement activation records in a way that is compatible with continuation operations, multiple control threads, and deep recursion. Our approach allows a small upper bound to be placed on the cost of continuation operations and stack overflow and underflow recovery. ... ordinary procedure calls and returns are not adversely affected. ... One important feature of our method is that the stack is not copied when a continuation is captured. Consequently, capturing a continuation is very efficient, and objects that are known to have dynamic extent can be stack allocated and modified since they remain in the locations in which they were originally allocated. By copying only a small portion of the stack when a continuation is reinstated, reinstatement costs are bounded by a small constant. The basic gimmick is a segmented stack, where large segments are heap-allocated and each contains multiple contiguous frames (across their code base, only 1% of frames exceeded 30 machine words). But this is a complicated approach, best suited for industrial-strength native-code compilers (speed at any cost -- the authors go thru hell to save an add here, a pointer store there, etc). At least at the time the paper was written, it was the approach implemented by Dybvig's Chez Scheme (a commercial native-code Scheme compiler noted for high speed).
Given that Python allocates frames from the heap, I doubt there's a much faster approach than the one Christian has crafted out of his own sweat and blood! It's worth a paper of its own. or-at-least-two-hugs<wink>-ly y'rs - tim

[Guido]
Different perspective indeed! Where I've been, you never used something as delicate as a thread to overlap I/O, you instead used the kernel-supported asynch Fortran I/O extensions <0.7 wink>. Those days are long gone, and I've adjusted to that. Time for you to leave the past too <wink>: by sheer numbers, most of the "thread community" *today* is to be found typing at a Windows box, where cheap & reliable threads are a core part of the programming culture. They have better ways to overlap I/O there too. Throwing explicit threads at this is like writing a recursive Fibonacci number generator in Scheme, but building the recursion yourself by hand out of explicit continuations <wink>.
I have multiple agendas, of course. What I personally want for my own work is no more than Icon's generators, formally "semi-coroutines", and easily implemented in the interpreter (although not the language) as it exists today. Coroutines, fake threads and continuations are much stronger than generators, and I expect you can fake any of the first three given any of the others. Generators fall out of any of them too (*you* implemented generators once using Python threads, and I implemented general coroutines -- "fake threads" are good enough for either of those). So, yes, for that agenda any means of suspending/resuming control flow can be made to work. I seized on fake threads because Python already has a notion of threads. A second agenda is that Python could be a lovely language for *learning* thread programming; the threading module helps, but fake threads could likely help more by e.g. detecting deadlocks (and pointing them out) instead of leaving a thread newbie staring at a hung system without a clue. A third agenda is related to Mark & Greg's, making Python's threads "real threads" under Windows. The fake thread agenda doesn't tie into that, except to confuse things even more if you take either agenda seriously <0.5 frown>.
They didn't suggest I/O to me at all, but I grew up in the disqualified world <wink>; doubt they would to a Windows programmer either (e.g., my employer ships heavily threaded Windows apps of various kinds, and overlapped I/O isn't a factor in any of them; it's mostly a matter of algorithm factoring to keep the real-time incestuous subsystems from growing impossibly complex, and in some of the very expensive apps also a need to exploit multiple processors). BTW, I called them "fake" threads to get away from whatever historical baggage comes attached to "green".
The preceding "99.99% of all thread apps I've ever written use threads primarily to overlap I/O" may explain this <wink>. BTW, there is only one example there, which rather dilutes the strength of the rhetorical "all" ...
I don't at all object to using I/O as a motivator, but the latter point is off base. There is *nothing* in Comp Sci harder to master than thread programming! It's the pinnacle of perplexity, the depth of despair, the king of confusion (stop before I exaggerate <wink>). Generators in particular get re-invented often as a much simpler approach to suspending a subroutine's control flow; indeed, Icon's primary audience is still among the humanities, and even dumb linguists <wink> don't seem to have notable problems picking it up. Threads have all the complexities of the other guys, plus races, deadlocks, starvation, load imbalance, non-determinism and non-reproducibility. Threads simply aren't simple-soul material, no matter how pedestrian a motivating *example* may be. I suspect that's why your tutorial remains unfinished: you had no trouble describing the problem to be solved, but got bogged down in mushrooming complications describing how to use threads to solve it. Even so, the simple example at the end is already flawed ("print" isn't atomic in Python, so the

    print len(text), url

may print the len(text) from one thread followed by the url from another). It's not hard to find simple-soul examples for generators either (coroutines & continuations *are* hard to motivate!), especially since Python's for/__getitem__ protocol is already a weak form of generator, and xrange *is* a full-blown generator; e.g., a common question on c.l.py is how to iterate over a sequence backwards:

    for x in backwards(sequence):
        print x

    def backwards(s):
        for i in xrange(len(s)-1, -1, -1):
            suspend s[i]

Nobody needs a comp sci background to understand what that *does*, or why it's handy. Try iterating over a tree structure instead & then the *power* becomes apparent; this isn't comp-sci-ish either, unless we adopt a "if they've heard of trees, they're impractical dreamers" stance <wink>.
BTW, iterating over a tree is what os.path.walk does, and a frequent source of newbie confusion (they understand directory trees, they don't grasp the callback-based interface; generating (dirname, names) pairs instead would match their mental model at once). *This* is the stuff for simple souls!
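[Tim's (dirname, names) wish, written with the generators Python later grew -- the hypothetical "suspend" became "yield", and the real os.walk now generates (dirpath, dirnames, filenames) triples. Here is the same pattern over an in-memory tree, so it is self-contained:]

```python
# A directory tree as nested dicts (files map to None). The generator
# yields one (dirname, names) pair per directory -- the interface that
# matches the newbie's mental model, with no callbacks in sight.
def walk(tree, dirname="/"):
    names = sorted(tree)
    yield (dirname, names)               # this directory, then...
    for name in names:
        child = tree[name]
        if isinstance(child, dict):      # ...recurse into subdirectories
            for pair in walk(child, dirname.rstrip("/") + "/" + name):
                yield pair

fs = {"etc": {"passwd": None}, "tmp": {}}
result = list(walk(fs))
# [('/', ['etc', 'tmp']), ('/etc', ['passwd']), ('/tmp', [])]
```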
I don't use Netscape, but I can assure you the same is true of Internet Explorer -- except there the threading bug is now somewhere in the OS <0.5 wink>. Anyway,

1) There are lots of good uses for threads, and especially in the Windows and (maybe) multiprocessor NumPy worlds. Those would really be happier with "free-threaded threads", though.

2) Apart from pedagogical purposes, there probably isn't a use for my "fake threads" that couldn't be done easier & better via a more direct (coroutine, continuation) approach; and if I had fake threads, the first thing I'd do for me is rewrite the generator and coroutine packages to use them. So, yes: you win <wink>.

3) Python's current threads are good for overlapping I/O. Sometimes. And better addressed by Sam's non-threaded "select" approach when you're dead serious about overlapping lots of I/O. They're also beaten into service under Windows, but not without cries of anguish from Greg and Mark.

I don't know, Guido -- if all you wanted threads for was to speed up a little I/O in as convoluted a way as possible, you may have been witness to the invention of the wheel but missed that ox carts weren't the last application <wink>. nevertheless-ox-carts-may-be-the-best-ly y'rs - tim

[Tim]
No quibble so far...
They have better ways to overlap I/O there too.
Really? What are they? The non-threaded primitives for overlapping I/O look like Medusa to me: very high performance, but a pain to use -- because of the event-driven programming model! (Or worse, callback functions.) But maybe there are programming techniques that I'm not even aware of? (Maybe I should define what I mean by overlapping I/O -- basically every situation where disk or network I/O or GUI event handling goes on in parallel with computation or with each other. For example, in my book copying a set of files while at the same time displaying some silly animation of sheets of paper flying through the air *and* watching a Cancel button counts as overlapping I/O, and if I had to code this it would probably be a lot simpler to do using threads.)
Aren't you contradicting yourself? You say that threads are ubiquitous and easy on Windows (and I agree), yet you claim that threads are overkill for doing two kinds of I/O or one kind of I/O and some computation in parallel...? I'm also thinking of Java threads. Yes, the GC thread is one of those computational threads you are talking about, but I think the examples I've seen otherwise are mostly about having one GUI component (e.g. an applet) independent from other GUI components (e.g. the browser). To me that's overlapping I/O, since I count GUI events as I/O.
Coroutines, fake threads and continuations? Can you really fake continuations given generators?
Hm. Maybe I'm missing something. Why didn't you simply say "you can fake each of the others given any of these"?
Yes.
What makes them unreal except for the interpreter lock? Python threads are always OS threads, and that makes them real enough for most purposes... (I'm not sure if there are situations on uniprocessors where the interpreter lock screws things up that aren't the fault of the extension writer -- typically, problems arise when an extension does some blocking I/O but doesn't place Py_{BEGIN,END}_ALLOW_THREADS macros around the call.)
Hm, you admit that they sometimes want to use multiple CPUs, which was explicitly excluded from our discussion (since fake threads don't help there), and I bet that they are also watching some kind of I/O (e.g. whether the user says some more stuff).
BTW, I called them "fake" threads to get away from whatever historical baggage comes attached to "green".
Agreed -- I don't understand where green comes from at all. Does it predate Java?
OK, ok. I was planning on more along the same lines. I may have borrowed this idea from a Java book I read.
I dunno, but we're probably both pretty poor predictors for what beginning programmers find hard. Randy Pausch (of www.alice.org) visited us this week; he points out that we experienced programmers are very bad at gauging what newbies find hard, because we've been trained "too much". He makes the point very eloquently. He also points out that in Alice, users have no problem at all with parallel activities (e.g. the bunny's head rotating while it is also hopping around, etc.).
Strange. Maybe dumb linguists are better at simply copying examples without thinking too much about them; personally I had a hard time understanding what Icon was doing when I read about it, probably because I tried to understand how it was done. For threads, I have a simple mental model. For coroutines, my head explodes each time.
No, I simply realized that I had to finish the threading module and release the thread-safe version of urllib.py before I could release the tutorial; and then I was distracted and never got back to it.
Fine -- that's a great excuse to introduce locks in the next section. (Most threading tutorials I've seen start by showing flawed examples to create an appreciation for the need of locks.)
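[Editorial aside: the flaw Tim points out -- two threads interleaving the pieces of one `print` -- is indeed the classic excuse to introduce a lock. A minimal sketch using the modern `threading` module (the function and lock names are mine):]

```python
import threading

print_lock = threading.Lock()

def report(text, url):
    # Without the lock, len(text) from one thread can interleave with
    # the url printed by another; holding the lock makes the two-item
    # print effectively atomic.
    with print_lock:
        print(len(text), url)

threads = [
    threading.Thread(target=report,
                     args=("page%d" % i, "http://example.com/%d" % i))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```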
But backwards() also returns, when it's done. What happens with the return value?
Probably right, although I think that os.path.walk just has a bad API (since it gives you a whole directory at a time instead of giving you each file).
This is independent of Python, and is (I think) fairly common knowledge -- if you have 10 threads this works fine, but with 100s of them the threads themselves become expensive resources. But then you end up with contorted code which is why high-performance systems require experts to write them.
They're also beaten into service under Windows, but not without cries of anguish from Greg and Mark.
Not sure what you mean here.
What were those applications of threads again you were talking about that could be serviced by fake threads that weren't coroutines/generators? --Guido van Rossum (home page: http://www.python.org/~guido/)

Hmmm. I jumped back into this one, but never saw my post show up...

Threads (real or fake) are useful when more than one thing is "driving" your processing. It's just that in the real world (a place Tim visited, once, but didn't like - or was it vice versa?) those "drivers" are normally I/O. Guido complained that to do it right would require gathering up all the fds and doing a select. I don't think that's true (at least, for a decent fake thread). You just have to select on the one (to see if the I/O will work) and swap or do it accordingly. Also makes it a bit easier for portability (I thought I heard that Mac's select is limited to sockets).

I see 2 questions. First, is there enough of an audience (Mac, mostly, I think) without native threads to make them worthwhile? Second, do we want to introduce yet more possibilities for brain-explosions by enabling coroutines / continuations / generators or some such? There is practical value there (as Sam has pointed out, and I now concur, watching my C state machine grow out of control with each new client request). I think the answer to both is probably "yes", and though they have a lot in common technically, they have totally different rationales. - Gordon
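[Editorial aside: Gordon's "select on the one fd and swap" idea amounts to a readiness probe that a fake-thread scheduler would call before letting a thread attempt I/O. A sketch, assuming Unix semantics (the scheduler itself is hypothetical and not shown):]

```python
import select

def can_read(fd):
    # Poll a single descriptor with a zero timeout: if it isn't ready,
    # a fake-thread scheduler would swap in another thread instead of
    # letting this one block the whole program.
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)
```

This sidesteps Guido's "single giant select" only in the sense that each fake thread probes its own fd at switch time; whether that scales is exactly the argument being had above.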

[Gordon McMillan]
Hmmm. I jumped back into this one, but never saw my post show up...
Me neither! An exclamation point because I see there's a recent post of yours in the Python-Dev archives, but I didn't get it in the mail either.
Yes, but that's the consensus view of "real", and so suffers from "ten billion flies can't be wrong" syndrome <wink>. If you pitch a parallel system to the NSA, they give you a long list of problems and ask you to sketch the best way to solve them on your platform; as I recall, none had anything to do with I/O even under Guido's definition; instead tons of computation with difficult twists, and enough tight coupling to make threads the natural approach in most cases. If I said any more they'd terminate me with extreme prejudice, and the world doesn't get any realer than that <wink>.
Can you flesh out the "swap" part more? That is, we're in the middle of some C code, so the C stack is involved in the state that's being swapped, and under fake threads we don't have a real thread to magically capture that.
a) Generators aren't enough for Sam's designs.

b) Fake threads are roughly comparable to coroutines and continuations wrt power (depending on implementation details, continuations may be strictly most powerful, and coroutines least).

c) Christian's stackless Python can, I believe, already do full coroutines, and is close to doing full continuations. So soon we can kick the tires instead of each other <wink>. or-what-the-heck-we-can-all-kick-chris-ly y'rs - tim

[I jump back into a needlessly contentious thread]: [Gordon McMillan - me]
[Tim]
I can assure you that gov't work isn't "real", even when the problem domain appears to be, which in this case is assuredly not true <wink>. But the point really is that (1) Guido's definition of "I/O" is very broad and (2) given that definition, it probably does account for 99% of the cases. Which is immaterial, if the fix for one fixes the others.
Sure - it's spelled "T I S M E R". IIRC, this whole thread started with Guido dumping cold water on the comment that perhaps Chris's work could yield green (er, "fake") threads.
OK, but they're still (minorly) mind expanding for someone from the orthodox C / Python world...
So then we're down to Tim faking the others from whatever Chris comes up with? Sounds dandy to me! (Yah, bitch and moan Tim; you'd do it anyway...). (And yes, we're on the "dev" list; this is all experimental; so Guido can just live with being a bit uncomfortable with it <wink>). The rambling arguments have had to do with "reasons" for doing this stuff. I was just trying to point out that there are a couple valid but very different reasons: 1) Macs. 2) Sam. almost-a-palindrome-ly y'rs - Gordon

"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> Me neither! An exclamation point because I see there's a TP> recent post of yours in the Python-Dev archives, but I didn't TP> get it in the mail either. A bad performance problem in Mailman was causing cpu starvation and (I'm surmising) lost messages. I believe I've fixed this in the version currently running on python.org. If you think messages are showing up in the archives but you are still not seeing them delivered to you, please let me know via webmaster@python.org! -Barry

[Guido and Tim, Guido and Tim] Ouch! This is getting contentious. Let's unwind the "you said, I said, you said" business a bit.

Among the three {coroutines, fake threads, continuations}, I expect any could be serviceably simulated via either of the others. There: just saved a full page of sentence diagramming <wink>. All offer a strict superset of generator semantics. It follows that, *given* either coroutines or continuations, I indeed see no semantic hole that would be plugged by fake threads. But Python doesn't have any of the three now, and there are two respects in which fake threads may have an advantage over the other two:

1) Pedagogical, a friendlier sandbox for learning "real threads".

2) Python already has *a* notion of threads. So fake threads could be seen as building on that (variation of an existing concept, as opposed to something unprecedented).

I'm the only one who seems to see merit in #2, so I won't mention it again: fake threads may be an aid to education, but other than that they're useless crap, and probably cause stains if not outright disk failure <wink>.

About newbies, I've seen far too many try to learn threads to entertain the notion that they're easier than I think. Threads != parallel programming, though! Approaches like Gelernter's Linda, or Klappholz's "refined languages", *are* easy for newbies to pick up, because they provide clear abstractions that prevent the worst parallelism bugs by offering primitives that *can't* e.g. deadlock. threading.py is a step in the right direction (over the "thread" module) too. And while I don't know what Alice presents as a parallelism API, I'd bet 37 cents unseen that the Alice user doesn't build "parallel activities" out of thread.start_new_thread and raw mutii <wink>.
About the rest, I think you have a more expansive notion of I/O than I do, although I can squint and see what you mean; e.g., I *could* view almost all of what Dragon's products do as I/O, although it's a real stretch for the thread that just polls the other threads making sure they're still alive <wink -- part of our text-to-speech system is licensed, and we don't have the source, and it dies in mysterious ways; so we run it in a thread and restart it whenever it vanishes -- can't afford to have the whole app die!>. Back to quoting:
They're a general approach (like continuations) but, yes, given an asynch I/O interface most times I'd much rather use the latter (like I'd rather use recursion directly when it's available). BTW, I didn't say threads were "easy" under Windows: cheap, reliable & ubiquitous, yes. They're easier than under many other OSes thanks to a rich collection of system-supplied thread gimmicks that actually work, but no way are they "easy". Like you did wrt hiding "thread" under "threading", even under Windows real projects have to create usable app-specific thread-based abstractions (c.f. your on-target remark about Netscape & thread bugs).
Whereas I don't. So let's just agree to argue about this one with ever-increasing intensity <wink>.
We should move this part to the Thread-SIG; Mark & Greg are doubtless chomping at the bit to rehash the headaches the global lock causes under Windows <wink>; I'm not so keen either to brush off the potential benefits of multiprocessor parallelism, particularly not with the price of CPUs falling into spare-change range.
Hmm! What kinds of problems happen then? Just a lack of hoped-for overlap, or actual deadlock (the I/O thread needing another thread to proceed for itself to make progress)? If the latter, the extension writer's view of who's at fault may differ from ours <wink -- but I agree with you here>.
I've been ranting about both fake threads and real threads, and don't recall excluding anything; I do think I *should* have, though <smile>.
and I bet that they are also watching some kind of I/O (e.g. whether the user says some more stuff).
Sure, and whether the phone rings, and whether text-to-speech is in progress, and tracking the mouse position, and all sorts of other arguably I/O-like stuff too. Some of the subsystems are thread-unaware legacy or 3rd-party code, and need to run in threads dedicated to them because they believe they own the entire machine (working via callbacks). The coupling is too tight to afford IPC mechanisms, though (i.e., running these in a separate process is not an option).

Mostly it's algorithm-factoring, though: text-to-speech and speech-to-text both require mondo complex processing, and the "I/O part" of each is a small link at an end of a massive chain. Example: you say something, and you expect to see "the result" the instant you stop speaking. But the CPU cycles required to recognize 10 seconds of speech consume, alas, about 10 seconds. So we *have* to overlap the speech collection with the signal processing, the acoustic feature extraction, the acoustic scoring, the comparison with canned acoustics for many tens of thousands of words, the language modeling ("sounded most like 'Guido', but considering the context they probably said 'ghee dough'"), and so on. You simply can't write all that as a monolithic algorithm and have a hope of it working; it's most naturally a pipeline, severely complicated in that what pops out of the end of the first stage can have a profound effect on what "should have come out" at the start of the last stage.

Anyway, thread-based pseudo-concurrency is a real help in structuring all that. It's *necessary* to overlap speech collection (input) with computation and result-so-far display (output), but it doesn't stop there.
Don't know, but I never heard of it before Java or outside of Solaris. [about generators & dumb linguists]
Yes, I expect the trick for "dumb linguists" is that they don't try to understand. They just use it, and it works or it doesn't. BTW, coroutines are harder to understand because of (paradoxically!) the symmetry; generators are slaves, so you don't have to bifurcate your brain to follow what they're doing <wink>.
Even better, they start with an endless sequence of flawed examples that makes the reader wonder if there's *any* way to get this stuff to work <wink>.
But backwards() also returns, when it's done. What happens with the return value?
I don't think a newbie would think to ask that: it would "just work" <wink>. Seriously, in Icon people quickly pick up that generators have a "natural lifetime", and when they return their life is over. It hangs together nicely enough that people don't have to think about it. Anyway, "return" and "suspend" both return a value; the only difference is that "return" kills the generator (it can't be resumed again after a return). The pseudo-Python above assumed that a generator signals the end of its life by returning None. Icon uses a different mechanism.
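[Editorial aside: modern Python settled Guido's question with an exception rather than a None sentinel -- when the generator's frame returns, its life is over and resuming it raises StopIteration. A quick sketch:]

```python
def backwards(s):
    for i in range(len(s) - 1, -1, -1):
        yield s[i]
    # Falling off the end here is the generator's "return": its life is over.

g = backwards("ab")
print(next(g))   # 'b'
print(next(g))   # 'a'
try:
    next(g)      # the frame has returned; resuming raises StopIteration
except StopIteration:
    print("generator's life is over")
```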
Well, in Ping's absence I've generally fielded the c.l.py questions about tokenize.py too, and there's a pattern: non-GUI people simply seem to find callbacks confusing! os.path.walk has some other UI glitches (like "arg" is the 3rd argument to walk but the 1st arg to the callback, & people don't know what its purpose is anyway), but I think the callback is the core of it (& "arg" is an artifact of the callback interface). I can't help but opine that part of what people find so confusing about call/cc in Scheme is that it calls a function taking a callback argument too. Generators aren't strong enough to replace call/cc, but they're exactly what's needed to make tokenize's interface match the obvious mental model ("the program is a stream of tokens, and I want to iterate over that"); c.f. Sam's comments too about layers of callbacks vs "normal control flow".
I think people with a Unix background understand that, but not sure about Windows natives. Windows threads really are cheap, which easily slides into abuse; e.g., the recently-fixed electron-width hole in cleaning up thread states required extreme rates of thread death to provoke, and has been reported by multiple Windows users. An SGI guy was kind enough to confirm the test case died for him too, but did any non-Windows person ever report this bug?
But then you end up with contorted code which is why high-performance systems require experts to write them.
Which feeds back into Sam's agenda: the "advanced" control-flow gimmicks can be used by an expert to implement a high-performance system that doesn't require expertise to use. Fake threads would be good enough for that purpose too (while real threads aren't), although he's got his heart set on one of the others.
What were those applications of threads again you were talking about that could be serviced by fake threads that weren't coroutines/generators?
First, let me apologize for the rhetorical excess there -- it went too far. Forgive me, or endure more of the same <wink>. Second, the answer is (of course) "none", but that was a rant about real threads, not fake ones. so-close-you-can-barely-tell-'em-apart-ly y'rs - tim

Just a few clarifications. I have no time, but need to share what I learned. Tim Peters wrote:
[Guido and Tim, Guido and Tim]
...
I have just proven that this is not true. Full continuations cannot be expressed by coroutines. All the rest is true.

Coroutines and fake threads just need the absence of the C stack. To be more exact: it needs that the current state of the C stack is independent of executing bound Python code (which frames are). Now the big surprise: this *can* be done without removing the C stack. It can give more speed to let the stack wind up to some degree and wind down later. Even some Scheme implementations are doing this. But the complexity to make this work correctly is even higher than to be stackless whenever possible. So this is the basement, but improvements are possible and likely to appear.

Anyway, with this, you can build fake threads, coroutines and generators. They all need a little extra treatment: switching of context, how to stop a coroutine, how to catch exceptions and so on. You can do all that with some C code. I just believe that even that can be done with Python. Here the unsayable continuation word appears. You must have them if you want to try the above *in* Python.

Reason why continuations are the hardest of the above to implement, and cannot be expressed by the others: a continuation is the future of some computation. It allows changing the order of execution of a frame in a radical way. A frame can have as many as one dormant continuation per every function call which appears lexically, and it cannot predict which of these is actually a continuation.

I have to say thank you, christian! I think your intent - provide the basis for designers of python's advanced control mechanisms to truly explore, and choose the direction in a well informed way - is ideal, and it's a rare and wonderful opportunity to be able to pursue something like an ideal course. Thanks to your hard work. Whatever comes of this, i think we all have at least refined our understandings of the issues - i know i have. (Thanks also to the ensuing discussion's clarity and incisiveness - i need to thank everyone involved for that...) I may not be able to contribute particularly to the implementation, but i'm glad to be able to grasp the implications as whatever proceeds, proceeds. And i actually expect that the outcome will be much better informed than it would have been without your following through on your own effort to understand. Yay! Ken klm@digicool.com

I'll second Ken's congratulations to Christian! [Christian]
... Full continuations cannot be expressed by coroutines. All the rest is true.
I beg enlightenment from someone more familiar with these high-falutin' concepts. Would the following characterization be accurate? All these beasts (continuations, coroutines, generators) involve the idea of "resumable", but:

    A generator's state is wholly self-contained
    A coroutine's state is not necessarily self-contained, but it is stable
    Continuations may have volatile state.

Is this right, wrong, necessary, sufficient...?? goto-beginning-to-look-attractive-ly y'rs - Gordon

[Gordon]
"goto" is deliciously ironic, for a reason to become clear <wink>. Here's my biased short course.

NOW

First, I have the feeling most people would panic if we simply described Python's current subroutine mechanism under a new name <0.9 wink>. I'll risk that.

When Python makes a call, it allocates a frame object. Attached to the frame is the info everyone takes for granted so thinks is "simple & obvious" <wink>. Chiefly,

    "the locals" (name -> object bindings)
    a little evaluation stack for holding temps and dynamic block-nesting info
    the offset to the current bytecode instruction, relative to the start of
        the code object's fixed (immutable) bytecode vector

When a subroutine returns, it decrefs the frame and then the frame typically goes away; if it returns because of an exception, though, traceback objects may keep the frame alive.

GENERATORS

Generators add two new abstract operations, "suspend" and "resume". When a generator suspends, it's exactly like a return today except we simply decline to decref the frame. That's it! The locals, and where we are in the computation, aren't thrown away. A "resume" then consists of *re*starting the frame at its next bytecode instruction, with the retained frame's locals and eval stack just as they were.

Some generator properties:

+ In implementation terms a trivial variation on what Python currently does.

+ They're asymmetric: "suspend" is something only a generator can do, and "resume" something only its caller can do (this does not preclude a generator from being "the caller" wrt some other generator, though, and indeed that's very useful in practice).

+ A generator always returns control directly to its caller, at the point the caller invoked the generator. And upon resumption, a generator always picks up where it left off.

+ Because a generator remembers where it is and what its locals are, its state and "what to do next" don't have to be encoded in global data structures then decoded from scratch upon entry.
That is, whenever you build a little (or large!) state machine to figure out "what to do next" from a collection of persistent flags and state vrbls, chances are good there's a simple algorithm dying to break free of that clutter <wink>.

COROUTINES

Coroutines add only one new abstract operation, "transfer". They're fully symmetric so can get away with only one. "transfer" names a coroutine to transfer to, and gives a value to deliver to it (there are variations, but this one is common & most useful). When A transfers to B, it acts like a generator "suspend" wrt A and like a generator "resume" wrt B. So A remembers where it is, and what its locals etc are, and B gets restarted from the point *it* last transferred to someone else.

Coroutines grew up in simulation languages because they're an achingly natural way to model independent objects that interact with feedback. There each object (which may itself be a complex system of other stuff) is written as an infinite loop, transferring control to other objects when it has something to tell them, and transferred to by other objects when they have something to tell it. A Unix pipeline "A | B | C | D" doesn't exploit the full power but is suggestive. A may be written as

    while 1:
        x = compute my next output
        B.transfer(x)     # resume B with my output

B as

    while 1:
        x = A.transfer()  # resume A to get my input
        y = compute something from x and my own history
        C.transfer(y)     # resume C with my output

C as

    while 1:
        x = B.transfer()
        y = compute something from x and my own history
        D.transfer(y)     # resume D with my output

and D as

    while 1:
        x = C.transfer()  # resume C to get my input
        y = compute something from x and my own history
        print y

If e.g. C collapses pairs of values from B, it can be written instead as

    while 1:
        # get a pair of B's
        x = B.transfer()
        y = B.transfer()
        z = f(x, y, whatever)
        D.transfer(z)     # resume D with my output

It's a local modification to C: B doesn't know and shouldn't need to know.
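[Editorial aside: Python never grew a symmetric `transfer`, but for this particular pipeline the asymmetric generators it did grow are enough, since data only ever flows downstream. A sketch (the stage bodies and `f` are stand-ins of my own invention):]

```python
def A():
    for x in range(6):            # "compute my next output"
        yield x

def B(source):
    for x in source:
        yield x * 10              # something from x "and my own history"

def C(source):
    # Collapse pairs of B's outputs, like Tim's modified C.
    it = iter(source)
    while True:
        try:
            x = next(it)
            y = next(it)
        except StopIteration:
            return
        yield x + y               # f(x, y, whatever)

def D(source):
    # The final consumer at the end of the pipe.
    return [y for y in source]

print(D(C(B(A()))))  # [10, 50, 90]
```

Note the asymmetry Tim describes: each stage is resumed only by its downstream caller, so this cannot express transfers that jump "sideways" -- that is exactly what full coroutines add.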
This keeps complex algorithms manageable as things evolve. Initialization and shutdown can be delicate, but once the pipe is set up it doesn't even matter which of {A, B, C, D} first gets control! You can view A as pushing results through the pipe, or D as pulling them, or whatever. In reality they're all equal partners.

Why these are so much harder to implement than generators: "transfer" *names* who next gets control, while generators always return to their (unnamed) caller. So a generator simply "pops the stack" when it suspends, while coroutine flow need not be (and typically isn't) stack-like. In Python this is currently a coroutine-killer, because the C stack gets intertwined. So if coroutine A merely calls (in the regular sense) function F, and F tries to transfer to coroutine B, the info needed to resume A includes the chunk of the C stack between A and F. And that's why the Python coroutine implementation I referenced earlier uses threads under the covers (where capturing pieces of the C stack isn't a problem).

Early versions of coroutines didn't allow for this, though! At first coroutines could only transfer *directly* to other coroutines, and as soon as a coroutine made "a regular call" transfers were prohibited until the call returned (unless the called function kicked off a brand new collection of coroutines, which could then transfer among themselves -- making the distinction leads to convoluted rules, so modern practice is to generalize from the start). Then the current state of each coroutine was contained in a single frame, and it's really no harder to implement than generators. Knuth seems to have this restricted flavor of coroutine in mind when he describes generator behavior as "semi-coroutine".

CONTINUATIONS

Given the pedagogical structure so far, you're primed to view continuations as an enhancement of coroutines. And that's exactly what will get you nowhere <wink>. Continuations aren't more elaborate than coroutines, they're simpler.
Indeed, they're simpler than generators, and even simpler than "a regular call"! That's what makes them so confusing at first: they're a different *basis* for *all* call-like behavior. Generators and coroutines are variations on what you already know; continuations challenge your fundamental view of the universe.

Legend has it they were discovered when theorists were trying to find a solid reason for why goto statements suck: the growth of "denotational semantics" (DS) boomed at the same time "structured programming" took off. The former is a solid & fruitful approach to formally specifying the semantics of programming languages, built on the lambda calculus (and so dear to the Lisp/Scheme community -- this all ties together, of course <wink>). The early hope was that goto statements would prove to present intractable problems for formal specification, and then "that's why they suck: we can't even sort them out on paper, let alone in practice".

But in one of God's cleverer tricks on the programming world <tee hee>, the semantics of goto turned out to be trivial: at a branch point, you can go one of two ways. Represent one of those ways by a function f that computes what happens if you branch one way, and the other way by a function g. Then an if+goto simply picks one of f or g as "the continuation" of the program, depending on whether the "if" condition is true or false. And a plain goto simply replaces the current continuation with a different one (representing what happens at the branch target) unconditionally. So goto turned out to be simpler (from the DS view) than even an assignment stmt!

I've often suspected theorists were *surprised* (and maybe appalled <0.7 wink>) when the language folks went on to *implement* the continuation idea. Don't really know, but suppose it doesn't matter anyway. The fact is we're stuck with them now <wink>. In theory a continuation is a function that computes "the rest of the program", or "its future".
And it really is like a supercharged goto! It's the formal DS basis for all control flow, from goto stmts to exception handling, subsuming vanilla call flow, recursion, generators, coroutines, backtracking, and even loops along the way. To a certain frame of mind (like Sam's, and Christian is temporarily under his evil influence <wink>), this relentless uniformity & consistency of approach is very appealing. Guido tends to like his implementations to mirror his surface semantics, though, and if he has ten constructs they're likely to be implemented ten ways. View that as a preview of future battles that have barely been hinted at so far <0.3 wink>.

Anyway, in implementation terms a continuation "is like" what a coroutine would be if you could capture its resumption state at any point (even without the coroutine's knowledge!) and assign that state to a vrbl. So we could say it adds an abstract operation "capture", which essentially captures the program counter, call stack, and local (in Python terms) "block stack" at its point of invocation, and packages all that into a first-class "continuation object". IOW, a building block on top of which a generator's suspend, and the suspend half of a coroutine transfer, can be built.

In a pure vision, there's no difference at all between a regular return and the "resume" half of a coroutine transfer: both amount to no more than picking some continuation to evaluate next. A continuation can be captured anywhere (even in the middle of an expression), and any continuation can be invoked at will from anywhere else. Note that "invoking a continuation" is *not* like "a call", though: it's abandoning the current continuation, *replacing* it with another one. In formal DS this isn't formally true (it's still "a call" -- a function application), but in practice it's a call that never returns to its caller so the implementation takes a shortcut.
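[Editorial aside: the "goto is trivial in DS" story can be made concrete in a few lines of continuation-passing style, where "the rest of the program" is an explicit function k. A sketch; all the names are mine, not from any library:]

```python
def branch(cond, then_k, else_k):
    # An if+goto in DS terms: pick one of two continuations (f or g in
    # Tim's telling) and run it; nothing ever "falls through".
    return then_k() if cond else else_k()

def absolute(n, k):
    # k is the continuation: whatever the program does with the answer.
    return branch(n < 0,
                  lambda: k(-n),   # one possible future of the program
                  lambda: k(n))    # the other

print(absolute(-7, lambda v: v + 1))  # 8: the "rest of the program" adds 1
```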
Like a goto, this is as low-level as it gets, and even hard-core continuation fans don't use them directly except as a means to implement better-behaved abstractions.

As to whether continuations have "volatile state", I'm not sure what that was asking. If a given continuation is invoked more than once (which is something that's deliberately done when e.g. implementing backtracking searches), then changes made to the locals by the first invocation are visible to the second (& so on), so maybe <wink> the answer is "yes". It's more accurate to think of a continuation as being immutable, though: it holds a reference to the structure that implements name bindings, but does not copy (save or restore) the bindings.

Quick example, given:

    (define continuation 0)

    (define (test)
      (let ((i 0))
        (call/cc (lambda (k) (set! continuation k)))
        (set! i (+ i 1))
        i))

That's like the Python:

    def test():
        i = 0
        global continuation
        continuation = magic to resume at the start of the next line
        i = i + 1
        return i

Then (this is interactive output from a Scheme shell):

    > (test)
    1
    > (continuation)
    2
    > (continuation)
    3

Each invocation of the saved continuation re-enters test's body at the increment, sharing the same binding of i, so the result keeps climbing.
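As an aside: real call/cc can't be written in Python, but the shared-binding flavor of this example can be loosely mimicked with a (deliberately weaker) generator sketch -- it can only resume at its one suspension point, but each resume does see the prior increments, just as each invocation of the saved Scheme continuation does:

```python
def test():
    # Each resumption re-enters the *same* binding of i, so successive
    # resumes see the prior increments -- a pale shadow of the
    # continuation example, resumable only at the yield.
    i = 0
    while True:
        i = i + 1
        yield i

continuation = test()
print(next(continuation))  # 1
print(next(continuation))  # 2
print(next(continuation))  # 3
```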
too-simple-to-be-obvious?-ly y'rs - tim

Wow. That was by far the clearest tutorial on the subject I think I've read. I guess we need (for Tim to have) more 3-day holiday weekends. i-vote-we-pitch-in-and-pay-tim-to-take-/every/-monday-off-so-he-can-write-more-great-stuff-like-this-ly y'rs, -Barry

Barry> Wow. That was by far the clearest tutorial on the subject I
Barry> think I've read. I guess we need (for Tim to have) more 3 day
Barry> holiday weekends.

What he said. Skip

NOW No problems, fine sailing...
GENERATORS
Cruising along - nice day to be out!
COROUTINES
Such a pleasant day!
CONTINUATIONS
Are they clouds I see?
A storm warning...
She's taking on water!
In theory a continuation is a function that computes "the rest of the program", or "its future".
OK - before I abandon ship, I might need my hand held. Before I start, let me echo Skip and Barry - an excellent precis of a topic I knew nothing about (as I made you painfully aware!) And I will avoid asking you to explain the above paragraph again for now :-)

I'm a little confused by how these work in practice. I can see how continuations provide the framework to do all these control things. It is clear to me how you can capture the "state" of a running program. Indeed, this is exactly what it seems generators and coroutines do. With continuations, how is the state captured or created? E.g., in the case of implementing a goto or a function call, there doesn't seem to me to be a state available. Does the language supporting continuations allow you to explicitly create one from any arbitrary position? I think you sort-of answered that below:
This makes sense, although it implies a "running state" is necessary for this to work. In the case of transferring control to somewhere you have never been before (e.g., a goto or a new function call), how does this work? Your example:
My main problem is that this looks closer to your description of a kind-of one-sided coroutine - i.e., instead of only being capable of transferring control, you can assign the state. I can understand that fine. But in the example, the function _is_ aware its state is being captured - indeed, it is explicitly capturing it.

My only other slight conceptual problem was how you implement functions, as I don't understand how the concept of return values fits in at all. But I'm sure that would become clearer when the rest of the mud is wiped from my eyes.

And one final question: In the context of your tutorial, what do Chris' latest patches arm us with? Given my new-found expertise in this matter <wink> I would guess that the guts is there to have at least coroutines, as capturing the state of a running Python program, and restarting it later, is possible. I'm still unclear about continuations WRT "without the coroutine's knowledge", so really unsure what is needed here...

The truly final question :-) Assuming Chris' patches were 100% bug free and reliable (I'm sure they are very close :-), what would the next steps be to take advantage of them in a "clean" way? I.e., assuming Guido blesses them, what exactly could I do in Python? (All I really know is that the C stack has gone - that's it!)

Thanks for the considerable time it must be taking to enlighten us! Mark.

[Mark Hammond]
... Thanks for the considerable time it must be taking to enlighten us!
You're welcome, but the holiday weekend is up and so is my time. Thank (all of) *you* for the considerable time it must take to endure all this <wink>! Let's hit the highlights (or lowlights, depending on your view):
... Im a little confused by how these [continuations] work in practice.
Very delicately <no wink>. Earlier I posted a continuation-based implementation of generators in Scheme, and Sam did the same for a hypothetical Python with a call/cc equivalent. Those are typical enough. Coroutines are actually easier to implement (using continuations), thanks to their symmetry. Again, though, you never want to muck with continuations directly! They're too wild. You get an expert to use them in the bowels of an implementation of something else.
Except that generators need only worry about their own frame. Another way to view it is to think of the current computation being run by a (real <wink>) thread -- then capturing a continuation is very much like making a frozen clone of that thread, stuffing it away somewhere for later thawing.
With continuations, how is the state captured or created?
There are, of course, many ways to implement these things. Christian is building them on top of the explicit frame objects Python already creates, and that's a fine way for Python. Guido views it as cloning the call stack, and that's accurate too.
This makes sense, although it implies a "running state" is necessary for this to work.
In implementations (like Chris's) that do it all dynamically at runtime, you bet: you not only need a "running state", you can only capture a continuation at the exact point (the specific bytecode) you run the code to capture it. In fact, there *is* "a continuation" at *every* dynamic instance of every bytecode, and the question is then simply which of those you want to save <wink>.
In the case of transfering control to somewhere you have never been before (eg, a goto or a new function call) how does this work?
Good eye: it doesn't in this scheme. The "goto" business is a theoretical transformation, in a framework where *every* operation is modeled as a function application, and an entire program's life is modeled as a single function call. Some things are very easy to do in theory <wink>.
Good!
But in the example, the function _is_ aware its state is being captured - indeed, it is explicitely capturing it.
In real life, "magic to resume at the start of the next line" may be spelled concretely as e.g. xyz(7) or even a.b. That is, anywhere in "test" any sort of (explicit or implicit) call is made *may* be part of a saved continuation, because the callee can capture one -- with or without test's knowledge.
My only other slight conceptual problem was how you implement functions, as I dont understand how the concept of return values fits in at all.
Ya, I didn't mention that. In Scheme, the act of capturing a continuation returns a value. Like so:

    (define c #f)  ; senseless, but Scheme requires definition before reference

    (define (test)
      (print (+ 1 (call/cc (lambda (k)
                             (set! c k)
                             42))))
      (newline))

The function called by call/cc there does two things:

1) Stores call/cc's continuation into the global "c".
2) Returns the int 42.
    > (test)
    43
Is that clear? The call/cc expression returns 42. Then (+ 1 42) is 43; then (print 43) prints the string "43"; then (newline) displays a newline; then (test) returns to *its* caller, which is the Scheme shell's read/eval/print loop. Now that whole sequence of operations-- what happens to the 42 and beyond --*is* "call/cc's continuation", which we stored into the global c. A continuation is itself "a function", that returns its argument to the context where the continuation was captured. So now e.g.
    > (c 12)
    13
c's argument (12) is used in place of the original call/cc expression; then (+ 1 12) is 13; then (print 13) prints the string "13"; then (newline) displays a newline; then (test) returns to *its* caller, which is *not* (c 12), but just as originally is still the Scheme shell's read/eval/print loop. That last point is subtle but vital, and maybe this will make it clearer:
The continuation of (c 12) includes printing "Tim lied!", but invoking a continuation *abandons* the current continuation in favor of the invoked one. Printing "Tim lied!" wasn't part of c's future, so that nasty slur about Tim never gets executed. But:
This is why I stick to trivial examples <wink>.
Christian is taking his work very seriously here, and as a result is flailing a bit trying to see whether it's possible to do "the 100% right thing". I think he's a lot closer than he thinks he is <0.7 wink>, but in any case he's at worst very close to having full-blown continuations working. Coroutines already work.
Nothing. What would you like to do? Sam & I tossed out a number of intriguing possibilities, but all of those build *on* what Christian is doing. You won't get anything useful out of the box unless somebody does the work to implement it. I personally have wanted generators in Python since '91, because they're extremely useful in the useless things that I do <wink>. There's a thread-based generator interface (Generator.py) in the source distribution that I occasionally use, but that's so slow I usually recode in Icon (no, I'm not a Scheme fan -- I *admire* it, but I rarely use it). I expect rebuilding that on Christian's work will yield a factor of 10-100 speedup for me (beyond losing the thread/mutex overhead, as Chris just pointed out on c.l.py resumption should be much faster than a Python call, since the frame is already set up and raring to go). Would be nice if the language grew some syntax to make generators pleasant as well as fast, but the (lack of) speed is what's really killing it for me now. BTW, I've never tried to "sell" coroutines -- let alone continuations. Just generators. I expect Sam will do a masterful job of selling those. send-today-don't-delay-couldn't-give-or-receive-a-finer-gift-ly y'rs - tim
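The thread-based generator approach Tim mentions (Generator.py) can be sketched in today's Python roughly as below; the class name and the put-callback interface here are invented for illustration, not Generator.py's actual API. Each generator runs in its own thread, and a pair of semaphores hands control back and forth so producer and consumer never run at once -- which is exactly where the thread/mutex overhead Tim complains about comes from:

```python
import threading

class ThreadGenerator:
    """Run func(put) in a worker thread; each call to put(value)
    blocks until the consumer asks for the next value."""
    _DONE = object()  # sentinel marking exhaustion

    def __init__(self, func):
        self._ready = threading.Semaphore(0)   # consumer wants a value
        self._done = threading.Semaphore(0)    # producer delivered one
        self._value = None

        def run():
            self._ready.acquire()              # wait for first request
            func(self._put)
            self._value = ThreadGenerator._DONE
            self._done.release()

        threading.Thread(target=run, daemon=True).start()

    def _put(self, value):
        self._value = value
        self._done.release()      # hand value to consumer
        self._ready.acquire()     # suspend until resumed

    def __iter__(self):
        return self

    def __next__(self):
        self._ready.release()     # resume producer
        self._done.acquire()      # wait for its value
        if self._value is ThreadGenerator._DONE:
            raise StopIteration
        return self._value

def count_to(n):
    def body(put):
        for i in range(1, n + 1):
            put(i)
    return ThreadGenerator(body)

print(list(count_to(3)))  # [1, 2, 3]
```

Every produced value costs two full context switches, which is why a stackless resumption (just swapping frames) should beat this by a wide margin.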

Tim Peters wrote:
Just to let you know that I'm still there, thinking, not coding, still hesitating, but maybe I can conclude now and send it off. This discussion, and especially Tim's input, was extremely helpful. He has spent considerable time reading my twisted examples, writing his own, hitting my chin, kicking my -censored-, and proving to me that the truth I was searching for doesn't exist. ...
Maybe with one exception: With careful coding, you can use a continuation at the head of a very deep recursion and use it as an early break if the algorithm fails. The effect is the same as bailing out with an exception, despite the fact that no "finally" clauses would be obeyed. It is just an incredibly fast jump out of something if you know what you are doing.
Actually, it is both! What I use (and it works fine) are so-called "push-back frames". My continuations always appear in some call. In order to make the caller able to be resumed, I create a push-back frame *from* it. That means my caller frame is duplicated behind its "f_back" pointer. The original frame stays in place but now becomes a continuation frame with the current stack state preserved. All other locals and stuff are moved to the clone in the f_back, which is now the real one. This always works fine, since references to the original caller frame are all intact; just the frame's meaning is modified a little. Well, I will have to write a good paper... ...
I believe so. Well, I admit that the continuation approach is slightly too much for the coroutine/generator case, since they exactly don't have the problem where continuations suffer a little: switching between frames which cannot be reached more than once at a time doesn't need the stack copying/pushback at all. I'm still staying on the secure side for now. But since I have all refcounting accurate already, we can use it to figure out if a frame needs to be copied at all.
How about "amb"? :-) (see "Teach Yourself Scheme in Fixnum Days", chapter 14, at http://www.cs.rice.edu/~dorai/t-y-scheme/t-y-scheme-Z-H-15.html#%_chap_14)

About my last problems: The hard decision is:

- Either I just stop and I'm ready already, and loops are funny.
- Or I do the hidden register search, which makes things more complicated and also partially voids the pushback trick, since then I would manage all stack stuff in one frame.
- Or, and that's what I will do finally: For now, I will really just correct the loops.

Well, that *is* a change to Python again, but no semantic change. The internal loop counter will no longer be an integer object, but a mutable integer box. I will just create a one-element integer array and count with its zero element. This is correct, since the stack value isn't popped off, so all alive stack copies share this one element. As a side effect, I save the Object/Integer conversion, so I guess it will be faster. *And* this solution does not involve any other change, since the stack layout is identical to before. -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

[Christian]
You don't need continuations for this, though; e.g., in Icon I've done this often, by making the head of the deep recursion a co-expression, doing the recursion via straight calls, and then doing a coroutine resumption of &main when I want to break out. At that point I set the coexp to &null, and GC reclaims the stack frames (the coexp is no longer reachable from outside) when it feels like it <wink>. This is a particularly simple application of coroutines that could be packaged up in a simpler way for its own sake; so, again, while continuations may be used fruitfully under the covers here, there's still no reason to make a poor end user wrestle with them.
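The "bail out of a deep recursion with an exception" route Tim alludes to is easy to make concrete in plain Python; all the names below are invented for illustration:

```python
class Found(Exception):
    """Carries a result out of a deep recursion in one jump."""
    def __init__(self, value):
        self.value = value

def search(tree, want):
    """Depth-first search; the first hit abandons the whole
    pending call stack at once instead of unwinding by hand."""
    def walk(node):
        if node is None:
            return
        value, left, right = node
        if value == want:
            raise Found(value)   # unwind every pending frame in one shot
        walk(left)
        walk(right)
    try:
        walk(tree)
    except Found as hit:
        return hit.value
    return None

# a node is (value, left, right); leaves have None children
tree = (1, (2, (4, None, None), (5, None, None)),
           (3, None, None))
print(search(tree, 5))  # 5
print(search(tree, 9))  # None
```

Unlike the raw-continuation jump Christian describes, this one does honor "finally" clauses on the way out -- which is usually what you want anyway.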
... Well, I admit that the continuation approach is slightly too much for the coroutine/generator case,
It's good that you admit that, because generators alone could have been implemented with a 20-line patch <wink>. BTW, I expect that by far the bulk of your changes *still* amount to what's needed for disentangling the C stack, right? The continuation implementation has been subtle, but so far I've gotten the impression that it requires little code beyond that required for stacklessness.
That's the point at which I think continuations get insane: it's an unreasonably convoluted implementation of a straightforward (via other means) backtracking framework. In a similar vein, I've read 100 times that continuations can be used to implement a notion of (fake) threads, but haven't actually seen an implementation that wasn't depressingly subtle & long-winded despite being just a feeble "proof of concept". These have the *feeling* of e.g. implementing generators on top of real threads: ya, you can do it, but nobody in their right mind is fooled by it <wink>.
OK by me -- forgetting implementation, I still can't claim to know what's the best semantic here.
Bleech.
Ah, very clever! Yes, that will fly -- the continuations will share a reference to the value rather than the value itself. Perfect!
Right, no downside at all. Except that Guido will hate it <wink>. there's-a-disturbance-in-the-force-ly y'rs - tim

Tim Peters wrote: ...
Well.

    def longcomputation(prog, *args, **kw):
        # prog must be something that takes a return function as first arg
        return quickreturn(prog, args, kw)

    # quickreturn could be done as so:
    def quickreturn(prog, args, kw):
        cont = getpcc()                    # get parent's continuation
        def jumpback(val=None, cont=cont):
            putcc(cont, val)               # jump to continuation
        apply(prog, (jumpback,) + args, kw)
    # and if they want to jump out, they call jumpback with
    # an optional return value.

Can't help it, it still is continuation-ish.
Right. You will see soon. The only bit which cont's need more than coro's is to save more than one stack state for a frame. So, basically, it is just the frame copy operation. If I was heading just for coroutines, then I could save that, but then I would need to handle special cases like exceptions, what to do on return, and so on. Easier to do that stuff right once. Then I will never dump code for an unforeseen coro-effect, since with cont's, I *may* jump in and bail out wherever I want or don't want. The special cases come later and will be optimized, and naturally they will reduce themselves to what's needed.

Example: If I just want to switch to a different coro, I just have to swap two frames. This leads to a data structure which can hold a frame and exchange it with another one. The cont-implementation does something like:

    fetch my current continuation    # and this does the frame copy stuff
    save into local state variable
    fetch cont from other coro's local state variable
    jump to new cont

Now, if the source and target frames are guaranteed to be different, and if the source frame has no dormant extra cont attached, then it is safe to merge the above steps into one operation, without the need to save local state. In the end, two coro's will jump to each other by doing nothing more than this. Exactly that is what Sam's prototype does right now. What he's missing is treatment of the return case. If a coro returns towards the place where it was forked off, then we want to have a cont which is able to handle it properly. That's why exceptions work fine with my stuff: You can put one exception handler on top of all your coroutines which you create. It works without special knowledge of coroutines. After I realized that, I knew the way to go.
Maybe this is a convoluted implementation. But the principle? Return a value to your caller, but stay able to continue and do this again. Two continuations, and with the optimizations from above, it will be nothing. I will show you the code in a few, and you will realize that we are discussing the empty set. The frames have to be used, and the frames are already continuations. Only if they can be reached twice will they have to be armed for that. Moving back to my new "more code - less words" principle. [mutable ints as loop counters]
Ah, very clever! Yes, that will fly -- the continuations will share a reference to the value rather than the value itself. Perfect!
Actually I'm copying some code out of Marc's counterobject which is nothing more than a mutable integer and hide it in ceval.c, since that doesn't introduce another module for a thing which isn't needed elsewhere, after Guido's hint. Better than to use the array module which doesn't publish its internals and might not always be linked in.
Right, no downside at all. Except that Guido will hate it <wink>.
I made sure that this is what he hates the least. off-for-coding-ly y'rs - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

Hokay. I *think* i have this, and i have a question to followup. First, i think the crucial distinction i needed to make was the fact that the stuff inside the body of the call/cc is evaluated only when the call/cc is initially evaluated. What constitutes the "future" of the continuation is the context immediately following the call/cc expression. Your final example is where that's most apparent for me: Tim presented:
Though not quite as simple, i think this nailed the distinction for me. (Too bad that i'm probably mistaken:-) In any case, one big unknown for me is the expense of continuations. Just how expensive is squirreling away the future, anyway? (:-) If we're deep in a call stack, seems like there can be a lot of lexical-bindings baggage, plus whatever i-don't-know-how-big-it-is control logic there is/was/will be pending. Does the size of these things (continuations) vary extremely, and is the variation anticipatable? I'm used to some surprises about the depth to which some call or other may go, i don't expect as much uncertainty about my objects - and it seems like continuations directly transform the call depth/complexity into data size/complexity... ?? unfamiliar-territory,how-far-can-i-fall?-ly, Ken klm@digicool.com

Ken Manheimer wrote:
Hokay. I *think* i have this, and i have a question to followup.
...
In any case, one big unknown for me is the expense of continuations. Just how expensive is squirreling away the future, anyway? (:-)
The future costs at most the creation of *one* extra frame with a copy of the original frame's local stack. Since that frame keeps references to all the frames behind it, the real cost is of course bigger: we keep the whole frame path from this one up to the topmost frame alive. As soon as we drop the handle, everything winds up and vanishes. I also changed the frame refcounting to guarantee exactly that behavior. (Before, unwinding was explicitly done.)
Really, no concern necessary. The state is not saved at all (apart from that one frame), it is just not dropped. :-)

Example: You have some application running, in a nesting level of, say, four function calls. This makes four frames. The bottom function now decides to spawn 10 coroutines in a loop and puts them into an array. Your array now holds 10 continuations, where each one is just one frame, which points back to your frame. Now assume you are running one of the coroutines/generators/whatever, and this one calls another function "bottom", just to have some scenario. Looking from "bottom", there is just a usual frame chain, now 4+1 frames long.

To shorten this: The whole story is nothing more than a tree, where exactly one leaf is active at any time, and its view of the call chain is always linear. Continuation jumps are possible to every other frame in the tree. It now only depends on whether you keep references to the leaf which you just left. If the jump removes the last reference to your current frame, then the according chain will ripple away up to the next branch point. If you held a reference, as you will do with a coroutine in order to resume it, this chain stays as a possible jump target. for-me-it's-a-little-like-Tarzan-ly y'rs - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

[Ken Manheimer]
Right! call/cc is short for call-with-current-continuation, and "current" refers to the continuation of call/cc itself. call/cc takes a function as an argument, and passes to it its (call/cc's) *own* continuation. This is maximally clever and maximally confusing at first. Christian has a less clever way of spelling it that's likely to be less confusing too. Note that it has to be a *little* tricky, because the obvious API

    k = gimme_a_continuation_for_here()

doesn't work. The future of "gimme_a_..." includes binding k to the result, so you could never invoke the continuation without stomping on k's binding.

    k = gimme_a_continuation_for_n_bytecodes_beyond_here(n)

could work, but is a bit hard to explain coherently <wink>.
Christian gave a straight answer, so I'll give you the truth <wink>: it doesn't matter provided that you don't pay the price if you don't use it. A more interesting question is how much everyone will pay all the time to support the possibility even if they don't use it. But that question is premature since Chris isn't yet aiming to optimize. Even so, the answer so far appears to be "> 0 but not much". in-bang-for-the-buck-continuations-are-cheap-ly y'rs - tim

After a short vacation, I'm trying to swallow the latest discussion about control flow management & derivatives. Could someone help me please by answering two naive questions that popped up spontaneously in my head: Tim Peters wrote: [a biased short course on generators, continuations, coroutines]
Yes. I'm trying to understand the following: 1. What does a generator generate? 2. Clearly, what's the difference between a generator and a thread? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Vladimir Marangozov wrote: ...
Trying my little understanding. A generator generates a series of results if you ask for it. That's done by a resume call (generator, resume your computation), and the generator continues until it either comes to a suspend (return a value, but be prepared to continue from here) or it does a final return.
2. Clearly, what's the difference between a generator and a thread?
Threads can be scheduled automatically, and they don't return values to each other, natively. Generators are asymmetric to their callers; they're much like functions. Coroutines are more symmetric. They "return" values to each other. They are not cast as caller and callee, but cooperate on the same level. Therefore, threads and coroutines look more similar, just that coroutines usually aren't scheduled automatically. Add a scheduler, don't pass values, and you have threads, nearly. (Of course I dropped the I/O blocking stuff, which doesn't apply and isn't the intent of fake threads.) ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

[Vladimir Marangozov]
Yes. I'm trying to understand the following:
1. What does a generator generate?
Any sequence of objects: the lines in a file, the digits of pi, a postorder traversal of the nodes of a binary tree, the files in a directory, the machines on a LAN, the critical bugs filed before 3/1/1995, the set of builtin types, all possible ways of matching a regexp to a string, the 5-card poker hands beating a pair of deuces, ... anything! Icon uses the word "generators", and it's derived from that language's ubiquitous use of the beasts to generate paths in a backtracking search space. In OO languages it may be better to name them "iterators", after the closest common OO concept. The CLU language had full-blown (semi-coroutine, like Icon generators) iterators 20 years ago, and the idea was copied & reinvented by many later languages. Sather is probably the best known of those, and also calls them iterators.
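One of those examples -- a postorder traversal of the nodes of a binary tree -- written with the generator semantics under discussion (using a hypothetical "yield" spelling for the suspend; the tuple encoding of nodes is invented for illustration):

```python
def postorder(node):
    """Generate the values of a binary tree in postorder.
    A node is (value, left, right); leaves use None children."""
    if node is None:
        return
    value, left, right = node
    for v in postorder(left):    # suspend once per value in the left subtree
        yield v
    for v in postorder(right):   # then once per value in the right subtree
        yield v
    yield value                  # finally the node itself

tree = (1, (2, (4, None, None), (5, None, None)),
           (3, None, None))
print(list(postorder(tree)))  # [4, 5, 2, 3, 1]
```

The consumer drives the traversal one value at a time; the recursion's state lives in the suspended frames, with no explicit stack management.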
2. Clearly, what's the difference between a generator and a thread?
If you can clearly explain what "a thread" is, I can clearly explain the similarities and differences. Well? I'm holding my breath here <wink>. Generators/iterators are simpler than threads, whether looked at from a user's viewpoint or an implementor's. Their semantics are synchronous and deterministic. Python's for/__getitem__ protocol *is* an iterator protocol already, but if I ask you which is the 378th 5-card poker hand beating a pair of deuces, and ask you a new question like that every hour, you may start to suspect there may be a better way to *approach* coding enumerations in general <wink>. then-again-there-may-not-be-ly y'rs - tim
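The claim that for/__getitem__ is already an iterator protocol is easy to demonstrate; Squares is an invented example class:

```python
class Squares:
    """Python's for loop calls __getitem__ with 0, 1, 2, ...
    until IndexError is raised -- so this class is already
    'iterable' with no further machinery."""
    def __init__(self, n):
        self.n = n
    def __getitem__(self, i):
        if i >= self.n:
            raise IndexError   # signals end of the enumeration
        return i * i

print([x for x in Squares(5)])  # [0, 1, 4, 9, 16]
```

The values are computed on demand, one per loop iteration -- generator-flavored behavior hiding in plain sight, though without the suspended-frame convenience.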

[Gordon]
I still don't understand all of this (I have not much of an idea of what Christian's search for hidden registers is about and what kind of analysis he needs) but I think of continuations as requiring (theoretically) copying the current stack (to and from), while generators and coroutines just need their own piece of stack set aside. The difference between any of these and threads (fake or real) is that they pass control explicitly, while threads (typically) presume pre-emptive scheduling, i.e. they make independent parallel progress without explicit synchronization. (Hmm, how do you do this with fake threads? Or are these only required to switch whenever you touch a mutex?) I'm not sure if there's much of a difference between generators and coroutines -- it seems just the termination convention. (Hmm... would/should a generator be able to raise an exception in its caller? A coroutine?) --Guido van Rossum (home page: http://www.python.org/~guido/)

[Tim, claims not to understand Guido's ...] [Guido replies, sketching an elaborate scheme for making threads that are fake nevertheless act like real threads in the particular case of potentially blocking I/O calls]
No missing wink; I think it hinges on a confusion about the meaning of your original word "useful". Threads can be very useful purely as a means for algorithm structuring, due to independent control flows. Indeed, I use threads in Python most often these days without any hope or even *use* for potential parallelism (overlapped I/O or otherwise). It's the only non-brain-busting way to write code now that requires advanced control of the iterator, generator, coroutine, or even independent-agents-in-a-pipeline flavors.

Fake threads would allow code like that to run portably, and also likely faster than with the overheads of OS-level threads. For pedagogical and debugging purposes too, fake threads could be very much friendlier than the real thing. Heck, we could even run them on a friendly old Macintosh <wink>.

If all fake threads block when any hits an I/O call, waiting for the latter to return, we're no worse off than in a single-threaded program. Being "fake threads", it *is* a single-threaded program, so it's not even a surprise <wink>. Maybe in your

    Py_BEGIN_ALLOW_THREADS
    n = read(fd, buffer, size);
    Py_END_ALLOW_THREADS

you're assuming that some other Python thread needs to run in order for the read implementation to find something to read? Then that's a dead program for sure, as it would be for a single-threaded run today too. I can live with that! I don't expect fake threads to act like real threads in all cases. My assumption was that the BEGIN/END macros would do nothing under fake threads -- since there isn't a real thread backing it up, a fake thread can't yield in the middle of random C code (Python has no way to capture/restore the C state).
I didn't picture fake threads working except as a Python-level feature, with context switches limited to bytecode boundaries (which a stackless ceval can handle with ease; the macro context switch above is "in the middle of" some bytecode's interpretation, and while "green threads" may be interested in simulating that, Tim's "fake threads" aren't). different-threads-for-different-heads-ly y'rs - tim

[Tim responds, explaining that without this threads are quite useful.] I guess it's all in the perspective. 99.99% of all thread apps I've ever written use threads primarily to overlap I/O -- if there wasn't I/O to overlap I wouldn't use a thread. I think I share this perspective with most of the thread community (after all, threads originate in the OS world where they were invented as a replacement for I/O completion routines). (And no, I don't use threads to get the use of multiple CPUs, since I almost never have had more than one of those. And no, I wasn't expecting the read() to be fed from another thread.) As far as I can tell, all the examples you give are easily done using coroutines. Can we call whatever you're asking for coroutines instead of fake threads? I think that when you mention threads, green or otherwise colored, most people who are at all familiar with the concept will assume they provide I/O overlapping, except perhaps when they grew up in the parallel machine world. Certainly all examples I give in my never-completed thread tutorial (still available at http://www.python.org/doc/essays/threads.html) use I/O as the primary motivator -- this kind of example appeals to simple souls (e.g. downloading more than one file in parallel, which they probably have already seen in action in their web browser), as opposed to generators or pipelines or coroutines (for which you need to have some programming theory background to appreciate the powerful abstraction possibilities they give). Another good use of threads (suggested by Sam) is for GUI programming. An old GUI system, NeWS by David Rosenthal at Sun, used threads programmed in PostScript -- very elegant (and it failed for other reasons -- if only he had used Python instead :-). On the other hand, having written lots of GUI code using Tkinter, the event-driven version doesn't feel so bad to me.
Threads would be nice when doing things like rubberbanding, but I generally agree with Ousterhout's premise that event-based GUI programming is more reliable than thread-based. Every time your Netscape freezes you can bet there's a threading bug somewhere in the code. --Guido van Rossum (home page: http://www.python.org/~guido/)
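Guido's download-several-files-in-parallel motivator, sketched with today's threading module. Here time.sleep stands in for the blocking network read (any call that releases the interpreter lock would do), and the URLs and fetch function are invented for illustration:

```python
import threading
import time

results = {}

def fetch(url, delay):
    # stand-in for a blocking network read; sleep releases the
    # interpreter lock, so the other threads run meanwhile
    time.sleep(delay)
    results[url] = "<data from %s>" % url

t0 = time.time()
threads = [threading.Thread(target=fetch, args=(u, 0.2))
           for u in ("http://a", "http://b", "http://c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - t0   # roughly 0.2s, not 0.6s: the three waits overlapped
```

The point of the example, for Guido's simple souls, is that the three "downloads" finish in about the time of one, because the threads overlap their waits.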

Guido van Rossum wrote:
I don't think this would match it. These threads can be implemented by coroutines which always run apart, and have some scheduling running. When there is polled I/O available, they can of course give a threaded feeling. If an application polls the kbhit function instead of reading, the other "threads" can run nicely. Can be quite useful for very small computers like CE. Many years ago, I had my own threads under Turbo Pascal (I had no idea that they were called so). Ok, this was DOS, but it was enough of threading to have a "process" which smoothly updated a graphics screen, while another (single! :) "process" wrote data to the disk, a third one handled keyboard input, and a fourth drove a multichannel A/D sampling device. Oops, I just realized that these were *true* threads. The disk process would not run smoothly, I agree. All the rest would be fine with green threads. ...
Right. But with a traceback instead of a machine hang, this could be more attractive to do. Green threads/coroutines are incredibly fast (one c call per switch). And since they have local state, you can save most of the attribute lookups which are needed with event based programming. (But this is all theory until we tried it). ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

I've been out of town, too (not with Skip), but I'll jump back in here... [Guido]
I suppose, in the best of all possible worlds, this is true. But I'm fairly sure there are a number of well-used green thread implementations which go only part way - eg, if this is a "selectable" fd, do a select with a timeout of 0 on this one fd and choose to read/write or swap accordingly. That's a fair amount of bang for the buck, I think... [Tim]
Threads can be very useful purely as a means for algorithm structuring, due to independent control flows.
Spoken like a true schizo, Tim me boyos! Actually, you and Guido are saying almost the same thing - threads are useful when more than one thing is "driving" your processing. It's just that in the real world, that's almost always I/O, not some sick, tortured internal dialogue... I think the real question is: how useful would this be on a Mac? On Win31? (I'll answer that - useful, though I've finally got my last Win31 client to promise to upgrade, RSN <hack, cough>). - Gordon

[Tim]
Threads can be very useful purely as a means for algorithm structuring, due to independent control flows.
FWIW, I've been following the coroutine/continuation/generator bit with 'academic' interest -- the CS part of my brain likes to read about them. Prompted by Tim's latest mention of Demo/threads/Generator.py, I looked at it (again?) and *immediately* grokked it and realized how it'd fit into a tool I'm writing. Nothing to do with concurrency, I/O, etc -- just compartmentalization of stateful iterative processes (details too baroque to go over). More relevantly, that tool would be useful on thread-less Pythons (well, when it reaches usefulness on threaded Pythons =). Consider me pro-generator, and still agnostic on the co* things. --david

[David Ascher]
"stateful iterative process" is a helpful characterization of where these guys can be useful! State captured in variables is the obvious one, but simply "where you are" in a mass of nested loops and conditionals is also "state" -- and a kind of state especially clumsy to encode as data state instead (ever rewrite a hairy recursive routine to use iteration with an explicit stack? it's a transformation that can be mechanized, but the result is usually ugly & often hard to understand). Once it sinks in that it's *possible* to implement a stateful iterative process in this other way, I think you'll find examples popping up all over the place.
More relevantly, that tool would be useful on thread-less Pythons (well, when it reaches usefulness on threaded Pythons =).
As Guido pointed out, the API provided by Generator.py is less restrictive than any that can be built with the "one frame" flavor of generator ("resumable function"). Were you able to make enough sense of the long discussion that ensued to guess whether the particular use you had in mind required Generator.py's full power? If you couldn't tell, post the baroque details & I'll tell you <wink>. not-putting-too-fine-a-point-on-possible-vs-natural-ly y'rs - tim

On Sun, 11 Jul 1999, Tim Peters wrote:
I'm pretty sure the use I mentioned would fit in even the simplest version of a generator. As to how much sense I made of the discussion, let's just say I'm glad there's no quiz at the end. I did shudder at the mention of unmentionables (male public displays of affection -- yeaach!), yodel at the mention of Lord Greystoke swinging among stack branches and chuckled at the vision of him being thrown back in a traceback (ouch! ouch! ouch!, "most painful last"...). --david

[Tim]
This is another key description of continuations (maybe not quite worth a hug :). The continuation captures exactly all state that is represented by "position in the program" and no state that is represented by variables. But there are many hairy details. In antiquated assembly, there might not be a call stack, and a continuation could be represented by a single value: the program counter. But now we have a call stack, a value stack, a block stack (in Python) and who knows what else. I'm trying to understand whether we can get away with saving just a pointer to a frame, whether we need to copy the frame, or whether we need to copy the entire frame stack. (In regular Python, the frame stack also contains local variables. These are explicitly exempted from being saved by a continuation. I don't know how Christian does this, but I presume he uses the dictionary which can be shared between frames.) Let's see... Say we have this function:

    def f(x):
        try:
            return 1 + (-x)
        finally:
            print "boo"

The bytecode (simplified) looks like:

        SETUP_FINALLY (L1)
        LOAD_CONST (1)
        LOAD_FAST (x)
        UNARY_NEGATIVE
        BINARY_ADD
        RETURN_VALUE
    L1: LOAD_CONST ("boo")
        PRINT_ITEM
        PRINT_NEWLINE
        END_FINALLY

Now suppose that the unary minus operator saves its continuation (e.g. because x was a type with a __neg__ method). At this point there is an entry on the block stack pointing to L1 as the try-finally block, and the value stack has the value 1 pushed on it. Clearly if that saved continuation is ever invoked (called? used? activated? What do you call what you do to a continuation?) it should substitute whatever value was passed into the continuation for the result of the unary minus, and the program should continue by pushing it on top of the value stack, adding it to 1, and returning the result, executing the block of code at L1 on the way out. So clearly when the continuation is used, 1 should be on the value stack and L1 should be on the block stack.
Assuming that the unary minus function initially returns just fine, the value stack and the block stack of the frame will be popped. So I conclude that saving a continuation must save at least the value and block stack of the frame being saved. Is it safe not to save the value and block stacks of frames further down on the call stack? I don't think so -- these are all destroyed when frames are popped off the call stack (even if the frame is kept alive, its value and block stack are always empty when the function has returned). So I hope that Christian has code that saves the value and block stacks! (It would be fun to try and optimize this by doing it lazily, so that frames which haven't returned yet aren't copied yet.) How does Scheme do this? I don't know if it has something like the block stack, but surely it has a value stack! Still mystified, --Guido van Rossum (home page: http://www.python.org/~guido/)
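What invoking the saved continuation of Guido's example would mean can be mimicked in plain Python. The function below is a hand-written stand-in for "the rest of f after the unary minus": the pushed 1 and the try/finally block are baked in, and whatever value is thrown in takes the place of -x. The name minus_continuation and the booed list are invented for illustration:

```python
booed = []

def minus_continuation(v):
    # Stand-in for the continuation saved at UNARY_NEGATIVE: the value v
    # substitutes for the result of -x; the saved value stack (the
    # pushed 1) and block stack (the try/finally) are restored around it.
    try:
        return 1 + v            # BINARY_ADD with the saved 1, then RETURN_VALUE
    finally:
        booed.append("boo")     # the finally block at L1 runs on the way out

r1 = minus_continuation(-5)     # the normal return from -x when x == 5
r2 = minus_continuation(42)     # the continuation invoked a second time
```

Note that the finally block runs on every invocation, which is exactly why Guido concludes the block stack must be part of what gets saved.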

[Guido wonders about continuations -- must be a bad night for sleep <wink>] Paul Wilson's book-in-progress has a (large) page of HTML that you can digest quickly and that will clear up many mysteries: ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_142.html Scheme may be the most-often implemented language on Earth (ask 100 Schemers what they use Scheme for, persist until you get the truth, and 81 will eventually tell you that mostly they putz around writing their own Scheme interpreter <0.51 wink>); so there are a *lot* of approaches out there. Wilson describes a simple approach for a compiler. A key to understanding it is that continuations aren't "special" in Scheme: they're the norm. Even plain old calls are set up by saving the caller's continuation, then handing control to the callee. In Wilson's approach, "the eval stack" is a globally shared stack, but at any given moment contains only the eval temps relevant to the function currently executing. In preparation for a call, the caller saves away its state in "a continuation", a record which includes:

    the current program counter
    a pointer to the continuation record it inherited
    a pointer to the structure supporting name resolution (locals & beyond)
    the current eval stack, which gets drained (emptied) at this point

There isn't anything akin to Python's block stack (everything reduces to closures, lambdas and continuations). Note: the continuation is immutable; once constructed, it's never changed. Then the callee's arguments are pushed on the eval stack, a pointer to the continuation as saved above is stored in "the continuation register", and control is transferred to the callee. Then a function return is exactly the same operation as "invoking a continuation": whatever is in the continuation register at the time of the return/invoke is dereferenced, and the PC, continuation register, env pointer and eval stack values are copied out of the continuation record.
The return value is passed back in another "virtual register", and pushed onto the eval stack first thing after the guts of the continuation are restored. So this copies the eval stack all the time, at every call and every return/invoke. Kind of. This is partly why "tail calls" are such a big deal in Scheme: a tail call need not (*must* not, in std Scheme) create a new continuation. The target of a tail call simply inherits the continuation pointer inherited by its caller. Of course many Scheme implementations optimize beyond this.
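Wilson's protocol -- a call saves a continuation record, a return invokes whatever is in the continuation register -- can be mimicked with a toy trampoline. The dict-based record and the made-up functions f(x) = g(x) + 1 and g(x) = 2*x exist only to show the mechanics; a real compiler would of course not dispatch on strings:

```python
def run(x):
    # Toy trampoline in the style Wilson describes, for f(x) = g(x) + 1
    # with g(x) = 2*x.  A call saves the caller's continuation (resume
    # point, parent continuation, drained eval stack); a return invokes
    # whatever is in the continuation register.
    stack = []                  # the globally shared eval stack
    cont = None                 # the continuation register
    pc, retval = "f_entry", None
    while True:
        if pc == "f_entry":
            stack.append(1)     # f pushes 1, then calls g(x):
            cont = {"pc": "f_after_g", "parent": cont, "stack": stack}
            stack = [x]         # callee's argument; caller's temps live in cont
            pc = "g_entry"
        elif pc == "g_entry":
            retval = stack.pop() * 2                # g computes, then "returns":
            pc, stack, cont = cont["pc"], cont["stack"], cont["parent"]
        elif pc == "f_after_g":
            stack.append(retval)                    # return value pushed first
            retval = stack.pop() + stack.pop()      # 2*x + 1
            if cont is None:
                return retval                       # back at top level
            pc, stack, cont = cont["pc"], cont["stack"], cont["parent"]
```

Because the return path is literally "load the continuation register and jump", capturing a first-class continuation in this scheme costs nothing extra: it is just keeping a pointer to a record that was built anyway.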
In the absence of tail calls, the approach above saves the stack on every call and restores it on every return, so there's no "extra" copying needed when capturing, or invoking, a continuation (cold comfort, I agree <wink>). About Christian's code, we'd better let it speak for itself -- I'm not clear on the details of what he's doing today. Generalities:
Yes, but nothing gets copied until a continuation gets captured, and at the start of that I believe only one frame gets cloned.
(It would be fun to try and optimize this by doing it lazily, so that frames which haven't returned yet aren't copied yet.)
He's aware of that <wink>.
How does Scheme do this? I don't know if it has something like the block stack, but surely it has a value stack!
Stacks and registers and such aren't part of the language spec, but, you bet -- however it may be spelled in a given implementation, "a value stack" is there. BTW, many optimizing Schemes define a weaker form of continuation too (call/ec, for "escaping continuation"). Skipping the mumbo jumbo definition <0.9 wink>, you can only invoke one of those if its target is on the path back from the invoker to the root of the call tree (climb up tree like Cheetah, not leap across branches like Tarzan). This amounts to a setjmp/longjmp in C -- and may be implemented that way! i-say-do-it-right-or-not-at-all<wink>-ly y'rs - tim
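Tim's call/ec-is-just-setjmp/longjmp observation translates directly to Python, where an exception plays the part of longjmp. The names call_ec, _Escape and the product example are invented; this is a sketch of the idea, not any real library's API:

```python
class _Escape(Exception):
    def __init__(self, tag, value):
        self.tag, self.value = tag, value

def call_ec(f):
    # call-with-escape-continuation, the weaker call/cc Tim describes:
    # the escape procedure may only climb back up the call tree, so an
    # exception (Python's longjmp) suffices to implement it.
    tag = object()              # unique per capture, like a setjmp buffer
    def escape(value):
        raise _Escape(tag, value)
    try:
        return f(escape)
    except _Escape as e:
        if e.tag is tag:
            return e.value
        raise                   # someone else's escape: keep unwinding

def product(nums):
    # classic use: bail out of the whole loop the moment a zero is seen
    def body(escape):
        r = 1
        for n in nums:
            if n == 0:
                escape(0)
            r *= n
        return r
    return call_ec(body)
```

What this cannot do -- and what full call/cc can -- is re-enter the captured point after call_ec has returned; that is precisely the Cheetah-up-the-tree restriction Tim mentions.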

Tim Peters wrote: ...
Right, maybe this would do enough. We will throw away what's not needed, when we know what we actually need...
i-say-do-it-right-or-not-at-all<wink>-ly y'rs - tim
...and at the moment I think it was right to take it all. just-fixing-continuations-spun-off-in-an-__init__-which- -is-quite-hard-since-still-recursive,-and-I-will-ship-it-ly y'rs - chris

Guido van Rossum wrote: ...
You need to preserve the stack and the block stack of a frame if and only if it can be reached twice. I make this dependent on its refcount. Every frame monitors itself before and after every call_function, if a handler field in the frame "f_callguard" has been set. If so, the callguard is called. Its task is to see whether we must preserve the current state of the frame, and to carry this out. The idea is to create a shadow frame "on demand". When I touch a frame with a refcount > 1, I duplicate it at its f_back pointer. By that, it is turned into a "continuation frame" which is nothing more than the stack copy, IP, and the block stack. By that, the frame stays in place where it was, all pointers are still fine. The "real" one is now in the back, and the continuation frame's purpose when called is only to restore the state of the "real one" and run it (after doing a new save if necessary). I call this technique "push back frames".
I keep the block stack and a stack copy. All the locals are only existing once. The frame is also only one frame. Actually always a new one (due to push back), but virtually it is "the frame", with multiple continuation frames pointing at it. ...
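A toy model of the push-back idea (emphatically not Christian's actual C code -- the Frame class and its methods are invented for illustration): the frame object the rest of the world points at stays put, capture() copies only the volatile state, and throwing restores it and pushes the thrown value as if the interrupted call had just returned:

```python
class Frame:
    # Toy model: capture() clones only the value stack, block stack and
    # instruction pointer into a shadow record -- never the locals,
    # which (as in Christian's scheme) exist only once.
    def __init__(self):
        self.stack, self.blocks, self.ip = [], [], 0
        self.shadow = None               # the pushed-back continuation state

    def capture(self):
        self.shadow = (list(self.stack), list(self.blocks), self.ip)

    def throw(self, value):
        # Throwing the continuation restores the saved state, then pushes
        # the thrown value as if the interrupted call had just returned.
        saved_stack, saved_blocks, saved_ip = self.shadow
        self.stack = list(saved_stack)
        self.blocks = list(saved_blocks)
        self.ip = saved_ip
        self.stack.append(value)

f = Frame()
f.stack.append(1)
f.blocks.append("finally@L1")
f.ip = 4
f.capture()
f.stack = []                # the frame runs on and trashes its state...
f.ip = 99
f.throw(-5)                 # ...but the continuation restores [1], pushes -5
```

This is the answer to Guido's question above: the shadow holds exactly the value and block stacks plus the IP, so the frame can be "walked again" no matter what happened to it since the capture.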
Clearly if that saved continuation is ever invoked (called? used? activated? What do you call what you do to a continuation?)
I think of throwing. Mine are thrown. The executive of standard frames is "eval_code2_loop(f, passed_retval)", where the executive of a continuation frame is "throw_continuation(f, passed_retval)". ...
:-) I have exactly that, and I do it lazily already. Unless somebody saves a continuation, nothing special happens. But if he does, the push back process follows his path like a zip (? Reißverschluß) and ensures that the path can be walked again. Tarzan now has the end of this liana in his hand. He might use it to swing over, or he might drop it, and it ripples away and vanishes as if it never existed. Give me some final testing, and you will be able to try it out in a few days. ciao - chris

Backtracking a bit: [Guido]
This is another key description of continuations (maybe not quite worth a hug :).
I suppose a kiss is out of the question, then.
The continuation captures exactly all state that is represented by "position in the program" and no state that is represented by variables.
Right!
As you convinced yourself in following paragraphs, for 1st-class continuations "the entire frame stack" *may* be necessary.
... How does Scheme do this?
I looked up R. Kent Dybvig's doctoral dissertation, at ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/3imp.ps.gz He gives detailed explanations of 3 Scheme implementations there (from whence "3imp", I guess). The first is all heap-based, and looks much like the simple Wilson implementation I summarized yesterday. Dybvig profiled it and discovered it spent half its time in, together, function call overhead and name resolution. So he took a different approach: Scheme is, at heart, just another lexically scoped language, like Algol or Pascal. So how about implementing it with a perfectly conventional shared, contiguous stack? Because that doesn't work: the advanced features (lexical closures with indefinite extent, and user-captured continuations) aren't stack-like. Tough, forget those at the start, and do whatever it takes later to *make* 'em work. So he did. When his stack implementation hit a user's call/cc, it made a physical copy of the entire stack. And everything ran much faster! He points out that "real programs" come in two flavors: 1) Very few, or no, call/cc thingies. Then most calls are no worse than Algol/Pascal/C functions, and the stack implementation runs them at Algol/Pascal/C speed (if we knew of anything faster than a plain stack, the latter would use it). 2) Lots of call/cc thingies. Then "the stack" is likely to be shallow (the program is spending most of its time co-transferring, not recursing deeply), and because the stack is contiguous he can exploit the platform's fastest block-copy operation (no need to chase pointer links, etc). So, in some respects, Dybvig's stack implementation of Scheme was more Pythonic than Python's current implementation <wink -- Dybvig effectively used a Python list at this level, while Python effectively uses a Lispish cons'ed frame list>. His third implementation was for some propeller-head theoretical "string machine", so I won't even mention it. 
worrying-about-the-worst-case-can-hurt-the-normal-cases-ly y'rs - tim

Just so Guido doesn't feel like the question is being ignored <wink>:
... How does Scheme do this? [continuations]
One more reference here. Previously sketched Wilson's simple heap implementation and Dybvig's simple stack one. They're easy to understand, but are (heap) slow all the time, or (stack) fast most of the time but horribly slow in some cases. For the other extreme end of things, check out: Representing Control in the Presence of First-Class Continuations Robert Hieb, R. Kent Dybvig, and Carl Bruggeman PLDI, June 1990 http://www.cs.indiana.edu/~dyb/papers/stack.ps In part: In this paper we show how stacks can be used to implement activation records in a way that is compatible with continuation operations, multiple control threads, and deep recursion. Our approach allows a small upper bound to be placed on the cost of continuation operations and stack overflow and underflow recovery. ... ordinary procedure calls and returns are not adversely affected. ... One important feature of our method is that the stack is not copied when a continuation is captured. Consequently, capturing a continuation is very efficient, and objects that are known to have dynamic extent can be stack allocated and modified since they remain in the locations in which they were originally allocated. By copying only a small portion of the stack when a continuation is reinstated, reinstatement costs are bounded by a small constant. The basic gimmick is a segmented stack, where large segments are heap-allocated and each contains multiple contiguous frames (across their code base, only 1% of frames exceeded 30 machine words). But this is a complicated approach, best suited for industrial-strength native-code compilers (speed at any cost -- the authors go thru hell to save an add here, a pointer store there, etc). At least at the time the paper was written, it was the approach implemented by Dybvig's Chez Scheme (a commercial native-code Scheme compiler noted for high speed). 
Given that Python allocates frames from the heap, I doubt there's a much faster approach than the one Christian has crafted out of his own sweat and blood! It's worth a paper of its own. or-at-least-two-hugs<wink>-ly y'rs - tim

[Guido]
Different perspective indeed! Where I've been, you never used something as delicate as a thread to overlap I/O, you instead used the kernel-supported asynch Fortran I/O extensions <0.7 wink>. Those days are long gone, and I've adjusted to that. Time for you to leave the past too <wink>: by sheer numbers, most of the "thread community" *today* is to be found typing at a Windows box, where cheap & reliable threads are a core part of the programming culture. They have better ways to overlap I/O there too. Throwing explicit threads at this is like writing a recursive Fibonacci number generator in Scheme, but building the recursion yourself by hand out of explicit continuations <wink>.
I have multiple agendas, of course. What I personally want for my own work is no more than Icon's generators, formally "semi coroutines", and easily implemented in the interpreter (although not the language) as it exists today. Coroutines, fake threads and continuations are much stronger than generators, and I expect you can fake any of the first three given either of the others. Generators fall out of any of them too (*you* implemented generators once using Python threads, and I implemented general coroutines -- "fake threads" are good enough for either of those). So, yes, for that agenda any means of suspending/resuming control flow can be made to work. I seized on fake threads because Python already has a notion of threads. A second agenda is that Python could be a lovely language for *learning* thread programming; the threading module helps, but fake threads could likely help more by e.g. detecting deadlocks (and pointing them out) instead of leaving a thread newbie staring at a hung system without a clue. A third agenda is related to Mark & Greg's, making Python's threads "real threads" under Windows. The fake thread agenda doesn't tie into that, except to confuse things even more if you take either agenda seriously <0.5 frown>.
They didn't suggest I/O to me at all, but I grew up in the disqualified world <wink>; doubt they would to a Windows programmer either (e.g., my employer ships heavily threaded Windows apps of various kinds, and overlapped I/O isn't a factor in any of them; it's mostly a matter of algorithm factoring to keep the real-time incestuous subsystems from growing impossibly complex, and in some of the very expensive apps also a need to exploit multiple processors). BTW, I called them "fake" threads to get away from whatever historical baggage comes attached to "green".
The preceding "99.99% of all thread apps I've ever written use threads primarily to overlap I/O" may explain this <wink>. BTW, there is only one example there, which rather dilutes the strength of the rhetorical "all" ...
I don't at all object to using I/O as a motivator, but the latter point is off base. There is *nothing* in Comp Sci harder to master than thread programming! It's the pinnacle of perplexity, the depth of despair, the king of confusion (stop before I exaggerate <wink>). Generators in particular get re-invented often as a much simpler approach to suspending a subroutine's control flow; indeed, Icon's primary audience is still among the humanities, and even dumb linguists <wink> don't seem to have notable problems picking it up. Threads have all the complexities of the other guys, plus races, deadlocks, starvation, load imbalance, non-determinism and non-reproducibility. Threads simply aren't simple-soul material, no matter how pedestrian a motivating *example* may be. I suspect that's why your tutorial remains unfinished: you had no trouble describing the problem to be solved, but got bogged down in mushrooming complications describing how to use threads to solve it. Even so, the simple example at the end is already flawed ("print" isn't atomic in Python, so the

    print len(text), url

may print the len(text) from one thread followed by the url from another). It's not hard to find simple-soul examples for generators either (coroutines & continuations *are* hard to motivate!), especially since Python's for/__getitem__ protocol is already a weak form of generator, and xrange *is* a full-blown generator; e.g., a common question on c.l.py is how to iterate over a sequence backwards:

    for x in backwards(sequence):
        print x

    def backwards(s):
        for i in xrange(len(s)-1, -1, -1):
            suspend s[i]

Nobody needs a comp sci background to understand what that *does*, or why it's handy. Try iterating over a tree structure instead & then the *power* becomes apparent; this isn't comp-sci-ish either, unless we adopt a "if they've heard of trees, they're impractical dreamers" stance <wink>.
BTW, iterating over a tree is what os.path.walk does, and a frequent source of newbie confusion (they understand directory trees, they don't grasp the callback-based interface; generating (dirname, names) pairs instead would match their mental model at once). *This* is the stuff for simple souls!
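The (dirname, names)-pair interface Tim asks for is easy to sketch as a generator (it is essentially what later Pythons ship as os.walk). The toy dict filesystem and the name walk_pairs are made up for illustration, to keep the sketch independent of any real directory tree:

```python
def walk_pairs(tree, dirname="/"):
    # Generator flavor of os.path.walk: hand back (dirname, names) pairs
    # one at a time instead of calling a callback.  `tree` is a toy dict
    # filesystem: subdirectories are dicts, files are None.
    names = sorted(tree)
    yield dirname, names
    for name in names:
        if isinstance(tree[name], dict):
            sub = dirname.rstrip("/") + "/" + name
            for pair in walk_pairs(tree[name], sub):
                yield pair

fs = {"etc": {"passwd": None}, "tmp": {}}
pairs = list(walk_pairs(fs))
# [("/", ["etc", "tmp"]), ("/etc", ["passwd"]), ("/tmp", [])]
```

The caller just writes a plain for loop over the pairs -- the mental model newbies already have -- instead of packaging their logic into a callback.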
I don't use Netscape, but I can assure you the same is true of Internet Explorer -- except there the threading bug is now somewhere in the OS <0.5 wink>. Anyway, 1) There are lots of good uses for threads, and especially in the Windows and (maybe) multiprocessor NumPy worlds. Those would really be happier with "free-threaded threads", though. 2) Apart from pedagogical purposes, there probably isn't a use for my "fake threads" that couldn't be done easier & better via a more direct (coroutine, continuation) approach; and if I had fake threads, the first thing I'd do for me is rewrite the generator and coroutine packages to use them. So, yes: you win <wink>. 3) Python's current threads are good for overlapping I/O. Sometimes. And better addressed by Sam's non-threaded "select" approach when you're dead serious about overlapping lots of I/O. They're also beaten into service under Windows, but not without cries of anguish from Greg and Mark. I don't know, Guido -- if all you wanted threads for was to speed up a little I/O in as convoluted a way as possible, you may have been witness to the invention of the wheel but missed that ox carts weren't the last application <wink>. nevertheless-ox-carts-may-be-the-best-ly y'rs - tim

[Tim]
No quibble so far...
They have better ways to overlap I/O there too.
Really? What are they? The non-threaded primitives for overlapping I/O look like Medusa to me: very high performance, but a pain to use -- because of the event-driven programming model! (Or worse, callback functions.) But maybe there are programming techniques that I'm not even aware of? (Maybe I should define what I mean by overlapping I/O -- basically every situation where disk or network I/O or GUI event handling goes on in parallel with computation or with each other. For example, in my book copying a set of files while at the same time displaying some silly animation of sheets of paper flying through the air *and* watching a Cancel button counts as overlapping I/O, and if I had to code this it would probably be a lot simpler to do using threads.)
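Guido's copy-files-while-animating-and-watching-Cancel example, sketched with the threading module. The names, and the sleep standing in for a per-file copy, are invented; the shape is what matters: the worker thread copies while the main thread stays free for the animation and the button:

```python
import threading
import time

def copy_files(files, cancel, copied):
    for name in files:
        if cancel.is_set():     # the Cancel button was pressed
            return
        time.sleep(0.02)        # stand-in for copying one file
        copied.append(name)

cancel = threading.Event()
copied = []
worker = threading.Thread(target=copy_files,
                          args=(["a", "b", "c"], cancel, copied))
worker.start()
while worker.is_alive():        # the main thread stays free for the
    time.sleep(0.005)           # flying-paper animation and the button
worker.join()
```

A GUI would call cancel.set() from its button handler; the event-driven alternative has to chop the copy loop into callback-sized pieces instead.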
Aren't you contradicting yourself? You say that threads are ubiquitous and easy on Windows (and I agree), yet you claim that threads are overkill for doing two kinds of I/O or one kind of I/O and some computation in parallel...? I'm also thinking of Java threads. Yes, the GC thread is one of those computational threads you are talking about, but I think the examples I've seen otherwise are mostly about having one GUI component (e.g. an applet) independent from other GUI components (e.g. the browser). To me that's overlapping I/O, since I count GUI events as I/O.
Coroutines, fake threads and continuations? Can you really fake continuations given generators?
Hm. Maybe I'm missing something. Why didn't you simply say "you can fake each of the others given any of these"?
Yes.
What makes them unreal except for the interpreter lock? Python threads are always OS threads, and that makes them real enough for most purposes... (I'm not sure if there are situations on uniprocessors where the interpreter lock screws things up that aren't the fault of the extension writer -- typically, problems arise when an extension does some blocking I/O but doesn't place Py_{BEGIN,END}_ALLOW_THREADS macros around the call.)
Hm, you admit that they sometimes want to use multiple CPUs, which was explicitly excluded from our discussion (since fake threads don't help there), and I bet that they are also watching some kind of I/O (e.g. whether the user says some more stuff).
BTW, I called them "fake" threads to get away from whatever historical baggage comes attached to "green".
Agreed -- I don't understand where green comes from at all. Does it predate Java?
OK, ok. I was planning on more along the same lines. I may have borrowed this idea from a Java book I read.
I dunno, but we're probably both pretty poor predictors for what beginning programmers find hard. Randy Pausch (of www.alice.org) visited us this week; he points out that we experienced programmers are very bad at gauging what newbies find hard, because we've been trained "too much". He makes the point very eloquently. He also points out that in Alice, users have no problem at all with parallel activities (e.g. the bunny's head rotating while it is also hopping around, etc.).
Strange. Maybe dumb linguists are better at simply copying examples without thinking too much about them; personally I had a hard time understanding what Icon was doing when I read about it, probably because I tried to understand how it was done. For threads, I have a simple mental model. For coroutines, my head explodes each time.
No, I simply realized that I had to finish the threading module and release the thread-safe version of urllib.py before I could release the tutorial; and then I was distracted and never got back to it.
Fine -- that's a great excuse to introduce locks in the next section. (Most threading tutorials I've seen start by showing flawed examples to create an appreciation for the need of locks.)
But backwards() also returns, when it's done. What happens with the return value?
Probably right, although I think that os.path.walk just has a bad API (since it gives you a whole directory at a time instead of giving you each file).
This is independent of Python, and is (I think) fairly common knowledge -- if you have 10 threads this works fine, but with 100s of them the threads themselves become expensive resources. But then you end up with contorted code which is why high-performance systems require experts to write them.
They're also beaten into service under Windows, but not without cries of anguish from Greg and Mark.
Not sure what you mean here.
What were those applications of threads again you were talking about that could be serviced by fake threads that weren't coroutines/generators? --Guido van Rossum (home page: http://www.python.org/~guido/)

Hmmm. I jumped back into this one, but never saw my post show up...

Threads (real or fake) are useful when more than one thing is "driving" your processing. It's just that in the real world (a place Tim visited, once, but didn't like - or was it vice versa?) those "drivers" are normally I/O. Guido complained that to do it right would require gathering up all the fds and doing a select. I don't think that's true (at least, for a decent fake thread). You just have to select on the one (to see if the I/O will work) and swap or do it accordingly. Also makes it a bit easier for portability (I thought I heard that the Mac's select is limited to sockets).

I see 2 questions. First, is there enough of an audience (Mac, mostly, I think) without native threads to make them worthwhile? Second, do we want to introduce yet more possibilities for brain-explosions by enabling coroutines / continuations / generators or some such? There is practical value there (as Sam has pointed out, and I now concur, watching my C state machine grow out of control with each new client request). I think the answer to both is probably "yes", and though they have a lot in common technically, they have totally different rationales.

- Gordon
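Gordon's "select on the one fd and swap" idea is easy to flesh out with modern Python generators standing in for fake threads. Everything below -- the `scheduler` function and the `("read", fd)` yield convention -- is invented for illustration; nothing like it existed in 1999's Python:

```python
import os
import select

def scheduler(tasks):
    """Round-robin fake threads: each task is a generator that yields
    None to give up its turn, or ("read", fd) when it would block."""
    ready = list(tasks)
    waiting = {}                      # fd -> task blocked reading it
    while ready or waiting:
        for task in list(ready):
            try:
                req = next(task)      # run the task to its next yield
            except StopIteration:
                ready.remove(task)    # task finished
                continue
            if req is not None:       # task wants to wait on an fd
                _, fd = req
                ready.remove(task)
                waiting[fd] = task
        if waiting:
            # One select covers every blocked task; block only when
            # no runnable tasks remain.
            timeout = 0 if ready else None
            rlist, _, _ = select.select(list(waiting), [], [], timeout)
            for fd in rlist:
                ready.append(waiting.pop(fd))

# Demo: a fake thread waits for pipe input without blocking the world.
r, w = os.pipe()
os.write(w, b"hi")
results = []

def reader():
    yield ("read", r)                 # "come back when r is readable"
    results.append(os.read(r, 2))

scheduler([reader()])
print(results)
```

Note this is exactly the "single giant select" Guido described, just hidden inside the scheduler rather than written by each program.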

[Gordon McMillan]
Hmmm. I jumped back into this one, but never saw my post show up...
Me neither! An exclamation point because I see there's a recent post of yours in the Python-Dev archives, but I didn't get it in the mail either.
Yes, but that's the consensus view of "real", and so suffers from "ten billion flies can't be wrong" syndrome <wink>. If you pitch a parallel system to the NSA, they give you a long list of problems and ask you to sketch the best way to solve them on your platform; as I recall, none had anything to do with I/O even under Guido's definition; instead tons of computation with difficult twists, and enough tight coupling to make threads the natural approach in most cases. If I said any more they'd terminate me with extreme prejudice, and the world doesn't get any realer than that <wink>.
Can you flesh out the "swap" part more? That is, we're in the middle of some C code, so the C stack is involved in the state that's being swapped, and under fake threads we don't have a real thread to magically capture that.
a) Generators aren't enough for Sam's designs.

b) Fake threads are roughly comparable to coroutines and continuations wrt power (depending on implementation details, continuations may be strictly most powerful, and coroutines least).

c) Christian's stackless Python can, I believe, already do full coroutines, and is close to doing full continuations.

So soon we can kick the tires instead of each other <wink>. or-what-the-heck-we-can-all-kick-chris-ly y'rs - tim

[I jump back into a needlessly contentious thread]: [Gordon McMillan - me]
[Tim]
I can assure you that gov't work isn't "real", even when the problem domain appears to be, which in this case is assuredly not true <wink>. But the point really is that (1) Guido's definition of "I/O" is very broad and (2) given that definition, it probably does account for 99% of the cases. Which is immaterial, if the fix for one fixes the others.
Sure - it's spelled "T I S M E R". IIRC, this whole thread started with Guido dumping cold water on the comment that perhaps Chris's work could yield green (er, "fake") threads.
OK, but they're still (minorly) mind expanding for someone from the orthodox C / Python world...
So then we're down to Tim faking the others from whatever Chris comes up with? Sounds dandy to me! (Yah, bitch and moan Tim; you'd do it anyway...). (And yes, we're on the "dev" list; this is all experimental; so Guido can just live with being a bit uncomfortable with it <wink>). The rambling arguments have had to do with "reasons" for doing this stuff. I was just trying to point out that there are a couple valid but very different reasons: 1) Macs. 2) Sam. almost-a-palindrome-ly y'rs - Gordon

"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> Me neither! An exclamation point because I see there's a
TP> recent post of yours in the Python-Dev archives, but I didn't
TP> get it in the mail either.

A bad performance problem in Mailman was causing cpu starvation and (I'm surmising) lost messages. I believe I've fixed this in the version currently running on python.org. If you think messages are showing up in the archives but you are still not seeing them delivered to you, please let me know via webmaster@python.org!

-Barry

[Guido and Tim, Guido and Tim]

Ouch! This is getting contentious. Let's unwind the "you said, I said, you said" business a bit.

Among the three {coroutines, fake threads, continuations}, I expect any could be serviceably simulated via either of the others. There: just saved a full page of sentence diagramming <wink>. All offer a strict superset of generator semantics. It follows that, *given* either coroutines or continuations, I indeed see no semantic hole that would be plugged by fake threads. But Python doesn't have any of the three now, and there are two respects in which fake threads may have an advantage over the other two:

1) Pedagogical: a friendlier sandbox for learning "real threads".

2) Python already has *a* notion of threads. So fake threads could be seen as building on that (a variation of an existing concept, as opposed to something unprecedented).

I'm the only one who seems to see merit in #2, so I won't mention it again: fake threads may be an aid to education, but other than that they're useless crap, and probably cause stains if not outright disk failure <wink>.

About newbies, I've seen far too many try to learn threads to entertain the notion that they're easier than I think. Threads != parallel programming, though! Approaches like Gelernter's Linda, or Klappholz's "refined languages", *are* easy for newbies to pick up, because they provide clear abstractions that prevent the worst parallelism bugs by offering primitives that *can't* e.g. deadlock. threading.py is a step in the right direction (over the "thread" module) too. And while I don't know what Alice presents as a parallelism API, I'd bet 37 cents unseen that the Alice user doesn't build "parallel activities" out of thread.start_new_thread and raw mutii <wink>.
About the rest, I think you have a more expansive notion of I/O than I do, although I can squint and see what you mean; e.g., I *could* view almost all of what Dragon's products do as I/O, although it's a real stretch for the thread that just polls the other threads making sure they're still alive <wink -- part of our text-to-speech system is licensed, and we don't have the source, and it dies in mysterious ways; so we run it in a thread and restart it whenever it vanishes -- can't afford to have the whole app die!>. Back to quoting:
They're a general approach (like continuations) but, yes, given an asynch I/O interface most times I'd much rather use the latter (like I'd rather use recursion directly when it's available). BTW, I didn't say threads were "easy" under Windows: cheap, reliable & ubiquitous, yes. They're easier than under many other OSes thanks to a rich collection of system-supplied thread gimmicks that actually work, but no way are they "easy". Like you did wrt hiding "thread" under "threading", even under Windows real projects have to create usable app-specific thread-based abstractions (c.f. your on-target remark about Netscape & thread bugs).
Whereas I don't. So let's just agree to argue about this one with ever-increasing intensity <wink>.
We should move this part to the Thread-SIG; Mark & Greg are doubtless chomping at the bit to rehash the headaches the global lock causes under Windows <wink>; I'm not so keen either to brush off the potential benefits of multiprocessor parallelism, particularly not with the price of CPUs falling into spare-change range.
Hmm! What kinds of problems happen then? Just a lack of hoped-for overlap, or actual deadlock (the I/O thread needing another thread to proceed for itself to make progress)? If the latter, the extension writer's view of who's at fault may differ from ours <wink -- but I agree with you here>.
I've been ranting about both fake threads and real threads, and don't recall excluding anything; I do think I *should* have, though <smile>.
and I bet that they are also watching some kind of I/O (e.g. whether the user says some more stuff).
Sure, and whether the phone rings, and whether text-to-speech is in progress, and tracking the mouse position, and all sorts of other arguably I/O-like stuff too. Some of the subsystems are thread-unaware legacy or 3rd-party code, and need to run in threads dedicated to them because they believe they own the entire machine (working via callbacks). The coupling is too tight to afford IPC mechanisms, though (i.e., running these in a separate process is not an option).

Mostly it's algorithm-factoring, though: text-to-speech and speech-to-text both require mondo complex processing, and the "I/O part" of each is a small link at an end of a massive chain. Example: you say something, and you expect to see "the result" the instant you stop speaking. But the CPU cycles required to recognize 10 seconds of speech consume, alas, about 10 seconds. So we *have* to overlap the speech collection with the signal processing, the acoustic feature extraction, the acoustic scoring, the comparison with canned acoustics for many tens of thousands of words, the language modeling ("sounded most like 'Guido', but considering the context they probably said 'ghee dough'"), and so on.

You simply can't write all that as a monolithic algorithm and have a hope of it working; it's most naturally a pipeline, severely complicated in that what pops out of the end of the first stage can have a profound effect on what "should have come out" at the start of the last stage. Anyway, thread-based pseudo-concurrency is a real help in structuring all that. It's *necessary* to overlap speech collection (input) with computation and result-so-far display (output), but it doesn't stop there.
Don't know, but I never heard of it before Java or outside of Solaris. [about generators & dumb linguists]
Yes, I expect the trick for "dumb linguists" is that they don't try to understand. They just use it, and it works or it doesn't. BTW, coroutines are harder to understand because of (paradoxically!) the symmetry; generators are slaves, so you don't have to bifurcate your brain to follow what they're doing <wink>.
Even better, they start with an endless sequence of flawed examples that makes the reader wonder if there's *any* way to get this stuff to work <wink>.
But backwards() also returns, when it's done. What happens with the return value?
I don't think a newbie would think to ask that: it would "just work" <wink>. Seriously, in Icon people quickly pick up that generators have a "natural lifetime", and when they return their life is over. It hangs together nicely enough that people don't have to think about it. Anyway, "return" and "suspend" both return a value; the only difference is that "return" kills the generator (it can't be resumed again after a return). The pseudo-Python above assumed that a generator signals the end of its life by returning None. Icon uses a different mechanism.
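As it turned out, generators later landed in Python with just these semantics, spelled `yield` (PEP 255, Python 2.2): a plain return ends the generator's life, and the caller sees a StopIteration exception rather than a None value. A sketch of the backwards() example in that eventual syntax:

```python
def backwards(seq):
    # "suspend" became `yield`; the frame is kept alive between values
    for i in range(len(seq) - 1, -1, -1):
        yield seq[i]
    # Falling off the end (an implicit return) ends the generator's
    # life: the caller sees StopIteration, not a returned value.

print(list(backwards("spam")))   # ['m', 'a', 'p', 's']

g = backwards("ab")
print(next(g), next(g))          # b a
try:
    next(g)                      # the generator's natural lifetime is over
except StopIteration:
    print("done")
```

This matches the Icon behavior described above: the generator's "natural lifetime" ends when it returns, and resuming it after that is an error the caller can catch.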
Well, in Ping's absence I've generally fielded the c.l.py questions about tokenize.py too, and there's a pattern: non-GUI people simply seem to find callbacks confusing! os.path.walk has some other UI glitches (like "arg" is the 3rd argument to walk but the 1st arg to the callback, & people don't know what its purpose is anyway), but I think the callback is the core of it (& "arg" is an artifact of the callback interface). I can't help but opine that part of what people find so confusing about call/cc in Scheme is that it calls a function taking a callback argument too. Generators aren't strong enough to replace call/cc, but they're exactly what's needed to make tokenize's interface match the obvious mental model ("the program is a stream of tokens, and I want to iterate over that"); c.f. Sam's comments too about layers of callbacks vs "normal control flow".
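The tokenize module did eventually grow exactly the interface argued for here: `tokenize.generate_tokens` is a generator the caller simply iterates over, with no callback in sight. A small demonstration using the current stdlib:

```python
import io
import tokenize

src = "x = 1 + 2\n"
# generate_tokens drives the tokenizer lazily; the caller just iterates,
# matching the mental model "the program is a stream of tokens".
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
print(tokens)
```

The callback has been inverted into an iterator, which is precisely the restructuring generators make cheap.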
I think people with a Unix background understand that, but not sure about Windows natives. Windows threads really are cheap, which easily slides into abuse; e.g., the recently-fixed electron-width hole in cleaning up thread states required extreme rates of thread death to provoke, and has been reported by multiple Windows users. An SGI guy was kind enough to confirm the test case died for him too, but did any non-Windows person ever report this bug?
But then you end up with contorted code which is why high-performance systems require experts to write them.
Which feeds back into Sam's agenda: the "advanced" control-flow gimmicks can be used by an expert to implement a high-performance system that doesn't require expertise to use. Fake threads would be good enough for that purpose too (while real threads aren't), although he's got his heart set on one of the others.
What were those applications of threads again you were talking about that could be serviced by fake threads that weren't coroutines/generators?
First, let me apologize for the rhetorical excess there -- it went too far. Forgive me, or endure more of the same <wink>. Second, the answer is (of course) "none", but that was a rant about real threads, not fake ones. so-close-you-can-barely-tell-'em-apart-ly y'rs - tim

Just a few clarifications. I have no time, but need to share what I learned. Tim Peters wrote:
[Guido and Tim, Guido and Tim]
...
I have just proven that this is not true: full continuations cannot be expressed by coroutines. All the rest is true. Coroutines and fake threads just need the absence of the C stack. To be more exact: the current state of the C stack must be independent of the bound Python code being executed (which frames are).

Now the big surprise: this *can* be done without removing the C stack. It can even give more speed to let the stack wind up to some degree and wind down later; some Scheme implementations do this. But the complexity of making this work correctly is even higher than being stackless whenever possible. So this is the basement; improvements are possible and likely to appear.

Anyway, with this, you can build fake threads, coroutines and generators. They all need a little extra treatment: switching of context, how to stop a coroutine, how to catch exceptions, and so on. You can do all that with some C code. I just believe that even that can be done with Python -- and here the unsayable continuation word appears. You must have continuations if you want to try the above *in* Python.

The reason why continuations are the hardest of the above to implement, and cannot be expressed by the others: a continuation is the future of some computation. It allows the order of execution of a frame to be changed in a radical way. A frame can have as many as one dormant continuation per function call that appears lexically, and it cannot predict which of these will actually be used as a continuation.

I have to say thank you, christian! I think your intent - provide the basis for designers of python's advanced control mechanisms to truly explore, and choose the direction in a well informed way - is ideal, and it's a rare and wonderful opportunity to be able to pursue something like an ideal course. Thanks to your hard work. Whatever comes of this, i think we all have at least refined our understandings of the issues - i know i have. (Thanks also to the ensuing discussion's clarity and incisiveness - i need to thank everyone involved for that...) I may not be able to contribute particularly to the implementation, but i'm glad to be able to grasp the implications as whatever proceeds, proceeds. And i actually expect that the outcome will be much better informed than it would have been without your following through on your own effort to understand. Yay! Ken klm@digicool.com

I'll second Ken's congratulations to Christian! [Christian]
... Full continuations cannot be expressed by coroutines. All the rest is true.
I beg enlightenment from someone more familiar with these high-falutin' concepts. Would the following characterization be accurate? All these beasts (continuations, coroutines, generators) involve the idea of "resumable", but:

    A generator's state is wholly self-contained
    A coroutine's state is not necessarily self-contained, but it is stable
    Continuations may have volatile state

Is this right, wrong, necessary, sufficient...?? goto-beginning-to-look-attractive-ly y'rs - Gordon

[Gordon]

"goto" is deliciously ironic, for a reason to become clear <wink>. Here's my biased short course.

NOW

First, I have the feeling most people would panic if we simply described Python's current subroutine mechanism under a new name <0.9 wink>. I'll risk that.

When Python makes a call, it allocates a frame object. Attached to the frame is the info everyone takes for granted and so thinks is "simple & obvious" <wink>. Chiefly:

    "the locals" (name -> object bindings)
    a little evaluation stack for holding temps and dynamic
        block-nesting info
    the offset to the current bytecode instruction, relative to the
        start of the code object's fixed (immutable) bytecode vector

When a subroutine returns, it decrefs the frame and then the frame typically goes away; if it returns because of an exception, though, traceback objects may keep the frame alive.

GENERATORS

Generators add two new abstract operations, "suspend" and "resume". When a generator suspends, it's exactly like a return today except we simply decline to decref the frame. That's it! The locals, and where we are in the computation, aren't thrown away. A "resume" then consists of *re*starting the frame at its next bytecode instruction, with the retained frame's locals and eval stack just as they were.

Some generator properties:

+ In implementation terms, a trivial variation on what Python currently does.

+ They're asymmetric: "suspend" is something only a generator can do, and "resume" something only its caller can do (this does not preclude a generator from being "the caller" wrt some other generator, though, and indeed that's very useful in practice).

+ A generator always returns control directly to its caller, at the point the caller invoked the generator. And upon resumption, a generator always picks up where it left off.

+ Because a generator remembers where it is and what its locals are, its state and "what to do next" don't have to be encoded in global data structures and then decoded from scratch upon entry.
That is, whenever you build a little (or large!) state machine to figure out "what to do next" from a collection of persistent flags and state vrbls, chances are good there's a simple algorithm dying to break free of that clutter <wink>.

COROUTINES

Coroutines add only one new abstract operation, "transfer". They're fully symmetric so can get away with only one. "transfer" names a coroutine to transfer to, and gives a value to deliver to it (there are variations, but this one is common & most useful). When A transfers to B, it acts like a generator "suspend" wrt A and like a generator "resume" wrt B. So A remembers where it is, and what its locals etc are, and B gets restarted from the point *it* last transferred to someone else.

Coroutines grew up in simulation languages because they're an achingly natural way to model independent objects that interact with feedback. There each object (which may itself be a complex system of other stuff) is written as an infinite loop, transferring control to other objects when it has something to tell them, and transferred to by other objects when they have something to tell it.

A Unix pipeline "A | B | C | D" doesn't exploit the full power but is suggestive. A may be written as

    while 1:
        x = compute my next output
        B.transfer(x)      # resume B with my output

B as

    while 1:
        x = A.transfer()   # resume A to get my input
        y = compute something from x and my own history
        C.transfer(y)      # resume C with my output

C as

    while 1:
        x = B.transfer()   # resume B to get my input
        y = compute something from x and my own history
        D.transfer(y)      # resume D with my output

and D as

    while 1:
        x = C.transfer()   # resume C to get my input
        y = compute something from x and my own history
        print y

If e.g. C collapses pairs of values from B, it can be written instead as

    while 1:
        # get a pair of B's
        x = B.transfer()
        y = B.transfer()
        z = f(x, y, whatever)
        D.transfer(z)      # resume D with my output

It's a local modification to C: B doesn't know and shouldn't need to know.
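Python never got a symmetric `transfer`, but the pull-style reading of this pipeline (D pulls from C, which pulls from B, which pulls from A) maps directly onto the generators that did arrive later. A sketch, with made-up stage computations standing in for "compute something from x and my own history":

```python
def A():
    # source stage: produce some values
    for x in range(5):
        yield x

def B(source):
    # each stage pulls from the one before it ("x = A.transfer()")
    for x in source:
        yield x * 2

def C(source):
    # C collapses pairs of values from B, as in the example above;
    # B needs no change at all
    it = iter(source)
    for x in it:
        try:
            y = next(it)
        except StopIteration:
            break            # odd value left over; drop it
        yield x + y

def D(source):
    # sink stage: collect the results
    return list(source)

result = D(C(B(A())))
print(result)
```

Note the asymmetry Tim describes: each stage is a slave of the one downstream of it, so this captures the "semi-coroutine" flavor, not full symmetric transfer.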
This keeps complex algorithms manageable as things evolve. Initialization and shutdown can be delicate, but once the pipe is set up it doesn't even matter which of {A, B, C, D} first gets control! You can view A as pushing results through the pipe, or D as pulling them, or whatever. In reality they're all equal partners.

Why these are so much harder to implement than generators: "transfer" *names* who next gets control, while generators always return to their (unnamed) caller. So a generator simply "pops the stack" when it suspends, while coroutine flow need not be (and typically isn't) stack-like. In Python this is currently a coroutine-killer, because the C stack gets intertwined. So if coroutine A merely calls (in the regular sense) function F, and F tries to transfer to coroutine B, the info needed to resume A includes the chunk of the C stack between A and F. And that's why the Python coroutine implementation I referenced earlier uses threads under the covers (where capturing pieces of the C stack isn't a problem).

Early versions of coroutines didn't allow for this, though! At first coroutines could only transfer *directly* to other coroutines, and as soon as a coroutine made "a regular call" transfers were prohibited until the call returned (unless the called function kicked off a brand new collection of coroutines, which could then transfer among themselves -- making the distinction leads to convoluted rules, so modern practice is to generalize from the start). Then the current state of each coroutine was contained in a single frame, and it's really no harder to implement than generators. Knuth seems to have this restricted flavor of coroutine in mind when he describes generator behavior as "semi-coroutine".

CONTINUATIONS

Given the pedagogical structure so far, you're primed to view continuations as an enhancement of coroutines. And that's exactly what will get you nowhere <wink>. Continuations aren't more elaborate than coroutines, they're simpler.
Indeed, they're simpler than generators, and even simpler than "a regular call"! That's what makes them so confusing at first: they're a different *basis* for *all* call-like behavior. Generators and coroutines are variations on what you already know; continuations challenge your fundamental view of the universe.

Legend has it they were discovered when theorists were trying to find a solid reason for why goto statements suck: the growth of "denotational semantics" (DS) boomed at the same time "structured programming" took off. The former is a solid & fruitful approach to formally specifying the semantics of programming languages, built on the lambda calculus (and so dear to the Lisp/Scheme community -- this all ties together, of course <wink>). The early hope was that goto statements would prove to present intractable problems for formal specification, and then "that's why they suck: we can't even sort them out on paper, let alone in practice".

But in one of God's cleverer tricks on the programming world <tee hee>, the semantics of goto turned out to be trivial: at a branch point, you can go one of two ways. Represent one of those ways by a function f that computes what happens if you branch one way, and the other way by a function g. Then an if+goto simply picks one of f or g as "the continuation" of the program, depending on whether the "if" condition is true or false. And a plain goto simply replaces the current continuation with a different one (representing what happens at the branch target) unconditionally. So goto turned out to be simpler (from the DS view) than even an assignment stmt!

I've often suspected theorists were *surprised* (and maybe appalled <0.7 wink>) when the language folks went on to *implement* the continuation idea. Don't really know, but suppose it doesn't matter anyway. The fact is we're stuck with them now <wink>.

In theory a continuation is a function that computes "the rest of the program", or "its future".
And it really is like a supercharged goto! It's the formal DS basis for all control flow, from goto stmts to exception handling, subsuming vanilla call flow, recursion, generators, coroutines, backtracking, and even loops along the way. To a certain frame of mind (like Sam's, and Christian is temporarily under his evil influence <wink>), this relentless uniformity & consistency of approach is very appealing. Guido tends to like his implementations to mirror his surface semantics, though, and if he has ten constructs they're likely to be implemented ten ways. View that as a preview of future battles that have barely been hinted at so far <0.3 wink>.

Anyway, in implementation terms a continuation "is like" what a coroutine would be if you could capture its resumption state at any point (even without the coroutine's knowledge!) and assign that state to a vrbl. So we could say it adds an abstract operation "capture", which essentially captures the program counter, call stack, and local (in Python terms) "block stack" at its point of invocation, and packages all that into a first-class "continuation object". IOW, a building block on top of which a generator's suspend, and the suspend half of a coroutine transfer, can be built.

In a pure vision, there's no difference at all between a regular return and the "resume" half of a coroutine transfer: both amount to no more than picking some continuation to evaluate next. A continuation can be captured anywhere (even in the middle of an expression), and any continuation can be invoked at will from anywhere else. Note that "invoking a continuation" is *not* like "a call", though: it's abandoning the current continuation, *replacing* it with another one. In formal DS this isn't formally true (it's still "a call" -- a function application), but in practice it's a call that never returns to its caller, so the implementation takes a shortcut.
Like a goto, this is as low-level as it gets, and even hard-core continuation fans don't use them directly except as a means to implement better-behaved abstractions.

As to whether continuations have "volatile state", I'm not sure what that was asking. If a given continuation is invoked more than once (which is something that's deliberately done when e.g. implementing backtracking searches), then changes made to the locals by the first invocation are visible to the second (& so on), so maybe <wink> the answer is "yes". It's more accurate to think of a continuation as being immutable, though: it holds a reference to the structure that implements name bindings, but does not copy (save or restore) the bindings.

Quick example, given:

    (define continuation 0)

    (define (test)
      (let ((i 0))
        (call/cc (lambda (k) (set! continuation k)))
        (set! i (+ i 1))
        i))

That's like the Python:

    def test():
        i = 0
        global continuation
        continuation = magic to resume at the start of the next line
        i = i + 1
        return i

Then (this is interactive output from a Scheme shell):

    > (test)
    1
    > (continuation)
    2
    > (continuation)
    3
too-simple-to-be-obvious?-ly y'rs - tim
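Python never did grow call/cc, but the resumable-state half of Tim's example can be mimicked with the generators that arrived later. This is a loose analogy only: a real continuation can be captured mid-expression and re-invoked from anywhere, even after `test` has "returned", while a generator can only be resumed from outside, at its yield points.

```python
def test():
    i = 0
    while True:
        i += 1
        yield i    # each resumption continues here, with the *same* i

resume = test()
print(next(resume))   # 1
print(next(resume))   # 2 -- like invoking the captured continuation
print(next(resume))   # 3
```

The key shared property is the one Tim highlights: the bindings are referenced, not copied, so every re-entry sees and mutates the same `i`.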

Wow. That was by far the clearest tutorial on the subject I think I've read. I guess we need (for Tim to have) more 3 day holiday weekends. i-vote-we-pitch-in-and-pay-tim-to-take-/every/-monday-off-so-he-can-write- more-great-stuff-like-this-ly y'rs, -Barry

Barry> Wow. That was by far the clearest tutorial on the subject I
Barry> think I've read. I guess we need (for Tim to have) more 3 day
Barry> holiday weekends.

What he said.

Skip
participants (11)
- Barry A. Warsaw
- Christian Tismer
- David Ascher
- Fredrik Lundh
- Gordon McMillan
- Guido van Rossum
- Ken Manheimer
- Mark Hammond
- Skip Montanaro
- Tim Peters
- Vladimir Marangozov