[list has been quiet, thought I'd liven things up a bit. 8^)]
I'm not sure if this has been brought up before in other forums, but has there been discussion of separating the Python and C invocation stacks (i.e., removing recursive calls to the interpreter) to facilitate coroutines or first-class continuations?
One of the biggest barriers to getting others to use asyncore/medusa is the need to program in continuation-passing-style (callbacks, callbacks to callbacks, state machines, etc...). Usually there has to be an overriding requirement for speed/scalability before someone will even look into it. And even when you do 'get' it, there are limits to how inside-out your thinking can go. 8^)
If Python had coroutines/continuations, it would be possible to hide asyncore-style select()/poll() machinery 'behind the scenes'. I believe that Concurrent ML does exactly this...
Other advantages might be restartable exceptions, different threading models, etc...
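To make the 'behind the scenes' idea concrete, here is a minimal sketch. It borrows modern generator syntax purely as illustration (Python had nothing like it at the time this was written), and `scheduler`, `read_request`, and the canned input are all invented stand-ins for a real select()/poll() loop:

```python
# Sketch only: application code that reads like blocking I/O, driven by a
# toy event loop.  All names here are hypothetical, not a real API.

def read_request(conn):
    # Reads top-to-bottom like blocking code; actually suspends itself
    # whenever the (simulated) socket has no data yet.
    data = ""
    while "\n" not in data:
        chunk = yield ("wait_readable", conn)  # scheduler resumes us later
        data += chunk
    return data

def scheduler(tasks, canned_input):
    # Toy event loop: a real one would select()/poll() on the requests
    # the tasks yield, instead of feeding them canned chunks.
    results = []
    for task in tasks:
        try:
            task.send(None)                    # run until the first suspend
            while True:
                task.send(canned_input.pop(0))
        except StopIteration as stop:
            results.append(stop.value)
    return results

results = scheduler([read_request("conn-1")], ["GET /", " HTTP/1.0\n"])
```

The point is exactly Sam's: none of the select()-style machinery shows up in read_request() itself.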
-Sam rushing@nightmare.com rushing@eGroups.net
rushing@nightmare.com wrote:
[list has been quiet, thought I'd liven things up a bit. 8^)]
Well, there certainly is enough on the todo list... it's probably the usual "ain't got no time" thing.
I'm not sure if this has been brought up before in other forums, but has there been discussion of separating the Python and C invocation stacks (i.e., removing recursive calls to the interpreter) to facilitate coroutines or first-class continuations?
Wouldn't it be possible to move all the C variables passed to eval_code() via the execution frame ? AFAIK, the frame is generated on every call to eval_code() and thus could also be generated *before* calling it.
One of the biggest barriers to getting others to use asyncore/medusa is the need to program in continuation-passing-style (callbacks, callbacks to callbacks, state machines, etc...). Usually there has to be an overriding requirement for speed/scalability before someone will even look into it. And even when you do 'get' it, there are limits to how inside-out your thinking can go. 8^)
If Python had coroutines/continuations, it would be possible to hide asyncore-style select()/poll() machinery 'behind the scenes'. I believe that Concurrent ML does exactly this...
Other advantages might be restartable exceptions, different threading models, etc...
Don't know if moving the C stack stuff into the frame objects will get you the desired effect: what about other things having state (e.g. connections or files), that are not even touched by this mechanism ?
M.-A. Lemburg writes:
Wouldn't it be possible to move all the C variables passed to eval_code() via the execution frame ? AFAIK, the frame is generated on every call to eval_code() and thus could also be generated *before* calling it.
I think this solves half of the problem. The C stack is both a value stack and an execution stack (i.e., it holds variables and return addresses). Getting rid of arguments (and a return value!) gets rid of the need for the 'value stack' aspect.
In aiming for an enter-once, exit-once VM, the thorniest part is to somehow allow python->c->python calls. The second invocation could never save a continuation because its execution context includes a C frame. This is a general problem, not specific to Python; I probably should have thought about it a bit before posting...
Don't know if moving the C stack stuff into the frame objects will get you the desired effect: what about other things having state (e.g. connections or files), that are not even touched by this mechanism ?
I don't think either of those cause 'real' problems (i.e., nothing should crash that assumes an open file or socket), but there may be other stateful things that might. I don't think that refcounts would be a problem - a saved continuation wouldn't be all that different from an exception traceback.
-Sam
p.s. Here's a tiny VM experiment I wrote a while back, to explain what I mean by 'stackless':
http://www.nightmare.com/stuff/machine.h
http://www.nightmare.com/stuff/machine.c
Note how OP_INVOKE (the PROC_CLOSURE clause) pushes new context onto heap-allocated data structures rather than calling the VM recursively.
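For readers who don't want to fetch the C files, here is a rough Python rendering of the same idea (opcodes and frame layout are invented for illustration): a call pushes a new heap-allocated frame and the one dispatch loop keeps running, instead of invoking the VM recursively.

```python
# Toy 'stackless' VM: a call pushes a heap-allocated frame and the single
# dispatch loop keeps running -- it never calls itself recursively.

class Frame:
    def __init__(self, code, caller):
        self.code, self.pc, self.stack, self.caller = code, 0, [], caller

def run(main_code, procs):
    frame = Frame(main_code, None)
    while frame is not None:
        op, arg = frame.code[frame.pc]
        frame.pc += 1
        if op == "PUSH":
            frame.stack.append(arg)
        elif op == "ADD":
            b, a = frame.stack.pop(), frame.stack.pop()
            frame.stack.append(a + b)
        elif op == "CALL":                     # cf. OP_INVOKE: new context,
            frame = Frame(procs[arg], frame)   # same loop, no C recursion
        elif op == "RETURN":
            value, frame = frame.stack.pop(), frame.caller
            if frame is not None:
                frame.stack.append(value)
        elif op == "HALT":
            return frame.stack.pop()

procs = {"add2": [("PUSH", 2), ("PUSH", 40), ("ADD", None), ("RETURN", None)]}
result = run([("CALL", "add2"), ("HALT", None)], procs)
```

Since no C stack frames pile up, the whole chain of pending calls is ordinary heap data -- exactly what you would need to snapshot for a continuation.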
Sam> I'm not sure if this has been brought up before in other forums,
Sam> but has there been discussion of separating the Python and C
Sam> invocation stacks, (i.e., removing recursive calls to the
Sam> interpreter) to facilitate coroutines or first-class continuations?
I thought Guido was working on that for the mobile agent stuff he was working on at CNRI.
Skip Montanaro | Mojam: "Uniting the World of Music" http://www.mojam.com/
skip@mojam.com | Musi-Cal: http://www.musi-cal.com/
518-372-5583
"SM" == Skip Montanaro skip@mojam.com writes:
SM> I thought Guido was working on that for the mobile agent stuff
SM> he was working on at CNRI.
Nope, we decided that we could accomplish everything we needed without this. We occasionally revisit this but Guido keeps insisting it's a lot of work for not enough benefit :-)
-Barry
Guido van Rossum writes:
I've come up with two partial solutions: (1) allow for a way to arrange for a call to be made immediately after you return to the VM from C; this would take care of apply() at least and a few other "tail-recursive" cases; (2) invoke a new VM when C code needs a Python result, requiring it to return. The latter clearly breaks certain uses of coroutines but could probably be made to work most of the time. Typical use of the 80-20 rule.
I know this is disgusting, but could setjmp/longjmp 'automagically' force a 'recursive call' to jump back into the top-level loop? This would put some serious restraint on what C called from Python could do...
I think just about any Scheme implementation has to solve this same problem... I'll dig through my collection of them for ideas.
In general, I still think it's a cool idea, but I also still think that continuations are too complicated for most programmers. (This comes from the realization that they are too complicated for me!) Corollary: even if we had continuations, I'm not sure if this would take away the resistance against asyncore/asynchat. Of course I could be wrong.
Theoretically, you could have a bit of code that looked just like 'normal' imperative code, that would actually be entering and exiting the context for non-blocking i/o. If it were done right, the same exact code might even run under 'normal' threads.
Recently I've written an async server that needed to talk to several other RPC servers, and a mysql server. Pseudo-example, with possibly-async calls in UPPERCASE:
auth, archive = db.FETCH_USER_INFO (user)
if verify_login(user,auth):
    rpc_server = self.archive_servers[archive]
    group_info = rpc_server.FETCH_GROUP_INFO (group)
    if valid (group_info):
        return rpc_server.FETCH_MESSAGE (message_number)
    else:
        ...
else:
    ...
This code in CPS is a horrible, complicated mess, it takes something like 8 callback methods, variables and exceptions have to be passed around in 'continuation' objects. It's hairy because there are three levels of callback state. Ugh.
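To illustrate the shape of the problem, here is a runnable miniature of the callback version (every name is invented, and the stub 'async' calls complete immediately): the straight-line logic shatters into one method per step, with intermediate values riding along in a 'continuation' object.

```python
# Stub async calls: each takes a callback plus the continuation object 'k'
# and 'completes' immediately.  All names are invented for illustration.

class Continuation:
    def __init__(self, **state): self.__dict__.update(state)

def fetch_user_info(user, cb, k):        cb(("secret", "srv-A"), k)
def fetch_group_info(srv, group, cb, k): cb({"ok": True}, k)
def fetch_message(srv, n, cb, k):        cb("msg #%d from %s" % (n, srv), k)

def step1(user, group, n, done):
    k = Continuation(group=group, n=n, done=done)
    fetch_user_info(user, step2, k)

def step2(info, k):
    auth, k.archive = info
    if auth == "secret":                 # stands in for verify_login()
        fetch_group_info(k.archive, k.group, step3, k)
    else:
        k.done(None)

def step3(group_info, k):
    if group_info.get("ok"):             # stands in for valid()
        fetch_message(k.archive, k.n, step4, k)
    else:
        k.done(message=None) if False else k.done(None)

def step4(message, k):
    k.done(message)

out = []
step1("sam", "python-dev", 7, out.append)
```

Four steps already; add error handling and the count climbs toward Sam's eight.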
If Python had closures, then it would be a *little* easier, but would still make the average Pythoneer swoon. Closures would let you put the above logic all in one method, but the code would still be 'inside-out'.
Different suggestion: it would be cool to work on completely separating out the VM from the rest of Python, through some kind of C-level API specification.
I think this is a great idea. I've been staring at Python bytecodes a bit lately, thinking about how to do something like this for some subset of Python.
[...]
Ok, we've all seen the 'stick'. I guess I should give an example of the 'carrot': I think that a web server built on such a Python could have the performance/scalability of thttpd, with the ease-of-programming of Roxen. As far as I know, there's nothing like it out there. Medusa would be put out to pasture. 8^)
-Sam
I know this is disgusting, but could setjmp/longjmp 'automagically' force a 'recursive call' to jump back into the top-level loop? This would put some serious restraint on what C called from Python could do...
Forget about it. setjmp/longjmp are invitations to problems. I also assume that they would interfere badly with C++.
I think just about any Scheme implementation has to solve this same problem... I'll dig through my collection of them for ideas.
Anything that assumes knowledge about how the C compiler and/or the CPU and OS lay out the stack is a no-no, because it means that the first thing one has to do for a port to a new architecture is figure out how the stack is laid out. Another thread in this list is porting Python to microplatforms like PalmOS. Typically the Scheme hackers are not afraid to delve deep into the machine, but I refuse to do that -- I think it's too risky.
In general, I still think it's a cool idea, but I also still think that continuations are too complicated for most programmers. (This comes from the realization that they are too complicated for me!) Corollary: even if we had continuations, I'm not sure if this would take away the resistance against asyncore/asynchat. Of course I could be wrong.
Theoretically, you could have a bit of code that looked just like 'normal' imperative code, that would actually be entering and exiting the context for non-blocking i/o. If it were done right, the same exact code might even run under 'normal' threads.
Yes -- I remember in 92 or 93 I worked out a way to emulate coroutines with regular threads. (I think in cooperation with Steve Majewski.)
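That emulation can be sketched roughly as follows (a minimal toy reconstruction, not the original code): two events enforce strict alternation, so exactly one of the caller and the coroutine body is ever running.

```python
import threading

class Coroutine:
    """Toy coroutine built on threads: two events guarantee that at any
    moment exactly one side (caller or coroutine body) is running."""
    def __init__(self, func):
        self._go = threading.Event()      # wakes the coroutine body
        self._back = threading.Event()    # wakes the caller
        self._value = None
        t = threading.Thread(target=self._run, args=(func,))
        t.daemon = True
        t.start()

    def _run(self, func):
        self._go.wait()
        func(self, self._value)
        self._back.set()                  # in case the body ever returns

    def resume(self, value=None):         # called by the caller
        self._value = value
        self._back.clear()
        self._go.set()
        self._back.wait()
        return self._value

    def yield_(self, value):              # called inside the coroutine body
        self._value = value
        self._go.clear()
        self._back.set()
        self._go.wait()
        return self._value

def counter(co, start):
    n = start
    while True:
        n = co.yield_(n) + 1              # hand n out, get a new base back

co = Coroutine(counter)
first = co.resume(10)
second = co.resume(first + 5)
```

Correct, but heavyweight: every coroutine costs an OS thread, which is part of why a stackless VM is attractive.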
Recently I've written an async server that needed to talk to several other RPC servers, and a mysql server. Pseudo-example, with possibly-async calls in UPPERCASE:
auth, archive = db.FETCH_USER_INFO (user)
if verify_login(user,auth):
    rpc_server = self.archive_servers[archive]
    group_info = rpc_server.FETCH_GROUP_INFO (group)
    if valid (group_info):
        return rpc_server.FETCH_MESSAGE (message_number)
    else:
        ...
else:
    ...
This code in CPS is a horrible, complicated mess, it takes something like 8 callback methods, variables and exceptions have to be passed around in 'continuation' objects. It's hairy because there are three levels of callback state. Ugh.
Agreed.
If Python had closures, then it would be a *little* easier, but would still make the average Pythoneer swoon. Closures would let you put the above logic all in one method, but the code would still be 'inside-out'.
I forget how this worked :-(
Different suggestion: it would be cool to work on completely separating out the VM from the rest of Python, through some kind of C-level API specification.
I think this is a great idea. I've been staring at Python bytecodes a bit lately, thinking about how to do something like this for some subset of Python.
[...]
Ok, we've all seen the 'stick'. I guess I should give an example of the 'carrot': I think that a web server built on such a Python could have the performance/scalability of thttpd, with the ease-of-programming of Roxen. As far as I know, there's nothing like it out there. Medusa would be put out to pasture. 8^)
I'm afraid I haven't kept up -- what are Roxen and thttpd? What do they do that Apache doesn't?
--Guido van Rossum (home page: http://www.python.org/~guido/)
I'm afraid I haven't kept up -- what are Roxen and thttpd? What do they do that Apache doesn't?
a lean and mean secure web server written in Pike (http://pike.idonex.se/), from a company here in Linköping.
</F>
"FL" == Fredrik Lundh fredrik@pythonware.com writes:
FL> a lean and mean secure web server written in Pike
FL> (http://pike.idonex.se/), from a company here in Linköping.
Interesting off-topic Pike connection. My co-maintainer for CC-Mode originally came on board to add Pike support, which has a syntax similar enough to C to be easily integrated. I think I've had as much success convincing him to use Python as he's had convincing me to use Pike :-)
-Barry
Barry A. Warsaw wrote:
"FL" == Fredrik Lundh fredrik@pythonware.com writes:
FL> a lean and mean secure web server written in Pike
FL> (http://pike.idonex.se/), from a company here in Linköping.
Interesting off-topic Pike connection. My co-maintainer for CC-Mode originally came on board to add Pike support, which has a syntax similar enough to C to be easily integrated. I think I've had as much success convincing him to use Python as he's had convincing me to use Pike :-)
<HistoricalNote>
Heh. Pike is an outgrowth of the MUD world's LPC programming language. A guy named "Profezzorn" started a project (in '94?) to redevelop an LPC compiler/interpreter ("driver") from scratch to avoid some licensing constraints. The project grew into a generalized network handler, since MUDs' typical designs are excellent for these tasks. From there, you get the Roxen web server.
</HistoricalNote>
Cheers, -g
-- Greg Stein, http://www.lyra.org/
Guido van Rossum wrote:
[setjmp/longjmp -no-no]
Forget about it. setjmp/longjmp are invitations to problems. I also assume that they would interfere badly with C++.
I think just about any Scheme implementation has to solve this same problem... I'll dig through my collection of them for ideas.
Anything that assumes knowledge about how the C compiler and/or the CPU and OS lay out the stack is a no-no, because it means that the first thing one has to do for a port to a new architecture is figure out how the stack is laid out. Another thread in this list is porting Python to microplatforms like PalmOS. Typically the Scheme hackers are not afraid to delve deep into the machine, but I refuse to do that -- I think it's too risky.
...
I agree that this is generally bad -- though it's a cakewalk to do a stack swap on the few (x86-based :) platforms I work with, and it is much less work than a thread switch.
But on the general issues: Can the Python-calls-C and C-calls-Python problem just be solved by turning the whole VM state into a data structure, including a Python call stack which is independent? Maybe this has been mentioned already.
This might give a little slowdown, but opens possibilities like continuation-passing style, and context switches between different interpreter states would be under direct control.
Just a little dreaming: Not using threads, but just tiny interpreter incarnations with local state, and a special C call or better a new opcode which activates the next state in some list (of course a Python list). This would automagically produce ICON iterators (duck) and coroutines (cover). If I guess right, continuation passing could be done by just shifting tiny tuples around. Well, Tim, help me :-)
[closures]
I think this is a great idea. I've been staring at Python bytecodes a bit lately, thinking about how to do something like this for some subset of Python.
Lumberjack? How is it going? [to Sam]
ciao - chris
[Christian Tismer]
... But on the general issues: Can the Python-calls-C and C-calls-Python problem just be solved by turning the whole VM state into a data structure, including a Python call stack which is independent? Maybe this has been mentioned already.
The problem is that when C calls Python, any notion of continuation has to include C's state too, else resuming the continuation won't return into C correctly. The C code that *implements* Python could be reworked to support this, but in the general case you've got some external C extension module calling into Python, and then Python hasn't a clue about its caller's state.
I'm not a fan of continuations myself; coroutines can be implemented faithfully via threads (I posted a rather complete set of Python classes for that in the pre-DejaNews days, a bit more flexible than Icon's coroutines); and:
This would automagically produce ICON iterators (duck) and coroutines (cover).
Icon iterators/generators could be implemented today if anyone bothered (Majewski essentially implemented them back around '93 already, but seemed to lose interest when he realized it couldn't be extended to full continuations, because of C/Python stack intertwingling).
If I guess right, continuation passing could be done by just shifting tiny tuples around. Well, Tim, help me :-)
Python-calling-Python continuations should be easily doable in a "stackless" Python; the key ideas were already covered in this thread, I think. The thing that makes generators so much easier is that they always return directly to their caller, at the point of call; so no C frame can get stuck in the middle even under today's implementation; it just requires not deleting the generator's frame object, and adding an opcode to *resume* the frame's execution the next time the generator is called. Unlike in Icon, it wouldn't even need to be tied to a funky notion of goal-directed evaluation.
don't-try-to-traverse-a-tree-without-it-ly y'rs - tim
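As an illustration of the mechanism Tim describes -- the frame survives between calls, and a resume picks up where the last yield left off -- here is his parting tree-traversal example, written with the generator syntax Python later grew (hypothetical at the time of writing; the tree layout is an invented toy):

```python
# Illustration with modern generator syntax: a generator's frame is kept
# alive between calls, and each 'yield' returns directly to the caller --
# no C frame stuck in the middle.

def inorder(tree):
    # tree is (left, value, right) or None -- an invented toy layout
    if tree is not None:
        left, value, right = tree
        for v in inorder(left):
            yield v
        yield value
        for v in inorder(right):
            yield v

tree = ((None, 1, None), 2, ((None, 3, None), 4, None))
values = list(inorder(tree))
```

No callbacks, no explicit stack management: each call to the generator resumes its saved frame.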
Guido van Rossum writes:
If Python had closures, then it would be a *little* easier, but would still make the average Pythoneer swoon. Closures would let you put the above logic all in one method, but the code would still be 'inside-out'.
I forget how this worked :-(
[with a faked-up lambda-ish syntax]
def thing (a):
    return do_async_job_1 (a,
        lambda (b):
            if (a>1):
                do_async_job_2a (b,
                    lambda (c):
                        [...]
                    )
            else:
                do_async_job_2b (a,b,
                    lambda (d,e,f):
                        [...]
                    )
        )
The call to do_async_job_1 passes 'a', and a callback, which is specified 'in-line'. You can follow the logic of something like this more easily than if each lambda is spun off into a different function/method.
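A runnable rendering of the same shape, using modern nested-scope closures and invented stub functions: the logic does fit in one function, but as Sam says, it still reads 'inside-out'.

```python
# Stub async calls that 'complete' immediately; all names are invented.
def fetch_user_info(user, cb):        cb(("secret", "srv-A"))
def fetch_group_info(srv, group, cb): cb({"ok": True})
def fetch_message(srv, n, cb):        cb("msg #%d from %s" % (n, srv))

def get_message(user, group, n, done):
    # Closures let every callback live inline and see enclosing variables,
    # so no 'continuation' objects are needed -- but control flow is still
    # expressed as nested callbacks rather than straight-line code.
    def got_user(info):
        auth, archive = info
        if auth == "secret":                       # verify_login() stand-in
            def got_group(group_info):
                if group_info.get("ok"):           # valid() stand-in
                    fetch_message(archive, n, done)
                else:
                    done(None)
            fetch_group_info(archive, group, got_group)
        else:
            done(None)
    fetch_user_info(user, got_user)

out = []
get_message("sam", "python-dev", 7, out.append)
```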
I think that a web server built on such a Python could have the performance/scalability of thttpd, with the ease-of-programming of Roxen. As far as I know, there's nothing like it out there. Medusa would be put out to pasture. 8^)
I'm afraid I haven't kept up -- what are Roxen and thttpd? What do they do that Apache doesn't?
thttpd (& Zeus, Squid, Xitami) use select()/poll() to gain performance and scalability, but suffer from the same programmability problem as Medusa (only worse, 'cause they're in C).
Roxen is written in Pike, a C-like language with gc, threads, etc... Roxen is, I think, now the official 'GNU Web Server'.
Here's an interesting web-server comparison chart:
http://www.acme.com/software/thttpd/benchmarks.html
-Sam
def thing (a):
    return do_async_job_1 (a,
        lambda (b):
            if (a>1):
                do_async_job_2a (b,
                    lambda (c):
                        [...]
                    )
            else:
                do_async_job_2b (a,b,
                    lambda (d,e,f):
                        [...]
                    )
        )
The call to do_async_job_1 passes 'a', and a callback, which is specified 'in-line'. You can follow the logic of something like this more easily than if each lambda is spun off into a different function/method.
I agree that it is still ugly.
I see. Any pointers to a graph of thttpd market share?
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
...
I see. Any pointers to a graph of thttp market share?
thttpd currently has about 70k sites (of 5.4mil found by Netcraft). That puts it at #6. However, it is interesting to note that 60k of those sites are in the .uk domain. I can't figure out who is running it, but I would guess that a large UK-based ISP is hosting a bunch of domains on thttpd.
It is somewhat difficult to navigate the various reports (and it never fails that the one you want is not present), but the data is from Netcraft's survey at: http://www.netcraft.com/survey/
Cheers, -g
-- Greg Stein, http://www.lyra.org/
[GvR]
... Anything that assumes knowledge about how the C compiler and/or the CPU and OS lay out the stack is a no-no, because it means that the first thing one has to do for a port to a new architecture is figure out how the stack is laid out. Another thread in this list is porting Python to microplatforms like PalmOS. Typically the Scheme hackers are not afraid to delve deep into the machine, but I refuse to do that -- I think it's too risky.
The Icon language needs a bit of platform-specific context-switching assembly code to support its full coroutine features, although its bread-and-butter generators ("semi coroutines") don't need anything special.
The result is that Icon ports sometimes limp for a year before they support full coroutines, waiting for someone wizardly enough to write the necessary code. This can, in fact, be quite difficult; e.g., on machines with HW register windows (where "the stack" can be a complicated beast half buried in hidden machine state, sometimes needing kernel privilege to uncover).
Not attractive. Generators are, though <wink>.
threads-too-ly y'rs - tim
Interesting topic! While I'm on the road, a few short notes.
I thought Guido was working on that for the mobile agent stuff he was working on at CNRI.
Indeed. At least I planned on working on it. I ended up abandoning the idea because I expected it would be a lot of work and I never had the time (same old story indeed).
Sam also hit the nail on the head: the hardest problem is what to do about all the places where C calls back into Python.
I've come up with two partial solutions: (1) allow for a way to arrange for a call to be made immediately after you return to the VM from C; this would take care of apply() at least and a few other "tail-recursive" cases; (2) invoke a new VM when C code needs a Python result, requiring it to return. The latter clearly breaks certain uses of coroutines but could probably be made to work most of the time. Typical use of the 80-20 rule.
And I've just come up with a third solution: a variation on (1) where you arrange *two* calls: one to Python and then one to C, with the result of the first. (And a bit saying whether you want the C call to be made even when an exception happened.)
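Idea (1) and this variation are both trampoline tricks: C-level code, instead of calling back into the VM, hands the main loop a pending call to perform. A toy sketch (none of this is the real ceval machinery; every name is invented):

```python
# Sketch of idea (1): instead of C calling back into the VM, a builtin
# returns a 'pending call' marker that the main loop performs itself --
# a trampoline, so the interpreter is never entered recursively.

def trampoline(func, *args):
    # Run func; as long as it answers ("tail_call", f, args), keep going
    # in this loop instead of recursing.
    result = func(*args)
    while isinstance(result, tuple) and result and result[0] == "tail_call":
        _, func, args = result
        result = func(*args)
    return result

def builtin_apply(func, args):
    # A C builtin like apply() need not invoke a nested interpreter:
    # it just schedules the call for the loop above.
    return ("tail_call", func, args)

def add(a, b):
    return a + b

result = trampoline(builtin_apply, add, (1, 2))
```

This covers apply() and other tail-recursive cases; the hard part, as noted, is C code that needs the Python result *before* it can continue.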
In general, I still think it's a cool idea, but I also still think that continuations are too complicated for most programmers. (This comes from the realization that they are too complicated for me!) Corollary: even if we had continuations, I'm not sure if this would take away the resistance against asyncore/asynchat. Of course I could be wrong.
Different suggestion: it would be cool to work on completely separating out the VM from the rest of Python, through some kind of C-level API specification. Two things should be possible with this new architecture: (1) small platform ports could cut out the interactive interpreter, the parser and compiler, and certain data types such as long, complex and files; (2) there could be alternative pluggable VMs with certain desirable properties such as platform-specific optimization (Christian, are you listening? :-).
I think the most challenging part might be defining an API for passing in the set of supported object types and operations. E.g. the EXEC_STMT opcode needs to be implemented in a way that allows "exec" to be absent from the language. Perhaps an __exec__ function (analogous to __import__) is the way to go. The set of built-in functions should also be passed in, so that e.g. one can easily leave out open(), eval() and compile(), complex(), long(), float(), etc.
I think it would be ideal if no #ifdefs were needed to remove features (at least not in the VM code proper). Fortunately, the VM doesn't really know about many object types -- frames, functions, methods, classes, ints, strings, dictionaries, tuples, tracebacks, that may be all it knows. (Lists?)
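The __import__ analogy already works this way: import is routed through a replaceable builtin, so a hypothetical __exec__ could follow the same pattern. A sketch in modern spelling (the restriction policy here is invented):

```python
import builtins

# Sketch: because 'import' goes through the replaceable builtin
# __import__, a small platform can refuse whole subsystems without any
# #ifdefs.  A hypothetical __exec__ hook could work the same way.

_real_import = builtins.__import__

def restricted_import(name, *args, **kwargs):
    # Invented policy: pretend this platform ships without ftplib.
    if name in ("ftplib",):
        raise ImportError("import of %r disabled on this platform" % name)
    return _real_import(name, *args, **kwargs)

builtins.__import__ = restricted_import
try:
    import ftplib
    blocked = False
except ImportError:
    blocked = True
finally:
    builtins.__import__ = _real_import   # restore the normal hook
```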
Gotta run,
--Guido van Rossum (home page: http://www.python.org/~guido/)
In general, I still think it's a cool idea, but I also still think that continuations are too complicated for most programmers. (This comes from the realization that they are too complicated for me!)
in an earlier life, I used non-preemptive threads (that is, explicit yields) and co-routines to do some really cool stuff with very little code. looks like a stack-less interpreter would make it trivial to implement that.
might just be nostalgia, but I think I would give an arm or two to get that (not necessarily my own, though ;-)
</F>