The discussion last week with Greg Ewing got me to thinking about the CPS transform, and how it might be a useful technique for callback/event-driven code like asyncore & Twisted. I'm pretty sure when I first thought about this eons ago it was before Python had closures. They definitely make it a bit easier! I put some simple demo code together, I think it demonstrates that the idea is feasible, I'm curious to know if anyone is interested. Don't get hung up on the poor quality of the generated code, big improvements could be made with a little bit of work. https://github.com/samrushing/cps-python/ -Sam
The README suggests that doing this optimization on a bytecode level may be better than doing it on a source/AST level. Can you explain why? On Sat, Nov 10, 2012 at 4:06 AM, Sam Rushing < sam-pydeas@rushing.nightmare.com> wrote:
The discussion last week with Greg Ewing got me to thinking about the CPS transform, and how it might be a useful technique for callback/event-driven code like asyncore & Twisted. I'm pretty sure when I first thought about this eons ago it was before Python had closures. They definitely make it a bit easier!
I put some simple demo code together, I think it demonstrates that the idea is feasible, I'm curious to know if anyone is interested.
Don't get hung up on the poor quality of the generated code, big improvements could be made with a little bit of work.
https://github.com/samrushing/cps-python/
-Sam
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- cheers lvh
On Sat, Nov 10, 2012 at 6:57 AM, Laurens Van Houtven <_@lvh.cc> wrote:
The README suggests that doing this optimization on a bytecode level may be better than doing it on a source/AST level. Can you explain why?
I would suggest it's because CPS directly represents control flow, as does bytecode (which has explicit jumps and so on). ASTs represent control flow indirectly, in that there are more constructs that can do more varied forms of jumping around. I think that it's often harder to write program analysis or transformation on syntax trees than on bytecode or control flow graphs, but that's from my own experience doing such, and may not be true for some sorts of situations. In particular there's a directly analogous CPS form for bytecode, where every bytecode instruction in a bytecode string is considered to be equivalent to a CPS function, and continuations represent returns from functions and not expression evaluation, and the CPS functions just navigate through the bytecode stream using tail calls, while keeping track of the expression evaluation stack and the call stack and globals dict. Some examples follow here (I'm doing purely functional stuff because too tired to think through the effects of mutation). I'm obviously not trying to be completely correct here, just trying to get the idea across that bytecode and CPS are more closely linked than AST and CPS. Note that exception support is entirely missing, and needs its own stack to keep track of exception handlers. A[i] = JUMP_FORWARD(d) def B[i](continuation, stack, callstack, globals): B[i + d](continuation, stack, callstack, globals) A[i] = BINARY_MULTIPLY() def B[i](continuation, stack, callstack, globals): B[i+1](continuation, stack[:-2] + [stack[-2] * stack[-1]], callstack, globals) A[i] = LOAD_FAST(local) def B[i](continuation, stack, callstack, globals): B[i+1](continuation, stack + [callstack[local]], callstack, globals) A[i] = RETURN_VALUE() def B[i](continuation, stack, callstack, globals): continuation(stack[-1]) A[i] = CALL_FUNCTION(argc) def B[i](continuation, stack, callstack, globals): f = stack[-1]; stack = stack[:-1] args = stack[-argc:] stack = stack[- argc] # let's please pretend getcallargs does the right thing # (it doesn't.) locals = inspect.getcallargs(f, *args) # get where f is in the bytecode (magic pseudocode) jump_location = f.magic_bytecode_location B[jump_location]( lambda returnvalue: B[i+1]( continuation, stack + [returnvalue], callstack, globals), [], callstack + [{}], # new locals dict f.func_globals) and so on. At least, I think I have that right. -- Devin
On 11/10/12 7:50 AM, Devin Jeanpierre wrote:
On Sat, Nov 10, 2012 at 6:57 AM, Laurens Van Houtven <_@lvh.cc> wrote:
The README suggests that doing this optimization on a bytecode level may be better than doing it on a source/AST level. Can you explain why? [...]
In particular there's a directly analogous CPS form for bytecode, where every bytecode instruction in a bytecode string is considered to be equivalent to a CPS function, and continuations represent returns from functions and not expression evaluation, and the CPS functions just navigate through the bytecode stream using tail calls, while keeping track of the expression evaluation stack and the call stack and globals dict.
Right... looking at an example output from the transform: v13 = n v14 = 1 v4 = v13 == v14 if v4: ... Viewed in CPS this might look like: (VAR, [v13, (INT, [v14, ((EQ, v13, v14), TEST, ...)])]) Where each node is (EXP, CONT). In this case the result of each operation is put into a variable/register (e.g., 'v13'), but python's bytecodes actually operate on the frame stack. So if there were some way to change this to (VAR, [PUSH, (INT, [PUSH, ((EQ 0 1)), TEST, ...)])]) Where (EQ 0 1) means 'apply EQ to the top two items on the stack'. The code above puts each value into a local variable, which gets pushed onto the stack anyway by the compiled bytecode. Another advantage to generating bytecode directly would be support for python 2, since I think 'nonlocal' can be done at the bytecode level. -Sam
participants (3)
-
Devin Jeanpierre
-
Laurens Van Houtven
-
Sam Rushing