Re: [Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)

At 10:09 PM 10/5/2005 -0700, Neal Norwitz wrote:
Yeah, I've been baking that idea for a long time, and it's a bit more complex than you've suggested, due to generators, sys._getframe(), and tracebacks.
Actually, Python/ceval.c already skips creating a tuple when calling Python functions with a fixed number of arguments (caller and callee) and no cell vars (i.e., not a closure). It copies them straight from the calling frame stack to the callee frame's stack.
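Here is a minimal C sketch of that fast path, under stated assumptions. `toy_obj` is an `int` standing in for `PyObject*`. `toy_code`, `toy_frame`, `fast_path_ok`, and `fast_call_copy` are hypothetical names. Reference counting is omitted, and this is not the actual ceval.c code. The sketch shows the eligibility check and the direct copy from the caller's stack into the callee's first local slots, with no argument tuple built in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject*; reference counting omitted. */
typedef int toy_obj;

#define TOY_MAX_LOCALS 16

/* Hypothetical code-object summary: only what the fast path checks. */
typedef struct {
    size_t argcount;   /* fixed number of positional parameters */
    int has_cells;     /* nonzero if the function has cell variables */
    int has_varargs;   /* nonzero if it takes *args or **kwargs */
} toy_code;

/* Hypothetical frame: arguments occupy the first local slots. */
typedef struct {
    size_t nlocals;
    toy_obj locals[TOY_MAX_LOCALS];
} toy_frame;

/* Fast path applies only to a fixed positional call into a function
   with a fixed signature and no cell variables (not a closure). */
static int fast_path_ok(const toy_code *co, size_t argc, size_t nkw)
{
    return !co->has_cells && !co->has_varargs && nkw == 0 &&
           argc == co->argcount;
}

/* Copy argc arguments straight from the caller's value stack into the
   callee's locals. stack_top points one past the last pushed argument. */
static void fast_call_copy(const toy_obj *stack_top, size_t argc,
                           toy_frame *callee)
{
    assert(argc <= callee->nlocals && callee->nlocals <= TOY_MAX_LOCALS);
    const toy_obj *args = stack_top - argc;  /* first argument */
    for (size_t i = 0; i < argc; i++)
        callee->locals[i] = args[i];
}
```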
Actually, I've been thinking that replacing the arg tuple with a PyObject* array would allow us to skip tuple creation when calling C functions, since you could just give the C functions a pointer to the arguments on the caller's stack. That would let us get rid of most remaining tuple allocations. I suppose we'd also need an argcount parameter. The old APIs taking tuples for calls could trivially convert the tuples to an array pointer and size, then call the new APIs. Actually, we'd probably have to have a tp_arraycall slot or something, with the existing tp_call forwarding to tp_arraycall in most cases, but occasionally the reverse. The tricky part is making sure you don't end up with cases where you call a tuple API that converts to an array that then turns it back into a tuple!
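Here is a rough sketch of the proposed array-based convention, using toy types throughout. `tp_arraycall` is only a field in a toy struct, since the message just floats the name. `toy_type`, `toy_tuple`, `call_with_tuple`, and `sum_args` are invented for illustration, and `int` again stands in for `PyObject*`. The legacy tuple entry point forwards a (pointer, count) pair without allocating anything.

```c
#include <assert.h>
#include <stddef.h>

typedef int toy_obj;  /* stand-in for PyObject*, refcounting omitted */

/* Hypothetical tuple: a size plus a pointer to its items. */
typedef struct {
    size_t size;
    const toy_obj *items;
} toy_tuple;

/* Hypothetical array-call signature: the callee receives a pointer
   into the caller's stack plus an explicit argument count. */
typedef toy_obj (*arraycall_fn)(const toy_obj *args, size_t nargs);

typedef struct {
    arraycall_fn tp_arraycall;  /* hypothetical preferred entry point */
} toy_type;

/* Old tuple-based API: unpack into (pointer, count) and forward.
   No new tuple is built on this path. */
static toy_obj call_with_tuple(const toy_type *tp, const toy_tuple *args)
{
    assert(tp->tp_arraycall != NULL);
    return tp->tp_arraycall(args->items, args->size);
}

/* Example C function written against the array convention. */
static toy_obj sum_args(const toy_obj *args, size_t nargs)
{
    toy_obj total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

The reverse direction is where the tuple-to-array-to-tuple round trip could creep in. That happens when an array call is wrapped into a fresh tuple for a type that only implements the old tuple-based slot.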
Yeah, unfortunately for your idea, generators would have to copy off bits of the stack and then copy them back in, making generators slower. If it weren't for that part, the idea would probably be a good one, as arguments, locals, cells, and the block and value stacks could all be handled that way, with the compiler treating all operations as base-pointer offsets, thereby eliminating lots of more-complex pointer management in ceval.c and frameobject.c. Another possible fix for generators would of course be to give them their own stack arena, but then you have the problem of needing to copy overflows from one such stack to another, at which point you're basically back to having frames. On the other hand, maybe the good part of this idea is just eliminating all the pointer fudging and having the compiler determine stack offsets. Then the frame object layout would just consist of a big hunk of stack space, laid out as a PyObject* array. The main problem with this concept is that it would change the meaning of certain opcodes: right now the free-variable offsets in opcodes restart their numbering at zero, but this approach would add the number of locals to those offsets.
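Here is a minimal sketch of the flat frame layout. It assumes a hypothetical ordering in which locals come first, then cells and free variables, then block-stack slots, then the value stack, all in one array. `toy_layout`, `cell_slot`, `block_base`, `value_stack_base`, `frame_slots`, `gen_suspend`, and `gen_resume` are illustrative names only.

`cell_slot` shows the opcode-meaning change: a free-variable index numbered from zero becomes `nlocals + index`. The two `memcpy` helpers show the copy-off and copy-back that a generator sharing one stack would pay on every suspend and resume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int toy_obj;  /* stand-in for PyObject* */

/* Hypothetical flat layout, all slots in one array:
   [ locals | cells + free vars | block stack | value stack ]
   The compiler would resolve every access to a base-pointer offset. */
typedef struct {
    size_t nlocals;
    size_t ncells;   /* cell plus free variables */
    size_t nblocks;  /* slots reserved for the block stack */
    size_t nstack;   /* maximum value-stack depth */
} toy_layout;

/* Free-variable opcodes currently number from zero; in a flat layout
   the compiler must add nlocals to reach the real slot. */
static size_t cell_slot(const toy_layout *l, size_t cell_index)
{
    assert(cell_index < l->ncells);
    return l->nlocals + cell_index;
}

static size_t block_base(const toy_layout *l)
{
    return l->nlocals + l->ncells;
}

static size_t value_stack_base(const toy_layout *l)
{
    return block_base(l) + l->nblocks;
}

static size_t frame_slots(const toy_layout *l)
{
    return value_stack_base(l) + l->nstack;
}

/* Suspending a generator that lives on a shared stack: copy its slots
   off to heap storage... */
static void gen_suspend(const toy_obj *shared_stack, size_t base,
                        size_t nslots, toy_obj *saved)
{
    memcpy(saved, shared_stack + base, nslots * sizeof *saved);
}

/* ...and copy them back (possibly at a different base) on resume. */
static void gen_resume(toy_obj *shared_stack, size_t base,
                       size_t nslots, const toy_obj *saved)
{
    memcpy(shared_stack + base, saved, nslots * sizeof *saved);
}
```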

On 10/6/05, Phillip J. Eby <pje@telecommunity.com> wrote:
If we had these separate stacks for each thread, would it be possible to also create a stack for generator calls? The current call operations could possibly check whether the function being called is a generator (if functions don't have a generator bit, could they get one, to speed this up?). This generator-specific stack would be used for the generator's frame and any calls it makes on each iteration. This may pose the threat of a bottleneck, allocating a new stack in the heap for every generator call, but generators are generally iterated more often than they are created, and the stacks could be pooled, of course. I don't know as much as I'd like about the CPython internals, so I'm just throwing this out there for comment by those in the know.
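The pooling idea could look roughly like the following sketch. It assumes a fixed per-generator stack size and a small free list. `GEN_STACK_SLOTS`, `GEN_POOL_MAX`, `gen_stack_pool`, `gen_stack_acquire`, `gen_stack_release`, and `gen_stack_pool_clear` are hypothetical names for illustration. Reusing a released stack avoids a heap allocation on most generator creations, at the cost of holding up to `GEN_POOL_MAX` idle stacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int toy_obj;  /* stand-in for PyObject* */

#define GEN_STACK_SLOTS 256  /* hypothetical fixed stack size */
#define GEN_POOL_MAX 8       /* hypothetical cap on cached stacks */

typedef struct {
    toy_obj *free_list[GEN_POOL_MAX];
    size_t nfree;
} gen_stack_pool;

/* Hand out a stack for a new generator, reusing a pooled one if any. */
static toy_obj *gen_stack_acquire(gen_stack_pool *pool)
{
    if (pool->nfree > 0)
        return pool->free_list[--pool->nfree];
    return malloc(GEN_STACK_SLOTS * sizeof(toy_obj));
}

/* Return a finished generator's stack; cache it if there is room. */
static void gen_stack_release(gen_stack_pool *pool, toy_obj *stack)
{
    assert(pool->nfree <= GEN_POOL_MAX);
    if (stack == NULL)
        return;
    if (pool->nfree < GEN_POOL_MAX)
        pool->free_list[pool->nfree++] = stack;
    else
        free(stack);
}

/* Free every cached stack, e.g. at interpreter shutdown. */
static void gen_stack_pool_clear(gen_stack_pool *pool)
{
    while (pool->nfree > 0)
        free(pool->free_list[--pool->nfree]);
}
```

A fixed size sidesteps rather than solves the overflow problem raised earlier in the thread. A generator whose nested calls outgrow its stack would still need some fallback.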
Participants: Calvin Spealman, Phillip J. Eby