On Mar 29, 2015, at 21:12, Ron Adam <ron3200@gmail.com> wrote:
On 03/29/2015 08:36 PM, Andrew Barnert wrote:
Something related to this that I've wanted to experiment with, but is hard to do in Python, is to be able to split a function signature and body, and to use them independently. (But in a well-defined way.)
Almost everything you're asking for is already there.
Yes, I have looked into most of what you mention here.
A function object contains, among other things, a sequence of closure cells, a local and global environment, default parameter values, and a code object.
A code object contains, among other things, parameter names, a count of locals, and a bytecode string.
You can see the attributes of these objects at runtime, and the inspect module docs describe what they mean. You can also construct these objects at runtime by using their constructors (you have to use types.FunctionType and types.CodeType; the built-in help can show you the parameters).
You can also compile source code (or an AST) to a code object with the compile function.
You can call a code object with the exec function, which takes a namespace (or, optionally, separate local and global namespaces--and, in a slightly hacky way, you can also override the builtin namespace).
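For example, a minimal sketch of those pieces working together (the names here are arbitrary):

    # Compile source to a code object, then run it against explicit
    # namespaces with exec().
    src = "y = x + 1"
    code_obj = compile(src, "<fragment>", "exec")

    glb = {"x": 41}   # stands in for the global namespace
    loc = {}          # a separate local namespace
    exec(code_obj, glb, loc)
    print(loc["y"])   # 42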
There are also Signature objects in the inspect module, but they're not "live" usable objects; they're nicely-organized-for-human-use representations of the signature of a function. So practically you'd use a dummy function object or just a dict or something, and create a new function from its attributes/members and the new code.
So, except for minor spelling differences, that's exactly what you're asking for, and it's already there.
I think it's more than minor spelling differences. :-)
The objects you're asking for already exist, but in some cases with slightly different names. There isn't neat calling syntax for things like cloning a function object with a different code object but the same other attributes--that part is a bit painful--but it's just a few two-line wrapper functions that you only have to write once (I think you can even subclass CodeType and FunctionType to add the syntax; if not, you can wrap and delegate). So, what you're asking for really is already part of Python, except for minor spelling differences. The problem is that what you're asking for doesn't give you what you want, because other things (like "the environment") don't work the way you're assuming they do, or because of consequences you probably haven't thought of (like the return issue).
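One of those wrappers might look something like this (an untested sketch; clone_with_code is a name I just made up, and it ignores details like __kwdefaults__ and __annotations__, which you'd copy separately):

    import types

    def clone_with_code(func, new_code):
        # A new function borrowing everything from func except its code
        # object. new_code's co_freevars must line up with func.__closure__.
        return types.FunctionType(new_code, func.__globals__,
                                  func.__name__, func.__defaults__,
                                  func.__closure__)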
I've played around with de-constructing functions and using the constructors to put them back together again, enough to know it's actually quite hard to get everything right with anything other than the original parts.
It really isn't, once you learn how they work. Getting the values to put into them (assembling bytecode, creating the lnotab, etc.) is a different story, but if your intention is to do that just by compiling function (or fragment) source, it's all done for you.
The exec function may be a good start to experimenting with this. I haven't used it with code objects enough to be familiar with what limits it has. It may not be that difficult to copy the C source for exec and create a new function more specific to this idea. (As a test.)
Even if that's slow, it may be good enough as a start.
Why would it be slow? Also, what do you want your function to do differently from exec? What you described is wanting to run a code object in a specified environment; that's what exec does. I think you're thinking of Python environments as if they were scheme environments, but they aren't. In particular, Python closures don't work in terms of accessing variables from the environment by name; they work in terms of accessing cells from the function object.
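You can see the cells directly:

    def outer():
        x = 10
        def inner():
            return x
        return inner

    f = outer()
    print(f.__code__.co_freevars)           # ('x',)
    print(f.__closure__[0].cell_contents)   # 10 -- a cell, not a name lookup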
But if I've guessed right about _why_ you want this, it doesn't do what you'd want it to, and I don't think there's any way it could.
Bytecode accesses locals (including arguments), constants, and closure cells by index from the frame object, not by name from a locals dict (although the frame has one of those as well, in case you want to debug or introspect, or call locals()). So, when you call a function, Python sets up the frame object, matching positional and keyword arguments (and default values in the function object) up with parameters and building up the sequence of locals. The frame is also essential for returning and uncaught exceptions (it has a back pointer to the calling frame).
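The dis module shows this (the exact bytecode varies between versions, but the by-index access doesn't):

    import dis

    def add(a, b):
        total = a + b
        return total

    dis.dis(add)                      # LOAD_FAST/STORE_FAST reference slots by index
    print(add.__code__.co_varnames)   # ('a', 'b', 'total') -- the index-to-name table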
This isn't a problem if the callable code object creates a new frame.
Yes, it is, as I explained before. There's the minor problem that your code object needs to use nonlocal if it wants to reassign to the calling frame's x variable, and the major problem that there is no way to give it a closure that gives it access to that frame's x variable unless, when that calling frame's code was compiled, there was already an embedded function that needed to close over x. I suppose you could hack up the compiler to generate cells for every local variable instead of just those that are actually needed. Then you could sort of do what you want--instead of exec, you construct a function with the appropriate cells in the closure (by matching up the callee code's freevars with the calling code's cellvars).
It is an issue when running a code block in the current frame. But I think there may be a way to get around that.
The big thing you can't do directly is to create new closure cells programmatically from Python. The compiler has to know which of your locals will be used as closure variables by any embedded functions; it then stores these specially within your code object, so the MAKE_CLOSURE bytecode that creates a function object out of each embedded function can create matching closure cells to store in the embedded function object. This is the part that you need to add into what Python already has, and I'm not sure there's a clean way to do it.
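The well-known indirect workaround is to let the compiler create the cell for you inside a throwaway closure, and then match things up by hand--roughly like this sketch (make_cell and _template are made-up names):

    import types

    def make_cell(value):
        # The compiler builds the cell for us in a throwaway closure.
        return (lambda: value).__closure__[0]

    def _template(x):
        def inner():
            return x + 1
        return inner

    # inner's code object has 'x' in co_freevars, so we can pair it with
    # any cell we like and get a function closing over a value we chose.
    inner_code = _template(0).__code__
    f = types.FunctionType(inner_code, globals(), "f", None, (make_cell(41),))
    print(f())   # 42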
But you really should learn how all the existing stuff works (the inspect docs, the dis module, and the help for the constructors in the types module are actually sufficient for this, without having to read the C source code, in 3.4 and later) and find out for yourself, because if I'm wrong, you may come up with something cool. (Plus, it's fun and useful to learn about.)
I'm familiar with most of how Python works, and have even hacked a bit on ceval.c (for fun). I haven't played much with the AST side of things, but I do know generally how Python is put together and works.
That only gets you through the first half of your message--enough to make inc_x work as a local function (well, almost--without a "nonlocal x" statement it's going to compile x as a local variable rather than a closure variable, and however you execute it, you're just going to get UnboundLocalError).
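Concretely:

    def caller():
        x = 0
        def inc_x():
            x += 1    # compiled as a *local* store in inc_x...
        inc_x()       # ...so calling it blows up

    try:
        caller()
    except UnboundLocalError as e:
        print(e)      # local variable 'x' referenced before assignment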
What about the second part, where you execute code in an existing frame?
That's even trickier.
Yes, I definitely agree. I think one of the tests of a good idea is that it makes something that is normally hard (or tricky), simple and easy. But actually doing that may be quite hard. (or even not possible.)
A frame already has its complete lists of locals, cells, and constants at construction time. If your code objects never used any new locals or constants, and never touched any nonlocal variables that weren't already touched by the calling function, all you need is some kind of "relocation" step that remaps the indices compiled into the bytecode into the indices in the calling frame (there's enough info in the code objects to do the mapping; for actually creating the relocated bytecode from the original you'll want something like the byteplay module, which unfortunately doesn't exist for 3.4--although I have an incomplete port that might be good enough to play with if you're interested).
You can almost get away with the "no new locals or nonlocal cells" part, but "no new constants" is pretty restrictive. For example, if you compile inc_x into a fragment that can be executed inline, the number 1 is going to be constant #0 in its code object. And now, you try to "relocate" it to run in a frame with a different code object, and (unless that different code object happened to refer to 1 as a constant as well) there's nothing to match it up to.
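You can see that constants table for yourself:

    def inc_x():
        x += 1

    # In CPython this prints (None, 1): the 1 lives in co_consts, and the
    # bytecode refers to it by index, not by value.
    print(inc_x.__code__.co_consts)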
And again, I don't see a way around this without an even more drastic rearchitecting of how Python frames work--but again, I think it's worth looking for yourself in hopes that I'm wrong.
Think of these things as non_local_blocks. The difference is they would use dynamic scope instead of static scope. Or to put it another way, they would inherit the scope they are executed in.
That's a bit of a weird name, since it's a block that's executed locally on the current frame, as opposed to the normal non-local way, but... OK. It makes sense conceptually, but practically it doesn't fit in with Python's execution model.

First, the scope is partly defined at compilation time--that's where the list of constants comes from, and the lists of local and free names. If the frame's code is still the calling function, none of these things are available in the frame. If, on the other hand, it's the non_local_block (what I was calling the "fragment"), then the calling scope's variables aren't available. (And just copying the caller's code's stuff into a new frame only gives you copies of the caller's variables, which doesn't let you do the main thing you wanted to do.) Unless you somehow make them both available (e.g., the Ruby-style two-level call stack I mentioned), I don't see a way around that.

Second, remember that Python bytecode accesses locals by index; unless you do something like the relocation I described above, you have no way to access the calling scope's variables. That's doable, but not at all trivial, and doesn't seem very Pythonic. Also, even if you do that, you've still got a problem. Python's LEGB rule that decides which scope to find a name in is handled mostly at compile time. The compiler decides whether to emit LOAD_FAST or one of the other LOAD_*, and likewise for STORE_*, based on the static lexical scope the name is defined in. That's going to make it very hard to hack in dynamic scope on top of Python bytecode--the "relocation" has to involve not just renumbering locals, but reproducing all the work the compiler does to decide what's local/global/etc. and replacing instructions as appropriate.

As an alternative, maybe you don't want code objects at all, but rather AST objects. Then they can be compiled to bytecode in a given scope (with the builtin compile function) and then executed there (with exec). This solves most of the new problems added by the dynamic scoping/non_local_block/fragment idea while still allowing most of the benefits. The downside, of course, is that you're compiling stuff all over the place--but that's how dynamic code works in most Lisps, so...
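Roughly, the AST version would look like this (just a sketch of the idea):

    import ast

    # Keep the fragment as an AST, compile it at the point of use, and
    # exec it against that scope's namespace.
    frag = ast.parse("x += 1", mode="exec")
    ns = {"x": 1}
    exec(compile(frag, "<fragment>", "exec"), ns)
    print(ns["x"])   # 2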
There's another problem: every function body compiled to code ends with a return bytecode. If you didn't write one explicitly, you get the equivalent of "return None". That means that, unless you solve that in some way, executing a fragment inline is always going to return from the calling function. And what do you want to do about explicit return? Or break and continue?
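You can see the implicit return with dis:

    import dis

    def fragment():
        x = 1

    # The listing ends with LOAD_CONST (None) / RETURN_VALUE -- the
    # implicit "return None" the compiler appends to every function body.
    dis.dis(fragment)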
As a non_local_block, a return could be required. It would return the value to the current location in the current frame.
Do you mean that a call to a non_local_block is an expression, and the value of the expression is the value returned by the block? If so, that makes sense, but then the RETURN_VALUE handler in the interpreter has to be sensitive to the context it's in and do two different things, or the compiler has to be sensitive to what it's compiling--which seems impossible, given that you want to def a regular function and then decorate it after the fact--and issue a different opcode for return (and for the implicit return None).
Break and continue are harder. They should probably give the same error as when they're used outside a loop. A break or return would need to be local to the loop, so you can't have a break in a non_local_block unless the block also has the loop in it. That keeps the associated parts local to each other.
That definitely makes things simpler--and, I think, better. It will piss off people who expect (from Lisp or Ruby) to be able to do flow control across the boundaries of a non_local_block, but too bad for them. :)
Of the two approaches, I think the first one seems cleaner. If you can make a closure cell out of x and then wrap inc_x's code in a normal closure that references it, that still feels like Python. (And having to make "nonlocal x" explicit seems like a good thing, not a limitation.)
Agree... I picture the first approach as a needed step to get to the second part.
I don't think it really is; most of what you need to solve for the first becomes irrelevant for the second. For example, figuring out a way to dynamically generate closure cells to share variables across frames is useless when you switch to running with the parent's variables as locals in the same frame.
Fixing up fragments to run in a different frame, and modifying frames at runtime to allow them to be fixed up, seems a lot hackier. And the whole return issue is pretty serious, too.
One last possibility to consider is something between the two: a different kind of object, defined differently, like a proc in Ruby (which is defined with a block rather than a def) might solve some of the problems with either approach. And stealing from Ruby again, procs have their own "mini-frames"; there's a two-level stack where every function stack frame has a proc stack frame, which allows a solution to the return-value problem that wouldn't be available with either closures or fragments.
This may be closer to how I am thinking it would work. :-)
It sounds like it might be. I think it's a really clumsy solution, but obviously it can work or Ruby wouldn't work. :) Notice that Ruby also avoids a lot of the problems below by having completely different syntax and semantics for defining procs vs. functions, and not allowing you to convert them into each other. But if you don't want that, then you have to solve all the problems they got around this way.
(However, note that the return-value problem is much more serious in Ruby, where everything is supposed to be an expression, with a value; in Python you can just say "fragment calls are statements, so they don't have values" if that's what you want.)
It seems to me they can be either. Python ignores None when it's returned by a function and not assigned to anything. And if a value is returned, then it's returned to the current position in the current frame.
No, that's not how things work. Python doesn't do anything special with None. A value is _always_ returned, whether None or otherwise. That value always becomes the value of the calling expression. And Python certainly doesn't care whether you assign it to something--e.g., you can use it inside a larger expression or return it or yield it without assigning it to anything. Of course if the outermost expression is part of an expression statement, the value that it evaluated to is ignored, but that has nothing to do with the value being None, or coming from a function call, or anything else; an expression statement just means "evaluate this expression, then throw away the results". Also, you're missing the bigger point: look at how RETURN_VALUE actually works. If you're running inside the caller's scope, it's going to return from the caller's scope unless you do something (which you need to figure out) to make that not true.
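To be concrete:

    def f():
        pass                 # implicit "return None"

    result = f()             # the None is returned and bound like any value
    print(result is None)    # True
    f()                      # expression statement: evaluated, then discarded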
The return in this case is a non-local-block return. So I think it wouldn't be an issue.
What does "a non-local-block return" mean at the implementation level? A different opcode from RETURN_VALUE? A RETURN_VALUE executed from within a non_local_block object's code? A RETURN_VALUE executed from a frame that has a non_local_block running? Whatever the answer, how does the compiler or interpreter (as appropriate) know to do that?
One last note inline:
A signature object could have a default body that returns the closure.
And a body (or code) could have a default signature that *takes* a namespace.
Then a function becomes ...
code(sig(...)) <---> function(...)
The separate parts could be created with a decorator.
@signature
def sig_x(x): pass

@code
def inc_x(): x += 1

@code
def dec_x(): x -= 1
In most cases it's best to think of applying code bodies to namespaces.
names = sig_x(0)
inc_x(names)
dec_x(names)
That is nicer than continuations, as each code block is a well-defined unit that executes to completion and doesn't require suspending the frame.
(Yes, it can be done with dictionaries, but that wouldn't give the macro-like functionality (see below) this would. And there may be other benefits to having it at a lower, more efficient level.)
To allow macro-like ability, a code block needs to be executable in the current scope. That can be done just by doing...
code(locals()) # Dependable?
And sugar to do that could be...
if x < 10:
    ^^ inc_x     # just an example syntax.
else:
    ^^ dec_x

# Note the ^^ looks like the M in Macro. ;-)
Possibly the decorators could be used with lambda directly to get inline functionality.
code(lambda : x + 1)
This is a very different thing from what you were doing above. A function that modifies a closure cell's value, like inc_x, can't be written as a lambda (because assignments are statements). And this lambda is completely pointless if you're going to use it in a context where you ignore its return value (like the way you used inc_x above). So, I'm not sure what you're trying to do here, but I think you may have another problem to solve on top of the ones I already mentioned.
It was an incomplete example. It should have been...
add_1_to_x = code(lambda: x + 1)
and then later you could use it in the same way as above.
x = ^^ add_1_to_x
OK, it sounds like what you're really looking for here is that code(spam) returns a function that's just like spam, but all of its variables (although you still have to work out what that means--remember that Python has already decided local vs. cell vs. global at compile time, before you even get to this code function) will use dynamic rather than lexical scoping. All of the other stuff seems to be irrelevant.

In fact, maybe it would be simpler to just do what Lisp does: explicitly define individual _variables_ as dynamically scoped, effectively the same way we can already define variables as global or nonlocal, instead of compiling a function and then trying to turn some of its variables into dynamic variables after the fact. And the good news is, I'm 99% sure someone already did this and wrote a blog post about it. I don't know where, and it may be a few years and versions out of date, but it would be nice if you could look at what he did, see that you're 90% of the way to what you want, and just have to solve the last 10%.

Plus, you can experiment with this without hacking up anything, with a bit of clumsiness. It's pretty easy to create a class whose instances dynamically scope their attributes with an explicit stack. (If it isn't obvious how, let me know and I'll write it for you.) Then you just instantiate that class (globally, if you want), and have both the caller and the callee use an attribute of that instance instead of a normal variable whenever you want a dynamically-scoped variable, and you're done. You can write nice examples that actually work in Python today to show how this would be useful, and then compare to how much better it would look with real dynamic variable support.
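For instance, a bare-bones version might look like this (DynamicScope and let are names I just made up; it's a sketch, not a finished design):

    from contextlib import contextmanager

    class DynamicScope:
        def __init__(self):
            object.__setattr__(self, "_stack", [{}])

        def __getattr__(self, name):
            # Search the binding frames from innermost to outermost.
            for frame in reversed(self._stack):
                if name in frame:
                    return frame[name]
            raise AttributeError(name)

        def __setattr__(self, name, value):
            self._stack[-1][name] = value

        @contextmanager
        def let(self, **bindings):
            # Push a new binding frame; pop it when the dynamic extent ends.
            self._stack.append(dict(bindings))
            try:
                yield self
            finally:
                self._stack.pop()

    dyn = DynamicScope()

    def callee():
        return dyn.x + 1    # resolved against whoever called us, at call time

    with dyn.let(x=41):
        print(callee())     # 42

The clumsy part, of course, is that every dynamically-scoped variable has to be spelled dyn.x instead of just x--which is exactly the gap real dynamic variable support would close.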
This is just an example to show how the first option above connects to the examples below. with "^^: x + 1" being equivalent to "code(lambda: x + 1)".
Which would also be equivalent to ...
@code
def add_1_to_x(): return x + 1

x = ^^ add_1_to_x
And a bit of sugar to shorten the common uses if needed.
spam(x + 1, code(lambda : x + 1))
spam(x + 1, ^^: x + 1)