[Python-ideas] Re: PEP 671 (late-bound arg defaults), next round of discussion!

Dec. 4, 2021

On Sat, Dec 4, 2021 at 8:48 PM Steven D'Aprano <steve@pearwood.info> wrote:
...
On Sat, Dec 04, 2021 at 03:14:46PM +1100, Chris Angelico wrote:
...
Lots and lots and lots of potential problems. Consider:
def f():
    a = 1
    def f(b, x=>a+b):
        def g(): return x, a, b
Both a and b are closure variables - one because it comes from an
outer scope, one because it's used in an inner scope. So to evaluate
a+b, you have to look up an existing closure cell, AND construct a new
closure cell.
The only way to do that is for the compiled code of a+b to exist
entirely within the context of f's code object.
I dispute that is the only way. Let's do a thought experiment.
First, we add a new flag to the co_flags field on code objects. Call it
the "LB" flag, for late-binding.
Second, we make this:
def f(b, x=>a+b): ...
syntactic sugar for this:
def f(b, x=lambda b: a+b): ...
except that the lambda has the LB flag set.
Okay. So the references to 'a' and 'b' here are one more level of
function inside the actual function we're defining, which means you're
paying the price of nonlocals just to be able to late-evaluate
defaults. Not a deal-breaker, but that is a notable cost (every
reference to them inside the function will be slower).
...
And third, when the interpreter fetches a default from
func.__defaults__, if it is a LB function, it automatically calls that
function with the parameters to the left of x (which in this case
would be just b).
Plausible. Okay.
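
For anyone who wants to poke at the three steps above without touching
the interpreter, here is a rough pure-Python emulation. It is only a
sketch under assumptions: the `late_bound` marker, the `_is_late_bound`
attribute and the `resolve_late_defaults` decorator are all hypothetical
names standing in for the proposed LB flag and the interpreter's
default-fetching step; none of them exist in CPython.

```python
import functools
import inspect


def late_bound(thunk):
    """Mark a function as a late-bound default (hypothetical LB flag)."""
    thunk._is_late_bound = True  # assumption: an attribute, not a real co_flag
    return thunk


def resolve_late_defaults(func):
    """Emulate the proposed interpreter behaviour with a decorator."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, param in sig.parameters.items():
            # Only parameters the caller omitted are candidates.
            if name in bound.arguments:
                continue
            if param.default is inspect.Parameter.empty:
                continue
            default = param.default
            if getattr(default, "_is_late_bound", False):
                # Call the thunk with the parameters it names, which must
                # already have been resolved to its left.
                wanted = inspect.signature(default).parameters
                default = default(**{k: bound.arguments[k] for k in wanted})
            bound.arguments[name] = default
        return func(*bound.args, **bound.kwargs)

    return wrapper


a = 1


@resolve_late_defaults
def f(b, x=late_bound(lambda b: a + b)):
    return x
```

With this, f(100) gives 101 and f(100, 5) gives 5. An LB-marked
function passed explicitly as x comes back uncalled, because the
emulation only looks at omitted arguments.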

What this does mean, though, is that there are "magic objects" that
cannot be used like other objects. Consider:

def make_printer(dflt):
    def func(x=dflt):
        print("x is", x)
    return func

Will make_printer behave the same way for all objects? Clearly the
expectation is that it will display the repr of whichever object is
passed to func, or if none is, whichever object is passed to
make_printer. But if you pass it a function with the magic LB flag
set, it will *execute* that function. I don't like the idea that some
objects will be invisibly different like that.
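
The concern can be made concrete with a small emulation. This is a
sketch, not the proposal itself: `late_bound`, `_is_late_bound`,
`fetch_default` and the `_MISSING` sentinel are hypothetical stand-ins
for the LB flag and for the interpreter reading a value out of
`__defaults__`, and func returns its text instead of printing so the
result is easy to check.

```python
_MISSING = object()  # hypothetical sentinel meaning "argument omitted"


def late_bound(thunk):
    thunk._is_late_bound = True  # hypothetical stand-in for the LB flag
    return thunk


def fetch_default(value):
    """Stand-in for the interpreter fetching a default value."""
    if getattr(value, "_is_late_bound", False):
        return value()  # flagged objects get executed...
    return value  # ...everything else is used as-is


def make_printer(dflt):
    def func(x=_MISSING):
        if x is _MISSING:
            x = fetch_default(dflt)
        return f"x is {x}"
    return func


def plain():
    return "surprise"


magic = late_bound(lambda: "surprise")
```

Here make_printer(42)() and make_printer(plain)() both report the
object they were given, while make_printer(magic)() reports "x is
surprise": the same code path, but one object was called rather than
displayed.
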
...
Here's your function, with a couple of returns to make it actually do
something:
    def f():
        a = 1
        def f(b, x=>a+b):
            def g(): return x, a, b
            return g
        return f
We can test that right now (well, almost all of it) with this:
    def func():  # change of name to distinguish inner and outer f
        a = 1
        def f(b, x=lambda b: a+b):
            def g(): return x, a, b
            return g
        return f
and just pretend that x is automatically evaluated by the interpreter.
But as a proof of concept, it's enough that we can demonstrate that *we*
can manually evaluate it, by calling the lambda.
Okay, sure. It's a bit hard to demo it (since it has to ONLY do that
magic if the arg was omitted), but sure, we can pretend.
...
We can call func() to get the inner function f, and call f to get g:
    >>> f = func()
    >>> print(f)
    <function func.<locals>.f at 0x7fc945c41f30>
    >>> g = f(100)
    >>> print(g)
    <function func.<locals>.f.<locals>.g at 0x7fc945e1f520>
Calling g works:
    >>> print(g())
    (<function func.<locals>.<lambda> at 0x7fc945c40f70>, 1, 100)
with the understanding that the real implementation will have
automatically called that lambda, so we would have got 101 instead of
the lambda. That step requires interpreter support, so for now we just
have to pretend that we get
    (101, 1, 100)
instead of the lambda. But we can demonstrate that calling the lambda
works, by manually calling it:
    >>> x = g()[0]
    >>> print(x)
    <function func.<locals>.<lambda> at 0x7fc945c40f70>
    >>> print(x(100))  # the interpreter knows that b=100
    101
Now let's see if we can extract the default and play around with it:
    >>> default_expression = f.__defaults__[0]
    >>> print(default_expression)
    <function func.<locals>.<lambda> at 0x7fc945c40f70>
The default expression is just a function (with the new LB flag set). So
we can inspect its name, its arguments, its cell variables, etc:
    >>> default_expression.__closure__
    (<cell at 0x7fc945de74f0: int object at 0x7fc94614c0f0>,)
We can do anything that we could do with any other function object.
Yup. As long as it doesn't include any assignment expressions, or
anything else that would behave differently.
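
To illustrate the assignment-expression caveat with code that runs
today: a walrus inside a lambda binds a name local to the lambda, so the
enclosing function never sees it. As I read the proposal, a late-bound
default evaluated in the function's own scope would bind the name in the
function instead, which the second helper below approximates by writing
the expression inline. Both function names are made up for the
illustration.

```python
def walrus_in_lambda():
    n = 0
    default = lambda: (n := 5)  # 'n' here is a new local of the lambda
    value = default()
    return value, n  # the outer n is untouched


def walrus_inline():
    n = 0
    value = (n := 5)  # same expression, evaluated in the function's scope
    return value, n  # the function's own n is rebound


print(walrus_in_lambda())  # (5, 0)
print(walrus_inline())     # (5, 5)
```

So the lambda desugaring and an in-scope evaluation agree on the value
of the default but disagree about the side effect.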
...
Can we evaluate it? Of course we can. And we can test it with any value
we like, we're not limited to the value of b that we originally passed
to func().
    >>> default_expression(3000)
    3001
Of course, if we are in a state of *maximal ignorance* we might have no
clue what information is needed to evaluate that default expression:
    >>> default_expression()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: func.<locals>.<lambda>() missing 1 required positional argument: 'b'
Oh look, we get a useful diagnostic message for free!
What are we missing? The source code of the original expression, as
text. That's pretty easy too: the compiler knows the source, it can cram
it into the default expression object:
    >>> default_expression.__expression__ = 'a+b'
Introspection tools like help() can learn to look for that.
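
As a rough idea of what "learning to look for that" could mean, here is
a small sketch of a signature formatter that prefers `__expression__`
when a default carries one. The `describe_defaults` helper and the
`=>` rendering are assumptions made for the example; help() does
nothing like this today.

```python
import inspect


def describe_defaults(func):
    """Render a signature, showing source text for late-bound defaults."""
    parts = []
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            parts.append(name)
            continue
        # Hypothetical convention: the compiler stored the source text here.
        expr = getattr(param.default, "__expression__", None)
        if expr is not None:
            parts.append(f"{name}=>{expr}")
        else:
            parts.append(f"{name}={param.default!r}")
    return f"{func.__name__}({', '.join(parts)})"


a = 1
thunk = lambda b: a + b
thunk.__expression__ = "a+b"  # set by hand, as in the session above


def f(b, x=thunk, y=2):
    ...


print(describe_defaults(f))  # f(b, x=>a+b, y=2)
```

A tool that has not been updated would simply fall back to the repr of
the function object.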
What else are we missing? A cool repr.
    >>> print(default_expression)  # Simulated.
    <late bound default expression a+b>
We can probably come up with a better repr, and a better name than "late
bound default expression". We already have other co_flags that change
the repr:
    32 GENERATOR
    128 COROUTINE
    256 ITERABLE_COROUTINE
so we need a name that is at least as cool as "generator" or
"coroutine".
Those parts are trivial, no problem.
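
For reference, the existing flags listed above can be read off any code
object with the constants in the inspect module. A minimal sketch (the
`flag_names` helper is just for this example):

```python
import inspect


def gen():
    yield 1


async def coro():
    return 1


def plain():
    return 1


def flag_names(func):
    """Return which of the three flags are set on func's code object."""
    known = {
        "GENERATOR": inspect.CO_GENERATOR,                    # 32
        "COROUTINE": inspect.CO_COROUTINE,                    # 128
        "ITERABLE_COROUTINE": inspect.CO_ITERABLE_COROUTINE,  # 256
    }
    flags = func.__code__.co_flags
    return [name for name, bit in known.items() if flags & bit]


print(flag_names(gen))    # ['GENERATOR']
print(flag_names(coro))   # ['COROUTINE']
print(flag_names(plain))  # []
```

A new LB flag would presumably be one more bit tested the same way.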
...
Summary of changes:
* add a new co_flag with a cool name better than "LB";
* add an `__expression__` dunder to hold the default expression;
  (possibly missing for regular functions -- we don't necessarily
  need *every* function to have this dunder)
* change the repr of LB functions to display the expression;
* teach the interpreter to compile late-bound defaults into one of
  these LB functions, including the source expression;
* teach the interpreter that when retrieving default values from
  the function's `__defaults__`, if they are a LB function, it
  must call the function and use its return result as the actual
  default value;
* update help() and other introspection tools to handle
  these LB functions; but if any tools don't get updated,
  you still get a useful result with an informative repr.
Great. So now we have some magnificently magical behaviour in the
language, which will have some nice sharp edge cases, but which nobody
will ever notice. Totally. I'm sure. Plus, we pay a performance price
in any function that makes use of argument references, not just for
the late-bound default, but in the rest of the code. We also need to
have these special functions that get stored as separate code objects.

All to buy what, exactly? The ability to manually synthesize an
equivalent parameter value, as long as there are no assignment
expressions, no mutation, no other interactions, etc, etc, etc? That's
an awful lot of magic for not a lot of benefit.

I *really* don't like the idea that some types of object will be
executed instead of being used, just because they have a flag set.
That strikes me as the sort of thing that should be incredibly scary,
but since I can't think of any specific reasons, I just have to call
it "extremely off-putting".

But hey. Go ahead and build a reference implementation. I'll compile
it and give it a whirl.

ChrisA