
On Tue, Oct 26, 2021 at 04:48:17AM +1100, Chris Angelico wrote:
The problem is the bizarre inconsistencies that can come up, which are difficult to explain unless you know exactly how everything is implemented internally. What exactly is the difference between these, and why should some be legal and others not?
They should all be legal. Legal doesn't mean "works". Code that raises an exception is still legal code.
    def f1(x=>y + 1, y=2): ...

    def f2(x=>y + 1, y=>2): ...

    def f3(x=>y + 1, *, y): ...

    def f4(x=>y + 1):
        y = 2

    def f5(x=>y + 1):
        global y
        y = 2
What "bizarre inconsistencies" do you think they have? Each example is different, so it is hardly shocking if they behave differently too.

f1() assigns positional arguments first (there are none), then keyword arguments (still none), then early-bound defaults left to right (y=2), then late-bound defaults left to right (x=y+1). That is, I argue, the most useful behaviour. But if you insist on a strict left-to-right single pass to assign defaults, then instead it will raise UnboundLocalError because y doesn't have a value. Just like the next case:

f2() assigns positional arguments first (there are none), then keyword arguments (still none), then early-bound defaults left to right (none of these either), then late-bound defaults left to right (x=y+1), which raises UnboundLocalError because y is a local but doesn't have a value yet.

f3() assigns positional arguments first (there are none), then keyword arguments (still none), at which point it raises TypeError because you have a mandatory keyword-only argument with no default.

f4() is just like f2().

And lastly, f5() assigns positional arguments first (there are none), then keyword arguments (still none), then early-bound defaults left to right (none of these either), then late-bound defaults left to right (x=y+1), which might raise NameError if global y doesn't exist, and otherwise will succeed.

Each of those cases is easily understandable. There is no reason to expect the behaviour in all of these cases to be the same, so we can hardly complain that they are "inconsistent", let alone that they are "bizarrely inconsistent". The only novelty here is that functions with late-binding can raise arbitrary exceptions, including UnboundLocalError, before the body of the function is entered.
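Since `=>` is proposed syntax and won't run today, here is a runnable sketch of the two-pass behaviour described above, emulated with the sentinel idiom. The names f1 and f2 mirror the quoted examples; the sentinel machinery is my illustration, not the PEP's:

```python
_SENTINEL = object()

# Emulates f1(x=>y + 1, y=2) under the two-pass model: the early-bound
# default y=2 is assigned first, then the late-bound x=y+1.
def f1(x=_SENTINEL, y=2):
    if x is _SENTINEL:
        x = y + 1          # y already has its early-bound default
    return x, y

# Emulates f2(x=>y + 1, y=>2): both defaults are late-bound, so
# evaluating x=y+1 first finds y still "unbound". Here that surfaces
# as TypeError (sentinel + 1); the real feature would raise
# UnboundLocalError instead, as described above.
def f2(x=_SENTINEL, y=_SENTINEL):
    if x is _SENTINEL:
        x = y + 1          # fails here: y is still the sentinel
    if y is _SENTINEL:
        y = 2
    return x, y

print(f1())    # (3, 2)
```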
If you don't like that, then you don't like late-bound defaults at all, and you should be arguing in favour of rejecting the PEP :-(

If we consider code that already exists today, with the None sentinel trick, each of those cases has an equivalent error today, even if some of the fine detail is different (e.g. getting TypeError because we attempt to add 1 to None instead of an unbound local).

However there is a real, and necessary, difference in behaviour which I think you missed:

    def func(x=x, y=>x)  # or func(x=x, @y=x)

The x=x parameter uses global x as the default. The y=>x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:

    def method(self, x=>self.attr)       # @x=self.attr
    def bisect(a, x, lo=0, hi=>len(a))   # @hi=len(a)

Without that difference in behaviour, probably fifty or eighty percent of the use-cases are lost. (And the ones that remain are mostly trivial ones of the form arg=[].) So we need this genuine inconsistency.

If you can live with that actual inconsistency, why are you losing sleep over behaviour (functions f1 through f4) which isn't actually inconsistent?

* Code that does different things is supposed to behave differently;

* The differences in behaviour are easy to understand;

* You can't prevent the late-bound defaults from raising UnboundLocalError, so why are you trying to turn a tiny subset of such errors into SyntaxError?

* The genuine inconsistency is *necessary*: late-bound expressions should be evaluated in the function's namespace, not the surrounding (global) namespace.
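The bisect case shows why `hi=>len(a)` must be evaluated in the function's own namespace: `a` is a parameter, which only exists once the call is underway, so no early-bound default could see it. Here is a runnable sketch of today's sentinel equivalent (the bisect_left-style body is my illustration, not from the thread):

```python
def bisect(a, x, lo=0, hi=None):
    """Locate the insertion point for x in sorted list a."""
    if hi is None:
        hi = len(a)   # evaluated inside the call: 'a' is a parameter,
                      # not a global, so this cannot be an early-bound default
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo

print(bisect([10, 20, 30, 40], 25))  # 2
```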
And importantly, do Python core devs agree with less-skilled Python programmers on the intuitions?
We should write a list of the things that Python wouldn't have if the intuitions of "less-skilled Python programmers" were a necessary condition:

- no metaclasses, descriptors or decorators;
- no classes or inheritance (multiple or single);
- no slices or zero-based indexing;
- no mutable objects;
- no immutable objects;
- no floats or Unicode strings;

etc. I think that, *maybe*, we could have `print("Hello world")`, so long as the programmer's intuition is that print needs parentheses.
If this should be permitted, there are two plausible semantic meanings for these kinds of constructs:
1) Arguments are defined left-to-right, each one independently of the others

2) Early-bound arguments and those given values are defined first, then late-bound arguments
The first option is much easier to explain, but will never give useful results for out-of-order references (unless it's allowed to refer to the containing scope or something). The second is closer to the "if x is None: x = y + 1" equivalent, but is harder to explain.
You just explained it perfectly in one sentence. The two options are equally easy to explain. The second takes a few more words, but the concepts are no harder. And the second is much more useful.

In comparison, think about how hard it is to explain your preferred behaviour, a SyntaxError. Think about how many posts you have written, and how many examples you have given, hundreds, maybe thousands, of words, dozens or hundreds of sentences, and you have still not convinced everyone that "raise SyntaxError" is the right thing to do. "Why does this simple function definition raise SyntaxError?" is MUCH harder to explain than "Why does a default value that tries to access an unbound local variable raise UnboundLocalError?".
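The difference between the two options can be made concrete with a toy simulator. The `assign_one_pass`/`assign_two_pass` helpers and the thunk representation below are hypothetical illustrations of the two models being debated, not PEP machinery:

```python
# Model f(x=>y + 1, y=2): defaults are ("early", value) or
# ("late", thunk) pairs; a thunk reads earlier bindings from ns.

def assign_one_pass(params, ns):
    # Option 1: strict left-to-right, each default sees only what
    # has already been assigned before it.
    for name, (kind, val) in params.items():
        ns[name] = val if kind == "early" else val(ns)

def assign_two_pass(params, ns):
    # Option 2: all early-bound defaults first, then the late-bound
    # ones left to right.
    for name, (kind, val) in params.items():
        if kind == "early":
            ns[name] = val
    for name, (kind, val) in params.items():
        if kind == "late":
            ns[name] = val(ns)

params = {
    "x": ("late", lambda ns: ns["y"] + 1),   # x=>y + 1
    "y": ("early", 2),                       # y=2
}

ns = {}
assign_two_pass(params, ns)
print(ns)              # {'y': 2, 'x': 3}

try:
    assign_one_pass(params, {})
except KeyError:       # stands in for UnboundLocalError
    print("one-pass fails: y not yet assigned")
```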
Two-phase initialization is my second-best preference after rejecting with SyntaxError, but I would love to see some real-world usage before opening it up. Once permission is granted, it cannot be revoked, and it might turn out that one of the other behaviours would have made more sense.
Being cautious about new syntax is often worthy, but here you are being overcautious. You are trying to prohibit something as a syntax error because it *might* fail at runtime. We don't even protect against things that we know *will* fail!

    x = 1 + 'a'  # Not a syntax error.

In this case, two-pass defaults is clearly superior, because it would allow everything that the one-pass behaviour would allow, *plus more* applications that we haven't even thought of yet (but others will).

Analogy: when Python 1 was first evolving, nobody said that we ought to be cautious about parallel assignment:

    a, b, c = ...

just because the user might misuse it:

    a = 1
    if False:
        b = 1  # oops, I forgot to define b
    a, b = b, a  # SyntaxError, just in case?

Nor did we lose sleep over which parallel assignment model is better, and avoid making a decision:

    a, b = b, a

    # Model 1:
    push b
    push a
    swap
    a = pop stack
    b = pop stack

versus:

    # Model 2:
    push b
    a = pop stack
    push a
    b = pop stack

The two models are identical if the expressions on the right are all distinct from the targets on the left, e.g. `a, b = x, y`, but the first model allows us to do much more useful things that the second doesn't, such as the "swap two variables" idiom.

Be bold! The "two pass" model is clearly better than the "one pass" model. You don't need to prevaricate just in case. Worst case, the Steering Council will say "Chris, we love everything about the PEP except this..." and you will have to change it. But they won't, because the two pass model is clearly the best *wink*

-- 
Steve
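The parallel-assignment argument above can be checked directly: Python really does follow Model 1, evaluating the entire right-hand side before assigning any target, and hand-simulating Model 2 shows the swap idiom would be lost. A small illustrative sketch (mine, not from the thread):

```python
# Model 1 is what Python actually does: build the RHS tuple, then unpack.
a, b = 1, 2
a, b = b, a
print(a, b)    # 2 1 -- the swap works

# Hand-simulation of Model 2: each target is assigned as soon as its
# source value is read, so `a` is clobbered before `b` reads it.
a, b = 1, 2
a = b          # a = pop stack  (a is now 2)
b = a          # b = pop stack  (reads the clobbered a, so b is also 2)
print(a, b)    # 2 2 -- the swap is lost
```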