
On Tue, Oct 26, 2021 at 04:48:17AM +1100, Chris Angelico wrote:
The problem is the bizarre inconsistencies that can come up, which are difficult to explain unless you know exactly how everything is implemented internally. What exactly is the difference between these, and why should some be legal and others not?
They should all be legal. Legal doesn't mean "works". Code that raises an exception is still legal code.
    def f1(x=>y + 1, y=2): ...

    def f2(x=>y + 1, y=>2): ...

    def f3(x=>y + 1, *, y): ...

    def f4(x=>y + 1):
        y = 2

    def f5(x=>y + 1):
        global y
        y = 2
What "bizarre inconsistencies" do you think they have? Each example is different, so it is hardly shocking if they behave differently too.

f1() assigns positional arguments first (there are none), then keyword arguments (still none), then early-bound defaults left to right (y=2), then late-bound defaults left to right (x=y+1). That is, I argue, the most useful behaviour. But if you insist on a strict left-to-right single pass to assign defaults, then instead it will raise UnboundLocalError because y doesn't have a value. Just like the next case:

f2() assigns positional arguments first (there are none), then keyword arguments (still none), then early-bound defaults left to right (none of these either), then late-bound defaults left to right (x=y+1), which raises UnboundLocalError because y is a local but doesn't have a value yet.

f3() assigns positional arguments first (there are none), then keyword arguments (still none), at which point it raises TypeError because you have a mandatory keyword-only argument with no default.

f4() is just like f2().

And lastly, f5() assigns positional arguments first (there are none), then keyword arguments (still none), then early-bound defaults left to right (none of these either), then late-bound defaults left to right (x=y+1), which might raise NameError if global y doesn't exist, and otherwise will succeed.

Each of those cases is easily understandable. There is no reason to expect the behaviour in all of these cases to be the same, so we can hardly complain that they are "inconsistent", let alone that they are "bizarrely inconsistent". The only novelty here is that functions with late-binding can raise arbitrary exceptions, including UnboundLocalError, before the body of the function is entered.
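Since `=>` is proposed syntax and won't run today, here is a runnable sketch of the two-pass behaviour described above, emulated with the sentinel idiom. The names f1 and f2 mirror the quoted examples; the sentinel machinery is my illustration, not the PEP's:

```python
_SENTINEL = object()

# Emulates f1(x=>y + 1, y=2) under the two-pass model: the early-bound
# default y=2 is assigned first, then the late-bound x=y+1.
def f1(x=_SENTINEL, y=2):
    if x is _SENTINEL:
        x = y + 1          # y already has its early-bound default
    return x, y

# Emulates f2(x=>y + 1, y=>2): both defaults are late-bound, so
# evaluating x=y+1 first finds y still "unbound". Here that surfaces
# as TypeError (sentinel + 1); the real feature would raise
# UnboundLocalError instead, as described above.
def f2(x=_SENTINEL, y=_SENTINEL):
    if x is _SENTINEL:
        x = y + 1          # fails here: y is still the sentinel
    if y is _SENTINEL:
        y = 2
    return x, y

print(f1())    # (3, 2)
```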
If you don't like that, then you don't like late-bound defaults at all, and you should be arguing in favour of rejecting the PEP :-(

If we consider code that already exists today, with the None sentinel trick, each of those cases has an equivalent error today, even if some of the fine detail is different (e.g. getting TypeError because we attempt to add 1 to None instead of an unbound local).

However there is a real, and necessary, difference in behaviour which I think you missed:

    def func(x=x, y=>x)  # or func(x=x, @y=x)

The x=x parameter uses global x as the default. The y=>x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:

    def method(self, x=>self.attr)       # @x=self.attr
    def bisect(a, x, lo=0, hi=>len(a))   # @hi=len(a)

Without that difference in behaviour, probably fifty or eighty percent of the use-cases are lost. (And the ones that remain are mostly trivial ones of the form arg=[].) So we need this genuine inconsistency.

If you can live with that actual inconsistency, why are you losing sleep over behaviour (functions f1 through f4) which isn't actually inconsistent?

* Code that does different things is supposed to behave differently;

* The differences in behaviour are easy to understand;

* You can't prevent the late-bound defaults from raising UnboundLocalError, so why are you trying to turn a tiny subset of such errors into SyntaxError?

* The genuine inconsistency is *necessary*: late-bound expressions should be evaluated in the function's namespace, not the surrounding (global) namespace.
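The bisect case shows why `hi=>len(a)` must be evaluated in the function's own namespace: `a` is a parameter, which only exists once the call is underway, so no early-bound default could see it. Here is a runnable sketch of today's sentinel equivalent (the bisect_left-style body is my illustration, not from the thread):

```python
def bisect(a, x, lo=0, hi=None):
    """Locate the insertion point for x in sorted list a."""
    if hi is None:
        hi = len(a)   # evaluated inside the call: 'a' is a parameter,
                      # not a global, so this cannot be an early-bound default
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo

print(bisect([10, 20, 30, 40], 25))  # 2
```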
And importantly, do Python core devs agree with less-skilled Python programmers on the intuitions?
We should write a list of the things that Python wouldn't have if the intuitions of "less-skilled Python programmers" were a necessary condition:

- no metaclasses, descriptors or decorators;
- no classes or inheritance (multiple or single);
- no slices or zero-based indexing;
- no mutable objects;
- no immutable objects;
- no floats or Unicode strings;

etc. I think that, *maybe*, we could have `print("Hello world")`, so long as the programmer's intuition is that print needs parentheses.
If this should be permitted, there are two plausible semantic meanings for these kinds of constructs:
1) Arguments are defined left-to-right, each one independently of the others

2) Early-bound arguments and those given values are defined first, then late-bound arguments
The first option is much easier to explain, but will never give useful results for out-of-order references (unless it's allowed to refer to the containing scope or something). The second is closer to the "if x is None: x = y + 1" equivalent, but is harder to explain.
You just explained it perfectly in one sentence. The two options are equally easy to explain. The second takes a few more words, but the concepts are no harder. And the second is much more useful.

In comparison, think about how hard it is to explain your preferred behaviour, a SyntaxError. Think about how many posts you have written, and how many examples you have given, hundreds, maybe thousands, of words, dozens or hundreds of sentences, and you have still not convinced everyone that "raise SyntaxError" is the right thing to do. "Why does this simple function definition raise SyntaxError?" is MUCH harder to explain than "Why does a default value that tries to access an unbound local variable raise UnboundLocalError?".
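The difference between the two options can be made concrete with a toy simulator. The `assign_one_pass`/`assign_two_pass` helpers and the thunk representation below are hypothetical illustrations of the two models being debated, not PEP machinery:

```python
# Model f(x=>y + 1, y=2): defaults are ("early", value) or
# ("late", thunk) pairs; a thunk reads earlier bindings from ns.

def assign_one_pass(params, ns):
    # Option 1: strict left-to-right, each default sees only what
    # has already been assigned before it.
    for name, (kind, val) in params.items():
        ns[name] = val if kind == "early" else val(ns)

def assign_two_pass(params, ns):
    # Option 2: all early-bound defaults first, then the late-bound
    # ones left to right.
    for name, (kind, val) in params.items():
        if kind == "early":
            ns[name] = val
    for name, (kind, val) in params.items():
        if kind == "late":
            ns[name] = val(ns)

params = {
    "x": ("late", lambda ns: ns["y"] + 1),   # x=>y + 1
    "y": ("early", 2),                       # y=2
}

ns = {}
assign_two_pass(params, ns)
print(ns)              # {'y': 2, 'x': 3}

try:
    assign_one_pass(params, {})
except KeyError:       # stands in for UnboundLocalError
    print("one-pass fails: y not yet assigned")
```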
Two-phase initialization is my second-best preference after rejecting with SyntaxError, but I would love to see some real-world usage before opening it up. Once permission is granted, it cannot be revoked, and it might turn out that one of the other behaviours would have made more sense.
Being cautious about new syntax is often worthy, but here you are being overcautious. You are trying to prohibit something as a syntax error because it *might* fail at runtime. We don't even protect against things that we know *will* fail!

    x = 1 + 'a'  # Not a syntax error.

In this case, two-pass defaults is clearly superior, because it would allow everything that the one-pass behaviour would allow, *plus more* applications that we haven't even thought of yet (but others will).

Analogy: when Python 1 was first evolving, nobody said that we ought to be cautious about parallel assignment:

    a, b, c = ...

just because the user might misuse it:

    a = 1
    if False:
        b = 1  # oops, I forgot to define b
    a, b = b, a  # SyntaxError, just in case?

Nor did we lose sleep over which parallel assignment model is better, and avoid making a decision:

    a, b = b, a

    # Model 1:
    push b
    push a
    swap
    a = pop stack
    b = pop stack

versus:

    # Model 2:
    push b
    a = pop stack
    push a
    b = pop stack

The two models are identical if the expressions on the right are all distinct from the targets on the left, e.g. `a, b = x, y`, but the first model allows us to do much more useful things that the second doesn't, such as the "swap two variables" idiom.

Be bold! The "two pass" model is clearly better than the "one pass" model. You don't need to prevaricate just in case. Worst case, the Steering Council will say "Chris, we love everything about the PEP except this..." and you will have to change it. But they won't, because the two pass model is clearly the best *wink*

-- 
Steve
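The parallel-assignment argument above can be checked directly: Python really does follow Model 1, evaluating the entire right-hand side before assigning any target, and hand-simulating Model 2 shows the swap idiom would be lost. A small illustrative sketch (mine, not from the thread):

```python
# Model 1 is what Python actually does: build the RHS tuple, then unpack.
a, b = 1, 2
a, b = b, a
print(a, b)    # 2 1 -- the swap works

# Hand-simulation of Model 2: each target is assigned as soon as its
# source value is read, so `a` is clobbered before `b` reads it.
a, b = 1, 2
a = b          # a = pop stack  (a is now 2)
b = a          # b = pop stack  (reads the clobbered a, so b is also 2)
print(a, b)    # 2 2 -- the swap is lost
```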