
On Tue, Oct 26, 2021 at 05:27:49PM +1100, Chris Angelico wrote:
On Tue, Oct 26, 2021 at 3:00 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Oct 26, 2021 at 04:48:17AM +1100, Chris Angelico wrote:
The problem is the bizarre inconsistencies that can come up, which are difficult to explain unless you know exactly how everything is implemented internally. What exactly is the difference between these, and why should some be legal and others not?
They should all be legal. Legal doesn't mean "works". Code that raises an exception is still legal code.
Then there's no such thing as illegal code,
I mean that code that compiles and runs is legal, even if it raises a runtime error. Code that cannot compile due to syntax errors is "illegal", we often talk about "illegal syntax":

    None[0]  # Legal syntax, still raises
    import() = while x or and else  # Illegal syntax

Sorry if I wasn't clear.
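To make the distinction concrete, here is a minimal sketch using the built-in compile(). The variable names are illustrative only, and `value[0]` with `value = None` stands in for `None[0]` so that recent Python versions do not emit a SyntaxWarning about subscripting a literal.

```python
# Illustrative snippets; none of these names come from the thread.
legal_but_raises = "value = None\nvalue[0]"
illegal_syntax = "import() = while x or and else"

# Legal syntax compiles fine; the failure only appears when the code runs.
code = compile(legal_but_raises, "<example>", "exec")
try:
    exec(code, {})
    runtime_error = None
except TypeError as exc:
    runtime_error = exc

# Illegal syntax never gets as far as running: compile() itself fails.
try:
    compile(illegal_syntax, "<example>", "exec")
    syntax_error = None
except SyntaxError as exc:
    syntax_error = exc

print(type(runtime_error).__name__)  # TypeError
print(type(syntax_error).__name__)   # SyntaxError
```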
and my entire basis for explanation is bunk. Come on, you know what I mean. If it causes SyntaxError:, it's not legal code.
Sorry Chris, I don't know what you mean. It only causes a syntax error because you are forcing it to cause a syntax error, not because it cannot be interpreted under the existing (proposed or actual) semantics. You are (were?) arguing that something that is otherwise meaningful should be a syntax error because there are some circumstances in which it could fail.

That's not "illegal code" in the sense I mean, and I don't know why you want it to be a syntax error (unless you've changed your mind). We don't do this:

    y = x+1  # Syntax error, because x might be undefined

and we shouldn't make this a syntax error

    def func(@spam=eggs+1, @eggs=spam-1):

either just because `func()` with no arguments raises. So long as you pass at least one argument, it works fine, and that may be perfectly suitable for some uses. Let linters worry about flagging that as a violation. The interpreter should be for consenting adults.

There is plenty of code that we can already write that might raise a NameError or UnboundLocalError. This is not special enough to promote it to a syntax error.
    def f5(x=>y + 1):
        global y
        y = 2
According to the previously-defined equivalencies, this would mean:
    def f5(x=None):
        if x is None: x = y + 1
        global y
        y = 2
Of course it would not mean that. That's a straw-man. You have deliberately written code which you know is illegal (now; it wasn't illegal just a few releases back). Remember that "global y" is not an executable statement, it is a declaration, so we can move the declaration anywhere we want to make the code legal. So it would be equivalent to:

    def f5(x=None):
        global y
        if x is None: x = y + 1
        y = 2

And it can still raise NameError if y is not defined. Caveat utilitor (let the user beware).

Parameters (and their defaults) are not written inside the function body, they are written in the function header, and the function header by definition must precede the body and any declarations inside it. We should not allow such an unimportant technicality to prevent late bound defaults from using globals.

Remember that for two decades or so, global declarations could be placed anywhere in the function body. It is only recently that we have tightened that up with a rule that the declaration must occur before any use of a name inside the function body. We created that more restrictive rule by fiat, we can loosen it *for late-bound expressions* by fiat too: morally, global declarations inside the body are deemed to occur before the parameter defaults. Done and solved.

(I don't know why we decided on this odd rule that the global declaration has to occur before the usage of the variable, instead of just insisting that any globals be declared immediately after the function header and docstring. Oh well.)
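As a rough check of the two f5 spellings in current Python (using the "if x is None" idiom, since late-bound default syntax does not exist), the following sketch compiles both from source strings. The string and namespace names are hypothetical, and a `return x` has been added to the second version purely so the result can be inspected.

```python
# Use of y before its global declaration: rejected at compile time.
late_global = """
def f5(x=None):
    if x is None:
        x = y + 1
    global y
    y = 2
"""

# Declaration moved to the top: legal, and may still fail at run time.
early_global = """
def f5(x=None):
    global y
    if x is None:
        x = y + 1
    y = 2
    return x  # added for inspection only
"""

try:
    compile(late_global, "<late>", "exec")
    late_error = None
except SyntaxError as exc:
    late_error = exc

namespace = {"y": 10}
exec(compile(early_global, "<early>", "exec"), namespace)
result = namespace["f5"]()  # x = 10 + 1, then the global y becomes 2

# With no global y at all, the same legal code raises NameError.
empty = {}
exec(compile(early_global, "<early>", "exec"), empty)
try:
    empty["f5"]()
    missing_error = None
except NameError as exc:
    missing_error = exc
```

Here `late_error` ends up holding a SyntaxError, `result` is 11 with `namespace["y"]` rebound to 2, and `missing_error` holds a NameError.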
That implies that a global statement *anywhere* in a function will also apply to the function header, despite it not otherwise being legal to refer to a name earlier in the function than the global statement.
Great minds think alike :-)

If it makes you happy, you could enforce a rule that the global has to occur after the docstring and before the function body, but honestly I'm not sure why we would bother.

Some more comments, which hopefully match your vision of the feature:

If a late bound default refers to a name -- and most of them will -- we should follow the same rules as we otherwise would, to the extent that makes sense. For example:

* If the name in the default expression matches a parameter, then it refers to the parameter, not the same name in the surrounding scope; parameters are always local to the function, so the name should be local to the function inside the default expression too.

* If the name in the default expression matches a local name in the body of the function, that is, one that we assign to and haven't declared as global or nonlocal, then the default expression should likewise treat it as local.

* If the name in the default matches a name in the function body that has been declared global or nonlocal, then treat it the same way in the default expression.

* Otherwise treat it as global/nonlocal/builtin.

(I think that covers all the cases.)

Do these scoping rules mean it is possible to write defaults that will fail at run time? Yes. So does the code we can write today. Don't worry about it. It is the coder's responsibility, not the interpreter's and not yours, to ensure that the code they write works.
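The first two rules already hold for expressions written in the function body today, which is what the proposal would mirror. A small sketch, emulating a late-bound default with a sentinel; `_MISSING`, `double` and `broken` are hypothetical names invented for this illustration.

```python
_MISSING = object()  # hypothetical sentinel meaning "no argument passed"
size = 100           # module-level name, shadowed by the parameter below

def double(size=3, total=_MISSING):
    # Rule 1: `size` here is the parameter, not the module-level name.
    if total is _MISSING:
        total = size * 2
    return total

def broken(total=_MISSING):
    # Rule 2: `count` is assigned later in the body, so it is local,
    # and reading it before that assignment fails at run time.
    if total is _MISSING:
        total = count + 1
    count = 0
    return total

count = 5  # a global of the same name does not rescue broken()

print(double())  # 6, not 200
try:
    broken()
except UnboundLocalError as exc:
    print("run-time failure:", type(exc).__name__)
```

Both functions compile without complaint; only calling `broken()` without an argument fails, which is the "legal but raises" situation described above.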
And lastly, f5() assigns positional arguments first (there are none), then keyword arguments (still none), then early-bound defaults left to right (none of these either), then late-bound defaults left to right (x=y+1) which might raise NameError if global y doesn't exist, otherwise it will succeed.
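The binding order described here can be emulated by hand. This is only a sketch of the proposed semantics, not an implementation of them: `bind_f5` and its parameters are hypothetical, and the module globals are passed in as a plain dict.

```python
def bind_f5(args=(), kwargs=None, module_globals=None):
    """Hypothetical emulation of binding for f5(x=>y + 1)."""
    kwargs = dict(kwargs or {})
    module_globals = {} if module_globals is None else module_globals
    bound = {}
    # 1. Positional arguments, left to right.
    bound.update(zip(("x",), args))
    # 2. Keyword arguments.
    bound.update(kwargs)
    # 3. Early-bound defaults, left to right: f5 has none.
    # 4. Late-bound defaults, left to right: x = y + 1.
    if "x" not in bound:
        if "y" not in module_globals:
            raise NameError("name 'y' is not defined")
        bound["x"] = module_globals["y"] + 1
    return bound

print(bind_f5(args=(5,)))                  # {'x': 5}
print(bind_f5(module_globals={"y": 10}))   # {'x': 11}
```

Calling `bind_f5()` with neither an argument nor a global y raises NameError, matching the failure case mentioned in the text.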
It's interesting that you assume this. By any definition, the header is a reference prior to the global statement, which means the global statement would have to be hoisted. I think that's probably the correct behaviour, but it is a distinct change from the current situation.
See my comments above. What other possible meaning would make sense? We can write a language with whatever restrictions we like:

    # Binding operations must follow
    # the I before E rule, unless after C
    self.increment = 1  # Syntax error because E occurs before I
    container = []  # Syntax error because I before E after C

but such a language would be no fun to use. Or possibly lots of fun, if you had a twisted mind *wink*

So yes, I looked at what the clear and obvious intention of the code was, and assumed that it should work. Executable pseudocode, remember? There's no need to restrict things just for the sake of restricting them.
Based on the multi-pass assignment model, which you still favour, those WOULD be quite inconsistent, and some of them would make little sense. It would also mean that there is a distinct semantic difference between:
    def f1(x=>y + 1, y=2): ...
    def f2(x=>y + 1, y=>2): ...
Sure. They behave differently because they are different. These are different too:

    # Block 1
    y = 2
    x = y + 1

    # Block 2
    x = y + 1
    y = 2
in that it changes what's viable and what's not. (Since you don't like the term "legal" here, I'll go with "viable", since a runtime exception isn't terribly useful.) Changing the default from y=2 to y=>2 would actually stop the example from working.
Um, yes? Changing the default from y=2 to y="two" will also stop it from working. Even if you swap the order of the parameters.
Multi-pass initialization makes sense where it's necessary. Is it really necessary here?
We already have multi-pass initialisation.

1. positional arguments are applied, left to right;
2. then keyword arguments;
3. then defaults are applied.

(It is, I think, an implementation detail whether 2 and 3 are literally two separate passes or whether they can be rolled into a single pass. There are probably many good ways to actually implement binding of arguments to parameters. But semantically, argument binding to parameters behaves as if it were multiple passes.)

Since the number of parameters is likely to be small (more likely 6 parameters than 6000), we shouldn't care about the cost of a second pass to fill in the late-bound defaults after all the early-bound defaults are done.
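The standard library's inspect module models argument binding in exactly this staged way, which makes the existing passes easy to observe. This is a sketch of the inspect model only; it says nothing about how the interpreter implements calls internally. The `function` signature is borrowed from the status quo example later in this message, with a trivial body added.

```python
import inspect

def function(arg, spam=None, eggs="something useful"):
    return arg, spam, eggs

sig = inspect.signature(function)

# Passes 1 and 2: bind positional arguments, then keyword arguments.
bound = sig.bind(1, eggs="custom")
after_arguments = dict(bound.arguments)

# Pass 3: fill in defaults for any parameter that is still unbound.
bound.apply_defaults()
after_defaults = dict(bound.arguments)

print(after_arguments)  # {'arg': 1, 'eggs': 'custom'}
print(after_defaults)   # {'arg': 1, 'spam': None, 'eggs': 'custom'}
```

A late-bound pass, as proposed, would be one more step after `apply_defaults()`.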
No, you misunderstand. I am not saying that less-skilled programmers have to intuit things perfectly; I am saying that, when there are drastic differences of expectation, there is probably a problem.
I can easily explain "arguments are assigned left to right". It is much harder to explain multi-stage initialization and why different things can be referenced.
I disagree that it is much harder.

In any case, my fundamental model here is that if we can do something using pseudo-late binding (the "if arg is None" idiom), then it should (more or less) be possible using late-binding. We should be able to just move the expression from the body of the function to the parameter and in most cases it should work.

Obviously some conditions apply:

- single expressions only, not a full block;

- exceptions may change (e.g. a TypeError from `None + 1` may turn into an UnboundLocalError, etc)

- not all cases will work, due to order of operations, but we should be able to get most cases to work.

Inside the body of a function, we can apply pseudo-late binding using the None idiom in any order we like. As late-binding parameters, we are limited to left-to-right. But we can get close to the (existing) status quo by ensuring that all early-bound defaults are applied before we start the late-bound defaults.

    # Status quo
    def function(arg, spam=None, eggs="something useful"):
        if spam is None:
            spam = process(eggs)

eggs is guaranteed to have a result here because the early-bound defaults are all assigned before the body of the function is entered. So in the new regime of late-binding, I want to write:

    def function(arg, @spam=process(eggs), eggs="something useful"):

and the call to process(eggs) should occur after the early bound default is assigned. The easiest way to get that is to say that early bound defaults are assigned in one pass, and late bound in a second pass.

Without that, many use cases for late-binding (I won't try to guess a proportion) are not going to translate to the new idiom.

-- Steve