
On Sat, 18 Jun 2022 at 02:14, Paul Moore <p.f.moore@gmail.com> wrote:
On Fri, 17 Jun 2022 at 15:55, Chris Angelico <rosuav@gmail.com> wrote:
On Sat, 18 Jun 2022 at 00:21, Paul Moore <p.f.moore@gmail.com> wrote:
On Fri, 17 Jun 2022 at 14:15, Chris Angelico <rosuav@gmail.com> wrote:
There are several ways to make this clearly sane.
# Clearly UnboundLocalError
def frob(n=>len(items), items=>[]):
Um, I didn't see that as any more obvious than the original example. I guess I can see it's UnboundLocalError, but honestly that's not obvious to me.
Question: Is this obvious?
def f():
    x, x[0] = [2], 3
    print(x)
def boom():
    x[0], x = 3, [2]  # raises UnboundLocalError
No. I'm not sure what point you're trying to make here?
The point is that many things can be unobvious, including aspects of important features that we make good use of all the time. But they are consistent, which means that, once you try it and run into a problem, you should be able to see *why* it's a problem. (This particular example is another case of LTR evaluation - or to be more precise, LTR assignment - and while I wouldn't do it in a simple statement like this, I certainly have made use of it in a 'for' loop.)
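For illustration (my own toy example, not one from the PEP), the same left-to-right assignment applies to a for-loop target: an earlier element of the target can bind a name that a later element then uses.

rows = [("a", 1), ("b", 2)]
d = {}
for key, d[key] in rows:    # key is bound first, then used as the subscript
    pass
print(d)    # {'a': 1, 'b': 2}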
I understand that left-to-right evaluation is something that has to be learned (and isn't 100% true - operator precedence is a thing too), but at very least, if it isn't *obvious*, it should at least be *unsurprising* if you then get UnboundLocalError.
Why? Are you saying I can't be surprised by the details of rules that I don't often have a need to understand in detail?
My point is that "unsurprising", while a much weaker criterion than "obvious", should be quite attainable. If you try the above two pieces of code, you'll quickly find that one of them works and one doesn't, and from the exceptions you get, the rule should be fairly clear.
In case you are, consider that as written, the PEP says that the *defaults* are evaluated left to right in the function's runtime scope, but it doesn't say when the parameter names are introduced in that scope - prior to this PEP there was no need to define that detail, as nothing could happen before the names were introduced at the start of the scope. If you accept that clarification, can you accept that the current text isn't as clear as it might be?
I actually don't accept that clarification, because nothing has changed. At what point in this function do the names get introduced to the scope?

def spam(x, y=1, *, z=2):
    ham = [x, y, z]

They are all "introduced", if that term even has meaning, at the very instant that the scope begins to exist. The name 'ham' isn't introduced to the scope at a subsequent point. There are languages that work this way (and it can be very convenient when used correctly), but Python is not one of them. Late-bound defaults do not affect this in any way. A function parameter, like any other local, is local for the entire scope of the function. It doesn't "become local" part way through. Do I need to state this in the PEP? Are there other parts of Python's semantics which need to be restated in the PEP too? Which parts, despite not changing, are now going to be brought into question?
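To spell that out with a toy example of my own (not from the PEP): a name assigned anywhere in a function is local for the whole function body, so reading it before the assignment raises rather than finding a global.

def demo():
    print(ham)    # raises UnboundLocalError: 'ham' is local to demo() throughout
    ham = [1, 2, 3]

demo()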
I meant the walrus operator, and that's my point. There's a lot of not-immediately-obvious interactions here. Even if we don't include default expressions, I'd argue that the behaviour is non-obvious:
>>> def f(a=(b:=12)):
...     print(a, b)
...
>>> f()
12 12
>>> b
12
I assume (possibly naïvely) that this is defined in the language spec, though, as it's existing behaviour. But when you add in default expressions, you need to be sure that the various interactions are well-defined.
They absolutely are well-defined. Almost certainly not useful, but well-defined. The right hand side of either "a=EXPR" or "a=>EXPR" is simply evaluated as an ordinary expression; the only difference is whether it's evaluated at function definition time and in function definition context, or at function invocation time and in the context of the function itself.
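If it helps, here's the existing early-binding rule in miniature (the standard mutable-default example, nothing new); under the proposal, writing target=>[] would simply move that evaluation from definition time to each call.

def append_to(item, target=[]):    # the [] is created once, when the def runs
    target.append(item)
    return target

print(append_to(1))    # [1]
print(append_to(2))    # [1, 2], the same list object both times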
Agreed. Although consider the following:
>>> def f(a=(b:=12), b=9):
...     print(a, b)
...
>>> f()
12 9
>>> b
12
Since this is an early-bound default, it can be considered like this:

_default = (b := 12)
def f(a=None, b=9):
    if a is None:
        a = _default
    print(a, b)

And then it should be unsurprising that b becomes 12 in the surrounding scope, paralleling a's default value, and b defaults to 9 in the function's context.
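Running that longhand gives the same picture as the interactive session above:

f()            # prints: 12 9
print(b)       # 12, because the walrus bound b in the enclosing scope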
Would
def frob(n=>len(items:=[]), items=>[1,2]): ...
reassign items if n is omitted? Or would it assign the *global* items and then shadow it with a local for the parameter? Can you point to the explanation in the PEP that covers this? And even if you can, are you trying to claim that the behaviour is "obvious"?
Since these are both late-bound defaults, they can be considered like this:

def frob(n=None, items=None):
    if n is None:
        n = len(items := [])
    if items is None:
        items = [1, 2]
    ...

Under the "Specification" section, the PEP says: """Multiple late-bound arguments are evaluated from left to right, and can refer to previously-defined values.""" Everything hinges on this left-to-right evaluation. The entire expression, including the assignment, is evaluated, and then you move on to the next one. (Of course, in the actual proposal, None isn't special like this. But from the perspective of assignment semantics, the longhand forms are broadly equivalent, and should be read more as a mythical "if items is not assigned:" syntax.)
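For concreteness, here's that longhand again with a return added so the values are visible (my addition, purely for demonstration); omitting n rebinds the local items via the walrus, and no global is involved.

def frob(n=None, items=None):
    if n is None:
        n = len(items := [])    # rebinds the local 'items' to []
    if items is None:
        items = [1, 2]
    return n, items

print(frob())     # (0, []): omitting n means items ends up as the empty list
print(frob(5))    # (5, [1, 2])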
Then let's leave aside the term "obvious" and just go for "unsurprising". If you write code and get UnboundLocalError, will you be surprised that it doesn't work? If you write code and it works, will you be surprised with the result you got?
As I noted above, "surprising" is no different. I can easily be surprised by well-defined behaviour. I'm not arguing that there's no explanation for why a particular construct works the way that it does, just that the behaviour may not be intuitive to people even if it is a consequence of the rules. I'm arguing that the behaviour fails an "is this easy to teach" criterion, not "is this logically consistent".
Okay. So what's the threshold then? I've tried to make this logically consistent, not only with itself, but with *every other place in Python where assignment happens*. It's always left-to-right.
Once you learn the basic idea of left-to-right evaluation, it should be possible to try things out and get unsurprising results. That's what I'm hoping for.
Get "explainable" results, yes. But I thought Python was supposed to aspire to more than that, and match how people thought about things. "Executable pseudocode" and all that.
I'm sorry that Python already doesn't live up to this expectation, but there's nothing I can do about that. Ultimately, everything has to have defined semantics, even the weird edge cases, and this is definitely an edge case. If this feature were implemented, I doubt that people would often see examples like this outside of test suites. Referring to arguments out-of-order simply isn't a normal thing that programmers want to do, because it makes for a confusing API. We are debating the teachability of something that is usually going to be irrelevant, because most use of this will be trivially simple to understand:

def f(items, n=>len(items)):

This will Just Work, and there's no backwards evaluation to worry about. It's ONLY when you put the arguments the other way around that evaluation order even becomes significant. This is no different from many other parts of Python, where the order of evaluation is defined, but usually irrelevant.
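(For comparison, this is what people write today for that same signature, using the conventional None sentinel, modulo the usual caveat that None then can't be passed as a real value; the => form is intended to replace exactly that kind of boilerplate.)

def f(items, n=None):
    if n is None:
        n = len(items)
    ...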
Feel free to state that there's not *enough* cases of people being confused by the semantics to outweigh the benefits, but it feels to me that there are a few people claiming confusion here, and simply saying "you shouldn't be confused, it's obvious" isn't really addressing the point.
Part of the problem is that one person seems to think that Python will completely change its behaviour, and he's spreading misinformation. Ignore him, look just at the proposal itself, and tell me if it's still confusing.
OK, if this is going to boil down to you asserting that the only problems here are with "one person" then I don't think it's worth continuing. I am not simply parroting "misinformation spread by that one person" (and you've made it very obvious already who that individual is, so please try to keep your personal problem with them out of your discussions with me). If you're not willing to accept my comments as feedback given in my own right, then it's you who is shutting down discussion here, and I don't see much point in trying to provide a good-faith response to you.
If you can show me a way in which this proposal isn't consistent with the rest of Python, then I'll address that.
The only two possible behaviours are:
1) It does the single obvious thing: n defaults to the length of items, and items defaults to an empty tuple.
2) It raises UnboundLocalError if you omit n.
So why not pick one?
For the same reason that Python didn't just "pick one" about things like __del__ invocation time: it constrains language implementations unnecessarily.
To be quite honest, I can't think of any non-toy examples where the defaults would be defined backwards, like this.
If that's the case, then what is the downside of picking one?
Even if the situation never came up outside of toy examples, the language would be forced to jump through hoops to implement it. Whichever semantics were chosen, they would likely be suboptimal for some implementation. Maybe down the track, it could be defined more rigorously. That's happened before, plenty of times. It's much harder to change a definition than to tighten up something that wasn't fully specified, because people won't have been depending on the unspecified behaviour.
Personally, I have a nagging feeling that I could find a non-toy example, but it's not that important to me. What I'm arguing is that there's no point in not picking a behaviour. You're saying you don't want to lock other implementations into the particular behaviour you choose - but you also don't have an example of where that would be a problem, so we're *both* arguing hypotheticals here.
I actually do have an example, except that it was just a previous version of my reference implementation, where I tried to implement perfect left-to-right evaluation (as opposed to two-pass). It was incredibly messy. But maybe in the future, someone will be able to make a much better one, and then it would be worth using it. Iteration order of Python's dictionaries had, for years, been completely unspecified. Then hash randomization came along, and iterating over a dictionary of strings became actually random. And then iteration order became defined, not because someone felt like the specification should have 'just chosen', but because an *implementation* made it worthwhile. It's easy for you to say that there's "no point in not picking", but believe you me, there is plenty of point, otherwise I would have cut off all these debates by simply locking in the two-pass behaviour of the reference implementation.
It's not like Steven's constant panic-fear that "undefined behaviour" literally means the Python interpreter could choose to melt down your computer.
Oh, please. If that's the only way in which you can imagine implementation-defined behaviour being an issue, then you've lived a pretty sheltered life. How about "My code works on Python 3.12 but not on 3.13, because the behaviour in this case changed with no warning"?
It seems to be the assumption that he has. Ask him some time about C's concept of undefined behaviour, and then see if you can understand why he's so vitriolic about my proposal.
Sure, the PEP (and presumably the docs) said "don't do that", but you said above that people experiment and work out behaviour from those experiments. So breaking their code because they did precisely that seems at best pretty harsh.
Things DO change. Generally, Python tries to avoid breaking changes, and especially, changes where there's no way to "straddle" your code (if 3.13 breaks your code in some way, but the fixed version works just as well as the original on 3.12, then you can push out the fix without worrying about 3.12 now breaking your code). In this particular situation, the absolute worst-case option is that you forfeit the benefits of this feature and go with the sentinel object:

_UNSPECIFIED = object()
def foo(n=_UNSPECIFIED, items=()):
    if n is _UNSPECIFIED:
        n = len(items)

So even if this does start to become a problem in the future, people can, without materially changing their APIs, write code that uses this out-of-order evaluation. But I would still like to see a non-toy example where this would even come up.
There are *two* options, no more, no less, for what is legal.
Nope, there are two that you consider acceptable behaviour. And I don't disagree with you. But what's so magical about two? Why not have just one that's legal. Because people might disagree with your choice? You're the PEP author, let them. Or are you worried that this single point could cause the PEP to fail?
Let me rephrase. According to the specification in the PEP, these are the only two behaviours which are considered compliant. Python implementations are not permitted to, in the face of out-of-order parameter references, do completely arbitrary things like assigning 42 to all parameters. What's so magical about two? Nothing. They're just the only two behaviours that are consistent with the rest of the document.
You're not *just* recommending this for style guides, you're also explicitly stating that you refuse to assign semantics to it.
It's unfair to say that I "refuse to assign semantics" as if I'm permitting literally any behaviour.
Don't put words into my mouth. You have stated that you won't require a particular behaviour. That's refusing to assign semantics. If it makes you feel better I'll concede that you're not allowing *arbitrary* semantics.
I'm not sure what you mean by putting words in your mouth, but the part inside the quotation marks was literally words from your preceding comment. You did indeed say that.
By the way, a lot of this debate could be solved incredibly easily by writing the PEP in terms of code equivalence:
def fn(p1=>e1, p2=>e2, p3=e3):
    body
behaves the same as
def fn(p1=(_d1:=object()), p2=(_d2:=object()), p3=e3):
    if p1 is _d1:
        p1 = e1
    if p2 is _d2:
        p2 = e2
    body
The trouble is, it's not 100% equivalent. It's good enough for a post here, but it needs a lot of caveats. Generally speaking, using => is *broadly equivalent* to this sort of check, but I can't do what PEP 380 did for the "yield from" statement here and define its semantics entirely, because Python simply doesn't have a way to leave something unassigned and then check if it's been assigned to. But yes, if you want some example equivalencies, I could add those. (I would simply assign before the def statement though, rather than muddying the waters with assignment expressions. People will be confused enough without wondering what the scope of those is.)
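Concretely, I'd write the equivalence more like this (still only broadly equivalent, and e1/e2/e3/body are placeholders carried over from your example):

_d1 = object()
_d2 = object()
def fn(p1=_d1, p2=_d2, p3=e3):
    if p1 is _d1:
        p1 = e1
    if p2 is _d2:
        p2 = e2
    body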
There's probably some details to flesh out, but that's precise and well-defined. Debates over whether the resulting behaviour is "obvious" or "intuitive" can then take place against a background where everyone agrees what will happen (and can experiment with real code to see if they are comfortable with it).
Well, I did write a reference implementation, so if people want to experiment with real code, they absolutely can.
All I'm doing is saying that the UnboundLocalError is optional, *at this stage*. There have been far less-defined semantics that have remained in the language for a long time, or cases where something has changed in behaviour over time despite not being explicitly stated as implementation-defined. Is this legal?
def f():
    x = 1
    global x
Does Python mandate whether this is legal or not? If so, how far back in Python's history has it been defined?
*Shrug*. There was never a PEP about it, I suspect, and the behaviour was probably defined a long time before Python was the most popular language in the world. It would be nice if we still had the freedom that we did back then. Sadly, we don't. Maybe some people are *too* cautious nowadays. It's entirely possible I'm one of them. That's why we have the SC - if you're confident that your proposal is solid in spite of people like me complaining about edge cases, then submit it. I'll trust the SC's judgement.
The behaviour was actually fully legal until quite recently (it did issue a warning, but most people have those turned off). It didn't get a PEP, and it was just a small note in the What's New under "smaller changes to the language". Behaviour DOES change. Is it so bad to have advance warning that something might change? Because if people prefer it, I could absolutely lock in one definition of this, knowing full well that the next release might want to reverse that decision.
The semantics, if this code is legal, are obvious: the name x must always refer to the global, including in the assignment above it. If it's not legal, you get an exception, not an interpreter crash, not your hard drive getting wiped, and not a massive electric shock to the programmer.
Sigh. You have a very narrow view of "obvious". I can think of other equally "obvious" interpretations. I won't list them because you'll just accuse me of being contrary.
I'll grant you that other languages DO have completely different semantics here, but in Python, a name is what it is throughout a function; there is not a single circumstance where you can refer to a name in two different scopes at once. The nearest to that is tricks like "def f(x=x):" where something is evaluated in a different context, but it is still completely unambiguous. (Okay, it might be only *technically* unambiguous when you mix comprehensions and assignment expressions, but they got special-cased to make them less surprising.)
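(The classic form of that trick, for anyone following along; it's a standard idiom, not something the PEP changes: the x on the right is evaluated in the enclosing scope when the function is defined, while the parameter x is local to each function.)

callbacks = []
for x in range(3):
    callbacks.append(lambda x=x: x)    # right-hand x: enclosing scope, now; left-hand x: parameter
print([cb() for cb in callbacks])      # [0, 1, 2]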
But I will say that I tried that code and you get an exception. But interestingly, it's a *syntax* error (name assigned before global declaration), not a *runtime* exception. I genuinely don't know which you intended to suggest would be the obvious behaviour...
Yes, that's because the global statement is a syntactic feature, not an executable one. But you may note that I never said it was obvious that it had to be SyntaxError; only that it had to be an error (or to refer to the global). The distinction between syntax errors (parse time), function definition runtime errors, and function invocation runtime errors, is much more subtle, and I don't expect people to be able to intuit which one anything should be.
Would you prefer that I simply mandate that it be permitted, and then a future version of Python changes it to be an exception? Or the other way around? Because I could do that. Maybe it would reduce the arguments. Pun intended, and I am not apologizing for it.
lol, I'm always up for a good pun :-)
Good :)
Are you still talking about the global example? Because I'd prefer you left that part of the language alone. And if you're talking about PEP 671, you know my answer (I'd prefer you permit it and define what it does, so it can't change in future).
But I don't want to force it to not change in the future. In any case, the future can't be fully mandated like that. So your options are:

1) Lock in the semantics now, and if in the future it changes, then it breaks people's code
2) Provide two options that implementations can choose, and if in the future only one is legal, people's code should still have been compatible with both

Which would you prefer? I am completely open to the first option, but I just think it's unfair to future people's code to have it break, when I could have given them fair warning that this shouldn't be done.

ChrisA