[Python-ideas] PEP 572: Statement-Local Name Bindings

Fri Mar 2 07:48:48 EST 2018

On 2 March 2018 at 21:50, Paul Moore <p.f.moore at gmail.com> wrote:

> On 2 March 2018 at 11:15, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > On 2 March 2018 at 19:05, Paul Moore <p.f.moore at gmail.com> wrote:
> >>
> >> The problem with statement local variables is that the extent over
> >> which the name is in scope is not as clear to the human reader (the
> >> rules the *compiler* follows may be precise, but they aren't obvious
> >> to the human reader - that's the root of the debate I'm having with
> >> Chris over "what the reference implementation does isn't a sufficient
> >> spec"). In particular, assignment statements are non-obvious, as shown
> >> by the examples that triggered your suggestion of a "." prefix.
> >
> > Those examples didn't trigger the suggestion: the suggestion was born
> > from the fact that I don't think it should be possible to close over
> > statement locals.
>
> Ah, OK. If closing over statement locals isn't allowed, then yes, they
> are a different type of name, and you may need to distinguish them. On
> the other hand, I'm not sure I agree with you that it shouldn't be
> possible to close over statement locals. I can see that there are a
> lot of *difficulties* with allowing it, but that's not the same.
>

> What's your logic for saying you shouldn't be able to close over a
> statement local name? What is fundamentally different about them that
> makes them unsuitable to work like all other names in Python?
>

I have two reasons, one based on design intent, one based on practicality
of implementation.

Starting with the practical motivation first: based on my experience
preventing the implementation variable from leaking in comprehensions, I
don't think it's going to be practical to allow closing over statement
locals while still scoping them to the statement that sets them in any
meaningful way. The difficulty of implementing that correctly is why I
ended up going for the implicit-nested-scope implementation for Python 3
comprehensions in the first place.
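
For reference, the effect of that implicit nested scope can be checked in a few lines. This is only a minimal sketch of the observable behaviour (the variable names are purely illustrative), not a description of how the compiler achieves it:

```python
# The comprehension's iteration variable lives in its own implicit scope,
# so the module-level ``i`` is left untouched.
i = "module-level i"
squares = [i * i for i in range(4)]

# A plain ``for`` statement, by contrast, leaves its target bound afterwards.
for j in range(4):
    pass

print(squares, repr(i), j)
```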

While I do assume it would technically be possible to go that way (it's
only software after all), it would require some *significant* changes to
the way at least CPython's compiler handles symbol analysis, and likely to
the way symtable analysis is handled in other compilers as well. By
contrast, if you drop the "must support closures" requirement, then the
required name mangling to allow nested statements to re-use the same
statement local names without interfering with each other or with matching
function local names gets *much* simpler (especially with a syntactic
marker to distinguish statement local references from normal variable
names), since all you need to track is how many levels deep you are in
statement nesting within the current compilation unit.
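
To make the "track the nesting depth" idea a little more concrete, here is a toy sketch. Everything in it is hypothetical: the `StatementLocalScopes` class, its method names and the `.<depth>.<name>` slot format are invented for illustration and don't correspond to anything in CPython or in the PEP. It only shows that a depth counter plus a per-statement set of bound names is enough to keep nested statement locals apart, and that the mangled slots can never collide with ordinary identifiers:

```python
class StatementLocalScopes:
    """Toy model: one set of statement-local names per open statement."""

    def __init__(self):
        self._stack = []  # index 0 is the outermost open statement

    def enter_statement(self):
        self._stack.append(set())

    def exit_statement(self):
        self._stack.pop()

    def bind(self, name):
        # Record the name in the innermost statement and return its slot.
        self._stack[-1].add(name)
        return self._slot(len(self._stack), name)

    def resolve(self, name):
        # Search from the innermost statement outwards.
        for depth in range(len(self._stack), 0, -1):
            if name in self._stack[depth - 1]:
                return self._slot(depth, name)
        # Nothing in scope: a real compiler could reject this up front.
        raise SyntaxError(f"unresolved statement local .{name}")

    @staticmethod
    def _slot(depth, name):
        # The leading "." means the slot is not a valid Python identifier,
        # so it cannot clash with a function local of the same name.
        return f".{depth}.{name}"


scopes = StatementLocalScopes()
scopes.enter_statement()
outer_slot = scopes.bind("x")
scopes.enter_statement()
inner_slot = scopes.bind("x")
seen_inside = scopes.resolve("x")
scopes.exit_statement()
seen_outside = scopes.resolve("x")
scopes.exit_statement()
```

Note that no closure cells or symbol table changes are involved in this model, which is the simplification being argued for above.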

The design intent related rationale stems from the fact that closed over
references can live for an arbitrarily long time (as can regular function
locals in a generator or coroutine), and I think statement locals should be
as reliably ephemeral as we can reasonably make them if they're going to be
different enough from function locals to be worth the hassle of adding them.
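
As a small illustration of how long such references can hang around today (a sketch only, with made-up function names), both a closure cell and a suspended generator's locals remain reachable well after the code that bound them has stopped running:

```python
def make_reader():
    payload = ["kept", "alive"]

    def read():
        return payload

    return read


reader = make_reader()
# make_reader() has returned, but its local is still reachable via the cell.
cell_value = reader.__closure__[0].cell_contents


def ticker():
    n = 0
    while True:
        n += 1
        yield n


gen = ticker()
first = next(gen)
second = next(gen)
# The generator is suspended, yet its local ``n`` is still stored on the frame.
suspended_locals = dict(gen.gi_frame.f_locals)
```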

So that means asking a bunch of questions and deliberately giving the
*opposite* answer for statement locals than I would for function locals:

* Visible in locals()? Yes for function locals, no for statement locals
* Visible in frame.f_locals? Yes for function locals, no for statement
locals
* Prevents access to the same name in outer scopes? Yes for function
locals, no for statement locals
* Can be closed over? Yes for function locals, no for statement locals
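
The "yes" answers for function locals can all be observed directly. The following is a minimal sketch with illustrative names; `inspect.currentframe()` is assumed to return a frame, which holds on CPython but is not guaranteed on every implementation:

```python
import inspect

name = "module level"


def function_local_answers():
    value = 42
    visible_in_locals = "value" in locals()
    # Assumes CPython-style frame support (currentframe() is not None).
    visible_in_f_locals = "value" in inspect.currentframe().f_locals

    def closure():
        # ``value`` is closed over, so it outlives this call.
        return value

    return visible_in_locals, visible_in_f_locals, closure


def shadows_outer_name():
    try:
        seen = name  # refers to the local ``name`` below, not the module one
    except UnboundLocalError:
        seen = None
    name = "function local"
    return seen


in_locals, in_f_locals, closure = function_local_answers()
blocked = shadows_outer_name() is None
```

Under the proposal sketched above, the equivalent checks for a statement local would each give the opposite result.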

Answering "no" to all those questions is only reasonable if statement local
references *look* different from regular variable names. But I also think
we need to propose answering "no" to all of them to make statement locals
potentially interesting enough to be worth considering.

(Debuggers would need to access these in order to display them, so they'd
have to be available on the frame object together with metadata on the code
object to correctly populate them at runtime, but that's a solvable problem)

While this would require non-trivial compiler changes as well, they'd be
much safer updates with a lower risk of unintended consequences, since they
wouldn't need to touch the existing name resolution code - they'd be their
own thing, operating in parallel with the established dynamic name lookup
support.

> > PEP 3150 ended up needing syntactic markers as well, to handle the
> > forward references to names set in the `given` clause while staying
> > within the LL(1) parsing design constraint imposed on Python's grammar.
>
> Is this basically about forward references then? Certainly the "a = (1
> as a)" examples are a problem because of forward references. And the
> problem here is that we can't use the normal solution of simply
> prohibiting forward references ("Name assigned before declaration"
> errors) because - well, I'm not quite sure why, actually. Surely if
> you're naming a subexpression that's repeated, you can always just
> choose to name the *first* occurrence and use that name for the rest?
> I guess the exception is statements that introduce or bind names
> (assignments, for, etc). I'd still be inclined to just say prohibit
> such cases (if we can't, then we're back into the territory of
> implementation difficulties driving the design).
>

Neither PEP 572 nor PEP 3150 *needs* the syntactic markers - the compiler can
figure out what is going on because it does the symbol table analysis pass
before the code generation pass, and hence can tag names appropriately
based on what it finds.
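
The result of that analysis pass can be inspected from Python via the standard library's `symtable` module. As a rough sketch (the `outer`/`inner` source below is just an example), the compiler has already classified `x` as a local in one scope and a free variable in the other before any bytecode is generated:

```python
import symtable

source = """
def outer():
    x = 1
    def inner():
        return x
    return inner
"""

# Build the symbol tables only; nothing in ``source`` is executed.
module_table = symtable.symtable(source, "<example>", "exec")
outer_table = module_table.get_children()[0]
inner_table = outer_table.get_children()[0]

x_local_in_outer = outer_table.lookup("x").is_local()
x_free_in_inner = inner_table.lookup("x").is_free()
```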

My concern is for people reading the code, where omitting a syntactic
marker also forces *humans* to read things in two passes to make sure they
understand them correctly. Consider this example:

    x = 10
    data = [x*x for i in range(10)]

This will give you a 10 item list, where each item is 100.

But now consider a world with statement locals, where statement locals use
the same reference syntax as regular locals:

    x = 10
    data = [x*x for i in range(10) if (12 as x)]

That's a deliberately pathological example (and you can already write
something similarly misleading using the "for x in [12]" trick), but the
fact remains that we allow out of order evaluation in expressions in a way
that we don't permit for statements.
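
For comparison, the "for x in [12]" version runs today. In this short sketch (the `plain`/`tricky` names are only for illustration) the trailing clause rebinds `x` inside the comprehension, so the leading `x*x` no longer refers to the `x = 10` on the line above:

```python
x = 10
plain = [x * x for i in range(10)]
tricky = [x * x for i in range(10) for x in [12]]

# The outer ``x`` itself is untouched by either comprehension.
print(plain[0], tricky[0], x)
```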

With a syntactic marker though, there's typically going to be less
ambiguity about where a name comes from:

    x = 10
    data = [.x*.x for i in range(10) if (12 as .x)]

It's not a panacea (since you may still be shadowing a name from an outer
statement), and adapting it for PEP 3150 isn't without its problems
(specifically, you need to allow ".x = 12" in the given clause to make the
names match up), but it should be less confusing than allowing a new
subexpression to interfere with name resolution semantics folks have been
relying on for years.

> This all still feels to me like an attempt to rescue the proposal from
> the issues that arise from not treating statement-local names exactly
> like any other name.
>

If statement locals behave just like function locals, then there's no
reason to add them to the language in the first place - "(expr as name)"
would just become a confusing second way to spell "name = expr".

It's only potentially worthwhile if statement locals offer us something
that function locals don't, and the clearest potential candidate for that
differentiation is to provide a greater level of assurance regarding
locality of use than regular name binding operations do.

> > Right, but that extra notation *does* convey useful information to a
> > reader that better enables local reasoning about a piece of code.
> > Currently, if you're looking at an unfamiliar function and see a name
> > you don't recognise, then you need to search the whole module for that
> > name to see whether or not it's defined anywhere. Even if it's missing,
> > you may still need to check for dynamic injection of module level names
> > via globals().
>
> Hang on, that's how all existing names in Python work (and in pretty
> much any language that doesn't require explicit declarations). Surely
> no-one is trying to suggest that this is a fundamental flaw?
>

It's good when that's what you want (which is most of the time once you've
made the decision to use Python in the first place).

It's not good when you're just wanting to name a subexpression to avoid
evaluating it twice, and want to minimise the risk of introducing
unintended side effects when doing so (hence the decision to hide
comprehension iteration variables in Py3).

> > Seeing ".name" would be different (both for the compiler and for the
> > human reader): if such a reference can't be resolved explicitly within
> > the scope of the current statement, then *it's a bug* (and the compiler
> > would be able to flag it as such at compile time).
>
> Sorry, but you could use exactly that argument to propose that
> function local variables should be prefixed with "$". I don't buy it.
>

Function locals have been part of the language from the beginning though,
rather than only being added after folks have already had years or decades
to develop their intuitions about how variable name resolution works in
Python.

I'll also note that we *do* offer declarations to override what the
compiler would infer by default based on assignment statements: global and
nonlocal.
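
As a quick reminder of what those declarations change (a minimal sketch; the function names are illustrative), an assignment on its own makes the compiler treat the name as a function local, and `global`/`nonlocal` override that inference:

```python
counter = 0


def bump_without_declaration():
    # The assignment makes ``counter`` a function local, so the read fails.
    try:
        counter += 1
    except UnboundLocalError:
        return "UnboundLocalError"
    return counter


def bump_global():
    global counter  # override the inference: use the module-level name
    counter += 1
    return counter


def make_accumulator():
    total = 0

    def add(amount):
        nonlocal total  # override the inference: rebind the enclosing local
        total += amount
        return total

    return add


undeclared_result = bump_without_declaration()
global_result = bump_global()
add = make_accumulator()
running_totals = [add(2), add(3)]
```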

> I guess I remain -1 on the proposal, and nothing that's getting said
> about how we can make it work is doing anything to persuade me
> otherwise (quite the opposite).
>

Yep, that's fair, as there are *lots* of viable alternatives to inline
naming of subexpressions (each with various trade-offs). The question at
hand is whether we can come up with semantics and syntax that folks
actually like well enough to put to python-dev for a yes/no decision, with
the pros and cons consolidated in one place.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia