On 2 March 2018 at 21:50, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 March 2018 at 11:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
> On 2 March 2018 at 19:05, Paul Moore <p.f.moore@gmail.com> wrote:
>>
>> The problem with statement local variables is that the extent over
>> which the name is in scope is not as clear to the human reader (the
>> rules the *compiler* follows may be precise, but they aren't obvious
>> to the human reader - that's the root of the debate I'm having with
>> Chris over "what the reference implementation does isn't a sufficient
>> spec"). In particular, assignment statements are non-obvious, as shown
>> by the examples that triggered your suggestion of a "." prefix.
>
> Those examples didn't trigger the suggestion: the suggestion was born from
> the fact that I don't think it should be possible to close over statement
> locals.

Ah, OK. If closing over statement locals isn't allowed, then yes, they
are a different type of name, and you may need to distinguish them. On
the other hand, I'm not sure I agree with you that it shouldn't be
possible to close over statement locals. I can see that there are a
lot of *difficulties* with allowing it, but that's not the same.
 
What's your logic for saying you shouldn't be able to close over a
statement local name? What is fundamentally different about them that
makes them unsuitable to work like all other names in Python?

I have two reasons, one based on design intent, one based on practicality of implementation.

Starting with the practical motivation first: based on my experience preventing the iteration variable from leaking in comprehensions, I don't think it's going to be practical to allow closing over statement locals while still scoping them to the statement that sets them in any meaningful way. The difficulty of implementing that correctly is why I ended up going for the implicit-nested-scope implementation for Python 3 comprehensions in the first place.
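
A minimal sketch of the observable Python 3 comprehension behaviour referred to here, not of the compiler internals. The names `squares`, `getters` and `results` are made up for illustration.

```python
i = "outer"

# The comprehension's iteration variable is isolated from the enclosing
# scope, so the module-level `i` is left untouched.
squares = [i * i for i in range(4)]

# Lambdas created inside the comprehension can still close over its `i`;
# they all share one binding and see its final value.
getters = [lambda: i for i in range(3)]
results = [g() for g in getters]
```

After this runs, `i` is still `"outer"`, while every lambda in `getters` returns the last value the comprehension's own `i` took.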

While I do assume it would technically be possible to go that way (it's only software after all), it would require some *significant* changes to the way at least CPython's compiler handles symbol analysis, and likely to the way symtable analysis is handled in other compilers as well. By contrast, if you drop the "must support closures" requirement, then the name mangling required to allow nested statements to re-use the same statement local names without interfering with each other or with matching function local names gets *much* simpler (especially with a syntactic marker to distinguish statement local references from normal variable names), since all you need to track is how many levels deep you are in statement nesting within the current compilation unit.

The design intent related rationale stems from the fact that closed over references can live for an arbitrarily long time (as can regular function locals in a generator or coroutine), and I think statement locals should be as reliably ephemeral as we can reasonably make them if they're going to be different enough from function locals to be worth the hassle of adding them.
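
To make the lifetime point concrete, here is a small sketch using only existing function locals. `Payload`, `make_reader` and `probe` are hypothetical names, and the weak reference is there purely to observe when the object is released.

```python
import gc
import weakref


class Payload:
    """Stand-in for some value bound to a local name."""


def make_reader():
    value = Payload()            # ordinary function local
    probe = weakref.ref(value)   # lets us observe when the object is freed

    def reader():
        return value             # closes over `value`

    return reader, probe


reader, probe = make_reader()
gc.collect()
# The frame of make_reader() is gone, but the closure keeps the object alive.
alive_after_return = probe() is not None

del reader
gc.collect()
# Only once the closure itself is dropped can the object be released.
alive_after_del = probe() is not None
```

The object outlives the call that created it for as long as `reader` is reachable, which is the kind of open-ended lifetime statement locals would be meant to rule out.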

So that means asking a bunch of questions and deliberately giving the *opposite* answer for statement locals than I would for function locals:

* Visible in locals()? Yes for function locals, no for statement locals
* Visible in frame.f_locals? Yes for function locals, no for statement locals
* Prevents access to the same name in outer scopes? Yes for function locals, no for statement locals
* Can be closed over? Yes for function locals, no for statement locals
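
The "yes" column for function locals can be checked in current Python. This is a sketch with made-up function names; the "no" column for statement locals is the proposal under discussion and cannot be demonstrated in existing Python.

```python
import inspect

name = "module level"


def visible_in_locals():
    value = 1
    return "value" in locals()


def visible_in_f_locals():
    value = 1
    return "value" in inspect.currentframe().f_locals


def shadows_outer_name():
    # Assigning to `name` anywhere in the function makes it local throughout,
    # so the module-level `name` is unreachable even before the assignment.
    try:
        seen = name
    except UnboundLocalError:
        shadowed = True
    else:
        shadowed = False
    name = "function local"
    return shadowed


def can_be_closed_over():
    value = 1
    return (lambda: value)()
```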

Answering "no" to all those questions is only reasonable if statement local references *look* different from regular variable names. But I also think we need to propose answering "no" to all of them to make statement locals potentially interesting enough to be worth considering.

(Debuggers would need to access these in order to display them, so they'd have to be available on the frame object together with metadata on the code object to correctly populate them at runtime, but that's a solvable problem)

While this would require non-trivial compiler changes as well, they'd be much safer updates with a lower risk of unintended consequences, since they wouldn't need to touch the existing name resolution code - they'd be their own thing, operating in parallel with the established dynamic name lookup support.

 
> PEP 3150 ended up needing syntactic markers as well, to handle the forward
> references to names set in the `given` clause while staying within the LL(1)
> parsing design constraint imposed on Python's grammar.

Is this basically about forward references then? Certainly the "a = (1
as a)" examples are a problem because of forward references. And the
problem here is that we can't use the normal solution of simply
prohibiting forward references ("Name assigned before declaration"
errors) because - well, I'm not quite sure why, actually. Surely if
you're naming a subexpression that's repeated, you can always just
choose to name the *first* occurrence and use that name for the rest?
I guess the exception is statements that introduce or bind names
(assignments, for, etc). I'd still be inclined to just say prohibit
such cases (if we can't, then we're back into the territory of
implementation difficulties driving the design).

Neither PEP 572 nor 3150 *needs* the syntactic markers - the compiler can figure out what is going on because it does the symbol table analysis pass before the code generation pass, and hence can tag names appropriately based on what it finds.

My concern is for people reading the code, where omitting a syntactic marker also forces *humans* to read things in two passes to make sure they understand them correctly. Consider this example:

    x = 10
    data = [x*x for i in range(10)]

This will give you a 10 item list, where each item is 100.

But now consider a world with statement locals, where statement locals use the same reference syntax as regular locals:

    x = 10
    data = [x*x for i in range(10) if (12 as x)]

That's a deliberately pathological example (and you can already write something similarly misleading using the "for x in [12]" trick), but the fact remains that we allow out of order evaluation in expressions in a way that we don't permit for statements.
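
The "for x in [12]" trick works in Python today, so the same kind of surprise can be shown without any new syntax. A minimal sketch, where `plain` and `tricky` are names chosen for illustration:

```python
x = 10

# Reads the module-level `x`.
plain = [x * x for i in range(10)]

# The trailing `for x in [12]` clause rebinds `x` inside the comprehension,
# even though it appears *after* the `x * x` that uses it.
tricky = [x * x for i in range(10) for x in [12]]
```

Here `plain` holds ten copies of 100 and `tricky` holds ten copies of 144, while the outer `x` is still 10 afterwards.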

With a syntactic marker though, there's typically going to be less ambiguity about where a name comes from:

    x = 10
    data = [.x*.x for i in range(10) if (12 as .x)]

It's not a panacea (since you may still be shadowing a name from an outer statement), and adapting it for PEP 3150 isn't without its problems (specifically, you need to allow ".x = 12" in the given clause to make the names match up), but it should be less confusing than allowing a new subexpression to interfere with name resolution semantics folks have been relying on for years.

This all still feels to me like an attempt to rescue the proposal from
the issues that arise from not treating statement-local names exactly
like any other name.

If statement locals behave just like function locals, then there's no reason to add them to the language in the first place - "(expr as name)" would just become a confusing second way to spell "name = expr".

It's only potentially worthwhile if statement locals offer us something that function locals don't, and the clearest potential candidate for that differentiation is to provide a greater level of assurance regarding locality of use than regular name binding operations do.

> Right, but that extra notation *does* convey useful information to a reader
> that better enables local reasoning about a piece of code. Currently, if
> you're looking at an unfamiliar function and see a name you don't recognise,
> then you need to search the whole module for that name to see whether or not
> it's defined anywhere. Even if it's missing, you may still need to check for
> dynamic injection of module level names via globals().

Hang on, that's how all existing names in Python work (and in pretty
much any language that doesn't require explicit declarations). Surely
no-one is trying to suggest that this is a fundamental flaw?

It's good when that's what you want (which is most of the time once you've made the decision to use Python in the first place).

It's not good when you're just wanting to name a subexpression to avoid evaluating it twice, and want to minimise the risk of introducing unintended side effects when doing so (hence the decision to hide comprehension iteration variables in Py3).
 
> Seeing ".name" would be different (both for the compiler and for the human
> reader): if such a reference can't be resolved explicitly within the scope
> of the current statement, then *it's a bug* (and the compiler would be able
> to flag it as such at compile time).

Sorry, but you could use exactly that argument to propose that
function local variables should be prefixed with "$". I don't buy it.

Function locals have been part of the language from the beginning though, rather than only being added after folks have already had years or decades to develop their intuitions about how variable name resolution works in Python.

I'll also note that we *do* offer declarations to override what the compiler would infer by default based on assignment statements: global and nonlocal.
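
For reference, a short sketch of those two declarations overriding the default inference. The names `counter`, `bump_global` and `make_accumulator` are made up for illustration.

```python
counter = 0


def bump_global():
    global counter       # without this, `counter += 1` raises UnboundLocalError
    counter += 1


def make_accumulator():
    total = 0

    def add(amount):
        nonlocal total   # rebinding targets the enclosing function's `total`
        total += amount
        return total

    return add


bump_global()
add = make_accumulator()
add(5)
running_total = add(7)
```

In both cases the declaration is an explicit marker telling the compiler (and the reader) that an assignment does not create a new function local.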
 
I guess I remain -1 on the proposal, and nothing that's getting said
about how we can make it work is doing anything to persuade me
otherwise (quite the opposite).

Yep, that's fair, as there are *lots* of viable alternatives to inline naming of subexpressions (each with various trade-offs). The question at hand is whether we can come up with semantics and syntax that folks actually like well enough to put to python-dev for a yes/no decision, with the pros and cons consolidated in one place.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia