[Python-Dev] The new and improved PEP 572, same great taste with 75% less complexity!

Wed Apr 25 04:23:00 EDT 2018

On Wed, Apr 25, 2018 at 4:55 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Apr 24, 2018 at 8:31 AM, Chris Angelico <rosuav at gmail.com> wrote:
>> The most notable change since last posting is that the assignment
>> target is no longer as flexible as with the statement form of
>> assignment, but is restricted to a simple name.
>>
>> Note that the reference implementation has not been updated.
>
> I haven't read most of the discussion around this, so my apologies if
> I say anything that's redundant. But since it seems like this might be
> starting to converge, I just read through it for the first time, and
> have a few comments.
>
> First, though, let me say that this is a really nice document, and I
> appreciate the incredible amount of work it takes to write something
> like this and manage the discussions! Regardless of the final outcome
> it's definitely a valuable contribution.

Thank you, but I'm hoping to do more than just rejected PEPs. (I do
have one co-authorship to my name, but that was something Guido
started, so it kinda had a bit of an advantage there.) My reputation
as champion of death march PEPs is not something I want to continue.
:|

> Concretely, I find it unnerving that two of these work, and one doesn't:
>
> # assignment on the right of usage
> results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
>
> # assignment on the left of usage
> stuff = [[y := f(x), x/y] for x in range(5)]
>
> # assignment on the right of usage
> stuff = [[x/y, y := f(x)] for x in range(5)]

Fair point. However, this isn't because of assignment operators, but
because of comprehensions. There are two important constructs in
Python that have out-of-order evaluation:

x if cond else y # evaluates cond before x
[result for x in iterable if cond] # evaluates result last
target()[0] = value # evaluates target after value

... Amongst our weaponry ... err, I'll come in again.

Ahem. Most of Python is evaluated left-to-right, top-to-bottom, just
as anyone familiar with a Latin-derived language would expect. There
are exceptions, however, and those are generally on the basis that
"practicality beats purity". Those exceptions include assignment, the
if/else expression, and comprehensions/genexps. But even within those
constructs, evaluation is left-to-right as much as it possibly can be.
A list comprehension places the target expression first (exception to
the LTR rule), but evaluates all its 'for' and 'if' clauses in order.
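
For anyone who wants to check those three constructs by hand, here is a
minimal sketch that records the order in which the pieces run. The note()
helper and the trace list are made up for illustration; nothing here
depends on the proposed syntax.

```python
trace = []

def note(label, value=None):
    """Record that `label` was evaluated, then hand back `value`."""
    trace.append(label)
    return value

# x if cond else y: cond is evaluated before x
trace.clear()
_ = note("x") if note("cond", True) else note("y")
conditional_order = list(trace)

# [result for x in iterable if cond]: result is evaluated last
trace.clear()
_ = [note("result") for x in note("iterable", [1]) if note("cond", True)]
comprehension_order = list(trace)

# target()[0] = value: value is evaluated before target
trace.clear()
note("target", [0])[0] = note("value")
assignment_order = list(trace)

print(conditional_order)
print(comprehension_order)
print(assignment_order)
```

In each case the recorded order differs from the order in which the
pieces appear in the source text.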

We *already* have some strange cases as a result of this out-of-order
evaluation. For instance:

reciprocals = [1/x for x in values if x]

The 'if x' on the right means we won't get a ZeroDivisionError from
the expression on the left. Were this loop to be unrolled, it would
look like this:

def listcomp():
    result = []
    for x in values:
        if x:
            result.append(1/x)
    return result
reciprocals = listcomp()

And you can easily audit the longhand form to confirm that, yes, "if
x" comes before "1/x". It's the same with assignment expressions; the
only exception to the "left before right" rule is the primary
expression being evaluated after all for/if clauses. Unrolling your
three examples gives (eliding the function wrappers for simplicity):

# assignment on the right of usage
for x in input_data:
    if (y := f(x)) > 0:
        results.append((x, y, x/y))

# assignment on the left of usage
for x in range(5):
    stuff.append([y := f(x), x/y])

# assignment on the right of usage
for x in range(5):
    stuff.append([x/y, y := f(x)])
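
A rough sketch of the second and third cases, assuming an interpreter
that implements the proposed := operator. The f() below is a
hypothetical stand-in chosen so that it never returns zero; the wrapper
functions exist only to keep y from already being defined somewhere.

```python
def f(x):
    # Hypothetical stand-in for the f() in the examples; never returns 0.
    return x + 1

def assignment_left_of_usage():
    # y is bound before x/y is evaluated, so this works.
    return [[y := f(x), x / y] for x in range(3)]

def assignment_right_of_usage():
    # x/y is evaluated before y has ever been bound, so the first
    # iteration fails with a NameError (or a subclass of it).
    try:
        return [[x / y, y := f(x)] for x in range(3)]
    except NameError:
        return "failed"

print(assignment_left_of_usage())
print(assignment_right_of_usage())
```

The failure in the last case is the same one the unrolled loop would
produce: y is read on the first pass before anything has assigned to it.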

Were list comprehensions *wrong* to put the expression first? I'm not
sure, but I can't see a better way to write them; at best, you'd end
up with something like:

[for x in numbers if x % 2: x * x]

which introduces its own sources of confusion (though I have to say,
it would look pretty clean in the "one for loop, no conditions" case).
But whether they're right or wrong, they're what we have, and side
effects are already legal, and assignment expressions are just another
form of side effect.

> I guess this isn't limited to comprehensions either – I rarely see
> complex expressions with side-effects embedded in the middle, so I'm
> actually a bit hazy about the exact order of operations inside Python
> expressions. I could probably figure it out if necessary, but even in
> simple cases like f(g(), h()), then do you really know for certain off
> the top of your head whether g() or h() runs first? Does the average
> user? With code like f(a := g(), h(a)) this suddenly matters a lot!
> But comprehensions suffer from a particularly extreme version of this,
> so it worries me that they're being put forward as the first set of
> motivating examples.

Honestly, I would fully expect that g() is run first, but I know there
are more complicated cases. For instance, here are three ways to print
"1" followed by "2", and create a dictionary mapping None to None:

>>> x={}
>>> x[print(2)] = print(1)
1
2
>>> x={print(1): print(2)}
1
2
>>> x={print(2): print(1) for _ in [1]}
1
2

Hmmmmmmmmm. One of these is not like the others...
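
The same session can be written as a script that records the order
instead of printing it. This is only a sketch: mark() and f() are
made-up helpers, and the function-call case is included to cover the
f(g(), h()) question.

```python
order = []

def mark(label):
    """Record `label` and return None, like print() does."""
    order.append(label)
    return None

def f(*args):
    # Hypothetical function; only the evaluation of its arguments matters.
    return args

# f(g(), h()): arguments are evaluated left to right.
order.clear()
f(mark("g"), mark("h"))
call_order = list(order)

# Dict display: key first, then value.
order.clear()
d = {mark("key"): mark("value")}
display_order = list(order)

# Subscript assignment: the value on the right is evaluated first.
order.clear()
x = {}
x[mark("key")] = mark("value")
subscript_order = list(order)

# Dict comprehension: recorded but deliberately not asserted here,
# since this is the odd one out and the order may differ between
# interpreter versions.
order.clear()
c = {mark("key"): mark("value") for _ in [1]}
dictcomp_order = list(order)

print(call_order, display_order, subscript_order, dictcomp_order)
```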

>> Capturing condition values
>> --------------------------
>
> Note to Chris: your examples in this section have gotten their order
> scrambled; you'll want to fix that :-). And I'm going to reorder them
> yet again in my reply...
>
>>     # Reading socket data until an empty string is returned
>>     while data := sock.read():
>>         print("Received data:", data)
>
> I don't find this example very convincing. If it were written:
>
>     for data in iter(sock.read, b""):
>         ...
>
> then that would make it clearer what's happening ("oh right, sock.read
> uses b"" as a sentinel to indicate EOF").

That's assuming that you want an equality comparison, which is true
for this particular example, but not in general. Recommending
that people use iter() and a 'for' loop has a couple of consequences:

1) It uses a syntax that's distinctly unobvious. You're achieving the
end result, but what exactly does it mean to iterate over sock.read?
It doesn't read very cleanly.

2) You encourage the use of == as the one and only comparison. If you
have an API that returns None when it's done, and you use "iter(func,
None)", you're actually checking if the yielded value == None, not if
it is None.

Suppose you want to change the check so that there are two termination
conditions. Do you stick with iter() and then add the other check
inside the loop? Do you rewrite the code back into the four-line loop
header with the infinite loop, so you can use "in {x, y}" as your
condition? iter() can solve a few problems, but not many of them, and
not always correctly.
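
To make point 2 concrete, here is a small sketch. Everything in it is
hypothetical: make_source() stands in for an API that returns None when
it's done, and Permissive is a deliberately badly behaved payload whose
__eq__ claims equality with anything.

```python
class Permissive:
    """Hypothetical payload with an overly generous __eq__."""
    def __eq__(self, other):
        return True

def make_source(items):
    """Return a callable that yields items one by one, then None forever."""
    pending = iter(items)
    return lambda: next(pending, None)

payload = Permissive()

# iter(func, None) stops as soon as a value compares equal to None,
# so the Permissive payload ends the loop early.
via_iter = list(iter(make_source([1, payload, 2]), None))

def read_until_none(source):
    """The longhand loop, which can use an identity check instead."""
    collected = []
    while True:
        item = source()
        if item is None:
            break
        collected.append(item)
    return collected

via_identity = read_until_none(make_source([1, payload, 2]))

print(len(via_iter), len(via_identity))
```

The iter() version sees one item; the longhand version sees all three.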

> And the fact that this is
> needed at all is only because sockets are a low-level API with lots of
> complexity inherited from BSD sockets. If this were a normal python
> API, it'd just be
>
>     for data in sock:
>         ...

<chomp details of why it can't be that sort of API>

The complexity they inherit is fundamental to the nature of sockets,
so that part isn't going to change. I've simplified it down to a
coherent example, in the hopes that it'd make sense that way. Maybe I
need a different example; people keep getting hung up on the details
of sockets instead of the proposed syntax.

>>     # Proposed syntax
>>     while (command := input("> ")) != "quit":
>>         print("You entered:", command)
>>
>>     # Equivalent in current Python, not caring about function return value
>>     while input("> ") != "quit":
>>         print("You entered a command.")
>>
>>     # To capture the return value in current Python demands a four-line
>>     # loop header.
>>     while True:
>>         command = input("> ")
>>         if command == "quit":
>>             break
>>         print("You entered:", command)
>>
>> Particularly with the ``while`` loop, this can remove the need to have an
>> infinite loop, an assignment, and a condition. It also creates a smooth
>> parallel between a loop which simply uses a function call as its condition,
>> and one which uses that as its condition but also uses the actual value.
>
> I dare you to describe that first version in English :-). I would say:
> "it reads the next line of input and stores it in 'command'; then
> checks if it was 'quit', and if so it exits the loop; otherwise, it
> prints the command".

While the command (from the input function) is not equal to "quit",
print out "You entered:" and the command.
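
For comparison, a sketch of the two spellings side by side, assuming an
interpreter that implements the proposed syntax. canned() is a made-up
replacement for input() so the loops can run without a terminal, and
the loops collect the commands rather than printing them.

```python
def canned(lines):
    """Hypothetical stand-in for input(): returns the next canned line."""
    pending = iter(lines)
    return lambda: next(pending)

def run_proposed(read):
    # Proposed syntax: the condition both captures and tests the value.
    seen = []
    while (command := read()) != "quit":
        seen.append(command)
    return seen

def run_longhand(read):
    # Current Python: infinite loop, assignment, condition, break.
    seen = []
    while True:
        command = read()
        if command == "quit":
            break
        seen.append(command)
    return seen

script = ["help", "status", "quit", "never reached"]
print(run_proposed(canned(script)))
print(run_longhand(canned(script)))
```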

> (Plus in a real version of this you'd have some command line parsing
> to do – at least stripping off whitespace from the command, probably
> tokenizing it somehow –  before you could check what the command was,
> and then you're back to the final version anyway.)

Not if this is a wire protocol. But again, people keep getting hung up
on the specifics of the examples, saying "oh but if you change what
you're doing, it won't fit the := operator any more". Well, yeah, of
course that's true.

>>     # Capturing regular expression match objects
>>     # See, for instance, Lib/pydoc.py, which uses a multiline spelling
>>     # of this effect
>>     if match := re.search(pat, text):
>>         print("Found:", match.group(0))
>
> Now this is a genuinely compelling example! re match objects are
> always awkward to work with. But this feels like a very big hammer to
> make re.match easier to use :-). I wonder if there's anything more
> focused we could do here?

Rejected proposals: Dedicated syntax for regular expression matching?

>> Special-casing conditional statements
>> -------------------------------------
>>
>> One of the most popular use-cases is ``if`` and ``while`` statements.  Instead
>> of a more general solution, this proposal enhances the syntax of these two
>> statements to add a means of capturing the compared value::
>>
>>     if re.search(pat, text) as match:
>>         print("Found:", match.group(0))
>>
>> This works beautifully if and ONLY if the desired condition is based on the
>> truthiness of the captured value.  It is thus effective for specific
>> use-cases (regex matches, socket reads that return `''` when done), and
>> completely useless in more complicated cases (eg where the condition is
>> ``f(x) < 0`` and you want to capture the value of ``f(x)``).  It also has
>> no benefit to list comprehensions.
>>
>> Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction
>> of possible use-cases, even in ``if``/``while`` statements.
>
> It does only cover a fraction of possible use-cases, but
> interestingly, the fraction it covers includes:
>
> - two of the three real examples given in the rationale section
> - exactly the cases that *don't* force you to twist your brain in
> pretzels thinking about sequential side-effecting control flow in the
> middle of expressions.

It *only* is able to cover the regex case because the regex API has
been deliberately crafted to make this possible. If you need any sort
of comparison other than "is the captured object truthy?", this syntax
is insufficient.
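
A sketch of the difference, again assuming an interpreter with the
proposed := operator. The function names and the sample f are invented
for illustration. The first function tests only truthiness, which is
the case an "if EXPR as NAME" form could also express; the second
compares the captured value against something, which it could not.

```python
import re

def first_word(text):
    # Truthiness is the whole condition: a match object or None.
    if match := re.search(r"\w+", text):
        return match.group(0)
    return None

def describe(x, f):
    # The condition is a comparison, but the captured value is still needed.
    if (value := f(x)) < 0:
        return "negative: " + str(value)
    return "non-negative"

print(first_word("  hello world"))
print(describe(3, lambda n: n - 10))
```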

> However, I do think it'd be kinda confusing if we had:
>
> if EXPR as X:
> while EXPR as X:
> with EXPR as X:
>
> and the first two assign the value of EXPR to X, while the last one
> does something more subtle. Or maybe it'd be fine?

Nope, it definitely would not be fine. This is covered by your opening
acknowledgement that you haven't read all the previous posts. :)

ChrisA