[Python-Dev] Informal educator feedback on PEP 572 (was Re: 2018 Python Language Summit coverage, last part)

Sun Jul 1 02:11:41 EDT 2018

On 1 July 2018 at 14:32, Tim Peters <tim.peters at gmail.com> wrote:
> [Nick]
>
>> The PEP specifically cites this example as motivation:
>
> The PEP gives many examples.  Your original was a strawman
> mischaracterization of the PEP's _motivations_ (note the plural:  you only
> mentioned "minor performance improvement", and snipped my listing of the
> major motivations).

I listed two motivations, not one:

1. Minor performance improvements (the "avoiding repeated
subexpressions without using multiple statements" rationale)
2. Making certain coding patterns easier to spot (the loop-and-a-half
and if-elif chaining cases)

Technically, avoiding repeated subexpressions without requiring a
separate line also falls into the second category.
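
To make the "loop-and-a-half" case concrete, here is a minimal sketch. The function names and the use of `io.StringIO` as a stand-in stream are assumptions for illustration, not taken from the PEP or this thread, and the second form needs an interpreter that implements PEP 572 (Python 3.8+):

```python
import io


def read_chunks_loop_and_a_half(stream, size):
    """Classic loop-and-a-half: the exit test sits in the middle of the body."""
    chunks = []
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


def read_chunks_assignment_expr(stream, size):
    """Same loop with the binding moved into the loop condition."""
    chunks = []
    while chunk := stream.read(size):
        chunks.append(chunk)
    return chunks


text = "abcdefghij"
print(read_chunks_loop_and_a_half(io.StringIO(text), 4))
print(read_chunks_assignment_expr(io.StringIO(text), 4))
```

Both functions return the same list of chunks; the difference is only in where the reader has to look to find the loop's exit condition.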

The subsequent interaction with comprehensions and generator
expressions is an interesting side effect of extending the basic idea
to a fully coherent and self-consistent proposal, not one of the
original motivations for it.

>>   group = re.match(data).group(1) if re.match(data) else None
>
>> That code's already perfectly straightforward to read and write as a
>> single line,
>
> I disagree.  In any case of textual repetition, it's a visual
> pattern-matching puzzle to identify the common substrings (I have to
> visually scan that line about 3 times to be sure), and then a potentially
> difficult conceptual puzzle to figure out whether side effects may result in
> textually identical substrings evaluating to different objects.  That's why
> "referential transparency" is so highly valued in functional languages
> ("if subexpressions are spelled the same, they evaluate to the same result,
> period" - which isn't generally true in Python - to get that enormously
> helpful (to reasoning) guarantee in Python you have to ensure the
> subexpression is evaluated exactly once).
>
> And as you of all people should be complaining about, textual repetition is
> also prone to "oops - forgot one!" and "oops! made a typo when changing the
> second one!" when code is later modified.

That's a reasonable readability-based argument, but it's not what the
PEP currently gives as a motivation for this aspect of the proposal.
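
The side-effect concern Tim raises can be illustrated with a small sketch. The stateful `next_id` helper is hypothetical and exists only to show that two identically spelled subexpressions need not evaluate to the same object in Python:

```python
import itertools

_counter = itertools.count(1)


def next_id():
    """Hypothetical stateful helper: every call returns a new value."""
    return next(_counter)


# Two textually identical calls evaluate to different objects.
repeated = (next_id(), next_id())

# Binding the result once guarantees both uses see the same object.
value = next_id()
bound_once = (value, value)

print(repeated)    # (1, 2)
print(bound_once)  # (3, 3)
```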

>> so the only reason to quibble about it
>
> I gave you three better reasons to quibble about it just above ;-)

Then add them to the PEP, as what's currently there really isn't
offering a compelling motivation for this aspect of the proposal :)

>> is because it's slower than the arguably less clear two-line alternative:
>
>>   _m = re.match(data)
>>   group = _m.group(1) if _m else None
>
> I find that much clearer than the one-liner above:  the visual pattern
> matching is easier because the repeated substring is shorter and of much
> simpler syntactic structure; it guarantees _by construction_ that the two
> instances of `_m` evaluate to the same object, so there's no possible
> concern about that (it doesn't even matter if you bound `re` to some
> "non-standard" object that has nothing to do with Python's `re` module); and
> any later changes to the single instance of `re.match(data)` don't have to
> be repeated verbatim elsewhere.  It's possible that it runs twice as fast
> too, but that's the least of my concerns.

I agree with this, but also think the two-line form is a perfectly
acceptable way of spelling it, and a perfectly acceptable refactoring
of the one-line form with duplicated subexpressions to improve
maintainability.

> All of those advantages are retained in the one-liner too if an assignment
> expression can be used in it.

Sure, but the open design question is whether folks who would have
written the one-liner with repeated subexpressions are any more likely,
without prompting by a more experienced developer, to use an assignment
expression to avoid the repetition than they are to use a separate
preceding assignment statement.

That assessment of "What is the increased chance that the repeated
subexpression will be avoided when the code is first written?" then
gets traded off against the overall increase in language complexity
arising from allowing name bindings in arbitrary subexpressions.
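
For reference, the three spellings being weighed against each other can be sketched side by side. The compiled `PATTERN` and the function names are assumptions added so the snippet runs on its own (the example quoted from the PEP elides the pattern argument), and the third form requires Python 3.8+:

```python
import re

# Hypothetical pattern, chosen only so the example is runnable.
PATTERN = re.compile(r"(\d+)")


def group_repeated(data):
    # One-liner with the subexpression spelled out twice.
    return PATTERN.match(data).group(1) if PATTERN.match(data) else None


def group_two_statements(data):
    # Separate preceding assignment statement.
    _m = PATTERN.match(data)
    return _m.group(1) if _m else None


def group_assignment_expr(data):
    # The condition is evaluated first, so `_m` is bound before
    # `_m.group(1)` runs.
    return _m.group(1) if (_m := PATTERN.match(data)) else None


for func in (group_repeated, group_two_statements, group_assignment_expr):
    print(func.__name__, func("42 apples"), func("apples"))
```

All three return the same results for these inputs; what differs is how many times the match is spelled and where the name binding lives.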

Don't get me wrong, I now agree that the proposal in PEP 572 is the
most coherent and self-consistent approach to assignment expressions
that we could pursue given the existing scoping semantics of
comprehensions and generator expressions.

The remaining point of contention is only the "Inevitable cost of
change" one: given the level of disruption this will cause in the way
that Python gets taught to new users, is it giving a commensurate
pay-off in increased semantic expressiveness?

My answer to that question remains "No", while your answer is either
"Yes" or "I don't see why that should matter" (I'm genuinely unsure
which).

>>> sometimes it allows a more compact way of reusing an expensive
>>> subexpression by giving it a name.   Which they already do by giving
>>> it a name in a separate statement, so the possible improvement would
>>> be in brevity rather than performance.
>
> You already realized the performance gain could be achieved by using two
> statements.  The _additional_ performance gain by using assignment
> expressions is at best trivial (it may save a LOAD_FAST opcode to fetch the
> object bound to `_m` for the `if` test).
>
> So, no, gaining performance is _not_ the motivation here.  You already had a
> way to make it "run fast".  The motivation is the _brevity_ assignment
> expressions allow while _retaining_ all of the two-statement form's
> advantages in easier readability, easier reasoning, reduced redundancy, and
> performance.

I never said the motivation was to gain performance relative to the
two-statement version - I said the motivation given in the PEP is to
gain performance relative to the *repeated subexpression* version,
*without* making the transition to the already supported two-statement
version.
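
A rough way to see the comparison being made here is to count evaluations rather than time them. In this sketch `counted_match`, `count_calls`, and the `\w+` pattern are hypothetical helpers, not anything from the PEP, and the assignment expression needs Python 3.8+:

```python
import re

calls = 0


def counted_match(data):
    """Hypothetical wrapper around re.match that counts its invocations."""
    global calls
    calls += 1
    return re.match(r"(\w+)", data)


def group_repeated(data):
    return counted_match(data).group(1) if counted_match(data) else None


def group_assignment_expr(data):
    return _m.group(1) if (_m := counted_match(data)) else None


def count_calls(func, data):
    """Reset the counter, run func once, and report how many matches ran."""
    global calls
    calls = 0
    func(data)
    return calls


print(count_calls(group_repeated, "spam eggs"))         # 2
print(count_calls(group_assignment_expr, "spam eggs"))  # 1
```

The saving is the second evaluation of the subexpression, which is the same saving the two-statement version already provides.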

> As Guido said, in the PEP, of the example you gave here:
>
> Guido found several examples where a programmer repeated
> a subexpression, slowing down the program, in order to save
> one line of code
>
>
> It couldn't possibly be clearer that Guido thought the programmer's
> motivation was brevity ("in order to save one line of code").  Guido only
> happened to mention that they were willing to slow down the code to get that
> brevity, but, as above, they were also willing to make the code harder to
> read, reason about, and maintain.  With the assignment expression, they
> don't have to give up any of the latter to get the brevity they mistakenly
> _think_ ;-) they care most about - and, indeed, they can make it even
> briefer.

The quoted paragraph from the PEP clearly states that the repeated
subexpression is considered a problem because it slows down the
program, not because it repeats code.

As noted above, the PEP could certainly be updated to point out that
repeating subexpressions is problematic for more reasons than just
speed, but that isn't what it currently says.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia