[Python-ideas] Temporary variables in comprehensions

Robert Vanden Eynde robertve92 at gmail.com
Thu Feb 15 23:03:23 EST 2018


Hello, talking about this syntax :

[y+2 for x in range(5) let y = x+1]

*Previous talks*
I had almost exactly the same idea in June 2017, see the subject "variable
assignment in functional context" here:
https://mail.python.org/pipermail/python-ideas/2017-June/subject.html

Currently this can easily be done by iterating over a list of size 1:

[y+2 for x in range(5) for y in [x+1]]

(Other ways exist, see section "current possibilities" below).

This comparison answers a lot of questions, such as "can we write [x+2 for
x in range(5) for x in [x+1]], and therefore [x+2 for x in range(5) let x =
x+1]?": yes, the new variable would indeed shadow the old one.
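
To check the shadowing claim in current Python, here is a small sketch using
the "list of size 1" idiom (the names "shadowed" and "explicit" are only
illustrative, they are not part of any proposal):

```python
# Rebinding x inside the comprehension: the inner "for x in [x+1]" reads the
# outer x to build the one-element list, then shadows it for the result.
shadowed = [x + 2 for x in range(5) for x in [x + 1]]

# Same computation with a distinct temporary name.
explicit = [y + 2 for x in range(5) for y in [x + 1]]

print(shadowed)  # [3, 4, 5, 6, 7]
print(explicit)  # [3, 4, 5, 6, 7]
```

The outer loop is not disturbed, because "for x in range(5)" takes its next
value from the range iterator and not from the rebound name.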

In June 2017 I introduced the "let" syntax using the existing "for"
keyword, with "=" instead of "in", like this:

[y+2 for x in range(5) for y = x+1]

The only difference I introduced was that it would be logical to accept the
syntax in any expression:

x = 5
print(y+2 for y = x + 1)

or with the "let" keyword:

x = 5
print(y+2 let y = x + 1)

*Previous talk: PEP*
One conclusion of the June conversation was that someone had written a PEP
(https://www.python.org/dev/peps/pep-3150/) in 2010 that is still pending.
It is not exactly the same syntax but the same idea: the author used a
"given:" syntax that looked like this:

print(y+2 given:
            y=x+1)

*Previous talk: GitHub implementation on CPython*
Another conclusion was that someone on GitHub implemented a "where"
statement in CPython (url:
https://github.com/thektulu/cpython/commit/9e669d63d292a639eb6ba2ecea3ed2c0c23f2636)
where one could write:

print(y+2 where y = x+1)

So in a list comprehension:

[y+2 where y = x + 1 for x in range(5)]

As the author thektulu said "just compile and have fun".

*New syntax in list comprehension*
However, we probably would like to be able to use this "where" after the
for:

[y+2 for x in range(5) where y = x+1]

This would allow the new variable to be used in further "for" and "if"
clauses:

[y+z for x in range(5) where y = x+1 for z in range(y+1)]
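
For comparison, a sketch of how the same thing can be spelled today, with the
one-element-list idiom standing in for the proposed "where" (the names
"pairs" and "expected" are made up for the example):

```python
# Today's spelling of:
#   [y+z for x in range(5) where y = x+1 for z in range(y+1)]
pairs = [y + z for x in range(5) for y in [x + 1] for z in range(y + 1)]

# The equivalent nested loops, with y assigned once per value of x.
expected = []
for x in range(5):
    y = x + 1
    for z in range(y + 1):
        expected.append(y + z)

print(pairs == expected)  # True
print(pairs[:5])          # [1, 2, 2, 3, 4]
```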

*Choice of syntax*
Initially I thought re-using the "for" keyword would be a good idea for
backward compatibility (a variable named "where" in old code wouldn't be a
problem). However, some people pointed out that the Python grammar would no
longer be LL(1): when reading the "for" token, the parser couldn't decide
directly whether the rest would be a "for in" or a "for =". So introducing
a dedicated keyword is probably better, like this:

[y+2 for x in range(5) where y = x+1]
print(y+2 where y = x+1)

The "where" keyword is very readable in my opinion, very close to the
English sentences we use. Sometimes we introduce a new variable before
using it, generally with "let", and sometimes after using it, with "where".
For example "let y = x+1; print(y+2)" or "print(y+2 where y = x+1)".

The first form is the one used by the "let in" syntax in Haskell:

print(let y = x+2 in y+2)

Or chained:
print(let x = 2 in let y = x+1 in y+2)

But a Haskell user would probably break the lines for clarity:

print(let x = 2 in
         let y = x+1 in
         y+2)

A postfix notation using "where" would probably be less verbose in my
opinion. Python already has another "postfix notation" in "a if condition
else b", so adding a new one wouldn't be surprising. Furthermore, the
postfix notation is preferred in the context of "presenting the result
first, then the implementation" (a context already discussed in the 2010
PEP). "Presenting the result first" is also a goal of the list
comprehension: one writes [x+3 for x in range(5)] and not [for x in
range(5): x+3]. The latter would be more in the "imperative programming"
style, and would be translated to a normal loop.
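
As a small illustration of the two styles (the variable names are only for
the example):

```python
# "Result first": the expression x + 3 is read before the loop clause.
declarative = [x + 3 for x in range(5)]

# "Imperative" style: the normal loop the comprehension corresponds to.
imperative = []
for x in range(5):
    imperative.append(x + 3)

print(declarative)  # [3, 4, 5, 6, 7]
print(imperative)   # [3, 4, 5, 6, 7]
```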

The remaining problem is chaining without parentheses: how do we remove the
parentheses in the following statement?

print((y+2 where y = x+1) where x = 2)

We have two options :

print(y+2 where x = 2 where y = x+1)
print(y+2 where y = x+1 where x = 2)

The first option would be probably closer to the way multiple "for" are
linked in a list comprehension:

[y+2 for x in range(5) for y in [x+1]]

But the second option would be more "present the result first" and closer
to the parenthesized version: the user would create new variables as they
go ("I want to compute y+2, but hey, what is y? It's x+1! But what is x?
It's 2!"). However, keeping the same order as multiple "for" clauses in a
list comprehension is better, so I'd choose the first option.
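
Both readings can be approximated in current Python. This is only a sketch of
the binding order, not of the proposed semantics; "nested" and "chained" are
names invented for the example:

```python
# Parenthesized version, (y+2 where y = x+1) where x = 2, spelled with
# nested lambdas: the outer call binds x, the inner call binds y.
nested = (lambda x: (lambda y: y + 2)(x + 1))(2)

# First option, y+2 where x = 2 where y = x+1, spelled like chained "for"
# clauses: x is bound first, then y can refer to it.
chained = next(y + 2 for x in [2] for y in [x + 1])

print(nested)   # 5
print(chained)  # 5
```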

In thektulu's implementation on GitHub the parentheses are mandatory.

Another syntax issue would probably surprise some users. The following
statement, parenthesized:

[(y+2 where y = x+1) for x in range(5)]

would have two totally legal ways to be written without parentheses:

[y+2 where y = x+1 for x in range(5)]
[y+2 for x in range(5) where y = x+1]

The first one is a consequence of the "where" keyword usable in an
expression and the second one is a consequence of using it in a list
comprehension.

However, I think it doesn't break the "there is only one obvious way to do
it" principle because, depending on the case, the "y" variable would be a
consequence of the iteration or a consequence of the computation.

*Goals of the new syntax*
Personally, I find it very useful to do an assignment in such a context. I
use list comprehensions, for example, to generate a big JSON with multiple
"for" and "if" clauses.

I don't have a list of big real-world examples that would be simplified
by this syntax, but interested people could search for arguments here.

People could argue only the "list comprehension" case would be useful and
not the "any expression" case:

[y+2 for x in range(5) where y = x+1]
would be accepted but not :

print(y+2 where y=x+1)

Because the latter could be written:

y = x + 1
print(y+2)

However, having an isolated scope can be a good idea.
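
Python 3 comprehensions already give this kind of isolation, which the sketch
below relies on (the function "scope_demo" and the name "tmp_y" are
hypothetical, chosen only for the example):

```python
def scope_demo():
    x = 5
    # tmp_y exists only inside the comprehension.
    value = [tmp_y + 2 for tmp_y in [x + 1]][0]
    try:
        tmp_y  # not visible here: the comprehension did not leak it
    except NameError:
        return value, False
    return value, True


value, leaked = scope_demo()
print(value)   # 8
print(leaked)  # False
```

With the plain two-line version (y = x + 1, then print(y+2)), "y" stays
bound in the enclosing scope afterwards.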

*Current possibilities*
Currently we have multiple options :

[y+2 for x in range(5) for y in [x+1]]
[y+2 for y in (x+1 for x in range(5))]
[(lambda y:y+2)(y=x+1) for x in range(5)]

The first one works well but it's not obvious that we just want to assign a
new variable, especially when the expression is long, or when there are
several of them, or both:
[y+z for x in range(5) for y,z in [((x + 1) * 2, x ** 2 - 5)]]

The second one makes it impossible to reuse the "x" variable and the y =
x+1 relation is not obvious.

The third example is what a functional programmer would think of, but it is
really too complex for a beginner and very verbose.
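
A quick sketch to check that the three current spellings agree. The
generator form is written with an explicit "for y in", which is what makes
it valid syntax; the variable names are only illustrative:

```python
# 1. Iterate over a one-element list to bind y.
via_list = [y + 2 for x in range(5) for y in [x + 1]]

# 2. Feed an inner generator; x is no longer available in the result.
via_generator = [y + 2 for y in (x + 1 for x in range(5))]

# 3. Immediately call a lambda with y as a keyword argument.
via_lambda = [(lambda y: y + 2)(y=x + 1) for x in range(5)]

print(via_list)       # [3, 4, 5, 6, 7]
print(via_generator)  # [3, 4, 5, 6, 7]
print(via_lambda)     # [3, 4, 5, 6, 7]
```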

*Proposed syntax : Conclusion*
As I said, I like the postfix syntax with the "where" keyword.

[y+2 for x in range(5) where y = x+1]

Also usable in any expression :

print(y+2 where y = x+1)

*Conclusion*

This is all the talk/work/arguments I've found so far about this syntax.
Apparently it's been a while (2010) since such an idea was considered, but
I think having a new PEP listing all the pros and cons would be a good
idea. That way we can measure how much the community wants this concept to
be introduced and, if it's refused, the community would have a document
where the "cons" are clearly written.

Robert Vanden Eynde