[Python-ideas] Temporary variables in comprehensions

fhsxfhsx fhsxfhsx at 126.com
Sun Feb 18 09:07:16 EST 2018


Thanks so much for the comments and for collecting the earlier discussions of this syntax!

Comments on *previous talk*
The list also mentioned some other previous proposals, so I searched myself and found that there have been even more on the mailing list since 2008:
https://mail.python.org/pipermail/python-ideas/2008-August/001842.html

Comments on *previous talk: PEP*
The PEP seems to be about an explicit temporary namespace where any objects (including functions, classes, etc.) could be held.
However, I find that this syntax may not work for temporary variables in comprehensions.

We might expect this syntax to work like
[?.y+2 for x in range(5) given: y = x+1]
But notice that the proposed `given` statement here appears inside a comprehension, which requires syntax changes to comprehensions, and the PEP says nothing about this.
What's more, the proposed syntax is a statement, like the `for` statement or the `if` statement, rather than a clause like the for-clause or if-clause in comprehensions. In my opinion, it's not a good idea to have a statement inside a comprehension. Instead, a given-clause should be added if one wants to write code like the above, doing assignments in comprehensions, which could look like:
[?.y+2 for x in range(5) given y = x+1]
Then the ? symbol seems useless here, so maybe
[y+2 for x in range(5) given y = x+1]
which makes it quite similar to the `where` syntax you proposed.

So, I agree with you that it is a good idea to have a new PEP, though for a different reason.
 
Comments on *Choice of syntax*
I gave two candidate syntaxes in https://mail.python.org/pipermail/python-ideas/2008-August/001842.html, one being `for ... in ...`, the other `with ... as ...`.
The first has the same problem as the `for ... = ...` you proposed. And I think the biggest problem the second will face is the difference in semantics between it and the `with` statement.
When it comes to `where ... = ...`, there is one possible problem I can think of.
`where` is not currently a keyword in Python. There are WHERE clauses in SQL, so in many modules, including peewee and SQLAlchemy, `where` is an important method. The new syntax would cause quite a few incompatibilities.

Personally I agree with you that postfix notation would have an advantage over prefix notation. Other than `where`, `with` is quite readable in my opinion. So maybe `with ... = ...` can be another candidate?

Besides the `given ... = ...` I mentioned above, there are several more candidates now. Personally I prefer `with ... = ...`, because `with` is already a Python keyword, so it would be good for backward compatibility.
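
To make the compatibility concern concrete, here is a small sketch using the standard `keyword` module to check which of the candidate words are reserved today. The `Query` class is a hypothetical stand-in for an ORM-style API with a `where` method; it is not the real peewee or SQLAlchemy class.

```python
import keyword

# Candidate words mentioned in this thread.
candidates = ["where", "given", "let", "with", "for"]
reserved = {word: keyword.iskeyword(word) for word in candidates}
print(reserved)


class Query:
    """Hypothetical query builder, only for illustration."""

    def __init__(self):
        self.conditions = []

    def where(self, condition):
        # Legal today only because `where` is an ordinary identifier.
        self.conditions.append(condition)
        return self


query = Query().where("age > 18").where("active")
print(query.conditions)
```

If `where` became a reserved keyword, a method definition and call chain like the one above would presumably stop compiling, which is the incompatibility described above.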

*About comprehensions and expressions*
You gave `print(y+2 where y = x+1)` as an example. I think it should be clarified that it is not, or at least does not look like, a comprehension, but an expression. It should give an object `y+2` rather than a list or a generator. (Otherwise, what list could it give?)
There are the for-clause and if-clause for comprehensions, and the if-clause (aka the ternary operator) for expressions. So, in my opinion, there should be additional discussion and examples to show that it's helpful to have such syntax.

For the where-clause in expressions, I think we could refer to how Python handles the if-clause.
The following expression is legal:
[x if x else 0 for x in mylist if x > 10]
The following are illegal:
[x if x for x in mylist if x > 10]
[x if x else 0 for x in mylist if x > 10 else 10]
That is to say, there are two kinds of if-clause: one is only used in expressions (the `ternary operator`), the other only in comprehensions. They differ slightly in syntax.
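
The legal and illegal forms above can be checked with the built-in `compile`. This is just a sketch; `is_valid_expression` is a helper name made up for this example.

```python
def is_valid_expression(source):
    """Return True if `source` compiles as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


legal = "[x if x else 0 for x in mylist if x > 10]"
illegal = [
    "[x if x for x in mylist if x > 10]",
    "[x if x else 0 for x in mylist if x > 10 else 10]",
]

results = {source: is_valid_expression(source) for source in [legal] + illegal}
for source, ok in results.items():
    print("legal  " if ok else "illegal", source)

# The legal form combines a ternary (expression) with a filter (comprehension).
mylist = [0, 5, 12, 20]
filtered = [x if x else 0 for x in mylist if x > 10]
print(filtered)
```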

The where-clause might work in similar ways. Then
[z+2 where z = y+1 for x in mylist where y = x+1]
means
[(z+2 where z=y+1) for x in mylist where y = x+1]
where the parenthesized part (which is also the part before the first `for`) is an expression, and the rest is comprehension clauses.
To be more accurate, the new syntax would be:

test: where_test ['if' where_test 'else' test] | lambdef
where_test: or_test | ( '(' or_test 'where' NAME '=' testlist_star_expr ')' )

The mandatory parentheses in where_test resolve the ambiguity in code like
print(y+2 where y=x+1 if x>0 else x-1 if x>1 else 0)
It could be parsed as
print((y+2 where y=x+1 if x>0 else x-1) if x>1 else 0)
or
print(y+2 where y=(x+1 if x>0 else x-1 if x>1 else 0))
I guess thektulu made the parentheses mandatory for the same reason.
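
To see that the two readings really differ, here is a sketch in current Python that emulates `expr where y = value` with an immediately called lambda. That emulation is my assumption about the intended semantics, and the two function names are made up for this example.

```python
def where_binds_tightly(x):
    # Reading 1: (y+2 where y = x+1 if x>0 else x-1) if x>1 else 0
    return (lambda y: y + 2)(x + 1 if x > 0 else x - 1) if x > 1 else 0


def where_binds_loosely(x):
    # Reading 2: y+2 where y = (x+1 if x>0 else x-1 if x>1 else 0)
    return (lambda y: y + 2)(x + 1 if x > 0 else x - 1 if x > 1 else 0)


comparison = {x: (where_binds_tightly(x), where_binds_loosely(x)) for x in (-1, 1, 2)}
for x, (tight, loose) in comparison.items():
    print(x, tight, loose)
```

The two readings agree for x = 2 but disagree for x = 1 and x = -1, so a parser (or a reader) has to pick one.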

I haven't checked the new syntax very carefully, so there might be other ambiguities.

Another example is
print(y+2 if x>0 else y-2 where y=x+1)
With mandatory parentheses, one must write
print(((y+2 if x>0 else y-2) where y=x+1))
or
print(y+2 if x>0 else (y-2 where y=x+1))

However, it might still confuse many people. I wonder whether it's a good idea to have such syntax.

It would be much easier to add assignments in comprehensions.
comp_iter: comp_for | comp_if | comp_where
comp_where: 'where' NAME '=' testlist_star_expr [comp_iter]
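
As a rough illustration of what comp_where could buy in practice, the sketch below emulates a where-clause followed by an if-clause in current Python. It assumes a where-clause would behave like a for-clause over a one-element list (the comparison Robert makes in his message); `expensive` is a hypothetical function that records its calls.

```python
calls = []


def expensive(x):
    # Hypothetical stand-in for a costly computation; records each call.
    calls.append(x)
    return x * x - 4


data = range(5)

# Proposed spelling: [y for x in data where y = expensive(x) if y > 0]
# Emulated today with a one-element list in place of the where-clause.
emulated = [y for x in data for y in [expensive(x)] if y > 0]
calls_in_comprehension = len(calls)

# The nested statements that the chain of comp_iter clauses corresponds to.
desugared = []
for x in data:
    y = expensive(x)
    if y > 0:
        desugared.append(y)

print(emulated, desugared, calls_in_comprehension)
```

The point is that `expensive(x)` runs once per element, instead of once in the filter and once more in the result expression.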

Comments on *Goals of the new syntax*
I have a real-world example in https://mail.python.org/pipermail/python-ideas/2018-February/048997.html: generating a big JSON document. You seem to have a very good feeling for it, though you didn't give a real-world example.






At 2018-02-16 12:03:23, "Robert Vanden Eynde" <robertve92 at gmail.com> wrote:

Hello, talking about this syntax :


[y+2 for x in range(5) let y = x+1]



*Previous talks*
I had almost exactly the same idea in June 2017; see the subject "variable assignment in functional context" here: https://mail.python.org/pipermail/python-ideas/2017-June/subject.html.


Currently this can easily be done by iterating over a list of size 1:


[y+2 for x in range(5) for y in [x+1]]


(Other ways exist, see section "current possibilities" below).


This comparison answers a lot of questions, such as "can we write [x+2 for x in range(5) for x in [x+1]], and therefore [x+2 for x in range(5) let x = x+1]?": the new variable would indeed shadow the old one.
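
The shadowing behaviour can be checked in current Python. A minimal sketch, where the variable names `shadowed` and `explicit` are only for this example:

```python
# The inner `for x in [x + 1]` reads the outer x, then rebinds the same name.
shadowed = [x + 2 for x in range(5) for x in [x + 1]]

# Same computation with a separate name for the temporary value.
explicit = [y + 2 for x in range(5) for y in [x + 1]]

print(shadowed)
print(explicit)
```

Both lists are equal, which suggests a "let x = x+1" clause could follow the same rule.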


In June 2017 I introduced the "let" syntax using the existing "for" keyword, with "=" instead of "in", like this:


[y+2 for x in range(5) for y = x+1]


The only difference I introduced was that it would be logical to accept the syntax in any expression:


x = 5
print(y+2 for y = x + 1)


or with the "let" keyword:


x = 5
print(y+2 let y = x + 1)


*Previous talk: PEP*
In the June conversation, one conclusion was that someone had written a PEP (https://www.python.org/dev/peps/pep-3150/) in 2010 that is still pending. It is not exactly the same syntax but the same idea; the author used a "given:" syntax that looked like this:


print(y+2 given:
            y=x+1)


*Previous talk: GitHub implementation on CPython*
Another conclusion was that someone on GitHub had implemented a "where" statement in a fork of CPython (URL: https://github.com/thektulu/cpython/commit/9e669d63d292a639eb6ba2ecea3ed2c0c23f2636), where one could write:


print(y+2 where y = x+1)


So in a list comprehension:


[y+2 where y = x + 1 for x in range(5)]


As the author thektulu said "just compile and have fun".


*New syntax in list comprehension*
However, we probably would like to be able to use this "where" after the for:


[y+2 for x in range(5) where y = x+1]


This would allow the new variable to be used in further "for" and "if" clauses:


[y+z for x in range(5) where y = x+1 for z in range(y+1)]
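
For comparison, here is how the same computation can be spelled in current Python, next to the nested loops it corresponds to. This is a sketch that assumes the where-clause binds y once per x, like a for-clause over a one-element list.

```python
# Today's spelling of: [y+z for x in range(5) where y = x+1 for z in range(y+1)]
emulated = [y + z for x in range(5) for y in [x + 1] for z in range(y + 1)]

# Equivalent nested loops, with y as a plain temporary variable.
expected = []
for x in range(5):
    y = x + 1
    for z in range(y + 1):
        expected.append(y + z)

print(emulated == expected, len(emulated))
```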



*Choice of syntax*
Initially I thought re-using the "for" keyword would be a good idea for backward compatibility (a variable named "where" in old code wouldn't be a problem). However, some people pointed out that the Python grammar would no longer be LL(1): when reading the "for" token, the parser wouldn't be able to decide directly whether the rest is a "for in" or a "for =". So introducing a dedicated keyword is probably better, like this:


[y+2 for x in range(5) where y = x+1]
print(y+2 where y = x+1)



The "where" keyword is very readable in my opinion, very close to English sentences we use. Sometimes we introduce a new variable before using it, generally using "let" or after using it using "where". For example "let y = x+1; print(y+2)". Or "print(y+2 where y = x+1)".


The first style is the one chosen by the "let in" syntax in Haskell:


print(let y = x+2 in y+2)


Or chained:
print(let x = 2 in let y = x+1 in y+2)


But a Haskell user would probably break lines for clarity:


print(let x = 2 in
         let y = x+1 in
         y+2)



A postfix notation using "where" would probably be less verbose, in my opinion. Python already has another postfix notation, "a if condition else b", so adding a new one wouldn't be surprising. Furthermore, the postfix notation is preferred in the context of "presenting the result first, then the implementation" (a context already discussed in the 2010 PEP). Presenting the result first is also a goal of the list comprehension: one writes [x+3 for x in range(5)] and not [for x in range(5): x+3]. The latter would be more in the "imperative programming" style and would translate to a normal loop.


The problem of chaining without parentheses: how do we remove the parentheses in the following statement?


print((y+2 where y = x+1) where x = 2)


We have two options :


print(y+2 where x = 2 where y = x+1)
print(y+2 where y = x+1 where x = 2)


The first option would be probably closer to the way multiple "for" are linked in a list comprehension:


[y+2 for x in range(5) for y in [x+1]]


But the second option would be more "present the result first" and closer to the parenthesized version; the user would create new variables as they go: "I want to compute y+2, but hey, what is y? It's x+1! But what is x? It's 2!". However, keeping the same order as in "multiple for in a list comprehension" is better, so I'd choose the first option.
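
The ordering used by multiple "for" clauses can be tried out in current Python. This sketch replaces each "where" by a "for" over a one-element list (an emulation, not the proposed syntax); the two function names are made up for this example.

```python
def first_option():
    # Mirrors: y+2 where x = 2 where y = x+1
    return [y + 2 for x in [2] for y in [x + 1]][0]


def second_option():
    # Mirrors: y+2 where y = x+1 where x = 2
    # With for-clause ordering, x is not bound yet when [x + 1] is evaluated.
    return [y + 2 for y in [x + 1] for x in [2]][0]


first_result = first_option()
print(first_result)

try:
    second_option()
    second_failed = False
except NameError as error:
    second_failed = True
    print("NameError:", error)
```

So the second option would need evaluation rules that differ from the ones existing comprehension clauses follow.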


In thektulu's implementation on GitHub, the parentheses are mandatory.


Another syntax issue would probably surprise some users. The following parenthesized statement:


[(y+2 where y = x+1) for x in range(5)]


would have two totally legal ways of being written without parentheses:


[y+2 where y = x+1 for x in range(5)]
[y+2 for x in range(5) where y = x+1]


The first one is a consequence of the "where" keyword usable in an expression and the second one is a consequence of using it in a list comprehension.


However, I think it doesn't break "there should be only one obvious way to do it", because depending on the case, the "y" variable would be a consequence of the iteration or a consequence of the computation.


*Goals of the new syntax*
Personally, I find it very useful to do an assignment in such a context. For example, I use list comprehensions to generate a big JSON document with multiple "for" and "if" clauses.


I don't have a list of big real-world examples here that would be simplified by this syntax, but interested people could look for arguments here.


People could argue that only the "list comprehension" case would be useful and not the "any expression" case:


[y+2 for x in range(5) where y = x+1]
would be accepted but not :


print(y+2 where y=x+1)


Because the latter could be written:


y = x + 1
print(y+2)


However, the idea of having an isolated scope can be a good idea.
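
To illustrate what an isolated scope would mean here, a small sketch in current Python that emulates the where-expression with a lambda call. This is an assumption about the intended semantics, not something the proposal specifies, and the helper names are hypothetical.

```python
def show_isolated_scope(x):
    # Emulates: print(y+2 where y = x+1)
    result = (lambda y: y + 2)(x + 1)
    # `y` existed only inside the lambda call, so it is not a local here.
    return result, "y" in locals()


def show_plain_assignment(x):
    # The two-statement version leaves `y` behind in the enclosing scope.
    y = x + 1
    result = y + 2
    return result, "y" in locals()


isolated = show_isolated_scope(5)
plain = show_plain_assignment(5)
print(isolated, plain)
```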


*Current possibilities*
Currently we have multiple options :


[y+2 for x in range(5) for y in [x+1]]
[y+2 for y in (x+1 for x in range(5))]
[(lambda y:y+2)(y=x+1) for x in range(5)]


The first one works well, but it's not obvious that we just want to assign a new variable, especially when the expression is long, or there are several variables, or both:
[y+z for x in range(5) for y,z in [((x + 1) * 2, x ** 2 - 5)]]


The second one makes it impossible to reuse the "x" variable and the y = x+1 relation is not obvious.


The third example is what a functional programmer would think of, but it is really too complex for a beginner and very verbose.
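
The options can be run side by side in current Python to confirm they compute the same list. In this sketch the generator form is written with an explicit `for y in (...)`, and the variable names are only for this example.

```python
# Option 1: iterate over a one-element list.
via_single_element_list = [y + 2 for x in range(5) for y in [x + 1]]

# Option 2: iterate over an inner generator (x is not visible in the outer part).
via_inner_generator = [y + 2 for y in (x + 1 for x in range(5))]

# Option 3: bind y through an immediately called lambda.
via_lambda = [(lambda y: y + 2)(y=x + 1) for x in range(5)]

# Option 1 again, binding two temporaries at once through tuple unpacking.
two_temporaries = [y + z for x in range(5) for y, z in [((x + 1) * 2, x ** 2 - 5)]]

print(via_single_element_list, via_inner_generator, via_lambda)
print(two_temporaries)
```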


*Proposed syntax : Conclusion*
As I said, I like the "where" syntax with the "where" keyword.


[y+2 for x in range(5) where y = x+1]


Also usable in any expression :


print(y+2 where y = x+1)


*Conclusion*


Here is all the talk/work/arguments I've found so far about this syntax. Apparently it's been a while (2010) since such an idea was proposed, but I think having a new PEP listing all the pros and cons would be a good idea. That way we can measure how much the community wants this concept to be introduced, and if it's refused, the community will have a document where the "cons" are clearly written.


Robert Vanden Eynde