About list comprehension syntax

Hi

List comprehensions (and generator expressions) come in two 'flavours' at the moment:

(1) [f(x) for x in L], which stands for map(f, L). Let's call this a 'map comprehension'.

(2) [f(x) for x in L if p(x)], which stands for map(f, filter(p, L)). Let's call this a 'map-filter comprehension'.

Now if one wants to write simply filter(p, L) as a list comprehension, one has to write:

(3) [x for x in L if p(x)]. This could be called a 'filter comprehension'.

The 'x for x in L' is not very nice IMHO, but it is often handy to use such expressions over 'filter(...)'. E.g. building the sublist of a given list consisting of all the items of a given type could be written as:

    filter(lambda x: isinstance(x, FilteringType), heterogeneous_list)

or:

    [x for x in heterogeneous_list if isinstance(x, FilteringType)]

I still prefer the list comprehension over the lambda/filter combination, but neither feels very satisfying (to me :) (not that one cannot use partial in the filter version).

Why not just drop the 'x for' at the start of a 'filter comprehension' (or generator expression)? Thus (3) could be written more simply as:

(3') [x in L if p(x)]

This is consistent with common mathematical notation:

* { f(x) | x \in L } means the set of all f(x) for x in L.
* { f(x) | x \in L, p(x) } means the set of all f(x) for x in L satisfying predicate p.
* { x \in L | p(x) } means the set of all x in L satisfying predicate p.

--
Arnaud
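A minimal sketch of the three correspondences above, written for modern Python 3, where map() and filter() return iterators and are therefore wrapped in list() (in the Python 2 of this thread they returned lists directly). The functions f and p, the list L, the sample heterogeneous_list and the choice FilteringType = str are all illustrative assumptions.

```python
def f(x):
    # Illustrative mapping function (assumption).
    return x * x

def p(x):
    # Illustrative predicate (assumption): keep even numbers.
    return x % 2 == 0

L = [1, 2, 3, 4, 5, 6]

# (1) map comprehension: [f(x) for x in L] ~ map(f, L)
map_comp = [f(x) for x in L]
assert map_comp == list(map(f, L))

# (2) map-filter comprehension: [f(x) for x in L if p(x)] ~ map(f, filter(p, L))
map_filter_comp = [f(x) for x in L if p(x)]
assert map_filter_comp == list(map(f, filter(p, L)))

# (3) filter comprehension: [x for x in L if p(x)] ~ filter(p, L)
filter_comp = [x for x in L if p(x)]
assert filter_comp == list(filter(p, L))

# Selecting the items of one type from a mixed list, both ways.
FilteringType = str  # hypothetical choice of the type to keep
heterogeneous_list = [1, "a", 2.5, "b", None]
via_filter = list(filter(lambda x: isinstance(x, FilteringType), heterogeneous_list))
via_comprehension = [x for x in heterogeneous_list if isinstance(x, FilteringType)]

print(map_comp)           # [1, 4, 9, 16, 25, 36]
print(map_filter_comp)    # [4, 16, 36]
print(filter_comp)        # [2, 4, 6]
print(via_comprehension)  # ['a', 'b']
```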

"Arnaud Delobelle" <arno@marooned.org.uk> wrote in message news:8C1BDF74-1DAB-4F64-A28E-16788C48AA95@marooned.org.uk...
| Hi
|
| List comprehensions (and generator expressions) come in two
| 'flavours' at the moment:

Actually, you can have 1 to many for clauses and 0 to many if clauses.

| (1) [f(x) for x in L], which stands for map(f, L). Let's call this a
| 'map comprehension'
|
| (2) [f(x) for x in L if p(x)], which stands for map(f, filter(p, L)).
| Let's call this a 'map-filter comprehension'.
|
| Now if one wants to write simply filter(p, L) as a list
| comprehension, one has to write:
|
| (3) [x for x in L if p(x)]. This could be called a 'filter
| comprehension'.
|
| the 'x for x in L' is not very nice IMHO, but it is often handy to
| use such expressions over 'filter(...)', eg building the sublist of a
| given list consisting of all the items of a given type could be
| written as:
|
| filter(lambda x: isinstance(x, FilteringType), heterogeneous_list)
|
| or:
|
| [x for x in heterogeneous_list if isinstance(x, FilteringType)]
|
| I still prefer the list comprehension over the lambda/filter
| combination, but neither feels very satisfying (to me :) (not that
| one cannot use partial in the filter version)
|
| Why not just drop the 'x for' at the start of a 'filter
| comprehension' (or generator expression)?

Because such micro abbreviations are against the spirit of Python, which is designed for readability over writability. Even for a writer, it might take as much time to mentally deal with the exception as to simply type 'for x', which takes all of a second. Also, this breaks the mapping between for/if statements and clauses and makes the code ambiguous for both humans and the parser.

| Thus (3) could be written more simply as:
|
| (3') [x in L if p(x)]

(x in L) is a legal expression already. (x in L) if p(x) looks like the beginning of (x in L) if p(x) else 'blah'. The whole thing looks like a list literal with one incompletely specified element.
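Terry's technical points can be checked in a modern Python 3 interpreter. The sketch below, using an illustrative L and x, shows that a comprehension may chain several for and if clauses, that (x in L) is an ordinary membership test so [x in L] is a one-element list, and that the proposed (3') form is rejected by the parser. The parses() helper is hypothetical, written only for this illustration on top of the built-in compile().

```python
L = [1, 2, 3]
x = 2

# Several 'for' clauses and several (or zero) 'if' clauses may be chained.
pairs = [(a, b) for a in range(3) if a for b in range(3) if b != a]

# "(x in L)" is already a legal expression: a membership test giving a bool,
# so "[x in L]" is a list display holding one bool, not a comprehension.
one_element = [x in L]

def parses(source):
    """Hypothetical helper: True if source compiles as a single expression."""
    try:
        compile(source, "<proposal>", "eval")
    except SyntaxError:
        return False
    return True

print(pairs)                                      # [(1, 0), (1, 2), (2, 0), (2, 1)]
print(one_element)                                # [True]
print(parses("[x for x in L if x > 1]"))          # True
print(parses("[x in L if x > 1]"))                # False: the (3') form
print(parses("[(x in L) if x > 1 else 'blah']"))  # True: a conditional expression
```

The exact SyntaxError message for the (3') form varies between Python versions, which is why the helper only reports success or failure.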
| This is consistent with common mathematical notation:

'Common mathematical notation' is not codified and varies from writer to writer and even within the work of one writer. Humans make do and make guesses, but parser programs are less flexible.

| * { f(x) | x \in L } means the set of all f(x) for x in L
| * { f(x) | x \in L, p(x) } means the set of all f(x) for x in L
| satisfying predicate p.
| * { x \in L | p(x) } means the set of all x in L satisfying predicate p.

I personally do not like the inconsistency of the last form, which flips '\in L' over the bar just because f(x) is the identity function. It would be OK, though, in a situation where that was the only set comprehension being used. But that is not the case with Python.

Terry Jan Reedy
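The three set-builder forms quoted above have direct counterparts in the set comprehensions of modern Python 3. Note that the identity case is still spelled {x for x in L if p(x)}, keeping the uniform shape Terry argues for. The f, p and L below are illustrative assumptions.

```python
def f(x):
    return x * x      # illustrative mapping (assumption)

def p(x):
    return x > 2      # illustrative predicate (assumption)

L = [1, 2, 3, 4, 2]

image = {f(x) for x in L}                   # { f(x) | x \in L }
filtered_image = {f(x) for x in L if p(x)}  # { f(x) | x \in L, p(x) }
subset = {x for x in L if p(x)}             # { x \in L | p(x) }

print(sorted(image))           # [1, 4, 9, 16]
print(sorted(filtered_image))  # [9, 16]
print(sorted(subset))          # [3, 4]
```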

On 30 May 2007, at 17:30, Terry Reedy wrote:
That's true. I use that very seldom in fact. [...]
I wasn't suggesting this to save myself from typing 5 characters. You'll find it strange but I actually find [x in L if p(x)] more readable than [x for x in L if p(x)]. To me it says that I'm filtering, not mapping.
By ambiguous do you mean 'difficult to parse'? I didn't think it was ambiguous in the technical sense.
I'm not sure I understand. I agree that x if (y in L if p(y)) else z doesn't look great. Neither does x if (y for y in L if p(y)) else z Well, the 'for' in the second one is a bit of a hint, I suppose. I wouldn't write either anyway. Most of the time when I write a list comprehension / generator expression it is to bind it to a name.
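The second example above hides a trap worth spelling out: a generator object is always truthy, whether or not it would yield anything, so using a generator expression as the condition of x if ... else z always selects x. A minimal sketch with illustrative values; any() or a list comprehension tests the actual contents instead.

```python
L = [1, 2, 3]

def p(y):
    # Illustrative predicate (assumption): true for no element of L.
    return y > 10

gen = (y for y in L if p(y))

# The generator object itself is truthy, even though it will yield nothing.
from_generator = "x" if gen else "z"

# Testing the contents instead gives the intended answer.
from_any = "x" if any(p(y) for y in L) else "z"
from_list = "x" if [y for y in L if p(y)] else "z"

print(from_generator)  # x
print(from_any)        # z
print(from_list)       # z
```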
Yet all modern mathematicians will understand the three forms without any hesitation or 'making guesses' (consciously at least).
In fact the last form is 'the consistent one', as the first two should really be written as:

* { y \in M | \exists x \in L, y=f(x) }
* { y \in M | \exists x \in L, p(x) and y=f(x) }

(M being the codomain of f) ;oP

Anyway, while I still like the idea, you've made me think about it as some sort of 'useless tinkering', which it probably is.

--
Arnaud
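The existential rewriting above can be checked on finite data. In this sketch M is a finite stand-in for the codomain of f (an assumption, since a real codomain may be infinite; it only needs to contain every f(x)), and f, p and L are illustrative.

```python
def f(x):
    return x * x           # illustrative mapping (assumption)

def p(x):
    return x % 2 == 0      # illustrative predicate (assumption)

L = [1, 2, 3, 4]
M = range(100)             # finite stand-in for the codomain of f; must contain f(L)

# { f(x) | x \in L }  versus  { y \in M | \exists x \in L, y = f(x) }
direct = {f(x) for x in L}
existential = {y for y in M if any(y == f(x) for x in L)}

# { f(x) | x \in L, p(x) }  versus  { y \in M | \exists x \in L, p(x) and y = f(x) }
direct_p = {f(x) for x in L if p(x)}
existential_p = {y for y in M if any(p(x) and y == f(x) for x in L)}

print(sorted(direct), direct == existential)        # [1, 4, 9, 16] True
print(sorted(direct_p), direct_p == existential_p)  # [4, 16] True
```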

Terry Reedy wrote:
That's the only real issue IMO - and I agree there is no acceptable solution.
I guess it also depends on how much math (eg theorem proofs) one had to deal with. FWIW, it took me months to adapt to the correct Python listcomp/genexp syntax, after being bitten dozens of times by Python not accepting Arnaud's (3') form above. The latter was *much* more natural to my fingers. Cheers, BB

Arnaud Delobelle wrote:
It would be very nice, but could be difficult to parse, because there's no clue you're not looking at a normal list constructor until you get to the 'if'.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
greg.ewing@canterbury.ac.nz
Carpe post meridiem! (I'm not a morning person.)
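Greg's point shows up in the syntax tree. In the accepted grammar, the token right after the first element (for, a comma, or the closing bracket) tells the parser which construct it is reading, whereas under the proposal "[x in L" is also a valid prefix of an ordinary list display. The node_type() helper below is hypothetical, built on the standard ast module; names are not resolved at parse time, so L, M and p need not exist.

```python
import ast

def node_type(source):
    """Hypothetical helper: AST node name of a single Python expression."""
    return type(ast.parse(source, mode="eval").body).__name__

print(node_type("[x in L]"))                # List: one membership test
print(node_type("[x in L, y in M]"))        # List: two membership tests
print(node_type("[x for x in L if p(x)]"))  # ListComp: 'for' marks it early
```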

On Wed, 30 May 2007 12:41:51 +0200, Arnaud Delobelle <arno@marooned.org.uk> wrote:
(3') [x in L if p(x)]
I like the idea as well. It doesn't look ambiguous to me. 'if' can only appear as a statement or in conjunction with an 'else', so this expression can't mean anything else imo.

Jan

Jan Kanis schrieb:
Tell that to the LL(1) parser ;)

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
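Jan's rule holds for conditional expressions, but comprehensions already accept a bare 'if' as a filter clause, and both kinds of 'if' can appear in one comprehension. The parser has to tell them apart by position, which is presumably part of what the LL(1) remark alludes to. A minimal sketch with an illustrative list:

```python
L = [-2, -1, 0, 1, 2, 3]

# The first 'if' belongs to a conditional expression and requires 'else';
# the second 'if' is a filter clause and takes no 'else'.
# Note that in Python -1 % 2 == 1, so -1 counts as odd here.
clipped_odds = [x if x > 0 else 0 for x in L if x % 2]

print(clipped_odds)  # [0, 1, 3]
```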
participants (7)
- Arnaud Delobelle
- Boris Borcic
- Georg Brandl
- Greg Ewing
- Jan Kanis
- Josiah Carlson
- Terry Reedy