FW: Map-then-filter in comprehensions

Hello, I’m mostly in favour of extending the comprehension syntax, but not at any cost. The `as` proposal is a big -1 from me. Right now, `as` is used for:
- from foo import bar as baz
- try: ... except Exception as e: ...
And maybe a few others I’ve probably forgotten. The general concept is name binding, so if I see something like

    [x for x, y in some_iterable as y > 5]

I’m going to be confused by what sort of name binding it does. A quick glance at the keywords list tells me that no currently existing keyword is “the obvious choice” for this purpose. I don’t think such a small niche warrants a new keyword, either. Python is mostly intuitive, and your proposition is anything but intuitive to me. -1 from me for now, but I might change my mind if you come up with an intuitive syntax.
-Emanuel
~If it doesn’t quack like a duck, add a quack() method~

On Wed, Mar 9, 2016 at 1:59 AM, Émanuel Barry <vgr255@live.ca> wrote:
Maybe a pseudo-keyword would be sufficient - comprehension/genexp syntax is pretty solidly specified, so it's less likely to break stuff than in general syntax.

I'm really not liking the proposed syntax, but maybe there's an alternative. My first thought on reading this proposal is: "Ah, it's like SQL's 'HAVING' keyword". The trouble is that SQL can identify its columns, but Python doesn't have an easy syntax for "the thing you're about to return". Something like:

    [abs(x) for x in numbers having _ > 5]

The semantics would be equivalent to:

    def <listcomp>():
        result = []
        for x in numbers:
            _ = abs(x)
            if _ > 5:
                result.append(_)
        return result

A 'having' clause (and, by the way, I'm not too enthused about the name, but I'm using it as a placeholder because of SQL's use) would have to go after all 'for' and 'if' clauses, and would have access to one special name (maybe _ because of its use in interactive mode, or maybe something else) which holds the value that would be returned. A simple filter-out-the-false-ones could do this:

    [regex.match(line) for line in lines having _]

Of course, people could just switch from 'if' to 'having' across the board, but this has the same consequences that it does in SQL: now *every* row has to be fully processed, only to be discarded at the last step. Using 'having' with no underscore would violate common sense and good style.

ChrisA
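The proposed 'having' semantics can be modelled in today's Python as an ordinary function that follows the desugared <listcomp> above. This is a minimal sketch. The helper name having_comprehension and the sample data are hypothetical, and the predicate receives the computed value in the role of the proposed special _ name.

```python
import re


def having_comprehension(iterable, transform, predicate):
    """Model of the proposed [transform(x) for x in iterable having predicate(_)]."""
    result = []
    for x in iterable:
        _ = transform(x)      # compute the value the comprehension would return
        if predicate(_):      # the 'having' test sees the computed value
            result.append(_)
    return result


numbers = [3, -7, 5, -12, 1]
big = having_comprehension(numbers, abs, lambda v: v > 5)
print(big)  # [7, 12]

# The "filter out the false ones" case: keep only successful regex matches.
pattern = re.compile(r"\d+")
lines = ["abc", "42 apples", "", "7 pears"]
matches = having_comprehension(lines, pattern.match, bool)
print([m.group() for m in matches])  # ['42', '7']
```

Each element is transformed exactly once, and the test runs on the result. That is the whole point of the proposal, and also the reason every row gets fully processed before being discarded.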

On 8 March 2016 at 15:20, Chris Angelico <rosuav@gmail.com> wrote:
Yes I certainly agree that this kind of extra syntax should not use a new keyword.
I have to agree that I'm not mad-keen on the syntax I proposed either. I just couldn't come up with an elegant alternative, and I guess I hoped someone else might.
If I were introducing a new clause, I would probably suggest 'where' from Haskell, which would perhaps allow multiple definitions. I'm not sure. I guess there are two questions: 1) is this a need that should be addressed at all, and 2) if so, what is the right syntax extension? I think it more important to answer 1, and then the syntax-related endless threads can be commenced. ;) Although of course the answer to 1 might be "only if someone proposes an especially elegant syntax".

This map-then-filter shortcoming is one of the most common warts I encounter with python syntax, and I would absolutely love an improvement here. I think ChrisA is closest with:
But I wonder, why not just:

    [abs(x) for x in numbers if _ > 5]

Where _ refers to the return expression within a comprehension, and is interpreted specially in this context?

On Tue, Mar 8, 2016 at 7:20 AM, Chris Angelico <rosuav@gmail.com> wrote:

On Wed, Mar 9, 2016 at 7:46 AM, Mark Mollineaux <bufordsharkley@gmail.com> wrote:
That makes it somewhat magical; compare these two conditions:

    [(base//x) % 10 for x in numbers if x if _]

The first one MUST be filtered prior to mapping (else you'll get a ZeroDivisionError), but the second has to be filtered afterwards. Putting the two conditions into one comprehension might not be common, but it's a bit odd to have them mean different things.

ChrisA
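Both filters can already be expressed without any magic name by nesting a generator expression. The condition on x stays in the inner generator, before the division. The condition on the mapped value goes in the outer comprehension, after it. A minimal sketch follows; the values of base and numbers are made up for illustration.

```python
base = 12345                   # hypothetical value
numbers = [0, 1, 10, 100, 20000]

# Inner genexp: filter on x first (skips x == 0, avoiding ZeroDivisionError), then map.
# Outer comprehension: filter on the mapped value (drops zero digits).
digits = [d for d in ((base // x) % 10 for x in numbers if x) if d]
print(digits)  # [5, 4, 3]
```

Here 12345 // 20000 is 0, so that element is removed by the outer filter, while x == 0 never reaches the division at all.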

It's special within the REPL (referring to the last returned expression), so there would at least be some precedent in Python for making it special in another (somewhat similar) context. Also, there's already special scoping for list comprehensions.
Agreed that two ifs applied at different times is fairly odd, but as long as _ is known to be magical, it's fairly intuitive, in my opinion. Also, Koos's offering above:
[abs(x) for x in numbers if > 5]
Is also very nice, but I don't like it when the filtering is:

    [abs(x) for x in numbers if]

On Tue, Mar 8, 2016 at 12:52 PM, Chris Angelico <rosuav@gmail.com> wrote:

On Wed, Mar 9, 2016 at 12:10 AM, Mark Mollineaux <bufordsharkley@gmail.com> wrote: [...]
I assume you mean filter out zeros? Then it could be

    [abs(x) for x in numbers if != 0]

Or someone might want to do

    [foo(x) for x in things if not None]

Another possibility might be

    [abs(x) if > 5 for x in numbers]
    [foo(x) if not None for x in things]

-- Koos

On 10 March 2016 at 13:40, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
[foo(x) for x in things if is not None] # What you wrote is already valid syntax, equivalent to [foo(x) for x in things]
... which pretty much explains why this syntax isn't a good idea. Paul

On Thu, Mar 10, 2016 at 3:40 PM, Rob Cliffe <rob.cliffe@btinternet.com> wrote: [...]
[foo(x) for x in things if is not None] # What you wrote is already valid syntax, equivalent to [foo(x) for x in things]
Oh, of course! Thanks. I suppose my confusion may be an argument against the syntax ;) -- Koos
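Rob's point is easy to check. `not None` is simply the constant True, so the clause filters nothing. The intended filter has to test the mapped value. In this sketch, foo and things are hypothetical stand-ins.

```python
def foo(x):
    """Hypothetical mapping that returns None for odd inputs."""
    return None if x % 2 else x * 10


things = [1, 2, 3, 4]

# 'if not None' is always true, so nothing is filtered out.
unfiltered = [foo(x) for x in things if not None]
print(unfiltered)  # [None, 20, None, 40]

# What was meant: map first, then drop the None results.
filtered = [y for y in (foo(x) for x in things) if y is not None]
print(filtered)  # [20, 40]
```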

On 03/08/2016 06:59 AM, Émanuel Barry wrote:
I think everyone would be confused because that code is wrong: it's assigning the `iterable` as `y`, and then comparing that to the value `5`. More realistic (and correct ;) might be:

    [z for x, y in some_iterable if x+y as z > 10]

and the result is a list of numbers whose combined value is greater than 10. A name binding is in fact occurring, so `as` is a fine choice.

-- ~Ethan~

On Wed, Mar 9, 2016 at 4:26 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Implement that, and people will ask why they can't then unroll that:

    def <listcomp>():
        result = []
        for x, y in some_iterable:
            if x+y as z > 10:  # SyntaxError
                result.append(z)
        return result

I'm not sure people want name bindings in general expressions (note that this can't be a feature of the 'if' statement, as it's capturing and then continuing on), but as I see it, that's the only consistent way to do what you're attempting there.

ChrisA
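In current Python, the unrolled loop works once the binding becomes its own statement. A minimal sketch of what Ethan's comprehension would compute; the function name and sample pairs are hypothetical.

```python
def combined_over_ten(some_iterable):
    """Unrolled equivalent of the proposed [z for x, y in ... if x+y as z > 10]."""
    result = []
    for x, y in some_iterable:
        z = x + y          # today the binding must be a separate statement
        if z > 10:
            result.append(z)
    return result          # return the accumulated list


pairs = [(1, 2), (5, 7), (10, 3), (4, 4)]
print(combined_over_ten(pairs))  # [12, 13]
```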

On Wed, Mar 9, 2016 at 5:32 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Ah yes, but that's not how you've written it in the comprehension. You wrote it with 'as'. Believe you me, people WILL expect that outside of comprehensions. Otherwise, you have to explain why a name binding is legal in a condition in a comprehension, but not in any other expression. ChrisA

Haskell has a feature like this in comprehensions where one may write:

    [r + 1| n <- [1..10], let r = n * 3, r `rem` 4 == 0]

Here we are sharing the definition of `r` in both the predicate and the value expressions; we can also use the name in different expressions in both contexts. If we were to translate this to python syntax we could have something like:

    [r + 1 for n in range(1, 11) for n * 3 as r if r % 4 == 0]

There is no reason that the name binding needs to be a part of the predicate expression; they can just be separate clauses. I think the `for expr as name` is nice because it matches the order that comprehensions over multiple iterators are evaluated like: `[n for n in ns for m in ms]`. I personally just write `map` and `filter` and then have the control to make the correct function get called first, but adding something like this might make me more inclined to use comprehensions for the simple cases.

On Tue, Mar 8, 2016 at 1:34 PM, Chris Angelico <rosuav@gmail.com> wrote:
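The proposed `for n * 3 as r` clause can be approximated with existing syntax by iterating over a one-element list, `for r in [n * 3]`. This is a common idiom, not new syntax. It binds r once per n, so r is usable in both the filter and the result expression, as in the Haskell version.

```python
# Haskell: [r + 1 | n <- [1..10], let r = n * 3, r `rem` 4 == 0]
result = [r + 1 for n in range(1, 11) for r in [n * 3] if r % 4 == 0]
print(result)  # [13, 25]
```

For the non-negative values here, `%` and Haskell's `rem` agree.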

On Wed, Mar 9, 2016 at 5:57 AM, Joseph Jevnik <joejev@gmail.com> wrote:
That's somewhat more appealing. Not enthused about "for expr as name"; maybe "with expr as name"? Aside from not calling __enter__ and __exit__, it's the same kind of operation that a with block does. But the semantic difference isn't a good thing. ChrisA

I was going to write `with expr as name`, but I felt that people might find it too conflicting with the other use of the `with` keyword. I personally find that it reads very well. We could always go with a new keyword like `bind expr as name`, but that seems pretty heavy. In defense of the `for expr as name` proposal, it does match pretty closely to the nested for statements in a comprehension.

On Tue, Mar 8, 2016 at 2:01 PM, Chris Angelico <rosuav@gmail.com> wrote:

On 8 March 2016 at 18:57, Joseph Jevnik <joejev@gmail.com> wrote:
If we were to translate this to python syntax we could have something like: [r + 1 for n in range(1, 11) for n * 3 as r if r % 4 == 0]
Quite seriously, I have no idea what that means. If I saw it in a code review, I'd insist that it were rewritten. Paul

On Wed, Mar 9, 2016 at 6:51 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Well, okay then. Spin that off as its own idea: Name bindings in expressions. It's a coherent idea, but it's likely to see some fairly stiff opposition :) (For what it's worth, I am _not_ in opposition to it.) ChrisA

On 2016-03-08 11:54, Chris Angelico wrote:
I agree that that option should be considered. I don't see that much value in the current proposals, because they only handle the case where you want to filter the returned values based on those same values. They don't handle the case where you want to filter (before or after the map) based on some criterion that re-uses a computed expression, or where you want to re-use an expression that's not in a comprehension at all. Something like:

    [some_dict[x.lower()] for x in whatever if x.lower() not in exclusion_list and x.lower() in some_dict]

Even though this filter is being done before the map, it's still awkward, because you have to keep repeating the .lower(). If we had assignment-as-expression, you could do

    [some_dict[lx] for x in whatever if lx not in exclusion_list and lx in some_dict where lx=x.lower()]

(or whatever the syntax may be). Of course, that example is rather silly since the version with the "where" is barely shorter :-). But hopefully the point is clear. I often find myself doing awkward things like the first example in numerical situations where I want to collect one thing while filtering and mapping based on something else (e.g., work with numbers while filtering and mapping based on their logs or something).

Moreover, a general-purpose assignment-as-expression would also be usable in other contexts besides comprehensions. I run into this same repeated-expression thing when doing computations in pandas using .apply(), where you do .apply(lambda x: ...), and maybe the ... involves a repeated expression involving x. With assignment-as-expression you could write .apply(lambda x: ... where temp=f(x)) or what have you.

I think adding assignment-as-expression would be a significant change to Python. I'm not sure whether it would overall be a good change. Almost all the cases where I really feel like I want this are cases where I'm working with data interactively, and it really is easier to use a lambda than define a separate one-off function.
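Without any new syntax, the repeated .lower() can be computed once per item using the same one-element-list idiom (`for lx in [x.lower()]`). This sketch uses made-up data for some_dict, exclusion_list and whatever. It checks that both forms agree.

```python
some_dict = {"apple": 1, "banana": 2, "cherry": 3}   # hypothetical data
exclusion_list = ["banana"]
whatever = ["Apple", "BANANA", "Cherry", "Durian"]

# Original form: .lower() is evaluated up to three times per item.
v1 = [some_dict[x.lower()] for x in whatever
      if x.lower() not in exclusion_list and x.lower() in some_dict]

# Bind the lowered name once by iterating over a one-element list.
v2 = [some_dict[lx] for x in whatever for lx in [x.lower()]
      if lx not in exclusion_list and lx in some_dict]

print(v1, v2)  # [1, 3] [1, 3]
```

It works, but it arguably reads no better than the repetition, which supports the point that the issue is really about re-using intermediate values in general.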
For anything larger-scale, I agree that the "problem" can be solved by taking a deep breath and writing a separate function or generator expression to do what you want. As someone mentioned earlier on this thread, the original motivating example:

    foo = [abs(x) for x in numbers if abs(x) > 5]

already has a crystal clear solution:

    abses = (abs(x) for x in numbers)
    foo = [x for x in abses if x > 5]

There is really nothing wrong with this existing solution unless you're working interactively and want to do things more tersely. But if you're doing that, you're likely to want to do other kinds of terse expressions besides list comprehensions too. So basically, to me the issue is much more general than "I want to filter after mapping in comprehensions". What I want is to write expressions that re-use an intermediate computation. Some of those are in comprehensions, some aren't. Most of the cases where I want to do this are in interactive situations where I'm using a lot of one-off lambdas and temp expressions. If that's something we want to support more, assignment-as-expression could be quite useful, but if we don't, it's a different story.

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
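The two-step solution quoted above can be run as-is. For context, assignment expressions were added to Python later than this thread, in Python 3.8 via PEP 572, using the `:=` operator rather than any syntax proposed here. The second form below therefore requires Python 3.8+. The sample numbers are made up.

```python
numbers = [3, -7, 5, -12, 1]

# Two-step version: a lazy generator of abs() values, then a filter.
abses = (abs(x) for x in numbers)
foo = [x for x in abses if x > 5]
print(foo)  # [7, 12]

# Python 3.8+: bind the intermediate value inside the condition with :=
foo_walrus = [y for x in numbers if (y := abs(x)) > 5]
print(foo_walrus)  # [7, 12]
```

Both forms call abs() once per element. Note that a name bound with := inside a comprehension leaks into the enclosing scope, unlike the comprehension's loop variable.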

participants (12)
- Allan Clark
- Brendan Barnwell
- Chris Angelico
- Ethan Furman
- Greg Ewing
- Joseph Jevnik
- Koos Zevenhoven
- Mark Mollineaux
- MRAB
- Paul Moore
- Rob Cliffe
- Émanuel Barry