More power in list comprehensions with the 'as' keyword

Hello,

There's a pattern I use all the time: filtering out some elements of a list and cleaning them in the same move.

For example, if I have a multi-line text where I want to:

- keep non-empty lines
- clean non-empty lines

I am doing:

    >>> text = """
    ... this is a multi-line text\t
    ...
    ... \t\twith
    ...
    ... muliple lines."""
    >>> [l.strip() for l in text.split('\n') if l.strip() != '']
    ['this is a multi-line text', 'with', 'muliple lines.']

It is not optimal, because I call strip() twice. I could use ifilter then imap, or even a real loop, but I want my simple, concise list comprehension! And I couldn't find a simple way to express it.

The pattern can be summarized generically like this:

    [transform(e) for e in seq if some_test(transform(e))]

So what about using the 'as' keyword to extend list comprehensions, and to avoid calling transform() twice?

Could be:

    [transform(e) as transformed for e in seq if some_test(transformed)]

In my use case I would simply have to write:

    [l.strip() as stripped for l in text.split('\n') if stripped != '']

Which seems to me clear and concise.

Regards,
Tarek

--
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

+1 from me, I find myself doing that all the time as well. On Wed, Aug 27, 2008 at 7:51 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
-- Imri Goldberg -------------------------------------- www.algorithm.co.il/blogs/ www.imri.co.il -------------------------------------- -- insert signature here ----

On Wed, Aug 27, 2008 at 7:30 PM, Bruce Leban <bruce@leapyear.org> wrote:
[t for t in [t.strip() for t in text.split('\n')] if t != '']
Yes, and other ways could be found with imap/map, but I would find this simpler and more natural:

    [t.strip() as s for t in text.split('\n') if s != '']

Regards
Tarek

-1 on the feature. I use the compound expression as below, except that I make the inner item a generator expression to reduce peak memory usage. Also, the != condition is unnecessary.

    [t for t in (t.strip() for t in text.split('\n')) if t]

Overloading 'as', 'with', etc., when there are simple expressions that already do the *exact* same thing is silly. Never mind that not everything needs to be done in a 1-liner or list comprehension.

- Josiah

On Wed, Aug 27, 2008 at 10:30 AM, Bruce Leban <bruce@leapyear.org> wrote:
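A minimal sketch of the difference under discussion: it counts how often the transform runs in the original double-call comprehension versus the nested generator expression. The helper `run_counted` and the sample `text` are hypothetical, introduced only for illustration.

```python
def run_counted(build, lines):
    """Run build(transform, lines) and report how often transform ran."""
    calls = []

    def transform(line):
        # Record every invocation so the two spellings can be compared.
        calls.append(line)
        return line.strip()

    return build(transform, lines), len(calls)


text = "\n this is a multi-line text\t\n\n\t\twith\n\nmuliple lines."
lines = text.split('\n')

# Original spelling: transform runs in the condition and again for the result.
twice, n_twice = run_counted(
    lambda f, seq: [f(l) for l in seq if f(l) != ''], lines)

# Nested generator expression: transform runs exactly once per input line.
once, n_once = run_counted(
    lambda f, seq: [t for t in (f(l) for l in seq) if t], lines)

print(twice == once, n_twice, n_once)
```

Both spellings build the same list; only the number of transform calls differs, and the inner generator never materializes an intermediate list.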

Josiah Carlson wrote:
If split was a bit smarter...

    text.split(pattern=(' *\n+ *'))

;-)

It can be done with the re module, but I need to look that up each time I use it since I don't use it enough to remember all its subtleties.

Ron

The split/strip stuff is simple on purpose; it's fast. Tossing in regular expression handling is a great way to slow down the general case, never mind if you actually want to split on the passed literal string.

- Josiah

On Thu, Aug 28, 2008 at 9:14 AM, Ron Adam <rrr@ronadam.com> wrote:

On Thu, Aug 28, 2008 at 1:41 PM, Ron Adam <rrr@ronadam.com> wrote:
We're well off the original subject now, but what you propose can today be written

    re.split(' *\n+ *', text)

which I'd argue is just as easy to remember as your proposed new method, and in fact reads better, as it's much easier to guess that regex matching is going on.

Greg F
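A small sketch of what the `re.split` call does on the sample text from the first post. The second pattern, `\s*\n\s*`, is an assumption added here for illustration and is not something proposed in the thread; it absorbs tabs as well as spaces.

```python
import re

text = "\nthis is a multi-line text\t\n\n\t\twith\n\nmuliple lines."

# The pattern from the thread only absorbs spaces around the newlines,
# so tabs and the leading empty field survive the split.
literal = re.split(' *\n+ *', text)

# Hypothetical variant: absorb any whitespace around the newlines,
# then drop whatever empty fields remain.
cleaned = [s for s in re.split(r'\s*\n\s*', text.strip()) if s]

print(literal)
print(cleaned)
```

With tab-padded input the literal pattern is therefore not a drop-in replacement for the strip-and-filter comprehension.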

-1 on the first post.
[l.strip() as stripped for l in text.split('\n') if stripped != '']
This is equivalent, with respect to semantics as well as efficiency, to:

    [s for s in (l.strip() for l in text.split('\n')) if s != '']

And, even easier to read:

    stripped = (l.strip() for l in text.split('\n'))
    non_null_lines = [s for s in stripped if s]  # let's use the implicit truth value too

It seems to me that there's too much being asked for here when the tools are already all available. Generator expressions/list comprehensions seem to be getting a lot of attention from python-ideas lately, when it's really making the code _harder_ to read, and not even easier to write, because of all the extra thinking required. In fact, most of these things are better done with straight-out looping and yielding:

    for line in text.split('\n'):
        stripped = line.strip()
        if stripped:
            yield stripped

Now _that_ I can read!

--
Cheers,
Leif
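Since `yield` is only valid inside a function, the loop above needs a wrapper to run. A minimal sketch, where the function name `non_empty_lines` is hypothetical:

```python
def non_empty_lines(text):
    """Yield each line of text stripped of whitespace, skipping blank ones."""
    for line in text.split('\n'):
        stripped = line.strip()  # strip() runs once per line
        if stripped:
            yield stripped


text = "\nthis is a multi-line text\t\n\n\t\twith\n\nmuliple lines."
result = list(non_empty_lines(text))
print(result)
```

The generator is lazy, so a caller can iterate over it directly instead of building the list.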

On Wed, Aug 27, 2008 at 12:51 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
-1; not general enough to justify the extra overhead (both in human mental effort and compiler changes), given that the current alternatives (esp. genexps) are already quite readable and more flexible. George

George Sakkis wrote:
-1; It's not clear to me that transform(e) is what's going into the list. And what if I want to test on two different values (or two different transforms on the same value), which one is put into the comprehension? Later, Blake.

On Wed, Aug 27, 2008 at 9:51 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Not worth it. While you might want your "simple, concise list comprehension", listcomps are not meant to generalize to all possible situations in 'for' loops that can lead to a new list. Is it really that much harder to write::

    list_ = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            list_.append(line)

? And if you really want your one-liner, you can avoid your duplicate call to str.strip() in two different ways::

    [line.strip() for line in text.splitlines() if line and not line.isspace()]

or::

    filter(None, (line.strip() for line in text.splitlines()))

I don't see enough benefit to clutter listcomps with more syntax and to provide another alternative to this pattern to add to the multiple ones that already exist.

-Brett
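A runnable sketch comparing the three spellings above on the sample text from the first post. One assumption to flag: on Python 3, `filter()` returns an iterator rather than a list (as it did in the Python 2 of this thread), so it is wrapped in `list()` here.

```python
text = "\nthis is a multi-line text\t\n\n\t\twith\n\nmuliple lines."

# Explicit loop: strip once, keep the line if anything is left.
result_loop = []
for line in text.splitlines():
    line = line.strip()
    if line:
        result_loop.append(line)

# One-liner that tests before stripping, so strip() runs only on kept lines.
result_isspace = [line.strip() for line in text.splitlines()
                  if line and not line.isspace()]

# filter(None, ...) drops falsy items, i.e. the empty strings.
result_filter = list(filter(None, (line.strip() for line in text.splitlines())))

print(result_loop == result_isspace == result_filter)
```

The `line and` guard in the second form matters because `''.isspace()` is False, so an empty line would otherwise pass the test.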

2008/8/27 Tarek Ziadé <ziade.tarek@gmail.com>:
[l.strip() as stripped for l in text.split('\n') if stripped != '']
Two lines are better than one:

    stripped = [l.strip() for l in text.split('\n')]
    [l for l in stripped if l != '']

You are performing two different operations on the list after all. And it is more readable.

--
Regards,
Björn

Tarek Ziadé wrote:
-1

For me it is backward and confusing, whereas

    lines = []
    for l in test:
        l = l.strip()
        if l:
            lines.append(l)

is clear as could be.

To me, the drive to replace all for loops with list comps is misdirected. There is no end to the clauses people could propose to add: while, when, whatever, until, unless, mapped_to, transformed_by, and so on. The result would soon be something quite different from Python as we know it.

tjr

On Wed, Aug 27, 2008 at 4:17 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Indeed, probably something approaching Common Lisp's overcomplicated "loop" macro: http://www.unixuser.org/~euske/doc/cl/loop.html - Chris ======== Follow the path of the Iguana... Rebertia: http://rebertia.com Blog: http://blog.rebertia.com

Chris Rebert wrote:
Doesn't have to be that bad. I'd personally like a way to make a loop expression that returns a list, with two operators that stuff new things on and update old ones. That's most likely a reflection of my expression-based-but-not-purely style, though. One thing I've noticed in this discussion is that people seem to want the "as" keyword to operate as a general expression-assignment operator. Is that worth adding to the language? There would be no possibility of getting it mixed up with "==", and the word as an infix operator is sufficiently ugly that people would prefer "=" in general. Neil
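For context, a general expression-assignment operator did later reach the language, spelled `:=` rather than `as` (assignment expressions, PEP 572, Python 3.8). That postdates this thread, so the sketch below is a present-day note rather than anything the participants had available; it assumes Python 3.8 or newer.

```python
text = "\nthis is a multi-line text\t\n\n\t\twith\n\nmuliple lines."

# The condition binds the stripped line to a name, and the result
# expression reuses it, so strip() runs once per line.
non_empty = [stripped for l in text.split('\n') if (stripped := l.strip())]

print(non_empty)
```

Note that the name bound by `:=` inside a comprehension is visible in the enclosing scope after the comprehension finishes.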

A solution could be this:

    [stripped for l in text.split('\n') with l.strip() as stripped if stripped != '']

so that you can keep both values (l and l.strip()) too.

Cheers,
Cesare

On 27 Aug 2008 at 18:51:41, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
-- Dott. Cesare Di Mauro A-Tono S.r.l.

2008/8/28 Cesare Di Mauro <cesare.dimauro@a-tono.com>:
In Haskell this would be (I translate only the list comprehension structure, leaving expressions in Python syntax):

    [stripped | l <- text.split('\n'), let stripped = l.strip(), stripped != '']

Python borrowed 2 out of 3 kinds of list comprehension constructs.

--
Marcin Kowalczyk
qrczak@knm.org.pl
http://qrnik.knm.org.pl/~qrczak/
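The missing `let` clause can be approximated in existing Python with an extra `for` clause over a one-element list. This is a sketch of that idiom, not something proposed in the thread:

```python
text = "\nthis is a multi-line text\t\n\n\t\twith\n\nmuliple lines."

# "for stripped in [l.strip()]" iterates exactly once per line,
# so it acts as a local binding that the later clauses can use.
non_empty = [stripped
             for l in text.split('\n')
             for stripped in [l.strip()]
             if stripped != '']

print(non_empty)
```

Both `l` and `stripped` stay available to the condition and the result expression, which is the property the Haskell form provides.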

Josiah Carlson wrote:
That's just as bad as the original.
I know. I wasn't serious (I should have made that clear).
Please stop offering new syntax.
Do you mean on this particular issue or ever again? So you don't want any new syntax in Python ever again??

On Thu, Aug 28, 2008 at 1:16 PM, Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
This issue. It's just like the "group by" operator that was proposed months ago and rejected. Keep it simple. Additional syntax typically burdens the user with additional mental overhead. Decorators are *still* causing significant consternation among relatively new users who haven't seen them before, as has variable leakage out of for loops and list comprehensions.

- Josiah

Mathias Panzenböck wrote:
To parallel the Haskell-ish example, this should be

    [stripped for l in text.split('\n') stripped as l.strip() if stripped != '']

but the clause has 'as' in the middle instead of at the beginning, making it hard to parse. Haskell used commas

    [stripped for l in text.split('\n'), stripped as l.strip(), if stripped != '']

but I think this would conflict with Python's other comma usage. Most feasible, I think, would be

    [stripped for l in text.split('\n') with stripped as l.strip() if stripped != '']

This corresponds to the multi-statement for loop version

    _ = []
    for l in text.split('\n'):
        stripped = l.strip()
        if stripped != '':
            _.append(stripped)

with 'stripped = l.strip()' replaced by 'with stripped as l.strip()'.

If other use cases were presented that could not be more easily written otherwise, as with the re.split() version, I might at least be neutral on this.

Terry Jan Reedy

On 28 Aug 2008 at 22:37:20, Terry Reedy <tjreedy@udel.edu> wrote:
We already have a "with Expression as Identifier" syntax that is well known and used in Python: why use something different?

    [stripped for l in text.split('\n') with l.strip() as stripped if stripped != '']

will be a much better syntax to parse and acquire for a typical pythonista. ;)

Cheers,
Cesare

On Thu, Aug 28, 2008 at 11:33 PM, Cesare Di Mauro <cesare.dimauro@a-tono.com> wrote:
Just because it exists, doesn't mean that it's "well known and used". Also, don't conflate the need to handle context management (locking, closing files, etc.) with the false perceived need to add temporary assignments in list comprehensions and generator expressions. - Josiah

participants (20)
- Andrew Akira Toulouse
- Bill Janssen
- BJörn Lindqvist
- Blake Winton
- Brett Cannon
- Bruce Leban
- Cesare Di Mauro
- Chris Rebert
- Georg Brandl
- George Sakkis
- Greg Falcon
- Imri Goldberg
- Josiah Carlson
- Leif Walsh
- Marcin 'Qrczak' Kowalczyk
- Mathias Panzenböck
- Neil Toronto
- Ron Adam
- Tarek Ziadé
- Terry Reedy