[Python-ideas] Filtered "for" loop with list-comprehension-like syntax
Dan Baker
dbaker3448 at gmail.com
Sat May 21 02:57:23 CEST 2011
One common pattern I run across when parsing plain text data files is
that I want to skip over blank lines when processing. If I wanted to
build a list of all non-blank lines in the file, I could simply do:
    lines = [line for line in input_file if line.strip()]
But as a loop, it almost invariably gets written as:
    for line in input_file:
        if not line.strip():
            continue
        # do the real processing
It seems odd that "for x in y if z" is allowed in comprehensions but
not in a regular for loop. Why not let
    for x in y if z:
        do_stuff(x)
be a shorthand for
    for x in y:
        if not z:
            continue
        do_stuff(x)
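
For comparison, here is a minimal sketch of how close current Python already gets without new syntax: wrap the iterable in a generator expression and loop over that. The sample data and the use of io.StringIO as a stand-in for an open file are assumptions made purely for illustration.

```python
import io

# io.StringIO stands in for an open text file (illustrative data only).
input_file = io.StringIO("first\n\n   \nsecond\n")

processed = []
# The generator expression filters lazily, so blank lines never reach the body.
for line in (l for l in input_file if l.strip()):
    processed.append(line.rstrip("\n"))  # placeholder for the real processing
```

This works, but the loop variable has to be named twice, which is arguably part of what the proposed "for x in y if z:" form would tidy up.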
Similarly, I occasionally have multiple sections that need to be
handled differently. One way to write this is:
    for line in input_file:
        if is_section_delimiter(line):
            break
        do_stuff_1(line)
    for line in input_file:  # this picks up where the last one left off
        if is_section_delimiter(line):
            break
        do_stuff_2(line)
etc.
(This is a little bit of a weird idiom with files since repeated
iteration over them remembers where it left off, at least in 2.7.)
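
A small runnable sketch of the idiom above, under stated assumptions: io.StringIO stands in for an open file, and is_section_delimiter is a hypothetical test that treats a line of three dashes as the delimiter. The resuming behaviour comes from the file object being its own iterator, which is also the case in Python 3.

```python
import io

def is_section_delimiter(line):
    # Hypothetical rule for this sketch: a line containing only "---".
    return line.strip() == "---"

# io.StringIO stands in for an open text file (illustrative data only).
input_file = io.StringIO("a\nb\n---\nc\nd\n")

section_1 = []
for line in input_file:
    if is_section_delimiter(line):
        break  # the delimiter line itself has already been consumed
    section_1.append(line.strip())

section_2 = []
for line in input_file:  # resumes on the line after the delimiter
    if is_section_delimiter(line):
        break
    section_2.append(line.strip())
```

A list would restart from the beginning in the second loop; calling iter() on it once and looping over the resulting iterator both times gives the same resuming behaviour.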
It would be nice to have this shorthand for that idiom:
    for line in input_file while not is_section_delimiter(line):
        do_stuff_1(line)
    for line in input_file while not is_section_delimiter(line):
        do_stuff_2(line)
etc.
This makes it more immediately clear (to me, at least) that it stops
at the end of the section. This could also be added to comprehensions;
it's somewhat tricky to emulate in comprehensions now. I think the
easiest way to do the equivalent of [f(x) for x in y while z] with a
comprehension is
    a = [(f(x) if z else None) for x in y]
    try:
        idx = a.index(None)
    except ValueError:  # no None found
        pass
    else:  # truncate before first None
        a = a[:idx]
but even that fails if None is a potentially valid result of f(x) (or
if you forget to use the try/except block and z was always True), and
it processes the entire list even though it may throw out a sizable
chunk of it immediately after. The only totally safe way I can think
of to do it now is by unpacking it into a loop:
    a = []
    for x in y:
        if not z:
            break
        a.append(f(x))
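
One existing alternative worth weighing against new syntax is itertools.takewhile from the standard library, which stops at the first element that fails a predicate. The sketch below assumes the condition z can be written as a function of x alone; f, z, and the sample data are hypothetical placeholders.

```python
from itertools import takewhile

def f(x):
    # Hypothetical transformation; None would be a perfectly valid result here.
    return x * x

def z(x):
    # Hypothetical condition, expressed as a predicate on the current element.
    return x < 4

y = [1, 2, 3, 10, 2]

# takewhile stops at the first x for which z(x) is false, so f is never
# called on 10 or on the trailing 2, and no sentinel value is needed.
a = [f(x) for x in takewhile(z, y)]
```

Like the explicit loop with break, takewhile consumes the first failing element when y is an iterator such as a file, so a following loop would resume after it. It is less convenient when z depends on state other than x, which is where a "while" clause might still read better.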
I think adding these would make such idioms a little more readable,
but it might not be enough of a gain to justify a syntax addition.
Thoughts?
Dan Baker