[Python-ideas] Filtered "for" loop with list-comprehension-like syntax

Dan Baker dbaker3448 at gmail.com
Sat May 21 02:57:23 CEST 2011


One common pattern I run across when parsing plain text data files is
that I want to skip over blank lines when processing. If I wanted to
build a list of all non-blank lines in the file, I could simply do:

lines = [line for line in input_file if line.strip()]

But as a loop, it almost invariably gets written as:

for line in input_file:
   if not line.strip():
      continue
   # do the real processing

It seems odd that "for x in y if z" is allowed in comprehensions but
not in a regular for loop. Why not let

for x in y if z:
   do_stuff(x)

be a shorthand for

for x in y:
   if not z:
      continue
   do_stuff(x)
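
For comparison, something close to this can already be spelled today by putting a generator expression in the for statement. A minimal sketch, assuming a file-like object; process_nonblank and the sample data are made up for illustration:

```python
import io

def process_nonblank(input_file):
    """Collect the stripped non-blank lines of a file-like object."""
    seen = []
    # The generator expression filters lazily, one line at a time,
    # so no intermediate list of lines is built.
    for line in (l for l in input_file if l.strip()):
        seen.append(line.strip())  # stand-in for the real processing
    return seen

# io.StringIO stands in for an open text file in this sketch.
sample = io.StringIO("alpha\n\n   \nbeta\n")
result = process_nonblank(sample)
```

This works, but the loop variable has to be named twice, which is part of why the shorthand above looks attractive.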

Similarly, I occasionally have multiple sections that need to be
handled differently. One way to write this is:
for line in input_file:
   if is_section_delimiter(line):
      break
   do_stuff_1(line)
for line in input_file: # this picks up where the last one left off
   if is_section_delimiter(line):
      break
   do_stuff_2(line)
etc.
(This is a slightly odd idiom: it works because a file object is its
own iterator, so each new loop resumes where the previous one stopped,
at least in 2.7.)

It would be nice to have this shorthand for it:
for line in input_file while not is_section_delimiter(line):
   do_stuff_1(line)
for line in input_file while not is_section_delimiter(line):
   do_stuff_2(line)
etc.
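
The nearest existing spelling I know of is itertools.takewhile applied to a shared iterator. A rough sketch, where the "---" delimiter rule and the read_sections helper are assumptions made up for the example:

```python
import io
from itertools import takewhile

def is_section_delimiter(line):
    # Hypothetical rule for this sketch: a line consisting only of "---".
    return line.strip() == "---"

def read_sections(input_file):
    """Read the first two delimiter-separated sections of a file-like object."""
    def not_delimiter(line):
        return not is_section_delimiter(line)

    lines = iter(input_file)  # one shared iterator, so the second pass resumes
    # takewhile consumes and discards the delimiter line itself, which is
    # the same thing the break-based loops do.
    first = [line.strip() for line in takewhile(not_delimiter, lines)]
    second = [line.strip() for line in takewhile(not_delimiter, lines)]
    return first, second

sections = read_sections(io.StringIO("a\nb\n---\nc\n---\nd\n"))
```

Here sections is (["a", "b"], ["c"]), and the trailing "d" line is left unread in the file.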

This makes it more immediately clear (to me, at least) that it stops
at the end of the section. This could also be added to comprehensions;
it's somewhat tricky to emulate in comprehensions now. I think the
easiest way to do the equivalent of [f(x) for x in y while z] with a
comprehension is
a = [(f(x) if z else None) for x in y]
try:
   idx = a.index(None)
except ValueError: # no None found
   pass
else: # truncate before first None
   a = a[:idx]
but even that fails if None is a potentially valid result of f(x) (or
if you forget the try/except block and z was always True). It also
processes the entire list even though it may throw out a sizable
chunk of it immediately afterward. The only totally safe way I can
think of to do it now is by unpacking it into a loop:
a = []
for x in y:
   if not z:
      break
   a.append(f(x))
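
If the condition z can be written as a function of x, itertools.takewhile gives the same result inside a comprehension without the None sentinel. A small sketch; take_mapped and the pred argument are names invented for this example:

```python
from itertools import takewhile

def take_mapped(f, pred, items):
    """Rough equivalent of the proposed [f(x) for x in items while pred(x)]."""
    # takewhile stops at the first item failing pred, so f is never
    # called on anything after that point.
    return [f(x) for x in takewhile(pred, items)]

squares = take_mapped(lambda x: x * x, lambda x: x < 4, [1, 2, 3, 10, 2])

# None is a safe result here, since no sentinel value is involved.
nones = take_mapped(lambda x: None, lambda x: x < 2, [1, 5, 1])
```

This yields [1, 4, 9] for squares and [None] for nones. It does require importing itertools and wrapping the condition in a function, so it is not as direct as the proposed syntax.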

I think adding these would make such idioms a little more readable,
but it might not be enough of a gain to justify a syntax addition.
Thoughts?

Dan Baker


