[Python-ideas] PEP 479: Change StopIteration handling inside generators

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Fri Nov 21 11:19:40 CET 2014


On 21.11.2014 00:51, Guido van Rossum wrote:
> On Thu, Nov 20, 2014 at 2:39 PM, Wolfgang Maier
> <wolfgang.maier at biologie.uni-freiburg.de
> <mailto:wolfgang.maier at biologie.uni-freiburg.de>>
> wrote:
>
>     [...]
>     Hmm, I'm not convinced by these toy examples, but I did inspect some
>     of my own code for incompatibility with the proposed change. I found
>     that there really is only one recurring pattern I use that I'd have
>     to change and that is how I've implemented several file parsers. I
>     tend to write them like this:
>
>     def parser (file_object):
>          while True:
>              title_line = next(file_object)  # will terminate after the last record
>
>              try:
>                  # read and process the rest of the record here
>              except StopIteration:
>                  # this record is incomplete
>                  raise OSError('Invalid file format')
>              yield processed_record
>
> There's probably something important missing from your examples. The
> above while-loop is equivalent to
>
>      for title_line in io_object:
>          ...
>

My reason for not using a for loop here is that I'm trying to read from
a file where several lines form a record. I read the title line of a
record first, and if there are no more records in the file I want the
parser generator to terminate/return. If a title line is read
successfully, I then read the record's body lines inside a try/except,
i.e., where it says "# read and process the rest of the record here" in
my shortened code, I am actually calling next() several more times to
retrieve the body lines (and while reading these lines, an unexpected
StopIteration from the IOWrapper is considered a file format error).
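
A minimal runnable sketch of the pattern just described may help. It
assumes a hypothetical format in which every record is one title line
followed by exactly two body lines; the real parsers decide how many
body lines to read from the content itself. So that it also runs where
the PEP-described behavior is active, the end-of-file StopIteration from
the title-line read is caught and turned into an explicit return.

```python
import io


def parser(file_object):
    """Yield (title, body_lines) records from an iterator of lines.

    Assumed (hypothetical) format: one title line, then two body lines.
    """
    while True:
        try:
            title_line = next(file_object)
        except StopIteration:
            # No further record: end the generator cleanly.
            return
        try:
            # Stand-in for "read and process the rest of the record".
            body = [next(file_object), next(file_object)]
        except StopIteration:
            # The input ended in the middle of a record.
            raise OSError('Invalid file format')
        yield title_line.rstrip('\n'), [line.rstrip('\n') for line in body]


complete = io.StringIO("t1\na\nb\nt2\nc\nd\n")
records = list(parser(complete))
```

With complete input the generator simply finishes after the last record;
input that stops inside a record body raises OSError instead.
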
I realize that I could also use a for loop and still call
next(file_object) inside it, but I find this a potentially confusing
pattern that I'm trying to avoid by using the while loop with explicit
next() calls throughout. Compare:

for title_line in file_object:
     record_body = next(file_object)
     # in reality record_body is generated using several next calls
     # depending on the content found in the record body while it's read
     yield (title_line, record_body)

vs

while True:
     title_line = next(file_object)
     body = next(file_object)
     yield (title_line, body)

To me, the for loop version suggests that the content of file_object is
read in line by line by the loop (even though the name title_line tries
to hint at this not being true). Only when I inspect the loop body do I
see that further items are retrieved with next() and, thus, skipped in
the for iteration. The while loop, on the other hand, makes the number
of next() calls per record very clear by showing all of them in the
loop body.

Would you agree that this is justification enough for while instead of
for, or is it only me who thinks that a for loop makes the code read
awkwardly?


> If you're okay with getting RuntimeError instead of OSError for an
> undesirable StopIteration, you can just drop the except clause altogether.

Right, I could do this if the PEP-described behavior were in effect today.
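
As an illustration of that suggestion, here is a small sketch of the
parser with the inner except clause dropped. It assumes an interpreter
where the PEP 479 behavior is active (the default in Python 3.7 and
later) and a hypothetical one-title-line, one-body-line record format;
the names parser_no_except and outcome are made up for this example.

```python
import io


def parser_no_except(file_object):
    """Yield (title_line, body) pairs; hypothetical two-line records."""
    while True:
        try:
            title_line = next(file_object)
        except StopIteration:
            # Normal end of input: finish the generator.
            return
        # No try/except here: a StopIteration from this call escapes
        # into the generator body, which PEP 479 turns into RuntimeError.
        body = next(file_object)
        yield title_line, body


truncated = io.StringIO("title only\n")
outcome = "no error"
try:
    list(parser_no_except(truncated))
except RuntimeError:
    outcome = "RuntimeError"
```

Without the PEP-described behavior, the same truncated input would end
the generator silently rather than report the incomplete record.
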



