[Python-ideas] PEP 479: Change StopIteration handling inside generators
Wolfgang Maier
wolfgang.maier at biologie.uni-freiburg.de
Fri Nov 21 11:19:40 CET 2014
On 21.11.2014 00:51, Guido van Rossum wrote:
> On Thu, Nov 20, 2014 at 2:39 PM, Wolfgang Maier
> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>
> [...]
> Hmm, I'm not convinced by these toy examples, but I did inspect some
> of my own code for incompatibility with the proposed change. I found
> that there really is only one recurring pattern I use that I'd have
> to change and that is how I've implemented several file parsers. I
> tend to write them like this:
>
>     def parser(file_object):
>         while True:
>             title_line = next(file_object)  # will terminate after the last record
>
>             try:
>                 # read and process the rest of the record here
>             except StopIteration:
>                 # this record is incomplete
>                 raise OSError('Invalid file format')
>             yield processed_record
>
> There's probably something important missing from your examples. The
> above while-loop is equivalent to
>
>     for title_line in io_object:
>         ...
>
My reason for not using a for loop here is that I'm reading from a file
in which several lines form a record. I first read the title line of a
record (and if there is no further record in the file, I want the
parser generator to terminate/return). If a title line is read
successfully, I then read the record's body lines inside a try/except,
i.e. where it says "# read and process the rest of the record here" in
my shortened code, I am actually calling next() several more times to
retrieve the body lines (and while reading these lines, an unexpected
StopIteration from the IOWrapper is treated as a file format error).
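
To make the pattern concrete, here is a minimal sketch of how such a
parser could be written so that it also works once the PEP's behavior
is in effect. The name parse_records and the record layout (one title
line followed by exactly two body lines) are assumptions made up for
this illustration, not my real parsers:

```python
import io


def parse_records(file_object):
    """Yield (title, body) tuples from an iterator of lines.

    Hypothetical layout: one title line followed by exactly two body lines.
    """
    while True:
        try:
            title_line = next(file_object)
        except StopIteration:
            # No further record: end the generator explicitly instead of
            # letting StopIteration escape from the generator body.
            return
        try:
            first = next(file_object)
            second = next(file_object)
        except StopIteration:
            # The input ended in the middle of a record.
            raise OSError('Invalid file format') from None
        yield title_line.strip(), (first.strip(), second.strip())


records = list(parse_records(io.StringIO("rec1\na\nb\nrec2\nc\nd\n")))
```

The only change compared to the shortened code is the explicit
try/except around the title-line read, which turns "no more records"
into a plain return.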
I realize that I could also use a for loop and still call
next(file_object) inside it, but I find this a potentially confusing
pattern that I'm trying to avoid by using the while loop and all
explicit next(). Compare:

    for title_line in file_object:
        record_body = next(file_object)
        # in reality record_body is generated using several next calls
        # depending on the content found in the record body while it's read
        yield (title_line, record_body)

vs

    while True:
        title_line = next(file_object)
        body = next(file_object)
        yield (title_line, body)

To me, the for loop version suggests that the content of file_object is
read in line by line by the loop (even though the name title_line tries
to hint that this is not true). Only when I inspect the loop body do I
see that further items are retrieved with next() and, thus, skipped in
the for iteration. The while loop, on the other hand, makes every
retrieval explicit by showing all the next() calls in the loop body.
Would you agree that this is justification enough for while instead of
for, or is it only me who thinks that a for loop makes the code read
awkwardly?
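
As a runnable version of the comparison, the following sketch wraps
both loops in generators and feeds them the same lines. The function
names and the sample data are made up for illustration, and the while
variant ends with an explicit return so that it also terminates cleanly
under the PEP's behavior:

```python
def pairs_for(lines):
    # The for statement fetches the title line; next() fetches the body,
    # so the loop only ever "sees" every second item.
    for title_line in lines:
        body = next(lines)
        yield (title_line, body)


def pairs_while(lines):
    # Every retrieval is spelled out as an explicit next() call.
    while True:
        try:
            title_line = next(lines)
        except StopIteration:
            return  # no more records
        body = next(lines)
        yield (title_line, body)


data = ["title1", "body1", "title2", "body2"]
# Both generators need a real iterator, hence iter(data).
from_for = list(pairs_for(iter(data)))
from_while = list(pairs_while(iter(data)))
```

For complete records both variants produce the same pairs; the
difference is only in how visible the extra reads are.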
> If you're okay with getting RuntimeError instead of OSError for an
> undesirable StopIteration, you can just drop the except clause altogether.
Right, I could do this if the PEP-described behavior were in effect today.
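
A small sketch of what that would look like, assuming an interpreter in
which the PEP's behavior is active (the default from Python 3.7 on, as
far as I know). The name leaky_parser and the one-line input are
hypothetical:

```python
def leaky_parser(lines):
    while True:
        try:
            title_line = next(lines)
        except StopIteration:
            return  # regular end of input
        # No except clause here: if the record is incomplete, the
        # StopIteration from next() escapes the generator body and is
        # converted into RuntimeError under the PEP.
        body = next(lines)
        yield (title_line, body)


caught = None
try:
    list(leaky_parser(iter(["title without body"])))
except RuntimeError as exc:
    caught = exc
```

The caller then sees a RuntimeError for the truncated record instead of
a generator that silently stops early.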