[Python-ideas] Propagating StopIteration value
Guido van Rossum
guido at python.org
Mon Oct 8 01:36:20 CEST 2012
On Sun, Oct 7, 2012 at 3:43 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> On 7 October 2012 21:19, Guido van Rossum <guido at python.org> wrote:
>> On Sun, Oct 7, 2012 at 12:30 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> On 07.10.12 04:45, Guido van Rossum wrote:
>>>>
>>>> But yes, this was all considered and accepted when PEP 380 was debated
>>>> (endlessly :-), and I see no reason to change anything about this.
>>>
>>> The reason is that when someone uses StopIteration.value for some purpose,
>>> they will lose this value if the iterator is wrapped in itertools.chain
>>> (a quite commonly used technique) or in another standard iterator wrapper.
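A minimal illustration of what gets lost, assuming Python 3.3 semantics
(gen() and return_value() are just illustrative names):

    import itertools

    def gen():
        yield 1
        yield 2
        return 'result'          # stored on StopIteration.value in 3.3+

    def return_value(iterable):
        # Drain an iterable and report the value its StopIteration carried.
        it = iter(iterable)
        while True:
            try:
                next(it)
            except StopIteration as exc:
                return exc.value

    print(return_value(gen()))                   # 'result'
    print(return_value(itertools.chain(gen())))  # None -- chain() drops it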
>>
>> If this is just about itertools.chain() I may see some value in it (but
>> TBH the discussion so far mostly confuses me -- please spend some more
>> time coming up with good examples that show actually useful use cases
>> rather than f() and g() or foo() and bar()).
>>
>> OTOH yield from is not primarily for iterators -- it is for
>> coroutines. I suspect most of the itertools functionality just doesn't
>> work with coroutines.
>
> I think what Serhiy is saying is that although PEP 380 mainly
> discusses generator functions, it has effectively changed the
> definition of what it means to be an iterator for all iterators:
> previously an iterator was just something that yielded values, but now
> it may also return a value. Since the meaning of an iterator has
> changed, functions that work with iterators need to be updated.
I think there are different philosophical viewpoints possible on that
issue. My own perspective is that there is no change in the definition
of iterator -- only in the definition of generator. Note that the
*ability* to attach a value to StopIteration is not new at all.
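For example (a hand-written iterator class, nothing 3.3-specific about it
apart from the .value attribute noted in the comment):

    class UpTo:
        # Hand-written iterator that reports how many items it produced by
        # attaching that count to StopIteration -- possible long before PEP 380.
        def __init__(self, n):
            self.n = n
            self.i = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self.i >= self.n:
                raise StopIteration(self.i)
            self.i += 1
            return self.i

    it = UpTo(3)
    try:
        while True:
            next(it)
    except StopIteration as exc:
        print(exc.args[0])   # 3 (also available as exc.value on 3.3+)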
> Before PEP 380, filter(lambda x: True, obj) returned an object that was
> the same kind of iterator as obj (it would yield the same values). Now
> the "kind of iterator" that obj is depends not only on the values that
> it yields but also on the value that it returns. Since filter does not
> pass on the same return value, filter(lambda x: True, obj) is no
> longer the same kind of iterator as obj. The same considerations apply
> to many other functions such as map, itertools.groupby,
> itertools.dropwhile.
There are other differences between iterators and generators that are
not preserved by the various forms of "iterator algebra" that can be
applied -- in particular, non-generator iterators don't support
send(). I think it's perfectly valid to view generators as a special
kind of iterator, with properties that aren't preserved by applying
generic iterator operations to them (like itertools or filter()).
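A quick way to see the difference, with a made-up coroutine-style generator:

    def accumulator():
        # A generator used as a coroutine: values sent in are added to a total.
        total = 0
        while True:
            amount = yield total
            if amount is not None:
                total += amount

    gen = accumulator()
    next(gen)                          # prime the coroutine
    print(gen.send(5))                 # 5 -- generators have send()

    wrapped = filter(lambda x: True, accumulator())
    print(hasattr(wrapped, 'send'))    # False -- the wrapper is a plain iterator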
> Cases like itertools.chain and zip are trickier since they each act on
> multiple underlying iterables. Probably chain should return a tuple of
> the return values from each of its iterables.
That's one possible interpretation, but I doubt it's the most useful one.
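If someone really wants that behavior, it is easy enough to write;
chain_with_values() below is hypothetical, not a proposal for itertools:

    def chain_with_values(*iterables):
        # Like itertools.chain(), but collects each iterable's return value
        # (via "yield from") and returns them as a tuple.
        results = []
        for it in iterables:
            results.append((yield from it))
        return tuple(results)

    def numbered(n):
        yield n
        return n * 10

    def demo():
        values = yield from chain_with_values(numbered(1), numbered(2))
        print(values)        # (10, 20)

    list(demo())             # yields [1, 2] and prints (10, 20)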
> This feature was new in Python 3.3, which was released a week ago,
It's been in alpha/beta/candidate for a long time, and PEP 380 was
first discussed in 2009.
> so it is not widely used yet, but it has uses that have nothing to do
> with coroutines.
Yes, as a shortcut for "for x in <iterator>: yield x". Note that the
for-loop ignores the value in the StopIteration -- would you want to
change that too?
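Concretely (inner() is just a stand-in):

    def inner():
        yield 1
        return 'done'

    def with_yield_from():
        result = yield from inner()   # the return value comes through
        print('inner returned:', result)

    def with_for_loop():
        for x in inner():             # StopIteration's value is discarded here
            yield x

    list(with_yield_from())           # prints: inner returned: done
    list(with_for_loop())             # the return value is simply gone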
> As an example of how you could use it, consider parsing a
> file that can contain #include statements. When an #include
> statement is encountered, we need to insert the contents of the
> included file. This is easy to do with a recursive generator. The
> example uses the return value of the generator to keep track of which
> line is being parsed in relation to the flattened output file:
>
> def parse(filename, output_lineno=0):
>     with open(filename) as fin:
>         for input_lineno, line in enumerate(fin):
>             if line.startswith('#include '):
>                 subfilename = line.split()[1]
>                 output_lineno = yield from parse(subfilename, output_lineno)
>             else:
>                 try:
>                     yield parse_line(line)
>                 except ParseLineError:
>                     raise ParseError(filename, input_lineno, output_lineno)
>                 output_lineno += 1
>     return output_lineno
Hm. This example looks constructed to prove your point... It would be
easier to count the output lines in the caller. Or you could use a
class to hold that state. I think it's just a bad habit to start using
the return value for this purpose. Please use the same approach as you
would before 3.3, using "yield from" just as the shortcut I mentioned
above.
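One way such a class could look -- just a sketch, reusing parse_line(),
ParseLineError and ParseError from your example:

    class IncludeParser:
        # The output line counter lives on the instance instead of being
        # threaded through StopIteration values.
        def __init__(self):
            self.output_lineno = 0

        def parse(self, filename):
            with open(filename) as fin:
                for input_lineno, line in enumerate(fin):
                    if line.startswith('#include '):
                        subfilename = line.split()[1]
                        yield from self.parse(subfilename)
                    else:
                        try:
                            yield parse_line(line)
                        except ParseLineError:
                            raise ParseError(filename, input_lineno,
                                             self.output_lineno)
                        self.output_lineno += 1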
> When writing code like the above that depends on being able to get the
> value returned from an iterator, it is no longer possible to freely
> mix utilities like filter, map, zip, itertools.chain with the
> iterators returned by parse() as they no longer act as transparent
> wrappers over the underlying iterators (by not propagating the value
> attached to StopIteration).
I see that as one more argument for not using the return value here...
> Hopefully, I've understood Serhiy and the docs correctly (I don't have
> access to Python 3.3 right now to test any of this).
I don't doubt it. But I think you're fighting windmills.
--
--Guido van Rossum (python.org/~guido)