[Python-ideas] Generator unpacking

Andrew Barnert abarnert at yahoo.com
Mon Feb 15 17:27:30 EST 2016


On Feb 15, 2016, at 01:21, Brendan Barnwell <brenbarn at brenbarn.net> wrote:
> 
>> On 2016-02-15 00:39, Andrew Barnert via Python-ideas wrote:
>>> On Feb 15, 2016, at 00:12, Paul Moore <p.f.moore at gmail.com> wrote:
>>> 
>>> IMO, the other downside is that the semantic difference between
>>> 
>>> a, b, ... = value
>>> 
>>> and
>>> 
>>> a, b, *_ = value
>>> 
>>> is very subtle, and (even worse) only significant if value is an
>>> iterable as opposed to a concrete container such as a list.
>> 
>> You mean iterator, not iterable. And being "concrete" has nothing to
>> do with it--a dict view, a memoryview, a NumPy slice, etc. aren't
>> iterators any more than a list is.
>> 
>> This is exactly why I think we need an official term like
>> "collection" for "iterables that are not iterators" (or "iterables
>> whose __iter__ doesn't return self" or similar). People struggling to
>> come up with terms end up confusing themselves--not just about
>> wording, but about actual concepts. As proven below:
> 
>    I still don't think that is at all what we need, as this example shows.  Whether the value is an iterable or an iterator is not relevant.

You're making the same mistake that this idea was meant to cure: iterators _are_ iterables, so "an iterable or an iterator" isn't a real distinction, which makes your statement vacuously true. But whether the value is a collection/reiterable/whatever or an iterator is exactly what's relevant.
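
A minimal sketch of that difference. The proposed `a, b, ... = value` form doesn't exist, so the last part only approximates it with itertools.islice, on the assumption that it would take two values and leave the rest unconsumed:

```python
from itertools import islice

values = [1, 2, 3, 4, 5]

# With a list (a non-iterator iterable), star-unpacking leaves the list intact.
a, b, *_ = values
list_after = list(values)        # all five items are still there

# With an iterator, star-unpacking drains it to build the discarded list.
it = iter(values)
a, b, *_ = it
iterator_after = list(it)        # nothing is left to iterate

# Assumed behavior of the proposed form: take two items, leave the tail alone.
it = iter(values)
a, b = islice(it, 2)
remaining = list(it)             # the unconsumed tail
```

For the list, all three spellings behave the same as far as the caller can tell. Only with the iterator does it matter which one you wrote.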

> Whether the iterable's iterator is self is not relevant.  What is relevant is the difference in *behavior* --- namely, whether you can rewind, restart, or otherwise retrieve already-obtained values from the object, or whether advancing it is an irreversible operation and there is no way to get old values except by storing them yourself.

An iterator returns self from iter, which means advancing the iterator is consuming self, so there is no way to get the old values again.

A non-iterator iterable may return a different object from iter, which means advancing the iterator doesn't have to consume self; you can get the old values again just by calling iter to get a new iterator at the start.
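
As a short illustration of the two cases above, using a plain list and its iterator (the variable names are just for this sketch):

```python
numbers = [10, 20, 30]
it = iter(numbers)

# An iterator returns itself from iter(); the list returns a separate object.
iterator_returns_self = iter(it) is it
list_returns_self = iter(numbers) is numbers

# Advancing the iterator consumes it ...
first = next(it)

# ... but the list can always hand out a fresh iterator at the start.
fresh = iter(numbers)
first_again = next(fresh)

# The old iterator carries on from where it was; it cannot go back.
second = next(it)
```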

A collection, or reiterable, or whatever, could be defined as an iterable that _doesn't_ return self. Or, nearly-equivalently, it could be defined as an iterable that returns a new iterator over the same values (unless mutated in between iter calls), borrowing the distinction that's already in the docs for defining dict and dict view semantics.
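
The first definition can be sketched as a predicate. The name `is_reiterable` is hypothetical, not an existing function or ABC, and the check only tests identity; it can't verify the "same values each time" part of the second definition:

```python
from collections.abc import Iterable

def is_reiterable(obj):
    """Hypothetical check: an iterable whose __iter__ doesn't return self."""
    # Calling iter() on a real iterator is harmless (it just returns self),
    # but this does call __iter__ once on whatever object is passed in.
    return isinstance(obj, Iterable) and iter(obj) is not obj

checks = {
    "list": is_reiterable([1, 2, 3]),
    "dict view": is_reiterable({"a": 1}.keys()),
    "range": is_reiterable(range(3)),
    "list iterator": is_reiterable(iter([1, 2, 3])),
    "generator": is_reiterable(x for x in ()),
    "map object": is_reiterable(map(str, [])),
    "int": is_reiterable(42),
}
```

The first three come out true and the rest false, which matches how people would informally sort those types.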

Of course, any useful definition would leave pathological types that are neither iterator nor collection (e.g., an object that returns a new iterator each time, but whose iterators destructively modify self), or that do qualify but misleadingly so (I can't think of any examples). There may also be cases where it isn't clear: is a collection in shared memory not a collection because some other process can change its values, or does that count as "unless mutated"? That probably depends on how and why your app is using that shared memory. But that isn't a problem; we're not trying to come up with a definition that could be used to write type-theoretic behavior proofs for Python semantics, just something that's useful in practice for discussing almost all real Python programs with other humans.
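
A sketch of the first kind of pathological type, using a made-up class `DrainingQueue`: each call to iter returns a new object, so it isn't an iterator, but iterating destroys the contents, so it isn't reusable either:

```python
class DrainingQueue:
    """Hypothetical example: a new iterator each time, but iteration empties self."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        # Generator method: every call creates a distinct generator object,
        # yet each one pops items out of the shared underlying list.
        while self._items:
            yield self._items.pop(0)

q = DrainingQueue([1, 2, 3])

returns_new_iterator = iter(q) is not q   # passes the "doesn't return self" test
first_pass = list(q)                      # drains the queue
second_pass = list(q)                     # nothing left the second time
```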

>  In this example, being able to say that value was or was not an iterator or an iterable would in no way help to clarify how the code would behave differently.  Saying that it is an iterable or an iterator is just saying that it has or doesn't have .next() and/or .__iter__() methods that follow certain very broad protocols, but what matters for understanding examples like this is what those methods actually DO.

The iterator protocol defines what those methods do. The __iter__ method returns self, and the __next__ method advances self to return the next value. The fact that these semantics aren't checked (even by the ABC or by mypy) doesn't mean they don't exist or are meaningless.
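
To illustrate, here is a sketch with two made-up classes. `Countdown` follows the protocol; `NotReallyAnIterator` only has the right method names. The ABC check in collections.abc is structural, so it accepts both:

```python
from collections.abc import Iterator

class Countdown:
    """Follows the protocol: __iter__ returns self, __next__ advances self."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

class NotReallyAnIterator:
    """Has the method names but ignores the semantics."""

    def __iter__(self):
        return iter([])    # does not return self

    def __next__(self):
        return 42          # never advances, never stops

well_behaved = Countdown(3)
sloppy = NotReallyAnIterator()

# isinstance only checks that the methods exist, not what they do.
both_pass_abc = isinstance(well_behaved, Iterator) and isinstance(sloppy, Iterator)
follows_protocol = iter(well_behaved) is well_behaved
breaks_protocol = iter(sloppy) is not sloppy
counted = list(Countdown(3))
```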

The iterable protocol, on the other hand, just tells you that __iter__ returns an iterator. Because iterators and collections are both iterable, just knowing that something is an iterable doesn't tell you whether it's reusable. But knowing that something is a collection (assuming we have a well-defined term "collection") would. Which is exactly the point of the proposal.
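
A small example of why "iterable" alone isn't enough for a caller to go on. The helper `spread` is hypothetical; the point is only that it needs two passes over its argument:

```python
def spread(values):
    """Hypothetical two-pass helper: iterates over `values` twice."""
    return max(values) - min(values)

data = [3, 9, 4]

# A list can be iterated twice, so this works.
spread_of_list = spread(data)

# An iterator is exhausted by max(), so min() sees an empty iterable.
try:
    spread(iter(data))
    iterator_worked = True
except ValueError:
    iterator_worked = False
```

Both arguments are iterables, so documenting the parameter as "an iterable" doesn't tell the caller which of the two outcomes to expect.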


