[Python-ideas] iterable.__unpack__ method

Terry Reedy tjreedy at udel.edu
Sun Feb 24 01:53:30 CET 2013


On 2/23/2013 6:58 AM, Nick Coghlan wrote:
> On Sat, Feb 23, 2013 at 8:18 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> On Sat, Feb 23, 2013 at 8:14 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>> It would be much easier, and have much the same effect, if unpacking simply
>>> requested the minimum number of items and stopped raising a ValueError if
>>> the iterator has more items. No new protocol is needed. Guido rejected this
>>> as silently masking errors.
>>
>> What if it were done explicitly, though?
>>
>> Currently:
>>>>> a,b,c=range(3)
>>>>> a,b,c=range(4)
>> Traceback (most recent call last):
>>    File "<pyshell#55>", line 1, in <module>
>>      a,b,c=range(4)
>> ValueError: too many values to unpack (expected 3)
>>>>> a,b,c,*d=range(6)
>>>>> a,b,c,* =range(4)
>> SyntaxError: invalid syntax
>>
>> Suppose the last notation were interpreted as "request values for a,
>> b, and c, and then ignore the rest".

I think this is fundamentally the right idea. The 'problem' to be 
solved, such as it is, is that the multiple assignment unpacker, after 
requesting the number of values it needs, requests one more item that 
it does not need, does not want, and will not use. If it gets one, it 
says 'too bad', undoes any work it has done, and raises an error. The 
only point in doing this is to uncover possible bugs. But some people 
say they want to knowingly provide extra values and have that not be 
considered a bug.

The solution, if there is to be one, and if that extra behavior is not 
to be completely undone, is to be able to tell the unpacker to skip the 
extra check. I strongly feel that the right place to do that is on the 
target side. This fixes the problem in one place rather than requiring 
that the solution be duplicated everywhere.
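To make that behavior concrete, here is a small illustrative generator 
(the name 'logged' is mine) that reports each item the unpacker pulls; 
it shows the probing request for one item past what the targets need:

```python
def logged(n):
    # Yield 0..n-1, reporting each item the consumer requests.
    for i in range(n):
        print("yielding", i)
        yield i

a, b, c = logged(3)        # three items, plus one failed next(): fine
try:
    a, b, c = logged(4)    # the probing fourth next() succeeds...
except ValueError as err:
    print(err)             # ...so: too many values to unpack (expected 3)
```

With logged(4), "yielding 3" is printed even though nothing wants that 
item; it is requested purely for the bug check.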

> It is highly unlikely it will ever be interpreted that way, because it
> contradicts the way the unnamed star is used in function headers (that
> is, to indicate the start of the keyword only arguments *without*
> accepting an arbitrary number of positional arguments). If you want to
> ignore excess values in unpacking, a double-underscore is the best
> currently available option (even though it has the downside of not
> working well with infinite iterators or large sequences):

 > >>> a,b,c,*__ =range(4)

If ,* is not acceptable, how about ,** or ,... or ,None or <take your 
pick>. I rather like 'a, b, c, ... =' as it clearly implies that we are 
picking and naming the first three values from three or more; '...' 
clearly cannot be an assignment target.
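For comparison, the explicit spelling available today for 'take three 
and ignore the rest', which unlike *__ also works with infinite 
iterators, is itertools.islice:

```python
import itertools

# a, b, c, *__ = itertools.count()   # would never terminate: *__ must
#                                    # collect the (infinite) remainder

# Slice off exactly the items wanted instead.
a, b, c = itertools.islice(itertools.count(), 3)
print(a, b, c)  # 0 1 2
```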


> However, Alex's original idea of an "unpack protocol" (distinct from,
> but falling back to, the ordinary iterable protocol) definitely has at
> least a few points in its favour.

I strongly disagree as I think the two major points are wrong.

> 1. Iterating over heterogeneous types doesn't really make sense,  but
> unpacking them does. A separate protocol lets a type support unpacking
> assignment without supporting normal iteration.

This claim, repeated in this passage,

 > - this change increases the complexity of the language, by explicitly
 > separating the concept of heterogeneous unpacking from homogeneous
 > iteration. Those *are* two distinct concepts though, so the increased
 > complexity may be justifiable.

attempts to reintroduce 'heterogeneous type' as a fundamental concept 
in the language, after it was removed with the addition of .count and 
.index as tuple methods. Since I consider the pairing 'heterogeneous 
type' to be wrong, especially in Python, I consider this a serious 
regression. Let me repeat some of the previous discussion.

In Python 3, every object is an instance of class object. At that 
level, every collection is homogeneous. At other levels, and for 
particular purposes, any plural collection might be considered 
heterogeneous. That is a function of the objects or values in the 
collection, not of the collection class itself. So I claim that 
'heterogeneous' has little meaning as an absolute attribute of a 
'type'.

In any assignment, targets (most commonly names) are untyped, or have 
type 'object' -- take your pick. So iterable (of objects) is the 
appropriate source.

In any case, 'iteration' and 'unpacking' both mean 'accessing the 
items of a collection one at a time, as individual items rather than 
as part of a collection'. I see no important distinction between them, 
and no justification for complicating the language again.

> 3. As Alex noted, if a function that previously returned a 2-tuple
> wants to start returning a 3-tuple, that's currently a backwards
> incompatible change because it will break unpacking assignment. With
> an unpack protocol, such a function can return an object that behaves
> like a 3-tuple most of the time, but also transparently supports
> unpacking assignment to two targets rather than only supporting three.

This seems to be claiming that it is sensible to change the return 
type of a function to a somewhat incompatible return type. (I am 
obviously including the fixed tuple length in 'return type', as it 
would be explicit in other languages.) I believe many/most/all design 
and interface experts would disagree, and would say it would be better 
to give the new function a new name.

The statement that "that's currently a backwards incompatible change 
because it will break unpacking assignment" is a gross understatement 
that glosses over the fact that returning a 3-tuple instead of a 
2-tuple will break lots of things. Just two examples:

def foo(): return 1, 2
def bar(): return tuple('abc')
foobar = foo() + bar()      # (1, 2, 'a', 'b', 'c')
oof = reversed(foo())       # reversed iterator over (1, 2)

Changing foo to return 1, 2, 'x' will mess up both foobar and oof for 
most uses.
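A runnable sketch of that breakage; the assertions record the values 
before and after the supposedly compatible change to foo:

```python
def foo(): return 1, 2
def bar(): return tuple('abc')

assert foo() + bar() == (1, 2, 'a', 'b', 'c')
assert tuple(reversed(foo())) == (2, 1)

def foo(): return 1, 2, 'x'   # the "compatible" change

assert foo() + bar() == (1, 2, 'x', 'a', 'b', 'c')  # length changed
assert tuple(reversed(foo())) == ('x', 2, 1)        # first item changed
```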

--
 > 2. The protocol could potentially be designed to allow an *iterable*
 > to be assigned to the star target rather than requiring it to be
 > unpacked into a tuple. This could be used not only to make unpacking
 > assignment safe to use with infinite iterators, but also to make it
 > cheaper with large sequences (through appropriate use of
 > itertools.islice in a custom container's __unpack__ implementation)
 > and with virtual sequences like range() objects.

The problem with *x (that it is inefficient, that it is not applicable 
to infinite iterators, and that assigning the iterable directly to x 
when *x is in final position is more likely what one wants anyway) is 
a different issue from the unwanted bug check and exception. Again, I 
think the solution is an explicit notation on the target side. Perhaps 
'**x' or 'x*' or something based on _. If you do not like any of 
those, suggest another.
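Whatever spelling is chosen, the target-side semantics could desugar 
to something like this helper (the name 'take' is hypothetical, purely 
for illustration): request exactly n items, never probe for an 
(n+1)th, and leave the rest of the iterator unconsumed.

```python
import itertools

def take(iterable, n):
    # Return exactly n leading items; do not request an (n+1)th item,
    # so the rest of the iterator is left untouched.
    it = iter(iterable)
    items = tuple(itertools.islice(it, n))
    if len(items) < n:
        raise ValueError(
            "not enough values to unpack (expected %d, got %d)"
            % (n, len(items)))
    return items

a, b, c = take(itertools.count(), 3)   # safe with an infinite iterator
print(a, b, c)  # 0 1 2
```

Too few items still raises ValueError, so the useful half of the bug 
check is kept; only the extra probing request is dropped.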

-- 
Terry Jan Reedy



