[Python-ideas] iterable.__unpack__ method
Terry Reedy
tjreedy at udel.edu
Sun Feb 24 01:53:30 CET 2013
On 2/23/2013 6:58 AM, Nick Coghlan wrote:
> On Sat, Feb 23, 2013 at 8:18 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> On Sat, Feb 23, 2013 at 8:14 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>> It would be much easier, and have much the same effect, if unpacking simply
>>> requested the minimum number of items and stopped raising a ValueError if
>>> the iterator has more items. No new protocol is needed. Guido rejected this
>>> as silently masking errors.
>>
>> What if it were done explicitly, though?
>>
>> Currently:
>> >>> a,b,c=range(3)
>> >>> a,b,c=range(4)
>> Traceback (most recent call last):
>> File "<pyshell#55>", line 1, in <module>
>> a,b,c=range(4)
>> ValueError: too many values to unpack (expected 3)
>> >>> a,b,c,*d=range(6)
>> >>> a,b,c,* =range(4)
>> SyntaxError: invalid syntax
>>
>> Suppose the last notation were interpreted as "request values for a,
>> b, and c, and then ignore the rest".
I think this is fundamentally the right idea. The 'problem' to be
solved, such as it is, is that the multiple assignment unpacker, after
requesting the number of values it needs, requests one more that it does
not need, does not want, and will not use. If it gets one, it says 'too
bad', undoes any work it has done, and raises an error. The only point
in doing this is to uncover possible bugs. But some people say they want
to knowingly provide extra values and have that not be considered a bug.
The solution, if there is to be one, and if that extra behavior is not
to be completely undone, is to be able to tell the unpacker to skip the
extra check. I strongly feel that the right place to do that is on the
target side. This fixes the problem in one place rather than requiring
that the solution be duplicated everywhere.
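To make the 'extra probe' concrete, here is a small runnable sketch with
today's tools. The itertools.islice call is the closest current spelling of
'skip the check': it truncates the source so the unpacker's extra request
never reaches the underlying iterator.

```python
from itertools import islice

consumed = []

def source():
    for i in range(5):
        consumed.append(i)   # record every item the unpacker actually draws
        yield i

# Plain "a, b = source()" raises ValueError ("too many values to unpack"),
# and to detect the error it must request a third, unwanted item.
# Truncating on the source side skips that check explicitly:
a, b = islice(source(), 2)
```

After this runs, consumed is [0, 1]: only the two needed items were drawn.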
> It is highly unlikely it will ever be interpreted that way, because it
> contradicts the way the unnamed star is used in function headers (that
> is, to indicate the start of the keyword only arguments *without*
> accepting an arbitrary number of positional arguments). If you want to
> ignore excess values in unpacking, a double-underscore is the best
> currently available option (even though it has the downside of not
> working well with infinite iterators or large sequences):
> >>> a,b,c,*__ =range(4)
If ,* is not acceptable, how about ,** or ,... or ,None or <take your
pick>. I rather like 'a, b, c, ... =' as it clearly implies that we are
picking and naming the first three values from 3 or more; '...' clearly
cannot be an assignment target.
> However, Alex's original idea of an "unpack protocol" (distinct from,
> but falling back to, the ordinary iterable protocol) definitely has at
> least a few points in its favour.
I strongly disagree as I think the two major points are wrong.
> 1. Iterating over heterogeneous types doesn't really make sense, but
> unpacking them does. A separate protocol lets a type support unpacking
> assignment without supporting normal iteration.
This claim, repeated in the following passage,
> - this change increases the complexity of the language, by explicitly
> separating the concept of heterogeneous unpacking from homogeneous
> iteration. Those *are* two distinct concepts though, so the increased
> complexity may be justifiable.
attempts to reintroduce 'heterogeneous type' as a fundamental concept in
the language, after the concept was dropped when .count and .index were
added as tuple methods. Since I consider the pairing 'heterogeneous
type' to be wrong, especially in Python, I consider this to be a serious
regression. Let me repeat some of the previous discussion.
In Python 3, every object is an instance of class object. At that level,
every collection is homogeneous. At other levels, and for particular
purposes, any plural collection might be considered heterogeneous. That
is a function of the objects or values in the collection, and not of the
collection class itself. So I claim that 'heterogeneous' has not much
meaning as an absolute attribute of a 'type'.
In any assignment, targets (most commonly names) are untyped, or have
type 'object' -- take your pick. So an iterable (of objects) is the
appropriate source.
In any case, 'iteration' and 'unpacking' both mean 'accessing the items
of a collection one at a time, as individual items rather than as part
of a collection'. I do not see any important distinction at all and no
justification for complexifying the language again.
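For concreteness, the proposal under discussion amounts to something like the
following sketch. The name __unpack__ and its (min_count, max_count)
signature are hypothetical, taken from this thread and not from any real
Python version; the helper function stands in for what the interpreter would
do internally.

```python
def unpack(obj, n):
    """Hypothetical unpack protocol: try __unpack__, else fall back to iteration."""
    special = getattr(type(obj), "__unpack__", None)
    if special is not None:
        values = tuple(special(obj, n, n))
    else:
        # Ordinary iterable protocol, with the usual strict length check.
        values = tuple(obj)
    if len(values) != n:
        raise ValueError(f"expected {n} values, got {len(values)}")
    return values

class Point3:
    """Behaves like a 3-tuple but also unpacks to fewer targets."""
    def __init__(self, x, y, z):
        self._xyz = (x, y, z)
    def __unpack__(self, min_count, max_count):
        # Supply only as many coordinates as the target list asks for.
        return self._xyz[:max_count]

x, y = unpack(Point3(1, 2, 3), 2)
```

An object without __unpack__ goes through the fallback branch and keeps
today's strict behavior.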
> 3. As Alex noted, if a function that previously returned a 2-tuple
> wants to start returning a 3-tuple, that's currently a backwards
> incompatible change because it will break unpacking assignment. With
> an unpack protocol, such a function can return an object that behaves
> like a 3-tuple most of the time, but also transparently supports
> unpacking assignment to two targets rather than only supporting three.
This seems to claim that it is sensible to change the return type of
a function to a somewhat incompatible return type. (I am obviously
including fixed tuple length in 'return type', as would be explicit in
other languages.) I believe many/most/all design and interface experts
would disagree and would say it would be better to give the new function
a new name.
The statement "that's currently a backwards incompatible change because
it will break unpacking assignment." is a gross understatement that
glosses over the fact that returning a 3-tuple instead of a 2-tuple will
break lots of things. Just two examples:
def foo(): return 1,2
def bar(): return tuple('abc')
foobar = foo() + bar()
oof = reversed(foo())
Changing foo to return 1,2,'x' will mess up both foobar and oof for most
uses.
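The two examples can be run directly to see the breakage; an __unpack__
method would not help here, since neither site is an unpacking assignment
(the reversed() results are wrapped in tuple() only to make them easy to
compare).

```python
def foo(): return 1, 2
def bar(): return tuple('abc')

foobar = foo() + bar()
oof = tuple(reversed(foo()))
assert foobar == (1, 2, 'a', 'b', 'c')
assert oof == (2, 1)

# The supposedly compatible change silently alters both results:
def foo(): return 1, 2, 'x'
assert foo() + bar() == (1, 2, 'x', 'a', 'b', 'c')
assert tuple(reversed(foo())) == ('x', 2, 1)
```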
--
> 2. The protocol could potentially be designed to allow an *iterable*
> to be assigned to the star target rather than requiring it to be
> unpacked into a tuple. This could be used not only to make unpacking
> assignment safe to use with infinite iterators, but also to make it
> cheaper with large sequences (through appropriate use of
> itertools.islice in a custom container's __unpack__ implementation)
> and with virtual sequences like range() objects.
The problem with *x (that it is inefficient, not applicable to infinite
iterators, and that assigning the iterable directly to x when *x is in
final position is more likely what one wants anyway) is a different
issue from the unwanted bug check and exception. Again, I think the
solution is an explicit notation on the target side. Perhaps '**x' or
'x*' or something based on _. If you do not like any of those, suggest
another.
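For what it is worth, the behavior such a notation would request can already
be spelled on the source side (a sketch, not a proposal): islice takes
exactly the needed values, and the iterator itself serves as the 'rest'.

```python
from itertools import islice

it = iter(range(10**9))      # a huge (or effectively unbounded) source
a, b, c = islice(it, 3)      # take exactly the three needed values
rest = it                    # bind the iterator itself; no giant tuple built
# Compare: a, b, c, *rest = range(10**9) would materialize ~10**9 items.
```

The first value drawn from rest afterwards is 3, confirming nothing was
consumed beyond the three targets.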
--
Terry Jan Reedy