[Python-ideas] How assignment should work with generators?
steve at pearwood.info
Mon Nov 27 08:55:05 EST 2017
On Mon, Nov 27, 2017 at 12:17:31PM +0300, Kirill Balunov wrote:
> Currently during assignment, when target list is a comma-separated list of
> targets (*without "starred" target*) the rule is that the object (rhs) must
> be an iterable with the same number of items as there are targets in the
> target list. That is, no check is performed on the number of targets
> present, and if something goes wrong the ValueError is raised.
That's a misleading description: ValueError is raised when the number of
targets is different from the number of items. I consider that to be
performing a check on the number of targets.
> To show this on simple example:
> >>> from itertools import count, islice
> >>> it = count()
> >>> x, y = it
> >>> it
For everyone else who was confused by this, as I was, that's not
actually a copy and paste from the REPL. There should be a ValueError
raised after the x, y assignment. As given, it is confusing because it
looks like the assignment succeeded, when in fact it didn't.
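For reference, here is what the session actually does in current CPython (the exact item count consumed is an implementation detail I'm describing from CPython's unpacking code, which fetches one extra item to check for exhaustion):

```python
from itertools import count

it = count()
try:
    x, y = it
except ValueError as e:
    print(e)            # too many values to unpack (expected 2)

# The failed unpacking still consumed items: two for the targets,
# plus one more to check whether the iterator was exhausted.
n = next(it)
print(n)                # 3 in current CPython
```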
> Here the count was advanced two times but assignment did not happen.
Correct, because there was an exception raised.
> I found that in some cases it is too much restricting that rhs must
> have the same number of items as targets. It is proposed that if the
> rhs is a generator or an iterator (better some object that yields
> values on demand), the assignment should be lazy and dependent on the
> number of targets.
I think that's problematic. How do you know what objects that yields
values on demand? Not all lazy iterables are iterators: there are also
lazy sequences like range.
But even if we decide on a simple rule like "iterator unpacking depends
on the number of targets, all other iterables don't", I think that will
be a bug magnet. It will mean that you can't rely on this special
behaviour unless you surround each call with a type check:
    if isinstance(it, collections.abc.Iterator):
        # special case for iterators
        x, y = it
    else:
        # sequences keep the old behaviour
        x, y = it[:2]
> I find this feature to be very convenient for
> interactive use,
There are many things which would be convenient for interactive use that
are a bad idea outside of the interactive environment. Errors which pass
silently are one of them. Unpacking a sequence of 3 items into 2
assignment targets should be an error, unless you explicitly limit it to
only two items.
Sure, sometimes it would be convenient to unpack just two items out of
some arbitrarily large iterator just by writing `x, y = it`. But
other times that would be an error, even in the interactive interpreter.
I don't want Python trying to *guess* whether I want to unpack the
entire iterable or just two items. Whatever tiny convenience there is
from when Python guesses correctly will be outweighed by the nuisance
value of when it guesses wrongly.
> while it remains readable, expected, and expressed in a more compact way.
I don't think it is expected behaviour. It is different from the current
behaviour, so it will be surprising to everyone used to the current
behaviour, annoying to those who like the current behaviour, and a
general inconvenience to those writing code that runs under multiple
versions of Python.
Personally, I would not expect this suggested behaviour. I would be very
surprised, and annoyed, if a simple instruction like:
x, y = some_iterable
behaved differently for iterators and sequences.
> There are some Pros:
> 1. No overhead
No overhead compared to what?
> 2. Readable and not so verbose code
> 3. Optimized case for x,y,*z = iterator
The semantics of that are already set: the first two items are assigned
to x and y, with all subsequent items assigned to z as a list. How will
this change optimize this case? It still needs to run through the
iterator to generate the list.
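The current semantics can be seen directly:

```python
it = iter(range(5))
x, y, *z = it           # the whole iterator is consumed
print(x, y, z)          # 0 1 [2, 3, 4]
```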
> 4. Clear way to assign values partially from infinite generators.
It isn't clear at all. If I have a non-generator lazy sequence like:

    class EvenNumbers:  # Toy example
        def __getitem__(self, i):
            return 2*i

    it = EvenNumbers()  # A lazy, infinite sequence

then `x, y = it` will keep the current behaviour and raise an exception
(since it isn't an iterator), but `x, y = iter(it)` will use the new
behaviour and stop after two items.
So in general, when I'm reading code and I see:
x, y = some_iterable
I have very little idea of which behaviour will apply. Will it be the
special iterator behaviour that stops at two items, or the current
sequence behaviour that raises if there are more than two items?
> 1. A special case of how assignment works
> 2. As with any implicit behavior, hard-to-find bugs
Right. Hard-to-find bugs beats any amount of convenience in the
interactive interpreter. To use an analogy:
"Sure, sometimes my car suddenly shifts into reverse while I'm driving
at 60 kph, sometimes the engine falls out when I go around the corner,
and occasionally the brakes catch fire, but gosh the cup holder makes it
really convenient to drink coffee while I'm stopped at traffic lights!"
> There are several cases with "undefined" behavior:
> 1. Because the items are assigned, from left to right to the corresponding
> targets, should rhs see side effects during assignment or not?
I don't understand what you mean by this. Surely the behaviour should be
exactly the same as if you wrote:
x, y = islice(it, 2)
What would you do differently, and why?
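With islice the side effects are well defined: exactly two items are consumed from the underlying iterator, and nothing more. A quick check:

```python
from itertools import count, islice

it = count()
x, y = islice(it, 2)
n = next(it)
print(x, y, n)          # 0 1 2 -- exactly two items were consumed
```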
> 2. Should this work only for generators or for any iterators?
I don't understand why you are even considering singling out *only*
generators. A generator is a particular implementation of an iterator. I
can write:

    def gen():
        yield 1; yield 2; yield 3

    it = gen()

or I can write:

    it = iter([1, 2, 3])

and the behaviour of `it` should be identical.
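Both spellings produce an iterator, and today they unpack identically:

```python
from collections.abc import Iterator

def gen():
    yield 1; yield 2; yield 3

a, b, c = gen()
x, y, z = iter([1, 2, 3])

assert isinstance(gen(), Iterator) and isinstance(iter([1, 2, 3]), Iterator)
assert (a, b, c) == (x, y, z) == (1, 2, 3)
```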
> 3. Is it Pythonic to distinguish what is on the rhs during assignment, or
> it contradicts with duck typing (goose typing)?
I don't understand this question.
> In many cases it is possible to do this right now, but in too verbose way:
> >>> x, y = islice(gen(), 2)
I don't think that is excessively verbose.
But maybe we should consider allowing slice notation on arbitrary
iterators:

    x, y = it[:2]
I have not thought this through in any serious detail, but it seems to
me that if the only problem here is the inconvenience of using islice(),
we could add slicing to iterators. I think that would be better than
having iterators and other iterables behave differently.
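A rough sketch of what slicing support might look like, written as a wrapper class rather than a language change (the name `SliceableIterator` is my own invention, not part of any proposal):

```python
from itertools import islice

class SliceableIterator:
    """Hypothetical wrapper adding slice notation to any iterator."""
    def __init__(self, iterable):
        self._it = iter(iterable)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)
    def __getitem__(self, index):
        # Delegate slicing to islice; plain indexing stays unsupported.
        if isinstance(index, slice):
            return islice(self._it, index.start, index.stop, index.step)
        raise TypeError("only slice indexing is supported")

it = SliceableIterator(range(10))
x, y = it[:2]           # consumes exactly two items
n = next(it)
print(x, y, n)          # 0 1 2
```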
Perhaps a better idea might be special syntax to tell the interpreter
you don't want to run the right-hand side to completion. "Explicit is
better than implicit" -- maybe something special like:
x, y, * = iterable
will attempt to extract exactly two items from iterable, without
advancing past the second item. And it could work the same for
sequences, iterators, lazy sequences like range, and any other iterable.
I don't love having yet another meaning for * but that would be better
than changing the standard behaviour of iterator unpacking.
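Until such syntax exists, the proposed semantics can already be emulated with a small helper function (the name `take` is my own, not part of any proposal):

```python
from itertools import count, islice

def take(n, iterable):
    """Extract exactly n items without advancing past them, raising
    ValueError on a too-short iterable -- roughly the semantics
    suggested for `x, y, * = iterable`."""
    items = list(islice(iter(iterable), n))
    if len(items) != n:
        raise ValueError(f"expected {n} items, got {len(items)}")
    return items

x, y = take(2, count())         # works on infinite iterators
a, b = take(2, [1, 2, 3, 4])    # and on ordinary sequences
print(x, y, a, b)               # 0 1 1 2
```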