Mailman 3 How assignment should work with generators? - Python-ideas

newer
Re: [Python-ideas] [Python-Dev]...

How assignment should work with generators?

Kirill Balunov

Nov. 27, 2017

9:17 a.m.

Currently during assignment, when target list is a comma-separated list of targets (*without "starred" target*) the rule is that the object (rhs) must be an iterable with the same number of items as there are targets in the target list. That is, no check is performed on the number of targets present, and if something goes wrong the ValueError is raised. To show this on simple example:

...

...
...
from itertools import count, islice it = count() x, y = it it count(3)

Here the count was advanced two times but assignment did not happen. I found that in some cases it is too much restricting that rhs must have the same number of items as targets. It is proposed that if the rhs is a generator or an iterator (better some object that yields values on demand), the assignmenet should be lazy and dependent on the number of targets. I find this feature to be very convenient for interactive use, while it remains readable, expected, and expressed in a more compact code. There are some Pros: 1. No overhead 2. Readable and not so verbose code 3. Optimized case for x,y,*z = iterator 4. Clear way to assign values partially from infinite generators. Cons: 1. A special case of how assignment works 2. As with any implicit behavior, hard-to-find bugs There several cases with "undefined" behavior: 1. Because the items are assigned, from left to right to the corresponding targets, should rhs see side effects during assignment or not? 2. Should this work only for generators or for any iterators? 3. Is it Pythonic to distinguish what is on the rhs during assignment, or it contradicts with duck typing (goose typing)? In many cases it is possible to do this right now, but in too verbose way:

...

...
...
x, y = islice(gen(), 2)

But it seems for me that:

...

...
...
x, y = gen()

Looks the same, easy to follow and not so verbose. Another case, which is a pitfall, but will be possible:

...

...
...
x, y = iter([0,1,2,3,4]) # Now it is Ok x 1 y 2

But:

...

...
...
x, y = [0,1,2,3,4] # raises ValueError

Any thoughts? With kind regards, -gdg

Attachments:

attachment.htm (text/html — 4.0 KB)

Show replies by date

Chris Angelico

November 2017

9:40 a.m.

On Mon, Nov 27, 2017 at 8:17 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

This, AIUI, is the nub of the proposal. I don't like the proposed solution, but if there's some alternative way to say "and then ignore the rest", that would be much safer. You said:

...

3. Optimized case for x,y,*z = iterator

It has to be semantically different from that, though. Consider these two generators: def gen1(): for _ in range(5): print("Yielding!") yield "spam" yield "ham" def gen2(): yield 1 yield 2 yield 3 while True: yield 4 If you use "x, y, *z = gen1()", you'll trigger all the prints and completely consume the generator. With gen2(), you'll get an infinite loop. Both of those are semantically different from the islice behaviour, which would consume only that part that you're looking for. What would be far safer is a special syntax that makes it clear that you are not assigning those extra values anywhere. Something along these lines has been proposed a number of times, and I think it's probably about time a PEP was written up, if only to get rejected. Here are a few syntaxes that I believe have been proposed at various times: x, y = islice(iter, 2) # status quo x, y = iter # your proposal x, y, = iter # omit last destination x, y, * = iter # unpack into nothing x, y, ... = iter # assigning to Ellipsis x, y, *... = iter # as above but clearly sequencing And there are a few others too. Every one of them has its downsides, none is perfect. (Personally, I think one of the Ellipsis options is likely the best, or perhaps the least-bad.) Want to spearhead the PEP? ChrisA

Kirill Balunov

10:45 a.m.

2017-11-27 12:40 GMT+03:00 Chris Angelico <rosuav@gmail.com>:

...

On Mon, Nov 27, 2017 at 8:17 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...
In many cases it is possible to do this right now, but in too verbose way:

...
...
...
x, y = islice(gen(), 2)

But it seems for me that:

...
...
...
x, y = gen()

Looks the same, easy to follow and not so verbose.

This, AIUI, is the nub of the proposal.

Yes, it is. For me, x,y = gen() underlines the lazy nature of generators.

...

I don't like the proposed solution, but if there's some alternative way to say "and then ignore the rest", that would be much safer.

Of course, everyone has a subjective assessment of what is good and what is bad. But here I am in something agree with you that at the present time some alternative way would be much safer. But if started from scratch, it seems to me natural to emphasize the nature of generators and iterators.

...

You said:

...
3. Optimized case for x,y,*z = iterator

It has to be semantically different from that, though. Consider these two generators:

def gen1(): for _ in range(5): print("Yielding!") yield "spam" yield "ham"

def gen2(): yield 1 yield 2 yield 3 while True: yield 4

If you use "x, y, *z = gen1()", you'll trigger all the prints and completely consume the generator. With gen2(), you'll get an infinite loop. Both of those are semantically different from the islice behaviour, which would consume only that part that you're looking for.

The idea is not to consume generator completely, but to get enough values to bind to x and y. It should be equivalent to islice(iter, 2), and perceiveв as "bind x, y, where you don't need values for z at all". x, y = islice(iter, 2) # status quo

...

x, y = iter # your proposal

As I wrote above, I would like to see them equivalent x, y, = iter # omit last destination

...

I don't like this, it is hard to notice this nuance. It is valid to have trailing comma (and I find it to be a brilliant feature to have (x,y) equivalent to (x,y,)). This reminds me of ` in Python2, although I never used it and start my journey with Python3.

...

x, y, * = iter # unpack into nothing

Maybe, I like it.

...

x, y, ... = iter # assigning to Ellipsis x, y, *... = iter # as above but clearly sequencing

Yes, it is nice to see Ellipsis as zero-length deques (or throw away container), it can be used also in the middle of list of targets. What I don't like about last three examples, that they imply by their form to be some kind of iteration which must completely consume the generator, which is opposite to the proposed idea. Moreover, this will not work with infinite generators, falling into infinite loop.

...

And there are a few others too. Every one of them has its downsides, none is perfect. (Personally, I think one of the Ellipsis options is likely the best, or perhaps the least-bad.) Want to spearhead the PEP?

To be honest, I'm quite new to the entire ecosystem of Python. Therefore, someone can perceive this as an ordinary attempt by a novice to change everything that is bad in his opinion. In addition, it seems to me that these changes, if approved, can not be made earlier than Python3.8, so I would like to get some feedback first. Nevertheless these words should not be taken as a renouncement, I would be happy and very interested in writing PEP and implementing it. But not to get stuck I need some support and guidelines from interested dev. With kind regards, -gdg

Chris Angelico

11:02 a.m.

On Mon, Nov 27, 2017 at 9:45 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

In terms of language proposals, you can't just say "don't need values for"; the semantics have to be EITHER "consume and discard" OR "don't consume". We already have a perfectly good way of spelling "consume and discard": x, y, _ = iter following the convention that a single underscore means "don't really care" (eg "for _ in range(3)"). So this proposal is about not consuming an iterator.

...

Since this has to be about non-consumption of the generator/iterator, Ellipsis cannot be a zero-length deque. Thus this syntax would have to be restricted to the *last* entry, and it then means "don't check for more elements". The assignment "x, y = it" is roughly equivalent to: try: _iter = iter(it) x = next(_iter) y = next(_iter) except StopIteration: raise ValueError else: try: next(_iter) except StopIteration: pass # good, we got the right number of elements else: raise ValueError The proposed semantics, if I understand you correctly, are: try: _iter = iter(it) x = next(_iter) y = next(_iter) except StopIteration: raise ValueError # no "else" clause, we're done here And I think this would be a good thing to be able to spell conveniently. The only questions are: what spelling is the best, and is it sufficiently useful to justify the syntax?

...

Yes; now that we're into alphas for Python 3.7, it's not going to land until 3.8. That's fine. The PEP process is basically a way of gathering all the arguments for-and-against into a single, coherent document, rather than having them scattered all over the python-ideas email archive. The PEP author has the final say on what is the thrust of the proposal, and then Guido (or his delegate) will decide to accept or reject the PEP. If it's accepted, the change will be made; if it's not, the PEP remains in the repository as a permanent record of the proposal. That way, the *next* person to come along and suggest something can pick up from where this discussion left off, rather than starting fresh. Start by perusing PEP 1, and the template in PEP 12: https://www.python.org/dev/peps/pep-0001/ https://www.python.org/dev/peps/pep-0012/ The PEP editors (myself included) are here to help you; don't hesitate to reach out with questions. ChrisA

Kirill Balunov

12:31 p.m.

...

In terms of language proposals, you can't just say "don't need values for"; the semantics have to be EITHER "consume and discard" OR "don't consume". We already have a perfectly good way of spelling "consume and discard":

x, y, _ = iter

You mean ( x, y, *_ = iter ) ? Since this has to be about non-consumption of the generator/iterator,

...

Ellipsis cannot be a zero-length deque. Thus this syntax would have to be restricted to the *last* entry, and it then means "don't check for more elements".

Yes, you are right to the *last* entry. (*last* depends on proposed syntax (spelling)).

...

The proposed semantics, if I understand you correctly, are:

try: _iter = iter(it) x = next(_iter) y = next(_iter) except StopIteration: raise ValueError # no "else" clause, we're done here

Yes, "roughly" this semantics is proposed, with some assumptions on _iter = iter(it). As I can see at the moment, these cases should behave differently:

...

...
...
x, y = [1,2,3,4] # must raise ValueError x, y = iter([1,2,3,4]) # should work

But at the same time, it violates current situation. So maybe, as you have said we need special syntax. I will think about it.

...

Start by perusing PEP 1, and the template in PEP 12:

https://www.python.org/dev/peps/pep-0001/ https://www.python.org/dev/peps/pep-0012/

The PEP editors (myself included) are here to help you; don't hesitate to reach out with questions.

Thank you! With kind regards, -gdg

Paul Moore

12:39 p.m.

On 27 November 2017 at 12:31, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

I would find this confusing. Consider where you don't have literals: def f(vals): x, y = vals data = [1,2,3,4] f(data) data = iter(data) f(data) Having the two calls behave differently would be a recipe for errors as someone refactors the calling code. Paul

Kirill Balunov

12:59 p.m.

2017-11-27 15:39 GMT+03:00 Paul Moore <p.f.moore@gmail.com>:

...

I can not completely disagree with you, but we all adults here. My first proposal was about generators only, but they are very similar to iterators in their behavior. Whatever it was with this syntax, there will be no difference: def f(vals): x, y = vals data = [1,2,3,4] f(data) data = (i for i in data) f(data) With kind regards, -gdg

Chris Angelico

1:47 p.m.

On Mon, Nov 27, 2017 at 11:31 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

Uhh, yeah, that's what I meant. Sorry. Anyhow, point is, there IS a syntax for that, so we don't need another.

...

That's the part I disagree with, but if you're the PEP author, you can make the recommendation be anything you like. However, one very strong piece of advice: it's easier to get a proposal accepted if the backward compatibility section simply says "the proposed notation is a SyntaxError in current versions of Python". Changing the semantics of currently legal code requires that you demonstrate that the current semantics are, in some way, faulty or buggy. ChrisA

Steven D'Aprano

2:13 p.m.

On Mon, Nov 27, 2017 at 03:31:38PM +0300, Kirill Balunov wrote:

...

I *completely disagree* that they should behave differently. That would be a radical change to the current equivalency between iterators and other iterables. Of course iterators support next() (and a few other things), while iterables (sequences and others) support slicing, __getitem__, and so forth. But when it comes to iteration, they behave exactly the same in all ways that I can think of: for x in iterable: ... list(iterable) iter(iterable) func(*iterable) and most importantly for this discussion, iterable unpacking: a, b, c = *iterable They all work the same, regardless of whether `iterable` is an iterator, a generator, a list, a tuple, a range object, a custom lazy sequence. Sure, there are a few differences: iterators generally cannot be restarted or rewound, while lazy sequences might be, and eager sequences like lists can be. You can't peek into an arbitrary iterator without consuming the value. But as far as iteration itself goes, they are all the same. -- Steve

Daniel Moisset

1:44 p.m.

On 27 November 2017 at 06:40, Chris Angelico <rosuav@gmail.com> wrote:

...

Just to clear the list, this one (trailing comma) would be ambiguous/backward incompatible for the 1 variable case: x, = iter which is a relatively common idiom and is expected to raise an error if the iterator has trailing elements. -- Daniel F. Moisset - UK Country Manager - Machinalis Limited www.machinalis.co.uk <http://www.machinalis.com> Skype: @dmoisset T: + 44 7398 827139 1 Fore St, London, EC2Y 9DT Machinalis Limited is a company registered in England and Wales. Registered number: 10574987.

Chris Angelico

2:01 p.m.

On Tue, Nov 28, 2017 at 12:44 AM, Daniel Moisset <dmoisset@machinalis.com> wrote:

...

Correct. For that and other reasons, I am not in favour of either of these two proposals. And the status quo is noisy and has duplicated information (you have to match the ", 2" to the number of assignment targets). I would support any syntax that (a) is currently illegal, (b) reads reasonably well, and (c) can't be TOO easily confused with something else. Assigning to Ellipsis (with or without a star) is my current preferred, but I'd happily support others that do the same job. ChrisA

Steven D'Aprano

2:46 p.m.

On Tue, Nov 28, 2017 at 01:01:01AM +1100, Chris Angelico wrote:

...

Er, not really. How about: a, b = islice(iterable, 3, 10, 4) A somewhat unusual case, to be sure, but still legal.

...

Honestly, I don't see this as such a big deal that Python needs to support it at all: maybe +0.25 on the idea of non-consuming iterable unpacking. islice does the job. If we have it at all, it is yet another special syntax to learn, and I'm not convinced that the benefit outways the cost of learning yet more Perlish magic syntax for a marginal use-case. Especially if the syntax looks like grit on Tim's monitor. On the other hand, if we did get this, I would prefer magic syntax over a backwards incompatible change. On balance, given that * is already used for at least 11 different things already[1], I'd actually prefer the grit on the monitor. Perhaps x, y, ... = iterable to indicate non-consuming iterable unpacking. Or maybe x, y, / = iterable since in some sense this is the opposite of unpacking -- the whole point is to NOT unpack the remaining items. And the slash kind of looks like cutting off the process from continuing: unpack two items, then stop. ... maybe +0.5 with the slash syntax *wink* [1] Multiplication, exponentiation, sequence replication, *varargs, **keyword varargs, regexes, globs, import wild cards, iterable unpacking in function calls, extended iterable unpacking in assignments, dict unpacking -- have I missed any? -- Steve

Chris Angelico

4:03 p.m.

On Tue, Nov 28, 2017 at 1:46 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Well, sure. But the situation that's equivalent to the proposal here is simply islice(iterable, 2). There's no proposal to have something that assigns the way you're slicing there, and there's certainly no proposal to abolish islice. ChrisA

Greg Ewing

9:13 p.m.

Chris Angelico wrote:

...

x, y, * = iter # unpack into nothing

I'm surprised this isn't already allowed. It seems like the One Obvious Way to me. -- Greg

Guido van Rossum

9:18 p.m.

On Mon, Nov 27, 2017 at 1:13 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

...

I'm not that surprised. While it appears to rhyme with the use of a lone '*' in function signatures, it would actually mean the opposite: in signatures it means "don't allow any more". These opposite meanings would cause some confusion. -- --Guido van Rossum (python.org/~guido)

Kirill Balunov

9:44 p.m.

While I was more optimistic when I proposed this idea, moving on I gradually become less and less optimistic. I did not underestimate how much this would change the semantics. At present, if the lhs is comma-separated list of targets, the rhs is evaluated, and only then if they match the assignment happens. Here it depends on the number of targets in lhs, and should be evaluated lazily. So someone too perlish clever can assume that with the proposed syntax:

...

This is bad and I do not like it, but I do not see any serious reasons why it should not be allowed. In case of Ellipsis they also should trigger special behavior. While I like this feature of lazy assignment, may be it becomes more special than it deserves. With kind regards, -gdg

Greg Ewing

5:07 a.m.

Kirill Balunov wrote:

...

There's no need for that to happen. It can still unpack all the values it needs before performing any assignments. -- Greg

Greg Ewing

10:15 p.m.

Guido van Rossum wrote:

...

That's the usage that's out of step with the rest -- all the others can be seen as some form of wildcard. So it's already confusing, and I don't think adding another wildcard meaning would make things any worse. -- Greg

Steven D'Aprano

12:31 a.m.

On Tue, Nov 28, 2017 at 11:15:56AM +1300, Greg Ewing wrote:

...

How does "stop iterating here" equate to a wildcard? We already have a "wildcard" for iterable unpacking, using the extended iterable unpacking syntax: x, y, *z = iterable which unpacks the first two items into x and y and the rest into z. This is the opposite: *stop* unpacking, so that after x and y are unpacked the process stops. I don't see how this is conceptually a wild card. -- Steve

Greg Ewing

5:15 a.m.

Steven D'Aprano wrote:

...

How does "stop iterating here" equate to a wildcard?

The * means "I don't care what else the sequence has in it". Because I don't care, there's no need to iterate any further. -- Greg

Steven D'Aprano

6:31 a.m.

On Tue, Nov 28, 2017 at 06:15:47PM +1300, Greg Ewing wrote:

...

C Anthony Risinger

7:06 a.m.

On Nov 28, 2017 12:32 AM, "Steven D'Aprano" <steve@pearwood.info> wrote: On Tue, Nov 28, 2017 at 06:15:47PM +1300, Greg Ewing wrote:

...

I'll grant you that. But I don't see how that relates to being a wildcard. I'm not seeing the connection. I mean, you wouldn't interpret from module import * to mean "I don't care what's in the module, so don't bother importing anything" would you? So the concept of wildcard here seems to be the opposite to its use here: - in imports, it means "import everything the module offers"; - in extended iterable unpacking, it means "collect everything"; both of which (to me) seem related to the concept of a wildcard; but in this proposed syntax, we have x, y, * = iterable which means the opposite to "collect everything", instead meaning "collect nothing and stop". Anyway, given that * = is visually ambiguous with *= I don't think this syntax is feasible (even though * = is currently a syntax error, or at least it is in 3.5). If not already considered, what if the RHS had to be explicitly unpacked? Something like: a, b, c = *iterator Which would essentially be: a, b, c = (*iterator,) This enables lazy assignment by default but `*` can force complete expansion (and exact matching) of the RHS. It's a breaking change, but it does have a straightforward fix (simply wrap and unpack any relevant RHS). Thanks, -- C Anthony

Kirill Balunov

7:44 p.m.

2017-11-28 10:06 GMT+03:00 C Anthony Risinger <c@anthonyrisinger.com>:

...

While I find your suggestions very close to my vision and the initial proposal, which I still like. I saw enough of the discussion to realize that by now it is already impossible.

...

It's a breaking change, but it does have a straightforward fix (simply wrap and unpack any relevant RHS

Although I have never used Python 2, the idea to distinguish fixed-sized and something lazy, even for Python 4, reminds me of the transition from str-unicode to the present state of affairs, but with much higher impact.To be honest, I do not like some aspects of how Python 2 issue has been resolved (especially bytes part) but it is another topic. With kind regards, -gdg

Guido van Rossum

7:50 p.m.

On Tue, Nov 28, 2017 at 11:44 AM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

Since Python 4 came up, I'd like to make something clear. Python 4 is *not* going to be a release where we break compatibility with a whole bunch of things at once. Basically if you think you'll need to wait for Python 4 to get your favorite change to the language, you can forget it. You need to come up with a plan to introduce the change without breaking existing code or at least a clear deprecation schedule. -- --Guido van Rossum (python.org/~guido)

Kirill Balunov

8:08 p.m.

2017-11-28 22:50 GMT+03:00 Guido van Rossum <guido@python.org>:

...

Oh no, I was misunderstood. I think that we have already come to some consensus which syntax can be discussed. In any case, I did not want to produce unnecessary noise, I apologize. With kind regards, -gdg

Steven D'Aprano

1:55 p.m.

On Mon, Nov 27, 2017 at 12:17:31PM +0300, Kirill Balunov wrote:

...

Currently during assignment, when target list is a comma-separated list of targets (*without "starred" target*) the rule is that the object (rhs) must be an iterable with the same number of items as there are targets in the target list. That is, no check is performed on the number of targets present, and if something goes wrong the ValueError is raised.

That's a misleading description: ValueError is raised when the number of targets is different from the number of items. I consider that to be performing a check on the number of targets.

...

To show this on simple example:

...
...
...
from itertools import count, islice it = count() x, y = it it count(3)

For everyone else who was confused by this, as I was, that's not actually a copy and paste from the REPL. There should be a ValueError raised after the x, y assignment. As given, it is confusing because it looks like the assignment succeeded, when in fact it didn't.

...

Here the count was advanced two times but assignment did not happen.

Correct, because there was an exception raised.

...

I found that in some cases it is too much restricting that rhs must have the same number of items as targets. It is proposed that if the rhs is a generator or an iterator (better some object that yields values on demand), the assignmenet should be lazy and dependent on the number of targets.

I think that's problematic. How do you know what objects that yields values on demand? Not all lazy iterables are iterators: there are also lazy sequences like range. But even if we decide on a simple rule like "iterator unpacking depends on the number of targets, all other iterables don't", I think that will be a bug magnet. It will mean that you can't rely on this special behaviour unless you surround each call with a type check: if isinstance(it, collections.abc.Iterator): # special case for iterators x, y = it else: # sequences keep the old behaviour x, y = it[:2]

...

I find this feature to be very convenient for interactive use,

There are many things which would be convenient for interactive use that are a bad idea outside of the interactive environment. Errors which pass silently are one of them. Unpacking a sequence of 3 items into 2 assignment targets should be an error, unless you explicitly limit it to only two items. Sure, sometimes it would be convenient to unpack just two items out of some arbitrarily large iterator just be writing `x, y = it`. But other times that would be an error, even in the interactive interpreter. I don't want Python trying to *guess* whether I want to unpack the entire iteratable or just two items. Whatever tiny convenience there is from when Python guesses correctly will be outweighed by the nuisance value of when it guesses wrongly.

...

while it remains readable, expected, and expressed in a more compact code.

I don't think it is expected behaviour. It is different from the current behaviour, so it will be surprising to everyone used to the current behaviour, annoying to those who like the current behaviour, and a general inconvenience to those writing code that runs under multiple versions of Python. Personally, I would not expect this suggested behaviour. I would be very surprised, and annoyed, if a simple instruction like: x, y = some_iterable behaved differently for iterators and sequences.

...

There are some Pros: 1. No overhead

No overhead compared to what?

...

2. Readable and not so verbose code 3. Optimized case for x,y,*z = iterator

The semantics of that are already set: the first two items are assigned to x and y, with all subsequent items assigned to z as a list. How will this change optimize this case? It still needs to run through the iterator to generate the list.

...

4. Clear way to assign values partially from infinite generators.

It isn't clear at all. If I have a non-generator lazy sequence like: # Toy example class EvenNumbers: def __getitem__(self, i): return 2*i it = EvenNumbers() # A lazy, infinite sequence then `x, y = it` will keep the current behaviour and raise an exception (since it isn't an iterator), but `x, y = iter(it)` will use the new behaviour. So in general, when I'm reading code and I see: x, y = some_iterable I have very little idea of which behaviour will apply. Will it be the special iterator behaviour that stops at two items, or the current sequence behaviour that raises if there are more than two items?

...

Cons: 1. A special case of how assignment works 2. As with any implicit behavior, hard-to-find bugs

Right. Hard-to-find bugs beats any amount of convenience in the interactive interpreter. To use an analogy: "Sure, sometimes my car suddenly shifts into reverse while I'm driving at 60 kph, sometimes the engine falls out when I go around the corner, and occasionally the brakes catch fire, but gosh the cup holder makes it really convenient to drink coffee while I'm stopped at traffic lights!"

...

There several cases with "undefined" behavior: 1. Because the items are assigned, from left to right to the corresponding targets, should rhs see side effects during assignment or not?

I don't understand what you mean by this. Surely the behaviour should be exactly the same as if you wrote: x, y = islice(it, 2) What would you do differently, and why?

...

2. Should this work only for generators or for any iterators?

I don't understand why you are even considering singling out *only* generators. A generator is a particular implementation of an iterator. I can write: def gen(): yield 1; yield 2; yield 3 it = gen() or I can write: it = iter([1, 2, 3]) and the behaviour of `it` should be identical.

...

3. Is it Pythonic to distinguish what is on the rhs during assignment, or it contradicts with duck typing (goose typing)?

I don't understand this question.

...

In many cases it is possible to do this right now, but in too verbose way:

...
...
...
x, y = islice(gen(), 2)

I don't think that is excessively verbose. But maybe we should consider allowing slice notation on arbitrary iterators: x, y = it[:2] I have not thought this through in any serious detail, but it seems to me that if the only problem here is the inconvenience of using islice(), we could add slicing to iterators. I think that would be better than having iterators and other iterables behave differently. Perhaps a better idea might be special syntax to tell the interpreter you don't want to run the right-hand side to completion. "Explicit is better than implicit" -- maybe something special like: x, y, * = iterable will attempt to extract exactly two items from iterable, without advancing past the second item. And it could work the same for sequences, iterators, lazy sequences like range, and any other iterable. I don't love having yet another meaning for * but that would be better than changing the standard behaviour of iterator unpacking. -- Steve

Chris Angelico

2:14 p.m.

On Tue, Nov 28, 2017 at 12:55 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Nah, far easier: x, y = iter(it) since that'll be a no-op in the first case, and trigger new behaviour in the second. However, I don't like this behaviour-switch. I'd much rather have actual syntax.

...

Exactly.

...

I think the point here (correct me if I'm wrong?) is that it takes work to probe the iterator to see if there's a third item, so grabbing just the first two items is simply *doing less work*. It's not doing MORE work (constructing an islice object, pumping it, then discarding it) - it's simply skipping the check that it would otherwise do.

...

Maybe 'optimized case for "x, y, *_ = iterator" where you then never use _ and it has no side effects'? But that could be worded better.

...

I do think islice is verbose, but the main problem is that you have to match the second argument to the number of assignment targets. Slice notation is an improvement, but it still has that same problem. But perhaps this should be added to the list of options for the PEP.

...

That's one of the options that I mentioned, as it's been proposed in the past. The problem is that it depends on internal whitespace to distinguish it from augmented assignment; granted, there's no way to use "*=" with multiple targets (or even in the single-target case, you can't do "x,*=it" with the comma in it), but that's still a readability problem. ChrisA

Kirill Balunov

2:50 p.m.

2017-11-27 17:14 GMT+03:00 Chris Angelico <rosuav@gmail.com>:

...

Nah, far easier:

x, y = iter(it)

Yes, you are right.

...

...
2. Readable and not so verbose code

...
3. Optimized case for x,y,*z = iterator

The semantics of that are already set: the first two items are assigned to x and y, with all subsequent items assigned to z as a list. How will this change optimize this case? It still needs to run through the iterator to generate the list.

Maybe 'optimized case for "x, y, *_ = iterator" where you then never use _ and it has no side effects'? But that could be worded better.

Yes, you did not need to consume and then to throw out _, and in other cases not to hang in an endless loop.

...

I do think islice is verbose, but the main problem is that you have to match the second argument to the number of assignment targets. Slice notation is an improvement, but it still has that same problem.

But perhaps this should be added to the list of options for the PEP.

Inconvenience is that in both cases: islice and iter[:2], you should specify the exact number of assignment targets.

...

That's one of the options that I mentioned, as it's been proposed in the past. The problem is that it depends on internal whitespace to distinguish it from augmented assignment; granted, there's no way to use "*=" with multiple targets (or even in the single-target case, you can't do "x,*=it" with the comma in it), but that's still a readability problem.

Your suggestion using Ellipsis at the moment seems to me the most readable, like: x, ... = iterable x, y, ... = iterable But I have not summed up yet what pitfalls can be on this path. I really do not like to use "starred" targets in any way: x, y, * = iterable x, y, *... Because any "starred" target implies consuming or collecting, and contradicts with the proposed behavior. With kind regards, -gdg

Koos Zevenhoven

3:23 p.m.

On Mon, Nov 27, 2017 at 4:50 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

I really do not like to use "starred" targets in any way:

x, y, * = iterable

This one won't work, because it looks like in-place multiplication (__imul__) which clashes with the above for a single name as assignment target: x *= 2 # "x = x * 2"

...

Consuming does not contradict with your proposal, but maybe you mean *fully consuming*. I think you are proposing partial or incremental consumption of the rhs. —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

Steven D'Aprano

3:35 p.m.

On Tue, Nov 28, 2017 at 01:14:40AM +1100, Chris Angelico wrote:

...

Not necessarily a no-op. __iter__ might have side-effects. Of course if you can guarantee that you ONLY have iterators, then there's no need to test for an iterator. But it isn't always appropriate to convert sequences to an iterator. And besides, if you're going to call iter(), that's not that much shorter than islice() (assuming you've already imported it, of course). You only save five characters or so. But the big problem here is that iterable unpacking would be conceptually split into "iterator unpacking" and "all other iterables unpacking". I think that's unnecessary complication.

...

I do think islice is verbose, but the main problem is that you have to match the second argument to the number of assignment targets.

Yes, there is a certain amount of redundancy in having to specify the number of items in a slice, using either syntax: a, b = sequence[:2] a, b = islice(iterable, 2) but it is minimal duplication, hardly the sort of thing DRY is concerned with: http://www.artima.com/intv/dry.html and there is an equally important principle that is satisfied by being explicit about the number of items you want. (In the face of ambiguity, refuse the temptation to guess.) If you get the number wrong, you will find out immediately. And the fix is trivial. In this case, there is a small but real benefit to counting the assignment targets and being explicit about the number of items to slice. Consider an extension to this "non-consuming" unpacking that allowed syntax like this to pass silently: a, b = x, y, z That ought to be a clear error, right? I would hope you don't think that Python should let that through. Okay, now we put x, y, z into a list, then unpack the list: L = [x, y, z] a, b = L That ought to still be an error, unless we explicity silence it. One way to do so is with an explicit slice: a, b = L[:2] This isn't repeating yourself, it isn't really duplicated or redundant information. It is telling the interpreter, and more importantly the reader, that you want exactly two items out of a list that could contain any arbitrary number of items and that you are planning to bind them to exactly two targets. Same goes if we use islice(iterable) instead. Another way would be special syntax to say you want non-consuming unpacking, swapping out the "ambiguity" Zen for the "explicitly silencing errors" Zen. So I really don't see this as anything more than, at best, a fairly minor piece of syntactic sugar. It doesn't really add anything to the language, or allow us to do anything cool we couldn't do before. -- Steve

Chris Angelico

4:05 p.m.

On Tue, Nov 28, 2017 at 2:35 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

I absolutely agree with this for the default case. That's why I am ONLY in favour of the explicit options. So, for instance: a, b = x, y, z # error a, b, ... = x, y, z # valid (evaluates and ignores z) ChrisA

Paul Moore

4:23 p.m.

On 27 November 2017 at 16:05, Chris Angelico <rosuav@gmail.com> wrote:

...

Agreed, only explicit options are even worth considering (because of backward compatibility if for no other reason). However, the unpacking syntax is already complex, and hard to search for. Making it more complex needs a strong justification. And good luck in doing a google search for "..." if you saw that code in a project you had to maintain. Seriously, has anyone done a proper investigation into how much benefit this proposal would provide? It should be reasonably easy to do a code search for something like "=.*islice", to find code that's a candidate for using the proposed syntax. I suspect there's very little code like that. I'm -1 on this proposal without a much better justification of the benefits it will bring. Paul

C Anthony Risinger

5:48 p.m.

This proposal resonates with me. I've definitely wanted to use unpacking to crank an iterator a couple times and move on without exhausting the iterator. It's a very natural and intuitive meaning for unpacking as it relates to iterators. In my mind, this ask is aligned with, and has similar motivation to, lazy zip(), map(), and keys() in Python 3. Syntax support for unpacking as it stands today is not very conducive to iterables and conflicts with a widespread desire to use more of them. Couple thoughts: * Perhaps existence of `__len__` should influence unpacking? There is a semantic difference (and typically a visual one too) between 1-to-1 matching a fixed-width sequence/container on the RHS to identifiers on the LHS, even if they look similar (ie. "if RHS has a length make it fit, otherwise don't"). * (related to above) Perhaps the "rigidity"(?) of both RHS and LHS should influence unpacking? If they are both fixed-width, expect exact match. If either is variable-width, then lazily unravel until a different problem happens (eg. LHS is fixed-width but RHS ran out of values... basically we always unravel lazily, but keep track of when LHS or RHS become variable, and avoid checking length if they do). * Python 4 change as the language moves towards lazy iterators everywhere? If `__len__` influenced behavior like mentioned above, then a mechanical fix to code would simply be `LHS = tuple(*RHS)`, similar to keys(). While I like the original proposal, adding basic slice support to iterables is also a nice idea. Both are independently useful, eg. `gen.__getitem__(slice())` delegates to islice(). This achieves the goal of allowing meaningful unpacking of an iterator window, using normal syntax, without breaking existing code. The fixed/variable-width idea makes the most sense to me though. This enables things like:

...

...
...
a, b, c = (1, *range(2, 100), 3) (1, 2, 3)

Since both sides are not explicitly sized unpacking is not forcibly sized either. Expecting LHS/RHS to exactly match 100% of the time is the special case here today, not the proposed general unpacking rules that will work well with iterators. This change also has a Lua-like multiple return value feel to it that appeals to me. Thanks, On Nov 27, 2017 10:23 AM, "Paul Moore" <p.f.moore@gmail.com> wrote:

...

Greg Ewing

9:49 p.m.

C Anthony Risinger wrote:

...

-1. There's a convention that an iterator can implement __len__ to provide a hint about the number of items it will return (useful for preallocating space, etc.) It would be very annoying if such an iterator behaved differently from other iterators when unpacking. Another thing is that the proposed feature will be useful on non-iterator iterables as well, since it saves the overhead of unpacking the rest of the items only to throw them away.

...

While I like the original proposal, adding basic slice support to iterables is also a nice idea.

It's not as nice as it seems. You're asking for __getitem__ to be made part of the iterator protocol, which would be a huge change affecting all existing iterators. Otherwise, it would just be a piecemeal affair. Some iterators would support slicing and some wouldn't, so you couldn't rely on it. -- Greg

Steven D'Aprano

12:47 a.m.

On Tue, Nov 28, 2017 at 10:49:45AM +1300, Greg Ewing wrote:

...

I think you mean __length_hint__ for giving a hint. If an iterator supported __len__ itself, I'd expect it to be exact, not a hint. I'm not sure that its a strong convention, but we certainly could create custom iterators that supported len(). https://www.python.org/dev/peps/pep-0424/

...

There are ways around that. As I said earlier, I'm not going to champion this idea, but for the sake of brainstorming, the interpreter could implement obj[slice] as: if obj has __getitem__: call obj.__getitem__(slice) elif iter(obj) is obj: call itertools.islice(obj, slice) else: raise TypeError or similar. Its not literally necessary to require iterators themselves to support a __getitem__ method in order to add slicing to the iterator protocol. (Similar to the way bool() falls back on testing for non-zero length in the event that __bool__ doesn't exist, or != falls back on calling and NOT'ing __eq__ if __ne__ doesn't exist.) So I think this is solvable if it needs to be solved. -- Steve

C Anthony Risinger

4:39 p.m.

On Mon, Nov 27, 2017 at 3:49 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

...

C Anthony Risinger wrote:

...
* Perhaps existence of `__len__` should influence unpacking? There is a semantic difference (and typically a visual one too) between 1-to-1 matching a fixed-width sequence/container on the RHS to identifiers on the LHS, even if they look similar (ie. "if RHS has a length make it fit, otherwise don't").

-1. There's a convention that an iterator can implement __len__ to provide a hint about the number of items it will return (useful for preallocating space, etc.) It would be very annoying if such an iterator behaved differently from other iterators when unpacking.

Is __len__ a viable option now that __length_hint__ has been identified for hints? IOW, if it's possible to know the full length of RHS ahead of time, and the LHS is also fixed width, then unpack like today else unpack lazily. This does make unpacking behave slightly different if the RHS is `Sized` (per ABC, has __len__), but that doesn't seem too unreasonable. It's different after all. Using Ellipsis, eg. `a, b, ... = it()`, seems like a OK approach too but it's unfortunate we are effectively working around the current force-matching behavior of unpacking... is this the appropriate default? Is there precedence or guidance elsewhere? `a, b, ...` to me says "pull out a and b and throw away the rest", sort of like the spread operator in JS/ECMA. The mere presence of more characters (...) implies something else will *happen* to the remaining items, not that they will be skipped. What about the explicit RHS unpacking idea? Kirill commented on this approach but I'm not sure others did:

...

...
...
a, b = iter([1, 2, 3]); a, b (1, 2)

...

...
...
a, b = *iter([1, 2, 3]); a, b Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: too many values to unpack (expected 2)

This reads very clearly to me that the RHS is expected to be 100% unraveled, and would work with any iterable in the same way. Thanks,

Greg Ewing

9:49 p.m.

C Anthony Risinger wrote:

...

Is __len__ a viable option now that __length_hint__ has been identified for hints?

No, I misremembered that feature, sorry. But I still don't like the idea of changing behaviour depending on whether the RHS "looks like" an iterator or not. I'm not sure how to explain why I feel that way, but I think it's because the proposal would make the behaviour depend on the presence or otherwise of a method that's not actually used by the operation being performed. Unpacking doesn't need to know the length in advance, so it shouldn't care whether there is a __len__ method. -- Greg

Steven D'Aprano

3:55 a.m.

On Thu, Nov 30, 2017 at 10:49:19AM +1300, Greg Ewing wrote:

...

The reason I oppose that is that in all other ways related to iteration, iterators are a perfect substitute for any other iterable: for x in iterable: pass list(iterable) function(*iterable) a, *b, c = iterable a, *b = iterable all behave identically whether iterable is a list or an iterator. This would make the last example, and only that, behave differently depending on whether you pass an iterator or some other iterable. -- Steve

Greg Ewing

10:21 p.m.

C Anthony Risinger wrote:

...

It seems that many people think about unpacking rather differently from the way I do. I think the difference is procedural vs. declarative. To my way of thinking, something like a, b, c = x is a pattern-matching operation. It's declaring that x is a sequence of three things, and giving names to those things. It's not saying to *do* anything to x. With that interpretation, a, b, ... = x is declaring that x is a sequence of at least two items, and giving names to the first two. The ellipsis just means that there could be more items, but we don't want to give them names. On the other hand, some people seem to be interpreting the word "unpack" as in "unpack a suitcase", i.e. the suitcase is empty afterwards. But unpacking has never meant that in Python! If it did, then x = [1, 2, 3] a, b, c = x would leave x == [] afterwards. The only case where unpacking behaves like that is when the rhs is an iterator rather than a sequence, in which case a side effect is unavoidable. The question then is what the side effect should be. I would argue that, since the side effect is something that's not really wanted, it should be as *small* as possible. By that argument, a, b, ... = some_iterator should do as *little* as possible to fulfill what's being asked, i.e. give names to the first two items produced by the rhs. Consuming those two items is unavoidable, but there's no need to consume any more. As for the "extra syntax", we only need it because we've defined the existing unpacking syntax to imply verifying that the rhs has exactly the same length as the pattern. We now want to express patterns which don't impose a length constraint, so we need to write them some other way. -- Greg

Brendan Barnwell

11:08 p.m.

On 2017-11-29 14:21, Greg Ewing wrote:

...

That's an interesting analysis, but I don't think your view is really the right one. It *is* unpacking a suitcase, it's just that *if necessary* the suitcase is constructed just in time for you to unpack it. In other words, the suitcase is not the list [1, 2, 3], but an iterator over this list. This is the same as the behavior for "for" loops: if you do "for item in [1, 2, 3]", the actual thing you're unrolling is an iterator over the list. In some sense the point of the iterable/iterator distinction is to distinguish suitcases (iterators) from things-that-produce-suitcases-on-demand (iterables). It's just that Python syntax (very nicely) allows us to omit the explicit iter() call. The fact that iteration is taking place is specified by the context; that could be a for loop, or it could be multiple assignment targets, but it's iteration all the same.

...

I see your point, but I think that middle ground doesn't really give the benefits of either. If you expect your suitcase to remain unopened, it's pretty cold comfort to find that someone has opened it and taken only your pants and shoes but left the rest. If the side effect isn't wanted, you really need the RHS to be something that isn't affected (i.e., a re-iterable). It does seem that in some cases you may want the iterator to be exhausted, and in others not, but I don't think it's a good idea to try to "hide" the unpacking by limiting the number of iterations. The important difference is between any irreversible unpacking at all, and none at all. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

Steven D'Aprano

4:43 a.m.

On Wed, Nov 29, 2017 at 03:08:43PM -0800, Brendan Barnwell wrote:

...

On 2017-11-29 14:21, Greg Ewing wrote:

...

...
On the other hand, some people seem to be interpreting the word "unpack" as in "unpack a suitcase", i.e. the suitcase is empty afterwards. But unpacking has never meant that in Python! If it did, then

x = [1, 2, 3] a, b, c = x

would leave x == [] afterwards.

The only case where unpacking behaves like that is when the rhs is an iterator rather than a sequence, in which case a side effect is unavoidable. The question then is what the side effect should be.

That's an interesting analysis, but I don't think your view is really the right one. It *is* unpacking a suitcase, it's just that *if necessary* the suitcase is constructed just in time for you to unpack it. In other words, the suitcase is not the list [1, 2, 3], but an iterator over this list.

At the point that you are conjuring from thin air an invisible suitcase that is an exact clone of the original suitcase, in order to unpack the clone without disturbing the original, I think the metaphor is so far from the real-world unpacking of suitcases that it no longer applies. Besides, it's not even correct to say that an invisible suitcase (iterator) is constructed. # Python 3.5 py> dis.dis("a, b, c = [97, 98, x]") 1 0 LOAD_CONST 0 (97) 3 LOAD_CONST 1 (98) 6 LOAD_NAME 0 (x) 9 ROT_THREE 10 ROT_TWO 11 STORE_NAME 1 (a) 14 STORE_NAME 2 (b) 17 STORE_NAME 3 (c) 20 LOAD_CONST 2 (None) 23 RETURN_VALUE Before iterators existed, Python had sequence unpacking going back to at least Python 1.5 if not older, so even if Python used a temporary and invisible iterator *now* that has not always been the case and it might not be the case in the future or in alternate interpreters. Even if Python *sometimes* builds a temporary and invisible iterator, it doesn't do it all the time, and when it does, it is pure implementation, not interface. The interpreter is free to do whatever it likes behind the scenes, but there's no iterator involved in the high-level Python statement: a, b, c = [1, 2, 3] That code involves a list and three assignment targets, that is all.

...

This is the same as the behavior for "for" loops: if you do "for item in [1, 2, 3]", the actual thing you're unrolling is an iterator over the list.

No, the actual thing *I* am unrolling is exactly what I write in my code, which is the list [1, 2, 3]. I don't care what the Python interpreter iterates over internally, so long as the results are the same. It can use an iterator, or unroll the loop at compile-time, or turn it into recursion over a linked list for all I care. As much as possible, we should avoid thinking about implementation details when trying to understand high-level semantics of Python code. Otherwise, our mental models become obsolete when the implementation changes. Or worse, we become incapable of thinking about better implementations (or better high-level semantics!) because the current implementation is so entwined with our understanding of the high-level semantics of the code. If we had allowed the old sequence protocol implementation to take priority over the abstract, high-level concept of iteration, we wouldn't have iterators today. -- Steve

Brendan Barnwell

5:25 a.m.

On 2017-11-29 20:43, Steven D'Aprano wrote:

...

It is not an exact clone of the original suitcase, because the original suitcase is a collection with stable contents (i.e., cannot be exhausted), but the "clone" (the iterator) CAN be exhausted. It iterates over the same *values*, but that doesn't mean it's the same thing.

...

The code only has a list and three assignment targets, but that doesn't mean that that's all it "involves". The expression "a + b" only has two variables and a plus sign, but it involves a call to __add__ which is not explicitly represented. Things like this aren't implementation details. Indeed, they're precisely the opposite: they are a high level specification of an API for how syntax is converted into semantics.

...

Don't you see a bit of irony in arguing based on the compiled bytecode, and then saying you don't care about implementation details? :-) Here is a simpler example: class Foo(object): def __iter__(self): print("You tried to call iter on me!")

...

You can see that iter() is called, even though "exactly what I wrote in the code" is not iter(Foo()) but just Foo(). The "implementation detail" is that this function call is concealed within a bytecode called "UNPACK_SEQUENCE". Another implementation detail is that in your example that bytecode not used, but that's only because you decompiled an expression with a literal list. If you do "x = [1, 2, 3]" and then decompile "a, b, c = x", you will see the UNPACK_SEQUENCE bytecode. These details of the bytecode are implementation details. What is not an implementation detail is the iteration protocol, which is exactly the kind of high-level semantic thing you're talking about. The iteration protocol says that when you go to iterate over something, iter() is called on it, and then next() is called on the result of that call. Because of this, I am loath to pretend that whether "a, b, c = x" is "like unpacking a suitcase" depends on whether x happens to be a list, some other iterable, or some other iterator. The end result in all cases is that the thing that actually doles out the items is an iterator. Sometimes that iterator is connected to a stable base (some kind of collection) that can be re-iterated; sometimes it isn't. But that doesn't change the iteration protocol. The interpreter is not free to do what it likes behind the scenes; an implementation that did not call __iter__ in the above case would be errroneous. __iter__ is part of the interface of any type that defines it. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

Greg Ewing

5:32 a.m.

Brendan Barnwell wrote:

...

I don't think that's right. An iterator created from a sequence is not a container in its own right, it's something that provides access to a container. It's not a suitcase, it's a view of the contents of a suitcase.

...

So you think it's somehow better if he takes *all* of your clothes instead? Here's another way to think about it. If it's acceptable to exhaust the iterator when only the first few items are requested, then you're planning to throw the iterator away afterwards. In that case, what purpose is served by extracting the rest of the items? Remember, the ellipsis explicitly says you don't care how many more items there are. The only reason I can think of is if you're relying on side effects performed by the iterator, which is a pretty obscure way to design code. If the iterator really must be exhausted for some reason, it would be better to be explicit about it, e.g. a, b, ... = spam_iterator for _ in spam_iterator: pass # Mustn't leave any spam in the fridge or it will go bad -- Greg

Steven D'Aprano

4:04 a.m.

On Thu, Nov 30, 2017 at 11:21:48AM +1300, Greg Ewing wrote:

...

I hadn't thought of that interpretation before, but now that Greg mentions it, its so obvious-in-hindsight that I completely agree with it. I think that we should promote this as the "one obvious" interpretation. Obviously as a practical matter, there are some x (namely iterators) where you cannot extract items without modifying x, but in all other cases I think that the pattern-matching interpretation is superiour.

...

Indeed.

...

I'm still in 100% agreement with all of this.

...

To be clear, I'm still luke-warm on the need for this, I think islice is adequate, but if others believe this is a use-case important enough for a syntactic solution, I've come around to making ... my prefered syntax. -- Steve

C Anthony Risinger

December 2017

9:04 p.m.

On Nov 29, 2017 10:09 PM, "Steven D'Aprano" <steve@pearwood.info> wrote:

...

This conversation about suitcases, matching, and language assumptions is interesting. I've realized two concrete things about how I understand unpacking, and perhaps, further explain the dissonance we have here: * Unpacking is destructuring not pattern matching. * Tuple syntax is commas, paren, one, or both. For the former, destructuring, this reply conveys my thoughts verbatim: https://groups.google.com/forum/#!topic/clojure/SUoThs5FGvE "There are two different concerns in what people refer to as "pattern matching": binding and flow-control. Destructuring only addresses binding. Pattern matching emphasizes flow control, and some binding features typically come along for free with whatever syntax it uses. (But you could in principle have flow control without binding.)" The only part of unpacking that is 'pattern matching' is the fact that it blows up spectacularly when the LHS doesn't perfectly match the length of RHS, reversing flow via exception:

...

If Python really supported pattern matching (which I would 100% love! yes please), and unpacking was pattern matching, the above would succeed because zero matches zero. Pattern matching is used extensively in Erlang/Elixir for selecting between various kinds of clauses (function, case, etc), but you also see *significant* use of the `[a, b | _] = RHS` construct to ignore "the remainder" because 99% of the time what you really want is to [sometimes!] match a few things, bind a few others, and ignore what you don't understand or need. This is why destructuring Elixir maps or JS objects never expect (or even support AFAIK) exact-matching the entire object... it would render this incredible feature next to useless! *Destructuring is opportunistic if matching succeeds*. For the latter, tuples-are-commas-unless-they-are-parens :-), I suspect I'm very much in the minority here. While Python is one of my favorite languages, it's only 1 of 10, and I didn't learn it until I was already 4 languages deep. It's easy to forget how odd tuples are because they are so baked in, but I've had the "well, ehm, comma is the tuple constructor... usually" or "well, ehm, you are actually returning 1 tuple... not 2 things" conversation with *dozens* of seasoned developers. Even people professionally writing Python for months or more. Other languages use more explicit, visually-bounded object constructors. This makes a meaningful difference in how a human's intuition interprets the meaning of a new syntax. *Objects start and end but commas have no inherent boundaries*. These two things combined lead to unpacking problems because I look at all assignments through the lens of destructuring (first observation) and unpacking almost never uses parentheses (second observation). To illustrate this better, the following is how my mind initially parses different syntax contrasted with what's actually happening (and thus the correction I've internalized over a decade):

...

The tuple thing in particular takes non-zero time to internalize. I consider it one of Python's warts, attributed to times explained and comparisons with similar languages. Commas at the top-level, with no other construction-related syntax, look like expression groups or multiple returns. You have to already know Python's implicit tuple quirks to rationalize what it's really doing. This helps explain why I suggested the `LHS0, LHS1 = *RHS` syntax, because it would read "expand RHS[0] into RHS[:]". Thanks, -- C Anthony

Greg Ewing

9:51 p.m.

C Anthony Risinger wrote:

...

* Unpacking is destructuring not pattern matching.

We're just arguing about the definition of terms now. The point I was making is that unpacking is fundamentally a declarative construct, or at least that's how I think about it. I used the term "pattern matching" because that's something unambiguously declarative. Terms like "unpacking" and "destructuring" can be misleading to the uninitiated, because they sound like they're doing something destructive to the original object.

...

* Tuple syntax is commas, paren, one, or both.

The only situation where parentheses make a tuple is the case of the 0-tuple. Even then, you could argue that the tuple is really the empty space between the parens, and the parens are only there to make it visible. :-) I agree that this is out of step with mathematical convention, but it does make multiple-value returns look nice. There you don't really want to have to think about the fact that there's a tuple involved. -- Greg

Kirill Balunov

November 2017

9:54 p.m.

2017-11-27 19:23 GMT+03:00 Paul Moore <p.f.moore@gmail.com>:

...

While isclice is something equivalent, it can be used in places like: x, y = seq[:2] x, y, z = seq[:3] With kind regards, -gdg

Paul Moore

10:27 p.m.

On 27 November 2017 at 21:54, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

But in those places, x, y, *_ = seq works fine at the moment. So if the programmer didn't feel inclined to use x, y, *_ = seq, there's no reason to assume that they would get any benefit from x, y, ... = seq either. Paul

Greg Ewing

9:18 p.m.

Chris Angelico wrote:

...

The problem is that it depends on internal whitespace to distinguish it from augmented assignment;

Ah, didn't spot that. I guess the ellipsis is the next best thing then. An alternative would be to require parens: (x, y, *) = z -- Greg

Guido van Rossum

9:24 p.m.

On Mon, Nov 27, 2017 at 1:18 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

...

But that would have the same issue. Is this problem really important enough that it requires dedicated syntax? Isn't the itertools-based solution good enough? (Or failing that, couldn't we add something to itertools to make it more readable rather than going straight to new syntax?) -- --Guido van Rossum (python.org/~guido)

Chris Angelico

9:32 p.m.

On Tue, Nov 28, 2017 at 8:24 AM, Guido van Rossum <guido@python.org> wrote:

...

I don't think there's much that can be done without syntax; the biggest problem IMO is that you need to tell islice how many targets it'll be assigned into. It needs some interpreter support to express "grab as many as you have targets for, leaving everything else behind" without stating how many that actually is. So the question is whether that is sufficiently useful to justify extending the syntax. There are a number of potential advantages and several competing syntax options, and this suggestion keeps coming up, so I think a PEP is warranted. ChrisA

Guido van Rossum

9:49 p.m.

On Mon, Nov 27, 2017 at 1:32 PM, Chris Angelico <rosuav@gmail.com> wrote:

...

OK, that's reasonable, and at first blush the ellipsis proposal looks okay. My PEP queue for Python 3.7 is full though, so I would like to put this off until 3.8. -- --Guido van Rossum (python.org/~guido)

Chris Angelico

9:52 p.m.

On Tue, Nov 28, 2017 at 8:49 AM, Guido van Rossum <guido@python.org> wrote:

...

My PEP queue for Python 3.7 is full though, so I would like to put this off until 3.8.

Yeah, I don't think this could reasonably be raced into 3.7 even if it were critically important, and it's not. 3.8 will be fine. Kirill, do you want to spearhead the discussion? I'm happy to help out. ChrisA

Kirill Balunov

7:48 a.m.

2017-11-28 0:52 GMT+03:00 Chris Angelico <rosuav@gmail.com>:

...

Yes of course! With kind regards, -gdg

Steve Barnes

6 a.m.

On 27/11/2017 21:49, Guido van Rossum wrote:

...

Can we please take a note to ensure any future PEP clearly states which ellipsis (personally I prefer the first) of: - as 3 consecutive full stop characters (U+002E) i.e. ... - the Chicago system of 3 space separated full stops . . . - Unicode Horizontal ellipsis (U+2026) (at least there is a keyboard short cut for this) … - Unicode Midline horizontal ellipsis (U+22EF) ⋯ - any of the other ellipsis characters (https://en.wikipedia.org/wiki/Ellipsis#Computer_representations) As clarifying this early could save a lot of later discussion such as the recent minus, hyphen, underscore debate. -- Steve (Gadget) Barnes Any opinions in this message are my personal opinions and do not reflect those of my employer. --- This email has been checked for viruses by AVG. http://www.avg.com

Serhiy Storchaka

10:17 p.m.

27.11.17 23:24, Guido van Rossum пише:

...

I want to remind PEP 204 and PEP 212. The special purposed syntaxes were proposed to help solving much more common problems. They were rejected in favor of builtins range() and enumerate(). And we are happy with these builtins. The function for solving this problem already exists. It's itertools.islice(). It isn't builtin, but this problem is much less common than use cases for range() and enumerate(). If we don't have special non-function syntax for range(), enumerate(), zip(), itertools.chain(), itertools.count(), itertools.repeat(), etc, I don't think we should have a special syntax for one particular case of using itertools.islice().

Greg Ewing

10:40 p.m.

Guido van Rossum wrote:

...

Is this problem really important enough that it requires dedicated syntax? Isn't the itertools-based solution good enough?

Well, it works, but it feels very clumsy. It's annoying to have to specify the number of items in two places. Also, it seems perverse to have to tell Python to do *more* stuff to mitigate the effects of stuff it does that you didn't want it to do in the first place. Like I said, I'm actually surprised that this doesn't already work. To me it feels more like filling in a piece of functionality that was overlooked, rather than adding a new feature. Filling in a pothole in the road rather than bulding a new piece of road. (Pushing the road analogy maybe a bit too far, the current itertools solution is like digging *more* potholes to make the road bumpy enough that you don't notice the first pothole.)

...

I'm not sure how we would do that. Even if we could, it would still feel clumsy having to use anything from itertools at all. -- Greg

Mike Miller

2:31 a.m.

Hmm, I didn't like the options below because they say to me, "consume everything:" x, y, * = iterable x, y, ... = iterable Believe the question behind the idea was, how to grab a couple items and then *stop?* If the syntax route is chosen, I'd expect something that tells me it is going to stop, like a "full stop" as the period/dot is called in jolly ol' England, e.g.: x, y, . = iterable Not sure about the second comma though. -Mike On 2017-11-27 13:18, Greg Ewing wrote:

...

Steven D'Aprano

6:15 a.m.

On Mon, Nov 27, 2017 at 06:31:28PM -0800, Mike Miller wrote:

...

Sadly, that fails the "syntax should not look like grit on Tim's monitor" test. Ellipsis at least has three pieces of grit in sequence, which makes it more noticable.

...

Not sure about the second comma though.

Without the comma, it will be visually too hard to distinguish from x, y , = iterable -- Steve

electronnn＠gmail.com

6:24 a.m.

How about x, y ?= iterable with ?= being the lazy assignment operator? On Tue, Nov 28, 2017 at 9:45 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Koos Zevenhoven

4:53 p.m.

On Mon, Nov 27, 2017 at 3:55 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

I can see where this is coming from, but I wrote about it in a new thread: "generator vs iterator etc. (was: How assignment should work with generators?)".

...

Making iterators behave like sequences (slicing etc.) introduces various issues including memory considerations and backwards compatibility. That's why the `views` package [1] keeps a clear separations between sequences and iterators. IterABLES are a bit fuzzy here, but they at least should be able to produce an iterator. I should have time to discuss this more at a later point, if needed. —Koos [1] https://github.com/k7hoven/views -- + Koos Zevenhoven + http://twitter.com/k7hoven +

Steven D'Aprano

12:25 a.m.

On Mon, Nov 27, 2017 at 06:53:23PM +0200, Koos Zevenhoven wrote:

...

Making iterators behave like sequences (slicing etc.) introduces various issues including memory considerations and backwards compatibility.

Perhaps. I'm not going to actively champion the idea of supporting slicing for iterators, but the idea I had was for no more than having iterator[start:stop:step] to do *exactly* what itertools.islice(iterator, start, stop, step) does now.

...

I don't think the concept of iterable is fuzzy: they are a superset of iterators. To be precise, an iterable is something which supports iteration, which means it must either: - support the iterator protocol with __iter__ and __next__ raising StopIteration; - or support the classic sequence protocol with __getitem__ raising IndexError at the end of the sequence. I think that's the only two possibilities. That covers (all?) collections, sequences, lists, lazy computed sequences like range, iterators, generators, etc. https://docs.python.org/3/glossary.html#term-iterable -- Steve

Chris Angelico

November 2017

9:40 a.m.

On Mon, Nov 27, 2017 at 8:17 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

This, AIUI, is the nub of the proposal. I don't like the proposed solution, but if there's some alternative way to say "and then ignore the rest", that would be much safer. You said:

...

3. Optimized case for x,y,*z = iterator

Kirill Balunov

10:45 a.m.

2017-11-27 12:40 GMT+03:00 Chris Angelico <rosuav@gmail.com>:

...

On Mon, Nov 27, 2017 at 8:17 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...
In many cases it is possible to do this right now, but in too verbose way:

...
...
...
x, y = islice(gen(), 2)

But it seems for me that:

...
...
...
x, y = gen()

Looks the same, easy to follow and not so verbose.

This, AIUI, is the nub of the proposal.

Yes, it is. For me, x,y = gen() underlines the lazy nature of generators.

...

I don't like the proposed solution, but if there's some alternative way to say "and then ignore the rest", that would be much safer.

...

You said:

...
3. Optimized case for x,y,*z = iterator

It has to be semantically different from that, though. Consider these two generators:

def gen1(): for _ in range(5): print("Yielding!") yield "spam" yield "ham"

def gen2(): yield 1 yield 2 yield 3 while True: yield 4

If you use "x, y, *z = gen1()", you'll trigger all the prints and completely consume the generator. With gen2(), you'll get an infinite loop. Both of those are semantically different from the islice behaviour, which would consume only that part that you're looking for.

...

x, y = iter # your proposal

As I wrote above, I would like to see them equivalent x, y, = iter # omit last destination

...

x, y, * = iter # unpack into nothing

Maybe, I like it.

...

x, y, ... = iter # assigning to Ellipsis x, y, *... = iter # as above but clearly sequencing

...

And there are a few others too. Every one of them has its downsides, none is perfect. (Personally, I think one of the Ellipsis options is likely the best, or perhaps the least-bad.) Want to spearhead the PEP?

Chris Angelico

11:02 a.m.

On Mon, Nov 27, 2017 at 9:45 PM, Kirill Balunov <kirillbalunov@gmail.com> wrote:

...

Kirill Balunov

12:31 p.m.

...

In terms of language proposals, you can't just say "don't need values for"; the semantics have to be EITHER "consume and discard" OR "don't consume". We already have a perfectly good way of spelling "consume and discard":

x, y, _ = iter

You mean ( x, y, *_ = iter ) ? Since this has to be about non-consumption of the generator/iterator,

...

Ellipsis cannot be a zero-length deque. Thus this syntax would have to be restricted to the *last* entry, and it then means "don't check for more elements".

Yes, you are right to the *last* entry. (*last* depends on proposed syntax (spelling)).

...

The proposed semantics, if I understand you correctly, are:

try: _iter = iter(it) x = next(_iter) y = next(_iter) except StopIteration: raise ValueError # no "else" clause, we're done here

Yes, "roughly" this semantics is proposed, with some assumptions on _iter = iter(it). As I can see at the moment, these cases should behave differently:

...

...
...
x, y = [1,2,3,4] # must raise ValueError x, y = iter([1,2,3,4]) # should work

But at the same time, it violates current situation. So maybe, as you have said we need special syntax. I will think about it.

...

Start by perusing PEP 1, and the template in PEP 12:

https://www.python.org/dev/peps/pep-0001/ https://www.python.org/dev/peps/pep-0012/

The PEP editors (myself included) are here to help you; don't hesitate to reach out with questions.

Thank you! With kind regards, -gdg