Bare wildcard in de-structuring to ignore remainder and stop iterating (restart)

Restarting this with an improved title "Bare" vs "Raw", and I will try not to digress so much in the new thread.

My suggestion is to allow a bare asterisk at the end of a destructuring expression to indicate that additional elements are to be ignored if present and not iterated over if the rhs is being evaluated by iterating:

    (first, second, *) = items

This provides a way of using destructuring on something that will be processed by iterating and for which the number of items might be very large and/or accessing successive items is expensive.

As Paul Moore pointed out in the original thread, itertools.islice can be used to limit the number of items iterated over. That's a nice solution, but it requires knowing or thinking of the solution, an additional import, and repetition of the count of items to be destructured at the outermost nesting level on the lhs.

What are people's impressions of this idea? Is it valuable enough to pursue writing a PEP? If so, then what should I do in writing the PEP to make sure that it's somewhat close to something that can potentially be accepted? Perhaps there is a guide for doing that?
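For comparison, a rough sketch of the proposed spelling next to the islice workaround (items here is a placeholder for any iterable):

    # Proposed (not valid Python today): ignore the rest without iterating.
    (first, second, *) = items

    # Current workaround: the count 2 must be repeated on the right.
    from itertools import islice
    first, second = islice(items, 2)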

On Fri, 17 Jun 2022 at 12:52, Steve Jorgensen <stevecjor@gmail.com> wrote:
Restarting this with an improved title "Bare" vs "Raw", and I will try not to digress so much in the new thread.
My suggestion is to allow a bare asterisk at the end of a destructuring expression to indicate that additional elements are to be ignored if present and not iterated over if the rhs is being evaluated by iterating.
(first, second, *) = items
This provides a way of using destructuring from something that will be processed by iterating and for which the number of items might be very large and/or accessing of successive items is expensive.
Important point: This is distinctly different from putting a dummy variable there:

    first, second, *_ = items

as this will iterate over the rest of items. What you're proposing is actually a *removal* of a normal check - after unpacking two elements from items and assigning them to first and second, the interpreter normally queries the iterator once more and raises an error if it doesn't get StopIteration. So for generators, adding the trailing asterisk will mean it doesn't try to pump it further.
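A small sketch of that difference, using a throwaway generator that announces each item it yields (noisy() is just for illustration):

    def noisy():
        for i in range(5):
            print("yielding", i)
            yield i

    first, second, *_ = noisy()   # prints "yielding" five times: *_ drains the generator

    try:
        first, second = noisy()   # prints "yielding" three times, then raises ValueError:
    except ValueError:            # the third next() is the "is it empty now?" check
        pass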
As Paul Moore pointed out in the original thread, itertools.islice can be used to limit the number of items iterated over. That's a nice solution, but it requires knowing or thinking of the solution, an additional import, and repetition of the count of items to be destructured at the outermost nesting level on the lhs.
What are people's impressions of this idea? Is it valuable enough to pursue writing a PEP?
I think it's a valuable idea, though I don't think it needs a PEP yet. When the time comes, I'd be happy to help out with that aspect of things.
If so, then what should I do in writing the PEP to make sure that it's somewhat close to something that can potentially be accepted? Perhaps there is a guide for doing that?
Before you get to that point, how comfortable are you with "kicking the tires" on this by putting together a basic proof-of-concept implementation? Sometimes, the best way to find out the potential problems is to just try doing it. Syntactically and semantically, this looks pretty straightforward, but it's always possible for something weird to sneak past your notice.

For instance, are there any bizarre situations in which this could become ambiguous? Currently, "a, b, = x" is perfectly valid, and "a, b, *= x" errors out saying that augmented assignment doesn't make sense with a tuple target, so I think you're fine; safest to check though.

Will this syntax be supported in a match/case statement? It's probably not as useful (since "*_" won't actually bind, and since they only match sequences, not arbitrary iterables), but might be useful to maintain the parallel.
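For reference, a quick sketch of those two cases as they stand today (the exact SyntaxError wording varies by version):

    a, b, = [1, 2]     # trailing comma: valid today; a == 1 and b == 2

    # a, b, *= [1, 2]
    # raises SyntaxError (augmented assignment with a tuple target)

ChrisA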

On Fri, Jun 17, 2022 at 02:50:50AM -0000, Steve Jorgensen wrote:
What are people's impressions of this idea? Is it valuable enough to pursue writing a PEP?
I don't think it is useful enough to dedicate syntax to it.

If you are proposing this idea, it is your job to provide evidence that it is useful. That should be actual, real-world use-cases, not just toy snippets like `(first, second, *) = items` with no context. Examples of where and why people would use it. **Especially** the why part. Examples of the work-arounds people have to use in its place, or reasons why islice won't work.

"People don't know about islice" is not a convincing argument -- people won't know about this either. Actual code is much more convincing than made up examples. Code from the stdlib that would benefit from this is a good place to start.
If so, then what should I do in writing the PEP to make sure that it's somewhat close to something that can potentially be accepted? Perhaps there is a guide for doing that?
Read the PEPs. Start with PEP 1, which is exactly the guide you are looking for. Then PEP 5, although it probably won't apply to this. (But it is useful to know regardless.)

https://peps.python.org/pep-0001/
https://peps.python.org/pep-0005/

I suggest you read a variety of both successful and unsuccessful PEPs. I recommend PEPs 450, 506 and 584 as *especially* good, not that I'm the least bit biased *wink*

This is also a good PEP to read, as it is an example of an extremely controversial (at the time) PEP that nevertheless was successful:

https://peps.python.org/pep-0572/

This is another PEP which was, believe it or not, controversial at the time:

https://peps.python.org/pep-0285/

This is an example of an excellent PEP that gathered support from stakeholders in the Numpy community before even raising the issue on this mailing list:

https://peps.python.org/pep-0465/

There are many more excellent PEPs, I have just mentioned a few of my personal favs. Others may have other opinions. Remember that even the best PEPs may be rejected or deferred, and resist the temptation to attribute all criticism to bad faith and spite. Don't be That Guy. This is an excellent blog post to read:

https://www.curiousefficiency.org/posts/2011/04/musings-on-culture-of-python...

I recommend that you gather feedback from a variety of places, starting here. The Ideas topic on Python's Discourse is another good place. You might also try Reddit's r/python and the "Python Forum" here: https://python-forum.io and perhaps the comp.lang.python newsgroup, also available as a mailing list.

Be prepared for a ton of bike-shedding. People may hate the syntax even if they like the idea.

https://en.wikipedia.org/wiki/Law_of_triviality

Good luck!

-- Steve

Steve Jorgensen wrote:
Restarting this with an improved title "Bare" vs "Raw", and I will try not to digress so much in the new thread. My suggestion is to allow a bare asterisk at the end of a destructuring expression to indicate that additional elements are to be ignored if present and not iterated over if the rhs is being evaluated by iterating:

    (first, second, *) = items

This provides a way of using destructuring on something that will be processed by iterating and for which the number of items might be very large and/or accessing successive items is expensive. As Paul Moore pointed out in the original thread, itertools.islice can be used to limit the number of items iterated over. That's a nice solution, but it requires knowing or thinking of the solution, an additional import, and repetition of the count of items to be destructured at the outermost nesting level on the lhs. What are people's impressions of this idea? Is it valuable enough to pursue writing a PEP? If so, then what should I do in writing the PEP to make sure that it's somewhat close to something that can potentially be accepted? Perhaps there is a guide for doing that?

First, thanks very much for the thoughtful and helpful replies so far.
Since my last message here, I have noticed a couple of issues with the suggestion.

1. In a function declaration, the bare "*" specifically expects to match nothing, and in this case, I am suggesting that it have no expectation. That's a bit of a cognitive dissonance.

2. The new structural pattern matching that was introduced in Python 3.10 introduces a very similar concept by using an underscore as a wildcard that matches and doesn't bind to anything.

That leads me to want to change the proposal to say that we give the same meaning to "_" in ordinary destructuring that it has in structural pattern matching, and then, I believe that a final "*_" in the expression on the left would end up with exactly the same meaning that I originally proposed for the bare "*".

Although that would be a breaking change, it is already conventional to use "_" as a variable name only when we specifically don't care what it contains following its assignment, so for any code to be affected by the change would be highly unusual.
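Today, `_` in an assignment is an ordinary name, so the change would be observable. A quick sketch:

    first, second, *_ = range(10)
    print(_)   # today: [2, 3, 4, 5, 6, 7, 8, 9]; under this proposal, _ would not be bound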

On Fri, 17 Jun 2022 at 21:35, Steve Jorgensen <stevecjor@gmail.com> wrote:
Steve Jorgensen wrote:
Restarting this with an improved title "Bare" vs "Raw", and I will try not to digress so much in the new thread. My suggestion is to allow a bare asterisk at the end of a destructuring expression to indicate that additional elements are to be ignored if present and not iterated over if the rhs is being evaluated by iterating:

    (first, second, *) = items

This provides a way of using destructuring on something that will be processed by iterating and for which the number of items might be very large and/or accessing successive items is expensive. As Paul Moore pointed out in the original thread, itertools.islice can be used to limit the number of items iterated over. That's a nice solution, but it requires knowing or thinking of the solution, an additional import, and repetition of the count of items to be destructured at the outermost nesting level on the lhs. What are people's impressions of this idea? Is it valuable enough to pursue writing a PEP? If so, then what should I do in writing the PEP to make sure that it's somewhat close to something that can potentially be accepted? Perhaps there is a guide for doing that?

First, thanks very much for the thoughtful and helpful replies so far.
Since my last message here, I have noticed a couple of issues with the suggestion.
1. In a function declaration, the bare "*" specifically expects to match nothing, and in this case, I am suggesting that it have no expectation. That's a bit of a cognitive dissonance.
2. The new structural pattern matching that was introduced in Python 3.10 introduces a very similar concept by using an underscore as a wildcard that matches and doesn't bind to anything.
That leads me to want to change the proposal to say that we give the same meaning to "_" in ordinary destructuring that it has in structural pattern matching, and then, I believe that a final "*_" in the expression on the left would end up with exactly the same meaning that I originally proposed for the bare "*".
Be careful here of another subtle distinction.

    match X:
        case (a, b): ...

This is a *sequence* pattern, and will unpack something that follows sequence protocol and has a length of 2.

    (a, b) = X

This is *iterable* unpacking (although, just to muddy the waters, CPython's bytecode disassembly will call it UNPACK_SEQUENCE); it will attempt to iterate over X, calling next() up to three times, and as long as it gets two elements and then gets StopIteration, it assigns them to a and b. So, for instance, multiple assignment will happily unpack a generator:

    a, b = (lambda: ((yield 1), (yield 2)))()

But if you use that in a case statement, it won't match.

IMO you should be safe to define the semantics for a bare asterisk in multiple assignment without being overly bothered by the match statement, since you have the possibility of consumable and/or infinite iterables. That means that there's a fundamental difference between "*_" and simply not iterating over it - for instance, it wouldn't make sense to write this:

    id_generator = itertools.count(1)
    base, variant, *_ = id_generator

But it could well make very good sense to do that with a bare asterisk at the end. It's up to you to define the semantics though.
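A runnable sketch of that distinction (pair() is just an example generator; match needs Python 3.10+):

    def pair():
        yield 1
        yield 2

    a, b = pair()            # iterable unpacking: works, a == 1 and b == 2

    match pair():
        case (x, y):
            print("matched")     # never reached: sequence patterns don't match generators
        case _:
            print("no match")    # this is what prints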
Although that would be a breaking change, it is already conventional to use "_" as a variable name only when we specifically don't care what it contains following its assignment, so for any code to be affected by the change would be highly unusual.
Personally, I don't think that's an acceptable breaking change. Even for the match statement, where the syntax was completely new, there was a LOT of debate about making "_" so special in this way - everywhere else, it's simply a name like any other (and has a couple of important uses). For something where existing code uses "*_", suddenly ceasing to bind the variable is risky.

ChrisA

Hi

Consider:

    >>> a, b, *_ = iter('abdef')
    >>> a, b, None = iter('abdef')
      File "<stdin>", line 1
    SyntaxError: can't assign to keyword

If providing this feature is found to be a good idea, it might be better to use 'None' or even a new keyword rather than '*'. The obvious benefits are that it avoids further overloading '*', and reduces the opportunity for a fat-fingers error or a lazy-eyes code-review error. It's also easier to check a source file for use of this new feature.

If you can't find a good keyword for this feature, then that would suggest that it's not a good idea.

-- Jonathan

On 2022-06-17 14:10, Jonathan Fine wrote:
Hi
Consider:

    >>> a, b, *_ = iter('abdef')
    >>> a, b, None = iter('abdef')
      File "<stdin>", line 1
    SyntaxError: can't assign to keyword
If providing this feature is found to be a good idea, it might be better to use 'None' or even a new keyword rather than '*'. The obvious benefits are that it avoids further overloading '*', and reduces the opportunity for a fat-fingers error or a lazy-eyes code-review error. It's also easier to check a source file for use of this new feature.
If you can't find a good keyword for this feature, then that would suggest that it's not a good idea.
How about "..."?
    a, b, *_ = iter('abdef')
    a, b, ... = iter('abdef')
      File "<stdin>", line 1
        a, b, ... = iter('abdef')
              ^^^
    SyntaxError: cannot assign to ellipsis here. Maybe you meant '==' instead of '='?

Also in reply to Paul & Stephen, … Yes. I really like the idea of using the ellipsis in the expression on the left. It avoids any breaking changes, avoids adding new semantics to '*', and also reads quite well.

On Fri, 17 Jun 2022 at 12:34, Steve Jorgensen <stevecjor@gmail.com> wrote:
Although that would be a breaking change, it is already conventional to use "_" as a variable name only when we specifically don't care what it contains following its assignment, so for any code to be affected by the change would be highly unusual.
    a, b, *_ = iterator()

seems like it would be a fairly common pattern to read 2 values and then consume the iterator (for side effects, for example, or simply to avoid the "must return exactly 2 values" error). I'm not sure if I've ever used this myself, but you need to be *very* cautious about asserting that a breaking change is unlikely to cause issues...
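For instance, a (hypothetical) sketch of code that would silently change behaviour, because the generator's cleanup only runs once *_ has drained it:

    def lines(path):
        with open(path) as f:   # the file closes when the generator is exhausted
            yield from f

    first, second, *_ = lines("data.txt")   # today, *_ drains the generator, closing the file

Paul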

Steve Jorgensen writes:
That leads me to want to change the proposal to say that we give the same meaning to "_" in ordinary destructuring that it has in structural pattern matching,
This is already valid syntax with different semantics. Given the existence of islice, there's really no excuse for breaking this; if this were the only way to implement your syntax I'd be a solid -1 (as it is I'm a -0 perhaps maybe I don't know ;-).

I could imagine a token other than "*" being used for this to avoid the potential confusion or typo between "a, * = x" and "a, *_ = x", such as Ellipsis or even None. (Both are currently "cannot assign to" errors.)

The best list of use cases for ellipsis I've found in a quick look is https://python.land/python-ellipsis, and I don't see any gotchas for your syntax (with ellipsis instead of star) there, but I also didn't look very hard, and didn't think about the walrus operator, which I guess is a potential gotcha generator.

Steve

On Fri, Jun 17, 2022 at 11:32:09AM -0000, Steve Jorgensen wrote:
That leads me to want to change the proposal to say that we give the same meaning to "_" in ordinary destructuring that it has in structural pattern matching, and then, I believe that a final "*_" in the expression on the left would end up with exactly the same meaning that I originally proposed for the bare "*".
Although that would be a breaking change, it is already conventional to use "_" as a variable name only when we specifically don't care what it contains following its assignment, so for any code to be affected by the change would be highly unusual.
Not so: it is very common to use `_()` as a function in internationalisation.

https://stackoverflow.com/questions/3077227/mercurial-python-what-does-the-u...

If we are bike-shedding symbols for this feature, I am a bit dubious about the asterisk. It already gets used in so many places, and it can be confused for `a, b, *x` with the name x lost.

What do people think about

    first, second, / = items

where / stands for "don't advance the iterator"?

I like it because it reminds me of the slash in "No Smoking" signs, and similar. As in "No (more) iteration".
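(For anyone unfamiliar with the i18n convention, it is roughly:

    import gettext
    _ = gettext.gettext          # conventional alias used for marking translatable strings
    print(_("No smoking"))

so rebinding or special-casing `_` is not as harmless as it sounds.)

-- Steve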

Using either * or / could lead to some odd inconsistencies where a missing space is very consequential, e.g.:

    x, / = foo    # fine
    x, /= foo     # syntax error?
    x / = foo     # syntax error
    x /= foo      # fine, but totally different from the first example

That said, the * syntax feels intuitive in a way that / doesn't. I'd suggest:

    x, *… = foo

This seems unambiguous and fairly self-explanatory.

- Lucas

On Sat, Jun 18, 2022 at 11:23 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jun 17, 2022 at 11:32:09AM -0000, Steve Jorgensen wrote:
That leads me to want to change the proposal to say that we give the same meaning to "_" in ordinary destructuring that it has in structural pattern matching, and then, I believe that a final "*_" in the expression on the left would end up with exactly the same meaning that I originally proposed for the bare "*".
Although that would be a breaking change, it is already conventional to use "_" as a variable name only when we specifically don't care what it contains following its assignment, so for any code to be affected by the change would be highly unusual.
Not so: it is very common to use `_()` as a function in internationalisation.
https://stackoverflow.com/questions/3077227/mercurial-python-what-does-the-u...
If we are bike-shedding symbols for this feature, I am a bit dubious about the asterisk. It already gets used in so many places, and it can be confused for `a, b, *x` with the name x lost.
What do people think about
first, second, / = items
where / stands for "don't advance the iterator"?
I like it because it reminds me of the slash in "No Smoking" signs, and similar. As in "No (more) iteration".
-- Steve

Lucas Wiman writes:
That said, the * syntax feels intuitive in a way that / doesn’t.
I disagree. In C-like languages, it says "dereference a pointer" (ie, use the content at the pointer). In Python, it's used for destructuring iterables, ie, use the content at the iteration pointer by packing or unpacking. By contrast, "/" doesn't have a unary meaning in any language I know of (well, Lisp, but in Lisp it's just a symbol that happens to have a built-in function definition).
I’d suggest: x, *… = foo This seems unambiguous and fairly self-explanatory.
I advocated just "..." myself, so obviously I'm biased, but I don't see what prepending "*" says that "..." by itself doesn't.

Steven d'Aprano wrote:
I like "/" because it reminds me of the slash in "No Smoking" signs, and similar. As in "No (more) iteration".
And in Python its only non-binary use is to say "no more positional- only parameters." I like "..." better for its suggestion that there's more to come ;-) but objectively I guess "/" is just as good. :-)

On Sun, Jun 19, 2022 at 12:21:50AM -0700, Lucas Wiman wrote:
Using either * or / could lead to some odd inconsistencies where a missing space is very consequential, e.g.:

    x, / = foo    # fine
    x, /= foo     # syntax error?
    x / = foo     # syntax error
    x /= foo      # fine, but totally different from the first example
Good point! Despite what people say, Python does not actually have significant whitespace (or at least, no more than most other languages). It has significant *indentation*, which is not quite the same. So in general, although we *recommend* spaces around the equals sign, we shouldn't *require* it. If `/=` and `/ =` have different meanings, then we shouldn't use the slash for this. Likewise for the asterisk `* =`.
That said, the * syntax feels intuitive in a way that / doesn’t. I’d suggest: x, *… = foo This seems unambiguous and fairly self-explanatory.
"Self-explanatory". This is how we got Perl and APL o_O What do we need the star for? x, *... = items x, ... = items Convince me that adding another meaning for the star symbol is a good idea. (That's assuming that we want the feature at all.) -- Steve

"Self-explanatory". This is how we got Perl and APL o_O
What I mean is that if you already know the destructuring syntax, then it's pretty clear what it means. If you already know existing syntax, / isn't suggestive of anything. The only related syntax is for declaring positional-only arguments, which is very uncommon and I usually have to look up. Using arbitrary unrelated symbols because they happen to be available is how we got Perl.

What do we need the star for?
    x, *... = items
    x, ... = items
Convince me that adding another meaning for the star symbol is a good idea
I don’t see it as another meaning for the star symbol. It’s an extension to an existing meaning. The only argument I can give is based on intuition having written Python for years. From a code reader’s perspective, you often guess meaning based on the rest of the language and the context rather than memorizing the entire grammar. Consider the following: x, *remainder = items # 1 x, *_ = items # 2 x, *… = items # 3 x, … = items # 4 (1) and (2) are valid syntax. They are how one would currently write the concept under discussion, other than the optimizations of not allocating a list or advancing the iterator. (3) has a pretty clear analogy to (2). If you basically know Python 3.10 syntax and the interpreter is telling you that (3) is valid Python code, there are two thing this could possibly mean: (a) [wrong] … is an identifier that can be assigned to, similar to _. Its value will be the rest of the list/iterator, but given what … means in English and other Python code, you’re being told by the code author that you don’t care. The main difference with (2) is that _ is used in internationalization, which makes assigning to it a bad idea in some code. (b) [correct] You recall that … is actually a singleton object that cannot be assigned to. This suggests the intended meaning in this thread. Since … cannot be assigned to, the rest of the list/iterator is probably not being used for anything. What’s notable about this is that the wrong interpretation is still mostly correct, eg if this syntax is used on a list. With (4), you could have the same misunderstanding about whether … is a special-case identifier name, but then your interpretation becomes completely wrong as opposed to slightly wrong. In other words, * is telling you that this is variable-length object destructuring syntax, and it has a similar meaning to (1) and (2). So I think your question is backwards. Omitting the * means you are genuinely introducing a new syntax for something that is extremely similar to existing syntax, as opposed to slightly altering existing syntax for a slightly different meaning. Best wishes, Lucas

Okay, I'm convinced. If we need this feature (and I'm not convinced about that part), then it makes sense to keep the star and write it as `spam, eggs, *... = items`.

-- Steve

Steven D'Aprano wrote:
Okay, I'm convinced. If we need this feature (and I'm not convinced about that part), then it makes sense to keep the star and write it as `spam, eggs, *... = items`.
I thought about that, but to me, there are several reasons to not do that and to have the ellipsis mean multiple rather than prepending * for that:

1. In common usage outside of programming, the ellipsis means a continuation and not just a single additional thing.

2. Having `*...` mean any number of things implies that `...` means a single thing, and I don't think there is a reason to match 1 thing but not assign it to a variable. It is also already fine to repeat `_` in the left side expression.

3. I am guessing (though I could be wrong) that support for `*...` would be a bigger change and more complicated in the Python source code.

On Mon, 20 Jun 2022 at 18:42, Steve Jorgensen <stevecjor@gmail.com> wrote:
Steven D'Aprano wrote:
Okay, I'm convinced. If we need this feature (and I'm not convinced about that part), then it makes sense to keep the star and write it as `spam, eggs, *... = items`.
I thought about that, but to me, there are several reasons to not do that and to have the ellipsis mean multiple rather than prepending * for that:

1. In common usage outside of programming, the ellipsis means a continuation and not just a single additional thing.

2. Having `*...` mean any number of things implies that `...` means a single thing, and I don't think there is a reason to match 1 thing but not assign it to a variable. It is also already fine to repeat `_` in the left side expression.

3. I am guessing (though I could be wrong) that support for `*...` would be a bigger change and more complicated in the Python source code.
Also, while I can't speak for others, I found that when writing examples for posts here, the "*" in "*..." has too strong of a connection with "consume", and I *still* naturally read *... as "consume the rest" (even though it's not currently valid syntax, and the rules for what it *does* mean would be clear and unambiguous, etc etc). So for me at least, any syntax that uses a * would be too easy to misread.

Paul

My first thought was next(), which I use occasionally:

    >>> items = (i for i in range(9))
    >>> items
    <generator object <genexpr> at 0x7f33251766d0>
    >>> first, second = next(items), next(items)  # 👀
    >>> first, second
    (0, 1)
    >>> tuple(items)
    (2, 3, 4, 5, 6, 7, 8)

No imports needed. Is this deficient for the use case in some way?

-Mike

On Mon, 20 Jun 2022 at 05:32, Mike Miller <python-ideas@mgmiller.net> wrote:
My first thought was next(), which I use occasionally:
    >>> items = (i for i in range(9))
    >>> items
    <generator object <genexpr> at 0x7f33251766d0>
    >>> first, second = next(items), next(items)  # 👀
    >>> first, second
    (0, 1)
    >>> tuple(items)
    (2, 3, 4, 5, 6, 7, 8)
No imports needed. Is this deficient for the use case in some way?
It's fine for exactly two elements, where you'll never need to adjust the code to want three, and where you know already that this is an iterator (not some other iterable). If you had five elements to unpack, it would be quite clunky, and even more so if you wanted to change the precise number of elements unpacked, as you'd have to match the number of next calls.
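Concretely (items here is just a stand-in iterable):

    from itertools import islice

    items = range(10)
    it = iter(items)
    a, b, c, d, e = next(it), next(it), next(it), next(it), next(it)

    # versus islice, where only one count has to change:
    it = iter(items)
    a, b, c, d, e = islice(it, 5)

ChrisA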

On Sun, Jun 19, 2022 at 1:01 PM Chris Angelico <rosuav@gmail.com> wrote:
On Mon, 20 Jun 2022 at 05:32, Mike Miller <python-ideas@mgmiller.net> wrote:
My first thought was next(), which I use occasionally:
It's fine for exactly two elements, where you'll never need to adjust the code to want three, and where you know already that this is an iterator (not some other iterable). If you had five elements to unpack, it would be quite clunky, and even more so if you wanted to change the precise number of elements unpacked, as you'd have to match the number of next calls.
What if next grew a new argument? Changing the signature of a builtin is a big change, but surely not bigger than new syntax? If we could ask for the number of items returned the original example might look like
first, second = next(iter(items), count=2)
I don’t think anyone who has started to learn python would be confused by this. And this new arg could be combined with the existing default to avoid possible exceptions.
    >>> spam, eggs, cheese = next(iter(range(1)), 9, count=3)
    >>> spam, eggs, cheese
    (0, 9, 9)
I guess this is starting to look like the islice solution, but now it’s magically in the builtin namespace. I don’t recall ever using islice myself, but I would believe the one argument form to be the most commonly used.
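If it helps, here is a sketch of those semantics as a plain helper (next_n is a made-up name, approximating what next(it, default, count=n) might do):

    from itertools import chain, islice, repeat

    def next_n(it, default, *, count=1):
        # take up to `count` items, padding with `default` when the iterator runs dry
        return tuple(islice(chain(it, repeat(default)), count))

    spam, eggs, cheese = next_n(iter(range(1)), 9, count=3)   # (0, 9, 9)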

Some background. PEP 3132 (https://peps.python.org/pep-3132/) lists the following:
Possible changes discussed were:
- Only allow a starred expression as the last item in the exprlist. This would simplify the unpacking code a bit and allow for the starred expression to be assigned an iterator. This behavior was rejected because it would be too surprising.
This seems to be a reference to this message:
https://mail.python.org/pipermail/python-3000/2007-May/007299.html

Guido van Rossum said the following (https://mail.python.org/pipermail/python-3000/2007-May/007378.html):
The important use case in Python for the proposed semantics is when you have a variable-length record, the first few items of which are interesting, and the rest of which is less so, but not unimportant. (If you wanted to throw the rest away, you'd just write a, b, c = x[:3] instead of a, b, c, *d = x.)
There was also discussion about retaining the type of the object on the RHS, e.g.:

    c, *rest = "chair"   # c="c", rest="hair"

    it = iter(range(10))
    x, *rest = it        # x=0, rest is a reference to `it`

That proposal was rejected because the types were too confusing, e.g.:

    header, *lines = open("some-file", "r")          # lines is an iterator
    header, *lines, footer = open("some-file", "r")  # lines is a list

van Rossum later said (https://mail.python.org/pipermail/python-3000/2007-May/007391.html):
From an implementation POV, if you have an unknown object on the RHS, you have to try slicing it before you try iterating over it; this may cause problems e.g. if the object happens to be a defaultdict -- since x[3:] is implemented as x[slice(3, None, None)], the defaultdict will give you its default value. I'd much rather define this in terms of iterating over the object until it is exhausted, which can be optimized for certain known types like lists and tuples.
It seems like these objections don't apply in this case, if we define a syntax that explicitly says not to assign anything. There is no inconsistency in the types there. E.g. in the proposal here:

    header, *... = open("some-file", "r")
    header, *..., footer = open("some-file", "r")

It's clear that to compute what the footer is, you would need to iterate over the whole file, whereas you don't in the first one. So historically, the idea here was discussed and rejected, but for a reason which does not apply in this case.

=======

Regarding utility, there are many sort of ugly ways of doing this with method calls, especially from itertools. I tend to like syntax over methods for handling basic data types. This is partly because it's more readable: almost any method which takes more than one positional argument introduces cognitive load because you have to remember what the order of the arguments are and what they mean. You can add keyword arguments to improve readability, but then it's more characters and you have to remember the name or have it autocompleted. So if there is a simple way to support a use case with simple built-in syntax, it can improve the utility of the language. Like honestly, does anyone remember the arguments to `islice`? I'm fairly sure I've had to look it up every single time I've ever used it. For iterator-heavy code, this might be multiple times on the same day.

For the `next(iterator, [default], count=1)` proposal, it's very easy to write incorrect code that might look correct, e.g. `next(iterator, 3)`. Does 3 refer to the count or the default? If you've written python for years, it's clear, but less clear to a novice.

There are efficiency arguments too: method calls are expensive, whereas bytecode calls can be much more optimized. If you're already using iterators, efficiency is probably relevant:
    >>> import dis
    >>> from itertools import islice
    >>> def first_two_islice(it):
    ...     return tuple(islice(it, 2))
    ...
    >>> def first_two_destructuring(it):
    ...     x, y, *rest = it
    ...     return x, y
    ...
    >>> dis.dis(first_two_islice)
      2           0 LOAD_GLOBAL              0 (tuple)
                  2 LOAD_GLOBAL              1 (islice)
                  4 LOAD_FAST                0 (it)
                  6 LOAD_CONST               1 (2)
                  8 CALL_FUNCTION            2
                 10 CALL_FUNCTION            1
                 12 RETURN_VALUE
    >>> dis.dis(first_two_destructuring)
      2           0 LOAD_FAST                0 (it)
                  2 UNPACK_EX                2
                  4 STORE_FAST               1 (x)
                  6 STORE_FAST               2 (y)
                  8 STORE_FAST               3 (rest)

      3          10 LOAD_FAST                1 (x)
                 12 LOAD_FAST                2 (y)
                 14 BUILD_TUPLE              2
                 16 RETURN_VALUE

The latter requires no expensive CALL_FUNCTION operations, though it does currently allocate rest pointlessly.

Personally, I think the main use case would be for handling large lists in a memory efficient and readable manner. Currently using *_ means you have to balance readability against performance. Why is there that tradeoff? Does it serve literally any purpose? I think about this /every single time/ I use * object destructuring if I don't care about the *thing. But I don't want to think about how big *thing is: the language is forcing me to assign it a name and allocate memory for it. It would be a minor improvement to easily write an expression that is similarly readable, but does not have the performance penalty. The performance penalty will be minor in most cases, but you still have to think about whether it's minor or not, which is a cost of the existing syntax.

===

It seems like a lot of the arguments against this syntax would apply equally well to existing syntax. If using indexing, next, islice, etc. was good enough, why were PEPs like 3132, 448 or 636 approved? This proposal seems like a pretty natural extension of a trend in the last several versions of python to make these sorts of expressions more and more expressive. It's polishing a minor rough place in a syntax that's been developing for years, which seems like a good idea regardless of whether somewhat usable alternatives exist in the standard library.

===

Of course, all of that needs to be balanced against the complexity of the implementation. If it's ruinously complicated to add the feature, then the arguments above are very weak arguments. If it's simple to add (as it sounds like PEP 3132 was), then mere performance, consistency and readability seem like more compelling arguments.

Best wishes,
Lucas

On Sun, Jun 19, 2022 at 11:03:45PM -0700, Jeremiah Paige wrote:
What if next grew a new argument? Changing the signature of a builtin is a big change, but surely not bigger than new syntax? If we could ask for the number of items returned the original example might look like
first, second = next(iter(items), count=2)
There are times where "Not everything needs to be a one liner" applies.

    # You can skip the first line if you know items is already an iterator.
    it = iter(items)
    first, second, third = (next(it) for i in range(3))

That's crying out to be made into a helper function. Otherwise our one-liner is:

    # It's okay to hate me for this :-)
    first, second, third = (lambda obj: (it:=iter(obj)) and (next(it) for i in range(3)))(items)

But that's basically islice. So:

    # It's okay to put reusable helper functions in a module.
    # Not everything has to be syntax.
    first, second, third = itertools.islice(items, 3)

I think that we have a working solution for this problem; the only argument is whether or not that problem is common enough, or special enough, or the solution clunky enough, to justify a syntax solution.
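For concreteness, the helper I had in mind might look like this (take() is a made-up name, not a stdlib function):

    from itertools import islice

    def take(iterable, n):
        # return the first n items as a tuple; unpacking at the call site
        # will raise ValueError if fewer than n items are produced
        return tuple(islice(iterable, n))

    first, second, third = take(range(10), 3)

-- Steve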

On Mon, 20 Jun 2022 at 11:08, Steven D'Aprano <steve@pearwood.info> wrote:
But that's basically islice. So:
    # It's okay to put reusable helper functions in a module.
    # Not everything has to be syntax.
    first, second, third = itertools.islice(items, 3)
I think that we have a working solution for this problem; the only argument is whether or not that problem is common enough, or special enough, or the solution clunky enough, to justify a syntax solution.
I think there's a lot of people (I'm not one of them) who prefer working with syntax rather than functions for "basic operations". Of course, what's "basic" is up for debate, but Lucas Wiman commented earlier "I tend to like syntax over methods for handling basic data types", and while I don't necessarily agree, I can see how people gravitate towards asking for syntax when built in data types are involved. In this case, there's also the need to explicitly state the count, which can be inferred from the LHS when using syntax, but not in a function call. And the (perceived or real?) performance issue with "function calls are slow".

Ultimately, this type of proposal is mostly decided by a judgement on "what do we want the language to look like", which attracts subjective comments like "Python isn't Perl", or "it's a natural extension of existing syntax", or "it's more readable". But no-one here has the authority to declare what is or is not "Pythonic" - that authority is with the steering council. So we do our best to reach some sort of group consensus, and dump the hard questions on the SC (via a PEP).

My sense is that a lot more people are coming to Python these days with an expectation that syntax-based solutions are OK, and the "old guard" (like myself!) are pushing more for the "not everything has to be syntax" arguments. Maybe I'm not sufficiently self-aware, and when I was newer to Python I too liked the idea of adding syntax more. I honestly can't remember (I did love list comprehensions when they were added, so I clearly wasn't always against syntax!). But I do think that the broad question of "should Python have more complex syntax" is probably a more fundamental debate that we won't resolve here.

For the record, I think the islice solution is sufficient for this case. But I have needed this sort of thing occasionally, and islice didn't immediately come to mind - so I have sympathy with the discoverability argument. If a syntax like "a, b, *... = some_iterator" existed, I suspect I'd use it. But picking a syntax that *didn't* mislead me into assuming the iterator was fully consumed would be hard - I thought *... was OK, but writing it just now I realised I had to remind myself that it didn't consume everything, to the point where I'd probably add a comment if I was writing the code.

Paul

Hi

Some have liked adding a new syntax

    a, b, ... = iterable

to mean consume two items from the iterable. However,

    a, b, Ellipsis = iterable

has a different meaning (at least in Python 3.8). It redefines Ellipsis. (As an explicit constant, '...' can be redefined.)

The syntax a, b, ... = iterable so to speak fills a gap in existing syntax, as the construct is at present invalid. I actually like gaps in syntax, for the same reason that I like a central reservation in a highway. The same goes for the hard shoulder / breakdown lane.

The title of this thread includes the phrase 'Stop Iterating' (capitals added). This suggests the syntax

    a, b, StopIterating = iterable

where StopIterating is a new keyword that can be used only in this context.

I'd like to know what others think about this suggestion.

-- Jonathan

On Mon, 20 Jun 2022 at 21:11, Jonathan Fine <jfine2358@gmail.com> wrote:
Hi
Some have liked adding a new syntax

    a, b, ... = iterable

to mean consume two items from the iterable. However,

    a, b, Ellipsis = iterable

has a different meaning (at least in Python 3.8). It redefines Ellipsis. (As an explicit constant, '...' can be redefined.)
To clarify: The syntactic token '...' will always refer to the special object Ellipsis (at least back as far as Python 3.4 - can't remember when it became available in all contexts), but the name Ellipsis can be rebound. So even though, in many contexts, "x = Ellipsis" and "x = ..." will have the same net effect, they are distinct (one is a name lookup and the other is a constant), and they're definitely different in assignment. (Though it wouldn't surprise me if a future Python release adds Ellipsis to the set of non-assignable names, with None/True/False.)
The syntax a, b, ... = iterable so to speak fills a gap in existing syntax, as the construct is at present invalid. I actually like gaps in syntax, for the same reason that I like a central reservation in a highway. The same goes for the hard shoulder / breakdown lane.
The title of this thread includes the phrase 'Stop Iterating' (capitals added). This suggests the syntax a, b, StopIterating = iterable where StopIterating is a new keyword that can be used only in this context.
I'd like to know what others think about this suggestion.
Hard no. That is currently-legal syntax, and it's also clunky. I'd much rather the "assign to ..." notation than a weird new soft keyword that people are going to think is a typo for StopIteration.

It's worth noting that the proposed syntax has a slight distinction from the normal asterisk notation, in that it makes perfect sense to write this:

    a, *_, b = thing

but does not make sense to write this:

    a, ..., b = thing

as the "don't iterate over this thing" concept doesn't work here. (Supporting this would require some way to reverse the iterator, and that's not a language guarantee.)

ChrisA

Hi Some of us might believe that a currently legal syntax should only exceptionally be given a new meaning, even if there is no evidence whatsoever that this legal syntax is actually in use. My own belief is more pragmatic. If there's very strong evidence that the syntax is not in use, I'm happy to consider changing the meaning. I wrote:
The title of this thread includes the phrase 'Stop Iterating' (capitals added). This suggests the syntax a, b, StopIterating = iterable where StopIterating is a new keyword that can be used only in this context.
In response Chris wrote:
Hard no. That is currently-legal syntax, and it's also clunky.
Although a, b, StopIterating = iterable is currently legal syntax, I believe that no-one has ever used it in Python before today. My evidence is this search, which gives 25 pages.

https://www.google.com/search?q=%22stopiterating%22+python&nfpr=1

The pages found by this search do match "StopIterating", but do not provide an example of its use in Python.

https://stackoverflow.com/questions/19892204/send-method-using-generator-sti...
https://julia-users.narkive.com/aD1Uin0y/implementing-an-iterator-which-cond...

The following are copies of the stackoverflow page.

https://mlink.in/qa/?qa=810675/
https://www.796t.com/post/MmFubjI=.html
https://qa.1r1g.com/sf/ask/1392454311/

-- Jonathan

On 2022-06-20 15:05, Chris Angelico wrote:
On Mon, 20 Jun 2022 at 21:11, Jonathan Fine <jfine2358@gmail.com> wrote:
Hi
Some have liked adding a new syntax

    a, b, ... = iterable

to mean consume two items from the iterable. However,

    a, b, Ellipsis = iterable

has a different meaning (at least in Python 3.8). It redefines Ellipsis. (As an explicit constant, '...' can be redefined.)
To clarify: The syntactic token '...' will always refer to the special object Ellipsis (at least back as far as Python 3.4 - can't remember when it became available in all contexts), but the name Ellipsis can be rebound. So even though, in many contexts, "x = Ellipsis" and "x = ..." will have the same net effect, they are distinct (one is a name lookup and the other is a constant), and they're definitely different in assignment.
(Though it wouldn't surprise me if a future Python release adds Ellipsis to the set of non-assignable names, with None/True/False.)
The syntax a, b, ... = iterable so to speak fills a gap in existing syntax, as the construct is at present invalid. I actually like gaps in syntax, for the same reason that I like a central reservation in a highway. The same goes for the hard shoulder / breakdown lane.
The title of this thread includes the phrase 'Stop Iterating' (capitals added). This suggests the syntax a, b, StopIterating = iterable where StopIterating is a new keyword that can be used only in this context.
I'd like to know what others think about this suggestion.
Hard no. That is currently-legal syntax, and it's also clunky. I'd much rather the "assign to ..." notation than a weird new soft keyword that people are going to think is a typo for StopIteration.
It's worth noting that the proposed syntax has a slight distinction from the normal asterisk notation, in that it makes perfect sense to write this:
a, *_, b = thing
but does not make sense to write this:
a, ..., b = thing
as the "don't iterate over this thing" concept doesn't work here. (Supporting this would require some way to reverse the iterator, and that's not a language guarantee.)
It could be taken to mean "consume but discard", leaving 'a' bound to the first item and 'b' bound to the last item, but then:

    a, ... = thing

would have to leave 'a' bound to the first item and the iterator exhausted. In fact, use of ... would always have to exhaust the iterator, which, I think, would not be very useful. Best not to go that way.

On Tue, 21 Jun 2022 at 01:44, MRAB <python@mrabarnett.plus.com> wrote:
On 2022-06-20 15:05, Chris Angelico wrote:
On Mon, 20 Jun 2022 at 21:11, Jonathan Fine <jfine2358@gmail.com> wrote:
Hi
Some have liked adding a new syntax

    a, b, ... = iterable

to mean consume two items from the iterable. However,

    a, b, Ellipsis = iterable

has a different meaning (at least in Python 3.8). It redefines Ellipsis. (As an explicit constant, '...' can be redefined.)
To clarify: The syntactic token '...' will always refer to the special object Ellipsis (at least back as far as Python 3.4 - can't remember when it became available in all contexts), but the name Ellipsis can be rebound. So even though, in many contexts, "x = Ellipsis" and "x = ..." will have the same net effect, they are distinct (one is a name lookup and the other is a constant), and they're definitely different in assignment.
(Though it wouldn't surprise me if a future Python release adds Ellipsis to the set of non-assignable names, with None/True/False.)
The syntax a, b, ... = iterable so to speak fills a gap in existing syntax, as the construct is at present invalid. I actually like gaps in syntax, for the same reason that I like a central reservation in a highway. The same goes for the hard shoulder / breakdown lane.
The title of this thread includes the phrase 'Stop Iterating' (capitals added). This suggests the syntax a, b, StopIterating = iterable where StopIterating is a new keyword that can be used only in this context.
I'd like to know what others think about this suggestion.
Hard no. That is currently-legal syntax, and it's also clunky. I'd much rather the "assign to ..." notation than a weird new soft keyword that people are going to think is a typo for StopIteration.
It's worth noting that the proposed syntax has a slight distinction from the normal asterisk notation, in that it makes perfect sense to write this:
a, *_, b = thing
but does not make sense to write this:
a, ..., b = thing
as the "don't iterate over this thing" concept doesn't work here. (Supporting this would require some way to reverse the iterator, and that's not a language guarantee.)
It could be taken to mean "consume but discard", leaving 'a' bound to the first item and 'b' bound to the last item, but then:
a, ... = thing
would have to leave 'a' bound to the first item and the iterator exhausted.
In fact, use of ... would always have to exhaust the iterator, which, I think, would not be very useful.
Best not to go that way.
Yeah. "Consume but discard" is spelled *_, so we don't need this. The whole point of this is to NOT consume it. ChrisA

On 2022-06-20 03:34, Paul Moore wrote:
For the record, I think the islice solution is sufficient for this case. But I have needed this sort of thing occasionally, and islice
The post above sums it up for me. We have next() for one to a few, islice for several to zillions, and a for-enumerate-break also for several to zillions. Cases handled, with or without import.

The parameter to next() sounds like a reasonable thing to add, however; it doesn't seem like it would hurt anything but the use of islice.

If any syntax is chosen, I hope it won't include "*", as that definitely says "unpack" to me, as that's what I say when reading it (without a space afterward).

-Mike