Null wildcard in de-structuring to ignore remainder and stop iterating
A contrived use case:

with open('document.txt', 'r') as io:
    (line1, line2, *) = io

It is possible to kind of achieve the same result using `*_` except that would actually read all the lines from the file, even if we only want the first 2.

…so I am suggesting that we use the bare `*` here to mean that we don't care whether there are additional items in the sequence, _and_ we want to stop iterating.
On Sat, Jun 04, 2022 at 07:31:58AM -0000, Steve Jorgensen wrote:
A contrived use case:
with open('document.txt', 'r') as io:
    (line1, line2, *) = io
with open('document.txt', 'r') as io:
    line1 = io.readline()
    line2 = io.readline()

It would be lovely if readlines() took a parameter to specify the number of lines to return:

line1, line2 = io.readlines(2)  # Doesn't work :-(

but alas and alack, the readlines() method has exactly the wrong API for that. I don't know what use the hint parameter is for readlines, it seems totally useless to me, and the wrong abstraction, counting bytes/characters instead of lines. Maybe we could add a keyword only argument?

line1, line2 = io.readlines(count=2)

Or there's always the explicit:

line1, line2 = [io.readline() for j in (1, 2)]

No need for new syntax for something so easy.

-- Steve
I was using the reading of lines from a file as a contrived example. There are many other possible cases, such as de-structuring from an iterator such as `itertools.repeat()` with no `times` argument, which will generate values endlessly.
On Sat, 4 Jun 2022 at 09:39, Steve Jorgensen <stevej@stevej.name> wrote:
I was using the reading of lines from a file as a contrived example. There are many other possible cases, such as de-structuring from an iterator such as `itertools.repeat()` with no `times` argument, which will generate values endlessly.
itertools.islice will (in effect) allow you to have a count argument for any iterator. Paul
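For readers following along, Paul's suggestion looks like this in practice. A minimal sketch; the endless `repeat()` iterator and the variable names are just illustrative:

```python
from itertools import islice, repeat

# Take exactly two items from an endless iterator. islice stops
# consuming after the second item, so this terminates.
endless = repeat("spam")
first, second = islice(endless, 2)
```

The same pattern works for files, generators, and any other iterator, which is the point being made here.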
OK. That's not terrible. It is a redundancy though, having to re-state the count of variables that are to be de-structured into on the left.
On Sat, Jun 04, 2022 at 10:04:39AM -0000, Steve Jorgensen wrote:
OK. That's not terrible. It is a redundancy though, having to re-state the count of variables that are to be de-structured into on the left.
Redundancy is good:

# Obviously, clearly wrong:
spam, eggs, cheese = islice(myvalues, 5)

# Not obviously right.
spam, eggs, cheese, * = myvalues

We don't have to squeeze every bit of redundancy out of code.

-- Steve
On Sat, 4 Jun 2022 at 22:16, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jun 04, 2022 at 10:04:39AM -0000, Steve Jorgensen wrote:
OK. That's not terrible. It is a redundancy though, having to re-state the count of variables that are to be de-structured into on the left.
Redundancy is good:
# Obviously, clearly wrong:
spam, eggs, cheese = islice(myvalues, 5)
Yes but which part is wrong?
# Not obviously right.
spam, eggs, cheese, * = myvalues
We don't have to squeeze every bit of redundancy out of code.
Redundancy introduces possibilities of desynchronization. ANY code can be wrong, but it's entirely possible for the second one to be right, and the first one is undoubtedly wrong. In this case, redundancy is merely noise, like me saying a second time that redundancy is merely niose, except that the duplicate can be wrong. Obviously sometimes it's unavoidable, but I don't think we can genuinely accept that the redundancy is *good*.

# Obviously right, but inefficient.
spam, eggs, cheese, *_ = myvalues

ChrisA
On Sat, Jun 04, 2022 at 11:16:18PM +1000, Chris Angelico wrote:
Redundancy is good:
# Obviously, clearly wrong:
spam, eggs, cheese = islice(myvalues, 5)
Yes but which part is wrong?
You're a professional programmer, so I am confident that you know the answer to that :-) It's a logic error. Unlike trivial and obvious spelling errors ('import colections'), most logic errors are not amenable to trivial fixes. You have to read the code (which may be as little as the surrounding one or two lines, or as much as the entire damn program) to understand what the logic is supposed to be before you can fix it. Welcome to programming :-) A day may come when computers will write and debug their own code, making human programmers obsolete, but it is not this day. Until then, I will take all the help I can get. I'm not going to introduce unnecessary redundancy for no reason, but nor am I going to go out of my way to remove it when it is helpful.
Redundancy introduces possibilities of desynchronization.
Indeed, and that is why I never write documentation or comments -- they are redundant when you have the source. *wink* All joking aside, of course you are correct. But we don't typically worry too much about such minor risks when we do things like imports: from collections import Counter # oh no, now Counter and collections.Counter may become desynced! or assertions (which should always succeed, and so they are redundant -- right up to the moment when they fail). Or when we grab a temporary reference to a sub-expression to avoid repeating ourselves. We balance the risks against the benefits, and if the risks are small compared to the benefits, we don't fret about redundancy. And sometimes, in the face of noisy communication channels, hardware failure, or hostile environments, redundancy is all but essential.
ANY code can be wrong, but it's entirely possible for the second one to be right, and the first one is undoubtedly wrong.
Indeed. When all else is equal -- which it may not always be -- we should prefer code which is obviously correct over code which merely has no obvious bugs. A line of code like `spam, eggs, cheese = islice(myvalues, 3)` is obviously correct in the sense that there is no discrepancy between the left and right hand sides of the assignment, and any change which fails to keep that invariant is an obvious bug. The proposed equivalent `spam, eggs, cheese, * = myvalues` may be more convenient to write, but you no longer have that invariant. How much you care about that loss will depend on your risk tolerance compared to your laziness (one of Larry Wall's three virtues of programmers -- although opinions differ on whether he is right or not).
Obviously sometimes it's unavoidable, but I don't think we can genuinely accept that the redundancy is *good*.
You have convinced me! I'm now removing all my RAID devices! *wink* Would it have helped if I had said redundancy is *sometimes* good? -- Steve
On Sun, 5 Jun 2022 at 05:42, Steven D'Aprano <steve@pearwood.info> wrote:
Obviously sometimes it's unavoidable, but I don't think we can genuinely accept that the redundancy is *good*.
You have convinced me! I'm now removing all my RAID devices!
*wink*
Would it have helped if I had said redundancy is *sometimes* good?
*facepalm*

Would it have helped if I had said that the redundancy *here* is not good? You gain nothing whatsoever, other than "oh hey, you said this twice and said the same thing" - in other words, the ONLY thing you gain here is the possibility for it to be wrong. All your other examples are places where that redundancy is either fully automated (RAID does not require that you, as the user, save multiple copies of things and keep them synchronized), or not fundamentally redundant (assertions are a form of executable comment, and if the assertions are simply repeating the code around them, they are utterly useless).

x = 1
assert x == 1  # ensure that x is 1

What do we gain here? How is redundancy fundamentally good, when all it does is introduce the possibility of error?

ChrisA
On Sun, Jun 05, 2022 at 07:03:32AM +1000, Chris Angelico wrote:
How is redundancy fundamentally good,
I don't know, you will have to ask somebody who is arguing that "redundancy is fundamentally good", which is not me. Redundancy can be either good or bad.

https://www.informationweek.com/government/redundancy-in-programming-languag...

I didn't think that would be a controversial position to take; programming and IT frequently make use of redundancy, e.g.

- multiple routes to a destination
- warm and hot spare servers
- RAID, backups
- documentation versus "read the source"
- unit tests, regression tests, etc
- checksums and error correcting codes
- keyword parameters when positional would do
- making allowance for "the bus factor" in projects ("why do we need two people who understand this?")

Python uses redundant colons after statements that introduce a block; other languages use redundant variable declarations and semicolons. Outside of IT, we even have a proverb about it: don't put all your eggs in one basket. Redundancy is used to make systems more resilient against failure:

- Spare tyres, spare keys, etc.
- Subject-verb agreement rules in language.
- Double-entry book keeping.
- Using two or more locks on doors.
- Belts and braces, seatbelts and airbags, etc.

I'm not saying that redundancy is always good, but your argument that "all" (your word) it does is to "introduce the possibility of error" (as if there are no other sources of error in programming!) doesn't stand up to even the most cursory consideration.

In this specific example, `spam, eggs, cheese = islice(values, 3)`, I think that the cost of the redundancy is minimal and the benefit non-zero. Which one "wins" is a matter of taste and the programmer's own personal value judgement of risk versus opportunity. If you don't like it, fine, but I do, and if I were BDFL of Python, I wouldn't add syntax to the language just to remove this redundancy.

-- Steve
My current thinking in response to that is that using islice is a decent solution, except that it's not obvious. You have to jump outside of thinking about the destructuring capability and consider what else could be used to help. Probably, the first thing that _would_ come to mind from outside would be slicing with square brackets, but that would restrict the solution to only work with sequences and not other iterables and iterators as islice does.

That brings up a tangential idea. Why not allow square-bracket indexing of generators instead of having to import and utilize islice for that?
On Thu, 9 Jun 2022 at 01:12, Steve Jorgensen <stevej@stevej.name> wrote:
My current thinking in response to that is that using islice is a decent solution except that it's not obvious. You have to jump outside of the thinking about the destructuring capability and consider what else could be used to help. Probably, first thing that _would_ come to mind from outside would be slicing with square brackets, but that would restrict the solution to only work with sequences and not other iterables and iterators as islice does.
That brings up a tangential idea. Why not allow square-bracket indexing of generators instead of having to import and utilize islice for that?
Because generators don't have a common (sub-)type, so there's no class to put the relevant __getitem__ method on. Paul
would it not be possible to have slicing fall back to islice if __iter__ is implemented and __getitem__ is not?
Mathew Elman writes:
would it not be possible to have slicing fall back to islice if __iter__ is implemented and __getitem__ is not?
The syntax is well-defined, but the semantics are not.

Consider "g[101]; g[100]" for g a generator object. This either requires all generators to keep a cache that can become arbitrarily large, or violate the intuition of indexing by raising an error there, or violate the intuition of indexing by returning the 102nd element, then the 202nd element (what "islice(g,101,102); islice(g,100,101)" would do), or some API to tell the generator to disable the cache if you don't need it, or maybe there's some other semantics you could give it that will be undesirable for some people in some other way.

Any of those makes 100% sense to me in the abstract, but I'm pretty sure if it's made into syntax I'll be 99% dissatisfied with it. :-) Explicit is better than implicit in this case.

You could argue that islice should be made a builtin, but I don't know that it's used enough to justify that.
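The "no cache" semantics Stephen describes can be demonstrated with islice today. A sketch using itertools.count as a stand-in for an arbitrary generator:

```python
from itertools import count, islice

g = count()  # yields 0, 1, 2, ...
a = next(islice(g, 101, 102))  # skips 101 items, yields the value 101
b = next(islice(g, 100, 101))  # skips 100 *more* items, yields 202
```

Because the generator holds no cache, the "later" index returns the higher value, which is exactly the counter-intuitive behaviour being discussed.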
To me, the natural implementation of slicing on a non-reusable iterator (such as a generator) would be that you are not allowed to go backwards or even stand still:

mygen[42]
mygen[42]
ValueError: Element 42 of iterator has already been used

(Apologies if I don't know the difference between an iterator and an iterable; y'all know what I mean.) You still get a useful feature that you didn't have before. Expecting a generator (or whatever) to cache some of its values in case you wanted a slice of them opens up a huge can of worms and is surely best forgotten. (100GB generator, anyone?) Well, maybe caching ONE value (the last one accessed) is reasonable, so you could stand still but not go backwards. But it's still adding overhead.
Best wishes
Rob Cliffe

On 09/06/2022 10:28, Stephen J. Turnbull wrote:
Mathew Elman writes:
would it not be possible to have slicing fall back to islice if __iter__ is implemented and __getitem__ is not?
The syntax is well-defined, but the semantics are not.
Consider "g[101]; g[100]" for g a generator object. This either requires all generators to keep a cache that can become arbitrarily large, or violate the intuition of indexing by raising an error there, or violate the intuition of indexing by returning the 102nd element, then the 202nd element (what "islice(g,101,102); islice(g,100,101)" would do), or some API to tell the generator to disable the cache if you don't need it, or maybe there's some other semantics you could give it that will be undesirable for some people in some other way.
Any of those makes 100% sense to me in the abstract, but I'm pretty sure if it's made into syntax I'll be 99% dissatisfied with it. :-) Explicit is better than implicit in this case.
You could argue that islice should be made a builtin, but I don't know that it's used enough to justify that.
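Rob's forward-only semantics can be prototyped today as a wrapper class, with no new syntax. This is only a sketch of one possible behaviour; the class name and error message are made up for illustration:

```python
from itertools import islice

class ForwardIndexer:
    """Wrap any iterable so that fi[i] works, but only moving forward."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pos = 0  # absolute index of the next unconsumed element

    def __getitem__(self, index):
        if index < self._pos:
            # Going backwards (or standing still) is an error.
            raise ValueError(
                f"Element {index} of iterator has already been used")
        skip = index - self._pos
        self._pos = index + 1
        # Advance past `skip` elements and return the next one.
        return next(islice(self._it, skip, skip + 1))
```

With this sketch, `g = ForwardIndexer(range(100)); g[42]` returns 42, and a second `g[42]` raises ValueError, matching the behaviour proposed above.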
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/Y773Y7... Code of Conduct: http://python.org/psf/codeofconduct/
This I like - it seems very intuitive, almost like an irreversible io stream. I don't know if there would be cases where this would lead to unexpected bugs, but without looking into it it seems nice. Question: What would be the natural behaviour for negative indices? Raising an error?
On Mon, 20 Jun 2022 at 21:11, Mathew Elman <mathew.elman@ocado.com> wrote:
This I like - it seems very intuitive, almost like an irreversible io stream.
I don't know if there would be cases where this would lead to unexpected bugs, but without looking into it it seems nice.
Question: What would be the natural behaviour for negative indices? Raising an error?
Please quote the person and text that you're responding to (and then add your response underneath). Otherwise we have to guess which (sub)proposal it is that you like. ChrisA
Chris Angelico wrote:
On Mon, 20 Jun 2022 at 21:11, Mathew Elman mathew.elman@ocado.com wrote:
This I like - it seems very intuitive, almost like an irreversible io stream. I don't know if there would be cases where this would lead to unexpected bugs, but without looking into it it seems nice. Question: What would be the natural behaviour for negative indices? Raising an error?

Please quote the person and text that you're responding to (and then add your response underneath). Otherwise we have to guess which (sub)proposal it is that you like.

ChrisA
Oops, I thought I had, it was this:
To me, the natural implementation of slicing on a non-reusable iterator (such as a generator) would be that you are not allowed to go backwards or even stand still:

mygen[42]
mygen[42]
ValueError: Element 42 of iterator has already been used

(Apologies if I don't know the difference between an iterator and an iterable; y'all know what I mean.) You still get a useful feature that you didn't have before. Expecting a generator (or whatever) to cache some of its values in case you wanted a slice of them opens up a huge can of worms and is surely best forgotten. (100GB generator, anyone?) Well, maybe caching ONE value (the last one accessed) is reasonable, so you could stand still but not go backwards. But it's still adding overhead.
Best wishes
Rob Cliffe
On Sat, Jun 18, 2022 at 5:42 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
To me, the natural implementation of slicing on a non-reusable iterator (such as a generator) would be that you are not allowed to go backwards or even stand still:

mygen[42]
mygen[42]
ValueError: Element 42 of iterator has already been used
I agree that indexing an iterator such that it could only go forward feels like a reasonable and useful feature in python, but I disagree about the ValueError. To me the above produces two values: the 43rd and 85th elements produced by mygen. Anything else is a bizarre error waiting to arise at obscure times. What if this iterator is passed to another function? Used in a loop? Now this information about what index has been used has to be carried around and checked on every access. Regards, Jeremiah
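Jeremiah's "stateless" interpretation, where each subscript simply consumes forward from wherever the iterator currently is, can be sketched with islice (the helper name `nth` is made up for illustration):

```python
from itertools import islice

def nth(it, n):
    # Hypothetical stateless reading of it[n]: consume n+1 items
    # from the iterator's current position, return the last one.
    return next(islice(it, n, n + 1))

g = iter(range(200))
a = nth(g, 42)  # consumes 43 items, returns the value 42
b = nth(g, 42)  # consumes 43 more items, returns the value 85
```

No per-iterator bookkeeping is required: the iterator's own position is the only state, which is the point Jeremiah is making.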
On 20/06/2022 17:39, Jeremiah Paige wrote:

On Sat, Jun 18, 2022 at 5:42 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:

To me, the natural implementation of slicing on a non-reusable iterator (such as a generator) would be that you are not allowed to go backwards or even stand still:

mygen[42]
mygen[42]
ValueError: Element 42 of iterator has already been used

I agree that indexing an iterator such that it could only go forward feels like a reasonable and useful feature in python, but I disagree about the ValueError. To me the above produces two values: the 43rd and 85th elements produced by mygen. Anything else is a bizarre error waiting to arise at obscure times. What if this iterator is passed to another function? Used in a loop? Now this information about what index has been used has to be carried around and checked on every access.

Regards, Jeremiah

Oh, OK, I have no problem with that (except shouldn't it be the 43rd and 86th elements?). I guess which interpretation is more useful depends on the use case.
Best wishes
Rob Cliffe
On Tue, 21 Jun 2022 at 11:07, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
On 20/06/2022 17:39, Jeremiah Paige wrote:
On Sat, Jun 18, 2022 at 5:42 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
To me, the natural implementation of slicing on a non-reusable iterator (such as a generator) would be that you are not allowed to go backwards or even stand still:

mygen[42]
mygen[42]
ValueError: Element 42 of iterator has already been used
I agree that indexing an iterator such that it could only go forward feels like a reasonable and useful feature in python, but I disagree about the ValueError. To me the above produces two values: the 43rd and 85th elements produced by mygen. Anything else is a bizarre error waiting to arise at obscure times. What if this iterator is passed to another function? Used in a loop? Now this information about what index has been used has to be carried around and checked on every access.
Oh, OK, I have no problem with that (except shouldn't it be the 43rd and 86th elements?). I guess which interpretation is more useful depends on the use case.
I think this confusion is exactly why arbitrary iterators shouldn't be indexable like this. Slicing them is a maybe, but even there, it's hard to explain that mygen[3:] is a destructive operation on mygen (rather than, as it is with sequences, a copy). It wouldn't be hard to create a "lazy caching sequence-like view" to an iterable, which would never reset its base index, but within the iterator itself, it's inevitably going to cause a lot of confusion.

ChrisA
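The "lazy caching sequence-like view" Chris mentions is straightforward to sketch. The class name and its restrictions (non-negative indices, bounded slices only) are assumptions of this sketch, not an established API:

```python
class LazySeqView:
    """Lazily cache an iterable so repeated indexing is non-destructive."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill(self, n):
        # Pull items from the underlying iterator until the cache
        # holds at least n elements (or the iterator is exhausted).
        while len(self._cache) < n:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                break

    def __getitem__(self, index):
        if isinstance(index, slice):
            if index.stop is None or index.stop < 0:
                raise ValueError("open-ended slices not supported in this sketch")
            self._fill(index.stop)
            return self._cache[index]
        self._fill(index + 1)
        return self._cache[index]
```

Because the view never resets its base index, `v[3]` returns the same value every time, at the cost of the unbounded memory growth discussed earlier in the thread.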
On Sun, Jun 19, 2022 at 01:34:35AM +0100, Rob Cliffe via Python-ideas wrote:
To me, the natural implementation of slicing on a non-reusable iterator (such as a generator) would be that you are not allowed to go backwards or even stand still:

mygen[42]
mygen[42]
ValueError: Element 42 of iterator has already been used
How does a generic iterator, including generators, know whether or not item 42 has already been seen?

islice for generators is really just a thin wrapper around an iterator that operates something vaguely like this:

for i in range(start):
    next(iterator)  # throw the result away
for i in range(start, end):
    yield next(iterator)

It doesn't need to keep track of the last index seen, it just blindly advances through the iterator, with some short-cuts for the sake of efficiency.

-- Steve
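Steven's sketch, made runnable as a generator function. This is only a model of islice's core loop; the real itertools.islice also handles a step argument, defaults, negative-value errors, and early exhaustion:

```python
def simple_islice(iterable, start, end):
    # Minimal model of islice(iterable, start, end): advance blindly,
    # keeping no record of which indices have been seen.
    it = iter(iterable)
    for _ in range(start):
        next(it)  # throw the result away
    for _ in range(start, end):
        yield next(it)
```

Note that nothing here tracks "element 42 has been used"; the iterator's own position is the only state, which is Steven's point.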
It seems like this is all an occasion to use itertools.tee() ... But with a consciousness that implicit caching uses memory. On Mon, Jun 20, 2022, 11:36 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Jun 19, 2022 at 01:34:35AM +0100, Rob Cliffe via Python-ideas wrote:
To me, the natural implementation of slicing on a non-reusable iterator (such as a generator) would be that you are not allowed to go backwards or even stand still:

mygen[42]
mygen[42]
ValueError: Element 42 of iterator has already been used
How does a generic iterator, including generators, know whether or not item 42 has already been seen?
islice for generators is really just a thin wrapper around an iterator that operates something vaguely like this:
for i in range(start):
    next(iterator)  # throw the result away
for i in range(start, end):
    yield next(iterator)
It doesn't need to keep track of the last index seen, it just blindly advances through the iterator, with some short-cuts for the sake of efficiency.
-- Steve
On 09/06/2022 09:50, Paul Moore wrote:
On Thu, 9 Jun 2022 at 01:12, Steve Jorgensen <stevej@stevej.name> wrote:
My current thinking in response to that is that using islice is a decent solution except that it's not obvious. You have to jump outside of the thinking about the destructuring capability and consider what else could be used to help. Probably, first thing that _would_ come to mind from outside would be slicing with square brackets, but that would restrict the solution to only work with sequences and not other iterables and iterators as islice does.
That brings up a tangential idea. Why not allow square-bracket indexing of generators instead of having to import and utilize islice for that? Because generators don't have a common (sub-)type, so there's no class to put the relevant __getitem__ method on.
How so?
>>> def mygen(): yield 42
...
>>> type(mygen())
<class 'generator'>
Best wishes Rob Cliffe
On 09/06/2022 09:50, Paul Moore wrote:
On Thu, 9 Jun 2022 at 01:12, Steve Jorgensen <stevej@stevej.name> wrote:
My current thinking in response to that is that using islice is a decent solution except that it's not obvious. You have to jump outside of the thinking about the destructuring capability and consider what else could be used to help. Probably, first thing that _would_ come to mind from outside would be slicing with square brackets, but that would restrict the solution to only work with sequences and not other iterables and iterators as islice does.
That brings up a tangential idea. Why not allow square-bracket indexing of generators instead of having to import and utilize islice for that? Because generators don't have a common (sub-)type, so there's no class to put the relevant __getitem__ method on.
How so?
>>> def mygen(): yield 42
...
>>> type(mygen())
<class 'generator'>
Sorry, I was assuming the request was for slicing to work for iterables, not generators. But do we really want to make slicing work for generators, but still fail for other iterators? That seems like it'll just cause confusion. Take the OP's original example:

with open("some.file") as f:
    for line in f[:10]:  # This fails because f isn't a generator

with open("some.file") as f:
    for line in (l for l in f)[:10]:  # This does work because we're slicing a generator

You're bound to get someone (possibly even the OP!!!) asking for the first version to "just work"...

Also, "obvious" cases like

# How we would do this currently
def get_first_3_current(i):
    return list(itertools.islice(i, 3))

# How someone might assume we could do this with the new indexing
def get_first_3(i):
    return list(i[:3])

get_first_3(range(10))
get_first_3({1,2,3,4})
get_first_3({"a": "one", "b": "two", "c": "three"})

won't work, and no amount of adding iter() will make them work.

Paul
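For contrast, the islice version Paul shows does work uniformly across all of those argument types (the dict case relies on dicts preserving insertion order):

```python
from itertools import islice

def get_first_3(i):
    # Works for any iterable: sequences, sets, dicts, generators.
    return list(islice(i, 3))

first_of_range = get_first_3(range(10))                  # [0, 1, 2]
first_of_dict = get_first_3({"a": "one", "b": "two", "c": "three"})
first_of_set = get_first_3({1, 2, 3, 4})                 # 3 items, arbitrary order
```

This uniformity is exactly what bracket-slicing on generators alone would fail to provide.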
On 09/06/2022 10:28, Paul Moore wrote:
On 09/06/2022 09:50, Paul Moore wrote:
On Thu, 9 Jun 2022 at 01:12, Steve Jorgensen <stevej@stevej.name> wrote:
My current thinking in response to that is that using islice is a decent solution except that it's not obvious. You have to jump outside of the thinking about the destructuring capability and consider what else could be used to help. Probably, first thing that _would_ come to mind from outside would be slicing with square brackets, but that would restrict the solution to only work with sequences and not other iterables and iterators as islice does.
That brings up a tangential idea. Why not allow square-bracket indexing of generators instead of having to import and utilize islice for that? Because generators don't have a common (sub-)type, so there's no class to put the relevant __getitem__ method on.
How so?
>>> def mygen(): yield 42
...
>>> type(mygen())
<class 'generator'>

Sorry, I was assuming the request was for slicing to work for iterables, not generators. But do we really want to make slicing work for generators, but still fail for other iterators? That seems like it'll just cause confusion. Take the OP's original example:
with open("some.file") as f:
    for line in f[:10]:  # This fails because f isn't a generator
with open("some.file") as f:
    for line in (l for l in f)[:10]:  # This does work because we're slicing a generator
You're bound to get someone (possibly even the OP!!!) asking for the first version to "just work"...
Also, "obvious" cases like
# How we would do this currently
def get_first_3_current(i):
    return list(itertools.islice(i, 3))
# How someone might assume we could do this with the new indexing
def get_first_3(i):
    return list(i[:3])
get_first_3(range(10))
get_first_3({1,2,3,4})
get_first_3({"a": "one", "b": "two", "c": "three"})
won't work, and no amount of adding iter() will make them work.
Paul Well, you can write for line in f.readlines()[:10]: But taking the general point: Yes, it would no doubt cause confusion if some iterators supported slicing and other's didn't. But it would be a useful feature. Slicing could be added piecemeal to iterators such as open() according to demand. Of course, for non-reusable iterators it would be forbidden to go backwards (or even remain in the same place): agen[42] agen[41] ValueError: Generator has been used up past the slice point. (Better wordings are doubtless available.)
I realise this is asking for a lot of work do be done.🙁 Rob Cliffe
I had actually not thought about the question of what should happen when performing multiple index operations on the same iterator, and maybe that's a reason that the idea of adding index lookup using brackets is not as good as it first seems. The whole point of adding that would be to reduce the number of situations in which it matters whether you have a sequence or an iterator. As soon as we consider what should happen for multiple index lookups on a single iterator, that concept breaks down.

The next thing that makes me think of that's even farther afield from the initial topic of this thread would be to have some new function in the standard library that is similar to 'islice' but returns a list instead of a new iterator and performs optimally when given a list or tuple as an argument. Maybe it could be named something like 'gslice', short for "greedy slice". Hypothetical simplistic implementation:

from collections import abc
from itertools import islice

def gslice(source, start_or_stop=None, stop=None, step=None):
    if isinstance(source, abc.Sequence):
        return source[slice(start_or_stop, stop, step)]
    elif isinstance(source, abc.Iterable):
        return list(islice(source, start_or_stop, stop, step))
    else:
        raise TypeError("'source' must be a sequence or iterable")
participants (9)
- Chris Angelico
- David Mertz, Ph.D.
- Jeremiah Paige
- Mathew Elman
- Paul Moore
- Rob Cliffe
- Stephen J. Turnbull
- Steve Jorgensen
- Steven D'Aprano