Function to return first (or last) true value from list
It isn't uncommon to try and get either the first or the last True value from a list. In Python 2, you'd do this:

    next((x for x in mylist if x))

And, in Python 3, thanks to filter returning an iterator, you'd do this:

    next(filter(bool,mylist))

It still is pretty common. Common enough to make it aggravating to write a function to do that nearly every time. My idea is to add first and last functions that do just that.

Stuff that's open to lots of debate:

- Names. They're not very creative; I know.
- Builtin or itertools. I'm personally leaning towards the latter at the moment.

Implementing it in itertools would be simple:

    def first(it):
        return next(filter(bool,it))

    def last(it):
        return next(reversed(filter(bool,it)))

--
Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated."
On Feb 18, 2014, at 14:25, Ryan Gonzalez
Names. They’re not very creative; I know. Builtin or itertools. I’m personally leaning towards the latter at the moment.

I'd suggest putting it on PyPI, or submitting it to an existing project like more-itertools, and seeing what kind of uptake it has, before trying to decide whether it's necessary to add to the stdlib.
Except I think first may already be in more-itertools.
Implementing it in itertools would be simple:
def first(it): return next(filter(bool,it))
def last(it): return next(reversed(filter(bool,it)))
But reversed takes a sequence, not an iterator. So you'd have to call it on list(filter(...)), and surely you don't want to build the whole list just to get the last value. You could build it by using a single-element peek wrapper, or just an explicit for loop that keeps track of "last" as it goes along.
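The explicit-loop approach described here can be sketched roughly as follows. The names last_true, last_true_deque and _MISSING are hypothetical, chosen only for illustration; neither function is part of itertools or of any proposal in this thread. The second variant assumes that a collections.deque with maxlen=1 is an acceptable stand-in for the "keep only the most recent item" idea.

```python
from collections import deque

_MISSING = object()  # hypothetical sentinel, distinguishes "no default given"


def last_true(iterable, default=_MISSING):
    """Return the last truthy item of iterable, consuming it in one pass."""
    result = _MISSING
    for item in iterable:
        if item:
            result = item  # remember only the most recent truthy item
    if result is not _MISSING:
        return result
    if default is not _MISSING:
        return default
    raise ValueError("iterable has no true value and no default was given")


def last_true_deque(iterable, default=None):
    """Same idea, letting a one-slot deque discard everything but the tail."""
    tail = deque(filter(bool, iterable), maxlen=1)
    return tail[0] if tail else default
```

Both versions work on one-shot iterators and never hold more than one candidate item in memory, which is the property the list(filter(...)) version lacks.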
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
[Ryan Gonzalez
It isn't uncommon to try and get either the first or the last True value from a list. In Python 2, you'd do this:
next((x for x in mylist if x))
And, in Python 3, thanks to filter returning an iterator, you'd do this:
next(filter(bool,mylist)) ...
See here: http://bugs.python.org/issue18652

Never heard anyone ask for "last" before. "first" has been available in a PyPI package for over a year. Detail in the bug report.
On Tue, Feb 18, 2014 at 04:25:28PM -0600, Ryan Gonzalez wrote:
It isn't uncommon to try and get either the first or the last True value from a list.
I'm not sure that I've ever wanted to do either. If I've ever wanted the first true value, it was so uncommon I've forgotten. But I'm pretty confident I've never wanted the *last* true value. That would be a strange thing to do.
In Python 2, you'd do this:
next((x for x in mylist if x))
That works fine in Python 3 too.
And, in Python 3, thanks to filter returning an iterator, you'd do this:
next(filter(bool,mylist))
It still is pretty common. Common enough to make it aggravating to write a function to do that nearly every time.
But you aren't writing a function. It's a simple, trivial, one-line operation, a single expression. Not every trivial one-line operation needs to be a function. Just write "next(filter(bool, mylist))" in-place, where you want it to appear. It's only a couple of characters longer than "itertools.first(mylist)", and is one less thing to memorize. [...]
Stuff that's open to lots of debate:
- Names. They're not very creative; I know. - Builtin or itertools. I'm personally leaning towards the latter at the moment.
You missed the most important question: whether or not this is worth doing at all.

--
Steven
Sorry! I was Googling it, but I must have missed that bug report.
On Tue, Feb 18, 2014 at 4:42 PM, Tim Peters
[Ryan Gonzalez
] It isn't uncommon to try and get either the first or the last True value from a list. In Python 2, you'd do this:
next((x for x in mylist if x))
And, in Python 3, thanks to filter returning an iterator, you'd do this:
next(filter(bool,mylist)) ...
See here:
http://bugs.python.org/issue18652
Never heard anyone ask for "last" before. "first" has been available in a PyPI package for over a year. Detail in the bug report.
2014-02-19 1:01 GMT+02:00 Steven D'Aprano
On Tue, Feb 18, 2014 at 04:25:28PM -0600, Ryan Gonzalez wrote:
In Python 2, you'd do this:
next((x for x in mylist if x))
That works fine in Python 3 too.
The problem with this approach, which I personally ran into a couple of days ago, is that raising StopIteration in the case of an empty `mylist` is *not* what you want, in general. "first" assumes a non-exhausted iterator; raising StopIteration is easily caught in the closest `for` loop, and you end up failing silently. But

    Errors should never pass silently.

This is a case of an "almost working" solution, similar to the and-or "ternary" conditional operator. I think it's horrible. A non-advanced Python programmer may not be able to find such a bug.

An implementation of first() should raise some other exception than StopIteration.

Elazar
On 20 February 2014 13:11, אלעזר
2014-02-19 1:01 GMT+02:00 Steven D'Aprano
: On Tue, Feb 18, 2014 at 04:25:28PM -0600, Ryan Gonzalez wrote:
In Python 2, you'd do this:
next((x for x in mylist if x))
That works fine in Python 3 too.
The problem with this approach, which I personally ran into a couple of days ago, is that raising StopIteration in the case of empty `mylist` is *not* what you want, in general.
I ran into this problem once some time ago and it took a long time to track down the bug. Since then a bare next with no default and no try/except StopIteration sticks out like a sore thumb every time I see it.

Bare next() calls should be discouraged. In the situations where they are justified I think that it deserves a code comment at the least:

    x = next(iterator)  # Propagate StopIteration

Oscar
On 2/20/2014 8:11 AM, אלעזר wrote:
2014-02-19 1:01 GMT+02:00 Steven D'Aprano
On Tue, Feb 18, 2014 at 04:25:28PM -0600, Ryan Gonzalez wrote:
In Python 2, you'd do this:
next((x for x in mylist if x))
That works fine in Python 3 too.
An implementation of first() should raise some other exception than StopIteration.
    # untested
    __missing = object()

    def first(iterable, default=__missing):
        for o in iterable:
            if o:
                return o
        else:
            if default is not __missing:
                return default
            else:
                raise ValueError("iterable has no true value and there is no default")

--
Terry Jan Reedy
On 20 February 2014 16:05, Terry Reedy
An implementation of first() should raise some other exception than StopIteration.
It's easy enough to do if you know that bare next is a bad thing. More-itertools does it the way I would but has a long comment wondering whether it should actually raise StopIteration:
https://github.com/erikrose/more-itertools/blob/master/more_itertools/more.p...

The thing is just that bare next is not something that's widely recognised as being dangerous. I've seen examples of this kind of bug in samples from many Python aficionados (including at least one from you Terry).

Oscar
On Fri, Feb 21, 2014 at 3:14 AM, Oscar Benjamin
More-itertools does it the way I would but has a long comment wondering whether it should actually raise StopIteration: https://github.com/erikrose/more-itertools/blob/master/more_itertools/more.p...
Has that been subsumed by next(iter(x),default)?

ChrisA
On 20 February 2014 16:34, Chris Angelico
On Fri, Feb 21, 2014 at 3:14 AM, Oscar Benjamin
wrote: More-itertools does it the way I would but has a long comment wondering whether it should actually raise StopIteration: https://github.com/erikrose/more-itertools/blob/master/more_itertools/more.p...
Has that been subsumed by next(iter(x),default) ?
If the default argument is provided then yes it's not giving much over next/iter. The effect that I more often want is that an empty iterable raises an exception. next() with no default already does that but it's the wrong kind of exception and can't safely be allowed to propagate. In that case the alternative is

    try:
        obj = next(iter(iterable))
    except StopIteration:
        raise ValueError

(which is exactly what more-itertools first does).

Oscar
On Thu, Feb 20, 2014 at 04:14:17PM +0000, Oscar Benjamin wrote:
On 20 February 2014 16:05, Terry Reedy
wrote: An implementation of first() should raise some other exception than StopIteration.
[...]
It's easy enough to do if you know that bare next is a bad thing.
Say what? Why do you say that next(it) is a bad thing?
More-itertools does it the way I would but has a long comment wondering whether it should actually raise StopIteration: https://github.com/erikrose/more-itertools/blob/master/more_itertools/more.p...
The thing is just that bare next is not something that's widely recognised as being dangerous. I've seen examples of this kind of bug in samples from many Python aficionados (including at least one from you Terry).
Why is it dangerous, and what kind of bug?

If you're talking about the fact that next(it) can raise StopIteration, I think you are exaggerating the danger. Firstly, quite often you don't mind if it raises StopIteration, since that's what you would have done anyway. Secondly, I don't see why raising StopIteration is so much more dangerous than (say) IndexError or KeyError.

Failure to plan for the empty case, or test it, is not a problem unique to next and iterators. I have read text books on Pascal written in the 70s and 80s that emphasise the need to test the linked list and tree walking code with empty data structures.

--
Steven
2014-02-21 0:00 GMT+02:00 Steven D'Aprano
If you're talking about the fact that next(it) can raise StopIteration, I think you are exaggerating the danger. Firstly, quite often you don't mind if it raises StopIteration, since that's what you would have done anyway. Secondly, I don't see why raising StopIteration is so much more dangerous than (say) IndexError or KeyError.
I had this bug just the other day. I did not plan for the empty case, since it was obvious that the empty case is a bug, so I relied on the exception being raised in this case. But I did not get the exception since it was caught in a completely unrelated for loop. It took me a while to figure out what's going on, and it would've taken even more for someone else, not familiar with my assumption or with the whole StopIteration thing (which I believe is the common case). An IndexError or a KeyError would have been great in such a case.

It is *very* similar to the "and or" story.

---
Elazar
On Fri, Feb 21, 2014 at 12:38:01AM +0200, אלעזר wrote:
I had this bug just the other day. I did not plan for the empty case, since it was obvious that the empty case is a bug, so I relied on the exception being raised in this case. But I did not get the exception since it was caught in a completely unrelated for loop. It took me a while to figure out what's going on, and it would've taken even more for someone else, not familiar with my assumption or with the whole StopIteration thing (which I believe is the common case). An IndexError or a KeyError would have been great in such a case.
IndexError can also be caught by for-loops, under some circumstances. (See the legacy sequence protocol for iteration.)

Without knowing what your code does, and how it is written, it's hard to discuss in anything but generalities. But in general, any code that is expected to raise an exception should have a test that it actually does raise that exception.

If you're raising an exception that has particular meaning to Python, then you have to be prepared that Python might intercept it before you can see it. Would you be surprised by this?

    py> class X:
    ...     def method(self):
    ...         # Perform some calculation, which fails.
    ...         raise AttributeError
    ...     def __getattr__(self, name):
    ...         if name == 'spam': return self.method()
    ...
    py> x = X()
    py> getattr(x, 'spam', 23)
    23

If the only place you are using x.spam is inside a getattr call, then you will never see the AttributeError. Likewise, if the only place you are using next() is inside a for-loop iterable, then the StopIteration exception will always be captured. (This may even be deliberate on the part of the programmer.)

I don't see this as particularly noteworthy. It's one of the things that people probably won't think of until they've actually seen it happen, but programming is full of things like that. Not every tricky bug to find is a sign that the function is "dangerous".

--
Steven
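For readers unfamiliar with the legacy sequence protocol mentioned above, here is a small sketch of how an IndexError can be swallowed by iteration. The class name Leaky and its data are made up for illustration. It assumes a class that defines __getitem__ but no __iter__, so that iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
class Leaky:
    """Hypothetical container that is iterable only through __getitem__."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, index):
        row = self._rows[index]  # IndexError here legitimately ends iteration
        return row[0]            # IndexError here (empty row) ends it too, silently


# The empty third row raises IndexError inside __getitem__, which the
# legacy protocol interprets as "end of sequence".
items = list(Leaky([[1], [2], [], [4]]))
print(items)  # [1, 2]
```

The result is the same kind of silent truncation discussed for StopIteration, only triggered by a different exception and a different protocol.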
On 20 February 2014 22:38, אלעזר
I had this bug just the other day. I did not plan for the empty case, since it was obvious that the empty case is a bug, so I relied on the exception being raised in this case. But I did not get the exception since it was caught in a completely unrelated for loop. It took me a while to figure out what's going on, and it would've taken even more for someone else, not familiar with my assumption or with the whole StopIteration thing (which I believe is the common case). An IndexError or a KeyError would have been great in such a case.
Exactly. The bug I had manifested in a StopIteration that was raised in a semi-deterministic (dependent on slowly varying data) fashion after ~1 hour of processing. Had it resulted in an IndexError I would have seen a traceback and could have fixed the bug within about 5 minutes.

But StopIteration doesn't necessarily bubble up in the same way as other exceptions because it can be caught and silently suppressed by a for loop (or any consumer of iterators). So instead of a traceback I had truncated output data. It took some time to discover and verify that the output data was truncated. Then it took some time rerunning the script under pdb which was no help since it couldn't latch into the suppressed exception. I assumed that it must be an exception but there were no try/except clauses anywhere in the code. Eventually I found it by peppering the code with:

    try:
        ...
    except Exception as e:
        import pdb; pdb.set_trace()

It took most of a day for me to track that down instead of 5 minutes precisely because StopIteration is not like other exceptions. Admittedly I'd spot a similar bug much quicker now simply because I'm aware of the possibility.

A simplified version of the bug is shown below:

    def itermerge(datasources):
        for source in datasources:
            iterator = iter(source)
            first = next(iterator)
            for item in iterator:
                yield first * item

    data = [
        [1, 1, 2, 3],
        [1, 4, 5, 6],
        [], # Who put that there?
        [1, 7, 8, 9],
    ]

    for item in itermerge(data):
        print(item)

If you run the above then you get:

    $ python tmp.py
    1
    2
    3
    4
    5
    6

So the data is silently truncated at the empty iterable.
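A note for anyone re-running the simplified example on a current interpreter: since Python 3.7 (PEP 479), a StopIteration that escapes a generator body is converted to RuntimeError, so the silent truncation applies to the Python versions in use at the time of this thread. Below is a minimal sketch of a stricter variant, assuming an empty source should be reported as an error. The name itermerge_strict and the choice of ValueError are illustrative only and do not come from the thread.

```python
def itermerge_strict(datasources):
    """Like the itermerge example, but an empty source raises ValueError."""
    for index, source in enumerate(datasources):
        iterator = iter(source)
        try:
            first = next(iterator)
        except StopIteration:
            # Convert to an exception that no iterator consumer will swallow.
            raise ValueError("source %d is empty" % index) from None
        for item in iterator:
            yield first * item


data = [[1, 1, 2, 3], [1, 4, 5, 6], [], [1, 7, 8, 9]]

results = []
error = None
try:
    for item in itermerge_strict(data):
        results.append(item)
except ValueError as exc:
    error = str(exc)

print(results)  # [1, 2, 3, 4, 5, 6]
print(error)    # source 2 is empty
```

The output produced before the bad source is unchanged, but the failure now surfaces as a traceback-worthy exception that names the offending source.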
It is *very* similar to the "and or" story.
Exactly. Some of the worst programming idioms are the ones that mostly work but fall apart in special cases. Leaking StopIteration is fine... as long as you don't do it in a generator, or a user-defined iterator, or *any* code called by a generator/iterator.

Oscar
On 20 February 2014 23:51, Steven D'Aprano
IndexError can also be caught by for-loops, under some circumstances. (See the legacy sequence protocol for iteration.)
Only if your code is actually using the legacy sequence protocol:
    >>> def f(): yield 2; raise IndexError
    ...
    >>> for x in f():
    ...     print(x)
    ...
    2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in f
    IndexError
Oscar
On 02/20/2014 02:38 PM, אלעזר wrote:
2014-02-21 0:00 GMT+02:00 Steven D'Aprano wrote:
Why is it dangerous, and what kind of bug?
If you're talking about the fact that next(it) can raise StopIteration, I think you are exaggerating the danger. Firstly, quite often you don't mind if it raises StopIteration, since that's what you would have done anyway. Secondly, I don't see why raising StopIteration is so much more dangerous than (say) IndexError or KeyError.
I had this bug just the other day. I did not plan for the empty case, since it was obvious that the empty case is a bug, so I relied on the exception being raised in this case. But I did not get the exception since it was caught in a completely unrelated for loop. It took me a while to figure out what's going on, and it would've taken even more for someone else, not familiar with my assumption or with the whole StopIteration thing (which I believe is the common case). An IndexError or a KeyError would have been great in such a case.
I think the basic problem is that a few exceptions are intended for the Python interpreter itself, but that's easy to forget, particularly since that list is so small:

    StopIteration ... um, any others?

Consequently we need to remind ourselves that we are not likely to see that exception under normal circumstances; and vice versa, we need to remind ourselves to catch it if we are playing with next().

--
~Ethan~
On Fri, Feb 21, 2014 at 11:59 AM, Ethan Furman
I think the basic problem is that a few exceptions are intended for the Python interpreter itself, but that's easy to forget, particularly since that list is so small:
StopIteration ... um, any others?
There are other exceptions that have particular semantics associated with them. SystemExit is not printed to stderr if it propagates all the way up

ChrisA
On Feb 20, 2014, at 16:59, Ethan Furman
On 02/20/2014 02:38 PM, אלעזר wrote:
I had this bug just the other day. I did not plan for the empty case, since it was obvious that the empty case is a bug, so I relied on the exception being raised in this case. But I did not get the exception since it was caught in a completely unrelated for loop. It took me a while to figure out what's going on, and it would've taken even more for someone else, not familiar with my assumption or with the whole StopIteration thing (which I believe is the common case). An IndexError or a KeyError would have been great in such a case.
I don't think most people unfamiliar with StopIteration are calling next manually. Maybe there's a brief period where you've learned enough about how the iteration protocol works to be calling iter and next but not enough to know about StopIteration--but that period ends the first time you have to debug your code, and from then on, you know.
I think the basic problem is that a few exceptions are intended for the Python interpreter itself, but that's easy to forget, particularly since that list is so small:
StopIteration ... um, any others?
GeneratorExit, and in a different way KeyboardInterrupt and SystemExit. Presumably that's why none of these four inherit from StandardError and everything else does.

And of course IndexError in the special case where you're using the legacy iteration protocol.

Meanwhile, attribute access (and/or the default __getattribute__ implementation) handles both KeyError and AttributeError at various levels. Operators and augmented assignments use attribute access, and also directly handle AttributeError and NotImplementedError. If you count builtins that get direct data-model support as part of the interpreter, they do similar things to operators. The import system handles various IOError subclasses, and I think it may also internally raise and handle ImportError in some cases. Maybe the interpreter handles EOFError as well somewhere, I'm not sure. Probably more I didn't think of off the top of my head.

So, the class of "dangerous" exceptions where you have to know that the interpreter might treat them specially includes most of the most common exceptions.
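StandardError exists only in Python 2. In Python 3 the closest analogue of that split is BaseException versus Exception, and it can be inspected directly. The sketch below only checks the class hierarchy and makes no claim about how the interpreter handles each exception internally.

```python
# The four exceptions named above, checked against Python 3's hierarchy.
interpreter_signals = [StopIteration, GeneratorExit, KeyboardInterrupt, SystemExit]

# Maps each name to whether a plain "except Exception:" clause would catch it.
caught_by_except_exception = {
    exc_type.__name__: issubclass(exc_type, Exception)
    for exc_type in interpreter_signals
}

for name, caught in caught_by_except_exception.items():
    print(name, caught)
```

In Python 3, StopIteration is an ordinary Exception subclass, while the other three derive from BaseException directly. This is consistent with the debugging trick of wrapping code in "except Exception" being able to see a stray StopIteration.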
Consequently we need to remind ourselves that we are not likely to see that exception under normal circumstances; and vice-a-versa, we need to remind ourselves to catch it if we are playing with next().
This is the point. You have to know which exceptions to deal with in different kinds of code. If you're playing with next, you need to know about StopIteration. If you're playing with generator.send, you need to know about GeneratorExit. If you're creating custom number-like classes or __getattr__-based proxies, you need to know exactly when AttributeError is handled and what it means. Maybe some of this info isn't easy to find until the first time you try to do it (without a good tutorial) and run into bugs, but once you know what to look for the docs are pretty clear.
Oscar Benjamin wrote:
It is *very* similar to the "and or" story.
I think the difference is that once you've learned the lesson you stop using `and...or` while you change your usage pattern for next() to minimal scopes

    def process_source(source):
        it = iter(source)
        first = next(it)
        for item in it:
            yield first * item

    def itermerge(sources):
        for source in sources:
            yield from process_source(source)
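On Python 3.7 and later, where PEP 479 turns a StopIteration escaping a generator into RuntimeError, the minimal-scope pattern needs the empty case spelled out. A sketch follows, assuming that skipping an empty source is the intended behaviour. The explicit return is an addition for illustration and is not part of the code posted in the thread.

```python
def process_source(source):
    it = iter(source)
    try:
        first = next(it)
    except StopIteration:
        return  # Deliberate: an empty source contributes nothing.
    for item in it:
        yield first * item


def itermerge(sources):
    for source in sources:
        yield from process_source(source)


data = [[1, 1, 2, 3], [1, 4, 5, 6], [], [1, 7, 8, 9]]
merged = list(itermerge(data))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the decision to ignore an empty source is now written in the code, a reader no longer has to guess whether the bare next() was intentional.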
What is the "classic" use case for next() raising StopIteration, to be silently caught? We need __next__ to do so in for loops, but when do we need it in the functional form?

Elazar

2014-02-21 11:25 GMT+02:00 Peter Otten <__peter__@web.de>:
I think the difference is that once you've learned the lesson you stop using `and...or` while you change your usage pattern for next() to minimal scopes
    def process_source(source):
        it = iter(source)
        first = next(it)
        for item in it:
            yield first * item

    def itermerge(sources):
        for source in sources:
            yield from process_source(source)
אלעזר wrote:
What is the "classic" use case for next() raising StopIteration, to be silently caught ? We need __next__ to do so in for loops, but when do we need it in the functional form?
Pretty much every generator that treats the first item(s) specially, like the one I gave above:
    def process_source(source):
        it = iter(source)
        first = next(it)
        for item in it:
            yield first * item

Or these:

http://docs.python.org/dev/library/itertools.html#itertools.accumulate
http://docs.python.org/dev/library/itertools.html#itertools.groupby
http://docs.python.org/dev/library/itertools.html#itertools.islice
...

The behaviour of next() is really a feature rather than a bug.
On 21 February 2014 09:25, Peter Otten <__peter__@web.de> wrote:
It is *very* similar to the "and or" story.
I think the difference is that once you've learned the lesson you stop using `and...or` while you change your usage pattern for next() to minimal scopes:

    def process_source(source):
        it = iter(source)
        first = next(it)
        for item in it:
            yield first * item

    def itermerge(sources):
        for source in sources:
            yield from process_source(source)
Maybe but is it really correct to just ignore that empty iterable? When I use sequences and write

    first = seq[0]

I'm deliberately asserting that seq is non-empty. I want to see a traceback if for whatever reason it should turn out to be an empty sequence. Using next() and allowing the StopIteration to terminate what you're doing assumes that it's okay to just ignore an empty iterable and go do something else.

Leaking StopIteration may be a conscious decision but it's not clear when looking at

    first = next(it)

whether it is. In my own code I would put a comment there to indicate that I have considered the implication and decided that it's okay (and then when I see a bare next with no comment it instantly arouses suspicion).

Oscar
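One way to get the `seq[0]`-style assertion for iterators is to convert the StopIteration into an exception that no for loop will swallow. This is only a sketch: `first_item` and its `what` parameter are hypothetical names, and ValueError is an arbitrary choice of exception type.

```python
def first_item(iterator, what="iterable"):
    """Return the next item, or raise ValueError if there is none."""
    try:
        return next(iterator)
    except StopIteration:
        # Re-raise as an exception that iterator consumers do not catch.
        raise ValueError(f"{what} is empty") from None


def process_source(source):
    it = iter(source)
    first = first_item(it, "data source")  # loud failure on empty input
    for item in it:
        yield first * item


print(list(process_source([2, 3, 4])))
```

With this in place an empty source produces a traceback at the point where the assumption was made, rather than ending some outer loop early.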
On 21 February 2014 09:56, Peter Otten <__peter__@web.de> wrote:
אלעזר wrote:
What is the "classic" use case for next() raising StopIteration, to be silently caught? We need __next__ to do so in for loops, but when do we need it in the functional form?
Pretty much every generator that treats the first item(s) specially, like the one I gave above:
But there are also cases where that implicit behaviour is not desired. I would rather have to explicitly return when that's what I want so that the control flow is very clear.

For example when you use csv.DictReader and don't supply the fieldnames argument you are saying that you want to read a csv file with one header line containing column labels and zero or more data lines. To me a fully empty file (with no header line) is invalid but csv.DictReader will accept it as a csv file with zero rows. I would prefer an error in this case since it would only happen in my usage if an error had occurred somewhere else.

One of the examples you linked to shows exactly my own practice of marking a next call with a comment:

    def __next__(self):
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it) # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))

    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            self.currvalue = next(self.it) # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)

IMO if you want that behaviour then you should mark it to show that you thought about it and otherwise I'll treat any bare next with suspicion.

Oscar
Oscar Benjamin wrote:
But there are also cases where that implicit behaviour is not desired. I would rather have to explicitly return when that's what I want so that the control flow is very clear.
For example when you use csv.DictReader and don't supply the fieldnames argument you are saying that you want to read a csv file with one header line containing column labels and zero or more data lines. To me a fully empty file (with no header line) is invalid but csv.DictReader will accept it as a csv file with zero rows. I would prefer an error in this case since it would only happen in my usage if an error had occurred somewhere else.
I think we constantly have to deal with libraries that do almost but not exactly what we want them to do. If you look at the code it is clear that the author made a conscious design decision

    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                pass
        self.line_num = self.reader.line_num
        return self._fieldnames

totally unrelated to for loops catching StopIterations.

You can of course easily revert that decision:

    reader = csv.DictReader(...)
    if reader.fieldnames is None:
        raise EmptyCsvError
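A runnable version of that check might look like the following. It is a sketch under assumptions: `EmptyCsvError` does not exist in the csv module and is defined here only for illustration, and `open_dict_reader` is a hypothetical helper name.

```python
import csv
import io


class EmptyCsvError(ValueError):
    """Hypothetical error for a CSV file that has no header line."""


def open_dict_reader(f):
    reader = csv.DictReader(f)
    # Accessing .fieldnames reads the header row; it stays None when the
    # underlying reader is already exhausted (a completely empty file).
    if reader.fieldnames is None:
        raise EmptyCsvError("no header line found")
    return reader


rows = list(open_dict_reader(io.StringIO("a,b\n1,2\n")))
print(rows)
```

A file with a header but no data rows still passes the check and simply yields zero rows, which matches the "one header line and zero or more data lines" reading described above.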
When I use sequences and write first = seq[0] I'm deliberately asserting that seq is non-empty.
One of the examples you linked to shows exactly my own practice of marking a next call with a comment:
    def __next__(self):
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it) # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
IMO if you want that behaviour then you should mark it to show that you thought about it and otherwise I'll treat any bare next with suspicion.
Similar comments are possible for obj[...]:

    def heapreplace(heap, item):
        ...
        returnitem = heap[0] # raises appropriate IndexError if heap is empty
        ...

I fail to see the fundamental difference between next(...) and sequence[...]. There are edge cases you have to consider, and you add comments where you expect them to help your readers to understand your intentions.
On Fri, Feb 21, 2014 at 2:56 AM, Peter Otten <__peter__@web.de> wrote:
אלעזר wrote:
What is the "classic" use case for next() raising StopIteration, to be silently caught? We need __next__ to do so in for loops, but when do we need it in the functional form?
Pretty much every generator that treats the first item(s) specially, like the one I gave above:
    def process_source(source):
        it = iter(source)
        first = next(it)
        for item in it:
            yield first * item
Or these:
http://docs.python.org/dev/library/itertools.html#itertools.accumulate
http://docs.python.org/dev/library/itertools.html#itertools.groupby
http://docs.python.org/dev/library/itertools.html#itertools.islice
...
The behaviour of next() is really a feature rather than a bug.
It's also nice that you can pass it a default to avoid StopIteration in one-off iteration cases (like you cited above):

    first = next(it, 0)

or generically:

    next((x for x in []), None)

-eric
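Applied to the original question in this thread, the default argument gives a `first` that never leaks StopIteration. The sketch below is one possible shape, not a proposed stdlib API: `first_true`, `first_or_raise` and `_MISSING` are illustrative names. The private sentinel is there for the case where None (or any other default) could be a legitimate item.

```python
_MISSING = object()  # unique marker that can never appear in the data


def first_true(iterable, default=None):
    """Return the first truthy item, or `default` if there is none."""
    return next(filter(bool, iterable), default)


def first_or_raise(iterable):
    """Return the first item (even a falsy one); raise ValueError if empty."""
    item = next(iter(iterable), _MISSING)
    if item is _MISSING:
        raise ValueError("iterable is empty")
    return item


print(first_true([0, "", 3, 4]))
print(first_or_raise([None, 1]))
```

Note that `first_true` cannot distinguish "no truthy item" from "the default itself", which is why the second helper compares against a sentinel by identity.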
On 21 February 2014 18:00, Peter Otten <__peter__@web.de> wrote:
Oscar Benjamin wrote:
I think we constantly have to deal with libraries that do almost but not exactly what we want them to do. If you look at the code it is clear that the author made a conscious design decision
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                pass
        self.line_num = self.reader.line_num
        return self._fieldnames
totally unrelated to for loops catching StopIterations.
I was aware of the code. If you look at the commit that made it that way then you can see that the previous implementation was a bare next. It's not clear to me that the behaviour was a design decision or an implementation accident that was propagated for backwards compatibility:

    $ hg blame Lib/csv.py | grep StopIteration
    44735:             except StopIteration:
    $ hg log -p -r 44735 Lib/csv.py
    <snip>
    --- a/Lib/csv.py Sat Aug 09 12:47:13 2008 +0000
    +++ b/Lib/csv.py Sat Aug 09 19:44:22 2008 +0000
    @@ -68,7 +68,7 @@
     class DictReader:
         def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                      dialect="excel", *args, **kwds):
    -        self.fieldnames = fieldnames # list of keys for the dict
    +        self._fieldnames = fieldnames # list of keys for the dict
             self.restkey = restkey # key to catch long rows
             self.restval = restval # default value for short rows
             self.reader = reader(f, dialect, *args, **kwds)
    @@ -78,11 +78,25 @@
         def __iter__(self):
             return self

    +    @property
    +    def fieldnames(self):
    +        if self._fieldnames is None:
    +            try:
    +                self._fieldnames = next(self.reader)
    +            except StopIteration:
    +                pass
    +        self.line_num = self.reader.line_num
    +        return self._fieldnames
    +
    +    @fieldnames.setter
    +    def fieldnames(self, value):
    +        self._fieldnames = value
    +
         def __next__(self):
    +        if self.line_num == 0:
    +            # Used only for its side effect.
    +            self.fieldnames
             row = next(self.reader)
    -        if self.fieldnames is None:
    -            self.fieldnames = row
    -            row = next(self.reader)
             self.line_num = self.reader.line_num

Peter wrote:
I fail to see the fundamental difference between next(...) and sequence[...]. There are edge cases you have to consider, and you add comments where you expect them to help your readers to understand your intentions.
The difference is that it's not common practice to catch and ignore IndexError around large blocks of code e.g.:

    try:
        do_loads_of_stuff()
    except IndexError:
        pass

However that is in effect what happens for StopIteration since a typical program will have loads of places where it gets silently caught. The difference is that StopIteration is a particularly innocuous error to leak.

Cheers,
Oscar
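The "silently caught in loads of places" point can be seen without any generator at all. In the sketch below (`head`, `sources`, `heads` and `visible` are illustrative names), map() calls the function from inside its own __next__, so a leaked StopIteration is indistinguishable from map() being exhausted. A list comprehension evaluating the same expression lets the exception propagate.

```python
def head(source):
    # Bare next(): leaks StopIteration when `source` is empty.
    return next(iter(source))


sources = [[1, 2], [3], [], [4, 5]]

# list() sees StopIteration coming out of map.__next__ and stops quietly.
heads = list(map(head, sources))
print(heads)

# A list comprehension does not treat the leaked exception as "exhausted".
visible = False
try:
    [head(s) for s in sources]
except StopIteration:
    visible = True
print("StopIteration visible from the comprehension:", visible)
```

The first form drops the result for `[4, 5]` without any error, which is the same kind of truncation as in the itermerge example earlier in the thread.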
If iterators were indexable, with non-decreasing order only, then first(it) would become a simple it[0], with exactly the same semantics, except the possibility of raising a (visible) IndexError on an exhausted iterator, instead of .

By "non decreasing" I mean the ability to do something like this:

    it = iter('abcd')
    for i in range(4):
        print(it[i])

Of course, it will force the iterator to have additional memory.

I am sure this was suggested before, probably many times (and obviously rejected), but not exactly in this specific context.

So, maybe allowing only it[0], or maybe generator[0], would be nice (even if it's just an ugly special case).
----
Elazar
On Sun, Feb 23, 2014 at 3:52 AM, אלעזר
If iterators were indexable, with non-decreasing order only, then first(it) would become a simple it[0], with exactly the same semantics, except the possibility of raising a (visible) IndexError on an exhausted iterator, instead of .

By "non decreasing" I mean the ability to do something like this:

    it = iter('abcd')
    for i in range(4):
        print(it[i])

Of course, it will force the iterator to have additional memory. I am sure this was suggested before, probably many times (and obviously rejected), but not exactly in this specific context. So, maybe allowing only it[0], or maybe generator[0], would be nice (even if it's just an ugly special case).
You can get that with itertools.tee(), or list().

ChrisA
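A rough sketch of the tee() route, emulating only the `it[0]` case: `peek_first` is a hypothetical helper, and raising IndexError is chosen here to mirror the sequence behaviour described in the proposal. After calling tee() the original iterator should not be used again, so the helper hands back a replacement.

```python
from itertools import tee


def peek_first(iterable):
    """Return (first_item, iterator) without consuming the returned iterator."""
    probe, rest = tee(iter(iterable))
    try:
        first = next(probe)
    except StopIteration:
        # Behave like seq[0] on an empty sequence.
        raise IndexError("iterator is empty") from None
    return first, rest


first, rest = peek_first(iter("abcd"))
print(first)
print(list(rest))
```

The list() route is simpler when the input is known to be small and finite; tee() only buffers the items that one copy has consumed and the other has not.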
participants (11)

- Andrew Barnert
- Chris Angelico
- Eric Snow
- Ethan Furman
- Oscar Benjamin
- Peter Otten
- Ryan Gonzalez
- Steven D'Aprano
- Terry Reedy
- Tim Peters
- אלעזר