Rationale behind lazy map/filter
Hey guys, could someone clarify for me why it is a good idea to have map return an iterator that can be iterated over multiple times but acts as an empty iterator on every iteration after the first?
>>> r = range(10)
>>> list(r) == list(r)
True
>>> a = map(lambda x: x + 1, [1, 2, 3])
>>> list(a) == list(a)
False
Wouldn't it be safer for everyone if attempting to traverse a map iterator a second time raised a runtime error rather than acting as an empty iterator? Or am I missing something obvious?

I understand that chaining functional operators like map/reduce/filter is fairly common, and that not materializing intermediate computations can improve performance in some cases (or even turn an infinite computation into a finite one), but I also find myself running into non-obvious bugs because of this. Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".

What would you say is good practice for avoiding accidentally passing the result of a map to a function that traverses its input multiple times? I assume larger codebases need some agreement on these things. Should the caller always materialize the result of a map before passing it elsewhere? Should the callee always materialize its inputs before using them? Or should we just document whether a function traverses its input only once, perhaps through a type annotation ("def f(x: TraversableOnce[T])")? If we refactor `f` so that it now traverses `x` twice where it used to traverse it only once, should we go and update all callers? Would type hints solve this?

More obvious assumption errors, such as "has a method called __len__", raise a runtime error thanks to duck typing, but subtler ones, such as this one, are harder to express.

Thanks,
Stefan
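The trap Stefan describes can be shown in a few lines. This is a minimal illustration; the `average` function and its inputs are my invention, not from the thread:

```python
def average(xs):
    # Carelessly traverses its input twice: fine for a list,
    # silently wrong for an iterator.
    return sum(xs) / sum(1 for _ in xs)

a = map(lambda x: x + 1, [1, 2, 3])
# average(a) would divide 9 by 0: the first sum() exhausts the iterator,
# so the second traversal sees nothing and raises ZeroDivisionError.

b = list(map(lambda x: x + 1, [1, 2, 3]))  # materialize first
print(average(b))  # 3.0 -- re-traversal of a list is safe
```

Materializing at the call site, as suggested later in the thread, is the simplest of the conventions Stefan asks about.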
On Tue, 13 Oct 2015 14:59:56 +0300, Stefan Mihaila <stefanmihaila91@gmail.com> wrote:
Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".
This is the way iterators have always worked. The only new thing is that in python3 some things that used to be iter*ables* (lists, usually) are now iter*ators*. Yes it is a change in mindset *with regards to those functions* (and yes I sometimes find it annoying), but it is actually more consistent than it was in python2, and thus easier to generalize your knowledge about how python works instead of having to remember which functions work which way. That is, if you need to iterate it twice, turn it into a list first. --David
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 14:59:56 +0300, Stefan Mihaila <stefanmihaila91@gmail.com> wrote:
Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".
This is the way iterators have always worked.
It does raise the question though of what working code it would actually break to have "exhausted" iterators raise an error if you try to iterate them again rather than silently yield no items.
On Tue, Oct 13, 2015 at 10:26 AM, Random832 <random832@fastmail.com> wrote:
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 14:59:56 +0300, Stefan Mihaila <stefanmihaila91@gmail.com> wrote:
Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".
This is the way iterators have always worked.
It does raise the question though of what working code it would actually break to have "exhausted" iterators raise an error if you try to iterate them again rather than silently yield no items.
You mean like this?
>>> m = map(int, '1234')
>>> list(m)
[1, 2, 3, 4]
>>> next(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
It just happens that 'list()' and 'for ...' handle StopIteration for you. -- Zach
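Zach's point, spelled out: list() and for loops drive the iterator with next() and swallow the StopIteration themselves. A minimal sketch of what they do internally:

```python
m = map(int, "1234")
out = []
while True:
    try:
        out.append(next(m))
    except StopIteration:
        # This is exactly the exception list() and for-loops
        # catch on your behalf to end the loop.
        break
print(out)  # [1, 2, 3, 4]
```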
On Tue, 13 Oct 2015 11:26:09 -0400, Random832 <random832@fastmail.com> wrote:
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 14:59:56 +0300, Stefan Mihaila <stefanmihaila91@gmail.com> wrote:
Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".
This is the way iterators have always worked.
It does raise the question though of what working code it would actually break to have "exhausted" iterators raise an error if you try to iterate them again rather than silently yield no items.
They do raise an error: StopIteration. It's just that the iteration machinery uses that to stop iteration :). And the answer to the question is: lots of code. I've written some: code that iterates an iterator, breaks that loop on a condition, then resumes iterating, breaking that loop on a different condition, and so on, until the iterator is exhausted. If the iterator restarted at the top once it was exhausted, that code would break. --David
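A sketch of the break-and-resume pattern David describes; the sentinel conditions and values are my invention:

```python
it = iter([1, 2, 3, 0, 4, 5])

first = []
for x in it:
    if x == 0:           # first stopping condition
        break
    first.append(x)

rest = []
for x in it:             # resumes where the first loop left off;
    if x > 4:            # if `it` were already exhausted, this loop
        break            # would silently do nothing -- by design
    rest.append(x)

print(first, rest)  # [1, 2, 3] [4]
```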
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 11:26:09 -0400, Random832 <random832@fastmail.com> wrote:
It does raise the question though of what working code it would actually break to have "exhausted" iterators raise an error if you try to iterate them again rather than silently yield no items.
They do raise an error: StopIteration. It's just that the iteration machinery uses that to stop iteration :).
I meant a real error, and you know it, both of you. StopIteration is an exception in the technical sense that it can be raised and caught, but it's not an error, because it is used for normal control flow. In the plain English meaning of the word, it isn't even an exception.
And the answer to the question is: lots of code. I've written some: code that iterates an iterator, breaks that loop on a condition, then resumes iterating, breaking that loop on a different condition, and so on, until the iterator is exhausted. If the iterator restarted at the top once it was exhausted, that code would break.
I'm not suggesting restarting at the top (I've elsewhere suggested that many such methods would be better as an *iterable* that can be restarted at the top by calling iter() multiple times, but that's not the same thing). I'm suggesting raising an exception other than StopIteration, so that this situation can be detected. If you are writing code that tries to resume iterating after the iterator has been exhausted, I have to ask: why? I suppose the answer is the same reason people would deliberately raise StopIteration in the ways that PEP479 breaks - because it works and is easy. But that wasn't a reason not to deprecate that.
On Wed, Oct 14, 2015 at 3:08 AM, Random832 <random832@fastmail.com> wrote:
If you are writing code that tries to resume iterating after the iterator has been exhausted, I have to ask: why?
A well-behaved iterator is supposed to continue raising StopIteration forever once it's been exhausted. I don't know how much code actually depends on this, but it wouldn't be hard to make a wrapper that raises a different exception instead:

class iter:
    _orig_iter = iter
    def __init__(self, thing):
        self.iter = self._orig_iter(thing)
        self.exhausted = False
    def __iter__(self):
        return self
    def __next__(self):
        if self.exhausted:
            raise RuntimeError("Already exhausted")
        try:
            return next(self.iter)
        except StopIteration:
            self.exhausted = True
            raise

Play with that, and see where RuntimeErrors start coming up. I suspect they'll be rare, but they will happen.

ChrisA
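To see the wrapper idea in action without shadowing the builtin, here is a renamed standalone version (the renaming is mine, not Chris's):

```python
class CheckedIter:
    """Iterator wrapper that fails loudly on post-exhaustion use."""
    def __init__(self, thing):
        self.it = iter(thing)
        self.exhausted = False
    def __iter__(self):
        return self
    def __next__(self):
        if self.exhausted:
            raise RuntimeError("Already exhausted")
        try:
            return next(self.it)
        except StopIteration:
            self.exhausted = True   # remember, then stop normally once
            raise

m = CheckedIter(map(int, "123"))
print(list(m))   # [1, 2, 3] -- first traversal works as usual
try:
    next(m)      # any later attempt now fails loudly
except RuntimeError as e:
    print(e)     # Already exhausted
```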
Chris Angelico <rosuav@gmail.com> writes:
A well-behaved iterator is supposed to continue raising StopIteration forever once it's been exhausted.
Yes, and that is *precisely* the behavior that causes the problem under discussion. My question was what code depends on this.
Play with that, and see where RuntimeErrors start coming up. I suspect they'll be rare, but they will happen.
My theory is that most circumstances under which this would cause a RuntimeError are indicative of a bug in the algorithm consuming the iterator (for example, an algorithm that hasn't considered iterators and expects to be passed an iterable it can iterate from the top more than once), rather than the current behavior being relied on to produce the intended end result. This is essentially the same argument as PEP 479 - except there it was at least *easy* to come up with code which would rely on the old behavior to produce the intended end result. About the only example I can think of is that the implementation of itertools.zip_longest would have to change.
On Wed, Oct 14, 2015 at 3:49 AM, Random832 <random832@fastmail.com> wrote:
My theory is that most circumstances under which this would cause a RuntimeError are indicative of a bug in the algorithm consuming the iterator (for example, an algorithm that hasn't considered iterators and expects to be passed an iterable it can iterate from the top more than once), rather than the current behavior being relied on to produce the intended end result.
This is essentially the same argument as PEP 479 - except there it was at least *easy* to come up with code which would rely on the old behavior to produce the intended end result.
Yeah. Hence my suggestion of a quick little replacement for the iter() function (though, on second reading of the code, I realise that I forgot about the two-arg form; changing 'thing' to '*args' should fix that though) as a means of locating the actual cases where that happens.

Hmm. Actually, this kinda breaks if you call it multiple times. Calling iter() on an iterator should return itself, not a wrapper around self. So, new version:

class iter:
    _orig_iter = iter
    def __new__(cls, *args):
        if len(args) == 1 and isinstance(args[0], cls):
            # It's already a wrapped iterator. Return it as-is.
            return args[0]
        return super().__new__(cls)
    def __init__(self, *args):
        if hasattr(self, "iter"):
            return  # Don't rewrap
        self.iter = self._orig_iter(*args)
        self.exhausted = False
    def __iter__(self):
        return self
    def __next__(self):
        if self.exhausted:
            raise RuntimeError("Already exhausted")
        try:
            return next(self.iter)
        except StopIteration:
            self.exhausted = True
            raise

I don't have any code of mine that would be broken by this implementation of iter(). Doesn't mean it isn't buggy in ways I haven't spotted, though. :)

ChrisA
On Tue, 13 Oct 2015 12:08:12 -0400, Random832 <random832@fastmail.com> wrote:
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 11:26:09 -0400, Random832 <random832@fastmail.com> wrote:
And the answer to the question is: lots of code. I've written some: code that iterates an iterator, breaks that loop on a condition, then resumes iterating, breaking that loop on a different condition, and so on, until the iterator is exhausted. If the iterator restarted at the top once it was exhausted, that code would break
I'm not suggesting restarting at the top (I've elsewhere suggested that many such methods would be better as an *iterable* that can be restarted at the top by calling iter() multiple times, but that's not the same thing). I'm suggesting raising an exception other than StopIteration, so that this situation can be detected. If you are writing code that tries to resume iterating after the iterator has been exhausted, I have to ask: why?
Because those second (and subsequent) loops don't run if the iterator is already exhausted; the else clause is executed instead (or nothing happens, depending on the code). Now, such code likely isn't common (so I shouldn't have said "lots"), but the fact that I've done it at least once, maybe twice (though I can't remember the context; it was a while ago), argues it isn't vanishingly uncommon. --David
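The else-clause behaviour David mentions, sketched (the values are my invention):

```python
it = iter([1, 2])
log = []

for x in it:
    if x == 99:          # condition never fires here
        break
else:
    log.append("first loop exhausted the iterator")

for x in it:
    log.append("unreachable")   # exhausted: the body never runs
else:
    log.append("second loop was a silent no-op")

print(log)
```

Under the proposed change, the second loop would raise instead of falling through to its else clause.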
On Tue, Oct 13, 2015 at 8:26 AM, Random832 <random832@fastmail.com> wrote:
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 14:59:56 +0300, Stefan Mihaila <stefanmihaila91@gmail.com> wrote:
Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".
This is the way iterators have always worked.
It does raise the question though of what working code it would actually break to have "exhausted" iterators raise an error if you try to iterate them again rather than silently yield no items.
What about cases where not all of the elements of the iterator are known at the outset? For example, you might have a collection of pending tasks that you periodically loop through and process. Changing the behavior would result in an error when checking for more tasks instead of no tasks. --Chris
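One way to sketch Chris's periodic-check scenario (the queue-draining generator is my invention; details assumed):

```python
from collections import deque

queue = deque(["task-a", "task-b"])

def pending(q):
    # Yield queued tasks; the generator exhausts once the queue is empty.
    while q:
        yield q.popleft()

it = pending(queue)
first_check = list(it)    # ['task-a', 'task-b'] -- work to do
second_check = list(it)   # [] today: "no tasks" is a harmless answer;
print(first_check, second_check)   # under the proposal it would raise
```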
On Tue, Oct 13, 2015 at 11:26:09AM -0400, Random832 wrote:
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 14:59:56 +0300, Stefan Mihaila <stefanmihaila91@gmail.com> wrote:
Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".
This is the way iterators have always worked.
It does raise the question though of what working code it would actually break to have "exhausted" iterators raise an error if you try to iterate them again rather than silently yield no items.
Anything which looks like this:

for item in iterator:
    if condition:
        break
    do_this()
...
for item in iterator:
    do_that()

If the condition is never true, the iterator is completely processed by the first loop, and the second loop is a no-op by design. I don't know how common it is, but I've written code like that.

Had we been designing the iterator protocol from scratch, perhaps we might have had two exceptions:

class EmptyIterator(Exception): ...

class StopIteration(EmptyIterator): ...

and have StopIteration only raised the first time you call next() on an empty iterator. But would it have been better? I don't know. I suspect not. I think that although it might avoid a certain class of errors, it would add complexity to other situations which are currently simple.

-- Steve
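Steven's hypothetical two-exception hierarchy can be made concrete outside the real protocol. All names other than `EmptyIterator` are mine, and real `for` loops would not understand these exceptions, so the demo drives the iterator by hand:

```python
class EmptyIterator(Exception):
    """Hypothetical: raised on every next() after exhaustion."""

class StopIteration2(EmptyIterator):
    """Hypothetical stand-in for Steven's subclassed StopIteration."""

class StrictIter:
    def __init__(self, iterable):
        self.it = iter(iterable)
        self.exhausted = False
    def __iter__(self):
        return self
    def __next__(self):
        if self.exhausted:
            raise EmptyIterator("iterator already exhausted")
        try:
            return next(self.it)
        except StopIteration:
            self.exhausted = True
            raise StopIteration2 from None   # first exhaustion only

s = StrictIter([1, 2])
events = [next(s), next(s)]
try:
    next(s)
except StopIteration2:
    events.append("normal stop")   # the loop machinery would catch this
try:
    next(s)
except EmptyIterator:
    events.append("real error")    # every later attempt would propagate
print(events)
```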
On 14 October 2015 at 09:59, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Oct 13, 2015 at 11:26:09AM -0400, Random832 wrote:
"R. David Murray" <rdmurray@bitdance.com> writes:
On Tue, 13 Oct 2015 14:59:56 +0300, Stefan Mihaila <stefanmihaila91@gmail.com> wrote:
Maybe it's just python2 habits, but I assume I'm not the only one carelessly thinking that "iterating over an input a second time will result in the same thing as the first time (or raise an error)".
This is the way iterators have always worked.
It does raise the question though of what working code it would actually break to have "exhausted" iterators raise an error if you try to iterate them again rather than silently yield no items.
Anything which looks like this:
for item in iterator:
    if condition:
        break
    do_this()
...
for item in iterator:
    do_that()
If the condition is never true, the iterator is completely processed by the first loop, and the second loop is a no-op by design.
I don't know how common it is, but I've written code like that.
I wrote code like this yesterday, to parse a file where there were multiple lines of one type of data, followed by multiple lines of another type of data. I can think of more complex examples including two (or more) iterators where one might reasonably do similar things. E.g. file 1 contains data, some of which is a subset of data in file 2, both of which are sorted. And during parsing, one wishes to match up the common elements. -Graham
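A sketch of the two-section parse Graham describes; the file layout and separator are my invention:

```python
import io

# Hypothetical file: header lines, a '---' separator, then data lines.
f = io.StringIO("h1\nh2\n---\nd1\nd2\n")

headers = []
for line in f:                  # file objects are their own iterators
    if line.strip() == "---":
        break                   # separator consumed; position preserved
    headers.append(line.strip())

data = [line.strip() for line in f]   # second loop resumes after '---'
print(headers, data)            # ['h1', 'h2'] ['d1', 'd2']
```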
On 10/13/2015 7:59 AM, Stefan Mihaila wrote:
Could someone clarify for me ...
This list, pydev, short for 'python development', is for discussing development of future releases of CPython. Your question should have been directed to python-list, where it would be entirely on topic. -- Terry Jan Reedy
participants (9)
- Chris Angelico
- Chris Jerdonek
- Graham Gower
- R. David Murray
- Random832
- Stefan Mihaila
- Steven D'Aprano
- Terry Reedy
- Zachary Ware