[Python-Dev] Rationale behind lazy map/filter

Stefan Mihaila stefanmihaila91 at gmail.com
Tue Oct 13 07:59:56 EDT 2015


Hey guys,

Could someone clarify for me why it is a good idea for map to return an iterator that can be iterated over multiple times, but acts as an empty iterator on every traversal after the first?

>>> r = range(10)
>>> list(r) == list(r)
True
>>> a = map(lambda x: x + 1, [1, 2, 3])
>>> list(a) == list(a)
False

Wouldn't it be safer for everyone if attempting to traverse a map iterator a second time raised a runtime error rather than acting as an empty iterator? Or am I missing something obvious?
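To make concrete what I have in mind, here is a minimal sketch of a wrapper that fails loudly on a second traversal (the `OnceIterator` name is mine, purely for illustration, not a proposal for the actual map implementation):

```python
class OnceIterator:
    """Wrap an iterable so a second traversal raises RuntimeError
    instead of silently yielding nothing."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._consumed = False

    def __iter__(self):
        if self._consumed:
            raise RuntimeError("iterator already traversed")
        self._consumed = True
        return self._it


a = OnceIterator(map(lambda x: x + 1, [1, 2, 3]))
list(a)      # first traversal works: [2, 3, 4]
# list(a)    # a second traversal raises RuntimeError instead of returning []
```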

I understand that chaining functional operators like map/reduce/filter is fairly common, and that not materializing intermediate computations can improve performance in some cases (or even turn an infinite computation into a finite one), but I also find myself running into non-obvious bugs because of this. Maybe it's just Python 2 habits, but I assume I'm not the only one who carelessly thinks that "iterating over an input a second time will produce the same result as the first time (or raise an error)".
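For what it's worth, the workarounds I keep reaching for are either materializing with list() up front, or splitting the iterator with itertools.tee when I need two passes:

```python
from itertools import tee

a = map(lambda x: x + 1, [1, 2, 3])
first, second = tee(a)  # two independent iterators over the same lazy data

# Unlike iterating `a` twice, both copies see every element.
print(list(first) == list(second))  # True
```

But both require the caller to already know that the callee traverses its input more than once, which is exactly the knowledge that gets lost.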

What would you say is a good practice for avoiding accidentally passing the result of a map to a function that traverses its input multiple times? I assume larger codebases need some agreement on these things.

Should the caller always materialize the result of a map before passing it elsewhere? Or should the callee always materialize its inputs before using them? Or should we just document whether a function traverses its input only once, perhaps through a type annotation such as `def f(x: TraversableOnce[T])`? If we refactor `f' so that it traverses `x' twice where it used to traverse it only once, should we go and update all callers? Would type hints solve this?
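For context, the closest existing way I can see to express this in annotations is the distinction between the abstract types in collections.abc (the function names below are hypothetical, just to show the convention):

```python
from collections.abc import Iterable, Sequence

def consumes_once(xs: Iterable[int]) -> int:
    # Annotating with Iterable signals that one traversal is enough,
    # so a lazy map result is fine here.
    return sum(xs)

def consumes_twice(xs: Sequence[int]) -> int:
    # Annotating with Sequence signals that repeatable access is needed,
    # so the caller should materialize first, e.g. list(map(...)).
    return sum(xs) + max(xs)

print(consumes_once(map(lambda x: x + 1, [1, 2, 3])))         # 9
print(consumes_twice(list(map(lambda x: x + 1, [1, 2, 3]))))  # 13
```

Of course nothing enforces this at runtime; passing a map result to `consumes_twice` would still silently compute the wrong answer rather than raise, which is the original problem again.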

More obvious assumption errors, such as expecting an object to have a __len__ method, at least surface as runtime errors thanks to duck typing, but more subtle assumptions like this one are harder to express.

Thanks,
Stefan
