[Python-Dev] Rationale behind lazy map/filter
Stefan Mihaila
stefanmihaila91 at gmail.com
Tue Oct 13 07:59:56 EDT 2015
Hey guys,
Could someone clarify for me why it is a good idea to have map return an
iterator that can be iterated more than once, but silently acts as an
empty iterator on every traversal after the first?
>>> r = range(10)
>>> list(r) == list(r)
True
>>> a = map(lambda x: x + 1, [1, 2, 3])
>>> list(a) == list(a)
False
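(For context, the difference is that range() returns a reusable iterable,
where each iter() call produces a fresh iterator, while map() returns a
one-shot iterator that is its own iter() result. A quick sketch:

```python
# range() builds a reusable iterable: each iter() call yields a fresh iterator.
r = range(3)
assert iter(r) is not r
assert list(r) == list(r) == [0, 1, 2]

# map() builds a one-shot iterator: iter() returns the same object,
# so a second traversal sees it already exhausted.
m = map(lambda x: x + 1, [1, 2, 3])
assert iter(m) is m
assert list(m) == [2, 3, 4]
assert list(m) == []  # exhausted, silently empty
```

The `iter(x) is x` check is a practical way to tell the two cases apart.)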
Wouldn't it be safer for everyone if attempting to traverse a map
iterator a second time simply threw a runtime error rather than acting
as an empty iterator? Or am I missing something obvious?
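(The behaviour I'm asking for can be sketched with a small wrapper --
OneShot is a hypothetical name, not anything in the stdlib:

```python
class OneShot:
    """Wrap an iterable; raise if anyone tries to traverse it twice."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._used = False

    def __iter__(self):
        if self._used:
            raise RuntimeError("iterator already traversed")
        self._used = True
        return self._it


a = OneShot(map(lambda x: x + 1, [1, 2, 3]))
assert list(a) == [2, 3, 4]
try:
    list(a)  # second traversal fails loudly instead of yielding []
except RuntimeError:
    pass
```

That's roughly the "fail fast" semantics I'd have expected by default.)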
I understand that chaining functional operators like map/reduce/filter
is fairly common, and that not materializing intermediate results can
improve performance in some cases (or even turn an infinite computation
into a finite one), but I also find myself running into non-obvious
bugs because of this.
Maybe it's just a Python 2 habit, but I assume I'm not the only one
carelessly assuming that iterating over an input a second time will
yield the same result as the first time (or raise an error).
What would you say is good practice for avoiding accidentally passing
the result of a map to a function that traverses its input multiple
times? I assume larger codebases need some agreed convention here.
Should the caller always materialize the result of a map before passing
it elsewhere? Or should the callee always materialize its inputs before
using them?
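(Either convention can at least be made explicit in code. As one sketch
of the "callee materializes" option -- process_twice is a hypothetical
example, not a proposal:

```python
def process_twice(xs):
    # Materialize defensively: xs may be a one-shot iterator from map().
    xs = list(xs)
    return [x * 2 for x in xs], [x * 3 for x in xs]


doubled, tripled = process_twice(map(lambda x: x + 1, [0, 1, 2]))
assert doubled == [2, 4, 6]
assert tripled == [3, 6, 9]
```

The cost is an extra copy even when the caller already holds a list.)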
Or should we just document whether a function traverses its input only
once, such as through some type annotation
("def f(x: TraversableOnce[T])")? If we refactor `f' so that it used to
traverse `x' only once but now traverses it twice, should we
go and update all callers? Would type hints solve this?
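(The new typing module already draws a related distinction, even though
no checker enforces "traversable once" as such: annotating a parameter
as Iterator signals that a single pass suffices, while Sequence promises
repeatable traversal. A sketch, with hypothetical function names:

```python
from typing import Iterator, Sequence


def consume_once(xs: Iterator[int]) -> int:
    # Iterator in the signature signals: one pass is all we take.
    return sum(xs)


def consume_twice(xs: Sequence[int]) -> int:
    # Sequence promises repeatable traversal (and len()).
    return sum(xs) + len(xs)


assert consume_once(map(lambda x: x + 1, [1, 2, 3])) == 9
assert consume_twice([1, 2, 3]) == 9
```

A checker would flag passing a map object to consume_twice, which is
at least part of what I'm after.)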
More obvious assumption errors, such as "has a method called __len__",
throw a runtime error thanks to duck typing, but subtler ones, such as
this one, are harder to express.
Thanks,
Stefan