[Python-Dev] itertools module

Raymond Hettinger python@rcn.com
Mon, 27 Jan 2003 15:20:29 -0500


>     Raymond> The other differences are that it stops with the shortest
>     Raymond> iterable and doesn't accept None for a func argument.
> 
> I was questioning why with a name like "imap" you chose
> to make it differ from map() in ways other than its iterator-ness.  The
> other semantic differences make it more difficult to replace map() with
> itertools.imap() than it might be.

Okay, it's no problem to put back in the function=None behavior.
I had thought it an outdated hack that could be left behind, but
there is no loss from including it.  And, you're right, it may help
someone transition their code a little more easily.
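
For readers following along, here is a minimal sketch of what the function=None
behavior could look like. It is written in present-day Python 3 (where zip() is
already lazy and stops with the shortest input), it is not the module's actual
implementation, and the name imap_sketch is hypothetical:

```python
def imap_sketch(function, *iterables):
    """Lazily apply function across the iterables, stopping with the shortest.

    Hypothetical illustration only. When function is None, the arguments
    are returned unchanged as a tuple, mirroring the old map(None, ...) idiom.
    """
    for args in zip(*iterables):      # zip() ends when the shortest input ends
        if function is None:
            yield args                # identity case: just hand back the tuple
        else:
            yield function(*args)


paired = list(imap_sketch(None, "ab", [1, 2, 3]))
powers = list(imap_sketch(pow, [2, 3], [3, 2]))
```

Here paired is [('a', 1), ('b', 2)] and powers is [8, 9]; the unmatched third
element of the longer input is simply never produced.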

>     Raymond> Because one or more useful inputs are potentially infinite,
>     Raymond> filling in Nones is less useful than stopping with the shortest
>     Raymond> iterator.
> 
> Yes, but it still seems a gratuitous change from map() to me.

I understand; however, for me, replicating quirks of map ranks
less in importance than creating a cohesive set of tools that work
well together.  The SML/Haskell tools have a number of infinite
iterators as basic building blocks; using them requires that other
functions know when to shut off.  I would like the package to
be unified by the idea that the iterators all terminate with the shortest
input (assuming they have one; some don't).
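
A small sketch of why stop-with-shortest matters once infinite iterators are in
the mix. It assumes present-day Python 3, where count() and repeat() live in
itertools and the builtin map() and zip() are lazy; the variable names are made
up for illustration:

```python
from itertools import count, repeat

# count(1) never ends, so the finite string decides when iteration stops.
numbered = list(zip(count(1), "abc"))

# repeat(2) never ends either; the finite range() shuts the pipeline off.
squares = list(map(pow, range(1, 4), repeat(2)))
```

With None-filling semantics neither expression could ever finish, because the
infinite input would never run out. Here numbered is
[(1, 'a'), (2, 'b'), (3, 'c')] and squares is [1, 4, 9].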

Also, I'm a little biased because that map feature has never been
helpful to me and more than once has gotten in the way.  The
implementations in Haskell and SML also did not include a
None fill-in feature.

>     >> * loopzip() - It's not clear why its next() method should return a
>     >>   list instead of a tuple (again, a seemingly needless distinction
>     >>   with its builtin counterpart, zip()).
> 
>     Raymond> I've wrestled with this one.  The short answer is that zip()
>     Raymond> already does a pretty good job and that the only use for
>     Raymond> loopzip() is super high speed looping.  To that end, reusing a
>     Raymond> single list instead of allocating and building tuples is *much*
>     Raymond> faster.
> 
> How do you know the caller doesn't squirrel away the list you returned on
> the n-th iteration?  I don't see how you can safely reuse the same list.

If needed, I can add in an izip() function that returns tuples just like zip()
does. I would like to keep loopzip().  It is very effective and efficient for the
use case that zip was meant to solve, namely lockstep iteration:

      for i, j in loopzip(ivector, jvector):   # results are immediately unpacked
          process(i, j)

This use case is even more prevalent with this package, where loopzip
can combine algebraically with other itertools or functionals:

      takewhile(binarypredicate, loopzip(ivec, jvec))

It's a terrible waste to constantly allocate tuples, build them, pass them,
unpack them, and throw them away on every pass.  Reuse is an
optimization that is already built into the existing implementations
of filter() and map().
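
To make the trade-off concrete, here is a pure-Python sketch of the reuse idea
and of the aliasing hazard Skip raised. The generator reusing_zip is a
hypothetical stand-in written for present-day Python 3; it is not the C code
behind loopzip():

```python
def reusing_zip(*iterables):
    """Yield the SAME list object on every pass, overwritten in place.

    Hypothetical stand-in for the reuse optimization: no new container
    is allocated per iteration.
    """
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return
    result = [None] * len(iterators)   # allocated once, reused every pass
    while True:
        for index, it in enumerate(iterators):
            try:
                result[index] = next(it)
            except StopIteration:
                return                 # stop with the shortest input
        yield result


# Safe: each result is unpacked immediately, so reuse is invisible.
pairs = [(i, j) for i, j in reusing_zip([1, 2, 3], "abc")]

# Unsafe: the caller keeps references, and they all alias one list.
saved = list(reusing_zip([1, 2, 3], "abc"))
```

In the safe loop pairs comes out as [(1, 'a'), (2, 'b'), (3, 'c')]. In the
unsafe case saved holds three references to one list, so every entry shows the
last values written, [3, 'c']. That is the situation a tuple-returning izip()
would avoid.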

> 
> Skip

Thanks again for the useful comments.
I'll add the map(None, s1, s2, ...) behavior
and write an izip() function which can be used
with full safety for non-looping use cases.


Raymond Hettinger