[Python-Dev] iterzip()

Mon, 29 Apr 2002 13:10:41 -0400

From: "Guido van Rossum" <guido@python.org>
> > > Did you time any of these?
> >
> > I just timed it and am shocked.  iterzip() has exactly the same code
> > as zip() except for the final append result to list.  So, I expected
only
> > a microscopic speed-up.  I don't see where the huge performance
> > improvement came from.
>
> Probably the list-append still has a slight quadratic component --
> you're doing a million elements here.

The order of magnitude speed boost is consistent through all ranges
measurable on my machine:

n      time   func
-      ----    ----
10000 0.11 zip
10000 0.00 iterzip
10000 0.38 genzip
100000 1.05 zip
100000 0.16 iterzip
100000 2.31 genzip
1000000 19.77 zip
1000000 1.37 iterzip
1000000 28.62 genzip

And at each range, timing('list(iterzip)') is approx equal to timing('zip').

> But I asked something else.  How much does the speed difference affect
> the apps you have?  In my experience it's usually used for small lists
> where the quadratic effect doesn't occur and the timing doesn't matter

I use zip() a lot, but iterzip() only made a speed difference in a few
places:

     building dictionaries:  reachable = dict(zip(map(alpha,wordlist,),
wordlist))
     forming character pairs on a file read into a string:  zip(text[:-1],
text[1:])
     forming matrices from vectors: Mat( zip(xvec, yvec) )

If we have to leave zip untouched, consider adding iterzip anyway.  It will
almost always be the better choice.

The real issue is which ONE of the following poisons are you most willing to
swallow:
1.  Forgo iterzip's speed/space improvements
2.  Grow the number of builtins by one
3.  Deprecate and/or relocate one of:  zip, reduce, input, apply, oct
4.  Change the behavior of zip to return an iterator (breaking some code)

I vote for either #2 or #3.

Raymond Hettinger

P.S.  Hmm, no name controversy?  Alex hasn't even suggested a beautiful
sounding Italian variant like 'terziparare' <winks and grins>