Iterator vs Lists Was: Speed of string += string

Raymond Hettinger vze4rx4y at verizon.net
Mon Apr 14 19:31:43 EDT 2003


[Martelli bot]
> d=dict.fromkeys(range(200))''' 'for x in d.keys(): pass'
> d=dict.fromkeys(range(200))''' 'for x in d.iterkeys(): pass'
> 10000 loops, best of 3: 28.2 usec per loop

> That's most definitely NOT a significant performance difference,
> of course, but calling iteration on iterkeys "a lot faster" than
> iteration on keys would appear to be substantially misleading, at
> least without a lot of qualifications and hedging!-)

Here are a few thoughts about the performance differences between
looping over an iterator versus building a list and looping over it.

* Constructing a list is especially fast when the underlying object
   can report its length.  That enables the list constructors to
pre-allocate
   the entire list at once.  If not, the construction requires a series of
   appends and an occasional resize/copy operation.

* Iterators have a tiny constant size while lists take space proportional
   to the length of the list.  The part that is not obvious is that looping
   over the iterator re-uses the same memory location again an again.
   So the relevant data is almost always in the hardware memory cache.
   The same is also true for small lists.  However, large lists cannot all
   fit in the cache and accessing them will be slower than their iterator
   based counterparts.  As processor speeds race ahead of memory
   speeds, the performance difference will become more pronounced
   over time.

* It takes time to allocate Python objects, but the smart allocator saves
   some of that time when it can reuse recently discarded objects.  This
   technique works very well with iterators which tend to continuously
   create and abandon many same sized objects.  However, list based
   approaches keep a reference to each element preventing their reuse.

   Sometimes the difference is tiny; sometimes it is huge.  There were some
   examples where itertools.izip() outperformed zip() by about ten to one.
   Timbot's subsequent optimizations to zip() made this much less likely.
Still,
   izip() generally performs at least a little better than its list based
counterpart.

The executive summary:
For small datasets, iterator and list based approaches have similar
performance.
For larger datasets, iterators save both time and space.


Raymond Hettinger









More information about the Python-list mailing list