Iterator vs Lists Was: Speed of string += string
Raymond Hettinger
vze4rx4y at verizon.net
Mon Apr 14 19:31:43 EDT 2003
[Martelli bot]
> d=dict.fromkeys(range(200))''' 'for x in d.keys(): pass'
> d=dict.fromkeys(range(200))''' 'for x in d.iterkeys(): pass'
> 10000 loops, best of 3: 28.2 usec per loop
> That's most definitely NOT a significant performance difference,
> of course, but calling iteration on iterkeys "a lot faster" than
> iteration on keys would appear to be substantially misleading, at
> least without a lot of qualifications and hedging!-)
Here are a few thoughts about the performance differences between
looping over an iterator versus building a list and looping over it.
* Constructing a list is especially fast when the underlying object
can report its length. That enables the list constructors to
pre-allocate
the entire list at once. If not, the construction requires a series of
appends and an occasional resize/copy operation.
* Iterators have a tiny constant size while lists take space proportional
to the length of the list. The part that is not obvious is that looping
over the iterator re-uses the same memory location again an again.
So the relevant data is almost always in the hardware memory cache.
The same is also true for small lists. However, large lists cannot all
fit in the cache and accessing them will be slower than their iterator
based counterparts. As processor speeds race ahead of memory
speeds, the performance difference will become more pronounced
over time.
* It takes time to allocate Python objects, but the smart allocator saves
some of that time when it can reuse recently discarded objects. This
technique works very well with iterators which tend to continuously
create and abandon many same sized objects. However, list based
approaches keep a reference to each element preventing their reuse.
Sometimes the difference is tiny; sometimes it is huge. There were some
examples where itertools.izip() outperformed zip() by about ten to one.
Timbot's subsequent optimizations to zip() made this much less likely.
Still,
izip() generally performs at least a little better than its list based
counterpart.
The executive summary:
For small datasets, iterator and list based approaches have similar
performance.
For larger datasets, iterators save both time and space.
Raymond Hettinger
More information about the Python-list
mailing list