Iteration and large lists (was: Re: [Edu-sig] How to Improve code performance)

Seth David Schoen schoen@loyalty.org
Tue, 30 Oct 2001 12:17:41 -0800


Titu Kim writes:

> I am also looking for the result on this question.
> --- Seth David Schoen <schoen@loyalty.org> wrote:
> > 
> > Anyway, this also has me wondering about the
> > relative speed of
> > different Python iteration idioms (for, while, map,
> > recursion, and now
> > list comprehensions).  They're not all applicable to
> > the same
> > problems, but where there is overlap, which are
> > faster than others?

In a trivial test, I noticed that the standard way to do an iteration
over an index,

for i in range(n):
	pass

raises a MemoryError for sufficiently large n (on Unix with a ulimit
in effect, limiting how much memory any process can allocate).  On the
other hand,

i = 0
while i < n:
	pass

obviously has no such memory limitation.  The memory problems arise
from trying to construct the enormous list -- I can reproduce them on
this particular system merely by writing

a = range(10**7)

Now, one thought is that you could simulate the behavior of list data
types by creating a class to represent "sparse lists" or the more
general "patterned lists", lists in which the nth item is given by a
function of n (instead of being calculated and stored beforehand).

The advantage here would be that you could say something like

r = PatternedList(lambda x: x, 0, 10**7)

for i in r:
	pass

The PatternedList class is easy to implement _except_ that I don't
think a user-defined object -- other than a subclass of UserList --
can be used in a "for" loop.  UserList doesn't seem to help here,
because it doesn't actually allow you to override the element lookup
behavior of the list, to change the way the list is represented in
memory.

Anyway, the general point was about resource requirements for
different iteration idioms.  Recursion requires memory, iteration over
a sequence requires memory if you have to build the sequence just for
that purpose, and iteration with while _doesn't_ require more than a
constant amount of memory (well, maybe if you get up into long
integers).  If we rewrote code to use a while instead of a "for i in
range", our code's memory profile could be much different.

-- 
Seth David Schoen <schoen@loyalty.org> | Its really terrible when FBI arrested
Temp.  http://www.loyalty.org/~schoen/ | hacker, who visited USA with peacefull
down:  http://www.loyalty.org/   (CAF) | mission -- to share his knowledge with
     http://www.freesklyarov.org/      | american nation.  (Ilya V. Vasilyev)