Iteration and large lists (was: Re: [Edu-sig] How to
Improve code performance)
Kirby Urner
urnerk@qwest.net
Tue, 30 Oct 2001 13:07:13 -0800
>
>Anyway, the general point was about resource requirements for
>different iteration idioms. Recursion requires memory, iteration over
>a sequence requires memory if you have to build the sequence just for
>that purpose, and iteration with while _doesn't_ require more than a
>constant amount of memory (well, maybe if you get up into long
>integers). If we rewrote code to use a while instead of a "for i in
>range", our code's memory profile could be much different.
Seth makes good points about range() pre-allocating potentially
large chunks of memory. Historically, xrange() was provided to
allow a "lazy evaluation" approach -- instead of pre-allocating,
it would do more like Seth suggests, and compute the next "list"
element with each iteration.
Since Python 2.2, we have the yield keyword, which, when used
in a function definition, makes that function a "generator".
To 'yield' a value is akin to using a print statement, except
the argument is returned in a usable form (ala 'return') --
which ties this thread to an earlier one re 'return' vs.
'print' semantics.
When a function encounters a 'yield', execution stops and the
calling function gets the yielded falue. When the function
is next triggered using .next(), or is iterated over as an
object, as in 'for i in myfunction()', then execution picks
up right after the 'yield', as if "nothing had happened"
(i.e. there's no more disruption to local variables than if
a 'print' had been used -- the function behaves as if that's
what had happened).
So xrange() is being phased out, given 'yield' provides a
much more generic approach to lazy evaluation (means a
'just in time' approach to value generation, where you
get a next value just when you need it, without a lot of
pre-allocation of values -- which you might never get to,
if the loop has a break in it, for example).
To make this more concrete, here's a lazyrange() function
which behaves somewhat like range (lots of improvements
and variations possible):
>>> from __future__ import generators # make feature available
>>> def lazyrange(initial,final,inc):
while 1: # infinite loop
if inc < 0: # break condition
if initial <= final: return
else: # alternate break condition
if initial >= final: return
yield initial
initial += inc # increment using inc
>>> for i in lazyrange(10,1,-1): print i,
10 9 8 7 6 5 4 3 2
You see above that by containing a yield, lazyrange becomes
a generator, and therefore a kind of iterable object (something
you can iterate over using 'for').
Note this alternative syntax:
>>> f = lazyrange(1,10,1) # instantiate function object
>>> type(f) # note f's type
<type 'generator'>
>>> dir(f) # note builtin generator method next()
['__class__', '__delattr__', '__getattribute__', '__hash__',
'__init__', '__iter__', '__new__', '__reduce__', '__repr__',
'__setattr__', '__str__', 'gi_frame', 'gi_running', 'next']
>>> for i in f: print i,
1 2 3 4 5 6 7 8 9
Kirby