[Python-Dev] iterators
M.-A. Lemburg
mal@lemburg.com
Mon, 21 Aug 2000 12:04:04 +0200
This is a multi-part message in MIME format.
--------------D181A42EF954A5F1E101C9DB
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Guido van Rossum wrote:
>
> Paul Prescod wrote:
>
> > I don't think of iterators as indexing in terms of numbers. Otherwise I
> > could do this:
> >
> > >>> a={0:"zero",1:"one",2:"two",3:"three"}
> > >>> for i in a:
> > ... print i
> > ...
> >
> > So from a Python user's point of view, for-looping has nothing to do
> > with integers. From a Python class/module creator's point of view it
> > does have to do with integers. I wouldn't be either surprised nor
> > disappointed if that changed one day.
>
> Bingo!
>
> I've long had an idea for generalizing 'for' loops using iterators. This is
> more a Python 3000 thing, but I'll explain it here anyway because I think
> it's relevant. Perhaps this should become a PEP? (Maybe we should have a
> series of PEPs with numbers in the 3000 range for Py3k ideas?)
>
> The statement
>
> for <variable> in <object>: <block>
>
> should translate into this kind of pseudo-code:
>
> # variant 1
> __temp = <object>.newiterator()
> while 1:
> try: <variable> = __temp.next()
> except ExhaustedIterator: break
> <block>
>
> or perhaps (to avoid the relatively expensive exception handling):
>
> # variant 2
> __temp = <object>.newiterator()
> while 1:
> __flag, <variable = __temp.next()
> if not __flag: break
> <block>
>
> In variant 1, the next() method returns the next object or raises
> ExhaustedIterator. In variant 2, the next() method returns a tuple (<flag>,
> <variable>) where <flag> is 1 to indicate that <value> is valid or 0 to
> indicate that there are no more items available. I'm not crazy about the
> exception, but I'm even less crazy about the more complex next() return
> value (careful observers may have noticed that I'm rarely crazy about flag
> variables :-).
>
> Another argument for variant 1 is that variant 2 changes what <variable> is
> after the loop is exhausted, compared to current practice: currently, it
> keeps the last valid value assigned to it. Most likely, the next() method
> returns None when the sequence is exhausted. It doesn't make a lot of sense
> to require it to return the last item of the sequence -- there may not *be*
> a last item, if the sequence is empty, and not all sequences make it
> convenient to keep hanging on to the last item in the iterator, so it's best
> to specify that next() returns (0, None) when the sequence is exhausted.
>
> (It would be tempting to suggeste a variant 1a where instead of raising an
> exception, next() returns None when the sequence is exhausted, but this
> won't fly: you couldn't iterate over a list containing some items that are
> None.)
How about a third variant:
#3:
__iter = <object>.iterator()
while __iter:
<variable> = __iter.next()
<block>
This adds a slot call, but removes the malloc overhead introduced
by returning a tuple for every iteration (which is likely to be
a performance problem).
Another possibility would be using an iterator attribute
to get at the variable:
#4:
__iter = <object>.iterator()
while 1:
if not __iter.next():
break
<variable> = __iter.value
<block>
> Side note: I believe that the iterator approach could actually *speed up*
> iteration over lists compared to today's code. This is because currently the
> interation index is a Python integer object that is stored on the stack.
> This means an integer add with overflow check, allocation, and deallocation
> on each iteration! But the iterator for lists (and other basic sequences)
> could easily store the index as a C int! (As long as the sequence's length
> is stored in an int, the index can't overflow.)
You might want to check out the counterobject.c approach I used
to speed up the current for-loop in Python 1.5's ceval.c:
it's basically a mutable C integer which is used instead of
the current Python integer index.
The details can be found in my old patch:
http://starship.python.net/crew/lemburg/mxPython-1.5.patch.gz
> [Warning: thinking aloud ahead!]
>
> Once we have the concept of iterators, we could support explicit use of them
> as well. E.g. we could use a variant of the for statement to iterate over an
> existing iterator:
>
> for <variable> over <iterator>: <block>
>
> which would (assuming variant 1 above) translate to:
>
> while 1:
> try: <variable> = <iterator>.next()
> except ExhaustedIterator: break
> <block>
>
> This could be used in situations where you have a loop iterating over the
> first half of a sequence and a second loop that iterates over the remaining
> items:
>
> it = something.newiterator()
> for x over it:
> if time_to_start_second_loop(): break
> do_something()
> for x over it:
> do_something_else()
>
> Note that the second for loop doesn't reset the iterator -- it just picks up
> where the first one left off! (Detail: the x that caused the break in the
> first loop doesn't get dealt with in the second loop.)
>
> I like the iterator concept because it allows us to do things lazily. There
> are lots of other possibilities for iterators. E.g. mappings could have
> several iterator variants to loop over keys, values, or both, in sorted or
> hash-table order. Sequences could have an iterator for traversing them
> backwards, and a few other ones for iterating over their index set (cf.
> indices()) and over (index, value) tuples (cf. irange()). Files could be
> their own iterator where the iterator is almost the same as readline()
> except it raises ExhaustedIterator at EOF instead of returning "". A tree
> datastructure class could have an associated iterator class that maintains a
> "finger" into the tree.
>
> Hm, perhaps iterators could be their own iterator? Then if 'it' were an
> iterator, it.newiterator() would return a reference to itself (not a copy).
> Then we wouldn't even need the 'over' alternative syntax. Maybe the method
> should be called iterator() then, not newiterator(), to avoid suggesting
> anything about the newness of the returned iterator.
>
> Other ideas:
>
> - Iterators could have a backup() method that moves the index back (or
> raises an exception if this feature is not supported, e.g. when reading data
> from a pipe).
>
> - Iterators over certain sequences might support operations on the
> underlying sequence at the current position of the iterator, so that you
> could iterate over a sequence and occasionally insert or delete an item (or
> a slice).
FYI, I've attached a module which I've been using a while for
iteration. The code is very simple and implements the #4 variant
described above.
> Of course iterators also connect to generators. The basic list iterator
> doesn't need coroutines or anything, it can be done as follows:
>
> class Iterator:
> def __init__(self, seq):
> self.seq = seq
> self.ind = 0
> def next(self):
> if self.ind >= len(self.seq): raise ExhaustedIterator
> val = self.seq[self.ind]
> self.ind += 1
> return val
>
> so that <list>.iterator() could just return Iterator(<list>) -- at least
> conceptually.
>
> But for other data structures the amount of state needed might be
> cumbersome. E.g. a tree iterator needs to maintain a stack, and it's much
> easier to code that using a recursive Icon-style generator than by using an
> explicit stack. On the other hand, I remember reading an article a while ago
> (in Dr. Dobbs?) by someone who argued (in a C++ context) that such recursive
> solutions are very inefficient, and that an explicit stack (1) is really not
> that hard to code, and (2) gives much more control over the memory and time
> consumption of the code. On the third hand, some forms of iteration really
> *are* expressed much more clearly using recursion. On the fourth hand, I
> disagree with Matthias ("Dr. Scheme") Felleisen about recursion as the root
> of all iteration. Anyway, I believe that iterators (as explained above) can
> be useful even if we don't have generators (in the Icon sense, which I
> believe means coroutine-style).
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
--------------D181A42EF954A5F1E101C9DB
Content-Type: text/python; charset=us-ascii;
name="Iterator.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="Iterator.py"
""" Generic object iterators.
(c) Copyright Marc-Andre Lemburg; All Rights Reserved.
See the documentation for further information on copyrights,
or contact the author (mal@lemburg.com).
"""
import exceptions
class Error(exceptions.StandardError):
pass
class Iterator:
value = None # Current value
def __init__(self,obj,startpos=0):
self.obj = obj
self.index = startpos
self.startpos = startpos
self.step = 1
def set(self,pos):
self.index = self.startpos = pos
self.value = self.obj[pos]
def reset(self):
self.index = pos = self.startpos
self.value = self.obj[pos]
def next(self):
self.index = i = self.index + self.step
if i < 0:
return 0
try:
self.value = self.obj[i]
return 1
except IndexError:
return 0
def prev(self):
self.index = i = self.index - self.step
if i < 0:
return 0
try:
self.value = self.obj[i]
return 1
except IndexError:
return 0
def __getitem__(self,i):
if i != 0:
self.index = i = self.index + self.step
else:
i = self.index
if i < 0:
raise IndexError
self.value = v = self.obj[i]
return v
ForwardIterator = Iterator
class BackwardIterator(Iterator):
def __init__(self,obj,startpos=None):
self.obj = obj
if startpos is not None:
self.index = startpos
else:
self.index = startpos = len(obj) - 1
self.startpos = startpos
self.value = self.obj[startpos]
self.step = -1
class CallIterator(Iterator):
def __init__(self,obj):
self.obj = obj
self.index = 0
def set(self,pos):
raise Error,'.set() not supported'
def reset(self):
raise Error,'.reset() not supported'
def next(self):
self.index = self.index + 1
try:
v = self.obj()
if not v:
return 0
self.value = v
return 1
except IndexError:
return 0
def prev(self):
raise Error,'.prev() not supported'
def __getitem__(self,i):
self.index = self.index + 1
v = self.obj()
if not v:
raise IndexError
self.value = v
return v
def _test():
i = BackwardIterator(range(1,10))
for x in i:
print x
print
i.reset()
while 1:
print i.value
if not i.next():
break
print
filereader = CallIterator(open('/usr/dict/words').readline)
for line in filereader:
pass
print 'Read %i lines' % filereader.index
if __name__ == '__main__':
_test()
--------------D181A42EF954A5F1E101C9DB--