Iterators (was: Re: [Tutor] text module)

Ibraheem Umaru-Mohammed iumarumo@eidosnet.co.uk
Sun, 1 Sep 2002 01:56:56 +0100


["Erik Price"="Erik"]
Erik >> On Friday, August 30, 2002, at 07:41  PM, Scot W. Stevenson wrote:
Erik >> 
Erik >> >This is okay if you know that the file is going to be small and your
Erik >> >machine is big, but for large files on small machines, it could be a
Erik >> >problem: readlines loads the whole file into memory in one big gulp.
Erik >> >xreadlines was created in Python 2.1 (I think) to avoid this problem,
Erik >> >but if you have Python 2.2 or later, the really cool thing to do is to
Erik >> >use iterators and simply create a loop such as:
Erik >> >
Erik >> >for line in subj:
Erik >> >    (etc)
Erik >> 
Erik >> Not having any idea what iterators are, I went to the docs 
Erik >> (http://python.org/doc/current/whatsnew/node4.html) and read about 
Erik >> them.  But as always I am not sure of everything.
Erik >> 

A good article on iterators and generators is:

	,---- [ url ]
	| http://www-106.ibm.com/developerworks/library/l-pycon.html?t=gr,lnxw10=PyIntro
	`----

Erik >> I understand the construction that Scot uses above, because the docs 
Erik >> explain that file objects incorporate the __iter__() method, and "for" 
Erik >> loops now act upon this method for all sequence objects (and file 
Erik >> objects too).
Erik >> 
Erik >> It's the first part of the explanation of iterators that I'm not 
Erik >> totally clear on, the part that explains how you can incorporate your 
Erik >> own iterators into your own class definitions.
Erik >> 
Erik >> 
Erik >> Is it that you specify an "__iter__()" method, and then you implement 
Erik >> the code that would go about returning a "next" item each time the 
Erik >> __iter__() method is called?  How do you store the state of the object 
Erik >> such that it knows which item to return next time the __iter__() method 
Erik >> is called?  Am I correct in assuming that you must implement the 
Erik >> StopIterator exception yourself in the code for situations in which 
Erik >> there are no more items to be iterated over?  And finally, do you write 
Erik >> a separate class definition for an iterator, and then have your 
Erik >> original class definition use the iterator object, or is this something 
Erik >> that can be entirely kept within the class definition for the object in 
Erik >> question?
Erik >> 
Erik >> It's a confusing document IMHO.  I would appreciate any discussion and 
Erik >> thoughts that anyone cares to contribute.
Erik >> 

I agree that the document could probably have done with some examples, or with
something added to the tutorial.

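I haven't played much with iterators myself, nor with simple generators, but the
code below might help. It is an example that uses both the __iter__() and next()
methods required of user-defined classes that adhere to the iterator protocol.
Note that __iter__() is called only once, at the start of the loop, and returns
the iterator; it is next() that gets called for each item, and the exception to
raise when nothing is left is StopIteration.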

,---- [ myrange.py => Iterators ]
| #!/usr/bin/env python
| 
| class MyRange:
|   def __init__(self,step=1):
|     self.zeroToTen = ["zero", "one", "two", "three", "four", "five",
|                       "six", "seven", "eight", "nine", "ten"]
|     self.index = 0      # position of the next item to hand out
|     self.step = step    # stride between items
| 
|   def __iter__(self):
|     # The object is its own iterator, so the state lives in self.index.
|     return self 
| 
|   def next(self):
|     # Signal the end of the iteration once we run off the list.
|     if self.index >= len(self.zeroToTen):
|       raise StopIteration
|     index = self.index
|     self.index += self.step
|     return self.zeroToTen[index]
| 
| if __name__ == "__main__":
|   myrange = MyRange(1)
|   for item in myrange:
|     print item
| 
|   myrange = MyRange(3)
|   for item in myrange:
|     print item
`----

It works a little like range(), except that it returns the names of the
numbers as strings rather than the numbers themselves, and it is limited to
the range zero to ten. The key here is the "step", which specifies the
stride through the 11 elements.
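
To see when each method gets called, here is a rough sketch of what a "for"
loop does behind the scenes. WordRange is a made-up stand-in for MyRange, and
drive_by_hand() is a name invented for illustration. The sketch assumes
Python 2.6 or later for the built-in next() function, and it aliases the
method as __next__() so that it should also run on Python 3.

```python
# Illustrative sketch only: WordRange and drive_by_hand are hypothetical names.
class WordRange(object):
    WORDS = ["zero", "one", "two", "three", "four", "five",
             "six", "seven", "eight", "nine", "ten"]

    def __init__(self, step=1):
        self.index = 0
        self.step = step

    def __iter__(self):
        # Called once, when the loop starts; returns the iterator object.
        return self

    def next(self):
        # Called once for every pass through the loop.
        if self.index >= len(self.WORDS):
            raise StopIteration
        word = self.WORDS[self.index]
        self.index += self.step
        return word

    __next__ = next  # Python 3 looks for __next__() instead of next()


def drive_by_hand(iterable):
    """Do roughly what 'for item in iterable' does, collecting the items."""
    items = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls the iterator's next()/__next__()
        except StopIteration:
            break                      # the loop ends quietly here
        items.append(item)
    return items
```

Because __iter__() returns self, the state lives in the object itself, so a
WordRange can be walked only once: a second loop over the same object finds
the index already past the end and stops at once.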

In the next() method, we have to save the value of the current index before we 
increment the index counter (in preparation for the "next" value), because the
increment has to happen before the return statement ends the method. If the step
were always "1", the "++" that some other languages provide would have been
useful here (e.g. return self.zeroToTen[self.index++]).
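
For comparison, here is a minimal sketch of the same idea written as a simple
generator, where the function itself remembers where it was, so there is no
index to save by hand. The names WORDS and word_range are made up for this
example. On Python 2.2 the file would need "from __future__ import generators"
as its first statement; later versions don't need it.

```python
# Illustrative sketch only: WORDS and word_range are hypothetical names.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]


def word_range(step=1):
    """Yield the names of zero to ten, taking every step-th one."""
    index = 0
    while index < len(WORDS):
        # Execution pauses at the yield and resumes on the next request,
        # so the increment can simply come afterwards.
        yield WORDS[index]
        index += step
    # Falling off the end of the function ends the iteration;
    # there is no explicit "raise StopIteration" here.
```

A loop such as "for item in word_range(3):" then walks the same items as
MyRange(3) does, and each call to word_range() starts afresh from "zero".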

Maybe someone else can shed some more light on using iterators and generators.

Anyway, I hope this helps a little.

													--ibz.
-- 
                          Ibraheem Umaru-Mohammed 
                                   "ibz"
                   umarumohammed (at) btinternet (dot) com