Enhanced Generators - reiterating over a file

Oren Tirosh oren-py-l at hishome.net
Sun Feb 3 14:02:17 CET 2002

On Sat, Feb 02, 2002 at 10:05:11PM -0500, Kragen Sitaker wrote:
> > > That won't work when passed iterators as arguments, right?
> > 
> > Sure, but as long as iterators are kept as invisible temporary objects
> Um, sure.  iter(open("hello")) doesn't work that way already.

This is a real life example of how useful reiterable x functions can be
and how to solve the problem of reiterating over a file.

The following function takes a vector of samples and yields a stream of samples 
with their values normalized to the range +-1:

def normalized(vector):
    max_value = 0.0
    for sample in vector:
        max_value = max(max_value, abs(sample))
    for sample in vector:
        yield sample/max_value

Note that this must be a two-pass operation: you cannot yield the first
normalized sample before you find the maximum value.

My data is in a text file with one decimal sample per line.  This code reads
it and writes the normalized samples to another file in the same format.

vector = map(float, file('samples.dat'))
outfile = file('normalized.dat','w')
for sample in normalized(vector):
    print >>outfile, sample

This works in Python 2.2.  The only problem is that it reads the entire file
into memory.  What if the file is too big to read into memory or I just don't 
want to stress virtual memory unnecessarily?

Just add the magic x! change map->xmap, file->xfile and everything works 
exactly the same but without any temporary lists.

xfile is a lazy file object: the object just stores the filename.  Only when 
iter(xfile('filename')) is called the returned iterator object opens a 
temporary file descriptor to walk through the file.

xmap must be the truly lazy version that returns an iterable object, not the 
half-eager half-lazy version that returns an iterator.  This is because the 
function normalized() has to scan the source twice.

An xfile object really simulates a container - you can use iter() to get 
multiple independent iterators of the same container.  It should also appeal 
to the fans of a certain TV show :-)

A Python file object is not really a container: it can be argued that a file 
object already *is* a kind of iterator.  It is a temporary object used to walk 
through a container.  The real container in this case is the actual file on 
the disk.  An xfile object represents a file on the disk.

   x-ly yours,


More information about the Python-list mailing list