[Tutor] I am having difficulty grasping 'generators'

Cameron Simpson cs at zip.com.au
Wed May 28 01:54:56 CEST 2014


On 27May2014 15:27, Degreat Yartey <yarteydegreat2 at gmail.com> wrote:
>I am studying python on my own (i.e. i am between the beginner and
>intermediate level) and i haven't met any difficulty until i reached the
>topic 'Generators and Iterators'.
>I need an explanation so simple as using the expression 'print ()', in this
>case 'yield'.
>Python 2.6 here!
>Thank you.

Generators are functions that do a bit of work and then yield a value, then a 
bit more and so on. This means that you "call" them once. What you get back is 
an iterator, not the normal function return value.

Whenever you use the iterator, the generator function runs until it hits a 
"yield" statement, and the value in the yield statement is what you get for that 
iteration. Next time you iterate, the function runs a bit more, until it yields 
again, or returns (end of function, and that causes end of iteration).

So the function doesn't even run until you ask for a value, and then it only 
runs long enough to find the next value.
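
As a minimal sketch of that behaviour (the names "three_values" and "log" are 
made up for illustration), the following records what the function body has 
done at each stage. It is written to run on Python 2.6 and later:

```python
log = []   # records how far the generator body has run

def three_values():
    # Hypothetical generator, purely for illustration.
    log.append('start')
    yield 'a'
    log.append('after a')
    yield 'b'
    log.append('after b')

it = three_values()            # calling it runs none of the body
state_after_call = list(log)   # still empty

first = next(it)               # body runs up to the first yield
state_after_first = list(log)  # only 'start' has happened

rest = list(it)                # drain the iterator; body runs to the end

print(state_after_call)
print(first)
print(state_after_first)
print(rest)
print(log)
```

Nothing is appended to "log" by the call itself; each step of the iteration 
runs the body only as far as the next yield.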

Example (all code illustrative only, untested):

Suppose you need to process every second line of a file.

You might write it directly like this:

   def munge_lines(fp):
     ''' Do stuff with every second line of the already-open file `fp`.
     '''
     lineno = 0
     for line in fp:
       lineno += 1
       if lineno % 2 == 0:
         print lineno, line,

That should read lines from the file and print every second one with the line 
number.

Now suppose you want something more complex than "every second line", 
especially something that requires keeping track of some state. In the example 
above you only need the line number, and using it still consumes 2 of the 3 
lines in the loop body.

A more common example might be "lines between two markers".

The more of that you embed in the "munge_lines" function, the more it will get 
in the way of seeing what the function actually does.

So a reasonable thing might be to write a function that gets the requested 
lines:

   def wanted_lines(fp):
     wanted = []
     between = False
     for line in fp:
       if between:
         if 'end_marker' in line:
           between = False
         else:
           wanted.append(line)
       elif 'start_marker' in line:
         between = True
     return wanted

This reads the whole file and returns a list of the wanted lines, and 
"munge_lines" might then look like this:

   for line in wanted_lines(fp):
     print line,

However:

   - that reads the whole file before returning anything

   - has to keep all the lines in the list "wanted"

Slow in response, heavy in memory cost, and unworkable if "fp" actually doesn't 
end (eg reading from a terminal, or a pipeline, or...)

What you'd really like is to get each line as needed.

We can rewrite "wanted_lines" as a generator:

   def wanted_lines(fp):
     between = False
     for line in fp:
       if between:
         if 'end_marker' in line:
           between = False
         else:
           yield line
       elif 'start_marker' in line:
         between = True

All we've done is use "yield" instead of the "append" and remove the "wanted" 
list and the return statement. The calling code is the same.
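
A rough sketch of why this matters for input that never ends: below, 
"endless_lines" is a made-up stand-in for a terminal or pipeline, and 
"wanted_lines" is restated with the marker spelled 'start_marker' so the block 
runs on its own. I use itertools.islice to stop after three values; the list 
version would never return on this input.

```python
import itertools

def wanted_lines(fp):
    # Generator version: yields each line found between the markers.
    between = False
    for line in fp:
        if between:
            if 'end_marker' in line:
                between = False
            else:
                yield line
        elif 'start_marker' in line:
            between = True

def endless_lines():
    # Hypothetical never-ending source, standing in for a terminal or pipe.
    for n in itertools.count(1):
        yield 'start_marker\n'
        yield 'payload %d\n' % n
        yield 'end_marker\n'

# Take only the first three wanted lines; the source is never exhausted.
first_three = list(itertools.islice(wanted_lines(endless_lines()), 3))
print(first_three)
```

Because each value is produced on demand, the caller decides when to stop, and 
nothing beyond the lines actually requested is ever read.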

To see the difference, put a "print" in "wanted_lines" as the first line of the 
for loop. With the list version you will see all the prints run before you 
get the list back. With the generator you will see the print run just before 
each value you get back.
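
Here is one way that experiment might look, as a sketch only. Instead of 
printing, both versions append to a shared "events" list so the order can be 
inspected afterwards. The names "wanted_list", "wanted_gen" and "events" are 
mine, and the input is a plain list of strings rather than an open file:

```python
events = []   # shared trace of what happened, in order

def wanted_list(lines):
    # List-building version, with tracing added.
    wanted = []
    between = False
    for line in lines:
        events.append('read ' + line)
        if between:
            if 'end_marker' in line:
                between = False
            else:
                wanted.append(line)
        elif 'start_marker' in line:
            between = True
    return wanted

def wanted_gen(lines):
    # Generator version, with the same tracing.
    between = False
    for line in lines:
        events.append('read ' + line)
        if between:
            if 'end_marker' in line:
                between = False
            else:
                yield line
        elif 'start_marker' in line:
            between = True

lines = ['start_marker', 'a', 'b', 'end_marker']

for line in wanted_list(lines):
    events.append('got ' + line)
list_events = events[:]

del events[:]   # reset the trace before the second run
for line in wanted_gen(lines):
    events.append('got ' + line)
gen_events = events[:]

print(list_events)
print(gen_events)
```

In the first trace every "read" comes before any "got"; in the second they are 
interleaved, with each "got" straight after the "read" that produced it.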

Cheers,
Cameron Simpson <cs at zip.com.au>

