[New-bugs-announce] [issue5973] re-usable generators / generator expressions should return iterables

Sven Rahmann report at bugs.python.org
Fri May 8 23:26:19 CEST 2009


New submission from Sven Rahmann <svenrahmann at googlemail.com>:

The syntax of generator expressions suggests that they can be used
similarly to lists (at least when iterated over).
However, as was pointed out to me, the resulting generators are
iterators and can be used only once.
This is inconvenient in situations where some function expects an
iterable argument but needs to iterate over it more than once.

Consider the following function (see also the attached file
reusable_generators.py for a complete example):

def secondmax(iterable):
    """return the second largest value in iterable"""
    m = max(iterable)
    return max(i for i in iterable if i<m)

It works fine when passed a list or other iterable container, but
consider the following situation. We have a huge matrix A (list of
lists) and want to pass a column to the function.

Using a list works fine, but requires copying the column's values and
needs additional memory:

col2_list = [a[2] for a in A]  # new list created from column 2

There is no reason why we shouldn't be able to create an iterable object
that returns, one by one, the values from the column:

col2_gen  = (a[2] for a in A) 

The problem is that secondmax(col2_gen) does not work; try the attached
file: col2_gen can be iterated over only once.
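A minimal reproduction of the failure, assuming a small hypothetical 3x3 matrix A in place of the huge one. The first max() call exhausts col2_gen, so the generator expression inside secondmax() finds nothing left and the second max() raises ValueError.

```python
def secondmax(iterable):
    """return the second largest value in iterable"""
    m = max(iterable)
    return max(i for i in iterable if i < m)


# Hypothetical small matrix standing in for the huge A.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

col2_list = [a[2] for a in A]
print(secondmax(col2_list))  # 6

col2_gen = (a[2] for a in A)
try:
    secondmax(col2_gen)
except ValueError as exc:
    # max(iterable) consumed col2_gen; the second pass sees an empty sequence
    print("ValueError:", exc)
```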

I can imagine many situations where I need or want to iterate over such
a "view" object several times; I don't see a reason why it shouldn't be
possible or why it would be unwanted.

We can do the following, but it is not elegant: wrap the generator
expression in a closure and a class.

class ReusableGenerator:
    def __init__(self, g):
        self.g = g        # g: zero-argument callable returning a new generator
    def __iter__(self):
        return self.g()   # every iteration starts a fresh generator

col2_re = ReusableGenerator(lambda: (a[2] for a in A)) # I want this!

This works, but it is not a generator object (e.g., it has no __next__
method, so next() cannot be called on it). We also need the lambda
detour for this to work.
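For generator functions (though not generator expressions), the lambda detour can at least be hidden behind a decorator. This is only a sketch: the decorator name reusable, the helper class _Restartable and the example function column() are hypothetical, not part of any library. Calling a decorated generator function returns a ReusableGenerator-style object whose __iter__ calls the original function again with the same arguments.

```python
import functools


class _Restartable:
    """Iterable that calls genfunc(*args, **kwargs) anew on every iteration."""

    def __init__(self, genfunc, args, kwargs):
        self._genfunc = genfunc
        self._args = args
        self._kwargs = kwargs

    def __iter__(self):
        # A fresh generator object each time, so repeated passes work.
        return self._genfunc(*self._args, **self._kwargs)


def reusable(genfunc):
    """Hypothetical decorator: make a generator function return a re-iterable."""
    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        return _Restartable(genfunc, args, kwargs)
    return wrapper


@reusable
def column(matrix, j):
    """Yield the entries of column j of a list of lists (no copy is made)."""
    for row in matrix:
        yield row[j]


def secondmax(iterable):
    """return the second largest value in iterable"""
    m = max(iterable)
    return max(i for i in iterable if i < m)


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
col2 = column(A, 2)
print(secondmax(col2))  # 6
```

Like ReusableGenerator, the result is still not an iterator: next(col2) raises TypeError.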

Note that in some situations, the "problem" I describe does not occur or
can be easily circumvented. For example instead of writing

col2 = (a[2] for a in A) 
for x in col2: foo(x)
for x in col2: foo(x) # doesn't work

we could just repeat the generator expression (and create a new iterator
whenever we need it):

for x in (a[2] for a in A): foo(x)
for x in (a[2] for a in A): foo(x) # works fine

But this is exactly what is impossible when I want to pass the generator
expression or generator function to another function (such as secondmax()).

I believe this contradicts the Python philosophy that functions can be
passed around just like other objects.
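For comparison, the standard library's itertools.tee() lets secondmax() accept a one-shot iterator, but it does not remove the memory cost. In this sketch (secondmax_tee is a hypothetical name), first is exhausted before second is read, so tee() must buffer every element internally; the itertools documentation itself recommends list() over tee() in exactly that situation.

```python
from itertools import tee


def secondmax_tee(iterable):
    """Second largest value; also works when iterable is a one-shot iterator."""
    first, second = tee(iterable)  # two independent iterators over the same data
    m = max(first)                 # consumes first; tee buffers all values for second
    return max(i for i in second if i < m)


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(secondmax_tee(a[2] for a in A))  # 6
```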


My proposal is probably unrealistic, but I would like to see generator
functions and generator expressions change in a way that they return not
iterators, but iterables, so the problem described here does not occur,
and wrapper classes are unnecessary.

In Java, the distinction between iterables and iterators is very clear;
in Python, less so, I think (which is good, because iterators are a pain
to use in Java).
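In Python, the distinction is carried by the iteration protocol rather than by separately declared types. A small illustration (is_iterator is a hypothetical helper): an iterator's __iter__ returns the object itself, while a container such as a list hands out a new iterator on every iter() call. A generator is an iterator, which is why a second pass over it comes up empty.

```python
def is_iterator(obj):
    """True if iter(obj) returns obj itself, i.e. obj is a one-shot iterator."""
    return iter(obj) is obj


nums = [3, 6, 9]
gen = (x for x in nums)

print(is_iterator(nums))     # False: each iter(nums) creates a new list iterator
print(is_iterator(gen))      # True: a generator is its own iterator
print(list(gen), list(gen))  # [3, 6, 9] []
```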


Admittedly, I have no idea why generator functions and expressions are
implemented as they are; there are probably lots of good reasons, and it
may not be possible to change this any time soon or at all.
However, I think the change would make Python a more consistent language.
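Within the current language, the usual way to get the reusable "view" described above is a small class whose __iter__ method is itself a generator function. This is a sketch with a hypothetical class name, ColumnView, not the proposed language change: each for loop or max() call invokes __iter__ again and therefore gets a fresh generator, and no column values are copied.

```python
class ColumnView:
    """Hypothetical read-only view of column j of a list of lists."""

    def __init__(self, matrix, j):
        self.matrix = matrix
        self.j = j

    def __iter__(self):
        # __iter__ is a generator function, so every iteration
        # gets a new, independent generator over the rows.
        for row in self.matrix:
            yield row[self.j]


def secondmax(iterable):
    """return the second largest value in iterable"""
    m = max(iterable)
    return max(i for i in iterable if i < m)


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
col2_view = ColumnView(A, 2)
print(secondmax(col2_view))  # 6
```

Because it is a view rather than a copy, later changes to A are visible the next time col2_view is iterated.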

----------
files: reusable_generators.py
messages: 87473
nosy: svenrahmann
severity: normal
status: open
title: re-usable generators / generator expressions should return iterables
type: feature request
versions: Python 3.1
Added file: http://bugs.python.org/file13936/reusable_generators.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5973>
_______________________________________

