need help on generator...

Sat Jan 22 06:00:36 EST 2005

Francis Girard <francis.girard at free.fr> wrote:
   ...
> > A 'def' of a function whose body uses 'yield', and in 2.4 the new genexp
> > construct.
> 
> Ok. I guess I'll have to update to version 2.4 (from 2.3) to follow the
> discussion.

It's worth upgrading even just for the extra speed;-).

> > Since you appear to conflate generators and iterators, I guess the iter
> > built-in function is the main one you missed.  iter(x), for any x,
> > either raises an exception (if x's type is not iterable) or else returns
> > an iterator.
> 
> You're absolutely right, I take the one for the other and vice-versa. If I
> understand correctly, a "generator" produces something over which you can
> iterate with the help of an "iterator". Can you iterate (in the strict sense
> of an "iterator") over something not generated by a "generator"?

A generator function (commonly known as a generator), each time you call
it, produces a generator object AKA a generator-iterator.  To wit:

>>> def f(): yield 23
... 
>>> f
<function f at 0x75fe70>
>>> x = f()
>>> x
<generator object at 0x75aa58>
>>> type(x)
<type 'generator'>

A generator expression (genexp) also has a result which is a generator
object:

>>> x = (23 for __ in [0])
>>> type(x)
<type 'generator'>

Iterators need not be generator-iterators, by any means.  Generally, the
way to make sure you have an iterator is to call iter(...) on something;
if the something was already an iterator, no problem: iter is idempotent
on iterators and hands back that very same object:

>>> iter(x) is x
True

That's what "an iterator" means: some object x such that x.next is
callable without arguments and iter(x) is x.
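
As a rough sketch, that definition can be turned into a check.  The
helper name is_iterator is made up for illustration, and the code uses
the modern Python 3 spelling, where the method is called __next__
rather than next:

```python
def is_iterator(obj):
    """Hypothetical helper: True if obj follows the iterator protocol."""
    try:
        same = iter(obj) is obj
    except TypeError:
        # obj is not iterable at all, so it cannot be an iterator
        return False
    # Python 3 spells the method __next__; the text above calls it next
    return same and callable(getattr(obj, "__next__", None))


numbers = [1, 2, 3]
it = iter(numbers)

print(is_iterator(numbers))  # a list is iterable, but not an iterator
print(is_iterator(it))       # what iter() returned is an iterator
```

The list fails the check because iter(numbers) builds a fresh iterator
object each time instead of returning the list itself.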

Since iter(x) tries calling type(x).__iter__(x) [[slight simplification
here by ignoring custom metaclasses, see recent discussion on python-dev
as to why this is only 99% accurate, not 100% accurate]], one way to
code an iterator is as a class.  For example:

class Repeater(object):
    def __iter__(self): return self
    def next(self): return 23

Any instance of Repeater is an iterator which, as it happens, has just
the same behavior as itertools.repeat(23), which is also the same
behavior you get from iterators obtained by calling:

def generepeat():
    while True: yield 23

In other words, after:

import itertools

a = Repeater()
b = itertools.repeat(23)
c = generepeat()

the behavior of a, b and c is indistinguishable, though you can easily
tell them apart by introspection -- type(a), type(b) and type(c) are
three distinct types.

Python's penchant for duck typing -- behavior matters more, WAY more
than implementation details such as type() -- means we tend to consider
a, b and c fully equivalent.  Focusing on ``generator'' is at this level
an implementation detail.
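
Here is a small self-contained sketch of that equivalence, written for
Python 3 (so the class defines __next__ where the 2.x code above
defines next).  The names first_five and distinct_types are only for
illustration:

```python
import itertools


class Repeater:
    def __iter__(self):
        return self

    def __next__(self):  # spelled `next` in Python 2
        return 23


def generepeat():
    while True:
        yield 23


a = Repeater()
b = itertools.repeat(23)
c = generepeat()

# All three are endless, so take just the first five items of each.
first_five = [list(itertools.islice(it, 5)) for it in (a, b, c)]

# Same behavior, three different implementations.
distinct_types = {type(a), type(b), type(c)}
```

Client code that only loops over a, b or c has no reason to care which
of the three it was handed.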

Most often, iterators (including generator-iterators) are used in a for
statement (or equivalently a for clause of a listcomp or genexp), which
is why one normally doesn't think about built-in ``iter'': it's called
automatically by these ``for'' syntax-forms.  In other words,

for x in <<<whatever>>>:
   ...body...

is just like:

__tempiter = iter(<<<whatever>>>)
while True:
    try: x = __tempiter.next()
    except StopIteration: break
    ...body...

((Simplification alert: the ``for'' statement has an optional ``else''
which this allegedly "just like" form doesn't mimic exactly...))
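
A runnable sketch of the same expansion, this time with the ``else''
clause accounted for.  It uses the Python 3 built-in next(it) in place
of it.next(), and the two function names are invented for the example:

```python
def collect_with_for(iterable):
    out = []
    for x in iterable:
        out.append(x * 2)
    else:
        out.append("done")
    return out


def collect_by_hand(iterable):
    out = []
    tempiter = iter(iterable)
    while True:
        try:
            x = next(tempiter)  # Python 2: tempiter.next()
        except StopIteration:
            # the for statement's else clause runs on exhaustion
            out.append("done")
            break
        # the body stays outside the try, as in the expansion above
        out.append(x * 2)
    return out
```

Both functions should give the same result for any iterable; a
``break'' in the body would have to skip the else part in both forms.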

> You're right. I was much more talking (mistakenly) about lazy evaluation of
> the arguments to a function (i.e. the function begins execution before its
> arguments get evaluated) -- in such a case I think it should be specified
> which arguments are "strict" and which are "lazy" -- but I don't think
> there's such a thing in Python (... well not yet as Python gets more and more
> akin to FP).

Python's strict that way.  To explicitly make one particular argument "lazy",
sorta, you can put a "lambda:" in front of it at call time, but then you
have to "call the argument" to get it evaluated; a bit of a kludge.
There's a PEP out to allow a ``prefix :'' to mean just the same as this
"lambda:", but even though I co-authored it I don't think it lowers the
kludge quotient by all that much.
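
A minimal sketch of the "lambda:" kludge, with made-up names
(expensive, pick, calls) chosen just for the illustration:

```python
calls = []


def expensive():
    calls.append("evaluated")
    return 42


def pick(use_it, lazy_arg):
    # lazy_arg is a zero-argument callable; nothing is computed
    # until (and unless) we explicitly call it
    return lazy_arg() if use_it else None


skipped = pick(False, lambda: expensive())
calls_after_skip = list(calls)  # still empty: expensive() never ran

used = pick(True, lambda: expensive())
```

Note that both the caller (who writes the lambda) and the callee (who
must call the argument) have to cooperate, which is what makes it a
kludge.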

Guido, our beloved BDFL, is currently musing about optional typing of
arguments, which might perhaps open a tiny little crack towards letting
some arguments be lazy.  I don't think Guido wants to go there, though.

My prediction is that even Python 3000 will be strict.  At least this
makes some things obvious at each call-site without having to study the
way a function is defined, e.g., upon seeing
    f(a+b, c*d)
you don't have to wonder, or study the ``def f'', to find out when the
addition and the multiplication happen -- they happen before f's body
gets a chance to run, and thus, in particular, if either operation
raises an exception, there's nothing f can do about it.
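
To make this concrete, here is a small sketch in which f tries its best
to handle the error and still never gets the chance.  The names log and
outcome are only for illustration:

```python
log = []


def f(x, y):
    try:
        log.append("body ran")
        return x + y
    except ZeroDivisionError:
        # never reached for the call below: the division fails
        # before f is even entered
        return "handled inside f"


try:
    f(1 + 2, 3 / 0)
    outcome = "returned"
except ZeroDivisionError:
    outcome = "raised at call site"
```

After this runs, log is still empty: the arguments are evaluated first,
the division raises, and f's body never starts.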

And that's a misunderstanding I _have_ seen repeatedly even in people
with a pretty good overall grasp of Python, evidenced in code such as:
    self.assertRaises(SomeError, f(23))
with astonishment that -- if f(23) does indeed raise SomeError -- this
exception propagates, NOT caught by assertRaises; and if mistakenly
f(23) does NOT raise, you typically get a TypeError about None not being
callable (assuming f returns None, which assertRaises then tries to
call).  The way to do the above call, _since Python is strict_, is:
    self.assertRaises(SomeError, f, 23)
i.e. give assertRaises the function (or other callable) and arguments to
pass to it -- THIS way, assertRaises performs the call under its own
control (inside the try clause of a try/except statement) and can and
does catch and report things appropriately.
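
A self-contained sketch of both the right and the wrong form, with a
stand-in SomeError and f invented for the example.  It also shows the
with-statement form of assertRaises that later Python versions (2.7
and 3.1 onwards) accept, which sidesteps the problem by keeping the
call inside a block that assertRaises controls:

```python
import io
import unittest


class SomeError(Exception):
    pass


def f(n):
    raise SomeError(n)


class StrictnessTest(unittest.TestCase):
    def test_callable_and_args(self):
        # assertRaises itself performs the call f(23)
        self.assertRaises(SomeError, f, 23)

    def test_context_manager(self):
        with self.assertRaises(SomeError):
            f(23)

    def test_wrong_form_propagates(self):
        # f(23) is evaluated before assertRaises is even called,
        # so SomeError escapes instead of being caught by it
        try:
            self.assertRaises(SomeError, f(23))
        except SomeError:
            escaped = True
        else:
            escaped = False
        self.assertTrue(escaped)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(StrictnessTest)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

All three tests are expected to pass; the third one passes precisely
because the wrong form lets the exception through.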

The frequency of this misunderstanding is high enough to prove to me
that strictness is not FULLY understood by some "intermediate level"
Pythonistas.  However, I doubt the right solution is to complicate
Python with the ability to have some arguments be strict, and others
lazy, much as sometimes one might yearn for it.  _Maybe_ one could have
some form that makes ALL arguments lazy, presenting them as an iterator
to the function itself.  But even then the form _should_, I suspect, be
obvious at the call site, rather than visible only in the "def"...

Alex