[Python-Dev] accumulator display syntax

Tim Peters tim_one at email.msn.com
Wed Oct 22 21:18:42 EDT 2003


I had a large file today, and needed to find lines matching several patterns
simultaneously.  It seemed a natural application for generator expressions,
so let's see how that looks.

Generalized a bit:

Given:
    "source", an iterable producing elements (like a file producing lines)
    "predicates", a sequence of one-argument functions, mapping element to
        truth (like a regexp search returning a match object or None)

Create:
    a generator producing the elements of source for which each predicate is
        true

This is, or should be, an easy application for pipelining generator
expressions.  Like so:

    pipe = source
    for p in predicates:
        # add a filter over the current pipe, and call that the new pipe
        pipe = e for e in pipe if p(e)

Now I hope that

    for e in pipe:
        print e

prints the desired elements.  It will if the "p" and "pipe" in the generator
expression use the bindings in effect at the time the generator expression
is assigned to pipe.  If the generator expression is instead a closure, it's
a subtle disaster.  You can play with this today like so:

    pipe = source
    for p in predicates:
        # pipe = e for e in pipe if p(e)
        def g(pipe=pipe, p=p):
            for e in pipe:
                if p(e):
                    yield e
        pipe = g()

    for e in pipe:
        print e

Those are the semantics for which "it works".
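
For readers who want to run this, here is a minimal sketch of the same loop wrapped in a function, written for Python 3. The name filter_all and the sample lines and patterns are hypothetical, chosen only for illustration; the default-argument trick is the one shown above.

```python
import re

def filter_all(source, predicates):
    """Return a generator over the elements of source passing every predicate."""
    pipe = iter(source)
    for p in predicates:
        # Default arguments capture the current pipe and p at definition time.
        def g(pipe=pipe, p=p):
            for e in pipe:
                if p(e):
                    yield e
        pipe = g()
    return pipe

# Hypothetical sample data standing in for a file and regexp searches.
lines = ["alpha beta", "beta gamma", "alpha gamma", "alpha beta gamma"]
predicates = [re.compile("alpha").search, re.compile("gamma").search]

matches = list(filter_all(lines, predicates))
print(matches)
```

With this sample data only the lines containing both "alpha" and "gamma" should survive.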

If "p=p" is removed (so that the implementation of the generator expression
acts like a closure wrt p), the effect is to ignore all but the last
predicate.  Instead predicates[-1] is applied to source, and then applied
redundantly to the survivors len(predicates)-1 times each.  It's not obvious
then that the result is wrong, and for some inputs it may even be correct.
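
The closure-over-p case can be sketched in Python 3 as below. The predicate names, the call counter, and the sample lines are hypothetical, added only to make the redundant calls visible. The last function is not from the original message: it assumes Python 3, where a parenthesized generator expression evaluates its outermost iterable immediately but looks up other names only when it runs, so it should show the same behavior.

```python
calls = {"alpha": 0, "gamma": 0}

def has_alpha(s):
    calls["alpha"] += 1
    return "alpha" in s

def has_gamma(s):
    calls["gamma"] += 1
    return "gamma" in s

def filter_late_p(source, predicates):
    pipe = iter(source)
    for p in predicates:
        # pipe is captured now, but p is looked up only when g runs,
        # by which time the loop has finished and p is predicates[-1].
        def g(pipe=pipe):
            for e in pipe:
                if p(e):
                    yield e
        pipe = g()
    return pipe

def filter_genexp(source, predicates):
    # Python 3 generator expression: pipe is evaluated at once, p is not.
    pipe = iter(source)
    for p in predicates:
        pipe = (e for e in pipe if p(e))
    return pipe

lines = ["alpha beta", "beta gamma", "alpha gamma", "alpha beta gamma"]

wrong = list(filter_late_p(lines, [has_alpha, has_gamma]))
calls_after_closure = dict(calls)
print(wrong)                # "beta gamma" slips through
print(calls_after_closure)  # has_alpha is never called

same_wrong = list(filter_genexp(lines, [has_alpha, has_gamma]))
```

Binding p early, for example through a default argument as in the working version, avoids this.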

If "pipe=pipe" is removed instead, it should produce a "generator already
executing" exception, since the "pipe" in the final for-loop is bound to the
same object as the "pipe" inside g then (all of the g's, but only the last g
matters).
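
A minimal Python 3 sketch of the closure-over-pipe case follows. The function name and sample data are hypothetical. In current CPython the failure surfaces as a ValueError as soon as the final generator is first advanced, because the "pipe" it tries to iterate is the generator itself.

```python
def filter_late_pipe(source, predicates):
    pipe = iter(source)
    for p in predicates:
        # p is captured now, but pipe is looked up only when g runs,
        # by which time pipe names the last generator created below.
        def g(p=p):
            for e in pipe:
                if p(e):
                    yield e
        pipe = g()
    return pipe

lines = ["alpha beta", "beta gamma", "alpha gamma"]
predicates = [lambda s: "alpha" in s, lambda s: "gamma" in s]

message = None
try:
    list(filter_late_pipe(lines, predicates))
except ValueError as exc:
    message = str(exc)

print(message)
```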



