
I had a large file today, and needed to find lines matching several patterns simultaneously. It seemed a natural application for generator expressions, so let's see how that looks. Generalized a bit:

Given:
    "source", an iterable producing elements (like a file producing lines)

    "predicates", a sequence of one-argument functions, mapping element to truth (like a regexp search returning a match object or None)

Create:
    a generator producing the elements of source for which each predicate is true

This is, or should be, an easy application for pipelining generator expressions. Like so:

    pipe = source
    for p in predicates:
        # add a filter over the current pipe, and call that the new pipe
        pipe = (e for e in pipe if p(e))

Now I hope that

    for e in pipe:
        print e

prints the desired elements. It will if the "p" and "pipe" in the generator expression use the bindings in effect at the time the generator expression is assigned to pipe. If the generator expression is instead a closure, it's a subtle disaster.

You can play with this today like so:

    pipe = source
    for p in predicates:
        # pipe = (e for e in pipe if p(e))
        def g(pipe=pipe, p=p):
            for e in pipe:
                if p(e):
                    yield e
        pipe = g()

    for e in pipe:
        print e

Those are the semantics for which "it works".

If "p=p" is removed (so that the implementation of the generator expression acts like a closure with respect to p), the effect is to ignore all but the last predicate. Instead predicates[-1] is applied to source, and then applied redundantly to the survivors len(predicates)-1 times each. It's not obvious then that the result is wrong, and for some inputs it may even be correct.

If "pipe=pipe" is removed instead, it should produce a "generator already executing" exception, since the "pipe" in the final for-loop is bound to the same object as the "pipe" inside g then (all of the g's, but only the last g matters).
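
The three variants described above (both names bound through default arguments, "p=p" dropped, "pipe=pipe" dropped) can be tried side by side. What follows is only a sketch under a few assumptions of mine: it uses present-day Python 3 syntax, it wraps each loop in a function so the variants don't share names, and the function names and the two sample predicates are made up for illustration.

```python
def build_pipe(source, predicates):
    """Both 'pipe' and 'p' are captured when each g is defined."""
    pipe = iter(source)
    for p in predicates:
        def g(pipe=pipe, p=p):
            for e in pipe:
                if p(e):
                    yield e
        pipe = g()
    return pipe


def build_pipe_late_p(source, predicates):
    """'p=p' removed: every g looks up 'p' only when it finally runs."""
    pipe = iter(source)
    for p in predicates:
        def g(pipe=pipe):
            for e in pipe:
                if p(e):  # by now 'p' is predicates[-1] for every g
                    yield e
        pipe = g()
    return pipe


def build_pipe_late_pipe(source, predicates):
    """'pipe=pipe' removed: the last g ends up iterating over itself."""
    pipe = iter(source)
    for p in predicates:
        def g(p=p):
            for e in pipe:  # 'pipe' is looked up when g runs
                if p(e):
                    yield e
        pipe = g()
    return pipe


# Hypothetical sample predicates, chosen only for the demonstration.
def is_even(n):
    return n % 2 == 0


def is_multiple_of_3(n):
    return n % 3 == 0


predicates = [is_even, is_multiple_of_3]

print(list(build_pipe(range(20), predicates)))
# → [0, 6, 12, 18]

print(list(build_pipe_late_p(range(20), predicates)))
# → [0, 3, 6, 9, 12, 15, 18]   (is_even was silently ignored)

try:
    list(build_pipe_late_pipe(range(20), predicates))
except ValueError as exc:
    print("ValueError:", exc)
```

With these particular predicates the late-bound "p" variant visibly returns too much. Had the last predicate happened to imply the earlier ones, the output would have matched the correct one, which is what makes that failure mode easy to miss.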