[Baypiggies] Script using generators produces different results when invoked as a CGI

K. Barclay kebarcla at comcast.net
Mon May 5 06:00:24 CEST 2008

I attended David Beazley's awe-inspiring tutorial on the use of generators in systems programming.
I used his approach to write a web tool that can display search results from different log files. But the resulting script produces fewer results when invoked as a CGI script than it does when run from the command line, and I can't figure out why.
He showed how to 'pipeline' generators together in a form of declarative programming. For example, put the following generators together to grep lines out of a set of files:
# Generate a list of files
lognames = gen_find("*.bz2","/tmp/bz2_alldata/pa_new")
# Yield a sequence of file objects that have been suitably opened
logfiles = gen_open(lognames)
# Concatenate multiple generators into a single sequence
loglines = gen_cat(logfiles)
# Grep a sequence of lines that match a re pattern
searchlines = gen_grep(r'fried',loglines)
The functions are only a few lines each:

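import bz2
import re

def gen_open(filenames):
    # Open each .bz2 file for reading; other names are skipped
    for name in filenames:
        if name.endswith(".bz2"):
            yield bz2.BZ2File(name)

def gen_cat(sources):
    # Chain the items of several iterables into one sequence
    for s in sources:
        for item in s:
            yield item

def gen_grep(pat, lines):
    # Yield only the lines that match the regular expression
    patc = re.compile(pat)
    for line in lines:
        if patc.search(line):
            yield line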
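gen_find is the one function not shown above. As a rough sketch only, assuming it walks the directory tree with os.walk and matches file names with fnmatch (an assumption about its implementation, not necessarily the code from the tutorial), it might look like this:

```python
import fnmatch
import os

def gen_find(filepat, top):
    # Assumed implementation: walk the tree under `top` and yield the full
    # path of every file whose name matches the shell pattern `filepat`.
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)
```

Like the others, it does no work until something iterates over it.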
Since they're generators, processing the data doesn't start until you kick off the iteration on the final generator.
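To illustrate the lazy behaviour, here is a small sketch with two made-up generators (numbers and doubled are hypothetical names, not part of the script) that record when they actually run:

```python
events = []  # records the order in which the generator bodies execute

def numbers():
    # Hypothetical source stage: yields 0, 1, 2
    for n in range(3):
        events.append("produce %d" % n)
        yield n

def doubled(source):
    # Hypothetical processing stage: doubles each item it pulls
    for n in source:
        events.append("double %d" % n)
        yield n * 2

pipeline = doubled(numbers())       # building the pipeline runs nothing
events_before = list(events)        # still empty at this point

first = next(pipeline)              # pulls exactly one item through both stages
events_after_first = list(events)

rest = list(pipeline)               # drains the remaining items
```

Each item is pulled through the whole chain one at a time, so nothing is read until the final loop asks for it.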
Problem: For small sets of files this works great. But with 19 MB of log files in a test directory, the script returns the correct number of matching lines (288) only when it is invoked directly from the command line. When invoked from a CGI script, it returns 220 lines instead (written to the temp file in the script below). I don't know where that limit is coming from. If more logs are added to the test directory, the result is always the same 220 lines.
I'm using Python 2.5.1 on Red Hat Linux 3.2.3-47. Below is the whole script I was testing with. It's using hard-coded values in place of ones I'll be getting from an HTML form (generated with HTMLgen) presented to the user.
There are no exceptions or errors of any kind. Any pointers on what might be happening here would be welcome!
import tempfile
from genfind import  gen_find
from genopen import  gen_open
from gencat  import  gen_cat
from gengrep import  gen_grep
tf = tempfile.mkstemp()
tmpfile = open(tf[1],'w')
lognames = gen_find("*.bz2","/tmp/bz2_alldata/pa_new")
logfiles = gen_open(lognames)
loglines = gen_cat(logfiles)
searchlines = gen_grep(r'fried',loglines)
for line in searchlines:
    print >> tmpfile, line,
