[Baypiggies] Script using generators produces different results when invoked as a CGI

Shannon -jj Behrens jjinux at gmail.com
Wed May 7 02:40:17 CEST 2008


On Sun, May 4, 2008 at 9:00 PM, K. Barclay <kebarcla at comcast.net> wrote:
> Hello,
>
>  I attended David Beazley's awe-inspiring tutorial on the use of generators in systems programming:
>
>  http://www.dabeaz.com/generators/
>
>  I used his approach to write a web tool that can display search results from different log files. But the resulting script produced fewer results when invoked as a CGI than it did when run from the command line, and I can't figure out why.
>
>  He showed how to 'pipeline' generators together in a form of declarative programming. For example, put the following generators together to grep lines out of a set of files:
>
>  # Generate a list of files
>  lognames = gen_find("*.bz2","/tmp/bz2_alldata/pa_new")
>  # Yield a sequence of file objects that have been suitably opened
>  logfiles = gen_open(lognames)
>  # Concatenate multiple generators into a single sequence
>  loglines = gen_cat(logfiles)
>  # Grep a sequence of lines that match a re pattern
>  searchlines = gen_grep(r'fried',loglines)
>
>  The functions are only a few lines each:
>
>
>  def gen_open(filenames):
>     for name in filenames:
>         if name.endswith(".bz2"):
>             yield bz2.BZ2File(name)
>
>  def gen_cat(sources):
>     for s in sources:
>         for item in s:
>             yield item
>
>  import re
>  def gen_grep(pat,lines):
>     patc = re.compile(pat)
>     for line in lines:
>         if patc.search(line): yield line
>
>  Since they're generators, processing the data doesn't start until you kick off the iteration on the final generator.
>
>  Problem: For small sets of files this works great. But when I had 19 MB worth of log files in a test directory, the script returned the correct number of matching lines (288) only when it was invoked directly from the command line. When invoked from a CGI script, it returned only 220 lines (written to the tempfile; see the script below). I don't know where that limit is coming from. If more logs are added to the test directory, the result is always the same 220 lines.
>
>  I'm using Python 2.5.1 on Red Hat Linux 3.2.3-47. Below is the whole script I was testing with. It's using hard-coded values in place of ones I'll be getting from an HTML form (generated with HTMLgen) presented to the user.
>
>  There are no exceptions or errors of any kind. Any pointers on what might be happening here would be welcome!
>
>  Thanks
>  Ken
>
>  #!/usr/local/bin/python
>
>  import tempfile
>  from genfind import  gen_find
>  from genopen import  gen_open
>  from gencat  import  gen_cat
>  from gengrep import  gen_grep
>
>  tf = tempfile.mkstemp()
>  tmpfile = open(tf[1],'w')
>
>  lognames = gen_find("*.bz2","/tmp/bz2_alldata/pa_new")
>  logfiles = gen_open(lognames)
>  loglines = gen_cat(logfiles)
>
>  searchlines = gen_grep(r'fried',loglines)
>  for line in searchlines:
>     print >> tmpfile, line,
>  tmpfile.close()

I've heard that was a great talk.  Here are some ideas:

* Try running it from the command line as the same user that the Web
server runs as.

* Try dealing with one generator at a time and counting the number of
times it yields.  Try writing some debugging output to a file so you
can figure out where things are getting stuck (the first sketch below
shows one way to do that).

* Are you getting any error messages at all?  For instance, is output
to stderr ending up in the Web server's error log?  The second sketch
below shows one way to capture tracebacks and stderr yourself.
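
Here's a rough, untested sketch of what I mean by counting yields.  The
gen_count wrapper and the /tmp/gen_debug.log path are just names I made
up; it assumes the same imports as your script, so adapt it to your setup:

    def gen_count(name, source, logfile):
        # Wrap a generator, passing items through unchanged while
        # keeping a running count of how many it produced.
        count = 0
        try:
            for item in source:
                count += 1
                yield item
        finally:
            # The finally clause runs even if iteration is abandoned,
            # so a partial count still gets recorded (unless the
            # process is killed outright).
            print >> logfile, "%s yielded %d items" % (name, count)
            logfile.flush()

    debug = open('/tmp/gen_debug.log', 'w')

    lognames    = gen_count('find', gen_find("*.bz2", "/tmp/bz2_alldata/pa_new"), debug)
    logfiles    = gen_count('open', gen_open(lognames), debug)
    loglines    = gen_count('cat',  gen_cat(logfiles), debug)
    searchlines = gen_count('grep', gen_grep(r'fried', loglines), debug)

    for line in searchlines:
        pass
    debug.close()

Comparing those counts between the command-line run and the CGI run
should tell you which stage is coming up short.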
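
And if the server isn't showing you anything, you can capture errors
yourself.  Something like this (untested; the /tmp paths are just
examples, and the directories have to exist and be writable by the Web
server user) at the top of the CGI script:

    import sys, cgitb

    # Log a nicely formatted traceback for any uncaught exception to a
    # file under /tmp/cgi_tb, in addition to showing it in the browser.
    cgitb.enable(logdir='/tmp/cgi_tb')

    # Send anything written to stderr to a file you can read afterwards
    # (unbuffered, so it survives an abrupt exit).
    sys.stderr = open('/tmp/cgi_stderr.log', 'a', 0)

If those stay empty while the output is still short, the process is
probably being cut off rather than hitting an exception.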

Best Regards,
-jj

-- 
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/

