[Chicago] Thursday!

Edward Summers ehs at pobox.com
Tue Feb 7 13:59:07 CET 2006

On Feb 6, 2006, at 2:58 PM, Peter Fein wrote:
> I use pyparsing.  It works pretty well, but would be curious as to what
> else folks use.

pyparsing rocks for building and using grammars. For screen scraping
the web I've really come to rely on Fredrik Lundh's elementtidy [1].
elementtidy wraps tidylib [2], and lets you feed in possibly
garbled HTML and get out a shiny new ElementTree object.

For example, if your boss came to you and asked you to write a  
program to pretend to be Moe the Bartender:

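   #!/usr/bin/env python
   # Python 2 script: print a random bold-faced line from the Moe calls page.

   from elementtidy import TidyHTMLTreeBuilder
   from urllib import urlopen
   from random import choice

   url = 'http://www.snpp.com/guides/moe_calls.html'
   # tidy repairs the HTML into XHTML and hands back an ElementTree
   root = TidyHTMLTreeBuilder.parse(urlopen(url))

   # tidy's output is namespaced XHTML, so the tag needs the full qualifier
   quotes = root.findall('.//{http://www.w3.org/1999/xhtml}b')
   print choice(quotes).text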


biblio:~ ed$ moe
Phone call for Al...Al Coholic...is there an Al Coholic here?
biblio:~ ed$ moe
Hey, is there a Butz here? Seymour Butz? Hey, everybody, I wanna  
Seymour Butz!
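
One thing that tends to trip people up in the script above is the
{http://www.w3.org/1999/xhtml} prefix in the findall path. tidy emits
XHTML, so every element lands in the XHTML namespace and a bare 'b'
matches nothing. Here is a minimal sketch of that behaviour using only
the standard library's xml.etree.ElementTree, so it runs without
elementtidy installed. SAMPLE and bold_texts are made-up names for
illustration, and the markup is a hypothetical stand-in for tidy's
output, not the real page.

```python
import random
import xml.etree.ElementTree as ET

XHTML = 'http://www.w3.org/1999/xhtml'

# Hypothetical stand-in for what tidy hands back: well-formed,
# namespaced XHTML. This is not the content of the real page.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p><b>First line</b> and <b>Second line</b></p>'
    '</body></html>'
)


def bold_texts(xhtml_source):
    """Return the text of every <b> element in an XHTML string."""
    root = ET.fromstring(xhtml_source)
    # The tag must carry the namespace in {uri}tag form.
    return [b.text for b in root.findall('.//{%s}b' % XHTML)]


texts = bold_texts(SAMPLE)

# Without the namespace qualifier the same search comes back empty.
unqualified = ET.fromstring(SAMPLE).findall('.//b')

# random.choice picks one element without any index arithmetic.
pick = random.choice(texts)
print(pick)
```

If you get an empty list back from findall on tidied HTML, a missing
namespace qualifier is the first thing I'd check.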


[1] http://effbot.org/zone/element-tidylib.htm
[2] http://tidy.sourceforge.net/
