[Chicago] Thursday!
Edward Summers
ehs at pobox.com
Tue Feb 7 13:59:07 CET 2006
On Feb 6, 2006, at 2:58 PM, Peter Fein wrote:
> I use pyparsing. It works pretty well, but would be curious as to what
> else folks use.
pyparsing rocks for building and using grammars. For screen scraping
the web I've really come to rely on Lundh's elementtidy [1].
elementtidy wraps tidylib [2], and allows you to feed in possibly
garbled HTML and get out a shiny new elementtree object.

For example, if your boss came to you and asked you to write a
program to pretend to be Moe the Bartender:
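#!/usr/bin/env python
# Python 2 script: print one random prank call scraped from the page.
from elementtidy import TidyHTMLTreeBuilder
from urllib import urlopen
from random import choice

url = 'http://www.snpp.com/guides/moe_calls.html'
# tidylib cleans up the HTML; the tree that comes back is namespaced XHTML
root = TidyHTMLTreeBuilder.parse(urlopen(url))
# collect every <b> element, using its namespace-qualified tag name
quotes = root.findall('.//{http://www.w3.org/1999/xhtml}b')
print choice(quotes).text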
--
biblio:~ ed$ moe
Phone call for Al...Al Coholic...is there an Al Coholic here?
biblio:~ ed$ moe
Hey, is there a Butz here? Seymour Butz? Hey, everybody, I wanna
Seymour Butz!
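
The long '{http://www.w3.org/1999/xhtml}b' path in the script is there
because the tidied tree is XHTML, so every tag carries the XHTML
namespace. Here is a rough stdlib-only sketch of the same lookup for
Python 3, with no elementtidy involved. The SAMPLE string and the
bold_texts helper are made up for illustration; they stand in for
tidied output and are not the real page contents.

```python
# Stdlib-only sketch (Python 3): query namespaced XHTML with ElementTree.
import random
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Hypothetical stand-in for tidied output; not the real page contents.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p><b>first quote</b></p>'
    '<p><b>second quote</b></p>'
    '</body></html>'
)

def bold_texts(xhtml):
    """Return the text of every <b> element in an XHTML string."""
    root = ET.fromstring(xhtml)
    # Tags are stored in Clark notation: {namespace-uri}localname
    return [b.text for b in root.findall('.//{%s}b' % XHTML_NS)]

quotes = bold_texts(SAMPLE)
# A bare 'b' finds nothing, because every tag in the tree is namespaced.
unqualified = ET.fromstring(SAMPLE).findall('.//b')
print(random.choice(quotes))
```

If findall comes back empty on a tidied page, a missing namespace
prefix in the path is a likely first thing to check.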
//Ed
[1] http://effbot.org/zone/element-tidylib.htm
[2] http://tidy.sourceforge.net/