[Chicago] web page content scraper

Pete pfein at pobox.com
Wed Aug 13 18:38:18 CEST 2008


On Apr 9, 2008, at 11:27 AM, Adrian Holovaty wrote:

> On Tue, Apr 8, 2008 at 9:25 AM, Tom Printy  
> <tprinty at mail.edisonave.net> wrote:
>> Wow this library is super cool. Anyone got slides or notes from the
>> talk?
>
> Hey, that's my library and was my talk. Note that the current version
> of templatemaker (on Google Code) is pretty "dumb" when dealing with
> HTML.
>
> Since that talk, I've developed a new one, based on lxml, that
> analyzes differences in the HTML trees. It's a *lot* better (I'd even
> call it *awesome*), but I haven't released it open-source yet. Stay
> tuned.

Ian Bicking wrote something similar IIRC, also based on lxml.  If
you're both gonna be there, would you like to talk about them
briefly?  Anyone want to speak for BeautifulSoup?  I'm thinking just
5-10 minutes on each.

