[Chicago] web page content scraper

Ted Pollari tcp at mac.com
Wed Aug 13 19:42:18 CEST 2008


On Aug 13, 2008, at 11:38 AM, Pete wrote:

>
> On Apr 9, 2008, at 11:27 AM, Adrian Holovaty wrote:
>
>> On Tue, Apr 8, 2008 at 9:25 AM, Tom Printy <tprinty at mail.edisonave.net> wrote:
>>> Wow this library is super cool. Anyone got slides or notes from the
>>> talk?
>>
>> Hey, that's my library and was my talk. Note that the current version
>> of templatemaker (on Google Code) is pretty "dumb" when dealing with
>> HTML.
>>
>> Since that talk, I've developed a new one, based on lxml, that
>> analyzes differences in the HTML trees. It's a *lot* better (I'd even
>> call it *awesome*), but I haven't released it open-source yet. Stay
>> tuned.
>
> Ian Bicking wrote something similar, IIRC, also based on lxml. If
> you're both gonna be there, would you like to talk about them
> briefly? Does anyone want to speak for BeautifulSoup? I'm thinking
> just 5-10 minutes on each.


This would be really useful: I've just been assigned a task that
will require a lot of screen scraping. I'd gladly buy beer (before,
during, or after) as a bribe for anyone who gives these talks this
week.

-tcp