[Chicago] web page content scraper

Christopher Allan Webber cwebber at imagescape.com
Wed Apr 9 21:23:11 CEST 2008


That sounds interesting.  I'd like to hear the technical reasons for
the change to lxml, and how it benefited you.  Maybe do another talk
(or at least a lightning talk) at a future ChiPy meeting once you're
ready to open-source it?

"Adrian Holovaty" <web at holovaty.com> writes:

> On Tue, Apr 8, 2008 at 9:25 AM, Tom Printy <tprinty at mail.edisonave.net> wrote:
>> Wow this library is super cool. Anyone got slides or notes from the
>>  talk?
>
> Hey, that's my library and was my talk. Note that the current version
> of templatemaker (on Google Code) is pretty "dumb" when dealing with
> HTML.
>
> Since that talk, I've developed a new one, based on lxml, that
> analyzes differences in the HTML trees. It's a *lot* better (I'd even
> call it *awesome*), but I haven't released it open-source yet. Stay
> tuned.
>
> Adrian

