[Tutor] HTML Parsing

Andreas Kostyrka andreas at kostyrka.org
Mon Apr 21 16:19:15 CEST 2008


Just from memory, you need to subclass the HTMLParser class and provide
start_dt and end_dt methods, plus one to capture the text in between
(handle_data, I believe).

Read the docs on htmllib (www.python.org | Documentation | module docs),
and see if you can manage; if not, come back with questions ;)
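
To give a rough idea of the shape, here is a minimal sketch. Note the
assumptions: htmllib only exists in Python 2, so this uses the html.parser
module from the Python 3 standard library instead. There the hooks are
handle_starttag, handle_endtag and handle_data rather than per-tag
start_dt/end_dt methods. The class name DTCollector and the sample HTML are
made up for illustration.

```python
from html.parser import HTMLParser


class DTCollector(HTMLParser):
    """Collect the text content of every <dt> element (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.in_dt = False   # True while we are inside a <dt> element
        self.current = []    # text fragments of the <dt> being read
        self.terms = []      # finished <dt> texts

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self._finish()   # a new <dt> implicitly closes an open one
            self.in_dt = True
        elif tag == "dd":
            self._finish()   # a <dd> also ends the preceding <dt>

    def handle_endtag(self, tag):
        if tag in ("dt", "dl"):
            self._finish()

    def handle_data(self, data):
        # Only keep text that appears while a <dt> is open.
        if self.in_dt:
            self.current.append(data)

    def _finish(self):
        # Store the text gathered so far and leave the "inside <dt>" state.
        if self.in_dt:
            self.terms.append("".join(self.current).strip())
            self.current = []
            self.in_dt = False


# Made-up sample input; the second <dt> deliberately has no closing tag.
parser = DTCollector()
parser.feed("<dl><dt>First</dt><dd>one</dd><dt>Second <b>term</b><dd>two</dd></dl>")
parser.close()
print(parser.terms)
```

The in_dt flag is the "text in between" part: handle_data sees all text in
the document, so the parser has to remember whether it is currently inside
a <dt>. Since closing </dt> tags are often omitted in real pages, the sketch
also treats the next <dt>, a <dd>, or </dl> as the end of the term.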

Andreas

On Monday, 21.04.2008, at 14:40 +0100, Stephen Nelson-Smith wrote:
> On 4/21/08, Andreas Kostyrka <andreas at kostyrka.org> wrote:
> > As usual there are a number of ways.
> >
> >  But I basically see two steps here:
> >
> >  1.) capture all dt elements. If you want to stick with the standard
> >  library, htmllib would be the module. Else you can use e.g.
> >  BeautifulSoup or something comparable.
> 
> I want to stick with standard library.
> 
> How do you capture <dt> elements?
> 
> S.