[Tutor] HTML Parsing
Andreas Kostyrka
andreas at kostyrka.org
Mon Apr 21 16:19:15 CEST 2008
Just from memory, you need to subclass the HTMLParser class and provide
start_dt and end_dt methods, plus one (handle_data) to capture the text in between.
Read the docs on htmllib (www.python.org | Documentation | module docs)
and see if you can manage; if not, come back with questions ;)
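Note that htmllib only exists in Python 2; it was removed in Python 3. Below is a minimal sketch of the same approach using Python 3's standard html.parser.HTMLParser, offered as an illustration and not as htmllib's own API. Unlike htmllib, html.parser does not dispatch to start_dt/end_dt methods by itself, so handle_starttag and handle_endtag forward to such methods by name here. The class name DTParser, the start_dd handling for an omitted </dt>, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class DTParser(HTMLParser):
    """Collect the text of every <dt> element (hypothetical helper class).

    Mimics the old htmllib/sgmllib convention by forwarding tags to
    start_<tag>/end_<tag> methods when such methods are defined.
    """

    def __init__(self):
        super().__init__()
        self.in_dt = False
        self.current = []   # text fragments of the <dt> currently being read
        self.terms = []     # finished <dt> texts

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase.
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

    def start_dt(self, attrs):
        # A new <dt> implicitly closes an unterminated previous one.
        if self.in_dt:
            self.end_dt()
        self.in_dt = True
        self.current = []

    def end_dt(self):
        if self.in_dt:
            self.terms.append("".join(self.current).strip())
            self.in_dt = False

    def start_dd(self, attrs):
        # </dt> is optional in HTML, so a following <dd> also ends the term.
        self.end_dt()

    def handle_data(self, data):
        # Only keep text that appears inside a <dt>.
        if self.in_dt:
            self.current.append(data)


html = """
<dl>
  <dt>Python</dt><dd>A programming language.</dd>
  <dt>HTML<dd>Markup without a closing dt tag.
</dl>
"""

parser = DTParser()
parser.feed(html)
parser.close()
print(parser.terms)  # ['Python', 'HTML']
```

The getattr-based forwarding keeps the per-tag logic in small, separately named methods, as with htmllib; for a one-off script, checking `tag == "dt"` directly in handle_starttag/handle_endtag would work just as well.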
Andreas
On Monday, 21.04.2008, at 14:40 +0100, Stephen Nelson-Smith wrote:
> On 4/21/08, Andreas Kostyrka <andreas at kostyrka.org> wrote:
> > As usual there are a number of ways.
> >
> > But I basically see two steps here:
> >
> > 1.) capture all dt elements. If you want to stick with the standard
> > library, htmllib would be the module. Otherwise, you can use e.g.
> > BeautifulSoup or something comparable.
>
> I want to stick with the standard library.
>
> How do you capture <dt> elements?
>
> S.