Extracting data from HTML
Ian Bicking
ianb at colorstudy.com
Fri May 31 18:25:52 EDT 2002
On Fri, 2002-05-31 at 14:52, Hazel wrote:
> how do I write a program that
> will extract info from an HTML page and print
> out a list of TV programmes, with their times and durations,
> using urllib?
You can fetch the page with urllib. You could use htmllib to parse it, but
I often find that regular expressions (the re module) are easier, since
you aren't looking for specific markup but for specific patterns in the
text. You'll get lots of false negatives (and false positives), but when
you are parsing a page that was never meant to be parsed (like most web
pages), no technique is perfect.
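A minimal sketch of the regex approach described above, written for modern Python 3 (where urllib.request replaces the old urllib.urlopen, and htmllib no longer exists; html.parser is its successor). The listing markup is purely an assumption: it pretends each programme is a table row like <tr><td>19:30</td><td>Title</td><td>30 min</td></tr>. The names LISTING_RE, fetch_page and extract_listings are hypothetical, and a real listings page will need its own pattern.

```python
import re
import urllib.request

# Hypothetical markup (an assumption, not any real site's format):
#   <tr><td>19:30</td><td>Programme title</td><td>30 min</td></tr>
# Adjust this pattern to whatever the real page actually contains.
LISTING_RE = re.compile(
    r"<tr>\s*<td>(\d{1,2}:\d{2})</td>\s*"  # start time, e.g. 19:30
    r"<td>([^<]*)</td>\s*"                 # programme title (no nested tags)
    r"<td>(\d+)\s*min</td>\s*</tr>",       # duration in minutes
    re.IGNORECASE,
)


def fetch_page(url):
    """Download a page and decode it as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_listings(html):
    """Return a list of (time, title, minutes) tuples found in html."""
    return [
        (time, title.strip(), int(minutes))
        for time, title, minutes in LISTING_RE.findall(html)
    ]


if __name__ == "__main__":
    # Inline sample so the sketch runs without a network connection;
    # for a real page, use: html = fetch_page("http://...")
    sample = """
    <table>
      <tr><td>19:30</td><td>The News</td><td>30 min</td></tr>
      <tr><td>20:00</td><td>Nature Hour</td><td>60 min</td></tr>
    </table>
    """
    for time, title, minutes in extract_listings(sample):
        print(f"{time}  {title}  ({minutes} minutes)")
```

The pattern only matches rows shaped exactly like the assumed markup, so rows with extra attributes or nested tags are silently skipped (the false negatives mentioned above). That brittleness is the price of not building a full parse tree.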
Ian