HTMLParser cannot parse some web pages?
nospam at bigfoot.com
Wed Oct 17 14:25:16 CEST 2001
Have a look at this...
This includes an example that samples the links of a page.
Of course, like most HTML sniffers, it cannot properly handle links which
"Paul Lim" <paullim at starhub.net.sg> wrote in message news:
3BCD77D2.EBFB2D98 at starhub.net.sg...
> I am a newbie in Python. I hope the guru could advise me on the
> I am trying to extract the links in an HTML file.
> My code is shown below:
> The code works fine. But I just want to understand more about this
> HTMLParser module. Apparently, there are some webpages where I cannot
> extract the links.
> But I really don't understand why? An example is
> Are there certain limitations in this HTMLParser? For example, is it that
> it cannot extract from certain kinds of web pages? If so, which kinds?
> Thank you very much for your help.
> "To extract the links in a page."
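> import urllib
> from htmllib import HTMLParser
> from formatter import NullFormatter
> # To open a url and return url handler (link is the URL string)
> try:
>     linkHandler = urllib.urlopen(link)
> except IOError:
>     print "Unable to open url!"
> # Extract links from the HTML file; they are stored in parser.anchorlist
> try:
>     parser = HTMLParser(NullFormatter())
>     parser.feed(linkHandler.read())
> except Exception:
>     print "Unable to extract!"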
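
For comparison, here is a minimal sketch of link extraction that looks at more than anchor tags. It is not the code from this thread: it assumes a current Python 3 and its html.parser module rather than the htmllib parser quoted above, and the names LinkCollector, LINK_ATTRS and extract_links are made up for illustration. The choice of tags is also an assumption; a page may carry links in other places. Links that a page builds at run time with scripts cannot be found by any parser of this kind, since they are not in the HTML text.

```python
from html.parser import HTMLParser

# Tags and the attribute that holds their link target.
# This selection is an assumption made for the sketch, not a complete list.
LINK_ATTRS = {"a": "href", "area": "href", "frame": "src", "iframe": "src"}


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects link targets while the page is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html_text):
    """Return the link targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = """
<html><body>
<a href="http://www.python.org/">Python</a>
<frameset><frame src="menu.html"></frameset>
<a href="javascript:openWin('x.html')">popup</a>
</body></html>
"""
print(extract_links(sample))
```

A parser that only records anchor tags would skip the frame in the sample, and the javascript: entry shows a link that is reported but is not a plain URL that could be fetched.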