HTMLParser cannot parse some web pages?

Gillou nospam at
Wed Oct 17 14:25:16 CEST 2001


Have a look at this...

This includes an example that samples the links of a page.

Of course, as with most HTML sniffers, it cannot properly handle links
whose targets are the results of JavaScript expressions.
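
To make that limitation concrete, here is a minimal sketch. It assumes the
html.parser module from the current Python 3 standard library rather than the
older HTMLParser discussed in this thread, and the names LinkCollector,
extract_links and SAMPLE are invented for the illustration. It collects the
href of every <a> tag. A static link comes out as a usable target, but a
javascript: link comes out only as the raw expression, and a target computed
in an onclick handler is not seen at all, since nothing here runs the script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample page: one static link and two script-driven ones.
SAMPLE = """
<a href="page1.html">static</a>
<a href="javascript:openWin('page2.html')">scripted</a>
<a href="#" onclick="location.href = base + 'page3.html'">computed</a>
"""

for link in extract_links(SAMPLE):
    print(link)
```

Only the first of the three collected values is a real target; the second is
unevaluated script text and the third is just "#". Getting the real targets
of the last two would need a JavaScript engine, which a plain parser does not
have.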


"Paul Lim" <paullim at> wrote in message news:
3BCD77D2.EBFB2D98 at
> Hi,
> I am a newbie in Python. I hope the gurus can advise me on the
> following.
> I am trying to extract the links from an HTML file.
> My code is shown below:
> The code works fine. But I just want to understand more about this
> HTMLParser module. Apparently, there are some webpages where I cannot
> extract the links.
> But I really don't understand why. An example is
> Is there a certain limitation in this HTMLParser? For example, is it
> unable to extract links from certain kinds of web pages? If so, which kinds?
> Thank you very much for your help.
> Sincerely
> Paul
> "To extract the links in a page."
>  # Open the URL and get a file-like handle for it
>  try:
>   linkHandler = urllib.urlopen(link)
>  except IOError:
>   print "Unable to open url!"
>  # Extract the links from the HTML file; they are stored in anchorlist
>  try:
>   parser = HTMLParser(NullFormatter())
>   parser.feed(
>  except:
>   print "Unable to extract!"
>   pass
