Help: HTMLParser cannot parse some web pages?
paullim at starhub.net.sg
Wed Oct 17 14:21:39 CEST 2001
I am a newbie in Python. I hope the gurus could advise me on the following.
I am trying to extract the links in an HTML file.
My code is shown below:
The code works fine, but I want to understand more about the
HTMLParser module. Apparently, there are some web pages from which I
cannot extract the links.
I really don't understand why. An example is
Are there certain limitations in HTMLParser? For example, is it unable
to extract links from certain kinds of web pages? If so, which kinds?
Thank you very much for your help.
"To extract the links in a page."
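import urllib, htmllib, formatter
# To open a url and return url handler (link is the URL string)
try:
    linkHandler = urllib.urlopen(link)
except IOError:
    print "Unable to open url!"
    raise SystemExit(1)
# Extract links from the HTML file; they are stored in parser.anchorlist
parser = htmllib.HTMLParser(formatter.NullFormatter())
try:
    parser.feed(linkHandler.read())
except Exception:
    print "Unable to extract!"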
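
For readers on Python 3, where htmllib is no longer available, the same idea can be sketched with html.parser. This is only an illustration, not the code from the post: LinkCollector, extract_links, and the sample HTML are made-up names, and the sketch deliberately looks at nothing but <a href=...> tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        # Hypothetical attribute, named after the anchorlist used in the post.
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)


def extract_links(html_text):
    """Return the list of <a href> targets found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.anchorlist


# Made-up sample page: two anchors plus one frame.
sample = (
    '<p><a href="http://www.python.org/">Python</a> '
    '<A HREF="/doc/">Docs</A></p>'
    '<frame src="menu.html">'
)
links = extract_links(sample)
print(links)
```

Because only <a> tags are inspected, the <frame src=...> in the sample contributes nothing to the result; a page built mainly from such tags would produce an empty list with this approach.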