SimplePrograms challenge
Rob Wolfe
rw at smsnet.pl
Tue Jun 12 08:06:08 EDT 2007
Steve Howell wrote:
> Hi, I'm offering a challenge to extend the following
> page by one good example:
>
> http://wiki.python.org/moin/SimplePrograms
What about simple HTML parsing? Strictly speaking this is not a
language concept, but it shows the power of the Python standard library.
Besides, it's a very popular problem among newbies. This program,
for example, shows all the linked URLs in an HTML document:
<code>
from HTMLParser import HTMLParser
page = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</ul>
</body></html>
'''
class URLLister(HTMLParser):
    def reset(self):
        HTMLParser.reset(self)
        self.urls = []
    def handle_starttag(self, tag, attrs):
        try:
            # get handler for tag and call it e.g. self.start_a
            getattr(self, "start_%s" % tag)(attrs)
        except AttributeError:
            pass
    def start_a(self, attrs):
        href = [v for k, v in attrs if k == "href"]
        if href:
            self.urls.extend(href)
parser = URLLister()
parser.feed(page)
parser.close()
for url in parser.urls:
    print url
</code>
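The listing above targets Python 2. For anyone on Python 3, where the module is named html.parser and print is a function, the same idea might look roughly like the sketch below. The extract_urls helper and the PAGE constant are names made up for this illustration, not part of the original program, and the handler lookup uses a getattr default instead of try/except so that an AttributeError raised inside a handler is not silently swallowed.

```python
from html.parser import HTMLParser

# Same sample document as in the program above.
PAGE = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</ul>
</body></html>
'''


class URLLister(HTMLParser):
    def reset(self):
        # reset() is called by HTMLParser.__init__, so urls always exists.
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Look up a per-tag handler such as self.start_a; skip tags without one.
        handler = getattr(self, "start_%s" % tag, None)
        if handler is not None:
            handler(attrs)

    def start_a(self, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        self.urls.extend(v for k, v in attrs if k == "href" and v is not None)


def extract_urls(html):
    # Hypothetical convenience wrapper: parse a string and return the collected links.
    parser = URLLister()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    for url in extract_urls(PAGE):
        print(url)
```

Returning the list from a function rather than printing at module level makes the parser easier to reuse on other documents.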
--
Regards,
Rob