[Tutor] (no subject)
dana at momerath.us
Sat Jan 31 07:35:57 EST 2004
From: Dana Baguley <dana at momerath.us>
To: tutor at python.org
Subject: HTMLParser
Date: Sat, 31 Jan 2004 04:37:18 -0800
Hello,
I'm new to Python and fairly new to programming in general. As a first
project, I'm trying to write a program that pulls data from tables on web pages.
I'm having trouble with the HTMLParser module. Below are my code and the error
message. I suspect I should be using the debugger for this, but I don't
have a clue where to start, so I'm emailing the list instead. Thanks in
advance.

Also, this is my first email to a list like this one, so I'm
wondering whether I'm observing proper social conventions. This message feels
rather long to me. What do you think?
Dana
#r.py
import HTMLParser
import urllib
import string

def getPage(address):
    page = urllib.urlopen(address)
    page = page.readlines()
    return page

def flattenPage(page):
    returned = ''
    for line in page:
        returned += line
        returned += ' '
    for character in string.whitespace: # probably not necessary to zap newlines.
        returned = string.replace(returned, character, ' ')
    return returned

def getFlatPage(URL):
    page = flattenPage(getPage(URL))
    return page

class InteractiveURLParser(HTMLParser.HTMLParser):
    def open(self, URL):
        page = getPage(URL)
        page = flattenPage(page)
        self.feed(page)
    def newURL(self, URL):
        __init__(self, URL)
    def handle_starttag(self, tag, attrs):
        print "tag is %s with attributes %s." %(self, tag, attrs)
    def handle_data(self, data):
        print data
--
>>> carrie = "http://www.plu.edu/~swarthcj"
>>> r.test(carrie)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "r.py", line 37, in test
    testInstance.open(URL)
  File "r.py", line 27, in open
    self.feed(page)
  File "/usr/lib/python2.3/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.3/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.3/HTMLParser.py", line 281, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "r.py", line 31, in handle_starttag
    print "tag is %s with attributes %s." %(self, tag, attrs)
TypeError: not all arguments converted during string formatting
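
The last line of the traceback points at the print statement in handle_starttag rather than at HTMLParser itself: the format string contains two %s placeholders, but the tuple on the right supplies three values (self, tag, attrs). The sketch below reproduces that mismatch in isolation and shows one possible fix, assuming the intent was to print only the tag and its attributes. The names template, parser, raised, and fixed and the sample values are hypothetical, chosen for illustration; they do not appear in r.py.

```python
# Hypothetical stand-ins for the arguments handle_starttag receives.
parser = object()                   # plays the role of `self`
tag = "a"
attrs = [("href", "index.html")]

template = "tag is %s with attributes %s."

# Two placeholders, three values: formatting raises TypeError,
# the same error class reported in the traceback above.
try:
    template % (parser, tag, attrs)
    raised = False
except TypeError:
    raised = True

# Assumed fix: supply exactly one value per placeholder by dropping `self`.
fixed = template % (tag, attrs)
```

In handle_starttag itself, the corresponding change would be to pass (tag, attrs) instead of (self, tag, attrs) to the format string.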