[Tutor] (no subject)

Sat Jan 31 07:35:57 EST 2004

From: Dana Baguley <dana at momerath.us>
To: tutor at python.org
Subject: HTMLParser
Date: Sat, 31 Jan 2004 04:37:18 -0800

Hello,

I'm new to Python and pretty new to programming in general. As a first
project, I'm trying to write a program that pulls data from tables on web
pages. I'm having trouble with the HTMLParser module; below are my code and
the error message. I suspect I should be using the debugger for this, but I
don't have a clue where to start, so I'm emailing the list instead. Thanks in
advance. Also, this is my first email to a list like this one, so I'm
wondering whether I'm observing the proper social conventions. This message
feels kind of long to me. What do you think?

Dana

#r.py
import HTMLParser
import urllib
import string

def getPage(address):
    page = urllib.urlopen(address)
    page = page.readlines()
    return page

def flattenPage(page):
    returned = ''
    for line in page:
        returned += line
        returned += ' '
    for character in string.whitespace:  # probably not necessary to zap newlines.
        returned = string.replace(returned, character, ' ')
    return returned

def getFlatPage(URL):
    page = flattenPage(getPage(URL))
    return page

class InteractiveURLParser(HTMLParser.HTMLParser):
    def open(self, URL):
        page = getPage(URL)
        page = flattenPage(page)
        self.feed(page)
    def newURL(self, URL):
        # Clear the parser's buffered state, then parse the new page.
        self.reset()
        self.open(URL)
    def handle_starttag(self, tag, attrs):
        print "tag is %s with attributes %s." %(self, tag, attrs)
    def handle_data(self, data):
        print data
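HTMLParser's feed() accepts text that still contains newlines, so, as the comment in flattenPage suspects, zapping whitespace is optional. If the flattening is wanted anyway, here is a minimal sketch for Python 3, where the string.replace function no longer exists. It assumes str.translate as the replacement technique, and flatten_page is a hypothetical name. Like flattenPage, it maps each whitespace character to a single space and does not collapse runs of whitespace.

```python
import string

# Translation table: every whitespace character (space, tab, newline, ...)
# becomes a plain space.
_WHITESPACE_TO_SPACE = str.maketrans(string.whitespace, " " * len(string.whitespace))


def flatten_page(lines):
    """Hypothetical Python 3 counterpart of flattenPage.

    Appends a space after each line, joins the lines, then replaces every
    whitespace character with a space (runs are kept, not collapsed).
    """
    joined = "".join(line + " " for line in lines)
    return joined.translate(_WHITESPACE_TO_SPACE)


print(repr(flatten_page(["a\tb\n", "c\n"])))  # 'a b  c  '
```

Using one translate() call avoids rebuilding the whole string once per whitespace character, which the loop over string.whitespace does.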
--
>>> carrie = "http://www.plu.edu/~swarthcj"
>>> r.test(carrie)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "r.py", line 37, in test
    testInstance.open(URL)
  File "r.py", line 27, in open
    self.feed(page)
  File "/usr/lib/python2.3/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.3/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.3/HTMLParser.py", line 281, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "r.py", line 31, in handle_starttag
    print "tag is %s with attributes %s." %(self, tag, attrs)
TypeError: not all arguments converted during string formatting
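The format string in handle_starttag has two %s placeholders, but the tuple after % holds three values (self, tag, attrs). That mismatch is what "not all arguments converted during string formatting" reports. In the Python 2.3 code, the fix is to drop self from the tuple, as in print "tag is %s with attributes %s." % (tag, attrs). Below is a minimal sketch of the same handler for Python 3, assuming the html.parser module (the Python 3 name of HTMLParser). TagPrinter and describe_starttag are hypothetical names chosen for illustration.

```python
# Python 3 sketch: the Python 2 module HTMLParser is html.parser in Python 3.
from html.parser import HTMLParser


def describe_starttag(tag, attrs):
    """Build the message for one start tag (hypothetical helper).

    The format string has two %s placeholders, so the tuple supplies
    exactly two values: the tag name and its list of attributes.
    """
    return "tag is %s with attributes %s." % (tag, attrs)


class TagPrinter(HTMLParser):
    """Print each start tag and each run of text the parser reports."""

    def handle_starttag(self, tag, attrs):
        print(describe_starttag(tag, attrs))

    def handle_data(self, data):
        print(data)


if __name__ == "__main__":
    parser = TagPrinter()
    parser.feed('<table><tr><td class="x">42</td></tr></table>')
    parser.close()
```

Feeding it that small table prints one "tag is ..." line each for table, tr and td, followed by 42. The attributes arrive as a list of (name, value) pairs, for example [('class', 'x')].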