[Python-bugs-list] [ python-Bugs-467059 ] htmllib broken
noreply@sourceforge.net
Mon, 01 Oct 2001 20:30:17 -0700
Bugs item #467059, was opened at 2001-10-01 20:25
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=467059&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: htmllib broken
Initial Comment:
Responding to a question in python-help about extracting links from web pages, I wrote a simple href printer (see attached file). When run using 2.2a4, it never prints anything. It outputs a list of hrefs when run with 2.1 or 1.6. Either there's a bug somewhere (in my code possibly, though it's pretty simple) or some semantics changed that I missed.
I thought maybe the method resolution order change affected things, but htmllib.HTMLParser only uses single inheritance. When displaying help about htmllib.HTMLParser, pydoc.help does emit the method resolution order, which it doesn't generally seem to do:
class HTMLParser(sgmllib.SGMLParser)
 |  Method resolution order:
 |      HTMLParser
 |      sgmllib.SGMLParser
 |      markupbase.ParserBase
...
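The order pydoc reports can also be checked directly from the interpreter. Here is a minimal sketch, assuming a single-inheritance chain of new-style classes; the three classes below are empty stand-ins named after the ones in the pydoc output, not the real htmllib, sgmllib, or markupbase classes.

```python
# Stand-in classes mirroring the single-inheritance chain pydoc shows.
# These are illustrative placeholders, not the real library classes.
class ParserBase(object):
    pass


class SGMLParser(ParserBase):
    pass


class HTMLParser(SGMLParser):
    pass


# __mro__ lists the classes in the order attribute lookup will search them.
mro_names = [cls.__name__ for cls in HTMLParser.__mro__]
print(mro_names)
```

With single inheritance the order is simply the chain from the class up to object, so a change in the resolution algorithm by itself should not alter which method gets picked here.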
----------------------------------------------------------------------
>Comment By: Skip Montanaro (montanaro)
Date: 2001-10-01 20:30
Message:
Logged In: YES
user_id=44345
SF apparently doesn't like file uploads from Opera, so it's pasted here...
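import htmllib, formatter

class MyParser(htmllib.HTMLParser):
    # Called by HTMLParser for each <a> start tag.
    def anchor_bgn(self, href, name, type):
        print href

# NullFormatter discards all formatted output; only the hrefs get printed.
fmt = formatter.NullFormatter()
parser = MyParser(fmt, verbose=1)
parser.feed(open("tour01.html").read())
parser.close()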
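For anyone trying this on Python 3, where htmllib is no longer available, a rough equivalent of the script above might look like the sketch below. It assumes the standard html.parser module; the HrefCollector class name and the inline sample markup are made up for illustration and stand in for tour01.html.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical sample input used in place of a real file.
sample = '<p><a href="tour02.html">next</a> <a name="top">anchor</a></p>'

collector = HrefCollector()
collector.feed(sample)
collector.close()

for href in collector.hrefs:
    print(href)
```

Anchors that carry only a name attribute are skipped, so the output contains just the link targets.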
----------------------------------------------------------------------