[Python-bugs-list] [ python-Bugs-467059 ] htmllib broken
noreply@sourceforge.net
Wed, 10 Oct 2001 08:47:42 -0700
Bugs item #467059, was opened at 2001-10-01 20:25
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=467059&group_id=5470
Category: Python Library
Group: Irreproducible
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: htmllib broken
Initial Comment:
Responding to a question in python-help about extracting links from web pages, I wrote a simple href printer (see attached file). When run using 2.2a4, it never prints anything. It outputs a list of hrefs when run with 2.1 or 1.6. Either there's a bug somewhere (in my code possibly, though it's pretty simple) or some semantics changed that I missed.
I thought maybe the method resolution order change affected things, but htmllib.HTMLParser only uses single inheritance. When displaying help about htmllib.HTMLParser, pydoc.help does emit the method resolution order, which it doesn't generally seem to do:
class HTMLParser(sgmllib.SGMLParser)
| Method resolution order:
| HTMLParser
| sgmllib.SGMLParser
| markupbase.ParserBase
...
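The lookup-order rules only have a real choice to make when a class has more than one base. A minimal sketch, using hypothetical stand-in classes that mirror the chain pydoc printed (not the real htmllib classes, which are gone in Python 3), shows that a single-inheritance chain always linearizes to the class followed by its ancestors, while a diamond is where old and new ordering rules can differ:

```python
# Hypothetical stand-ins that mirror the chain pydoc printed
# (htmllib.HTMLParser -> sgmllib.SGMLParser -> markupbase.ParserBase);
# they are not the real library classes.
class ParserBase:
    pass


class SGMLParser(ParserBase):
    pass


class HTMLParser(SGMLParser):
    pass


# Single inheritance: the MRO is just the class followed by each ancestor.
chain = [cls.__name__ for cls in HTMLParser.__mro__]
print(chain)  # ['HTMLParser', 'SGMLParser', 'ParserBase', 'object']


# Multiple inheritance (a diamond) is where lookup-order rules matter.
class A:
    pass


class B(A):
    pass


class C(A):
    pass


class D(B, C):
    pass


# Modern Python (C3 ordering) visits C before the shared base A; the old
# depth-first rule for classic classes would have reached A before C.
diamond = [cls.__name__ for cls in D.__mro__]
print(diamond)  # ['D', 'B', 'C', 'A', 'object']
```

Since the first chain has only one possible ordering, an MRO change alone would not be expected to alter which methods htmllib.HTMLParser picks up.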
----------------------------------------------------------------------
>Comment By: Skip Montanaro (montanaro)
Date: 2001-10-10 08:47
Message:
Logged In: YES
user_id=44345
I tried it again just now with the same input that was failing when I first submitted this bug. It worked this time (though the output was slightly different than when running against 2.1: the href parameter is a tuple instead of a string), so I went ahead and closed the bug instead of just leaving it pending. Something apparently changed in the past 9 days.
(Sorry for the delay responding. My procmail filters classed the message as spam...)
Skip
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-10-04 13:05
Message:
Logged In: YES
user_id=3066
Please attach the input for which this fails. A trivial
test case does not fail (see Lib/test/test_htmllib.py).
Set status to "pending".
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2001-10-01 20:30
Message:
Logged In: YES
user_id=44345
SF apparently doesn't like file uploads from Opera, so it's pasted here...
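import htmllib, formatter

class MyParser(htmllib.HTMLParser):
    # htmllib calls anchor_bgn() for each <a> start tag it sees.
    def anchor_bgn(self, href, name, type):
        print href

# NullFormatter discards the rendered text; only the anchor hook matters here.
fmt = formatter.NullFormatter()
parser = MyParser(fmt, verbose=1)
parser.feed(open("tour01.html").read())
parser.close()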
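The script above is Python 2 only: htmllib was removed in Python 3.0 and formatter in 3.10. As a rough sketch of the same href printer on current Python, the standard html.parser.HTMLParser can be subclassed with its handle_starttag hook overridden. The class name LinkParser and the inline sample markup (used instead of a file such as tour01.html, so the reproduction is self-contained) are illustrative choices, not part of the original report:

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href of every <a> start tag (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs, and value is None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


parser = LinkParser()
# Inline sample input: two links (one in upper case) and a named anchor
# without an href, which should be skipped.
parser.feed('<p><a href="one.html">1</a> <A HREF="two.html">2</A> '
            '<a name="x">anchor</a></p>')
parser.close()
for href in parser.hrefs:
    print(href)
```

Feeding a literal string like this also makes it easy to attach a failing input directly to a bug report rather than depending on a separate file.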
----------------------------------------------------------------------