[Python-bugs-list] [ python-Bugs-467059 ] htmllib broken

noreply@sourceforge.net noreply@sourceforge.net
Wed, 10 Oct 2001 08:48:10 -0700


Bugs item #467059, was opened at 2001-10-01 20:25
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=467059&group_id=5470

Category: Python Library
Group: Irreproducible
Status: Closed
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: htmllib broken 

Initial Comment:
Responding to a question in python-help about extracting links from web pages, I wrote a simple href printer (see attached file). When run using 2.2a4, it never prints anything.  It outputs a list of hrefs when run with 2.1 or 1.6.  Either there's a bug somewhere (in my code possibly, though it's pretty simple) or some semantics changed that I missed.

I thought maybe the method resolution order change affected things, but htmllib.HTMLParser only uses single inheritance.  When displaying help about htmllib.HTMLParser, pydoc.help does emit the method resolution order, which it doesn't generally seem to do:

    class HTMLParser(sgmllib.SGMLParser)
     |  Method resolution order:
     |      HTMLParser
     |      sgmllib.SGMLParser
     |      markupbase.ParserBase
     ...
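
To illustrate why single inheritance rules out an MRO problem, here is a minimal sketch. The three classes are hypothetical stand-ins that only mirror the names in the pydoc output above; they are not the real htmllib, sgmllib, or markupbase classes. With one base per class there is only one possible linearization, so any resolution-order rule yields the same chain.

```python
# Hypothetical stand-ins that mirror the inheritance chain shown by pydoc;
# they are NOT the real library classes.
class ParserBase:
    pass


class SGMLParser(ParserBase):
    pass


class HTMLParser(SGMLParser):
    pass


def mro_names(cls):
    """Return the class names in cls's method resolution order."""
    return [c.__name__ for c in cls.__mro__]


if __name__ == "__main__":
    # Each class has exactly one base, so the MRO is simply the chain
    # from the class up to object.
    print(mro_names(HTMLParser))
```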


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2001-10-10 08:47

Message:
Logged In: YES 
user_id=44345

I tried it again just now with the same input that was failing when I first submitted this bug.  It worked this time (though the output was slightly different than when running against 2.1: the href parameter is a tuple instead of a string), so I went ahead and closed the bug instead of just leaving it pending.  Something apparently changed in the past 9 days.

(Sorry for the delay responding.  My procmail filters classed the message as spam...)

Skip


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-10-04 13:05

Message:
Logged In: YES 
user_id=3066

Please attach the input for which this fails.  A trivial
test case does not fail (see Lib/test/test_htmllib.py).

Set status to "pending".

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2001-10-01 20:30

Message:
Logged In: YES 
user_id=44345

SF apparently doesn't like file uploads from Opera, so it's pasted here...

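import htmllib, formatter

class MyParser(htmllib.HTMLParser):
    # Called by HTMLParser for each opening <a> tag.
    def anchor_bgn(self, href, name, type):
        print href

# NullFormatter discards all formatted output; only the hrefs are printed.
fmt = formatter.NullFormatter()
parser = MyParser(fmt, verbose=1)
parser.feed(open("tour01.html").read())
parser.close()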
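
For readers on Python 3, where htmllib and formatter are no longer available, a rough equivalent of the href printer can be sketched with html.parser. This is not the code from the report: the names HrefPrinter and extract_hrefs are made up for illustration, and the sketch collects the hrefs in a list as well as printing them.

```python
from html.parser import HTMLParser


class HrefPrinter(HTMLParser):
    """Print and collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and passes
        # attrs as a list of (name, value) pairs.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)
                print(value)


def extract_hrefs(html_text):
    """Feed html_text to a fresh parser and return the hrefs it found."""
    parser = HrefPrinter()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    # Illustrative input only; the report's tour01.html is not available.
    sample = '<p><a href="a.html">A</a> <A HREF="b.html">B</A></p>'
    extract_hrefs(sample)
```

Unlike anchor_bgn in the script above, handle_starttag receives all attributes of the tag, so the href has to be picked out by name.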


----------------------------------------------------------------------
