[Python-bugs-list] [ python-Bugs-705983 ] simple HTMLParser doesn't ignore < within pre-formatted text

SourceForge.net noreply@sourceforge.net
Tue, 18 Mar 2003 22:09:37 -0800


Bugs item #705983, was opened at 2003-03-19 00:32
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=705983&group_id=5470

Category: Python Library
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: David C. Fox (dcfox)
Assigned to: Nobody/Anonymous (nobody)
Summary: simple HTMLParser doesn't ignore < within pre-formatted text

Initial Comment:
The simple HTMLParser in the HTMLParser module fails to
ignore angle brackets or less-than signs within
preformatted text delimited by <PRE> ... </PRE> or
example text delimited by <XMP> ... </XMP>.

For example, if I use HTMLParser.HTMLParser to parse
the contents of

http://www.ataword.com/programming/dragons.html,

I get the following (incorrect) error message:

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in ?
    p.close()
  File "E:\PYTHON22\lib\HTMLParser.py", line 112, in close
    self.goahead(1)
  File "E:\PYTHON22\lib\HTMLParser.py", line 166, in goahead
    self.error("EOF in middle of construct")
  File "E:\PYTHON22\lib\HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: EOF in middle of construct, at line 50, column 31


The more advanced parser in htmllib deals with these
cases properly.

Even if this isn't worth fixing, it would be nice if
this limitation were noted in the library documentation.



----------------------------------------------------------------------

>Comment By: David C. Fox (dcfox)
Date: 2003-03-19 06:09

Message:
Logged In: YES 
user_id=23703

Sorry, this isn't actually a bug.  I was misinterpreting the
meaning of the <PRE> tag.  None of the browsers I've seen
mind unescaped < signs within PRE, but the NCSA Beginner's
Guide to HTML says you still have to use &lt; there.  I
should have double-checked the HTML standards first.
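
The point about escaping can be illustrated with a small sketch. It
assumes a modern Python 3 interpreter, where the parser lives in
html.parser rather than in the Python 2.2 HTMLParser module discussed
above. The class name PreTextCollector and the sample string are
hypothetical, chosen only for illustration. The text is escaped with
html.escape before being placed inside <pre>, and the parser then hands
the original characters back through handle_data.

```python
from html import escape
from html.parser import HTMLParser


class PreTextCollector(HTMLParser):
    """Hypothetical helper: gathers the text found inside <pre> elements."""

    def __init__(self):
        # convert_charrefs defaults to True on Python 3.5+, so character
        # references such as &lt; arrive in handle_data already decoded.
        super().__init__()
        self.in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names are lowercased by the parser before this is called.
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        # Data may be delivered in several pieces, so collect and join later.
        if self.in_pre:
            self.chunks.append(data)


# Sample text containing the characters that must be escaped inside <pre>.
source_line = "if a < b and b > c:"
document = "<html><body><pre>" + escape(source_line) + "</pre></body></html>"

parser = PreTextCollector()
parser.feed(document)
parser.close()
recovered = "".join(parser.chunks)
```

Here document carries &lt; and &gt; in place of the raw signs, and
recovered compares equal to source_line. This sketch says nothing
about how any parser treats an unescaped < inside <pre>; it only shows
the escaped form that the comment above recommends.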

----------------------------------------------------------------------
