[Python-bugs-list] [ python-Bugs-453059 ] Nasty bug in HTMLParser.py

noreply@sourceforge.net noreply@sourceforge.net
Sun, 19 Aug 2001 13:41:27 -0700


Bugs item #453059, was opened at 2001-08-19 13:41
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=453059&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Chris Withers (fresh)
Assigned to: Nobody/Anonymous (nobody)
Summary: Nasty bug in HTMLParser.py

Initial Comment:
If you feed the following string to an HTMLParser
instance, you get _very_ weird results:

'one & two & three &three; &blagh ;'

What I would expect would be:

 - call to handle_data(data='one & two & three ')

 - call to handle_entityref(name='three')

 - call to handle_data(data=' &blagh ;')

What you actually get is:

 - call to handle_data(data='one ')

 - call to handle_data(data='one ')

...which is very wrong :-S
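
For anyone who wants to try this on a current interpreter, here is a
minimal reproduction sketch. It assumes Python 3's html.parser module
(the renamed successor to HTMLParser.py) rather than the original
module, and RecordingParser and record_events are hypothetical names
made up for illustration. It only prints the handler calls in order so
they can be compared with the expected list above; a modern parser may
well split the stray ampersands differently from either list.

```python
from html.parser import HTMLParser


class RecordingParser(HTMLParser):
    """Hypothetical helper: records each handler call as a (kind, value) tuple."""

    def __init__(self):
        # convert_charrefs=False keeps entity references as separate
        # handle_entityref() calls instead of folding them into the data.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))


def record_events(text):
    """Feed text to a fresh parser and return the recorded handler calls."""
    parser = RecordingParser()
    parser.feed(text)
    parser.close()  # flush anything the parser is still buffering
    return parser.events


if __name__ == "__main__":
    # The string from this report; print every handler call in order.
    for kind, value in record_events("one & two & three &three; &blagh ;"):
        print(kind, repr(value))
```

If the duplicated handle_data(data='one ') call shows up in the printed
events, the bug is present in the parser being tested.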

Now, I'm not sure of the validity of the associated
HTML*, but if it's invalid, I would have thought an
exception would be raised rather than the above result.

In any case, I have a module that demonstrates this
problem which is available from:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/squishdot/stripogram/

It has a test suite that runs with Zope's testrunner.py
and I just added a test to demonstrate this problem.

Any help would be very much appreciated...

Chris

* The string 'one & two & three &three; &blagh ;'
displays exactly as is in Mozilla, IE and Netscape. Of
course, that doesn't mean the W3C will like it ;-) I'd
prefer to go with the majority rather than being
'right' on this one.



