[Python-bugs-list] [ python-Bugs-453706 ] sgmllib exception behaviour: policy?

noreply@sourceforge.net noreply@sourceforge.net
Tue, 28 Aug 2001 08:51:42 -0700


Bugs item #453706, was opened at 2001-08-21 05:49
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=453706&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 6
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: sgmllib exception behaviour: policy?

Initial Comment:
The standard Python HTMLParser raises an exception
from sgmllib while parsing this construction:

<!spacer type="block" height="25">

in the body of an HTML file. I'm not sure if this is a
regular HTML tag, but when Netscape renders the page,
it seems to ignore it.

The problem is that sgmllib now raises an uncaught
exception, and I don't know how to handle this
exception intelligently in my program. I would like to
have an HTMLParser that never crashes, even if it
sometimes produces inexact output.

What does the development team think about this? Is it
unimportant to the rest of the world, or have I found a
bug that should be fixed? If not, can you give me some
suggestions on how to work around it in my particular
case?

thanks

Michael

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-08-28 08:51

Message:
Logged In: YES 
user_id=3066

This is not legal HTML or SGML (or XHTML, or any other
acronym-of-the-week).  The <! syntax introduces a
"declaration", of which only two forms should appear in an
HTML document:  <!DOCTYPE, which may appear at the start of
the document, and <!--, which introduces a comment (yeah, I
know, that doesn't sound like a declaration to me either).
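
To make the distinction above concrete, here is a minimal Python
sketch that sorts a "<!" construct into the two forms expected in an
HTML document and everything else. The helper name
classify_declaration is hypothetical and is not part of sgmllib or
htmllib; matching DOCTYPE case-insensitively is an assumption of this
sketch.

```python
import re

# Hypothetical helper for illustration only; not sgmllib's actual logic.
# Assumption: the DOCTYPE keyword is matched case-insensitively.
_DOCTYPE = re.compile(r"<!DOCTYPE\b", re.IGNORECASE)


def classify_declaration(markup):
    """Return 'comment', 'doctype', or 'unknown' for text starting with '<!'."""
    if not markup.startswith("<!"):
        raise ValueError("not a '<!' construct: %r" % (markup,))
    if markup.startswith("<!--"):
        return "comment"
    if _DOCTYPE.match(markup):
        return "doctype"
    # Anything else, such as <!spacer ...>, is not one of the two
    # forms that should appear in an HTML document.
    return "unknown"
```

With this sketch, the construction from the report,
<!spacer type="block" height="25">, is classified as 'unknown'.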

(At one point there was a proposed <spacer> element that
would have looked like that, but without the "!".  I suspect
someone had played with that and tried to comment it out,
which they botched.)

The "right thing" is to raise an exception due to illegal
syntax.  The problem is that you weren't seeing the
exception before (what was the latest released version you
were using where you did not get the exception?).

I'll look into this a bit and think about it.  The
sgmllib/htmllib parsers have traditionally been way too
lenient, so we may need to restore the old behavior.
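
As one possible workaround while the policy is undecided, a caller
could remove unrecognized "<!" constructs from the text before feeding
it to the parser. The sketch below is an assumption about how that
might be done, not a fix to sgmllib itself; the function name
strip_unknown_declarations is hypothetical. It is deliberately
simple: it does not handle a DOCTYPE whose internal subset contains
">", and it will also strip matching text that happens to sit inside
a comment or script.

```python
import re

# "<!" followed by a letter, up to the next ">". Comments ("<!--") are
# not matched because "-" is not a letter.
_DECLARATION = re.compile(r"<!([A-Za-z][^>]*)>")
_DOCTYPE_BODY = re.compile(r"DOCTYPE\b", re.IGNORECASE)


def strip_unknown_declarations(html):
    """Drop '<!name ...>' constructs other than DOCTYPE (hypothetical helper)."""

    def _replace(match):
        if _DOCTYPE_BODY.match(match.group(1)):
            return match.group(0)  # keep a DOCTYPE declaration untouched
        return ""  # drop anything else, e.g. <!spacer ...>

    return _DECLARATION.sub(_replace, html)


cleaned = strip_unknown_declarations(
    '<p>a</p><!spacer type="block" height="25"><p>b</p>'
)
# cleaned is now '<p>a</p><p>b</p>' and can be passed to the parser's feed().
```

Filtering the input this way leaves the parser strict, so the choice
to ignore bad markup stays with the calling program.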

----------------------------------------------------------------------
