[ python-Bugs-1122916 ] incorrect handle of declaration in markupbase

SourceForge.net noreply at sourceforge.net
Tue Feb 15 18:09:05 CET 2005


Bugs item #1122916, was opened at 2005-02-14 23:04
Message generated for change (Comment added) made by tungwaiyip
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1122916&group_id=5470

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Wai Yip Tung (tungwaiyip)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect handle of declaration in markupbase 

Initial Comment:
When parsing the document below using sgmllib:

<html>
<!-BAD COMMENT->hello
</html>

The malformed declaration and the following text "hello" are 
returned together as one single chunk of character data:

  "<!-BAD COMMENT->hello"

markupbase should have treated it as an error (to be 
consistent with its strict treatment of names in _scan_name).

I believe line 73 of markupbase.py should be

        if rawdata[j:j+2] in ("-", ""):

instead of

        if rawdata[j:j+1] in ("-", ""):
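To make the difference between the two guards concrete, here is a minimal Python 3 sketch of just this check, not the real markupbase code. The function names needs_more_data_original and needs_more_data_proposed are hypothetical; a True result stands for parse_declaration returning -1, i.e. treating the input as incomplete and waiting for more data.

```python
def needs_more_data_original(rawdata, i):
    """Hypothetical model of the current guard: inspects ONE character after '<!'."""
    j = i + 2
    assert rawdata[i:j] == "<!", "expected '<!' at position i"
    return rawdata[j:j+1] in ("-", "")


def needs_more_data_proposed(rawdata, i):
    """Hypothetical model of the proposed guard: inspects TWO characters after '<!'."""
    j = i + 2
    assert rawdata[i:j] == "<!", "expected '<!' at position i"
    return rawdata[j:j+2] in ("-", "")


cases = [
    "<!-BAD COMMENT->hello",  # original: True,  proposed: False
    "<!-",                    # original: True,  proposed: True (buffer ends after '-')
    "<!",                     # original: True,  proposed: True (buffer ends after '<!')
    "<!DOCTYPE html>",        # original: False, proposed: False
]
for doc in cases:
    print(repr(doc), needs_more_data_original(doc, 0), needs_more_data_proposed(doc, 0))
```

With the one-character slice, "<!-BAD..." looks exactly like a comment cut off at a buffer boundary, so it is never rejected; the two-character slice only waits when the buffer really ends.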

Also note that the condition on line 79 can never be true, 
since a slice of at most one character can never equal the 
two-character string '--':

    if rawdata[j:j+1] == '--':
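A short Python 3 sketch of why that condition is dead code. Both helper names are hypothetical, and corrected_condition only illustrates the evident intent (a two-character slice); it is not quoted from any released markupbase.

```python
def line_79_condition(rawdata, j):
    # The condition as quoted above: the slice holds at most one character.
    return rawdata[j:j+1] == '--'


def corrected_condition(rawdata, j):
    # Hypothetical fix: a two-character slice can actually match '--'.
    return rawdata[j:j+2] == '--'


doc = "<!--a comment-->"
print(line_79_condition(doc, 2))    # False
print(corrected_condition(doc, 2))  # True

# Whatever the position, a slice of length <= 1 never equals a 2-character string.
assert not any(line_79_condition(doc, j) for j in range(len(doc) + 1))
```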


----------------------------------------------------------------------

>Comment By: Wai Yip Tung (tungwaiyip)
Date: 2005-02-15 09:09

Message:
Logged In: YES 
user_id=561546

To clarify the symptom: everything from the <!- onward is 
returned as a single chunk of character data:

"<!-BAD COMMENT->hello\r\n</html>"

This means that as soon as a <!- appears, all subsequent tags, 
such as </html>, are returned as character data rather than 
parsed as tags. That is why I think this is a significant bug.
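The symptom follows from how the caller treats that -1. Below is a heavily simplified, hypothetical Python 3 model (the function tokenize and its event tuples are invented for illustration, not sgmllib's actual code), assuming that sgmllib's scanning loop stops when parse_declaration returns -1 and that close() flushes whatever is left in the buffer as one piece of data.

```python
def tokenize(rawdata):
    """Hypothetical, simplified model of sgmllib's scanning loop at close().

    Models only start tags, end tags, '<!' declarations (using the original
    one-character guard) and plain text.
    """
    events = []
    i, n = 0, len(rawdata)
    while i < n:
        lt = rawdata.find("<", i)
        j = n if lt < 0 else lt
        if i < j:
            events.append(("data", rawdata[i:j]))
        i = j
        if i == n:
            break
        if rawdata.startswith("</", i):
            k = rawdata.find(">", i)
            if k < 0:
                break
            events.append(("endtag", rawdata[i+2:k]))
            i = k + 1
            continue
        if rawdata.startswith("<!", i):
            if rawdata[i+2:i+3] in ("-", ""):
                break  # guard says "incomplete": stop scanning here
            k = rawdata.find(">", i)
            if k < 0:
                break
            events.append(("decl", rawdata[i+2:k]))
            i = k + 1
            continue
        k = rawdata.find(">", i)  # anything else starting with '<' is a start tag
        if k < 0:
            break
        events.append(("starttag", rawdata[i+1:k]))
        i = k + 1
    # On close, the unscanned remainder is emitted as a single data chunk.
    if i < n:
        events.append(("data", rawdata[i:n]))
    return events


for event in tokenize("<html>\r\n<!-BAD COMMENT->hello\r\n</html>"):
    print(event)
```

In this model the last event is ("data", "<!-BAD COMMENT->hello\r\n</html>"), matching the string reported above: the </html> end tag is never seen as a tag.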



----------------------------------------------------------------------


