BeautifulSoup vs. Microsoft
nagle at animats.com
Thu Mar 29 18:54:55 CEST 2007
Duncan Booth wrote:
> John Nagle <nagle at animats.com> wrote:
>>Strictly speaking, it's Microsoft's fault.
>>is supposed to be an HTML comment. But it's improperly terminated.
>>It should end with "-->". So all that following stuff is from what
>>follows the next "-->" which terminates a comment.
> It is an attribute value, and unescaped angle brackets are valid in
> attributes. It looks to me like a bug in BeautifulSoup.
I think you're right. The HTML 4 spec
says "Note that comments are markup". An attribute value is data, not
markup, so recognizing comment syntax inside an attribute is, in fact,
an error in BeautifulSoup.
The source HTML on the Microsoft page is thus syntactically correct,
although meaningless. That's the only place on that page where
comment-like syntax appears inside an attribute.
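
For illustration, here is a minimal sketch of the behavior the spec calls
for: a "<!--" inside a quoted attribute value stays attribute data, and
only a comment outside any tag is reported as a comment. It uses the
standard library's html.parser, not BeautifulSoup, and the markup is a
made-up stand-in, not the actual Microsoft page. The names
CommentCollector, scan and SAMPLE are hypothetical.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Record attribute values and comments as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.attr_values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag.
        self.attr_values.extend(value for _name, value in attrs)

    def handle_comment(self, data):
        # Called only for real comments, i.e. "<!-- ... -->" outside a tag.
        self.comments.append(data)


def scan(markup):
    """Parse markup and return (attribute values, comments)."""
    parser = CommentCollector()
    parser.feed(markup)
    parser.close()
    return parser.attr_values, parser.comments


# Hypothetical markup: comment-like text inside an attribute value,
# followed by a genuine comment.
SAMPLE = '<a href="x" title="<!-- not a comment">link</a><!-- real comment -->'

attr_values, comments = scan(SAMPLE)
print(attr_values)
print(comments)
```

If the parser follows the spec on this point, the comment-like text shows
up among the attribute values and only the trailing comment lands in the
comments list.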