BeautifulSoup vs. Microsoft
John Nagle
nagle at animats.com
Thu Mar 29 12:54:55 EDT 2007
Duncan Booth wrote:
> John Nagle <nagle at animats.com> wrote:
>
>
>>Strictly speaking, it's Microsoft's fault.
>>
>> title="<!--http://www.microsoft.com/usability/information.mspx->"
>>
>>is supposed to be an HTML comment. But it's improperly terminated.
>>It should end with "-->". So all that following stuff is from what
>>follows the next "-->" which terminates a comment.
>
>
> It is an attribute value, and unescaped angle brackets are valid in
> attributes. It looks to me like a bug in BeautifulSoup.
I think you're right. The HTML 4 spec,
http://www.w3.org/TR/html4/intro/sgmltut.html
says "Note that comments are markup". A quoted attribute value is
character data, not markup, so a comment cannot begin inside one.
Recognizing comment syntax inside an attribute is, in fact, an error
in BeautifulSoup.
The source HTML on the Microsoft page is thus syntactically correct,
although the attribute value itself is meaningless. That's the only
place on that page with a comment-type form in an attribute.
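
For anyone who wants to check the expected behavior, here is a rough sketch using the html.parser module from the Python standard library. It is a different parser from the one in BeautifulSoup and is only meant to show how the attribute should be read. The surrounding <a> tag, the href, and the AttrProbe class name are made up for the test; only the title value comes from the Microsoft page.

```python
from html.parser import HTMLParser


class AttrProbe(HTMLParser):
    """Record start-tag attributes and any comments the parser reports."""

    def __init__(self):
        super().__init__()
        self.attrs = {}
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        self.attrs.update(attrs)

    def handle_comment(self, data):
        # Called only when the parser sees a real comment in the markup
        self.comments.append(data)


# Hypothetical wrapper markup; the title value is the one quoted above.
SNIPPET = (
    '<a href="#" '
    'title="<!--http://www.microsoft.com/usability/information.mspx->">'
    'link</a>'
)

probe = AttrProbe()
probe.feed(SNIPPET)
probe.close()

print(probe.attrs["title"])   # the attribute value as the parser read it
print(probe.comments)         # comments the parser reported, if any
```

If the parser treats the quoted value as plain attribute text, the title comes back unchanged and no comment is reported, which is the behavior the spec calls for.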
John Nagle