"<![CDATA[]]" vs. BeautifulSoup
Ian Kelly
ian.g.kelly at gmail.com
Thu May 3 19:02:02 EDT 2012
On Thu, May 3, 2012 at 1:59 PM, John Nagle <nagle at animats.com> wrote:
> An HTML page for a major site (http://www.chase.com) has
> some incorrect HTML. It contains
>
> <![CDATA[]]
>
> which is not valid HTML, XML, or SGML. However, most browsers
> ignore it. BeautifulSoup treats it as the start of a CDATA section,
> and consumes the rest of the document in CDATA format.
>
> Bug?
Seems like a bug to me. BeautifulSoup is supposed to parse like a
browser would, so if most browsers just ignore an unterminated CDATA
section, then BeautifulSoup probably should too.
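
In the meantime, one possible workaround is to clean the markup before
handing it to the parser. The following is only a rough sketch, not
anything BeautifulSoup provides: the helper name strip_unterminated_cdata
is made up for illustration, and it assumes that simply dropping a CDATA
opener with no later "]]>" is close enough to what browsers do. The stray
"]]" is left in place as ordinary text.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def strip_unterminated_cdata(html):
    """Drop CDATA openers that are never closed by a later ']]>'.

    Hypothetical helper for illustration; properly terminated CDATA
    sections are passed through unchanged.
    """
    out = []
    pos = 0
    while True:
        start = html.find(CDATA_OPEN, pos)
        if start == -1:
            # No more openers: keep the remainder as-is.
            out.append(html[pos:])
            break
        end = html.find(CDATA_CLOSE, start + len(CDATA_OPEN))
        if end == -1:
            # Unterminated: keep the text before the opener, skip the
            # opener itself, and continue scanning after it.
            out.append(html[pos:start])
            pos = start + len(CDATA_OPEN)
        else:
            # Terminated: keep the whole section, including delimiters.
            out.append(html[pos:end + len(CDATA_CLOSE)])
            pos = end + len(CDATA_CLOSE)
    return "".join(out)
```

The cleaned string can then be passed to the parser as usual. Note that
this pairs an opener with the first "]]>" that follows it, so a broken
opener followed later by a valid CDATA section would still be treated as
one long section.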