"<![CDATA[]]" vs. BeautifulSoup

Ian Kelly ian.g.kelly at gmail.com
Fri May 4 10:41:04 EDT 2012


On Fri, May 4, 2012 at 12:57 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Ian Kelly, 04.05.2012 01:02:
>> BeautifulSoup is supposed to parse like a browser would
>
> Not at all, that would be html5lib.

Well, I guess that depends on whether we're talking about
BeautifulSoup 3 (a regex-based screen scraper with methods for
navigating and searching the resulting tree) or BeautifulSoup 4
(purely a parse-tree navigation library that relies on external
libraries to do the actual parsing).

According to the BS3 documentation, "The BeautifulSoup class is full
of web-browser-like heuristics for divining the intent of HTML
authors."

If we're talking about BS4, though, then the problem in this instance
would have nothing to do with BS4 itself; it would instead be an issue
with whatever underlying parser the OP is using.
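To make the point concrete, here is a minimal sketch that uses only Python's standard library, not BS4 itself. The same input goes to a lenient HTML parser (`html.parser`) and to a strict XML parser (`xml.etree.ElementTree`, backed by expat), and the two give different results. BS4 likewise lets the caller pick the parser, e.g. `BeautifulSoup(markup, "html.parser")` versus `BeautifulSoup(markup, "lxml")`, so a CDATA quirk may come from that choice rather than from BS4. The helper names `EventRecorder`, `parse_with_html_parser` and `parse_with_xml_parser` are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: record start tags and text seen by html.parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_with_html_parser(markup):
    """Lenient: html.parser accepts markup that is not well-formed XML."""
    recorder = EventRecorder()
    recorder.feed(markup)
    recorder.close()  # flush any buffered text
    return recorder.events


def parse_with_xml_parser(markup):
    """Strict: ElementTree (expat) raises ParseError on malformed input."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None  # the XML parser refused the input


if __name__ == "__main__":
    unclosed = "<p>a<p>b"
    print(parse_with_html_parser(unclosed))  # both paragraphs reported
    print(parse_with_xml_parser(unclosed))   # None: not well-formed XML
    cdata = "<root><![CDATA[a < b]]></root>"
    print(parse_with_xml_parser(cdata).text)  # CDATA content as plain text
```

On `<p>a<p>b`, `html.parser` reports both paragraphs while ElementTree rejects the input outright. On `<root><![CDATA[a < b]]></root>`, ElementTree returns the CDATA content `a < b` as ordinary element text.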


