[Python-Dev] cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /.

Brian Curtin brian at python.org
Tue Apr 24 21:41:54 CEST 2012


On Tue, Apr 24, 2012 at 14:34, Éric Araujo <merwok at netwok.org> wrote:
> On 24/04/2012 15:02, Georg Brandl wrote:
>>
>> On 24.04.2012 20:34, Benjamin Peterson wrote:
>>>
>>> 2012/4/24 Georg Brandl <g.brandl at gmx.net>:
>>>>
>>>> I think that's misleading: there's no way to "correctly" parse malformed
>>>> HTML.
>>>
>>> There is in the sense that you can follow the HTML5 algorithm, which
>>> can "parse" any junk you throw at it.
>>
>> Ah, good. Then I hope we are following the algorithm here (and are slowly
>> coming to use it for htmllib in general).
>
>
> Yes, Ezio’s commits on html.parser/HTMLParser in recent months have been
> following the HTML5 spec.  Ezio, RDM and I have discussed this on some bug
> reports, on IRC and in private mail, and agreed to do the useful thing:
> follow HTML5 and not pretend that the stdlib parser is strict or
> validating.
>
> Ezio was thinking about a blog.python.org post to advertise this.

Please do this, and I welcome anyone else who wants to write about
their work on the blog to do so. Contact me for info.
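
For illustration, here is a minimal sketch of the tolerant parsing discussed
above. It assumes Python 3's html.parser (on 2.7 the module is named
HTMLParser). The TagCollector class, the collect_tags helper and the sample
markup are hypothetical names made up for this example; they are not taken
from the commit itself.

```python
from html.parser import HTMLParser  # on Python 2.7: from HTMLParser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper that records every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.seen.append((tag, attrs))


def collect_tags(markup):
    """Feed markup to a fresh parser and return the recorded start tags."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


# A start tag containing a bare "/": the parser reports the tag
# instead of rejecting the input.
print(collect_tags('<a / href="x">link</a>'))
```

The parser makes no attempt to validate the markup; it only reports what it
can recover, which is the behaviour the thread agrees on.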

