Re: [Python-Dev] cpython (2.7): #14538: HTMLParser can now parse correctly start tags that contain a bare /.

On 19.04.2012 03:36, ezio.melotti wrote:
I think that's misleading: there's no way to "correctly" parse malformed HTML. Georg

2012/4/24 Georg Brandl <g.brandl@gmx.net>:
There is, in the sense that you can follow the HTML5 algorithm, which can "parse" any junk you throw at it. -- Regards, Benjamin

On Tue, Apr 24, 2012 at 2:34 PM, Benjamin Peterson <benjamin@python.org> wrote:
> There is, in the sense that you can follow the HTML5 algorithm, which can "parse" any junk you throw at it.
This whole can of worms is why I gave up on HTML years ago (well, one reason among many). There are markup languages, and there's soup. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> "A person who won't read has no advantage over one who can't read." --Samuel Langhorne Clemens

On 24/04/2012 15:02, Georg Brandl wrote:
Yes, Ezio’s commits on html.parser/HTMLParser in recent months have been following the HTML5 spec. Ezio, RDM and I have discussed this on some bug reports, on IRC and in private mail, and we agreed to do the useful thing: follow HTML5 and not pretend that the stdlib parser is strict or validating. Ezio was thinking about a blog.python.org post to advertise this. Regards
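
To make the "tolerant, not validating" point concrete, here is a minimal sketch of feeding a start tag with a bare / to the stdlib parser. It assumes Python 3's html.parser module (the 2.7 commit in the subject touches the HTMLParser module instead); the TagCollector class, the collect() helper and the sample markup are hypothetical names invented for illustration, not anything from the commit itself.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records what the parser reports, without validating."""

    def __init__(self):
        super().__init__()
        self.start_tags = []  # (tag, attrs) pairs, in document order
        self.text = []        # text chunks seen between tags

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))

    def handle_data(self, data):
        self.text.append(data)


def collect(markup):
    """Feed markup to a fresh TagCollector and return it (illustrative only)."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser


# Assumed sample input: a start tag containing a bare slash.
result = collect('<a / href="x">link</a>')
print(result.start_tags)
print(result.text)
```

The parser reports a start tag instead of raising, which is the behaviour the thread describes as following HTML5 and not rejecting malformed input.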

On Tue, Apr 24, 2012 at 14:34, Éric Araujo <merwok@netwok.org> wrote:
Please do this, and I welcome anyone else who wants to write about their work on the blog to do so. Contact me for info.

participants (5): Benjamin Peterson, Brian Curtin, Fred Drake, Georg Brandl, Éric Araujo