On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote:

A little data: the HTML5lib project lives at
https://code.google.com/p/html5lib/
It has 4 owners and 22 other committers.

The most recent release, html5lib 0.90 for Python, is nearly 2 years old. Since there is a separate Python3 repository, and there is no mention of Python3 compatibility elsewhere that I saw, including the PyPI listing, I assume that release is for Python2 only.

I believe that you are correct.

A comment on a recent (July 11) Python3 issue
https://code.google.com/p/html5lib/issues/detail?id=187&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port
suggests that the Python3 version still has problems. "Merged in now, though still lots of errors and failures in the testsuite."

First, you could believe that porting a codebase from Python 2 to Python 3 is much easier than solving a difficult domain-specific problem. In that case, html5lib has done the hard part, and someone interested in html-in-the-stdlib should do the rest.

Second, you could believe that porting a codebase from Python 2 to Python 3 is harder than solving a difficult domain-specific problem, in which case something is seriously wrong with Python 3 or its attendant migration tools and that needs to be fixed, so someone should fix that rather than worrying about parsing HTML right now. (I doubt that many subscribers to this list would share this opinion, though.)

Third, you could believe that parsing HTML is not a difficult domain-specific problem. But only a crazy person would believe that, so you're left with one of the previous options :).