[Python-Dev] Fixing the XML batteries

Stefan Behnel stefan_ml at behnel.de
Sat Dec 10 08:38:35 CET 2011


Bill Janssen, 09.12.2011 19:15:
> I think another thing that might go into "refreshing the batteries" is a
> feature comparison of BeautifulSoup and HTML5lib against the stdlib
> competition, to see what needs to be added/revised.  Having to switch to
> an outside package for parsing possibly invalid HTML is a pain.

Such a feature request should be worth a separate thread.

Note, however, that html5lib is likely far too big to add to the stdlib,
and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3,
which would be the target release series for better HTML support. So the
choice of library or API for HTML processing is only the second question
as long as Py3 lacks both a real-world HTML parser in the stdlib and a
robust character encoding detection mechanism. I don't think that can be
fixed all that easily.

Stefan


