[ANN] html5lib 0.2

James Graham jg307 at cam.ac.uk
Tue Jan 9 11:13:16 CET 2007


html5lib is an HTML parsing library based on the WHATWG Web Applications
1.0 "HTML5" specification[1]. The parser is designed to work with all
existing flavors of HTML and implements well-defined error recovery that
has been specified through analysis of the behavior of modern desktop web
browsers.

html5lib currently parses to either a custom "simpletree" format or, if
available, an ElementTree. Future releases will include support for at
least one DOM implementation. Custom treebuilders can also be
implemented, although the API should not yet be considered stable.
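
To give a rough idea of what a treebuilder that targets ElementTree does,
here is a minimal sketch. It does not use html5lib's API, which is not
shown in this announcement and is not yet stable. Instead it assumes the
Python standard library's html.parser as the event source. The names
EtreeBuilder, parse_to_etree and VOID are hypothetical, and the recovery
rule (an end tag closes everything up to the nearest matching open
element) is a deliberate simplification, not the algorithm in the
specification.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content, so they are never pushed on the stack.
# (Illustrative subset only.)
VOID = {"br", "img", "hr", "meta", "link", "input"}


class EtreeBuilder(HTMLParser):
    """Hypothetical tree builder: turns parser events into an ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Attach the new element to the innermost open element.
        elem = ET.SubElement(self.stack[-1], tag,
                             {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Simplified recovery: close everything up to the nearest matching
        # open element; a stray end tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text following a child element is stored in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_to_etree(markup):
    """Parse an HTML fragment and return the root Element."""
    builder = EtreeBuilder()
    builder.feed(markup)
    builder.close()  # flush any buffered trailing text
    return builder.root


root = parse_to_etree("<p>Hello <b>world</p><p>again")
print(ET.tostring(root, encoding="unicode"))
```

Even though neither the <b> nor the second <p> is closed in the input,
the result is a well-formed tree with two sibling <p> elements. A real
HTML5 treebuilder follows much more detailed rules for misnested markup.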




This is the first release of html5lib and it is considered alpha-quality
software. However, it ships with over 230 passing unit tests covering
most of the specified behavior. Bugs should be reported on the issue
tracker [2].


Error handling does not yet conform to the specification; not all errors 
are reported and the error messages are not informative.


More information about the project including documentation and 
information on getting involved is available on the project page:

[1] http://whatwg.org/specs/web-apps/current-work/
[2] http://code.google.com/p/html5lib/issues/list
