lxml 2.0alpha1 released

Stefan Behnel stefan.behnel-n05pAM at web.de
Mon Oct 1 11:42:12 CEST 2007

[looks like it finally didn't make it to the NG]

Hi all,

I'm proudly announcing the first alpha release of lxml 2.0.


** What is lxml?

In short: lxml is the most feature-rich and easy-to-use library for working
with XML and HTML in the Python language.

lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique
in that it combines the speed and feature completeness of these libraries with
the simplicity of a native Python API.

This release features a major cleanup, both behind the scenes and at the
surface, that improves the integration with XML tools and makes the API clearer
and more consistent in many places. The major new addition, however, is the
lxml.html package, a new toolkit for HTML handling.

The web site for the pre-2.0 series is online at


The "what's new" page has a description of the major changes:


and the ChangeLog has a more detailed list, see below.

As this is an alpha release, not everything is stable yet, neither in terms of
crashes nor in terms of the API. There will be a small number of alpha releases
to make the advancements publicly available before the beta releases focus on
improving stability.

I warmly invite everyone to contribute to the final release by discussing the
API changes and the new features on the mailing list. There is always space
for improvements!

There is currently a known problem with Microsoft's compilers, so Windows
builds may not be available for 2.0alpha1. The next alpha will hopefully
come with prebuilt binaries for that platform. Building with the more
standards-compliant MinGW compilers should work.

Note that working on the code now requires Cython (version …), an
enhanced fork of Pyrex. lxml therefore no longer ships with a copy of Pyrex
or Cython, but, as usual, building from the distribution sources does not
require Cython. It can be installed with "easy_install Cython" or from here:


I hope that lxml 2.0 will become a straight continuation of the success story
that lxml 1.x was already.

Have fun,

2.0alpha1 (2007-09-02)
Features added

    * Reimplemented objectify.E for better performance and improved
      integration with objectify. Provides extended type support based on
      registered PyTypes.
    * XSLT objects now support deep copying
    * New makeSubElement() C-API function that allows creating a new
      subelement directly with text, tail and attributes.
    * XPath extension functions can now access the current context node
      (context.context_node) and use a context dictionary
      (context.eval_context) from the context provided in their first
      argument
    * HTML tag soup parser based on BeautifulSoup in lxml.html.ElementSoup
    * New module lxml.doctestcompare by Ian Bicking for writing simplified
      doctests based on XML/HTML output. Use it by importing lxml.usedoctest or
      lxml.html.usedoctest from within a doctest.
    * New module lxml.cssselect by Ian Bicking for selecting Elements with
      CSS selectors.
    * New package lxml.html written by Ian Bicking for advanced HTML
      handling
    * Namespace class setup is now local to the ElementNamespaceClassLookup
      instance and no longer global.
    * Schematron validation (incomplete in libxml2)
    * Additional stringify argument to objectify.PyType() takes a function
      that converts values to strings, to support setting text values from
      arbitrary types.
    * Entity support through an Entity factory and element classes. XML
      parsers now have a resolve_entities keyword argument that can be set to
      False to keep entities in the document.
    * New column field on error log entries to accompany the line field
    * Error-specific messages in XPath parsing and evaluation
      NOTE: for evaluation errors, you will now get an XPathEvalError instead
      of an XPathSyntaxError. To catch both, you can catch XPathError.
    * The regular expression functions in XPath now support passing a node-set
      instead of a string
    * Extended type annotation in objectify: new xsiannotate() function
    * EXSLT RegExp support in standard XPath (not only XSLT)
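For readers who have not tried the reimplemented objectify.E yet, here is a
minimal sketch of what it does: it builds a tree whose children carry Python
type annotations, so objectify hands the values back as typed data. The tag
names "root", "price" and "title" are made up for this example.

```python
from lxml import etree, objectify

# objectify.E is an ElementMaker that records the Python type of each value.
E = objectify.E

root = E.root(E.price(5), E.title("lxml"))

# Objectify turns the annotated children back into typed Python values.
print(root.price.pyval + 1)  # 6
print(root.title.text)       # lxml

# The serialised tree shows the type annotations objectify.E added.
print(etree.tostring(root, pretty_print=True).decode())
```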
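To illustrate the new context access in XPath extension functions, here is a
small sketch. It assumes a made-up namespace URI ("urn:example:ext") and a
made-up function name ("ctxtag"); the function reads context.context_node to
find out which node the XPath engine is currently testing.

```python
from lxml import etree

# Hypothetical namespace URI, chosen only for this sketch.
EX_NS = "urn:example:ext"

def ctxtag(context):
    """Extension function: return the tag of the node being evaluated."""
    # The first argument is the evaluation context; context_node is the
    # node the predicate is currently applied to.
    return context.context_node.tag

find_b = etree.XPath(
    "//*[ex:ctxtag() = 'b']",
    namespaces={"ex": EX_NS},
    extensions={(EX_NS, "ctxtag"): ctxtag},
)

root = etree.XML("<root><a/><b/><b/></root>")
matches = find_b(root)
print([el.tag for el in matches])  # ['b', 'b']
```

The same context object also offers context.eval_context, a dictionary that
lives for one evaluation and can be used to share state between calls.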
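Since namespace class setup is now local to an ElementNamespaceClassLookup
instance, custom element classes only apply to parsers that use that lookup.
A minimal sketch, with a hypothetical namespace "urn:example:shop" and a
hypothetical Item class:

```python
from lxml import etree

NS = "urn:example:shop"  # hypothetical namespace for this sketch

class Item(etree.ElementBase):
    """Custom element class used for {urn:example:shop}item."""
    @property
    def price(self):
        return float(self.get("price", "0"))

# The registration lives on this lookup instance, not in a global registry.
lookup = etree.ElementNamespaceClassLookup()
lookup.get_namespace(NS)["item"] = Item

parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

data = '<shop xmlns="urn:example:shop"><item price="2.5"/></shop>'
root = etree.fromstring(data, parser)
print(type(root[0]).__name__, root[0].price)  # Item 2.5

# A parser without the lookup still produces plain elements.
plain = etree.fromstring(data)
print(isinstance(plain[0], Item))  # False
```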
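The entity support can be tried with a document that declares an internal
entity. This sketch compares the default parser with one created with
resolve_entities=False, and uses the Entity factory to add a reference to a
new element. The entity name "greeting" is invented for the example.

```python
from lxml import etree

xml = b'<!DOCTYPE root [<!ENTITY greeting "hello">]><root>&greeting;</root>'

# Default parser: the internal entity is replaced by its text.
resolved = etree.fromstring(xml)
print(resolved.text)  # hello

# resolve_entities=False keeps the reference as an entity node in the tree.
parser = etree.XMLParser(resolve_entities=False)
kept = etree.fromstring(xml, parser)
print(kept[0].tag is etree.Entity, kept[0].name)  # True greeting
print(etree.tostring(kept))

# The Entity factory creates entity references programmatically.
p = etree.Element("p")
p.append(etree.Entity("greeting"))
print(etree.tostring(p))  # b'<p>&greeting;</p>'
```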
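The XPath changes fit together in one short sketch: EXSLT regular expressions
work in plain XPath, the regex functions accept a node-set (here ".") instead
of a string, and a single except clause on XPathError covers both syntax and
evaluation errors. The sample words and the undefined function name are
invented for illustration.

```python
from lxml import etree

# Namespace URI of the EXSLT regular-expression functions.
RE_NS = "http://exslt.org/regular-expressions"

root = etree.XML("<root><w>Apple</w><w>banana</w><w>apricot</w></root>")

# re:test() receives the context node itself; the 'i' flag ignores case.
words = root.xpath("//w[re:test(., '^ap', 'i')]", namespaces={"re": RE_NS})
print([w.text for w in words])  # ['Apple', 'apricot']

# Compile-time and evaluation-time failures share the XPathError base class.
caught = []
for run in (
    lambda: etree.XPath("//w[")(root),           # malformed expression
    lambda: root.xpath("undefined-function()"),  # unregistered function
):
    try:
        run()
    except etree.XPathError as exc:
        caught.append(type(exc).__name__)
print(caught)
```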

Bugs fixed

    * lxml.etree did not check tag/attribute names
    * The XML parser did not report undefined entities as errors
    * The text in exceptions raised by XML parsers, validators and XPath
      evaluators now reports the first error that occurred instead of the last
    * Passing '' as XPath namespace prefix did not raise an error
    * Thread safety in XPath evaluators
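Two of these fixes are easy to observe from Python. This sketch shows an
invalid tag name being rejected and an undeclared entity raising
XMLSyntaxError, whose error log entries now carry a column next to the line.
The tag and entity names are arbitrary.

```python
from lxml import etree

# Invalid tag names are refused when the element is created.
try:
    etree.Element("not a tag")
    name_rejected = False
except ValueError:
    name_rejected = True
print(name_rejected)  # True

# A reference to an undeclared entity is reported as a syntax error.
entity_error = None
try:
    etree.fromstring("<root>&undefined;</root>")
except etree.XMLSyntaxError as exc:
    entity_error = exc
    # Each log entry carries both line and column information.
    for entry in exc.error_log:
        print(entry.line, entry.column, entry.message)
```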

Other changes

    * objectify.PyType for None is now called "NoneType"
    * el.getiterator() renamed to el.iter(), following ElementTree 1.3 -
      original name is still available as alias
    * In the public C-API, findOrBuildNodeNs() was replaced by the more
      generic findOrBuildNodeNsPrefix()
    * Major refactoring in XPath/XSLT extension function code
    * Network access in parsers disabled by default
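For code that used getiterator(), the renamed iter() behaves the same way. A
short sketch; the last line only shows how a parser could opt back in to
network access through the no_network keyword, since it is now off by default.

```python
from lxml import etree

root = etree.XML("<root><a/><b><a/></b></root>")

# iter() walks the element and its descendants in document order.
print([el.tag for el in root.iter()])  # ['root', 'a', 'b', 'a']

# Passing a tag restricts the walk to matching elements.
print(len(list(root.iter("a"))))  # 2

# Network access stays off unless a parser explicitly enables it.
networked_parser = etree.XMLParser(no_network=False)
```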
