lxml 2.0 released

Stefan Behnel stefan_ml at behnel.de
Fri Feb 1 19:43:49 CET 2008

Hi everyone,

I'm very happy to announce the official release of lxml 2.0!


** Install it with

	$ easy_install lxml==2.0

** What is lxml?

In short: lxml is the most feature-rich and easy-to-use library for working
with XML and HTML in the Python language.

lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique
in that it combines the speed and feature completeness of these libraries with
the simplicity of a native Python API.

This release marks the end of a development effort of more than 6 months,
starting with the release of the last stable series lxml 1.3. The major
differences are explained on this page:


lxml 2.0 is not a revolution; it is a gradual move towards a cleaner API with
more things working together as expected. It nevertheless comes with a lot
of new tools and features that make your XML life easier, and your HTML life
even more so. A couple of minor things were also deprecated and will be
removed in lxml 2.1. See the above link for details.

The new release has already adopted a lot of changes from the upcoming
ElementTree 1.3 library, and implements a much broader set of compatible
features, such as the TreeBuilder interface for parser targets.
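
To give an idea of what a parser target looks like, here is a minimal sketch.
It assumes a current Python 3 installation rather than lxml 2.0 itself, and
falls back to the standard library's ElementTree, which offers the same
interface, if lxml is not installed. The ``TagCollector`` class is a
hypothetical name chosen for illustration.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback with the same target interface
    import xml.etree.ElementTree as etree


class TagCollector:
    """Hypothetical parser target: records events instead of building a tree."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, data):
        # Text may arrive in several chunks, so each chunk is stored as-is.
        self.events.append(("data", data))

    def close(self):
        # Whatever close() returns becomes the result of the parse call.
        return self.events


parser = etree.XMLParser(target=TagCollector())
events = etree.fromstring("<root><a>text</a></root>", parser)
```

Because the target decides what ``close()`` returns, ``events`` here is the
list of recorded events, not an Element.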

The complete changelog follows.

Have fun,

** ChangeLog:

2.0 (2008-02-01)

Features added

* Passing the ``unicode`` type as ``encoding`` to ``tostring()`` will
  serialise to unicode.  The ``tounicode()`` function is now officially

* ``XMLSchema()`` and ``RelaxNG()`` can parse from StringIO.

* ``makeparser()`` function in ``lxml.objectify`` to create a new
  parser with the usual objectify setup.
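
The unicode serialisation mentioned in the first item can be sketched as
follows. In lxml 2.0 the ``encoding`` argument was the Python 2 ``unicode``
type; this sketch assumes a current Python 3 installation, where the string
name "unicode" is accepted instead. It falls back to the standard library's
ElementTree if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback, same call signature for this case
    import xml.etree.ElementTree as etree

root = etree.fromstring("<root><a>caf\u00e9</a></root>")

# Default serialisation yields an ASCII byte string.
as_bytes = etree.tostring(root)

# Asking for "unicode" yields a text string with the character intact.
as_text = etree.tostring(root, encoding="unicode")
```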

Bugs fixed

Other changes

2.0beta2 (2008-01-26)

Features added

* Plain ASCII XPath string results are no longer forced into unicode
  objects as in 2.0beta1, but are returned as plain strings as before.

* All XPath string results are 'smart' objects that have a
  ``getparent()`` method to retrieve their parent Element.

* ``with_tail`` option in serialiser functions.

* More accurate exception messages in validator creation.

Bugs fixed

* Missing import in ``lxml.html.clean``.

* Some Python 2.4-isms prevented lxml from building/running under
  Python 2.3.

Other changes

* Exceptions carry only the part of the error log that is related to
  the operation that caused the error.

* ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source
  file/filename through the ``file`` keyword argument.

* The test suite now skips most doctests under Python 2.3.

* ``make clean`` no longer removes the .c files (use ``make
  realclean`` instead)

2.0beta1 (2008-01-11)

Features added

* Parse-time XML schema validation (``schema`` parser keyword).

* XPath string results of the ``text()`` function and attribute
  selection make their Element container accessible through a
  ``getparent()`` method.  As a side-effect, they are now always
  unicode objects (even ASCII strings).

* ``XSLT`` objects are usable in any thread - at the cost of a deep
  copy if they were not created in that thread.

* Invalid entity names and character references will be rejected by
  the ``Entity()`` factory.

* ``entity.text`` returns the textual representation of the entity,
  e.g. ``&amp;``.
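
The ``Entity()`` items above can be tried out roughly like this. The sketch
assumes a current lxml release on Python 3. That an invalid name is reported
as a ``ValueError`` is how current releases behave and may not match
2.0beta1 exactly.

```python
from lxml import etree

root = etree.Element("root")
entity = etree.Entity("#234")   # a character reference
root.append(entity)

entity_text = entity.text       # textual form of the reference
serialised = etree.tostring(root)

# Invalid entity names are rejected by the factory.
try:
    etree.Entity("not a name")
    rejected = False
except ValueError:
    rejected = True
```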

Bugs fixed

* XPath on ElementTrees could crash when selecting the virtual root
  node of the ElementTree.

* Compilation ``--without-threading`` was buggy in alpha5/6.

Other changes

* Minor performance tweaks for Element instantiation and subelement

2.0alpha6 (2007-12-19)

Features added

* New properties ``position`` and ``code`` on ParseError exception (as
  in ET 1.3)

Bugs fixed

* Memory leak in the ``parse()`` function.

* Minor bugs in XSLT error message formatting.

* Result document memory leak in target parser.

Other changes

* Various places in the XPath, XSLT and iteration APIs now require
  keyword-only arguments.

* The argument order in ``element.itersiblings()`` was changed to
  match the order used in all other iteration methods.  The second
  argument ('preceding') is now a keyword-only argument.

* The ``getiterator()`` method on Elements and ElementTrees was
  reverted to return an iterator as it did in lxml 1.x.  The ET API
  specification allows it to return either a sequence or an iterator,
  and it traditionally returned a sequence in ET and an iterator in
  lxml.  However, it is now deprecated in favour of the ``iter()``
  method, which should be used in new code wherever possible.

* The 'pretty printed' serialisation of ElementTree objects now
  inserts newlines at the root level between processing instructions,
  comments and the root tag.

* A 'pretty printed' serialisation is now terminated with a newline.

* Second argument to ``lxml.etree.Extension()`` helper is no longer
  required, third argument is now a keyword-only argument ``ns``.

* ``lxml.html.tostring`` takes an ``encoding`` argument.
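
As a small illustration of the ``iter()`` method recommended above, here is a
sketch that assumes a current Python 3 installation and falls back to the
standard library's ElementTree if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback, iter() behaves the same here
    import xml.etree.ElementTree as etree

root = etree.fromstring("<root><a><b>1</b></a><b>2</b><c>3</c></root>")

# iter() walks the tree lazily in document order, starting at the element.
walker = root.iter()
first = next(walker)

all_tags = [el.tag for el in root.iter()]

# Passing a tag name restricts the iteration to matching elements.
b_texts = [el.text for el in root.iter("b")]
```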

2.0alpha5 (2007-11-24)

Features added

* Rich comparison of ``element.attrib`` proxies.

* ElementTree compatible TreeBuilder class.

* Use default prefixes for some common XML namespaces.

* ``lxml.html.clean.Cleaner`` now allows for a ``host_whitelist``, and
  two overridable methods: ``allow_embedded_url(el, url)`` and the
  more general ``allow_element(el)``.

* Extended slicing of Elements as in ``element[1:-1:2]``, both in
  etree and in objectify

* Resolvers can now provide a ``base_url`` keyword argument when
  resolving a document as string data.

* When using ``lxml.doctestcompare`` you can give the doctest option
  ``NOPARSE_MARKUP`` (like ``# doctest: +NOPARSE_MARKUP``) to suppress
  the special checking for one test.
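
Extended slicing and attribute comparison from the list above might be used
like this. The sketch assumes a current Python 3 installation and falls back
to the standard library's ElementTree if lxml is not installed. The sample
document is made up for illustration.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback, slicing works the same way
    import xml.etree.ElementTree as etree

root = etree.fromstring('<r x="1"><a/><b/><c/><d/><e/><f/></r>')

# Every second child, skipping the first and the last one.
picked = [el.tag for el in root[1:-1:2]]

# The attrib mapping can be compared directly against a plain dict.
same_attributes = root.attrib == {"x": "1"}
```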

Bugs fixed

* Target parser failed to report comments.

* In the ``lxml.html`` ``iter_links`` method, links in ``<object>``
  tags weren't recognized.  (Note: plugin-specific link parameters
  still aren't recognized.)  Also, the ``<embed>`` tag, though not
  standard, is now included in ``lxml.html.defs.special_inline_tags``.

* Using custom resolvers on XSLT stylesheets parsed from a string
  could request ill-formed URLs.

* With ``lxml.doctestcompare`` if you do ``<tag xmlns="...">`` in your
  output, it will then be namespace-neutral (before the ellipsis was
  treated as a real namespace).

Other changes

* The module source files were renamed to "lxml.*.pyx", such as
  "lxml.etree.pyx".  This was changed for consistency with the way
  Pyrex commonly handles package imports.  The main effect is that
  classes now know about their fully qualified class name, including
  the package name of their module.

* Keyword-only arguments in some API functions, especially in the
  parsers and serialisers.

2.0alpha4 (2007-10-07)

Features added

Bugs fixed

* AttributeError in feed parser on parse errors

Other changes

* Tag name validation in lxml.etree (and lxml.html) now distinguishes
  between HTML tags and XML tags based on the parser that was used to
  parse or create them.  HTML tags no longer reject any non-ASCII
  characters in tag names but only spaces and the special characters

2.0alpha3 (2007-09-26)

Features added

* Separate ``feed_error_log`` property for the feed parser interface.
  The normal parser interface and ``iterparse`` continue to use

* The normal parsers and the feed parser interface are now separated
  and can be used concurrently on the same parser instance.

* ``fromstringlist()`` and ``tostringlist()`` functions as in
  ElementTree 1.3

* ``iterparse()`` accepts an ``html`` boolean keyword argument for
  parsing with the HTML parser (note that this interface may be
  subject to change)

* Parsers accept an ``encoding`` keyword argument that overrides the encoding
  of the parsed documents.

* New C-API function ``hasChild()`` to test for children

* ``annotate()`` function in objectify can annotate with Python types and XSI
  types in one step.  Accompanied by ``xsiannotate()`` and ``pyannotate()``.
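
The ``fromstringlist()`` and ``tostringlist()`` functions mentioned above can
be sketched as follows, assuming a current Python 3 installation with a
fallback to the standard library's ElementTree if lxml is not installed. How
many pieces ``tostringlist()`` returns is an implementation detail, so the
sketch only relies on their concatenation.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback, same ET 1.3 functions
    import xml.etree.ElementTree as etree

# A document that arrives as a sequence of string fragments.
chunks = ["<root>", "<a>1</a>", "<a>2</a>", "</root>"]
root = etree.fromstringlist(chunks)
values = [a.text for a in root]

# Serialise back into a list of byte string pieces.
pieces = etree.tostringlist(root)
joined = b"".join(pieces)
```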

Bugs fixed

* XML feed parser setup problem

* Type annotation for unicode strings in ``DataElement()``

Other changes

* lxml.etree now emits a warning if you use XPath with libxml2 2.6.27
  (which can crash on certain XPath errors)

* Type annotation in objectify now preserves the already annotated type by
  default to prevent losing type information that is already there.

2.0alpha2 (2007-09-15)

Features added

* ``ET.write()``, ``tostring()`` and ``tounicode()`` now accept a keyword
  argument ``method`` that can be one of 'xml' (or None), 'html' or 'text' to
  serialise as XML, HTML or plain text content.

* ``iterfind()`` method on Elements returns an iterator equivalent to

* ``itertext()`` method on Elements

* Setting a QName object as value of the .text property or as an attribute
  will resolve its prefix in the respective context

* ElementTree-like parser target interface as described in

* ElementTree-like feed parser interface on XMLParser and HTMLParser
  (``feed()`` and ``close()`` methods)
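
The feed parser interface, ``itertext()``, ``iterfind()`` and the ``method``
keyword from the list above fit together roughly as shown here. The sketch
assumes a current Python 3 installation and falls back to the standard
library's ElementTree if lxml is not installed. The chunk boundaries are
arbitrary.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback, same feed parser interface
    import xml.etree.ElementTree as etree

# Feed the document in arbitrary chunks, e.g. as read from a socket.
parser = etree.XMLParser()
for chunk in ("<doc><p>Hello ", "<b>wor", "ld</b>!</p></doc>"):
    parser.feed(chunk)
root = parser.close()

# itertext() yields the text content in document order.
text_parts = list(root.itertext())

# method="text" serialises only the text content.
plain = etree.tostring(root, method="text", encoding="unicode")

# iterfind() returns an iterator over matching elements.
first_b = next(iter(root.iterfind(".//b")))
```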

Bugs fixed

* lxml failed to serialise namespace declarations of elements other than the
  root node of a tree

* Race condition in XSLT where the resolver context leaked between concurrent
  XSLT calls

Other changes

* ``element.getiterator()`` returns a list, use ``element.iter()`` to retrieve
  an iterator (ElementTree 1.3 compatible behaviour)

2.0alpha1 (2007-09-02)

Features added

* Reimplemented ``objectify.E`` for better performance and improved
  integration with objectify.  Provides extended type support based on
  registered PyTypes.

* XSLT objects now support deep copying

* New ``makeSubElement()`` C-API function that allows creating a new
  subelement straight with text, tail and attributes.

* XPath extension functions can now access the current context node
  (``context.context_node``) and use a context dictionary
  (``context.eval_context``) from the context provided in their first

* HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup``

* New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified
  doctests based on XML/HTML output.  Use by importing ``lxml.usedoctest`` or
  ``lxml.html.usedoctest`` from within a doctest.

* New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS

* New package ``lxml.html`` written by Ian Bicking for advanced HTML

* Namespace class setup is now local to the ``ElementNamespaceClassLookup``
  instance and no longer global.

* Schematron validation (incomplete in libxml2)

* Additional ``stringify`` argument to ``objectify.PyType()`` takes a
  conversion function to strings to support setting text values from arbitrary

* Entity support through an ``Entity`` factory and element classes.  XML
  parsers now have a ``resolve_entities`` keyword argument that can be set to
  False to keep entities in the document.

* ``column`` field on error log entries to accompany the ``line`` field

* Error specific messages in XPath parsing and evaluation
  NOTE: for evaluation errors, you will now get an XPathEvalError instead of
  an XPathSyntaxError.  To catch both, catch their common base class
  ``XPathError``

* The regular expression functions in XPath now support passing a node-set
  instead of a string

* Extended type annotation in objectify: new ``xsiannotate()`` function

* EXSLT RegExp support in standard XPath (not only XSLT)
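
Two of the XPath items above, EXSLT regular expressions in plain XPath and
the ``XPathError`` base class, can be sketched like this. The sketch assumes
a current lxml release on Python 3. The ``safe_xpath`` helper and the sample
document are hypothetical and only serve the illustration.

```python
from lxml import etree

root = etree.fromstring("<root><item>abc-123</item><item>xyz</item></root>")

# The EXSLT regular expression functions live in this namespace.
ns = {"re": "http://exslt.org/regular-expressions"}
matches = root.xpath(r"//item[re:test(., '^[a-z]+-\d+$')]", namespaces=ns)
matched_texts = [el.text for el in matches]


def safe_xpath(element, expression):
    """Hypothetical helper: return None instead of raising on XPath errors."""
    try:
        return element.xpath(expression)
    except etree.XPathError:
        # Base class of both XPathSyntaxError and XPathEvalError.
        return None


broken = safe_xpath(root, "//item[")
working = safe_xpath(root, "count(//item)")
```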

Bugs fixed

* lxml.etree did not check tag/attribute names

* The XML parser did not report undefined entities as error

* The text in exceptions raised by XML parsers, validators and XPath
  evaluators now reports the first error that occurred instead of the last

* Passing '' as XPath namespace prefix did not raise an error

* Thread safety in XPath evaluators

Other changes

* objectify.PyType for None is now called "NoneType"

* ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 -
  original name is still available as alias

* In the public C-API, ``findOrBuildNodeNs()`` was replaced by the more
  generic ``findOrBuildNodeNsPrefix``

* Major refactoring in XPath/XSLT extension function code

* Network access in parsers disabled by default
