lxml 4.0.0 released
stefan_ml at behnel.de
Mon Sep 18 02:24:53 EDT 2017
lxml 4.0.0 was released yesterday with several new features and
enhancements. Thanks to everyone who contributed.
lxml is the fastest, most versatile and most widely used tool for
processing XML and HTML in Python, supporting XPath, XSLT and many pythonic
ways to deal with markup documents.
The documentation is here: http://lxml.de/
Binary wheels are available for Linux, Mac-OS and Windows.
This release was built using Cython 0.26.1.
If you are interested in commercial support or customisations for the lxml
package, please contact me directly.
* The ElementPath implementation is now compiled using Cython,
which speeds up the ``.find*()`` methods quite significantly.
* The modules ``lxml.builder``, ``lxml.html.diff`` and ``lxml.html.clean``
are also compiled using Cython in order to speed them up.
* ``xmlfile()`` supports async coroutines using ``async with`` and
``await``. See http://lxml.de/api.html#incremental-xml-generation
* ``iterwalk()`` has a new method ``skip_subtree()`` that prevents walking
into the descendants of the current element.
* ``RelaxNG.from_rnc_string()`` accepts a ``base_url`` argument to
allow relative resource lookups.
* The XSLT result object has a new method ``.write_output(file)`` that
serialises output data into a file according to the ``<xsl:output>``
configuration.
* GH#251: HTML comments were handled incorrectly by the soupparser.
Patch by mozbugbox.
* LP#1654544: The html5parser no longer passes the ``useChardet`` option
if the input is a Unicode string, unless explicitly requested. When
parsing files, the default is to enable it when a URL or file path is
passed (because the file is then opened in binary mode), and to disable
it when reading from a file(-like) object.
Note: This is a backwards incompatible change of the default
configuration. If your code parses byte strings/streams and depends on
character detection, please pass the option ``guess_charset=True``
explicitly, which already worked in older lxml versions.
* LP#1703810: ``etree.fromstring()`` failed to parse UTF-32 data with BOM.
* LP#1526522: Some RelaxNG errors were not reported in the error log.
* LP#1567526: Empty and plain text input raised a TypeError in soupparser.
* LP#1710429: Uninitialised variable usage in HTML diff.
* LP#1415643: The closing tags context manager in ``xmlfile()`` could
continue to output end tags even after writing failed with an exception.
* LP#1465357: ``xmlfile.write()`` now accepts and ignores ``None`` as input argument.
* Compilation under Py3.7-pre failed due to a modified function signature.
* The main module source files were renamed from ``lxml.*.pyx`` to plain
``*.pyx`` (e.g. ``etree.pyx``) to simplify their handling in the build
process. Care was taken to keep the old header files as fallbacks for
code that compiles against the public C-API of lxml, but it might still
be worth validating that third-party code does not notice this change.
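The ElementPath change listed above is internal. The ``.find*()`` API itself is unchanged, so existing code picks up the speedup without modification. As a minimal sketch (assuming lxml 4.0 or later; the sample document is made up for illustration), these are the calls that now run through the compiled ElementPath code:

```python
from lxml import etree

# Illustrative sample document (not from the announcement).
root = etree.fromstring(
    "<library>"
    "<book lang='en'><title>Dive</title></book>"
    "<book lang='de'><title>Tauchen</title></book>"
    "</library>"
)

first = root.find("book")                              # first matching child element
titles = [t.text for t in root.findall("book/title")]  # all matches, as a list
german = root.findtext("book[@lang='de']/title")       # text of the first match
langs = [b.get("lang") for b in root.iterfind(".//book")]  # lazy iterator over matches

print(first.get("lang"), titles, german, langs)
```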
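``lxml.builder`` is one of the modules that are now compiled with Cython. Its ``E`` factory is used exactly as before. A small sketch, assuming lxml 4.0 or later (the element names and text are illustrative only):

```python
from lxml import etree
from lxml.builder import E

# Positional strings become text/tail, nested E.* calls become children,
# and a dict argument sets attributes on the element being built.
page = E.html(
    E.head(E.title("Release notes")),
    E.body(E.p("lxml ", E.b("4.0.0"), " released", {"class": "note"})),
)

markup = etree.tostring(page)
print(markup)
```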
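The async support in ``xmlfile()`` can be sketched roughly as follows. This assumes Python 3.7+ (for ``asyncio.run()``) and lxml 4.0 or later. ``AsyncCollector`` and ``generate()`` are hypothetical names, and the sketch assumes that the output object only needs an awaitable ``write()`` method that receives the serialised bytes. The API documentation linked above remains the authoritative description.

```python
import asyncio

from lxml import etree


class AsyncCollector:
    """Hypothetical async output target that just collects written chunks."""

    def __init__(self):
        self.chunks = []

    async def write(self, data):
        # xmlfile() passes serialised bytes to the target's write() coroutine.
        self.chunks.append(data)


async def generate(target):
    # "async with" on xmlfile() and on xf.element(); writes are awaited.
    async with etree.xmlfile(target) as xf:
        async with xf.element("root"):
            await xf.write("root content")


collector = AsyncCollector()
asyncio.run(generate(collector))
print(b"".join(collector.chunks))
```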
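A short sketch of the new ``iterwalk()`` method ``skip_subtree()``, assuming lxml 4.0 or later. The document is invented for illustration, and only ``"start"`` events are requested so that the visiting order is easy to follow:

```python
from lxml import etree

root = etree.fromstring("<root><a><b/><c/></a><d><e/></d></root>")

visited = []
walker = etree.iterwalk(root, events=("start",))
for event, elem in walker:
    visited.append(elem.tag)
    if elem.tag == "a":
        # Do not descend into <b> and <c>; continue with the next sibling <d>.
        walker.skip_subtree()

print(visited)  # ['root', 'a', 'd', 'e']
```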
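A minimal sketch of ``.write_output()``, assuming lxml 4.0 or later. The stylesheet and input are made up. The point is that the ``<xsl:output method="text">`` setting is honoured when the result is written, here into an ``io.BytesIO`` used as a file-like target:

```python
import io

from lxml import etree

xslt = etree.XSLT(etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:text>Hello </xsl:text>
    <xsl:value-of select="/greeting/@name"/>
  </xsl:template>
</xsl:stylesheet>"""))

result = xslt(etree.XML('<greeting name="World"/>'))

buf = io.BytesIO()
result.write_output(buf)  # serialised according to <xsl:output>, i.e. as plain text
print(buf.getvalue())
```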
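To check the UTF-32 fix (LP#1703810) locally, something like the following can be used. It is a sketch assuming lxml 4.0 or later. Python's ``"utf-32"`` codec is chosen because it writes a byte order mark, in the platform's native byte order, in front of the encoded data:

```python
from lxml import etree

# Python's "utf-32" codec prepends a BOM (FF FE 00 00 on little-endian machines).
data = "<root>grüße</root>".encode("utf-32")
root = etree.fromstring(data)
print(root.tag, root.text)
```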
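The ``xmlfile.write()`` change (LP#1465357) mainly simplifies code that streams optional values. A minimal sketch, assuming lxml 4.0 or later; ``write_items()`` is a hypothetical helper, not part of lxml:

```python
import io

from lxml import etree


def write_items(items):
    """Hypothetical helper: stream each item into its own <item> element."""
    out = io.BytesIO()
    with etree.xmlfile(out) as xf:
        with xf.element("items"):
            for item in items:
                with xf.element("item"):
                    xf.write(item)  # a None item is ignored, leaving <item> empty
    return out.getvalue()


xml_bytes = write_items(["a", None, "b"])
print(xml_bytes)
```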