Hi all,

I just released lxml 4.0.0 with several new features. Thanks to everyone who contributed.

The documentation is here:
http://lxml.de/

Download:
https://pypi.python.org/packages/07/76/9f14811d3fb91ed7973a798ded15eda416070...

Signature:
https://pypi.python.org/packages/07/76/9f14811d3fb91ed7973a798ded15eda416070...

Changelog:
http://lxml.de/4.0/changes-4.0.0.html

Github:
https://github.com/lxml/lxml/releases/tag/lxml-4.0.0

This release was built using Cython 0.26.1.

If you are interested in commercial support or customisations for the lxml package, please contact me directly.

Have fun,
Stefan


4.0.0 (2017-09-17)
==================

Features added
--------------

* The ElementPath implementation is now compiled using Cython,
  which speeds up the ``.find*()`` methods quite significantly.

* The modules ``lxml.builder``, ``lxml.html.diff`` and ``lxml.html.clean``
  are also compiled using Cython in order to speed them up.

* ``xmlfile()`` supports async coroutines using ``async with`` and ``await``.

* ``iterwalk()`` has a new method ``skip_subtree()`` that prevents walking
  into the descendants of the current element.

* ``RelaxNG.from_rnc_string()`` accepts a ``base_url`` argument to
  allow relative resource lookups.

* The XSLT result object has a new method ``.write_output(file)`` that
  serialises output data into a file according to the ``<xsl:output>``
  configuration.

Bugs fixed
----------

* GH#251: HTML comments were handled incorrectly by the soupparser.
  Patch by mozbugbox.

* LP#1654544: The html5parser no longer passes the ``useChardet`` option
  if the input is a Unicode string, unless explicitly requested. When parsing
  files, the default is to enable it when a URL or file path is passed
  (because the file is then opened in binary mode), and to disable it
  when reading from a file(-like) object.

  Note: This is a backwards incompatible change of the default configuration.
  If your code parses byte strings/streams and depends on character detection,
  please pass the option ``guess_charset=True`` explicitly, which already
  worked in older lxml versions.

* LP#1703810: ``etree.fromstring()`` failed to parse UTF-32 data with BOM.

* LP#1526522: Some RelaxNG errors were not reported in the error log.

* LP#1567526: Empty and plain text input raised a TypeError in soupparser.

* LP#1710429: Uninitialised variable usage in HTML diff.

* LP#1415643: The closing tags context manager in ``xmlfile()`` could continue
  to output end tags even after writing failed with an exception.

* LP#1465357: ``xmlfile.write()`` now accepts and ignores None as input argument.

* Compilation under Py3.7-pre failed due to a modified function signature.

Other changes
-------------

* The main module source files were renamed from ``lxml.*.pyx`` to plain
  ``*.pyx`` (e.g. ``etree.pyx``) to simplify their handling in the build
  process. Care was taken to keep the old header files as fallbacks for
  code that compiles against the public C-API of lxml, but it might still
  be worth validating that third-party code does not notice this change.
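To illustrate the new ``iterwalk()`` feature from the list of additions, here is a minimal sketch. It assumes lxml 4.0 or later is installed; the sample document and the helper name ``walk_skipping`` are hypothetical, chosen only for illustration. The helper walks a tree with ``start`` events and calls ``skip_subtree()`` on elements with a given tag, so their descendants are never visited.

```python
from lxml import etree

# Hypothetical sample document, for illustration only.
root = etree.XML("<root><a><b/><b/></a><c><d/></c></root>")


def walk_skipping(tree, skip_tag):
    """Hypothetical helper: return the tags seen by iterwalk(),
    without descending below elements whose tag equals skip_tag."""
    seen = []
    context = etree.iterwalk(tree, events=("start",))
    for _action, elem in context:
        seen.append(elem.tag)
        if elem.tag == skip_tag:
            # New in lxml 4.0: do not walk into this element's descendants.
            context.skip_subtree()
    return seen


print(walk_skipping(root, "a"))  # ['root', 'a', 'c', 'd']
```

The two ``<b/>`` children of ``<a>`` are pruned during the walk itself instead of being filtered out afterwards.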