Python-announce-list

python-announce-list@python.org

February 2008

48 participants
61 discussions

lxml 2.0 released
by Stefan Behnel 01 Feb '08

Hi everyone,

I'm very happy to announce the official release of lxml 2.0!

http://codespeak.net/lxml/
http://pypi.python.org/pypi/lxml/2.0

** Install it with

  $ easy_install lxml==2.0

** What is lxml?

"""
In short: lxml is the most feature-rich and easy-to-use library for working
with XML and HTML in the Python language. lxml is a Pythonic binding for the
libxml2 and libxslt libraries. It is unique in that it combines the speed and
feature completeness of these libraries with the simplicity of a native
Python API.
"""

This release marks the end of a development effort of more than 6 months,
starting with the release of the last stable series lxml 1.3. The major
differences are explained on this page:

http://codespeak.net/lxml/lxml2.html

lxml 2.0 is not a revolution, it is a gradual move towards a cleaner API with
more things working together as expected. But it nevertheless comes with a
lot of new tools and features that make your XML life easier - and even more
your HTML life.

There are also a couple of minor things that were deprecated, which will be
removed for lxml 2.1. See the above link for details.

The new release has already adopted a lot of changes from the upcoming
ElementTree 1.3 library, and implements a much broader set of compatible
features, such as the TreeBuilder interface for parser targets.

The complete changelog follows.

Have fun,
Stefan

** ChangeLog:

2.0 (2008-02-01)
================

Features added
--------------

* Passing the ``unicode`` type as ``encoding`` to ``tostring()`` will
  serialise to unicode. The ``tounicode()`` function is now officially
  deprecated.

* ``XMLSchema()`` and ``RelaxNG()`` can parse from StringIO.

* ``makeparser()`` function in ``lxml.objectify`` to create a new parser
  with the usual objectify setup.
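As a rough illustration of the first feature above, here is a minimal sketch of serialising to a text string instead of bytes. It assumes Python 3, where the ``unicode`` type named in the changelog corresponds to ``str``, and it passes the string ``"unicode"`` as ``encoding``, which current lxml releases and the standard library's ElementTree both accept. The fallback import and the sample document are illustrative assumptions, not part of the release.

```python
# Assumption: use lxml when it is installed, otherwise fall back to the
# standard library's ElementTree, which accepts the same call.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document, used only for this sketch.
root = etree.fromstring("<root><child>text</child></root>")

# Default serialisation: an encoded byte string.
as_bytes = etree.tostring(root)

# encoding="unicode": a text string without an XML declaration.
as_text = etree.tostring(root, encoding="unicode")

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)
```

On the Python 2 interpreters that lxml 2.0 targeted, the same request was written by passing the ``unicode`` type itself as ``encoding``.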

Bugs fixed
----------

Other changes
-------------


2.0beta2 (2008-01-26)
=====================

Features added
--------------

* Plain ASCII XPath string results are no longer forced into unicode
  objects as in 2.0beta1, but are returned as plain strings as before.

* All XPath string results are 'smart' objects that have a ``getparent()``
  method to retrieve their parent Element.

* ``with_tail`` option in serialiser functions.

* More accurate exception messages in validator creation.

Bugs fixed
----------

* Missing import in ``lxml.html.clean``.

* Some Python 2.4-isms prevented lxml from building/running under
  Python 2.3.

Other changes
-------------

* Exceptions carry only the part of the error log that is related to the
  operation that caused the error.

* ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source
  file/filename through the ``file`` keyword argument.

* The test suite now skips most doctests under Python 2.3.

* ``make clean`` no longer removes the .c files (use ``make realclean``
  instead)


2.0beta1 (2008-01-11)
=====================

Features added
--------------

* Parse-time XML schema validation (``schema`` parser keyword).

* XPath string results of the ``text()`` function and attribute selection
  make their Element container accessible through a ``getparent()`` method.
  As a side-effect, they are now always unicode objects (even ASCII
  strings).

* ``XSLT`` objects are usable in any thread - at the cost of a deep copy if
  they were not created in that thread.

* Invalid entity names and character references will be rejected by the
  ``Entity()`` factory.

* ``entity.text`` returns the textual representation of the entity, e.g.
  ``&amp;``.

Bugs fixed
----------

* XPath on ElementTrees could crash when selecting the virtual root node of
  the ElementTree.

* Compilation ``--without-threading`` was buggy in alpha5/6.

Other changes
-------------

* Minor performance tweaks for Element instantiation and subelement
  creation


2.0alpha6 (2007-12-19)
======================

Features added
--------------

* New properties ``position`` and ``code`` on ParseError exception (as in
  ET 1.3)

Bugs fixed
----------

* Memory leak in the ``parse()`` function.

* Minor bugs in XSLT error message formatting.

* Result document memory leak in target parser.

Other changes
-------------

* Various places in the XPath, XSLT and iteration APIs now require
  keyword-only arguments.

* The argument order in ``element.itersiblings()`` was changed to match the
  order used in all other iteration methods. The second argument
  ('preceding') is now a keyword-only argument.

* The ``getiterator()`` method on Elements and ElementTrees was reverted to
  return an iterator as it did in lxml 1.x. The ET API specification allows
  it to return either a sequence or an iterator, and it traditionally
  returned a sequence in ET and an iterator in lxml. However, it is now
  deprecated in favour of the ``iter()`` method, which should be used in new
  code wherever possible.

* The 'pretty printed' serialisation of ElementTree objects now inserts
  newlines at the root level between processing instructions, comments and
  the root tag.

* A 'pretty printed' serialisation is now terminated with a newline.

* Second argument to ``lxml.etree.Extension()`` helper is no longer
  required, third argument is now a keyword-only argument ``ns``.

* ``lxml.html.tostring`` takes an ``encoding`` argument.


2.0alpha5 (2007-11-24)
======================

Features added
--------------

* Rich comparison of ``element.attrib`` proxies.

* ElementTree compatible TreeBuilder class.

* Use default prefixes for some common XML namespaces.

* ``lxml.html.clean.Cleaner`` now allows for a ``host_whitelist``, and two
  overridable methods: ``allow_embedded_url(el, url)`` and the more general
  ``allow_element(el)``.

* Extended slicing of Elements as in ``element[1:-1:2]``, both in etree and
  in objectify

* Resolvers can now provide a ``base_url`` keyword argument when resolving a
  document as string data.

* When using ``lxml.doctestcompare`` you can give the doctest option
  ``NOPARSE_MARKUP`` (like ``# doctest: +NOPARSE_MARKUP``) to suppress the
  special checking for one test.

Bugs fixed
----------

* Target parser failed to report comments.

* In the ``lxml.html`` ``iter_links`` method, links in ``<object>`` tags
  weren't recognized. (Note: plugin-specific link parameters still aren't
  recognized.) Also, the ``<embed>`` tag, though not standard, is now
  included in ``lxml.html.defs.special_inline_tags``.

* Using custom resolvers on XSLT stylesheets parsed from a string could
  request ill-formed URLs.

* With ``lxml.doctestcompare`` if you do ``<tag xmlns="...">`` in your
  output, it will then be namespace-neutral (before the ellipsis was treated
  as a real namespace).

Other changes
-------------

* The module source files were renamed to "lxml.*.pyx", such as
  "lxml.etree.pyx". This was changed for consistency with the way Pyrex
  commonly handles package imports. The main effect is that classes now know
  about their fully qualified class name, including the package name of
  their module.

* Keyword-only arguments in some API functions, especially in the parsers
  and serialisers.


2.0alpha4 (2007-10-07)
======================

Features added
--------------

Bugs fixed
----------

* AttributeError in feed parser on parse errors

Other changes
-------------

* Tag name validation in lxml.etree (and lxml.html) now distinguishes
  between HTML tags and XML tags based on the parser that was used to parse
  or create them. HTML tags no longer reject any non-ASCII characters in tag
  names but only spaces and the special characters ``<>&/"'``.


2.0alpha3 (2007-09-26)
======================

Features added
--------------

* Separate ``feed_error_log`` property for the feed parser interface.
  The normal parser interface and ``iterparse`` continue to use
  ``error_log``.

* The normal parsers and the feed parser interface are now separated and can
  be used concurrently on the same parser instance.

* ``fromstringlist()`` and ``tostringlist()`` functions as in ElementTree
  1.3

* ``iterparse()`` accepts an ``html`` boolean keyword argument for parsing
  with the HTML parser (note that this interface may be subject to change)

* Parsers accept an ``encoding`` keyword argument that overrides the
  encoding of the parsed documents.

* New C-API function ``hasChild()`` to test for children

* ``annotate()`` function in objectify can annotate with Python types and
  XSI types in one step. Accompanied by ``xsiannotate()`` and
  ``pyannotate()``.

Bugs fixed
----------

* XML feed parser setup problem

* Type annotation for unicode strings in ``DataElement()``

Other changes
-------------

* lxml.etree now emits a warning if you use XPath with libxml2 2.6.27 (which
  can crash on certain XPath errors)

* Type annotation in objectify now preserves the already annotated type by
  default to prevent losing type information that is already there.


2.0alpha2 (2007-09-15)
======================

Features added
--------------

* ``ET.write()``, ``tostring()`` and ``tounicode()`` now accept a keyword
  argument ``method`` that can be one of 'xml' (or None), 'html' or 'text'
  to serialise as XML, HTML or plain text content.

* ``iterfind()`` method on Elements returns an iterator equivalent to
  ``findall()``

* ``itertext()`` method on Elements

* Setting a QName object as value of the .text property or as an attribute
  will resolve its prefix in the respective context

* ElementTree-like parser target interface as described in
  http://effbot.org/elementtree/elementtree-xmlparser.htm

* ElementTree-like feed parser interface on XMLParser and HTMLParser
  (``feed()`` and ``close()`` methods)

Bugs fixed
----------

* lxml failed to serialise namespace declarations of elements other than the
  root node of a tree

* Race condition in XSLT where the resolver context leaked between
  concurrent XSLT calls

Other changes
-------------

* ``element.getiterator()`` returns a list, use ``element.iter()`` to
  retrieve an iterator (ElementTree 1.3 compatible behaviour)


2.0alpha1 (2007-09-02)
======================

Features added
--------------

* Reimplemented ``objectify.E`` for better performance and improved
  integration with objectify. Provides extended type support based on
  registered PyTypes.

* XSLT objects now support deep copying

* New ``makeSubElement()`` C-API function that allows creating a new
  subelement straight with text, tail and attributes.

* XPath extension functions can now access the current context node
  (``context.context_node``) and use a context dictionary
  (``context.eval_context``) from the context provided in their first
  parameter

* HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup``

* New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified
  doctests based on XML/HTML output. Use by importing ``lxml.usedoctest`` or
  ``lxml.html.usedoctest`` from within a doctest.

* New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with
  CSS selectors.

* New package ``lxml.html`` written by Ian Bicking for advanced HTML
  treatment.

* Namespace class setup is now local to the ``ElementNamespaceClassLookup``
  instance and no longer global.

* Schematron validation (incomplete in libxml2)

* Additional ``stringify`` argument to ``objectify.PyType()`` takes a
  conversion function to strings to support setting text values from
  arbitrary types.

* Entity support through an ``Entity`` factory and element classes. XML
  parsers now have a ``resolve_entities`` keyword argument that can be set
  to False to keep entities in the document.

* ``column`` field on error log entries to accompany the ``line`` field

* Error specific messages in XPath parsing and evaluation
  NOTE: for evaluation errors, you will now get an XPathEvalError instead of
  an XPathSyntaxError. To catch both, you can except on ``XPathError``

* The regular expression functions in XPath now support passing a node-set
  instead of a string

* Extended type annotation in objectify: new ``xsiannotate()`` function

* EXSLT RegExp support in standard XPath (not only XSLT)

Bugs fixed
----------

* lxml.etree did not check tag/attribute names

* The XML parser did not report undefined entities as error

* The text in exceptions raised by XML parsers, validators and XPath
  evaluators now reports the first error that occurred instead of the last

* Passing '' as XPath namespace prefix did not raise an error

* Thread safety in XPath evaluators

Other changes
-------------

* objectify.PyType for None is now called "NoneType"

* ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 -
  original name is still available as alias

* In the public C-API, ``findOrBuildNodeNs()`` was replaced by the more
  generic ``findOrBuildNodeNsPrefix``

* Major refactoring in XPath/XSLT extension function code

* Network access in parsers disabled by default
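
Several of the ElementTree 1.3-compatible additions listed in the changelog (the ``feed()``/``close()`` parser interface, ``iter()``, ``iterfind()`` and ``itertext()``) can be sketched together. This is a minimal illustration, not code from the release: it assumes Python 3, the sample document and chunk boundaries are made up, and the fallback import to the standard library's ElementTree is an assumption that only holds because this sketch sticks to the API the two libraries share.

```python
# Assumption: use lxml when it is installed, otherwise fall back to the
# standard library's ElementTree, which offers the same methods.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Feed parser interface: hand over the document in chunks, then close().
# The chunks below are hypothetical sample data.
parser = etree.XMLParser()
for chunk in ("<doc><item>a</item>", "<item>b</item><note>c</note></doc>"):
    parser.feed(chunk)
root = parser.close()  # returns the root Element of the parsed tree

# iter() walks the element itself and all of its descendants.
tags = [el.tag for el in root.iter()]

# iterfind() is the lazy counterpart of findall().
items = [el.text for el in root.iterfind("item")]

# itertext() yields the text content in document order.
text = "".join(root.itertext())

print(tags)
print(items)
print(text)
```

Code written against ``iter()`` instead of the deprecated ``getiterator()`` avoids the list-versus-iterator difference that the changelog describes across the alpha releases.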
