[lxml-dev] lxml 2.0 beta1 released

11 Jan 2008

Hi all,

I finally managed to push lxml 2.0beta1 over to PyPI. This release marks the
end of the four-month alpha cycle of lxml 2.0. The last stable release series,
lxml 1.3, saw the light of day more than six months ago.

http://codespeak.net/lxml/dev/
http://pypi.python.org/pypi/lxml/2.0beta1

The complete changelog for beta1 and the 2.0 alpha series follows below. Apart
from a number of important fixes and enhancements, this beta release also
finalises the major API changes that make the difference between 1.x and 2.x.
Incompatible changes after this release will require a very good motivation.
As usual, compatible enhancements will always be embraced - as will
updates, clarifications and fixes for the documentation! Asking questions about
unclear parts helps, too.

I expect beta1 to also be the last beta release before lxml 2.0 final
(hopefully not in the sense that alpha4/5/6 were), so please test as much as
you can to spot any remaining bugs and problems.

Note that this release depends on a bug fix in Cython that will hopefully be
released as Cython 0.9.6.11 in a couple of days. I attached the necessary
patch for those who want to work on the sources.

Another thing: there was a security advisory on the libxml2 mailing list. To
prevent DoS attacks, systems that parse XML from untrusted sources should be
updated to libxml2 2.6.31 (or should apply the patch that is referenced in
Daniel's post below).

http://mail.gnome.org/archives/xml/2008-January/msg00036.html

Sidnei, when you build the Windows binaries, could you please wait for libxml2
2.6.31 to become available as binaries as well? Hopefully, that won't take too
long...

Have fun,
Stefan

2.0beta1 (2008-01-11)
=====================

Features added
--------------

* Parse-time XML schema validation (``schema`` parser keyword).

* XPath string results of the ``text()`` function and attribute
  selection make their Element container accessible through a
  ``getparent()`` method.  As a side-effect, they are now always
  unicode objects (even ASCII strings).

* ``XSLT`` objects are usable in any thread - at the cost of a deep
  copy if they were not created in that thread.

* Invalid entity names and character references will be rejected by
  the ``Entity()`` factory.

* ``entity.text`` returns the textual representation of the entity,
  e.g. ``&amp;``.
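A minimal sketch of the three main beta1 features, written against the current lxml API (which may differ in details from 2.0beta1). The schema, element names and attribute values are made up for illustration: the parser validates against a schema while parsing, XPath string results point back to their element, and ``Entity()`` rejects invalid names.

```python
from lxml import etree

# A tiny illustrative schema: <count> must contain an integer.
SCHEMA = etree.XMLSchema(etree.XML(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="count" type="xs:integer"/>'
    '</xs:schema>'
))

# Passing the schema to the parser validates the document while parsing it.
validating_parser = etree.XMLParser(schema=SCHEMA)

def parse_count(text):
    """Return the parsed root element, or None if it fails validation."""
    try:
        return etree.fromstring(text, validating_parser)
    except etree.XMLSyntaxError:
        return None

good = parse_count("<count>5</count>")
bad = parse_count("<count>five</count>")

# XPath string results (text() and attributes) know their parent element.
doc = etree.XML('<root><item id="x1">hello</item></root>')
text_result = doc.xpath("//item/text()")[0]
attr_result = doc.xpath("//item/@id")[0]
text_parent = text_result.getparent()   # the <item> element
attr_parent = attr_result.getparent()   # also the <item> element

# Entity() checks the entity name; .text gives its textual form.
amp = etree.Entity("amp")
try:
    etree.Entity("not valid")
    rejected = False
except ValueError:
    rejected = True
```

Reusing one validating parser for several documents keeps the compiled schema around instead of rebuilding it per call.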

Bugs fixed
----------

* XPath on ElementTrees could crash when selecting the virtual root
  node of the ElementTree.

* Compilation ``--without-threading`` was buggy in alpha5/6.

Other changes
-------------

* Minor performance tweaks for Element instantiation and subelement
  creation

2.0alpha6 (2007-12-19)
======================

Features added
--------------

* New properties ``position`` and ``code`` on ParseError exception (as
  in ET 1.3)
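A small sketch of reading the new ``position`` and ``code`` properties, assuming the current lxml API, where ``XMLSyntaxError`` is a subclass of ``ParseError``. The helper name ``describe_error`` is hypothetical.

```python
from lxml import etree

def describe_error(xml_text):
    """Return (line, column, code) for a parse error, or None if it parses."""
    try:
        etree.fromstring(xml_text)
    except etree.ParseError as err:  # XMLSyntaxError derives from ParseError
        line, column = err.position
        return line, column, err.code
    return None

info = describe_error("<root>\n  <unclosed>\n</root>")
clean = describe_error("<ok/>")
```

The exact line, column and libxml2 error code depend on the libxml2 version, so callers should treat them as diagnostics rather than stable values.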

Bugs fixed
----------

* Memory leak in the ``parse()`` function.

* Minor bugs in XSLT error message formatting.

* Result document memory leak in target parser.

Other changes
-------------

* Various places in the XPath, XSLT and iteration APIs now require
  keyword-only arguments.

* The argument order in ``element.itersiblings()`` was changed to
  match the order used in all other iteration methods.  The second
  argument ('preceding') is now a keyword-only argument.

* The ``getiterator()`` method on Elements and ElementTrees was
  reverted to return an iterator as it did in lxml 1.x.  The ET API
  specification allows it to return either a sequence or an iterator,
  and it traditionally returned a sequence in ET and an iterator in
  lxml.  However, it is now deprecated in favour of the ``iter()``
  method, which should be used in new code wherever possible.

* The 'pretty printed' serialisation of ElementTree objects now
  inserts newlines at the root level between processing instructions,
  comments and the root tag.

* A 'pretty printed' serialisation is now terminated with a newline.

* The second argument to the ``lxml.etree.Extension()`` helper is no
  longer required; the third argument is now a keyword-only argument ``ns``.

* ``lxml.html.tostring`` takes an ``encoding`` argument.
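A short sketch of the changed iteration and serialisation behaviour, using the current lxml API and made-up tag names: ``itersiblings()`` walks forward by default and takes ``preceding`` as a keyword, ``iter()`` replaces ``getiterator()``, and pretty-printed output ends with a newline.

```python
from lxml import etree

root = etree.XML("<root><a/><b/><c/><b/></root>")
b = root[1]

# Following siblings by default; preceding ones via the keyword-only flag.
following = [el.tag for el in b.itersiblings()]
preceding = [el.tag for el in b.itersiblings(preceding=True)]

# iter() is the preferred replacement for getiterator(); it accepts a tag filter.
all_b = [el.tag for el in root.iter("b")]

# Pretty-printed serialisation is terminated with a newline.
pretty = etree.tostring(root, pretty_print=True)
```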

2.0alpha5 (2007-11-24)
======================

Features added
--------------

* Rich comparison of ``element.attrib`` proxies.

* ElementTree compatible TreeBuilder class.

* Use default prefixes for some common XML namespaces.

* ``lxml.html.clean.Cleaner`` now allows for a ``host_whitelist``, and
  two overridable methods: ``allow_embedded_url(el, url)`` and the
  more general ``allow_element(el)``.

* Extended slicing of Elements as in ``element[1:-1:2]``, both in
  etree and in objectify

* Resolvers can now provide a ``base_url`` keyword argument when
  resolving a document as string data.

* When using ``lxml.doctestcompare`` you can give the doctest option
  ``NOPARSE_MARKUP`` (like ``# doctest: +NOPARSE_MARKUP``) to suppress
  the special checking for one test.
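A minimal sketch of three alpha5 features, assuming the current lxml API and illustrative element names: building a tree with the ElementTree-compatible ``TreeBuilder``, extended slicing of an Element, and comparing an ``attrib`` proxy with a plain dict.

```python
from lxml import etree

# Build <root> with five <item n="..."/> children through TreeBuilder events.
builder = etree.TreeBuilder()
builder.start("root", {})
for i in range(5):
    builder.start("item", {"n": str(i)})
    builder.end("item")
builder.end("root")
root = builder.close()  # close() returns the root element

# Extended slicing: every second child between the first and the last.
middle = root[1:-1:2]
picked = [el.get("n") for el in middle]

# attrib proxies support rich comparison with dictionaries.
same_attrib = root[0].attrib == {"n": "0"}
```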

Bugs fixed
----------

* Target parser failed to report comments.

* In the ``lxml.html`` ``iter_links`` method, links in ``<object>``
  tags weren't recognized.  (Note: plugin-specific link parameters
  still aren't recognized.)  Also, the ``<embed>`` tag, though not
  standard, is now included in ``lxml.html.defs.special_inline_tags``.

* Using custom resolvers on XSLT stylesheets parsed from a string
  could request ill-formed URLs.

* With ``lxml.doctestcompare``, writing ``<tag xmlns="...">`` in your
  expected output now makes the tag namespace-neutral (previously, the
  ellipsis was treated as a real namespace).

Other changes
-------------

* The module source files were renamed to "lxml.*.pyx", such as
  "lxml.etree.pyx".  This was changed for consistency with the way
  Pyrex commonly handles package imports.  The main effect is that
  classes now know about their fully qualified class name, including
  the package name of their module.

* Keyword-only arguments in some API functions, especially in the
  parsers and serialisers.

2.0alpha4 (2007-10-07)
======================

Features added
--------------

Bugs fixed
----------

* AttributeError in feed parser on parse errors

Other changes
-------------

* Tag name validation in lxml.etree (and lxml.html) now distinguishes
  between HTML tags and XML tags based on the parser that was used to
  parse or create them.  HTML tags no longer reject any non-ASCII
  characters in tag names but only spaces and the special characters
  ``<>&/"'``.

2.0alpha3 (2007-09-26)
======================

Features added
--------------

* Separate ``feed_error_log`` property for the feed parser interface.
  The normal parser interface and ``iterparse`` continue to use
  ``error_log``.

* The normal parsers and the feed parser interface are now separated
  and can be used concurrently on the same parser instance.

* ``fromstringlist()`` and ``tostringlist()`` functions as in
  ElementTree 1.3

* ``iterparse()`` accepts an ``html`` boolean keyword argument for
  parsing with the HTML parser (note that this interface may be
  subject to change)

* Parsers accept an ``encoding`` keyword argument that overrides the encoding
  of the parsed documents.

* New C-API function ``hasChild()`` to test for children

* ``annotate()`` function in objectify can annotate with Python types and XSI
  types in one step.  Accompanied by ``xsiannotate()`` and ``pyannotate()``.
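A sketch of ``fromstringlist()``/``tostringlist()`` and the parser ``encoding`` override, written against the current lxml API. The document contents are made up, and since the number of chunks returned by ``tostringlist()`` is an implementation detail, the example only joins them.

```python
from lxml import etree

# fromstringlist() parses a document supplied as a sequence of pieces.
root = etree.fromstringlist(["<root>", "<child>text</child>", "</root>"])

# tostringlist() returns the serialised document as a list of byte strings.
chunks = etree.tostringlist(root)
joined = b"".join(chunks)

# The encoding keyword overrides the document encoding. These bytes are
# Latin-1 without an XML declaration, so they would otherwise be read as UTF-8.
latin1_bytes = "<name>caf\u00e9</name>".encode("latin-1")
latin1_parser = etree.XMLParser(encoding="iso-8859-1")
name = etree.fromstring(latin1_bytes, latin1_parser)
```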

Bugs fixed
----------

* XML feed parser setup problem

* Type annotation for unicode strings in ``DataElement()``

Other changes
-------------

* lxml.etree now emits a warning if you use XPath with libxml2 2.6.27
  (which can crash on certain XPath errors)

* Type annotation in objectify now preserves the already annotated type by
  default to prevent losing type information that is already there.

2.0alpha2 (2007-09-15)
======================

Features added
--------------

* ``ET.write()``, ``tostring()`` and ``tounicode()`` now accept a keyword
  argument ``method`` that can be one of 'xml' (or None), 'html' or 'text' to
  serialise as XML, HTML or plain text content.

* ``iterfind()`` method on Elements returns an iterator equivalent to
  ``findall()``

* ``itertext()`` method on Elements

* Setting a QName object as value of the .text property or as an attribute
  will resolve its prefix in the respective context

* ElementTree-like parser target interface as described in
  http://effbot.org/elementtree/elementtree-xmlparser.htm

* ElementTree-like feed parser interface on XMLParser and HTMLParser
  (``feed()`` and ``close()`` methods)
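A sketch of the alpha2 features using the current lxml API and made-up documents: text-only serialisation, ``itertext()``, ``iterfind()``, the feed parser and a parser target. The ``TagCounter`` class is a hypothetical target that counts start tags instead of building a tree.

```python
from lxml import etree

root = etree.XML("<doc>Hello <b>big</b> world<b>!</b></doc>")

# method="text" serialises only the text content.
plain = etree.tostring(root, method="text", encoding="unicode")

# itertext() yields the same text pieces one by one.
pieces = list(root.itertext())

# iterfind() is the iterator counterpart of findall().
bold = [el.text for el in root.iterfind("b")]

# Feed parser interface: push data in chunks, then close() to get the root.
parser = etree.XMLParser()
for chunk in ("<doc><it", "em/></d", "oc>"):
    parser.feed(chunk)
fed_root = parser.close()

class TagCounter:
    """Hypothetical parser target: counts start tags, builds no tree."""
    def __init__(self):
        self.count = 0
    def start(self, tag, attrib):
        self.count += 1
    def end(self, tag):
        pass
    def data(self, data):
        pass
    def close(self):
        return self.count  # becomes the result of the parse call

n_tags = etree.XML("<a><b/><c><d/></c></a>", etree.XMLParser(target=TagCounter()))
```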

Bugs fixed
----------

* lxml failed to serialise namespace declarations of elements other than the
  root node of a tree

* Race condition in XSLT where the resolver context leaked between concurrent
  XSLT calls

Other changes
-------------

* ``element.getiterator()`` returns a list, use ``element.iter()`` to retrieve
  an iterator (ElementTree 1.3 compatible behaviour)

2.0alpha1 (2007-09-02)
======================

Features added
--------------

* Reimplemented ``objectify.E`` for better performance and improved
  integration with objectify.  Provides extended type support based on
  registered PyTypes.

* XSLT objects now support deep copying

* New ``makeSubElement()`` C-API function that allows creating a new
  subelement straight with text, tail and attributes.

* XPath extension functions can now access the current context node
  (``context.context_node``) and use a context dictionary
  (``context.eval_context``) from the context provided in their first
  parameter

* HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup``

* New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified
  doctests based on XML/HTML output.  Use by importing ``lxml.usedoctest`` or
  ``lxml.html.usedoctest`` from within a doctest.

* New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS
  selectors.

* New package ``lxml.html`` written by Ian Bicking for advanced HTML
  treatment.

* Namespace class setup is now local to the ``ElementNamespaceClassLookup``
  instance and no longer global.

* Schematron validation (incomplete in libxml2)

* Additional ``stringify`` argument to ``objectify.PyType()`` takes a
  function that converts values to strings, to support setting text values
  from arbitrary types.

* Entity support through an ``Entity`` factory and element classes.  XML
  parsers now have a ``resolve_entities`` keyword argument that can be set to
  False to keep entities in the document.

* ``column`` field on error log entries to accompany the ``line`` field

* Error-specific messages in XPath parsing and evaluation.
  NOTE: for evaluation errors, you will now get an XPathEvalError instead of
  an XPathSyntaxError.  To catch both, catch ``XPathError``.

* The regular expression functions in XPath now support passing a node-set
  instead of a string

* Extended type annotation in objectify: new ``xsiannotate()`` function

* EXSLT RegExp support in standard XPath (not only XSLT)
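A sketch of three alpha1 features against the current lxml API: an XPath extension function that reads ``context.context_node``, catching ``XPathError`` for any XPath failure, and keeping entity references with ``resolve_entities=False``. The namespace URI, function name and entity are made up for illustration.

```python
from lxml import etree

EX_NS = "http://example.com/ext"  # illustrative namespace for the extension

def context_tag(context):
    """Extension function: return the tag of the current context node."""
    return context.context_node.tag

root = etree.XML("<root><child/></root>")
tag_of_child = root[0].xpath(
    "ex:context-tag()",
    namespaces={"ex": EX_NS},
    extensions={(EX_NS, "context-tag"): context_tag},
)

# XPathSyntaxError and XPathEvalError both derive from XPathError.
try:
    root.xpath("count(")
    xpath_failed = False
except etree.XPathError:
    xpath_failed = True

# With resolve_entities=False, entity references stay in the tree.
keep_parser = etree.XMLParser(resolve_entities=False)
doc = etree.fromstring('<!DOCTYPE r [<!ENTITY e "value">]><r>&e;</r>', keep_parser)
entity_node = doc[0]
```

Passing ``extensions`` per call keeps the function local to that evaluation; a registered function namespace would be the alternative for functions used across many expressions.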

Bugs fixed
----------

* lxml.etree did not check tag/attribute names

* The XML parser did not report undefined entities as errors

* The text in exceptions raised by XML parsers, validators and XPath
  evaluators now reports the first error that occurred instead of the last

* Passing '' as XPath namespace prefix did not raise an error

* Thread safety in XPath evaluators
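A small sketch of the stricter checks described in the fixes above, using the current lxml API: invalid tag and attribute names raise ``ValueError``, and an undefined entity is reported as a parse error. The helper ``is_rejected`` is hypothetical.

```python
from lxml import etree

def is_rejected(action):
    """Return True if calling action() raises ValueError."""
    try:
        action()
    except ValueError:
        return True
    return False

el = etree.Element("valid")
bad_tag = is_rejected(lambda: etree.Element("not valid"))
bad_attr = is_rejected(lambda: el.set("bad name", "x"))
good_attr = not is_rejected(lambda: el.set("good", "x"))

# An entity reference without a declaration is a well-formedness error.
try:
    etree.fromstring("<r>&undefined;</r>")
    undefined_reported = False
except etree.XMLSyntaxError:
    undefined_reported = True
```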

Other changes
-------------

* objectify.PyType for None is now called "NoneType"

* ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 -
  original name is still available as alias

* In the public C-API, ``findOrBuildNodeNs()`` was replaced by the more
  generic ``findOrBuildNodeNsPrefix``

* Major refactoring in XPath/XSLT extension function code

* Network access in parsers disabled by default


Stefan Behnel