[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Tim Delaney timothy.c.delaney at gmail.com
Tue Mar 19 09:10:32 EDT 2019


On Tue, 19 Mar 2019 at 23:13, David Mertz <mertz at gnosis.cx> wrote:

> In a way, this case makes bugs worse because they are not only a Python
> internal matter. XML is used to communicate among many tools and
> programming languages, and relying on assumptions those other tools will
> not follow is a bad habit.
>

I recently encountered an example where the 3.7 behaviour (sorting
attributes) makes a third-party tool behave incorrectly, whereas
maintaining attribute order works correctly. The particular case was using
HTML <meta> tags when importing into Calibre to convert to an ebook. The
most common symptom was that series indexes were sometimes imported
correctly and sometimes not. Occasionally other <meta> tags would also fail
to import correctly.

Turns out that <meta name="series_index" content="3"/> gave consistently
correct results, whilst <meta content="3" name="series_index"/> was
erratic. And whilst I'd specified the <meta> tags with the name attribute
first, I was then passing the HTML through BeautifulSoup, which sorted the
attributes.
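To make the point concrete, here is a small sketch using the standard library's xml.etree.ElementTree. It only shows that the two <meta> tags above parse to the same tag and attribute mapping, so they are equivalent XML, while their text differs byte for byte. That byte-level difference is all an order-sensitive tool like the one described sees.

```python
import xml.etree.ElementTree as ET

# The two spellings of the same <meta> tag from the example above.
a = '<meta name="series_index" content="3"/>'
b = '<meta content="3" name="series_index"/>'

ea = ET.fromstring(a)
eb = ET.fromstring(b)

# Dict equality ignores key order, so the parsed elements are equivalent.
same_meaning = ea.tag == eb.tag and ea.attrib == eb.attrib
# The serialized text, however, is not identical.
same_bytes = a == b

print(same_meaning, same_bytes)  # True False
```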

Now Calibre is definitely in the wrong here - it should be able to import
regardless of the order of attributes. But the fact is that there are a lot
of tools out there that are semi-broken in a similar manner.

To me this is an argument for defaulting to maintaining order, but
providing a way for the caller to control the order of attributes when
formatting, e.g. by passing an ordering function. If you want sorted
attributes, pass the built-in sorted function as your ordering function.
But I think that's getting beyond the scope of this discussion.
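A minimal sketch of what such an ordering hook could look like. The helper `empty_tag` and its `order` parameter are hypothetical and are not part of ElementTree's API. The helper only renders an empty (self-closing) element. By default it keeps the element's attribute insertion order, and passing the built-in sorted gives the sorted form.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Optional
from xml.sax.saxutils import quoteattr

# An ordering function takes the attribute names and returns them in the
# desired output order; the built-in sorted() fits this shape.
AttrOrder = Callable[[List[str]], Iterable[str]]


def empty_tag(elem: ET.Element, order: Optional[AttrOrder] = None) -> str:
    """Render elem as a self-closing tag (hypothetical helper).

    Attributes are emitted in insertion order unless an ordering
    function is supplied.
    """
    names = list(elem.attrib)  # dicts preserve insertion order
    if order is not None:
        names = list(order(names))
    # quoteattr escapes the value and wraps it in quotes.
    attrs = "".join(f" {n}={quoteattr(elem.attrib[n])}" for n in names)
    return f"<{elem.tag}{attrs}/>"


meta = ET.Element("meta", {"name": "series_index", "content": "3"})
print(empty_tag(meta))          # <meta name="series_index" content="3"/>
print(empty_tag(meta, sorted))  # <meta content="3" name="series_index"/>
```

Making the ordering a plain callable keeps the default (preserve order) cheap, while still letting anyone who relies on deterministic sorted output opt in explicitly.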

Tim Delaney