[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Mon Mar 18 18:41:15 EDT 2019

We're having a super interesting discussion on https://bugs.python.org/issue34160 .  It is now marked as a release blocker and warrants a broader discussion.

Our problem is that at least two distinct and important users have written tests that depend on exact byte-by-byte comparisons of the final serialization.  So any changes to the XML modules will break those tests (not the applications themselves, just the test cases that assume the output will be forever, byte-by-byte identical).  

In theory, the tests are incorrectly designed and should not treat the module output as a canonical normal form.  In practice, doing an equality test on the output is the simplest, most obvious approach, and likely is being done in other packages we don't know about yet.

With pickle, json, and __repr__, the usual way to write a test is to verify a roundtrip:  assert pickle.loads(pickle.dumps(data)) == data.  With XML, the problem is that the DOM doesn't have an equality operator.  The user is left with either testing specific fragments with element.find(xpath) or using a standards-compliant canonicalization package (not available from us). Neither option is pleasant.
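A minimal sketch of that contrast, assuming xml.etree.ElementTree is the tree API in question. The elements_equal helper is hypothetical (it is not a standard library function), and its whitespace handling is just one possible choice:

```python
import json
import pickle
import xml.etree.ElementTree as ET

data = {"name": "spam", "sizes": [1, 2, 3]}

# Roundtrip tests do not depend on the exact serialized bytes.
assert pickle.loads(pickle.dumps(data)) == data
assert json.loads(json.dumps(data)) == data

# Element objects compare by identity, so the same idiom does not work for XML.
doc = '<root b="2" a="1"><child>text</child></root>'
first = ET.fromstring(doc)
second = ET.fromstring(doc)
assert first != second  # same content, but still "unequal"


def elements_equal(x, y):
    """Hypothetical helper: structural comparison that ignores attribute order.

    Assumption: leading/trailing whitespace in text and tail is not significant.
    """
    return (
        x.tag == y.tag
        and x.attrib == y.attrib  # dict equality ignores ordering
        and (x.text or "").strip() == (y.text or "").strip()
        and (x.tail or "").strip() == (y.tail or "").strip()
        and len(x) == len(y)
        and all(elements_equal(a, b) for a, b in zip(x, y))
    )


reordered = ET.fromstring('<root a="1" b="2"><child>text</child></root>')
assert elements_equal(first, second)
assert elements_equal(first, reordered)
```

A helper like this would have to be written and maintained by each test suite, which is part of why neither existing option is pleasant.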

The code in the current 3.8 alpha differs from 3.7 in that it removes attribute sorting and instead preserves the order the user specified when creating an element.  As far as I can tell, there is no objection to this as a feature.  The problem is what to do about the existing tests in third-party code, what guarantees we want to make going forward, and what we recommend as a best practice for testing XML generation.
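To make the behavior change concrete, here is a small sketch using xml.etree.ElementTree. The element and attribute names are made up for illustration, and the version check assumes the change ships in 3.8 as it currently stands in the alpha:

```python
import sys
import xml.etree.ElementTree as ET

elem = ET.Element("item")
elem.set("zeta", "1")   # set first
elem.set("alpha", "2")  # set second
serialized = ET.tostring(elem)

zeta_first = serialized.index(b"zeta") < serialized.index(b"alpha")

if sys.version_info >= (3, 8):
    # Insertion order is preserved: zeta is written before alpha.
    assert zeta_first
else:
    # 3.7 and earlier sort attributes by name: alpha is written before zeta.
    assert not zeta_first

# Either way, the parsed result is the same, which is what an
# order-insensitive test would check.
assert ET.fromstring(serialized).attrib == {"zeta": "1", "alpha": "2"}
```

A test that compares serialized against a fixed byte string can pass on only one side of that version check.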

Things we can do:

1) Revert to the 3.7 behavior. This, of course, makes all the tests pass :-)  The downside is that it perpetuates the practice of bytewise equality tests and locks in all implementation quirks forever.  I don't know of anyone advocating this option, but it is the simplest thing to do.

2) Go into every XML module and add attribute sorting options to each function that generates XML.  This gives users a way to make their tests pass for now. There are several downsides. a) It grows the API in a way that is inconsistent with all the other XML packages I've seen. b) We'll have to test, maintain, and document the API forever -- the API is already large and time-consuming to teach. c) It perpetuates the notion that bytewise equality tests are the right thing to do, so we'll have this problem again if we substitute in another code generator or alter any of the other implementation quirks (e.g. how CDATA sections are serialized).

3) Add a standards-compliant canonicalization tool (see https://en.wikipedia.org/wiki/Canonical_XML ).  This is likely to be the right-way-to-do-it but takes time and energy.

4) Fix the tests in the third-party modules to be more focused on their actual test objectives, the semantics of the generated XML rather than the exact serialization.  This option would seem like the right-thing-to-do but it isn't trivial because the entire premise of the existing test is invalid.  For every case, we'll actually have to think through what the test objective really is.
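As a rough idea of what option 4 could look like in a third-party test suite, here is a sketch that asserts on the parsed structure rather than on the serialized bytes. The build_invoice function and its element names are hypothetical stand-ins for whatever code is under test:

```python
import xml.etree.ElementTree as ET


def build_invoice(number, lines):
    """Hypothetical XML generator standing in for the code under test."""
    root = ET.Element("invoice", number=str(number), currency="USD")
    for sku, qty in lines:
        ET.SubElement(root, "line", sku=sku, qty=str(qty))
    return ET.tostring(root)


output = build_invoice(42, [("A-1", 2), ("B-7", 1)])
root = ET.fromstring(output)

# Assert on what the document means, not on how it happens to be spelled.
assert root.tag == "invoice"
assert root.get("number") == "42"
assert root.get("currency") == "USD"
assert [line.get("sku") for line in root.findall("line")] == ["A-1", "B-7"]
assert root.find("line[@sku='B-7']").get("qty") == "1"
```

These assertions hold regardless of attribute order or other serialization details, but each one had to be chosen by deciding what the test is really meant to verify.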

Of these, option 2 is my least preferred.  Ideally, we don't guarantee bytewise identical output across releases, and ideally we don't grow a new API that perpetuates the issue. That said, I'm not wedded to any of these options and just want us to do what is best for the users in the long run.

Regardless of the option chosen, we should make explicit whether or not the Python standard library modules guarantee cross-release bytewise identical output for XML. That is really the core issue here.  Had we had an explicit notice one way or the other, there wouldn't be an issue now.

Any thoughts?

Raymond Hettinger

P.S.   Stefan Behnel is planning to remove attribute sorting from lxml.  On the bug tracker, he has clearly articulated his reasons.