[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Tue Mar 19 02:30:43 EDT 2019

On Mon, Mar 18, 2019 at 9:44 PM Terry Reedy <tjreedy at udel.edu> wrote:

> On 3/18/2019 6:41 PM, Raymond Hettinger wrote:
> > We're having a super interesting discussion on
> https://bugs.python.org/issue34160 .  It is now marked as a release
> blocker and warrants a broader discussion.
> >
> > Our problem is that at least two distinct and important users have
> written tests that depend on exact byte-by-byte comparisons of the final
> serialization.  So any changes to the XML modules will break those tests
> (not the applications themselves, just the test cases that assume the
> output will be forever, byte-by-byte identical).
> >
> > In theory, the tests are incorrectly designed and should not treat the
> module output as a canonical normal form.  In practice, doing an equality
> test on the output is the simplest, most obvious approach, and likely is
> being done in other packages we don't know about yet.
> >
> > With pickle, json, and __repr__, the usual way to write a test is to
> verify a roundtrip:  assert pickle.loads(pickle.dumps(data)) == data.  With
> XML, the problem is that the DOM doesn't have an equality operator.  The
> user is left with either testing specific fragments with
> element.find(xpath) or with using a standards compliant canonicalization
> package (not available from us). Neither option is pleasant.
> >
> > The code in the current 3.8 alpha differs from 3.7 in that it removes
> attribute sorting and instead preserves the order the user specified when
> creating an element.  As far as I can tell, there is no objection to this
> as a feature.  The problem is what to do about the existing tests in
> third-party code, what guarantees we want to make going forward, and what
> we recommend as a best practice for testing XML generation.
> >
> > Things we can do:
> >
> > 1) Revert back to the 3.7 behavior. This, of course, makes all the tests
> pass :-)  The downside is that it perpetuates the practice of bytewise
> equality tests and locks in all implementation quirks forever.  I don't
> know of anyone advocating this option, but it is the simplest thing to do.
>
> If it comes down to doing *something* to unblock the release ...
> 1b) Revert to 3.7 *and* document that byte equality with current output
> is *not* guaranteed.
>
> > 2) Go into every XML module and add attribute sorting options to each
> function that generates XML.  This gives users a way to make their tests
> pass for now. There are several downsides. a) It grows the API in a way
> that is inconsistent with all the other XML packages I've seen. b) We'll
> have to test, maintain, and document the API forever -- the API is already
> large and time consuming to teach. c) It perpetuates the notion that
> bytewise equality tests are the right thing to do, so we'll have this
> problem again if we substitute in another code generator or alter any of the
> other implementation quirks (e.g. how CDATA sections are serialized).
> >
> > 3) Add a standards compliant canonicalization tool (see
> https://en.wikipedia.org/wiki/Canonical_XML ).  This is likely to be the
> right-way-to-do-it but takes time and energy.
> >
> > 4) Fix the tests in the third-party modules to be more focused on their
> actual test objectives, the semantics of the generated XML rather than the
> exact serialization.  This option would seem like the right-thing-to-do but
> it isn't trivial because the entire premise of the existing test is
> invalid.  For every case, we'll actually have to think through what the
> test objective really is.
> >
> > Of these, option 2 is my least preferred.  Ideally, we don't guarantee
> bytewise identical output across releases, and ideally we don't grow a new
> API that perpetuates the issue. That said, I'm not wedded to any of these
> options and just want us to do what is best for the users in the long run.
>

For (1): don't revert in 3.8.  Do not worry about the order or formatting of
serialized data changing between major Python releases.  A change in 3.8?
That's 100% okay.  This already happens all the time between Python
releases.  We've changed dict iteration order between releases twice this
decade.

Within point releases of a stable version, i.e. 3.7.x?  Up to the release
manager; it is semi-rude to change something like this within a stable
release unless there is a good reason, but I *believe* we have done it
before.  The general rule of thumb is not to change it without good reason,
unless the code needed to avoid the change would be overly complicated.

It is always the user code depending on undeclared ordering within the
output that is wrong.  When we preserve it, we're only doing them a temporary
favor that ultimately allows more problems to grow in the future.  Nobody
should use a text comparison on serialized data not explicitly stated as
canonical and call that test good by any standard, unless they are writing a
test for canonical output from a library that explicitly guarantees its
output will be canonical.
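
As a rough sketch of what a test could compare instead of raw text, the
snippet below parses both documents and walks the trees.  The helper name
elements_equal is hypothetical (it is not part of the standard library), and
ignoring surrounding whitespace in text nodes is an assumption that will not
suit every document type.

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Hypothetical helper: compare two Elements by structure, not by bytes."""
    if a.tag != b.tag:
        return False
    # attrib is a plain dict, so this comparison ignores attribute order.
    if a.attrib != b.attrib:
        return False
    # Assumption: None and whitespace-only text are treated as equivalent.
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    # Child order is significant in XML, so children are compared pairwise.
    return all(elements_equal(x, y) for x, y in zip(a, b))


expected = ET.fromstring('<item id="1" name="spam"><price>3</price></item>')
reordered = ET.fromstring('<item name="spam" id="1"><price>3</price></item>')
changed = ET.fromstring('<item id="2" name="spam"><price>3</price></item>')

same = elements_equal(expected, reordered)      # attribute order differs only
different = elements_equal(expected, changed)   # an attribute value differs
```

A test written this way keeps passing when only the attribute order of the
serializer changes, and still fails when a value actually changes.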

Agreed that your option (2) is not good for the world.  The best thing to do
API-wise is to intentionally force some randomness into the emitted order, so
that people do not accidentally stumble into writing tests like this when
multiple attributes are emitted.  If people need some canonical form of
output, they need to explicitly ask for it.
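
For illustration, explicitly asking for a canonical form might look like the
sketch below.  It assumes Python 3.8 or later, where
xml.etree.ElementTree.canonicalize() exists; that function is not available
in 3.7, and nothing in this thread promises it.  The sample documents are
made up.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same element: different attribute order and
# different empty-element syntax.
doc_a = '<item id="1" name="spam"/>'
doc_b = '<item name="spam" id="1"></item>'

# Comparing the raw text fails even though the documents are equivalent.
raw_equal = doc_a == doc_b

# Canonicalizing both sides first makes the comparison order-independent.
canonical_equal = ET.canonicalize(doc_a) == ET.canonicalize(doc_b)
```

The point is that the caller opts in to canonical output; the default
serializer stays free to change.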

The hash randomization work we did years ago for dicts exposed real bugs in
code all around the world when it was enabled.  (We turned it on internally
at work soon after it was implemented and had thousands of tests to fix,
which exposed several hidden actual bugs in code.)  A lot of the fixes
were typical "parse the structured data and compare structures" cleanups of
code that was previously cheating and getting away with a golden value
string comparison.  (People are always going to write code like that: it is
trivial to write, and if it doesn't appear to be flaky they'll consider it
good enough and leave future breakage for the next maintainers.)
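
A minimal sketch of that kind of cleanup, using JSON as a stand-in for any
structured format.  The data and variable names are invented for the example.

```python
import json

# A "golden value" string that some earlier run happened to produce.
golden = '{"name": "spam", "tags": ["a", "b"], "count": 2}'

# The same data, built with the keys in a different order.
payload = {"count": 2, "tags": ["a", "b"], "name": "spam"}
produced = json.dumps(payload)

# Fragile: depends on key order and on the serializer's exact formatting.
string_match = produced == golden

# Robust: parse both sides and compare the resulting structures.
structure_match = json.loads(produced) == json.loads(golden)
```

Here the string comparison fails while the structural comparison passes,
which is the situation those cleaned-up tests were in.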

Option (3) seems better served via PyPI modules.

Option (4): fixing code that tests golden value str/bytes equality of
serialized formatted data is always a good thing to do.  Sometimes it is
easy, occasionally quite frustrating, as you find fundamentally flawed tests
or, worse, flawed API designs that relied on the non-guaranteed order.  Can
we fix everyone's code?  Nope.  It isn't our responsibility, but if we've
identified widely used projects with the issue, we should at least give them
a heads up about their bug(s) as soon as possible, with repro instructions in
the relevant issue trackers.

my 2cents,
-gps

>
> The point of 1b would be to give us time to do that if more is needed.
>
> > Regardless of option chosen, we should make explicit whether or not the
> Python standard library modules guarantee cross-release bytewise identical
> output for XML. That is really the core issue here.  Had we had an explicit
> notice one way or the other, there wouldn't be an issue now.
>
> I have not read the XML docs but based on this and the issue discussion
> and what I think our general guarantee policy has been, I would consider
> that there is not one.  (I am thinking about things like garbage
> collection, stable sorting, and set/dict iteration order.)
>
> --
> Terry Jan Reedy