[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Victor Stinner vstinner at redhat.com
Wed Mar 20 20:46:14 EDT 2019


On Thu, Mar 21, 2019 at 01:30, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
> There's no preaching and no judgment.  We can't have a conversation though if we can't state the crux of the problem: some existing tests in third-party modules depend on the XML serialization being byte-for-byte identical forever. The various respondents to this thread have indicated that the standard library should only make that guarantee within a single feature release and that it may vary across feature releases.
>
> For docutils, it may end up being an easy fix (either with a semantic comparison or with regenerating the target files when point releases differ).  For Coverage, I don't make any presumption that reengineering the tests will be easy or fun.  Several mitigation strategies have been proposed:
>
> * alter the element creation code to create the attributes in the desired order
> * use a canonicalization tool to create output that is guaranteed not to change
> * generate new baseline files when a feature release changes
> * apply Stefan's recipe for reordering attributes
> * make a semantic level comparison
>
> Will any of these work for you?

Python 3.8 is still in a very early stage of testing. We have only
just started to discover which projects are broken by the XML change.

IMHO the problem is wider than just unit tests written in Python.
Python can be used to produce the XML, but other languages can be used
to parse or compare the generated XML. For example, if the generated
file is stored in Git, it will be seen as modified and "git diff" will
show a lot of "irrelevant" changes.
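One way to keep such a file stable is to sort the attributes before
serializing, along the lines of the attribute-reordering recipe mentioned
above. This is only a minimal sketch: the helper names sort_attributes and
stable_tostring are made up for illustration, and it assumes alphabetical
order is the order you want in the stored file.

```python
import xml.etree.ElementTree as ET


def sort_attributes(root):
    """Reorder the attributes of every element alphabetically, in place.

    Hypothetical helper: the name is not part of the standard library.
    """
    for el in root.iter():
        attrib = el.attrib
        if len(attrib) > 1:
            items = sorted(attrib.items())
            attrib.clear()
            attrib.update(items)
    return root


def stable_tostring(root):
    """Serialize after sorting, so creation order no longer matters."""
    return ET.tostring(sort_attributes(root))


# Two elements built with different attribute creation orders.
first = ET.Element("item", {"b": "1", "a": "2"})
second = ET.Element("item", {"a": "2", "b": "1"})
same_bytes = stable_tostring(first) == stable_tostring(second)
```

With the attributes sorted explicitly, the serialized bytes no longer depend
on whether the running Python sorts attributes itself or keeps creation
order, so "git diff" should stay quiet for this particular change.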

Comparison of XML using string comparison can also be used to avoid
expensive disk/database write or reduce network bandwidth. That's an
issue if the program isn't written in Python, whereas the XML is
generated by Python.
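If the consumer insists on a plain string comparison, the producer side can
emit a canonical form instead. A rough sketch, assuming Python 3.8 or newer
(where xml.etree.ElementTree.canonicalize() is available) and using a
made-up helper name, same_xml:

```python
import xml.etree.ElementTree as ET


def same_xml(left, right):
    """Compare two XML strings by their C14N 2.0 canonical form.

    Hypothetical helper; requires ET.canonicalize(), added in Python 3.8.
    """
    return ET.canonicalize(left) == ET.canonicalize(right)


# Same content, different attribute order and empty-element syntax.
reordered_equal = same_xml('<r b="1" a="2"/>', '<r a="2" b="1"></r>')

# A different attribute value is still reported as a difference.
changed_equal = same_xml('<r a="2" b="1"/>', '<r a="3" b="1"/>')
```

This does not help a program that compares against files already written by
an older Python, but canonical output written from now on can be compared as
a string by any language.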

Getting the same output on Python 3.7 and Python 3.8 also matters
for https://reproducible-builds.org/

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.

