[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Raymond Hettinger raymond.hettinger at gmail.com
Wed Mar 20 20:56:48 EDT 2019


> On Mar 20, 2019, at 5:22 PM, Victor Stinner <vstinner at redhat.com> wrote:
> 
> I don't understand why such simple solution has been rejected.

It hasn't been rejected. That is above my pay grade.  Stefan and I recommended against going down this path. However, since you're in disagreement and have marked this as a release blocker, it is now time for the steering committee to earn their pay (which is at least double what I'm making) or defer to the principal module maintainer, Stefan.

To recap reasons for not going down this path:

1) The only known use case for a "sort=True" parameter is to perpetuate the practice of byte-by-byte output comparisons guaranteed to work across feature releases.  The various XML experts in this thread have opined that isn't something we should guarantee (and sorting isn't the only detail subject to change; Stefan listed others).

2) The intent of the XML modules is to implement the specification and be interoperable with other languages and other XML tools. It is not intended to generate byte-for-byte identical output.  Per section 3.1 of the XML spec, "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."

3) Mitigating a test failure is a one-time problem. API expansions are forever.

4) The existing API is not small and presents a challenge for teaching. Making the API bigger will make it worse.

5) As far as I can tell, XML tools in other languages (such as Java) don't sort (and likely for good reason).  lxml is dropping its attribute sorting as well, so the standard library would become more of an outlier.
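As a rough illustration of points 1 and 2, a test can check that two serializations describe the same XML instead of comparing bytes. This is a minimal sketch using only xml.etree.ElementTree; the helper names elements_equal and xml_equivalent are hypothetical and not part of the standard library. It treats attribute order as insignificant (dict equality ignores insertion order) but compares tags, text, tail and child order exactly, so whitespace differences still count.

```python
import xml.etree.ElementTree as ET


def elements_equal(a: ET.Element, b: ET.Element) -> bool:
    """Hypothetical helper: compare two elements, ignoring attribute order."""
    # attrib is a dict, so {'id': '1', 'name': 'spam'} equals
    # {'name': 'spam', 'id': '1'}; order plays no part in the comparison.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    # Text and tail are compared exactly (whitespace is significant here).
    if a.text != b.text or a.tail != b.tail:
        return False
    # Children must match one-for-one and in the same order.
    return len(a) == len(b) and all(
        elements_equal(c, d) for c, d in zip(a, b)
    )


def xml_equivalent(x: str, y: str) -> bool:
    """Hypothetical helper: parse two XML strings and compare the trees."""
    return elements_equal(ET.fromstring(x), ET.fromstring(y))


first = '<item id="1" name="spam"/>'
second = '<item name="spam" id="1"/>'
print(first == second)                # False: the bytes differ
print(xml_equivalent(first, second))  # True: same element, same attributes
```

A comparison like this keeps working even if a future release changes attribute ordering, which is exactly the kind of detail the byte-for-byte approach depends on.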


Raymond
