[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Victor Stinner vstinner at redhat.com
Wed Mar 20 20:22:56 EDT 2019


Hi,

On Mon, Mar 18, 2019 at 23:41, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
> We're having a super interesting discussion on https://bugs.python.org/issue34160 .  It is now marked as a release blocker and warrants a broader discussion.

Thanks for starting a thread on python-dev. I'm the one who raised the
priority to release blocker to trigger such a discussion on python-dev.


> Our problem is that at least two distinct and important users have written tests that depend on exact byte-by-byte comparisons of the final serialization.

Sorry, but I don't think that's a good summary of the issue. IMHO
the issue is more general: it is about how we introduce backward
incompatible changes in Python.

The migration from Python 2 to Python 3 took around ten years. That's
way too long and it caused a lot of trouble in the Python community.
IMHO one explanation is our patronizing behavior toward users, which
I would summarize as "your code is wrong, you have to fix it"
(whereas the code had been working well for 10 years with Python 2!).

I'm not opposed to backward incompatible changes, but I think that we
must very carefully prepare the migration and do our best to help
users to migrate their code.


> 2). Go into every XML module and add attribute sorting options to each function that generate xml. (...)

Written like that, it sounds painful and like a huge project... But in
practice, the implementation looks simple and straightforward:
https://github.com/python/cpython/pull/12354/files

I don't understand why such a simple solution has been rejected.

IMHO adding an optional sort parameter is just the *bare minimum* that
we can do for our users.

Alternatives have been proposed, like a recipe to sort node attributes
before serialization, but honestly, it's way too complex. I don't want
to have to copy such a recipe into every project: add a new function,
import it, use it wherever XML is written to a file, etc. Taken alone,
maybe it's acceptable. But please remember that some companies are
still porting their large Python 2 code bases to Python 3. This new
backward incompatible change lands on top of the pile of other backward
incompatible changes between 2.7 and 3.8.
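
For readers who have not seen it, a recipe of this kind might look
roughly like the sketch below. The helper name reorder_attributes and
the sample document are illustrative assumptions, not necessarily the
exact recipe proposed on the issue. It rebuilds each element's
attribute dict in sorted order, which matters on Python 3.8+ where
attributes are written in insertion order.

```python
import xml.etree.ElementTree as ET


def reorder_attributes(root):
    """Sort the attributes of every element under root, in place."""
    for el in root.iter():
        attrib = el.attrib
        if len(attrib) > 1:
            # Rebuild the dict so that insertion order is sorted order.
            items = sorted(attrib.items())
            attrib.clear()
            attrib.update(items)


# Illustrative document with attributes deliberately out of order.
root = ET.fromstring('<doc b="2" a="1"><child y="9" x="8"/></doc>')
reorder_attributes(root)
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

Every place that writes XML would have to call such a helper before
serializing, which is the per-project cost described above.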

I would prefer to be able to "just add" sort=True. Don't forget that
tests like "if sys.version_info >= (3, 8):" will be needed, which makes
the overall fix more complicated.
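
As a rough sketch of what such version-gated code could look like
without any new stdlib parameter: the wrapper name tostring_stable
below is hypothetical, and the guard compares sys.version_info (a
tuple) rather than sys.version (a string, which cannot be compared to
a tuple).

```python
import sys
import xml.etree.ElementTree as ET


def tostring_stable(root):
    """Hypothetical wrapper: serialize with sorted attributes everywhere."""
    if sys.version_info >= (3, 8):
        # Python 3.8+ writes attributes in insertion order, so sort
        # them explicitly before serializing.
        for el in root.iter():
            items = sorted(el.attrib.items())
            el.attrib.clear()
            el.attrib.update(items)
    # Before 3.8, ElementTree sorted attributes while serializing.
    return ET.tostring(root, encoding="unicode")


# Illustrative element with attributes deliberately out of order.
root = ET.Element("item", {"z": "1", "a": "2"})
result = tostring_stable(root)
print(result)
```

Note that this sketch mutates the tree it is given; a project that
needs the original attribute order afterwards would have to copy the
tree first.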

Said differently, the stdlib should help the user to update Python.
The pain should not only be on the user side.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.

