[Python-Dev] Smoothing the transition from Python 2 to 3

Stephen J. Turnbull stephen at xemacs.org
Fri Jun 10 02:23:43 EDT 2016


Neil Schemenauer writes:

 > I have to wonder if you guys actually ported a lot of Python 2
 > code.

Python 3 (including stdlib) itself is quite a bit of code.

 > According to you guys, there is no problem

No, according to us, there are problems, but in the code, not in the
language or its implementation.  This is a Brooksian "no silver
bullet" problem: it's very hard to write reliable code that handles
multiple text representations (as pretty much everything does
nowadays), except by converting to internal text on input and back to
encoded text on output.  The warnings you quote (and presumably the
code that generates them) make assumptions (cf Barry's post) that are
frequently invalid.  I don't know about cross-type comparisons, but as
Barry and Brett both pointed out, mixtures of bytes and text are
*rarely* easy to fix, because it's often extremely difficult to know
which is the appropriate representation for a given variable unless
you do a complete refactoring as described above.  When I've tried to
fix such warnings one at a time, it's always been whack-a-mole.
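
To make the "convert on input, convert back on output" discipline
concrete, here is a minimal sketch in Python 3.  The function names
(read_text, process, write_text) and the sample data are illustrative
assumptions, not taken from any project mentioned here.  The point is
only that bytes appear at the edges and nothing in the middle ever
sees them.

```python
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode at the input boundary; everything downstream sees only str."""
    return raw.decode(encoding)


def process(text: str) -> str:
    """Internal logic works purely on text and never touches bytes."""
    return text.strip().upper()


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Encode at the output boundary, choosing the wire encoding explicitly."""
    return text.encode(encoding)


# Hypothetical input that arrived as Latin-1 and must leave as UTF-8.
raw_in = "  naïve café \n".encode("latin-1")
raw_out = write_text(process(read_text(raw_in, "latin-1")), "utf-8")
```

With this layout the input and output encodings can differ freely,
because the only place either one matters is the single function at
that boundary.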

The experience in GNU Emacs and Mailman 2 has been that it took about
ten years to get to the point where they went a whole year without an
encoding bug once non-Latin-1 encodings were being handled.  XEmacs
OTOH took only about 3 years from the proof-of-concept introduction of
multibyte characters to essentially no bugs (except in C code, of
course!) because we had the same policy as Python 3: bytes and text
don't mix, and in development we also would abort on mixing integers
and characters (in GNU Emacs, the character type was the same as the
integer type until very recently).  We affectionately referred to
those bugs as "Ebola" (not very polite, but it gets the point across
about how seriously we took the idea of making the internal text
representation completely opaque).  In Mailman 2, we still can't say
confidently that there are no Unicode bugs left even today.  We still
need an outer "except UnicodeError: quarantine_and_call_for_help(msg)"
handler, although AFAIK it hasn't been reported for a couple years.
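
For readers who haven't seen that pattern, a rough sketch of such an
outer handler follows.  This is not Mailman's actual code: only the
name quarantine_and_call_for_help comes from the sentence above, and
its body, the handle and deliver functions, and the in-memory
quarantine list are all assumptions made up for illustration.

```python
quarantined = []  # stand-in for a real quarantine queue (assumption)


def quarantine_and_call_for_help(msg: bytes) -> None:
    """Hypothetical body: set the raw message aside for a human to inspect."""
    quarantined.append(msg)


def handle(msg: bytes) -> str:
    """Hypothetical processing step that assumes ASCII and so may fail."""
    return msg.decode("ascii").upper()


def deliver(msg: bytes):
    """Outermost wrapper: an encoding bug quarantines one message only."""
    try:
        return handle(msg)
    except UnicodeError:  # also covers UnicodeDecodeError/UnicodeEncodeError
        quarantine_and_call_for_help(msg)
        return None
```

The handler doesn't fix anything; it only keeps one bad message from
taking down the whole queue while the underlying bug is still there.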

It's not that you can't continue to run the potentially buggy code in
Python 2.  Mailman 2 does; you can, too.  What we don't support (and I
personally hope we never support) is running that code in Python 3
(warnings or no).  If you want to support that yourself, more power to
you, but I advise you that my experience suggests that it's not going
to be a panacea, and I do believe it's going to be more trouble than
biting the bullet and just thoroughly porting your code.  Even if that
takes as much time as it took Amber to port Twisted.

 > and we already have good enough tooling. ;-(

Nobody said that, just that the existing tooling is pretty good for
the problems that tools can help with, while no tool is likely to be
much help with some of the code your tool allows to run.  You're
welcome to try to prove that claim wrong -- if you do, it would indeed
be very valuable!  But I personally, based on my own experience, think
that the chance of success is too low to justify the cost.  (Granted,
I don't have to port Twisted, so in that sense I'm biased. :-/ )

BTW tools continue to be added, as well as language changes (PEP 461!).
There is no resistance to that.
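
As a small illustration of what PEP 461 added (%-formatting for bytes,
available from Python 3.5), the sketch below builds a header line
entirely in bytes.  The header name and value are made up for the
example.

```python
# Bytes in, bytes out: no implicit decoding or encoding takes place.
header = b"%s: %d\r\n" % (b"Content-Length", 42)

# Passing str where bytes are expected is still an error, so the
# "bytes and text don't mix" rule is preserved.
try:
    b"%s" % "text"
    str_accepted = True
except TypeError:
    str_accepted = False
```

This helps code that really does work at the wire-protocol level; it
does nothing to decide whether a given variable should have been text
in the first place.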

What you're running into here is that several of us have substantial
experience with many of the issues raised, and that experience
convinces us that if you face them, there's no silver bullet, just
hard work.

Steve
