PEP0238 lament

Tue Jul 24 19:43:57 EDT 2001

Guido van Rossum <guido at python.org> writes:

> If you agree that, if it weren't for breaking old code, PEP-238 would
> make Python a better language, what can we do about that old code?

Could we also at least keep in the fray a slightly alternate question
of "what can we do about the PEP"?  I mean, as has been suggested
elsewhere at times, one solution to the old code is not to break it -
using "//" (or some other sequence) as the float division is backwards
compatible (no current code using it) and can be introduced at any
time for new code to use.  It wouldn't even need a lengthy period, but
could show up right in 2.2.

It seems to me that the only real argument against that is aesthetics,
or am I missing something?  Of course I'll play a little devil's
advocate and say that I probably agree with the aesthetic argument, but
only to the point where it breaks down against my concern for the
compatibility argument.

I will, however, also concede that realistically the compatibility
argument should hold for any backwards-incompatible change (e.g.,
adding the yield keyword for generators), which weakens my argument
somewhat, since old modules using "yield" as a variable will already
be running into the same issues we're discussing with respect to
division.

> I propose the following measures.  Let's discuss whether these are
> conservative enough, from the perspective of someone who has a large
> body of old working Python code.  (From the perspective of someone who
> wants int division for other reasons, these can never be conservative
> enough. :-)

"large body" in this case should be the product of actual amount of
code, and size of deployed base.  Even if I've only got a few modules
impacted, if they're deployed all over the place, it's just as hard to
fix one thing as many.

But, going with the thought process...

> (1) A very long wait period before the new division becomes the
>     default.  The PEP mentions at least two years; we can go over
>     that.  Python 2.2 will introduce the first release where it's
>     *possible* to write "from __future__ import division"; we'll have
>     to wait until 1.5.2 is only a faint memory (like 1.4 is now) and
>     2.2 pretty old.  (The current release pace is two revisions per
>     year, so two years would make 2.6 the first release where new
>     division is the default.)

Will 2.2 also introduce the use of // if you wanted to write it now?
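
To make sure I'm reading the proposal right, here is a minimal sketch
of how I'd expect new code to be spelled.  It assumes the new
semantics are in effect for the module (through the future statement
quoted above, or a release where they are the default).  The names
split_evenly and average are hypothetical, purely for illustration:

```python
# Assumes new division semantics are in effect for this module
# (the future statement, or a release where they are the default).

def split_evenly(total, parts):
    """Hypothetical helper: whole share per part.

    Spelled with // so it means the same thing under either default.
    """
    return total // parts


def average(values):
    """Hypothetical helper: relies on / being float division."""
    return sum(values) / len(values)
```

If that reading is right, code that needs int division could switch
to // as soon as it is available and not care which default wins.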

Thinking in general, I'm concerned that this change might feed back
on itself and increase the time needed, so that it's not clear the
transition ever really comes to closure.  That is, perhaps part of
the reason that 1.4 became a faint memory is that the only thing
holding up use of 1.5.2 was pure deployment concerns.  I didn't take
part in that transition, so will have to defer to others on whether
there were any core backwards-incompatible changes in 1.5 over 1.4
that had to be overcome for people to move away from 1.4.

Thus, the very fact that the behavior is moving towards breakage may
hold people back from moving, for fear of the resources involved in
ensuring they don't have a problem, which in turn may prolong the
period of time when such breakage is a risk.  And while we may be
timing from 1.5.2, the real crux is the gap between the oldest
release in general use that works the old way and the first release
that doesn't, since deployments (particularly large systems) don't
tend to move linearly through the releases.

This just makes me think that no timeline is really large enough, but
maybe I'm just falling into the trap of theoretical concerns that
won't materialize in reality.

But in concrete terms (as a data point for myself, if only to give
some visibility into some practical terms of what I'm likely to be
dealing with), on a distributed basis, I have to deal with a
relatively small amount of code, but dependent on third party packages,
distributed to around 1000 machines in several hundred locations in a
few countries.  That's at the moment, and if my sales force isn't
lying (no snickers from the back of the room) it'll be at least double
that within a year.  Locally, I have larger code bases, but only on
tens of machines all in a central data center.

I'm still at 1.5.2 and unlikely to move beyond that until at least
this year's end.  At that point I would have been using Python 1.5.2
for about 21 months, with it fully deployed in all production sites
for around 18 of them.  I would anticipate about a 6 month preparation
cycle
before moving to any new release (code test, ensure third party
support, etc...) and would then expect to get at least 12-18 months
run time again.  Right now I was planning on moving to 2.1.1 unless
2.2 looked to arrive soon enough (unlikely) to meet the testing/third
party criteria.

So even when you released 2.2, I'd probably just be deploying 2.1.1
with the next deployment release being investigated around 2.4/2.5.

So, assuming the new division operator was supported as of 2.2, I'd
likely have my first chance of using it in production (e.g.,
writing code to use it to avoid breaking in 2.6) with 2.4, at least a
year from now.  But if that still left me a year and two releases to
iron things out I expect the code I controlled would be fine.

My biggest concern at that point would be that third party code
continued to track as well, and even more so, how to sanity check
older code I may obtain (from the net or elsewhere) to ensure that it
didn't depend on old behavior.

Actually, that might turn out to be the biggest compatibility problem
of all.  Once the from __future__ goes away you'll be left with
straight Python code with no indication that it assumes that / is the
new behavior (other than being written recently).  And since things
lead a _long_ life on the web, it'll be all too easy to pull down
code examples or modules that have no visual indication that they
might depend on the old behavior.

So bottom line is that I think I'd have to admit that two years is
probably ok, but I'd feel more comfortable with it longer, and even
then I'd be concerned with where that left us steady state given the
legacy of code modules floating around for who knows how long.  But
since some stake has to be placed in the ground, and I don't know that
I'd be more comfortable with 'n' years rather than 2 years, it's
probably ok.

Wow, that got longer than I thought...

> (2) A command line option to change division semantics.  I propose
>     -Dnew to make the new division semantics the default, -Dold to
>     make the old semantics the default.  -Dold will be the default
>     until the wait period is over, then -Dnew will become the
>     default.  The -D option is retired when nobody needs -Dold any
>     more.  Modules that contain a future statement always use the new
>     default; there's no way to force the old default on a *per-module*
>     basis (since that would be a language feature that could never be
>     removed).

I see this as being of limited usefulness, primarily in cases where
all application code (sans standard library) is under my control.
That's because from a compatibility point of view, if I'm having
problems, it's during the transition, so it's less likely that I'll be
dealing with a homogeneous set of modules all of which want one
behavior or the other, versus a mixed bag.
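
A toy illustration of the mixed-bag case, as a sketch only.  The
DIVIDERS table is a hypothetical stand-in for whatever a process-wide
-Dold / -Dnew switch would select (not how the option would actually
be implemented), and the two functions stand for modules written with
different assumptions:

```python
import operator

# Hypothetical stand-in for the process-wide choice made by -Dold / -Dnew.
# floordiv models old-style int division for int operands only.
DIVIDERS = {"old": operator.floordiv, "new": operator.truediv}


def midpoint_index(n, div):
    # Stands for a module written with old semantics in mind:
    # the caller expects an integer index back.
    return div(n, 2)


def success_ratio(hits, total, div):
    # Stands for a module written with new semantics in mind:
    # the caller expects a fraction back.
    return div(hits, total)


def both_correct(mode):
    """True only if one global setting satisfies both 'modules'."""
    div = DIVIDERS[mode]
    return midpoint_index(7, div) == 3 and success_ratio(1, 4, div) == 0.25
```

Neither both_correct("old") nor both_correct("new") holds, which is
the situation I'd expect to be in for most of the transition.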

This also raises the question of why offer a command line option for
division but not other backwards-incompatible changes (aside from lack
of more public uproar in the other cases :-)).

> (3) As of 2.2, all the standard library code should work regardless of
>     the -D option.  That is, all library modules (and modules in Demo
>     and Tools) should use // whenever int division is required, and
>     should use the future statement whenever float division is
>     required.  This is necessary so that use of the -D command line
>     option doesn't break the standard library.

What happens when you reach the release where the __future__ is no
longer needed?  Will that stay in the standard library for some period
of time after that anyway (or else the -D command line option might
cause breakage in the standard library)?

> Anything else?

Just one thought on the warnings defined in the PEP (which in any list
like the above, I'd probably suggest highlighting as a point).
They're not perfect since there are plenty of environments where they
might not show up, but if I read correctly, as of 2.3 any integer
divisions that lose data will generate a warning, but only until the
new behavior becomes default.
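
As a rough picture of the kind of warning I mean, here is an
emulation in plain Python.  It is only a sketch: classic_div is a
hypothetical helper, the choice of DeprecationWarning is my own
assumption, and this is not how the interpreter itself would
implement the PEP's warning:

```python
import warnings


def classic_div(a, b):
    """Hypothetical helper emulating old-style '/'.

    For two ints it returns the floored quotient and warns whenever
    a remainder is thrown away; otherwise it divides normally.
    """
    if isinstance(a, int) and isinstance(b, int):
        quotient, remainder = divmod(a, b)
        if remainder != 0:
            warnings.warn(
                "int division of %r by %r discards a remainder" % (a, b),
                DeprecationWarning,  # assumed category, for illustration
                stacklevel=2,
            )
        return quotient
    return a / b
```

Whether such a warning is ever seen still depends on the warning
filters in effect in the environment running the code.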

Unlike many (all?) of the prior backwards-incompatible features, which
first warn about breaking behavior and then fully break behavior, but
continue to do so visibly (e.g., "yield" remains a keyword), this one
ends up in a state where the break goes from being warned about to
being silent and no longer visible.  That might be part of my gut
concern about compatibility, as it makes me feel that there is a
window of opportunity to catch the problem, but if you miss it,
discovering dependencies on old behavior becomes very problematic.

Could there be a way to continue to request the activation of the
warning noted in the PEP even after it has been turned off by default
and the behavior has officially changed?  Doing a static analysis of
modules to find uses of division can help inspect older modules but
having runtime warnings may prove helpful when trying to make use of
an older module in a few years with a current Python.  This sort of
thing might even be helpful as a general recommendation for other
backwards-incompatible changes that by their nature become silent once
fully enabled.
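
For the static analysis side, a minimal sketch of what I have in
mind, assuming an interpreter whose standard library provides the ast
module.  find_slash_division is a hypothetical name, and the scan
cannot tell whether the operands are ints, so it only produces a list
of lines for a person to review:

```python
import ast


def find_slash_division(source):
    """Return sorted line numbers where '/' or '/=' is used.

    Hypothetical helper.  '//' is deliberately ignored, since its
    meaning does not depend on the division default.
    """
    tree = ast.parse(source)
    lines = set()
    for node in ast.walk(tree):
        # BinOp covers 'a / b'; AugAssign covers 'a /= b'.
        if isinstance(node, (ast.BinOp, ast.AugAssign)) and isinstance(
            node.op, ast.Div
        ):
            lines.add(node.lineno)
    return sorted(lines)
```

Run over an older module's source text, this gives the candidate
lines to inspect by hand before trusting the module under the new
behavior.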

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/