Language change and code breaks

Tue Jul 10 20:55:56 EDT 2001

LANGUAGE CHANGE AND CODE BREAKS
--with particular reference to Python and recent discussions in
comp.lang.python

Terry J. Reedy, tjreedy at udel.edu
July 10, 2001

Summary: After discussing two general issues (type and audience), I
particularly look at the effect on new programmers and the effect of
replacement changes.

Types of Change: As in other realms, such as text editing, we can
categorize programming language changes as addition, deletion, and
replacement.  While replacement can be modeled by or reduced to deletion
and addition (see Keywords below), replacement in the specific sense of
changing the meaning of a construct legal both before and after is
qualitatively different in its effect on programmers and code.  It should
therefore be kept as a separate category.

Audience: Each type of change will have different effects on old and new
code (defined as that written for the old and new versions of the
language) when run on the 'wrong' version of the language interpreter.
Programmers will also see the change differently depending on whether they
learn the language before or after the change.  Existing users who pay
attention to the change will have to upgrade their knowledge and maybe
their code.  New users who initially learn only the new version of the
language and then read old code or run code on old interpreters without
backdating their knowledge may be surprised.

Deletions: If a feature is directly deleted, it presumably is rarely used and
not too useful.  When present in code run on a new system, there should be
a syntax error and the programmer will have to remove it and maybe do
something else.  New users reading old code will seldom see deleted
features.  (I have never, for instance, seen an access statement.)  If and
when they do, they may ignore it, guess its meaning, or find its meaning in
old documentation.

Additions: Old code runs fine in new interpreters.  New users will never
see in old code the new feature they learned.  If they try to use it with
an old interpreter, they will get an error message and either work around
its absence or stick with new interpreters.

Keywords:  Addition of a word as a keyword implicitly deletes its previous
usage as a name.  As long as the legal usages before and after are
disjoint, any wrong usages will be flagged as errors.  This combination of
deletion and addition is different from replacement in the narrow sense I
use here.  Example:  The statement 'yield <expression>' is disjoint from
any current legal use of 'yield' as, for instance, a name for bond yield.
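
On an interpreter where 'yield' is already a keyword, the disjointness can
be checked by asking the compiler directly.  The following is only a sketch;
the helper compiles() and the sample source strings are made up for
illustration.

```python
def compiles(source):
    """Return True if the source text compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Old usage: 'yield' as an ordinary name (say, a bond yield).
old_usage = "yield = 0.05\n"
# New usage: 'yield <expression>' inside a function body.
new_usage = "def gen():\n    yield 1\n"

print(compiles(old_usage))   # the old usage is flagged, not reinterpreted
print(compiles(new_usage))
```

The point is that the old usage fails loudly at compile time instead of
silently taking on a new meaning.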

Replacements: These are the most troublesome changes since they amount to a
silent code break.  There will be no error message (at least not at the
point of misinterpretation) when code is run under the wrong interpreter.
Similarly, there is no reason for a flag to be raised in the mind of a new
user who only knows the new meaning of the construct.

A further complication is that incompatibility is bidirectional.  Old code
that does not use a deleted feature and new code that does not use an
added feature can run on both old and new interpreters.  After a
replacement, code must avoid the changed-meaning syntax entirely, in both
old and new meanings, to achieve the same flexibility.

For various reasons, transitions may take several years before everyone
upgrades their interpreters to any particular release level.  Last I knew,
for instance, some Linux distributions still install Python 1.5.2.
Code-breaking replacements probably inhibit upgrades more than
non-code-breaking additions.

Example 1 - nested scoping:  As I understand it, the value assigned to y
in

x = 1
def f():
  x = 'one'
  def g():
    y = x

is changing from 1 to 'one'.  This transition is eased by the fact that
duplicate intermediate names are currently unnecessary (just change the
spelling) and rare.  As a transition measure, the compiler can detect this
usage and emit a warning, strongly suggesting a name change before the
scope-hiding meaning is deleted by being replaced.  As for newcomers, those
few who really delve into the advanced topic of nested functions and
name scoping can also read an included warning about the older two-level
rule.
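
For readers on an interpreter where nested scopes are already the rule, the
example can be made observable by returning the value instead of assigning
it.  This is only a sketch; the return statements are added for illustration
and are not part of the example as originally given.

```python
x = 1

def f():
    x = 'one'          # intermediate name that hides the global x
    def g():
        return x       # under nested scoping this finds f's x
    return g()

print(f())
```

With nested scopes this prints one.  Under the older two-level rule, g would
have looked only at its own locals and then the globals, and found 1.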

Example 2 - integer (int+long) division:  This transition is more involved
and will be harder even with extra planning.

a. While it is fairly easy to write the future meaning of int/int in the
current language, using '.0' and float() as people do now, the currently
possible rewrite of the current meaning so it will remain valid --
divmod(e1,e2)[0] -- is more complex and slower to run.  Therefore, a
simultaneous (or preceding) addition of something like infix 'div' is
desirable (and already planned).
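
The rewrites mentioned in (a) can be set side by side.  A minimal sketch,
with e1 and e2 given arbitrary sample values.  Note that the operator later
adopted for the old meaning is spelled '//' rather than 'div'; it is shown
only for comparison and assumes an interpreter that has it.

```python
e1, e2 = 7, 2

# Future meaning of int/int, written so it does not depend on the change:
true_quotient = float(e1) / e2

# Current (integer-result) meaning, written so it stays valid afterwards:
floor_quotient = divmod(e1, e2)[0]

# Later spelling of the same operation, where available:
floor_with_operator = e1 // e2

print(true_quotient, floor_quotient, floor_with_operator)
```

The first two forms give the same answer before and after the change, which
is what makes them usable during a transition.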

b.  Nested scoping applies to all nested functions; its uses are detectable
at compile time, which makes warnings about deprecated usages easy.  The
change in meaning of e1 / e2 is partial in that it only affects
integer-integer division.  While the case of two literal constants could be
detected at compile time, detection generally awaits the run-time
dispatching in the function that implements '/'.  A warning mechanism will
have to include run-time warnings.  A mechanism to turn warnings on and off
on a per-statement basis would be helpful.
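
A run-time warning of the kind described in (b) might look like the
following.  This is a sketch only: checked_div() is a hypothetical stand-in
for the dispatching code behind '/', and the warning text is invented.  It
uses the standard warnings module, whose catch_warnings() context manager is
one way to switch warnings on or off around a particular statement.

```python
import warnings

def checked_div(a, b):
    """Hypothetical stand-in for '/' that warns on integer-integer division."""
    if isinstance(a, int) and isinstance(b, int):
        warnings.warn("int / int is changing meaning",
                      DeprecationWarning, stacklevel=2)
    return a / b

# Turn warnings on for these statements and record them for inspection.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    checked_div(7, 2)      # both operands are ints: warns
    checked_div(7.0, 2)    # mixed operands: unaffected by the change, silent

# Turn warnings off around a single statement known to be safe.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet = checked_div(6, 3)

print(len(caught), caught[0].category.__name__)
```

The type check happens at call time, not compile time, which is the
distinction the paragraph draws between this change and nested scoping.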

c. Nested scoping was intended to be an addition until someone noticed the
existence of a foolish-but-legal conflict.  The change in meaning of
integer / integer is a true bidirectional replacement that will invalidate
actual and not just theoretically possible code.  To make code
cross-compatible, one will generally (without specific program analysis)
have to avoid all occurrences of integer-integer division in any form, with
either / or div or with constants or variables.

d. If the change in meaning brought about by a replacement is large, one
may hope that the uncaught error introduced by a legal but incorrect
cross-usage will bring about an exception sometime later in the
computation.  However, the division change may only change the type of the
result, which may well change but not crash further computation.  And, of
course, the value error, if not zero, may be small and similarly hard to
detect.
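
Point (d) can be illustrated by running the same later arithmetic on both
meanings of the quotient.  A sketch under stated assumptions: old_div() and
new_div() are hypothetical helpers that pin down the old and new meanings
explicitly, and the interpreter is assumed to be one where '/' already has
the new meaning and the later '//' spelling exists for the old one.

```python
def old_div(a, b):
    return a // b      # old int/int meaning: integer result

def new_div(a, b):
    return a / b       # new int/int meaning: float result

# Only the type changes; the values compare equal.
same_value = old_div(6, 3) == new_div(6, 3)
types = (type(old_div(6, 3)).__name__, type(new_div(6, 3)).__name__)

# The value changes too, but later arithmetic still runs without complaint.
old_total = old_div(7, 2) * 2
new_total = new_div(7, 2) * 2

print(same_value, types, old_total, new_total)
```

Neither case raises an exception, so nothing draws attention to the
difference unless someone inspects the result.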

e.  Almost everyone who does much of any programming will use division
sometime, so everyone should be taught how to program it.  So this change
affects beginners and not just advanced programmers.  For years after the
change, anyone who might use an older system that they do not control, such
as at a school campus or workplace, should also be taught that 'integer /
integer' used to mean 'integer div integer', and that subtle and
hard-to-detect errors can arise from erroneous cross-usage.