[Python-Dev] Function Hash: Check it in?

Mon, 29 Jan 2001 16:51:56 -0500

[Moshe Zadka]
> ...
> I'm starting to wonder what the tests really test: the language
> definition, or accidents of the implementation?

You'd be amazed (appalled?) at how hard it is to separate them.

In two previous lives as a Big Iron compiler hacker, I worked at shops where
we routinely had to get our compilers validated by a govt agency before any
US govt account would be allowed to buy our stuff; e.g.,

    http://www.itl.nist.gov/div897/ctg/vpl/language.htm

This usually *started* as a two-day process, flying the inspector to our
headquarters, taking perhaps 2 minutes of machine time to run the test
suite, then sitting around that day and into the next arguing about whether
the "failures" were due to non-standard assumptions in the tests, or
compiler bugs.  It was almost always the former, but sometimes that didn't
get fully resolved for months (if the inspector was being particularly
troublesome, it could require getting an Official Interpretation from the
relevant stds body -- not swift!).  (BTW, this is one reason huge customers
are often very reluctant to move to a new release:  the validation process
can be very expensive and drag on for months.)

Python's test suite is open to the same question.  For example:

>>> def f():
...     global g
...     g += 1
...     return g
...
>>> g = 0
>>> d = {f(): f()}
>>> d
{2: 1}
>>>

The Python Lang Ref doesn't really say whether {2: 1} or {1: 2} "should be"
the result, nor does it say it's implementation-defined.  If you *asked*
Guido what he thought it should do, he'd probably say {1: 2} (not much of a
guess:  I asked him in the past, and that's what he did say <wink>).
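
One way to check what a given interpreter does, without leaning on a global
counter, is to log each operand as it gets evaluated.  This is only a sketch:
`trace` and `order` are names made up for illustration, and no particular
order is claimed here, since that's exactly the part that can differ between
implementations and versions.

```python
# Record the order in which the key and value expressions of a dict
# display are evaluated.  `trace` and `order` are illustrative names only.
order = []

def trace(label, value):
    """Note that `label` was evaluated, then hand back `value`."""
    order.append(label)
    return value

d = {trace("key", 1): trace("value", 2)}

# `d` is {1: 2} either way; only `order` reveals what the interpreter did.
print(d, order)
```

A test that compared `order` against a fixed list would be testing the
implementation unless the language definition pins the order down.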

Something "like that" can show up in the test suite, but buried under layers
of obfuscating accidents.  Nobody is likely to realize it in the absence of
a failure motivating people to search for it.

Which is a trap:  sometimes ours was the only compiler (of dozens and
dozens) that had *ever* "failed" a particular test.  This was most often the
case at Cray Research, which had bizarre (but exceedingly fast -- which is
what Cray's customers valued most) floating-point arithmetic.  I recall one
test in particular that failed because Cray's was the only box on earth that
set I to 1 in

    INTEGER I
    I = 6.0/3.0

Fortran doesn't define that the result must be 2.  But -- you guessed
it -- neither does Python.

Cute:  at KSR, INT(6.0/3.0) did return 2 -- but INT(98./49.) did not <wink>.
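
For the curious, here is a rough Python rendering of why one bad bit is
enough.  It assumes IEEE 754 double-precision floats (the usual case for
CPython today, though not something the language promises), and it fakes
the off-by-one-ulp quotient by hand rather than reproducing Cray or KSR
arithmetic; `exact` and `just_below` are illustrative names.

```python
# Assumes IEEE 754 double-precision floats.
exact = 6.0 / 3.0            # correctly rounded division gives exactly 2.0
just_below = 2.0 - 2.0**-52  # the largest double strictly less than 2.0

# int() truncates toward zero, so being off by one unit in the last
# place is enough to change the integer result.
print(int(exact), int(just_below))
```

Under that assumption this prints "2 1":  a division that lands one unit in
the last place low turns into a different integer after truncation.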

then-again-the-python-test-suite-is-still-shallow-ly y'rs  - tim