[Python-Dev] Expert floats

Tue Apr 6 23:10:16 EDT 2004

[Tim]
>> I want marshaling of fp numbers to give exact (not approximate)
>> round-trip equality on a single box, and across all boxes supporting
>> the 754 standard where C maps "double" to a 754 double.
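A minimal sketch of the exact round-trip property Tim describes. It assumes an IEEE 754 platform whose float/string conversions are correctly rounded, as modern CPython's are. The premise is that 17 significant decimal digits always identify a double uniquely, so writing a float as 17-digit text and reading it back gives the identical bits. `to_text`, `from_text` and `round_trips` are hypothetical helper names and are not part of `marshal`.

```python
import random
import struct


def to_text(x: float) -> str:
    """Encode a float as decimal text with 17 significant digits."""
    return "%.17g" % x


def from_text(s: str) -> float:
    """Decode the text form back into a float."""
    return float(s)


def bits(x: float) -> bytes:
    """Raw IEEE 754 double bits, so that -0.0 and 0.0 count as different."""
    return struct.pack("<d", x)


def round_trips(x: float) -> bool:
    """True if the 17-digit text form reproduces exactly the same double."""
    return bits(from_text(to_text(x))) == bits(x)


rng = random.Random(2004)  # fixed seed so the check is reproducible
samples = [rng.uniform(-1e6, 1e6) for _ in range(10_000)]
samples += [0.1, 3.3, 1e-300, 1.7976931348623157e308, -0.0]
print(all(round_trips(x) for x in samples))
```

Comparing packed bits rather than using `==` makes the check strictly "exact". NaNs are left out because they compare unequal even to themselves.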

[Ping]
> That is a valuable property.  I support it and support Python
> continuing to have that property.

That's good, since nobody opposes it <wink>.

> I hope it has been made quite clear by now that this property does not
> constrain how numbers are displayed by the interpreter in
> human-readable form.  The issue of choosing an appropriate string
> representation of a number is unaffected by the desire for the above
> property.

...

> I think we *have* made progress.  Now we can set aside the red-herring
> issue of platform-independent serialization and focus on the real
> issue: human-readable string representation.

I don't think that's "the" real issue, but it is one of several.

...

>> I'm the one who has fielded most newbie questions about fp since
>> Python's beginning, and I'm very happy with the results of changing
>> repr() to produce 17 digits.

> Now you are pulling rank.

I'm relating my experience, which informs my beliefs about these issues more
than any "head argument".

> I cannot dispute your longer history and greater experience with Python;
> it is something i greatly admire and respect.  I also don't know your
> personal experiences teaching Python.

In person, mostly to hardware geeks and other hardcore software geeks.  On
mailing lists and newsgroups, to all comers, although I've had decreasing
time for that as the years drag on.

> But i can tell you my experiences.  And i can tell you that i have
> tried to teach Python to many people, individually and in groups.  I
> taught a class in Python at UC Berkeley last spring to 22 people who
> had never used Python before.  I maintained good communication with
> the students and their feedback was very positive about the class.
>
> How did the class react to floating-point?  Seeing behaviour like
> this:
>
>     >>> 3.3
>     3.2999999999999998
>     >>>
>
> confused and frightened them, and continues to confuse and frighten
> almost everyone i teach.

Sorry, but so long as they stick to binary fp, stuff like that can't be
avoided, even using "your rule" (other examples of that were posted today,
and I won't repeat them again here).  I liked Python's *former* rule
myself (repr rounds to 12 significant digits), and would like it much better
than "shortest possible" (which still shows tons of crap I usually don't
care about) most days for my own uses.
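The three display rules under discussion can be compared side by side. In this sketch, the 17-digit and 12-digit rules are written with `%`-formatting. The "shortest possible" rule is what `repr()` itself does in modern Python (3.1 and later), which is assumed here. The function names are illustrative only.

```python
def repr_17(x: float) -> str:
    """The 17-significant-digit rule Tim describes for repr()."""
    return "%.17g" % x


def repr_12(x: float) -> str:
    """Python's former rule: round to 12 significant digits."""
    return "%.12g" % x


def repr_shortest(x: float) -> str:
    """Shortest string that reads back to the same double (modern repr())."""
    return repr(x)


for x in (3.3, 0.1, 0.1 + 0.2, 1.1 * 1.1):
    print(f"{repr_17(x):>22} {repr_12(x):>8} {repr_shortest(x):>22}")
```

For 3.3, the 17-digit rule shows 3.2999999999999998, while both other rules show 3.3. For 0.1 + 0.2, however, only the 12-digit rule hides the trailing digits; "shortest possible" still prints 0.30000000000000004.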

That's a real problem Python hasn't addressed:  its format decisions are
often inappropriate and/or undesirable (both can differ by app and by
audience and by object type), and there are insufficient hooks for
overriding these decisions.  sys.displayhook goes a bit in that direction,
but not far enough.
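As a rough illustration of the override hook that does exist, here is a hypothetical replacement for `sys.displayhook` that shows top-level floats at 12 significant digits. It mirrors the documented behaviour of the default hook: it ignores `None`, writes the text plus a newline to `sys.stdout`, and stores the value in `builtins._`. The function name is made up for this example.

```python
import builtins
import sys


def float_friendly_displayhook(value):
    """Hypothetical displayhook: floats at 12 significant digits, else repr()."""
    if value is None:
        return  # like the default hook, show nothing for None
    builtins._ = None  # the default hook clears "_" first to avoid recursion
    if isinstance(value, float):
        text = "%.12g" % value
    else:
        text = repr(value)  # containers still repr() their float elements
    sys.stdout.write(text + "\n")
    builtins._ = value  # keep the interactive "_" variable working


# Installing it (e.g. from a PYTHONSTARTUP file) is a single assignment.
sys.displayhook = float_friendly_displayhook
```

The comment on `repr(value)` marks the limitation Tim points at. A float typed at the prompt is shown as 0.3, but `[0.1 + 0.2]` still displays as `[0.30000000000000004]`, because the hook only sees the outer object.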

BTW, if your students *remain* confused & frightened, it could be you're not
really trying to explain binary fp reality to them.

> (The rare exceptions are the people who have done lots of computational
> work before and know how binary floating-point representations work.)
> Every time this happens, the teaching is derailed and i am forced to go
> into an explanation of binary floating-point to assuage their fears.

Then they *don't* remain confused & frightened?  Great.  Then they've been
educated.  How long can it take to read the Tutorial Appendix?  It's well
worth however many years it takes <wink>.

> Remember, i am trying to teach basic programming skills.  How to solve
> problems; how to break down problems into steps; what's a subroutine;
> and so on.  Aside from this floating-point thing throwing them off,
> Python is a great first language for new programmers.  This is not the
> time to talk about internal number representation.

Use Decimal instead.  That's always been the best idea for newbies (and for
most casual users of floating-point, newbie or not).
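Here is a small sketch of what "use Decimal instead" looks like with the `decimal` module in today's standard library. Decimal values built from strings hold exactly the digits typed, so 3.3 displays as 3.3 and tenths add up the way a hand calculator suggests.

```python
from decimal import Decimal

# Built from a string, so this is exactly three and three tenths.
price = Decimal("3.3")
print(price)

# Decimal tenths add up the way a hand calculator suggests ...
print(Decimal("0.1") * 3 == Decimal("0.3"))

# ... while binary floats carry a tiny representation error.
print(0.1 * 3 == 0.3)
```

The three lines print `3.3`, `True` and `False`.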

> I am tired of making excuses for Python.  I love to tell people about
> Python and show them what it can do for them.  But this floating-point
> problem is embarrassing.  People are confused because no other system
> they've seen behaves like this.

If you're teaching "basic programming skills", what other systems have they
seen?  Hand calculators for sure -- which is why they should use Decimal
instead.  Virtually nothing about it will surprise them, except the
liberating ability to crank up the precision.
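"Cranking up the precision" in the standard `decimal` module is a per-context setting. This sketch uses `localcontext()` so the change stays local. `third` is a hypothetical helper name, and the default precision of 28 significant digits is the module's standard default.

```python
from decimal import Decimal, localcontext


def third(digits: int) -> Decimal:
    """Compute 1/3 to the requested number of significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # only affects arithmetic inside this block
        return Decimal(1) / Decimal(3)


print(third(28))  # the module's default precision
print(third(50))  # a calculator that never runs out of digits
```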

> Other languages don't print their numbers like this.  Accounting
> programs and spreadsheets don't print their numbers like this.

I don't care -- really.  I'm thoroughly in agreement with Kahan on this;
see, e.g., section "QPRO 4.0 and QPRO for Windows" in

    http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf

    ... the reader can too easily misinterpret a few references to 15
    or 16 sig. dec of precision as indications that no more need be said
    about QPRO's arithmetic.  Actually much more needs to be said because
    some of it is bizarre.

    Decimal displays of Binary nonintegers cannot always be WYSIWYG.

    Trying to pretend otherwise afflicts both customers and implementors
    with bugs that go mostly misdiagnosed, so “fixing” one bug merely
    spawns others.

    ...

    The correct cure for the @ROUND and @INT (and some other) bugs is not
    to fudge their argument but to increase from 15 to 17 the maximum
    number of sig. dec. that users of QPRO may see displayed.

    But no such cure can be liberated from little annoyances:
    [snip things that make Ping's skin crawl about Python today]

    ...

    For Quattro’s intended market, mostly small businesses with little
    numerical expertise, a mathematically competent marketing follow-
    through would have chosen either to educate customers about binary
    floating-point or, more likely, to adopt decimal floating-point
    arithmetic even if it runs benchmarks slower.

The same cures are appropriate for Python.
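Kahan's 15-versus-17-digit point can be checked directly. This minimal sketch assumes an IEEE 754 double and correctly rounded conversions. Fifteen significant digits can map two different doubles to the same display, while seventeen always distinguish them. Even so, the exact decimal value of a binary noninteger such as 0.1 needs far more digits than either, which is why such displays cannot be WYSIWYG.

```python
from decimal import Decimal

x = 0.1 + 0.2

# 15 significant digits: the display reads "0.3", which is a different double.
shown_15 = "%.15g" % x
print(shown_15, float(shown_15) == x)

# 17 significant digits: enough to identify the double uniquely.
shown_17 = "%.17g" % x
print(shown_17, float(shown_17) == x)

# The exact value of the double nearest to one tenth.
print(Decimal(0.1))
```

The first two lines print `0.3 False` and `0.30000000000000004 True`. The last line prints 0.1000000000000000055511151231257827021181583404541015625.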

> Matlab and Maple and Mathematica don't print their numbers like this.

Those are designed for experts (although Mathematica pretends not to be).

> Only Python insists on being this ugly.  And it screws up the most common
> way that people first get to know Python -- as a handy interactive
> calculator.
>
> And for what?  For no gain at all -- because when you limit your focus
> to the display issue, the only argument you're making is "People
> should be frightened."  That's a pointless reason.

Sorry, that's an absurd recharacterization, and I won't bother responding to
it.  If you really can't see any more to "my side" of the argument than that
yet, then repeating it another time isn't going to help.

So enough of this.  In what time I can make for "stuff like this", I'm going
to try to help the Decimal module along instead.  Do what you want with
interactive display of non-decimal floats, but do try to make it flexible
instead of fighting tooth and nail just to replace one often-hated fixed
behavior with another to-be-often-hated fixed behavior.

...

> Not everyone runs into floating-point corner cases.  In fact, very few
> people do.

Heh.  I like to think that part of that has to do with the change to repr()!
As I've said many times before, we *used* to get reports of a great variety
of relatively *subtle* problems due to binary fp behavior from newbies; we
generally get only one now, and the same one every time.  They're not
stupid, Ping, they just need the bit of education it takes to learn
something about that expensive fp hardware they bought.

> I have never encountered such a problem in my entire history of using
> Python.

Pride goeth before the fall ...

> And if you surveyed the user community, i'm sure you would find that
> only a small minority cares enough about the 17th decimal place for the
> discrepancy to be an issue.

The result of int() can change by 1 when the last bit changes, and the
difference between 2 and 3 can be a disaster -- see Kahan (op. cit.) for a
tale of compounded woe following from that one.  Aahz's recent example of a
loop going around one time more or less "than expected" used to be very
common, and is the same thing in a different guise.  It's like security that
way:  nobody gives a shit before they get burned, and then they get livid
about it.  If a user believes 0.1 is one tenth, they're going to get burned
by it.
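Both failure modes can be reproduced in a few lines, assuming IEEE 754 doubles. `int()` truncates, so a result that lands one bit below an integer loses a whole unit. Similarly, a loop that steps by 0.1 runs one more time than the decimal reading predicts, because ten binary 0.1s sum to slightly less than 1. `count_steps` is a hypothetical helper, and the Decimal run is included only for contrast.

```python
from decimal import Decimal

# 0.1 + 0.7 is 0.7999999999999999 in binary, so truncation yields 7.
print(int((0.1 + 0.7) * 10))


def count_steps(start, step, stop):
    """Count the iterations of 'while x < stop: x += step'."""
    x, n = start, 0
    while x < stop:
        x += step
        n += 1
    return n


print(count_steps(0.0, 0.1, 1.0))  # ten 0.1s sum to 0.9999999999999999
print(count_steps(Decimal("0"), Decimal("0.1"), Decimal("1")))
```

These print 7, 11 and 10 respectively.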

> ...
> You say it's better for people to get "bitten early".  What's better:
> everyone suffering for a problem that will never affect most of them,
> or just those who care about the issue having to deal with it?

The force of this is lost because you don't have a way to spare users from
"unexpected extra digits" either.  It comes with the territory!  It's
inherent in using binary fp in a decimal world.  All you're really going on
about is showing "funny extra digits" less often -- which will make them all
the more mysterious when they show up.  I liked the former
round-to-12-digits behavior much better on that count.  I expect to like
Decimal much better on all counts except speed.