[Python-Dev] Change in Python 3's "round" behavior
steve at pearwood.info
Thu Sep 27 09:53:33 EDT 2018
On Thu, Sep 27, 2018 at 05:55:07PM +1200, Greg Ewing wrote:
> jab at math.brown.edu wrote:
> >I understand from
> >that "to always round up... can theoretically skew the data"
> *Very* theoretically. If the number is even a whisker bigger than
> 2.5 it's going to get rounded up regardless:
> >>> round(2.500000000000001)
> 3
> That difference is on the order of the error you expect from
> representing decimal fractions in binary, so I would be surprised
> if anyone can actually measure this bias in a real application.
I think you may have misunderstood the nature of the bias. It's not
about individual roundings and it definitely has nothing to do with
binary representation.
Any one round operation will introduce an error. You had a number, say
2.3, and it gets rounded down to 2.0, introducing an error of -0.3. But
if you have lots of rounds, some will round up, and some will round
down, and we want the rounding errors to cancel.
The errors *almost* cancel using the naive rounding algorithm as most of
the digits pair up:
.1 rounds down, error = -0.1
.9 rounds up, error = +0.1
.2 rounds down, error = -0.2
.8 rounds up, error = +0.2
etc. If each digit is equally likely, then on average they'll cancel and
we're left with *almost* no overall error.
The problem is that while there are four digits rounding down (.1
through .4) there are FIVE which round up (.5 through .9). Two digits
don't pair up:
.0 stays unchanged, error = 0
.5 always rounds up, error = +0.5
Given that for many purposes, our data is recorded only to a fixed
number of decimal places, we're dealing with numbers like 0.5 rather
than 0.5000000001, so this can become a real issue. If each final digit
is equally likely, every ten rounding operations accumulate a net error
of +0.5 on average, that is +0.05 per operation, instead of cancelling
out. Rounding introduces a small but real bias.
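To put numbers on that, here is a minimal sketch using the decimal
module, so that values like 1.5 are exact decimals rather than binary
floats. The helper name round_half_up and the sample values 1.0 through
1.9 are my own choices for illustration, not anything from the thread.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(x):
    # Hypothetical helper: the "naive" rule, where .5 always rounds up
    # (for the positive values used here).
    return x.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

# One value for each final digit: 1.0, 1.1, ..., 1.9
values = [Decimal(n) / 10 for n in range(10, 20)]

# Signed rounding error for each value (rounded minus original)
errors = [round_half_up(v) - v for v in values]

total_error = sum(errors)               # net error over ten operations
mean_error = total_error / len(errors)  # average error per operation

print("total error:", total_error)
print("mean error: ", mean_error)
```

The .1/.9, .2/.8, .3/.7 and .4/.6 pairs cancel, so the whole of the net
error comes from the unpaired .5 case.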
The most common solution (and, in many experts' opinion, the best
default behaviour) is Banker's Rounding, or round-to-even. All the other
digits round as per the usual rule, but .5 rounds to the nearest even
integer, so it goes UP half the time and DOWN the rest of the time:
0.5, 2.5, 4.5 etc round down, error = -0.5
1.5, 3.5, 5.5 etc round up, error = +0.5
thus on average the .5 digit introduces no error and the bias goes away.
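This is easy to check with Python 3's built-in round(), which rounds
ties to even. The sketch below only looks at the tie cases; values like
0.5 and 2.5 are exactly representable as binary floats, so no
representation error is involved. The variable names are just for
illustration.

```python
# The tie cases 0.5, 1.5, 2.5, 3.5, 4.5, 5.5
halves = [n + 0.5 for n in range(6)]

# Python 3's round() sends each tie to the nearest even integer
rounded = [round(x) for x in halves]

# Signed rounding error for each tie (rounded minus original)
errors = [r - x for r, x in zip(rounded, halves)]

for x, r, e in zip(halves, rounded, errors):
    print(x, "->", r, "error =", e)

print("net error:", sum(errors))
```

The errors alternate between -0.5 and +0.5, so over any even-length run
of consecutive ties they sum to zero.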