[Python-Dev] Change in Python 3's "round" behavior
Steven D'Aprano
steve at pearwood.info
Thu Sep 27 09:53:33 EDT 2018
On Thu, Sep 27, 2018 at 05:55:07PM +1200, Greg Ewing wrote:
> jab at math.brown.edu wrote:
> >I understand from
> >https://github.com/cosmologicon/pywat/pull/40#discussion_r219962259
> >that "to always round up... can theoretically skew the data"
>
> *Very* theoretically. If the number is even a whisker bigger than
> 2.5 it's going to get rounded up regardless:
>
> >>> round(2.500000000000001)
> 3
>
> That difference is on the order of the error you expect from
> representing decimal fractions in binary, so I would be surprised
> if anyone can actually measure this bias in a real application.
I think you may have misunderstood the nature of the bias. It's not
about individual roundings and it definitely has nothing to do with
binary representation.
Any one round operation will introduce an error. You had a number, say
2.3, and it gets rounded down to 2.0, introducing an error of -0.3. But
if you have lots of rounds, some will round up, and some will round
down, and we want the rounding errors to cancel.
The errors *almost* cancel using the naive rounding algorithm as most of
the digits pair up:
.1 rounds down, error = -0.1
.9 rounds up, error = +0.1
.2 rounds down, error = -0.2
.8 rounds up, error = +0.2
etc. If each digit is equally likely, then on average they'll cancel and
we're left with *almost* no overall error.
The problem is that while there are four digits rounding down (.1
through .4) there are FIVE which round up (.5 through .9). Two digits
don't pair up:
.0 stays unchanged, error = 0
.5 always rounds up, error = +0.5
Given that for many purposes, our data is recorded only to a fixed
number of decimal places, we're dealing with numbers like 0.5 rather
than 0.5000000001, so this can become a real issue. If each final digit
is equally likely, every ten rounding operations will introduce a total
error of +0.5 on average (+0.05 per operation) instead of cancelling
out. Rounding introduces a small but real bias.
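To make that arithmetic concrete, here is a minimal sketch that applies round-half-up to the ten final digits .0 through .9 and adds up the errors. It uses the decimal module so that values like 0.1 are exact; the helper name round_half_up is only illustrative, and the sketch assumes non-negative inputs (decimal's ROUND_HALF_UP rounds ties away from zero).

```python
from decimal import Decimal, ROUND_HALF_UP

# The ten equally likely final digits .0 through .9, held as exact decimals.
values = [Decimal(d) / 10 for d in range(10)]

def round_half_up(x):
    """Round to the nearest integer; ties (.5) always go up (for x >= 0)."""
    return x.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

# Error introduced by each rounding: rounded value minus original value.
errors = [round_half_up(x) - x for x in values]

total_error = sum(errors)               # .1/.9, .2/.8, ... cancel; .5 does not
mean_error = total_error / len(errors)  # average error per rounding operation

print(total_error, mean_error)
```

Every pair cancels except the tie at .5, which is what leaves the leftover positive total.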
The most common solution (and, in many experts' opinion, the best default
behaviour) is Banker's Rounding, or round-to-even. All the other digits
round as per the usual rule, but .5 rounds UP half the time and DOWN the
rest of the time:
0.5, 2.5, 4.5 etc round down, error = -0.5
1.5, 3.5, 5.5 etc round up, error = +0.5
thus on average the .5 digit introduces no error and the bias goes away.
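A small sketch of the same point using Python 3's built-in round(), which rounds ties to the nearest even integer. The sample values are chosen because halves like 0.5 and 2.5 are exactly representable in binary floating point, so binary representation plays no part in the result.

```python
# Exact ties: each of these floats is exactly representable in binary.
halves = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]

# Python 3's round() sends each tie to the nearest even integer.
rounded = [round(x) for x in halves]

# Error introduced by each rounding: rounded value minus original value.
errors = [r - x for r, x in zip(rounded, halves)]
net_error = sum(errors)

print(rounded)    # alternates between rounding down and rounding up
print(net_error)  # the -0.5 and +0.5 errors cancel
```

Over an equal number of even-side and odd-side ties, the errors cancel exactly.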
--
Steve