Sorting NaNs
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Thu Jun 7 00:14:56 EDT 2018
On Sat, 02 Jun 2018 21:02:14 +1000, Chris Angelico wrote:
> Point of curiosity: Why "> 0.5"?
No particular reason, I just happened to hit that key and then copied and
pasted the line into the next one.
> Normally when I want a fractional
> chance, I write the comparison the other way: "random.random() < 0.5"
> has exactly a 50% chance of occurring (presuming that random.random()
> follows its correct documented distribution). I've no idea what the
> probability of random.random() returning exactly 0.5 is
Neither do I. But I expect that "exactly 50% chance" is only
approximately true :-)
My understanding is that given the assumption of uniformity, the 50%
chance is mathematically true, but in that case, it makes no difference
whether you go from 0 to 0.5 or 0.5 to 1.0. Mathematically it makes no
difference whether you include or exclude the end points. In the Real
numbers, there's an uncountable infinity of points either way.
*But* once you move to actual floats, that's no longer true. There are a
great many more floats between 0 and 0.5 than between 0.5 and 1.0.
Including the end points, if we enumerate the floats we get:
0.0 --> 0
0.5 --> 4602678819172646912
1.0 --> 4607182418800017408
so clearly the results of random.random() cannot be uniformly distributed
over the individual floats. If they were, the probability of getting
something less than or equal to 0.5 would be 4602678819172646912 /
4607182418800017407 or a little more than 99.9%.
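
The enumeration above can be reproduced with a minimal sketch, assuming
floats are IEEE 754 binary64 (true of CPython on common platforms). The
helper name float_to_ordinal is made up for this illustration; it simply
reinterprets the float's bit pattern as an unsigned integer, which for
non-negative floats preserves their ordering.

```python
import math
import struct

def float_to_ordinal(x):
    """Return the position of a non-negative float among all such floats.

    Hypothetical helper: reinterprets the 64 bits of an IEEE 754 double
    as an unsigned integer. Only valid for non-negative, non-NaN input.
    """
    if math.isnan(x) or math.copysign(1.0, x) < 0:
        raise ValueError("expected a non-negative, non-NaN float")
    return struct.unpack("<Q", struct.pack("<d", x))[0]

zero = float_to_ordinal(0.0)
half = float_to_ordinal(0.5)
one = float_to_ordinal(1.0)

# Fraction of floats in [0.0, 1.0) that are <= 0.5, if every float
# were equally likely (1.0 itself is excluded, hence one - 1).
ratio = half / (one - 1)

print(zero, half, one)
print(ratio)
```

The three ordinals match the figures quoted above, and the ratio comes
out a little over 0.999.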
So given that our mathematically pure(ish) probability of 0.5 for the
reals has to be mapped in some way to a finite number of floats, I
wouldn't want to categorically say that the probability remains
*precisely* one half. But if it were (let's say) 1 ULP greater or less
than one half, would we even know?
0.5 - 1 ULP = 0.49999999999999994
0.5 + 1 ULP = 0.5000000000000001
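
Those two neighbours of 0.5 can be checked with a short sketch, again
assuming IEEE 754 binary64. The helpers next_up and next_down are
hypothetical names for this example and only handle positive finite
floats; they step the bit pattern by one.

```python
import struct

def _bits(x):
    # Reinterpret a double as a 64-bit unsigned integer.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def _from_bits(n):
    # Reinterpret a 64-bit unsigned integer as a double.
    return struct.unpack("<d", struct.pack("<Q", n))[0]

def next_up(x):
    """Next float above x (positive finite x only; illustrative helper)."""
    return _from_bits(_bits(x) + 1)

def next_down(x):
    """Next float below x (positive finite x only; illustrative helper)."""
    return _from_bits(_bits(x) - 1)

below = next_down(0.5)
above = next_up(0.5)

print(below, above)
print(0.5 - below, above - 0.5)
```

Note that the two gaps are not the same size: 0.5 is a power of two, so
the spacing just below it (2**-54) is half the spacing just above it
(2**-53).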
I would love to see the statistical experiment that could distinguish
those two probabilities from exactly 1/2 to even a 90% confidence
level :-)
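
For a rough sense of scale, here is a back-of-the-envelope sketch. It
assumes independent trials, a normal approximation to the binomial, a
shift of 2**-53 in the probability, and a two-sided 90% critical value
of about 1.645. It ignores statistical power, so treat the result as an
order-of-magnitude estimate only. The function name trials_needed is
invented for this example.

```python
import math

Z_90 = 1.645        # assumed two-sided 90% critical value (standard normal)
delta = 2.0 ** -53  # assumed shift in probability away from 0.5

def trials_needed(delta, z=Z_90, p=0.5):
    """Trials needed before a shift of delta equals z standard errors.

    The standard error of an observed proportion is
    sqrt(p * (1 - p) / n); solving z * SE = delta for n gives
    n = (z * sqrt(p * (1 - p)) / delta) ** 2.
    """
    sigma = math.sqrt(p * (1 - p))
    return (z * sigma / delta) ** 2

n = trials_needed(delta)
print(f"{n:.2e} trials")
```

Under those assumptions the answer is somewhere above 10**31 trials.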
> but since it
> can return 0.0 and cannot return 1.0, I've just always used less-than.
> (Also because it works nicely with other values - there's a 30% chance
> that random.random() is less than 0.3, etc.) Is there a reason for going
> greater-than, or is it simply that it doesn't matter?
No, no particular reason. If I had thought about it I would have used <
too, but I didn't.
--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson