[Python-ideas] random.choice on non-sequence
Steven D'Aprano
steve at pearwood.info
Wed Apr 13 21:30:38 EDT 2016
On Thu, Apr 14, 2016 at 09:47:37AM +1200, Greg Ewing wrote:
> Chris Angelico wrote:
> >On Wed, Apr 13, 2016 at 8:36 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> >
> >>On 4/13/2016 12:52 AM, Chris Barker - NOAA Federal wrote:
> >>
> >>>BTW, isn't it impossible to randomly select from an infinite iterable
> >>>anyway?
> >>
> >>With equal probability, yes, impossible.
[...]
> I think Terry meant that you can't pick just one item that's
> equally likely to be any of the infinitely many items returned
> by the iterator.
Correct. That's equivalent to chosing a positive integer with uniform
probability distribution and no upper bound.
> You can prove that by considering that the probability of
> a given item being returned would have to be 1/infinity,
> which is zero -- so you can't return anything!
That's not how probability works :-)
Consider a dart which is thrown at a dartboard. The probability of it
landing on any specific point is zero, since the area of a single point
is zero. Nevertheless, the dart does hit somewhere!
A formal and precise treatment would have to involve calculus and limits
as the probability approaches zero, rather than a flat out "the
probability is zero, therefore it's impossible".
Slightly less formally, we can say (only horrifying mathematicians a
little bit) that the probability of any specific number is an
infinitesimal number.
https://en.wikipedia.org/wiki/Infinitesimal
While it is *mathematically* meaningful to talk about selecting a random
positive integer uniformly, its hard to do much more than that. The mean
(average) is undefined[1]. A typical value chosen would have a vast
number of digits, far larger than anything that could be stored in
computer memory. Indeed Almost All[2] of the values we generate would be
so large that we have no notation for writing it down (and not enough
space in the universe to write it even if we did). So it is impossible
in practice to select a random integer with uniform distribution and no
upper bound.
Non-uniform distributions, though, are easy :-)
[1] While weird, this is not all that weird. For example, selecting
numbers from a Cauchy distribution also has an undefined mean. What this
means in practice is that the *sample mean* will not converge as you
take more and more samples: the more samples you take, the more wildly
the average will jump all over the place.
https://en.wikipedia.org/wiki/Cauchy_distribution#Estimation_of_parameters
[2] https://en.wikipedia.org/wiki/Almost_all
--
Steve
More information about the Python-ideas
mailing list