[Python-ideas] random.choice on non-sequence

Steven D'Aprano steve at pearwood.info
Wed Apr 13 21:30:38 EDT 2016


On Thu, Apr 14, 2016 at 09:47:37AM +1200, Greg Ewing wrote:
> Chris Angelico wrote:
> >On Wed, Apr 13, 2016 at 8:36 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> >
> >>On 4/13/2016 12:52 AM, Chris Barker - NOAA Federal wrote:
> >>
> >>>BTW, isn't it impossible to randomly select from an infinite iterable
> >>>anyway?
> >>
> >>With equal probability, yes, impossible.
[...]

> I think Terry meant that you can't pick just one item that's
> equally likely to be any of the infinitely many items returned
> by the iterator.

Correct. That's equivalent to chosing a positive integer with uniform 
probability distribution and no upper bound.

 
> You can prove that by considering that the probability of
> a given item being returned would have to be 1/infinity,
> which is zero -- so you can't return anything!

That's not how probability works :-)

Consider a dart which is thrown at a dartboard. The probability of it 
landing on any specific point is zero, since the area of a single point 
is zero. Nevertheless, the dart does hit somewhere!

A formal and precise treatment would have to involve calculus and limits 
as the probability approaches zero, rather than a flat out "the 
probability is zero, therefore it's impossible".

Slightly less formally, we can say (only horrifying mathematicians a 
little bit) that the probability of any specific number is an 
infinitesimal number.

https://en.wikipedia.org/wiki/Infinitesimal


While it is *mathematically* meaningful to talk about selecting a random 
positive integer uniformly, its hard to do much more than that. The mean 
(average) is undefined[1]. A typical value chosen would have a vast 
number of digits, far larger than anything that could be stored in 
computer memory. Indeed Almost All[2] of the values we generate would be 
so large that we have no notation for writing it down (and not enough 
space in the universe to write it even if we did). So it is impossible 
in practice to select a random integer with uniform distribution and no 
upper bound.

Non-uniform distributions, though, are easy :-)





[1] While weird, this is not all that weird. For example, selecting 
numbers from a Cauchy distribution also has an undefined mean. What this 
means in practice is that the *sample mean* will not converge as you 
take more and more samples: the more samples you take, the more wildly 
the average will jump all over the place.

https://en.wikipedia.org/wiki/Cauchy_distribution#Estimation_of_parameters

[2] https://en.wikipedia.org/wiki/Almost_all



-- 
Steve


More information about the Python-ideas mailing list