[Python-ideas] Shuffled

Thu Sep 8 04:02:25 EDT 2016

On 8 September 2016 at 15:05, Danilo J. S. Bellini
<danilo.bellini at gmail.com> wrote:
>> 1. The cognitive leap between shuffling and sampling isn't small
>
> I don't think so; actually, the Fisher-Yates shuffle algorithm is
> an iterative sampling algorithm:
> https://gist.github.com/danilobellini/6384872

I'm not talking about mathematical equivalence, I'm talking about
making an unaided leap from "shuffle this deck of cards"
(random.shuffle) and "pick a card, any card" (random.choice) to
"choose a random sample from this population" (random.sample). I can
see people following that logic given suitable instruction (since they
really are closely related operations), but it's a tough connection to
see on your own.
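The relationship both posts are circling can be made concrete with a short sketch. The helper name `partial_fisher_yates` is hypothetical, and this is not necessarily how CPython implements `random.sample`. Running the Fisher-Yates swap loop for only k steps leaves a uniform k-element sample in the first k slots, running it over every position is a full shuffle, and k=1 behaves like `random.choice` wrapped in a list.

```python
import random


def partial_fisher_yates(population, k, rng=random):
    """Return k distinct items from population, chosen uniformly at random.

    Hypothetical helper: stopping the Fisher-Yates loop after k steps gives
    a sample; letting it run over every position gives a full shuffle.
    """
    pool = list(population)  # copy so the caller's sequence is untouched
    n = len(pool)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(population)")
    for i in range(k):
        # Pick a random index from the not-yet-chosen tail pool[i:].
        j = rng.randrange(i, n)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]


deck = list(range(52))
print(partial_fisher_yates(deck, 5))   # comparable to random.sample(deck, 5)
print(partial_fisher_yates(deck, 52))  # a full shuffle of the deck
print(partial_fisher_yates(deck, 1))   # comparable to [random.choice(deck)]
```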

>> 4. With a default, random.sample becomes more easily confused with
>> random.choice
>
> I don't think k=1 would be a good default sample size from a statistics
> point of view, but I get the point (I'm from a DSP background, where "a
> sample" means one single "value").

Likewise - it isn't that I think "1" would be a reasonable default,
it's that without the second argument being there, I lapse back into
DSP terminology rather than statistical terminology.

Requiring the second argument as random.sample() does today keeps
everything nicely unambiguous.
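For reference, the distinction the required argument preserves looks like this with the current stdlib API: `random.choice` returns a bare element, `random.sample` always returns a list, and calling `random.sample` without `k` is an error rather than silently meaning "one item".

```python
import random

population = ["ace", "king", "queen", "jack"]

card = random.choice(population)       # one element, not wrapped in a list
hand = random.sample(population, 1)    # a list holding one element
pair = random.sample(population, k=2)  # k distinct elements, as a list

print(card, hand, pair)

# random.sample has no default for k, so this call raises TypeError.
try:
    random.sample(population)
except TypeError as exc:
    print("TypeError:", exc)
```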

> Controlling the random function is required for the function to be really
> pure, else its output won't depend only on the inputs (and there would be
> some "state" in that "implicit input"). That would also be a great feature
> when non-uniform (or external) random number generators are to be used. This
> seems to be something that only shuffle gives some control over (among the
> functions we're talking about), or am I missing something?

The module-level "functions" in random are just bound methods of a
default global random.Random() instance, so they're not truly pure:
there's interdependence there via the shared PRNG state.
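A quick way to see that shared state in CPython (the `__self__` check relies on the module-level names being bound methods, which is a CPython implementation detail): reseeding and then calling `shuffle()` changes what the next `random()` call returns, because both draw from the same hidden generator.

```python
import random

# In CPython the module-level functions are bound methods of one hidden
# Random instance, so they all share (and advance) the same PRNG state.
shared = random.shuffle.__self__
print(type(shared), random.random.__self__ is shared)

random.seed(0)
first = random.random()

random.seed(0)
random.shuffle(list(range(10)))  # consumes values from the shared state...
second = random.random()         # ...so this is no longer the first value

print(first == second)
```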

However, by creating your *own* Random instance, or another object
that provides the same API, you can get a lot more control over
things, including reproducible behaviour for a given seed.
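A minimal sketch of that approach, using a hypothetical `shuffled()` helper (it is not part of the random module) that copies its input and takes an explicit generator. With a seeded `random.Random` instance passed in, the result depends only on the arguments, and the caller's sequence is left alone.

```python
import random


def shuffled(iterable, rng=None):
    """Return a new shuffled list, leaving the input untouched.

    Hypothetical helper: pass a random.Random instance (or another object
    with a compatible shuffle method) to make the result reproducible.
    """
    if rng is None:
        rng = random  # fall back to the module-level (shared) generator
    result = list(iterable)
    rng.shuffle(result)
    return result


deck = list(range(10))
a = shuffled(deck, random.Random(2016))
b = shuffled(deck, random.Random(2016))
print(a == b)  # True: same seed, same permutation
print(deck)    # original order preserved
```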

I'm not familiar with how shuffle works internally, but presumably
passing a non-uniform distribution is a way to bias the shuffle
(the docs don't actually explain *why* you'd want to use a randomiser
other than the default).
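At the time, `shuffle()` accepted an optional zero-argument `random` function returning floats in [0.0, 1.0). That parameter was deprecated in Python 3.9 and removed in 3.11, so the sketch below re-creates the idea as a standalone function. The name `shuffle_with` is hypothetical and this is not the stdlib code, just the classic Fisher-Yates loop driven by a pluggable float source. A source stuck at 0.0 always picks index 0, which turns the "shuffle" into a fixed rotation.

```python
import random


def shuffle_with(x, randfunc=random.random):
    """Shuffle list x in place, drawing floats in [0, 1) from randfunc().

    Hypothetical stand-in for the old shuffle(x, random) parameter: any
    zero-argument callable returning floats in [0, 1) drives the swaps.
    """
    for i in reversed(range(1, len(x))):
        # Map the float onto an index in 0..i; a biased source biases j.
        j = int(randfunc() * (i + 1))
        x[i], x[j] = x[j], x[i]


cards = [0, 1, 2, 3]
shuffle_with(cards)                # uniform source: any permutation
print(cards)

stuck = [0, 1, 2, 3]
shuffle_with(stuck, lambda: 0.0)   # degenerate source: j is always 0
print(stuck)                       # [1, 2, 3, 0]
```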

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia