[Python-ideas] Shuffled

Thu Sep 8 01:05:17 EDT 2016

>
> 1. The cognitive leap between shuffling and sampling isn't small
>
I don't think so; actually, the Fisher-Yates shuffle algorithm is an
iterative sampling algorithm: https://gist.github.com/danilobellini/6384872
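To make that point concrete, here is a minimal Python sketch. It is not the code from the linked gist, and the function names `fisher_yates_shuffle` and `partial_fisher_yates` are hypothetical. Each iteration draws one element uniformly from those not yet chosen, so stopping after k steps yields a sample of size k, and running through every step yields a full shuffle.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Return a shuffled copy of items (Fisher-Yates, back to front).

    Each iteration samples one element uniformly from the not-yet-chosen
    prefix and moves it into its final slot, i.e. sampling without
    replacement, one element at a time.
    """
    result = list(items)
    for i in range(len(result) - 1, 0, -1):
        # Uniform index among the i + 1 slots not yet fixed.
        j = rng.randrange(i + 1)
        # Move the sampled element into its final position i.
        result[i], result[j] = result[j], result[i]
    return result

def partial_fisher_yates(items, k, rng=random):
    """Run only k steps: the last k slots then hold a sample of size k."""
    result = list(items)
    n = len(result)
    for step in range(k):
        i = n - 1 - step
        j = rng.randrange(i + 1)
        result[i], result[j] = result[j], result[i]
    return result[n - k:]
```

With identically seeded generators, running the partial version for all n steps produces the same permutation as the full shuffle, which is the "shuffle is just exhaustive sampling" view.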

> 2. "shuffled" would be a more logical name for an out-of-place shuffle
> than "sample"
>
Agreed.

> 3. "Copy the whole container" would be a surprising default for a
> function called "sample"
>
Perhaps "sample the whole population" sounds strange. Indeed, statisticians
probably wouldn't be happy with that. It reminds me of the variance
estimator with the extra unbiasing "-1" in the denominator versus the
population variance.
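For readers unfamiliar with that analogy, the standard library's `statistics` module exposes both estimators. A quick sketch on illustrative data shows the effect of the "-1":

```python
import statistics

data = [1, 2, 3, 4]  # illustrative values; squared deviations sum to 5

# Population variance: sum of squared deviations divided by n.
pop_var = statistics.pvariance(data)

# Sample variance: divided by n - 1 (Bessel's correction), which makes it
# an unbiased estimator of the population variance.
sample_var = statistics.variance(data)
```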

> 4. With a default, random.sample becomes more easily confused with
> random.choice
>
I don't think k=1 would be a good default sample size from a statistics
point of view, but I get the point (I'm from a DSP background, where "a
sample" means one single "value").

Controlling the random function is required for the function to be truly
pure; otherwise its output won't depend only on its inputs (the generator's
hidden state would act as an "implicit input"). That would also be a great
feature when non-uniform (or external) random number generators are to be
used. Among the functions we're talking about, shuffle seems to be the only
one that gives some control over this (through its optional "random"
argument), or am I missing something?
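One way to get that control for all of these functions: the module-level functions (`random.shuffle`, `random.sample`, `random.choice`, ...) are documented as bound methods of a hidden `random.Random` instance, so a dedicated instance can be passed around explicitly. A minimal sketch; the helper name `draw` and the seeds are illustrative assumptions:

```python
import random

def draw(rng):
    """Use only the given generator, so the result depends only on rng's state."""
    items = list(range(10))
    rng.shuffle(items)             # in-place shuffle
    picked = rng.sample(items, 3)  # sampling without replacement
    one = rng.choice(items)        # a single element
    return items, picked, one

# Two generators seeded identically give identical results.
first = draw(random.Random(2016))
second = draw(random.Random(2016))
```

Passing the generator in makes `draw` deterministic given its arguments, and a `random.Random` subclass can stand in for a custom generator.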

2016-09-08 1:26 GMT-03:00 Nick Coghlan <ncoghlan at gmail.com>:

> On 8 September 2016 at 13:33, Danilo J. S. Bellini
> <danilo.bellini at gmail.com> wrote:
> > Nice to know about random.sample! =)
> >
> > I think what the OP said can then be reduced to making the default k in
> > random.sample the iterable size. The existence of random.sample is a
> > very strong argument against "shuffled", and the only "feature" shuffled
> > would have that random.sample doesn't have is that default size.
>
> There are a few reasons I don't think defining a default for
> random.sample() would be a good answer to Arek's question:
>
> 1. The cognitive leap between shuffling and sampling isn't small
> 2. "shuffled" would be a more logical name for an out-of-place shuffle
> than "sample"
> 3. "Copy the whole container" would be a surprising default for a
> function called "sample"
> 4. With a default, random.sample becomes more easily confused with
> random.choice
>
> For the last two concerns, if I saw "result =
> random.sample(container)" and didn't already know about
> random.choice(), I'd expect it to behave like random.choice(). Even
> knowing they're different, I'd still need to do a double-take to make
> sure I was remembering which was which correctly. By contrast,
> "random.sample(container, 1)", "random.sample(container, k)",
> "random.sample(container, len(container))" are all clearly different
> from the single result primitive "random.choice(container)".
>
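The distinction drawn above shows up directly in the return types: `choice` returns one element, while `sample` always returns a new list of the requested length k. A short sketch with illustrative data (the seed only makes the run repeatable):

```python
import random

rng = random.Random(42)  # seeded only so the run is repeatable
container = ["a", "b", "c", "d"]

single = rng.choice(container)                       # one element, not a list
one_item_list = rng.sample(container, 1)             # a list of length 1
k_items = rng.sample(container, 2)                   # 2 distinct elements
permutation = rng.sample(container, len(container))  # full out-of-place shuffle
```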
> One interesting (to me anyway) aspect of an out-of-place shuffle is
> that you can readily implement it based on *either* of the more
> primitive operations (in-place shuffle or random sample):
>
>     import random
>
>     def shuffled(container):
>         result = list(container)
>         random.shuffle(result)
>         return result
>
>     def shuffled(container):
>         return random.sample(container, len(container))
>
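The two definitions return the same kind of result but accept different inputs. In this quick check the functions are renamed (hypothetical names) so both can coexist: the `list()`-based version accepts any iterable, while the `sample`-based one needs `len()`, so it fails on a generator.

```python
import random

def shuffled_via_shuffle(container):
    # Copy into a new list, then shuffle the copy in place.
    result = list(container)
    random.shuffle(result)
    return result

def shuffled_via_sample(container):
    # Sample every element without replacement; requires len(container).
    return random.sample(container, len(container))

data = [3, 1, 4, 1, 5, 9, 2, 6]
a = shuffled_via_shuffle(data)
b = shuffled_via_sample(data)

# One-shot iterables work with the list()-based version only; calling
# len() on a generator raises TypeError in the sample-based version.
gen_result = shuffled_via_shuffle(x * x for x in range(5))
```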
> Writing down those two examples does highlight a potential refactoring
> benefit to having "out-of-place shuffle" as an explicitly represented
> concept in the core Random API: it can be defined in terms of shuffle
> or sample *on the same Random instance*, and hence automatically
> benefit when code switches from using the global random number
> generator to using a purpose-specific Random instance that allows
> greater control over the reproducibility of results (which can be very
> important for testing, games, modeling & simulation).
>
> The above helper functions are both flawed on that front: they
> hardcode the use of the default global random number generator. To
> permit use of a specific random instance, they need to be changed to:
>
>     def shuffled(container, random=random):
>         result = list(container)
>         random.shuffle(result)
>         return result
>
>     def shuffled(container, random=random):
>         return random.sample(container, len(container))
>
> and then used as "result = shuffled(original, my_random_instance)" if
> you decide to switch away from the global API.
>
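Here is the instance-aware helper in use. With the default, it draws from the global generator. Passing identically seeded `random.Random` instances (the seed value is arbitrary) gives the reproducibility described above:

```python
import random

def shuffled(container, random=random):
    # "random" may be the module itself or any random.Random instance;
    # both expose a shuffle() method.
    result = list(container)
    random.shuffle(result)
    return result

original = list(range(20))

# Default: the hidden global generator.
anywhere = shuffled(original)

# Purpose-specific, seeded instances: results are reproducible.
first = shuffled(original, random.Random(1234))
second = shuffled(original, random.Random(1234))
```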
> By contrast, a method based implementation could be refactored the
> exact same way as any other random.Random method:
>
>     result = my_random_instance.shuffled(original)
>
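`random.Random` has no `shuffled` method; that is what is being proposed. The class below is a hypothetical subclass sketching what a method-based version could look like. Because it calls `self.shuffle`, seeding or subclassing the instance carries over to `shuffled` automatically:

```python
import random

class ShuffledRandom(random.Random):
    """Hypothetical subclass: random.Random itself has no shuffled() method."""

    def shuffled(self, population):
        # Reuse this instance's own shuffle(), so its seed and generator
        # apply to the out-of-place version too.
        result = list(population)
        self.shuffle(result)
        return result

my_random_instance = ShuffledRandom(99)
original = ["ace", "king", "queen", "jack"]
result = my_random_instance.shuffled(original)
```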
> I'll reiterate that I don't have a use case for this myself, but I'll
> cite the key arguments I see in favour:
>
> - an out-of-place shuffle may be useful as a bridging concept between
> in-place shuffling and out-of-place sampling (in either direction)
> - the presence of "shuffled" becomes a reminder that "shuffle" itself
> is an in-place operation
> - "write it yourself" isn't as simple as it first sounds due to the
> common migration away from the random module functions to a custom
> Random instance
>
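The second argument is easy to illustrate. `random.shuffle` mutates its argument and returns `None`, so writing `deck = random.shuffle(deck)` leaves you holding `None`. Presumably the name "shuffled" would echo the existing `list.sort()` / `sorted()` pair, which has the same in-place/out-of-place split:

```python
import random

deck = list(range(1, 11))

# shuffle() rearranges deck in place and returns None.
returned = random.shuffle(deck)

# The same split already exists for sorting:
# list.sort() returns None, sorted() returns a new list.
sort_returned = deck.sort()
new_list = sorted(deck, reverse=True)
```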
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>

-- 
Danilo J. S. Bellini
---------------
"*It is not our business to set up prohibitions, but to arrive at
conventions.*" (R. Carnap)