1. The cognitive leap between shuffling and sampling isn't small
I don't think so. Actually, the Fisher-Yates shuffle algorithm is itself an iterative sampling algorithm: https://gist.github.com/danilobellini/6384872
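To make that concrete, here is a minimal sketch (not the code from the gist) of Fisher-Yates written as repeated sampling without replacement. The name "fisher_yates_sample" and its "k" and "rng" parameters are made up for illustration. Stopping after k steps gives a sample of size k; running all n steps gives a full shuffle.

```python
import random


def fisher_yates_sample(population, k=None, rng=random):
    """Draw k items without replacement via a partial Fisher-Yates shuffle.

    Hypothetical helper: k=None means "take everything", i.e. a full shuffle.
    """
    pool = list(population)      # work on a copy, the input is left untouched
    n = len(pool)
    if k is None:
        k = n
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(population)")
    for i in range(k):
        # Step i samples one item uniformly from the not-yet-chosen tail.
        j = rng.randrange(i, n)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]
```

Each loop iteration is a single uniform draw from what is left, which is why I see the shuffle as "sampling until nothing remains".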
 
2. "shuffled" would be a more logical name for an out-of-place shuffle
than "sample"
Agreed.
 
3. "Copy the whole container" would be a surprising default for a
function called "sample"
Perhaps "sample the whole population" sounds strange. Indeed, statisticians probably wouldn't be happy with that. It reminds me of the sample variance estimator, with the extra unbiasing "-1" in the denominator, as opposed to the population variance.
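For anyone who hasn't met that distinction, a small illustration using the standard library's statistics module. The data values are arbitrary, chosen only so the numbers come out round.

```python
import statistics

# Arbitrary illustrative data: mean is 5, squared deviations sum to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Treats the data as a sample: divides by n - 1 (here 32 / 7).
sample_var = statistics.variance(data)

# Treats the data as the whole population: divides by n (here 32 / 8).
population_var = statistics.pvariance(data)
```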
 
4. With a default, random.sample becomes more easily confused with random.choice
I don't think k=1 would be a good default sample size from a statistics point of view, but I get the point (I'm from a DSP background, where "a sample" means one single "value").

Controlling the random function is required for the function to be really pure; otherwise its output won't depend only on its inputs (there would be some "state" in that "implicit input"). That would also be a great feature when non-uniform (or external) random number generators are to be used. This seems to be something that only shuffle gives some control over (among the functions we're talking about), or am I missing something?
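A rough sketch of what I mean by making the generator an explicit input. The helper name "sample_with" is hypothetical; the only real API used is random.Random and its sample method.

```python
import random


def sample_with(rng, population, k):
    """Sample using only the generator passed in, never the global one."""
    return rng.sample(population, k)


# Two generators built from the same seed are the same "input",
# so the two calls produce the same output.
first = sample_with(random.Random(42), range(100), 5)
second = sample_with(random.Random(42), range(100), 5)
```

With the generator passed in like this, the result depends only on the arguments, which is the kind of purity I have in mind.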


2016-09-08 1:26 GMT-03:00 Nick Coghlan <ncoghlan@gmail.com>:
On 8 September 2016 at 13:33, Danilo J. S. Bellini
<danilo.bellini@gmail.com> wrote:
> Nice to know about random.sample! =)
>
> I think what OP said can then be reduced to having the default k in
> random.sample to be the iterable size. The existence of random.sample is a
> very strong argument against "shuffled", and the only "feature" shuffled
> would have that random.sample doesn't have is that default size.

There are a few reasons I don't think defining a default for
random.sample() would be a good answer to Arek's question:

1. The cognitive leap between shuffling and sampling isn't small
2. "shuffled" would be a more logical name for an out-of-place shuffle
than "sample"
3. "Copy the whole container" would be a surprising default for a
function called "sample"
4. With a default, random.sample becomes more easily confused with random.choice

For the last two concerns, if I saw "result =
random.sample(container)" and didn't already know about
random.choice(), I'd expect it to behave like random.choice(). Even
knowing they're different, I'd still need to do a double-take to make
sure I was remembering which was which correctly. By contrast,
"random.sample(container, 1)", "random.sample(container, k)", and
"random.sample(container, len(container))" are all clearly different
from the single result primitive "random.choice(container)".
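
As a quick illustration of that difference (the container contents here are arbitrary), the explicit size argument makes the return type obvious at the call site:

```python
import random

container = ["a", "b", "c", "d"]

one_item = random.choice(container)           # a single element
one_item_list = random.sample(container, 1)   # a list holding one element
k_items = random.sample(container, 2)         # a list of 2 distinct picks
everything = random.sample(container, len(container))  # all items, reordered
```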

One interesting (to me anyway) aspect of an out-of-place shuffle is
that you can readily implement it based on *either* of the more
primitive operations (in-place shuffle or random sample):

    import random

    def shuffled(container):
        # Copy first so the caller's container is left untouched
        result = list(container)
        random.shuffle(result)
        return result

    def shuffled(container):
        return random.sample(container, len(container))

Writing down those two examples does highlight a potential refactoring
benefit to having "out-of-place shuffle" as an explicitly represented
concept in the core Random API: it can be defined in terms of shuffle
or sample *on the same Random instance*, and hence automatically
benefit when code switches from using the global random number
generator to using a purpose-specific Random instance that allows
greater control over the reproducibility of results (which can be very
important for testing, games, modeling & simulation).

The above helper functions are both flawed on that front: they
hardcode the use of the default global random number generator. To
permit use of a specific random instance, they need to be changed to:

    def shuffled(container, random=random):
        result = list(container)
        random.shuffle(result)
        return result

    def shuffled(container, random=random):
        return random.sample(container, len(container))

and then used as "result = shuffled(original, my_random_instance)" if
you decide to switch away from the global API.

By contrast, a method-based implementation could be refactored the
exact same way as any other random.Random method:

    result = my_random_instance.shuffled(original)
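
No such method exists on random.Random today, so purely as a sketch of what it might look like, here it is on a hypothetical subclass. The class name "ShufflingRandom" and the seed are made up for illustration.

```python
import random


class ShufflingRandom(random.Random):
    """Hypothetical subclass adding an out-of-place shuffle method."""

    def shuffled(self, container):
        result = list(container)   # copy, so the original is not modified
        self.shuffle(result)       # uses this instance's own state
        return result


original = [1, 2, 3, 4, 5]
my_random_instance = ShufflingRandom(12345)   # arbitrary seed
result = my_random_instance.shuffled(original)
```

Because the method only touches the instance's own state, seeding the instance is enough to make the result reproducible.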

I'll reiterate that I don't have a use case for this myself, but I'll
cite the key arguments I see in favour:

- an out-of-place shuffle may be useful as a bridging concept between
in-place shuffling and out-of-place sampling (in either direction)
- the presence of "shuffled" becomes a reminder that "shuffle" itself
is an in-place operation
- "write it yourself" isn't as simple as it first sounds due to the
common migration away from the random module functions to a custom
Random instance

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia



--
Danilo J. S. Bellini
---------------
"It is not our business to set up prohibitions, but to arrive at conventions." (R. Carnap)