[Numpy-discussion] Request for enhancement to numpy.random.shuffle

Nathaniel Smith njs at pobox.com
Thu Oct 16 22:50:48 EDT 2014

On Fri, Oct 17, 2014 at 2:35 AM,  <josef.pktd at gmail.com> wrote:
> On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
>> <warren.weckesser at gmail.com> wrote:
>>> On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>> On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
>>>> <warren.weckesser at gmail.com> wrote:
>>>> >
>>>> > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>> >>
>>>> >> Regarding names: shuffle/permutation is a terrible naming convention
>>>> >> IMHO and shouldn't be propagated further. We already have a good
>>>> >> naming convention for in-place vs. copying: sort vs. sorted, reverse vs.
>>>> >> reversed, etc.
>>>> >>
>>>> >> So, how about:
>>>> >>
>>>> >> scramble + scrambled to shuffle individual entries within each
>>>> >> row/column/..., as in Warren's suggestion.
>>>> >>
>>>> >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
>>>> >> these break a 2d array into a bunch of 1d "cards", and then shuffle
>>>> >> those cards).
>>>> >>
>>>> >> permuted remains indefinitely, with the docstring: "Deprecated alias
>>>> >> for 'shuffled'."
>>>> >
>>>> > That sounds good to me.  (I might go with 'randomize' instead of
>>>> > 'scramble', but that's a second-order decision for the API.)
>>>> I hesitate to use names like "randomize" because they're less
>>>> informative than they seem -- if asked what this operation does
>>>> to an array, then it would be natural to say "it randomizes the
>>>> array". But if told that the random module has a function called
>>>> randomize, then that's not very informative -- everything in random
>>>> randomizes something somehow.
>>> I had some similar concerns (hence my original "disarrange"), but
>>> "randomize" seemed more likely to be found when searching or browsing the
>>> docs, and while it might be a bit too generic-sounding, it does feel like a
>>> natural verb for the process.   On the other hand, "permute" and "permuted"
>>> are even more natural and unambiguous.  Any objections to those?  (The
>>> existing function is "permutation".)
>> [...]
>>> By the way, "permutation" has a feature not yet mentioned here: if the
>>> argument is an integer 'n', it generates a permutation of arange(n).  In
>>> this case, it acts like matlab's "randperm" function.  Unless we replicate
>>> that in the new function, we shouldn't deprecate "permutation".
>> I guess we could do something like:
>>
>> permutation(n):
>>
>>     Return a random permutation on n items. Equivalent to
>>     permuted(arange(n)).
>>
>>     Note: for backwards compatibility, a call like permutation(an_array)
>>     currently returns the same as shuffled(an_array). (This is *not*
>>     equivalent to permuted(an_array).) This functionality is deprecated.
>>
>> OTOH "np.random.permute" as a name does have a downside: someday we'll
>> probably add a function called "np.permute" (for applying a given
>> permutation in place -- the O(n) algorithm for this is useful and
>> tricky), and having two functions with the same name and very
>> different semantics would be pretty confusing.
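
For concreteness, here is a rough sketch of what "applying a given
permutation in place" could look like. The name permute_inplace is
hypothetical (nothing like it exists in numpy), the gather convention
(position i ends up holding the old a[perm[i]]) is an assumption, and
this cycle-following version spends O(n) extra space on a visited list,
so it is not necessarily the algorithm alluded to above:

```python
def permute_inplace(a, perm):
    """Rearrange `a` in place so that new a[i] == old a[perm[i]].

    Hypothetical helper for illustration only. Runs in O(n) time by
    walking each cycle of `perm` exactly once.
    """
    n = len(a)
    visited = [False] * n
    for start in range(n):
        if visited[start]:
            continue
        saved = a[start]          # value that closes the cycle
        i = start
        while True:
            visited[i] = True
            j = perm[i]
            if j == start:        # cycle is complete
                a[i] = saved
                break
            a[i] = a[j]           # pull the value from its source slot
            i = j
    return a


data = [10, 20, 30, 40, 50]
perm = [2, 0, 1, 4, 3]
expected = [data[p] for p in perm]   # out-of-place reference result
permute_inplace(data, perm)
```

After the call, data matches the out-of-place result expected.
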
> I like `permute`. That's the one term I'm looking for first.
> If np.permute does some kind of deterministic permutation or pivoting,
> then I wouldn't find it confusing if np.random.permute does "random"
> permutation.

Yeah, but:

from ... import permute
# 500 lines later
def foo(...):
    permute(...)  # what the heck is this

It definitely *can* be confusing; basically everything else in
np.random has a name that suggests randomness even without seeing the
full path.

It's not a huge deal, though.

> (I definitely don't like scrambled, sounds like eggs or cable TV that
> needs to be unscrambled.)

I vote that in this kind of bikeshed we try to restrict ourselves to
arguments that we can at least pretend are motivated by some
technical/UX concern ;-). (I guess unscrambling eggs would be
technically impressive tho ;-))
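
To make the two behaviours under discussion concrete, here is a minimal
sketch built only from calls that exist in numpy today. The function
names shuffled and scrambled are the *proposed* names from this thread,
not an existing API, and the random-keys-plus-argsort trick is just one
convenient way to get an independent permutation per row:

```python
import numpy as np


def shuffled(a, rng):
    """Hypothetical: return a copy with whole rows ("cards") reordered."""
    a = np.asarray(a)
    order = rng.permutation(a.shape[0])   # random order of the row indices
    return a[order]                       # fancy indexing returns a copy


def scrambled(a, rng, axis=-1):
    """Hypothetical: return a copy with the entries of each 1-d slice
    along `axis` shuffled independently of the other slices."""
    a = np.asarray(a)
    keys = rng.random(a.shape)            # one random sort key per entry
    idx = np.argsort(keys, axis=axis)     # independent permutation per slice
    return np.take_along_axis(a, idx, axis=axis)


rng = np.random.default_rng(0)
a = np.arange(12).reshape(3, 4)
s = shuffled(a, rng)           # rows stay intact, row order changes
t = scrambled(a, rng, axis=1)  # each row keeps its entries, order within row changes
```

With shuffled, every row of the result is still one of the original rows;
with scrambled, the rows stay where they are but each one is permuted on
its own.
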

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
