[Numpy-discussion] Request for enhancement to numpy.random.shuffle

Nathaniel Smith njs at pobox.com
Sun Oct 12 21:13:56 EDT 2014


On Sun, Oct 12, 2014 at 5:14 PM, Sebastian <sebix at sebix.at> wrote:
>
> On 2014-10-12 16:54, Warren Weckesser wrote:
>>
>>
>> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <robert.kern at gmail.com>
>> wrote:
>>
>>     On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
>>     <warren.weckesser at gmail.com> wrote:
>>
>>     > A small wart in this API is the meaning of
>>     >
>>     >   shuffle(a, independent=False, axis=None)
>>     >
>>     > It could be argued that the correct behavior is to leave the
>>     > array unchanged. (The current behavior can be interpreted as
>>     > shuffling a 1-d sequence of monolithic blobs; the axis argument
>>     > specifies which axis of the array corresponds to the
>>     > sequence index.  Then `axis=None` means the argument is
>>     > a single monolithic blob, so there is nothing to shuffle.)
>>     > Or an error could be raised.
>>     >
>>     > What do you think?
>>
>>     It seems to me a perfectly good reason to have two methods instead of
>>     one. I can't imagine when I wouldn't be using a literal True or False
>>     for this, so it really should be two different methods.
>>
>>
>>
>> I agree, and my first inclination was to propose a different method
>> (and I had the bikeshedding conversation with myself about the name:
>> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some
>> other variation of the word "shuffle", ...), but I figured the first
>> thing folks would say is "Why not just add options to shuffle?"  So,
>> choose your battles and all that.
>>
>> What do other folks think of making a separate method?
> I'm not a fan of more methods with similar functionality in Numpy. It's
> already hard to overlook the existing functions and all their possible
> applications and variants. The axis=None proposal for shuffling all
> items is very intuitive.
>
> I think we don't want to take the path of matlab: a huge amount of
> powerful functions, but few people know of their powerful possibilities.

I totally agree with this principle, but I think this is an exception
to the rule, b/c unfortunately in this case the function that we *do*
have is weird and inconsistent with how most other functions in numpy
work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc
(k,)->(k,) would work. Also, it's easy to implement the current
'shuffle' in terms of any 1d shuffle function with no explicit loops,
whereas implementing Warren's disarrange in terms of the current
'shuffle' requires an explicit loop. So, we really implemented the
wrong one, oops. What this means going forward, though, is that our
only options are either to implement both behaviours with two
functions, or else to give up on having the more natural behaviour
altogether. I think the former is the lesser of two evils.
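
To make the asymmetry concrete, here is a rough sketch (not numpy code; the helper names shuffle_blobs and scramble_rows are made up for illustration, and the arrays are assumed to be 2d). The first helper reproduces the current behaviour from a plain 1d shuffle plus fancy indexing; the second needs a Python-level loop over rows:

```python
import numpy as np

def shuffle_blobs(a, shuffle_1d):
    """Reorder `a` along axis 0 as whole blobs, using only a 1d shuffle."""
    idx = np.arange(a.shape[0])
    shuffle_1d(idx)   # shuffle a 1d index array in place
    return a[idx]     # fancy indexing moves whole rows; no Python loop

def scramble_rows(a, shuffle_1d):
    """Shuffle the entries within each row independently."""
    out = a.copy()
    for row in out:   # explicit loop: each row is a 1d view into `out`
        shuffle_1d(row)
    return out

rng = np.random.RandomState(0)
a = np.arange(12).reshape(3, 4)
blobs = shuffle_blobs(a, rng.shuffle)   # same rows, possibly reordered
rows = scramble_rows(a, rng.shuffle)    # same rows, entries reordered within each
```

Both helpers return a new array and leave `a` alone; that is just a choice made for this sketch.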

Regarding names: shuffle/permutation is a terrible naming convention
IMHO and shouldn't be propagated further. We already have a good
naming convention for in-place vs. copy: sort vs. sorted, reverse vs.
reversed, etc.

So, how about:

scramble + scrambled to shuffle individual entries within each
row/column/..., as in Warren's suggestion.

shuffle + shuffled to do what shuffle and permutation do now (mnemonic:
these break a 2d array into a bunch of 1d "cards", and then shuffle
those cards).

permutation remains indefinitely, with the docstring: "Deprecated alias
for 'shuffled'."
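
A minimal sketch of how the four names could fit together, following the sort/sorted convention (the bare verb works in place and returns None, the -ed form returns a copy). None of these functions exist in numpy; the `axis` argument on scramble and the module-level RandomState are assumptions made for this illustration only:

```python
import numpy as np

_rng = np.random.RandomState(0)  # assumption: one shared generator for the sketch

def shuffle(a):
    """In place: reorder whole sub-arrays ("cards") along axis 0."""
    _rng.shuffle(a)

def shuffled(a):
    """Return a shuffled copy; the input is left untouched."""
    out = np.array(a)  # np.array copies by default
    shuffle(out)
    return out

def scramble(a, axis=-1):
    """In place: shuffle entries independently within each 1d lane along `axis`."""
    lanes = np.moveaxis(a, axis, -1)               # a view, so writes reach `a`
    for index in np.ndindex(lanes.shape[:-1]):     # explicit loop over lanes
        _rng.shuffle(lanes[index])

def scrambled(a, axis=-1):
    """Return a scrambled copy; the input is left untouched."""
    out = np.array(a)
    scramble(out, axis=axis)
    return out

a = np.arange(12).reshape(3, 4)
by_row = scrambled(a, axis=1)   # each row keeps its own entries
by_col = scrambled(a, axis=0)   # each column keeps its own entries
cards = shuffled(a)             # whole rows reordered
```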

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


