[Numpy-discussion] Request for enhancement to numpy.random.shuffle

josef.pktd at gmail.com
Sun Oct 12 12:29:13 EDT 2014


On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser
<warren.weckesser at gmail.com> wrote:
>
>
> On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser
> <warren.weckesser at gmail.com> wrote:
>>
>> I created an issue on github for an enhancement
>> to numpy.random.shuffle:
>>     https://github.com/numpy/numpy/issues/5173
>> I'd like to get some feedback on the idea.
>>
>> Currently, `shuffle` shuffles the first dimension of an array
>> in-place.  For example, shuffling a 2D array shuffles the rows:
>>
>> In [227]: a
>> Out[227]:
>> array([[ 0,  1,  2],
>>        [ 3,  4,  5],
>>        [ 6,  7,  8],
>>        [ 9, 10, 11]])
>>
>> In [228]: np.random.shuffle(a)
>>
>> In [229]: a
>> Out[229]:
>> array([[ 0,  1,  2],
>>        [ 9, 10, 11],
>>        [ 3,  4,  5],
>>        [ 6,  7,  8]])
>>
>>
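A minimal sketch of the current behavior, assuming only NumPy's legacy global `np.random` functions. `shuffle` permutes along the first axis in place, so whole rows move as units and no row is split up:

```python
import numpy as np

np.random.seed(0)  # seeded only so the run is reproducible

a = np.arange(12).reshape(4, 3)
rows_before = sorted(map(tuple, a.tolist()))

np.random.shuffle(a)  # in place, along the first axis only

rows_after = sorted(map(tuple, a.tolist()))
# Same set of rows, possibly in a different order.
print(rows_after == rows_before)  # True
```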
>> To add an axis keyword, we could (in effect) apply `shuffle` to
>> `a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
>> the columns:
>>
>> In [232]: a = np.arange(15).reshape(3,5)
>>
>> In [233]: a
>> Out[233]:
>> array([[ 0,  1,  2,  3,  4],
>>        [ 5,  6,  7,  8,  9],
>>        [10, 11, 12, 13, 14]])
>>
>> In [234]: axis = 1
>>
>> In [235]: np.random.shuffle(a.swapaxes(axis, 0))
>>
>> In [236]: a
>> Out[236]:
>> array([[ 3,  2,  4,  0,  1],
>>        [ 8,  7,  9,  5,  6],
>>        [13, 12, 14, 10, 11]])
>>
>> So that's the first part--adding an `axis` keyword.
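The swapaxes trick can be wrapped in a small helper. This is a sketch only; `shuffle_along_axis` is a hypothetical name, not a NumPy function. It relies on `np.swapaxes` returning a view, so shuffling the view's first axis permutes the chosen axis of the original array:

```python
import numpy as np

def shuffle_along_axis(a, axis=0):
    """Hypothetical helper: shuffle `a` in place along `axis`.

    np.swapaxes returns a view, so shuffling the first axis of the view
    permutes the chosen axis of `a` itself.
    """
    np.random.shuffle(np.swapaxes(a, axis, 0))

np.random.seed(1)
a = np.arange(15).reshape(3, 5)
shuffle_along_axis(a, axis=1)
print(a)
# Whole columns moved together, so each row is still the previous row + 5.
print(np.array_equal(a[1], a[0] + 5), np.array_equal(a[2], a[0] + 10))  # True True
```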
>>
>> The other part of the enhancement request is to add a shuffle
>> behavior that shuffles the 1-d slices *independently*.  That is,
>> for a 2-d array, shuffling with `axis=0` would apply a different
>> shuffle to each column.  In the github issue, I defined a
>> function called `disarrange` that implements this behavior:
>>
>> In [240]: a
>> Out[240]:
>> array([[ 0,  1,  2],
>>        [ 3,  4,  5],
>>        [ 6,  7,  8],
>>        [ 9, 10, 11],
>>        [12, 13, 14]])
>>
>> In [241]: disarrange(a, axis=0)
>>
>> In [242]: a
>> Out[242]:
>> array([[ 6,  1,  2],
>>        [ 3, 13, 14],
>>        [ 9, 10,  5],
>>        [12,  7,  8],
>>        [ 0,  4, 11]])
>>
>> Note that each column has been shuffled independently.
>>
>> This behavior is analogous to how `sort` handles the `axis`
>> keyword.  `sort` sorts the 1-d slices along the given axis
>> independently.
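One way to get the independent behavior is to visit each 1-d slice along `axis` (the same slices `np.sort(a, axis)` sorts) and give each its own shuffle. This is an illustrative sketch under that assumption, not the `disarrange` implementation posted in the GitHub issue:

```python
import numpy as np

def disarrange(a, axis=0):
    """Shuffle every 1-d slice of `a` along `axis` independently, in place.

    Illustrative sketch, not the implementation posted in the GitHub issue.
    """
    # Move the target axis to the end; np.moveaxis returns a view of `a`.
    v = np.moveaxis(a, axis, -1)
    # Each v[idx] is a 1-d view, so shuffling it modifies `a` in place.
    for idx in np.ndindex(v.shape[:-1]):
        np.random.shuffle(v[idx])

np.random.seed(2)
a = np.arange(15).reshape(5, 3)
disarrange(a, axis=0)
print(a)
# Sorting each column restores the original, so every column still holds
# exactly its own values; only the order within each column changed.
print(np.array_equal(np.sort(a, axis=0), np.arange(15).reshape(5, 3)))  # True
```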
>>
>> In the github issue, I suggested the following signature
>> for `shuffle` (but I'm not too fond of the name `independent`):
>>
>>   def shuffle(a, independent=False, axis=0)
>>
>> If `independent` is False, the current behavior of `shuffle`
>> is used.  If `independent` is True, each 1-d slice is shuffled
>> independently (in the same way that `sort` sorts each 1-d
>> slice).
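A minimal sketch of how the proposed signature might dispatch, handling integer `axis` only. `shuffle_proposed` is a hypothetical name. For the independent case this sketch swaps in a vectorized technique instead of looping over slices: draw one uniform random key per element and argsort the keys along `axis`, which gives every 1-d slice its own random ordering (ties between float keys are assumed negligible):

```python
import numpy as np

def shuffle_proposed(a, independent=False, axis=0):
    """Hypothetical sketch of shuffle(a, independent=False, axis=0).

    Handles integer `axis` only; `axis=None` is a separate question.
    """
    if independent:
        # Random keys + argsort along `axis`: each 1-d slice is permuted
        # on its own, the way np.sort treats each 1-d slice separately.
        keys = np.random.random_sample(a.shape)
        order = np.argsort(keys, axis=axis)
        a[...] = np.take_along_axis(a, order, axis=axis)
    else:
        # Current behavior, generalized: whole slices move as units.
        np.random.shuffle(np.swapaxes(a, axis, 0))

np.random.seed(5)
a = np.arange(12).reshape(4, 3)
b = a.copy()
shuffle_proposed(a, independent=False, axis=0)  # rows stay intact
shuffle_proposed(b, independent=True, axis=0)   # each column on its own
```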
>>
>> Like most functions that take an `axis` argument, `axis=None`
>> means to shuffle the flattened array.  With `independent=True`,
>> it would act like `np.random.shuffle(a.flat)`, e.g.
>>
>> In [247]: a
>> Out[247]:
>> array([[ 0,  1,  2,  3,  4],
>>        [ 5,  6,  7,  8,  9],
>>        [10, 11, 12, 13, 14]])
>>
>> In [248]: np.random.shuffle(a.flat)
>>
>> In [249]: a
>> Out[249]:
>> array([[ 0, 14,  9,  1, 13],
>>        [ 2,  8,  5,  3,  4],
>>        [ 6, 10,  7, 12, 11]])
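The example above shuffles `a.flat`, which is a `flatiter` object rather than an ndarray. A sketch of the same flattened shuffle using an ordinary 1-d view instead, assuming `a` is C-contiguous (true for `np.arange(...).reshape(...)`), in which case `reshape(-1)` returns a view rather than a copy:

```python
import numpy as np

np.random.seed(3)
a = np.arange(15).reshape(3, 5)

# For a C-contiguous array, reshape(-1) returns a view, so shuffling
# that 1-d view in place permutes every element of `a`.
flat_view = a.reshape(-1)
assert np.shares_memory(flat_view, a)

np.random.shuffle(flat_view)
print(a)  # still 3x5, but values are mixed across rows and columns
```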
>>
>>
>> A small wart in this API is the meaning of
>>
>>   shuffle(a, independent=False, axis=None)
>>
>> It could be argued that the correct behavior is to leave the
>> array unchanged. (The current behavior can be interpreted as
>> shuffling a 1-d sequence of monolithic blobs; the axis argument
>> specifies which axis of the array corresponds to the
>> sequence index.  Then `axis=None` means the argument is
>> a single monolithic blob, so there is nothing to shuffle.)
>> Or an error could be raised.
>>
>> What do you think?
>>
>> Warren
>>
>
>
>
> It is clear from the comments so far that, when `axis` is None, the result
> should be a shuffle of all the elements in the array, for both methods of
> shuffling (whether implemented as a new method or with a boolean argument to
> `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)
>
> Josef's comment reminded me that `numpy.random.permutation` returns a
> shuffled copy of the array (when its argument is an array).

which kind of proves my point

I sometimes have problems finding `shuffle` because I want a function
that does permutation.

Josef

> This function should also get an `axis` argument.  `permutation` shuffles the same way
> `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy.
> If a new method is added for the new shuffling style, then it would be
> consistent to also add a new method that uses the new shuffling style and
> returns a copy of the shuffled array.  Then we would have four
> methods:
>
>                        In-place    Copy
> Current shuffle style  shuffle     permutation
> New shuffle style      (name TBD)  (name TBD)
>
> (All of them will have an `axis` argument.)
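A sketch of the copy-returning column of that table. `np.random.permutation` already returns a shuffled copy along the first axis; `permutation_along_axis` is a hypothetical helper that follows the copy-then-shuffle pattern the message describes for `permutation`, extended with an `axis` argument:

```python
import numpy as np

def permutation_along_axis(a, axis=0):
    """Hypothetical copy-returning counterpart with an `axis` argument.

    Copies the input, then shuffles the copy in place along `axis`,
    so the input itself is never modified.
    """
    out = np.array(a, copy=True)
    np.random.shuffle(np.swapaxes(out, axis, 0))
    return out

np.random.seed(4)
a = np.arange(15).reshape(3, 5)

p = np.random.permutation(a)           # existing: rows of a copy shuffled
q = permutation_along_axis(a, axis=1)  # sketch: columns of a copy shuffled

print(np.array_equal(a, np.arange(15).reshape(3, 5)))  # True: `a` untouched
```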
>
> I suspect this will make some folks prefer the approach of adding a boolean
> argument to `shuffle` and `permutation`.
>
> Warren
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


