# [Numpy-discussion] Request for enhancement to numpy.random.shuffle

Warren Weckesser warren.weckesser at gmail.com
Sat Oct 11 18:51:56 EDT 2014

```
I created an issue on github for an enhancement
to numpy.random.shuffle:
https://github.com/numpy/numpy/issues/5173
I'd like to get some feedback on the idea.

Currently, `shuffle` shuffles the first dimension of an array
in-place.  For example, shuffling a 2D array shuffles the rows:

In : a
Out:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In : np.random.shuffle(a)

In : a
Out:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [ 3,  4,  5],
       [ 6,  7,  8]])

To add an axis keyword, we could (in effect) apply `shuffle` to
`a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
the columns:

In : a = np.arange(15).reshape(3,5)

In : a
Out:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In : axis = 1

In : np.random.shuffle(a.swapaxes(axis, 0))

In : a
Out:
array([[ 3,  2,  4,  0,  1],
       [ 8,  7,  9,  5,  6],
       [13, 12, 14, 10, 11]])

So that's the first part--adding an `axis` keyword.

The other part of the enhancement request is to add a shuffle
behavior that shuffles the 1-d slices *independently*.  That is,
for a 2-d array, shuffling with `axis=0` would apply a different
shuffle to each column.  In the github issue, I defined a
function called `disarrange` that implements this behavior:

In : a
Out:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In : disarrange(a, axis=0)

In : a
Out:
array([[ 6,  1,  2],
       [ 3, 13, 14],
       [ 9, 10,  5],
       [12,  7,  8],
       [ 0,  4, 11]])

Note that each column has been shuffled independently.

This behavior is analogous to how `sort` handles the `axis`
keyword.  `sort` sorts the 1-d slices along the given axis
independently.

In the github issue, I suggested the following signature
for `shuffle` (but I'm not too fond of the name `independent`):

def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle`
is used.  If `independent` is True, each 1-d slice is shuffled
independently (in the same way that `sort` sorts each 1-d
slice).

As in most functions that take an `axis` argument, `axis=None`
would mean shuffling the flattened array.  With `independent=True`,
it would act like `np.random.shuffle(a.flat)`, e.g.

In : a
Out:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In : np.random.shuffle(a.flat)

In : a
Out:
array([[ 0, 14,  9,  1, 13],
       [ 2,  8,  5,  3,  4],
       [ 6, 10,  7, 12, 11]])

A small wart in this API is the meaning of

shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the
array unchanged. (The current behavior can be interpreted as
shuffling a 1-d sequence of monolithic blobs; the axis argument
specifies which axis of the array corresponds to the
sequence index.  Then `axis=None` means the argument is
a single monolithic blob, so there is nothing to shuffle.)
Or an error could be raised.

What do you think?

Warren
```
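
The two behaviors discussed in the message can be sketched in a few lines on top of the existing `np.random.shuffle`. This is only an illustration under stated assumptions: the helper names `shuffle_axis` and `shuffle_independent` are hypothetical, they are not the `disarrange` function from the GitHub issue, and they are not a proposed NumPy API. Both rely on `swapaxes` returning a view, so the in-place shuffle modifies the original array.

```python
import numpy as np


def shuffle_axis(a, axis=0):
    """Reorder whole slices of `a` along `axis`, in place.

    Hypothetical helper: a single permutation is shared by all slices,
    which is the 'axis keyword' half of the request.
    """
    # swapaxes returns a view, so shuffling its first dimension
    # rearranges `a` itself along `axis`.
    np.random.shuffle(a.swapaxes(axis, 0))


def shuffle_independent(a, axis=0):
    """Shuffle each 1-d slice of `a` along `axis` separately, in place.

    Hypothetical helper: every slice gets its own permutation,
    analogous to how `sort` treats the `axis` keyword.
    """
    b = a.swapaxes(axis, -1)  # view with the target axis moved last
    for idx in np.ndindex(b.shape[:-1]):
        # b[idx] is a 1-d view into `a`, so this shuffles `a` in place.
        np.random.shuffle(b[idx])


a = np.arange(15).reshape(3, 5)
shuffle_axis(a, axis=1)          # columns move as units

c = np.arange(15).reshape(5, 3)
shuffle_independent(c, axis=0)   # each column gets its own order
```

After `shuffle_axis(a, axis=1)` every column still holds the same three values it started with, only in a different position. After `shuffle_independent(c, axis=0)` each column contains its original values in an order unrelated to the other columns. The explicit Python loop keeps the sketch short; it is not meant to be fast for arrays with many slices.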