[Numpy-discussion] Request for enhancement to numpy.random.shuffle
Warren Weckesser
warren.weckesser at gmail.com
Sat Oct 11 18:51:56 EDT 2014
I created an issue on GitHub for an enhancement
to numpy.random.shuffle:
https://github.com/numpy/numpy/issues/5173
I'd like to get some feedback on the idea.
Currently, `shuffle` shuffles the first dimension of an array
in-place. For example, shuffling a 2D array shuffles the rows:
In [227]: a
Out[227]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
In [228]: np.random.shuffle(a)
In [229]: a
Out[229]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [ 3,  4,  5],
       [ 6,  7,  8]])
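To make the current semantics concrete, here is a small check using only the existing `np.random.shuffle`: each row moves as a unit, so the set of rows is unchanged and only their order differs.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)
rows_before = {tuple(row) for row in a}

np.random.shuffle(a)  # permutes along the first axis only, in place

# Each row survives intact; only the order of the rows changes.
rows_after = {tuple(row) for row in a}
print(rows_before == rows_after)  # True
```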
To add an axis keyword, we could (in effect) apply `shuffle` to
`a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffle
the columns:
In [232]: a = np.arange(15).reshape(3,5)
In [233]: a
Out[233]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
In [234]: axis = 1
In [235]: np.random.shuffle(a.swapaxes(axis, 0))
In [236]: a
Out[236]:
array([[ 3,  2,  4,  0,  1],
       [ 8,  7,  9,  5,  6],
       [13, 12, 14, 10, 11]])
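The swapaxes trick works because `np.swapaxes` returns a view, so shuffling the view's first axis reorders the chosen axis of the original array in place. Wrapped as a helper (the name `shuffle_along_axis` is hypothetical, not an existing NumPy function), it might look like this:

```python
import numpy as np

def shuffle_along_axis(a, axis=0):
    """Hypothetical helper: shuffle whole slices along `axis`, in place.

    np.swapaxes returns a view of `a`, so np.random.shuffle (which permutes
    the first axis of its argument) reorders axis `axis` of `a` itself.
    """
    np.random.shuffle(np.swapaxes(a, axis, 0))

a = np.arange(15).reshape(3, 5)
shuffle_along_axis(a, axis=1)
print(a)  # columns permuted; every row gets the same column order
```

Because all rows share one permutation, each column is still a column of the original array, which is exactly the behavior shown in the session above.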
So that's the first part--adding an `axis` keyword.
The other part of the enhancement request is to add a shuffle
behavior that shuffles the 1-d slices *independently*. That is,
for a 2-d array, shuffling with `axis=0` would apply a different
shuffle to each column. In the GitHub issue, I defined a
function called `disarrange` that implements this behavior:
In [240]: a
Out[240]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
In [241]: disarrange(a, axis=0)
In [242]: a
Out[242]:
array([[ 6,  1,  2],
       [ 3, 13, 14],
       [ 9, 10,  5],
       [12,  7,  8],
       [ 0,  4, 11]])
Note that each column has been shuffled independently.
This behavior is analogous to how `sort` handles the `axis`
keyword. `sort` sorts the 1-d slices along the given axis
independently.
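The analogy with `sort` can be taken literally: sorting i.i.d. random keys along an axis yields an independent random permutation for each 1-d slice, i.e. the same slices `np.sort(a, axis=axis)` would sort. The sketch below swaps in that sort-based technique; it is not the `disarrange` from the GitHub issue, and the name `disarrange_by_sort` is hypothetical.

```python
import numpy as np

def disarrange_by_sort(a, axis=0, rng=None):
    """Hypothetical sketch: shuffle each 1-d slice along `axis` independently.

    argsort of uniform random keys along `axis` gives every 1-d slice its
    own random permutation (ties among float64 keys are negligibly rare).
    """
    rng = np.random.default_rng() if rng is None else rng
    keys = rng.random(a.shape)
    order = np.argsort(keys, axis=axis)
    # take_along_axis builds a new array; writing it back makes this in place.
    a[...] = np.take_along_axis(a, order, axis=axis)

a = np.arange(15).reshape(5, 3)
disarrange_by_sort(a, axis=0)
# Every column still holds its original values, each in its own order.
print(np.array_equal(np.sort(a, axis=0), np.arange(15).reshape(5, 3)))  # True
```

The design trades memory (a key array the size of `a` plus a copy) for vectorized code with no Python-level loop.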
In the GitHub issue, I suggested the following signature
for `shuffle` (but I'm not too fond of the name `independent`):
def shuffle(a, independent=False, axis=0)
If `independent` is False, the current behavior of `shuffle`
is used. If `independent` is True, each 1-d slice is shuffled
independently (in the same way that `sort` sorts each 1-d
slice).
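A minimal sketch of the proposed signature, restricted to an integer `axis`, could look like the following. The name `shuffle_proposed` is hypothetical, and the independent branch is just one possible implementation (move `axis` to the end, then shuffle each 1-d view in a Python loop), not necessarily what the `disarrange` in the issue does.

```python
import numpy as np

def shuffle_proposed(a, independent=False, axis=0):
    """Hypothetical sketch of shuffle(a, independent=False, axis=0).

    independent=False: permute whole slices along `axis` (one permutation).
    independent=True: give every 1-d slice along `axis` its own permutation,
    matching the slices np.sort(a, axis=axis) would sort.
    Integer `axis` only; axis=None is not handled here.
    """
    if not independent:
        # np.swapaxes returns a view, so this shuffles `a` in place.
        np.random.shuffle(np.swapaxes(a, axis, 0))
        return
    # np.moveaxis also returns a view; the target axis becomes the last one.
    b = np.moveaxis(a, axis, -1)
    # b[idx] is a 1-d view along the original `axis`; shuffle each separately.
    for idx in np.ndindex(b.shape[:-1]):
        np.random.shuffle(b[idx])

a = np.arange(15).reshape(5, 3)
shuffle_proposed(a, independent=True, axis=0)
print(np.array_equal(np.sort(a, axis=0), np.arange(15).reshape(5, 3)))  # True
```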
Like most functions that take an `axis` argument, `axis=None`
means to shuffle the flattened array. With `independent=True`,
it would act like `np.random.shuffle(a.flat)`, e.g.
In [247]: a
Out[247]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
In [248]: np.random.shuffle(a.flat)
In [249]: a
Out[249]:
array([[ 0, 14,  9,  1, 13],
       [ 2,  8,  5,  3,  4],
       [ 6, 10,  7, 12, 11]])
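For `independent=True, axis=None`, one way to get the same effect without relying on `a.flat` is to draw a shuffled copy of the flattened data and write it back. This is only a sketch (the name `shuffle_flat` is hypothetical); it costs one extra copy, but it works whether or not `a.ravel()` returns a view, e.g. for non-contiguous slices.

```python
import numpy as np

def shuffle_flat(a):
    """Hypothetical sketch of shuffle(a, independent=True, axis=None).

    Permutes all elements of `a` in place, as if its flattened form were
    shuffled. np.random.permutation returns a shuffled copy, which is then
    reshaped and written back into `a`.
    """
    a[...] = np.random.permutation(a.ravel()).reshape(a.shape)

a = np.arange(15).reshape(3, 5)
shuffle_flat(a)
print(np.array_equal(np.sort(a, axis=None), np.arange(15)))  # True
```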
A small wart in this API is the meaning of
shuffle(a, independent=False, axis=None)
It could be argued that the correct behavior is to leave the
array unchanged. (The current behavior can be interpreted as
shuffling a 1-d sequence of monolithic blobs; the axis argument
specifies which axis of the array corresponds to the
sequence index. Then `axis=None` means the argument is
a single monolithic blob, so there is nothing to shuffle.)
Or an error could be raised.
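Both readings of `independent=False, axis=None` are easy to express. In the "single monolithic blob" reading, adding a leading length-1 axis turns `a` into a sequence of one item, and shuffling a length-1 sequence can only apply the identity permutation. Both function names below are hypothetical sketches, not proposed API.

```python
import numpy as np

def shuffle_blob(a):
    # Hypothetical sketch of the "leave it unchanged" reading.
    # a[np.newaxis] is a view of shape (1, ...): a sequence holding one blob.
    # Shuffling a length-1 sequence is always the identity, so `a` is unchanged.
    np.random.shuffle(a[np.newaxis])

def shuffle_blob_strict(a):
    # Hypothetical sketch of the "raise an error" reading.
    raise ValueError("axis=None is only meaningful with independent=True")

a = np.arange(6).reshape(2, 3)
shuffle_blob(a)
print(a)  # unchanged: [[0 1 2] [3 4 5]]
```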
What do you think?
Warren