[Numpy-discussion] Request for enhancement to numpy.random.shuffle
Warren Weckesser
warren.weckesser at gmail.com
Sun Oct 12 12:14:15 EDT 2014
On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser <warren.weckesser at gmail.com> wrote:
> I created an issue on github for an enhancement
> to numpy.random.shuffle:
> https://github.com/numpy/numpy/issues/5173
> I'd like to get some feedback on the idea.
>
> Currently, `shuffle` shuffles the first dimension of an array
> in-place. For example, shuffling a 2D array shuffles the rows:
>
> In [227]: a
> Out[227]:
> array([[ 0,  1,  2],
>        [ 3,  4,  5],
>        [ 6,  7,  8],
>        [ 9, 10, 11]])
>
> In [228]: np.random.shuffle(a)
>
> In [229]: a
> Out[229]:
> array([[ 0,  1,  2],
>        [ 9, 10, 11],
>        [ 3,  4,  5],
>        [ 6,  7,  8]])
>
>
> To add an axis keyword, we could (in effect) apply `shuffle` to
> `a.swapaxes(axis, 0)`. For a 2-d array, `axis=1` would shuffle
> the columns:
>
> In [232]: a = np.arange(15).reshape(3,5)
>
> In [233]: a
> Out[233]:
> array([[ 0,  1,  2,  3,  4],
>        [ 5,  6,  7,  8,  9],
>        [10, 11, 12, 13, 14]])
>
> In [234]: axis = 1
>
> In [235]: np.random.shuffle(a.swapaxes(axis, 0))
>
> In [236]: a
> Out[236]:
> array([[ 3,  2,  4,  0,  1],
>        [ 8,  7,  9,  5,  6],
>        [13, 12, 14, 10, 11]])
>
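The swapaxes trick works because `swapaxes` returns a view, so shuffling the view's first dimension rearranges the original array in place. A minimal sketch wrapping it as a helper; `shuffle_along_axis` is a hypothetical name, and it takes an explicit `np.random.RandomState` so results can be seeded (the module-level `np.random.shuffle` uses a global `RandomState`, so the idea is the same).

```python
import numpy as np

def shuffle_along_axis(a, axis=0, rng=None):
    """Permute the slices of `a` along `axis`, in place.

    Each sub-array indexed along `axis` moves as a unit, so for a
    2-d array axis=0 reorders rows and axis=1 reorders columns.
    """
    if rng is None:
        rng = np.random.RandomState()
    # swapaxes returns a view, so shuffling the view's first dimension
    # rearranges the data of `a` itself.
    rng.shuffle(a.swapaxes(axis, 0))

a = np.arange(15).reshape(3, 5)
shuffle_along_axis(a, axis=1, rng=np.random.RandomState(0))
print(a)  # the columns of arange(15).reshape(3, 5), in a random order
```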
> So that's the first part--adding an `axis` keyword.
>
> The other part of the enhancement request is to add a shuffle
> behavior that shuffles the 1-d slices *independently*. That is,
> for a 2-d array, shuffling with `axis=0` would apply a different
> shuffle to each column. In the github issue, I defined a
> function called `disarrange` that implements this behavior:
>
> In [240]: a
> Out[240]:
> array([[ 0,  1,  2],
>        [ 3,  4,  5],
>        [ 6,  7,  8],
>        [ 9, 10, 11],
>        [12, 13, 14]])
>
> In [241]: disarrange(a, axis=0)
>
> In [242]: a
> Out[242]:
> array([[ 6,  1,  2],
>        [ 3, 13, 14],
>        [ 9, 10,  5],
>        [12,  7,  8],
>        [ 0,  4, 11]])
>
> Note that each column has been shuffled independently.
>
> This behavior is analogous to how `sort` handles the `axis`
> keyword. `sort` sorts the 1-d slices along the given axis
> independently.
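The independent shuffle can be sketched with the same per-slice machinery that makes `sort` work along an axis: draw one random key per element, `argsort` the keys along `axis`, and apply the resulting permutations. This is not the `disarrange` from the github issue, just one possible way to write it; the `rng` parameter is an assumption, and `np.take_along_axis` requires NumPy 1.15 or later.

```python
import numpy as np

def disarrange(a, axis, rng=None):
    """Shuffle each 1-d slice of `a` along `axis` independently, in place."""
    if rng is None:
        rng = np.random.RandomState()
    # One random key per element; argsort along `axis` turns each 1-d
    # slice of keys into its own uniformly random permutation.
    idx = np.argsort(rng.random_sample(a.shape), axis=axis)
    # take_along_axis builds a reordered copy; assigning it back through
    # a[...] updates the caller's array in place.
    a[...] = np.take_along_axis(a, idx, axis=axis)

a = np.arange(15).reshape(5, 3)
disarrange(a, axis=0, rng=np.random.RandomState(0))
print(a)                   # each column shuffled on its own
print(np.sort(a, axis=0))  # sort is also per-column: gives back arange(15).reshape(5, 3)
```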
>
> In the github issue, I suggested the following signature
> for `shuffle` (but I'm not too fond of the name `independent`):
>
> def shuffle(a, independent=False, axis=0)
>
> If `independent` is False, the current behavior of `shuffle`
> is used. If `independent` is True, each 1-d slice is shuffled
> independently (in the same way that `sort` sorts each 1-d
> slice).
>
> Like most functions that take an `axis` argument, `axis=None`
> means to shuffle the flattened array. With `independent=True`,
> it would act like `np.random.shuffle(a.flat)`, e.g.
>
> In [247]: a
> Out[247]:
> array([[ 0,  1,  2,  3,  4],
>        [ 5,  6,  7,  8,  9],
>        [10, 11, 12, 13, 14]])
>
> In [248]: np.random.shuffle(a.flat)
>
> In [249]: a
> Out[249]:
> array([[ 0, 14,  9,  1, 13],
>        [ 2,  8,  5,  3,  4],
>        [ 6, 10,  7, 12, 11]])
>
>
> A small wart in this API is the meaning of
>
> shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged. (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index. Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.
>
> What do you think?
>
> Warren
>
>
It is clear from the comments so far that, when `axis` is None, the result
should be a shuffle of all the elements in the array, for both methods of
shuffling (whether implemented as a new method or with a boolean argument
to `shuffle`). Forget I ever suggested doing nothing or raising an error.
:)
Josef's comment reminded me that `numpy.random.permutation` returns a
shuffled copy of the array (when its argument is an array). This function
should also get an `axis` argument. `permutation` shuffles the same way
`shuffle` does--it simply makes a copy and then calls `shuffle` on the
copy. If a new method is added for the new shuffling style, then it would
be consistent to also add a new method that uses the new shuffling style
and returns a copy of the shuffled array. We would then have four
methods:
                        In-place      Copy
Current shuffle style   shuffle       permutation
New shuffle style       (name TBD)    (name TBD)
(All of them will have an `axis` argument.)
I suspect this will make some folks prefer the approach of adding a boolean
argument to `shuffle` and `permutation`.
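A minimal sketch of the boolean-argument alternative covering all four cells of the table, assuming the hypothetical names `shuffle2` and `permutation2` (placeholders, not NumPy functions). The in-place function dispatches on `independent` and `axis` (with `axis=None` shuffling every element in both styles), and the copying function follows the description of `permutation` above: copy the input, then shuffle the copy. The `independent=True` branch sorts random keys along the axis, which is one way to draw a separate permutation per 1-d slice, not necessarily how NumPy would do it.

```python
import numpy as np

def shuffle2(a, independent=False, axis=0, rng=None):
    """In-place shuffle; `independent` selects the shuffle style."""
    if rng is None:
        rng = np.random.RandomState()
    if axis is None:
        # Both styles shuffle every element when axis is None.
        a[...] = rng.permutation(a.ravel()).reshape(a.shape)
    elif independent:
        # New style: each 1-d slice along `axis` gets its own permutation.
        idx = np.argsort(rng.random_sample(a.shape), axis=axis)
        a[...] = np.take_along_axis(a, idx, axis=axis)
    else:
        # Current style: whole slices move together (swapaxes gives a view).
        rng.shuffle(a.swapaxes(axis, 0))

def permutation2(a, independent=False, axis=0, rng=None):
    """Copying counterpart: shuffle a copy of `a` and return it."""
    b = np.array(a, copy=True)
    shuffle2(b, independent=independent, axis=axis, rng=rng)
    return b

a = np.arange(15).reshape(3, 5)
b = permutation2(a, independent=True, axis=1)  # `a` itself is left unchanged
```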
Warren