Request for enhancement to numpy.random.shuffle

I created an issue on github for an enhancement to numpy.random.shuffle:

    https://github.com/numpy/numpy/issues/5173

I'd like to get some feedback on the idea.

Currently, `shuffle` shuffles the first dimension of an array in-place. For example, shuffling a 2D array shuffles the rows:

    In [227]: a
    Out[227]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11]])

    In [228]: np.random.shuffle(a)

    In [229]: a
    Out[229]:
    array([[ 0,  1,  2],
           [ 9, 10, 11],
           [ 3,  4,  5],
           [ 6,  7,  8]])

To add an axis keyword, we could (in effect) apply `shuffle` to `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffle the columns:

    In [232]: a = np.arange(15).reshape(3,5)

    In [233]: a
    Out[233]:
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])

    In [234]: axis = 1

    In [235]: np.random.shuffle(a.swapaxes(axis, 0))

    In [236]: a
    Out[236]:
    array([[ 3,  2,  4,  0,  1],
           [ 8,  7,  9,  5,  6],
           [13, 12, 14, 10, 11]])

So that's the first part--adding an `axis` keyword.

The other part of the enhancement request is to add a shuffle behavior that shuffles the 1-d slices *independently*. That is, for a 2-d array, shuffling with `axis=0` would apply a different shuffle to each column. In the github issue, I defined a function called `disarrange` that implements this behavior:

    In [240]: a
    Out[240]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11],
           [12, 13, 14]])

    In [241]: disarrange(a, axis=0)

    In [242]: a
    Out[242]:
    array([[ 6,  1,  2],
           [ 3, 13, 14],
           [ 9, 10,  5],
           [12,  7,  8],
           [ 0,  4, 11]])

Note that each column has been shuffled independently.

This behavior is analogous to how `sort` handles the `axis` keyword. `sort` sorts the 1-d slices along the given axis independently.

In the github issue, I suggested the following signature for `shuffle` (but I'm not too fond of the name `independent`):

    def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle` is used. If `independent` is True, each 1-d slice is shuffled independently (in the same way that `sort` sorts each 1-d slice).

Like most functions that take an `axis` argument, `axis=None` means to shuffle the flattened array. With `independent=True`, it would act like `np.random.shuffle(a.flat)`, e.g.

    In [247]: a
    Out[247]:
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])

    In [248]: np.random.shuffle(a.flat)

    In [249]: a
    Out[249]:
    array([[ 0, 14,  9,  1, 13],
           [ 2,  8,  5,  3,  4],
           [ 6, 10,  7, 12, 11]])

A small wart in this API is the meaning of

    shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the array unchanged. (The current behavior can be interpreted as shuffling a 1-d sequence of monolithic blobs; the axis argument specifies which axis of the array corresponds to the sequence index. Then `axis=None` means the argument is a single monolithic blob, so there is nothing to shuffle.) Or an error could be raised.

What do you think?

Warren
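
For readers who want to try the independent behavior described above, here is a minimal sketch. It is not the `disarrange` code from the github issue; the function name `shuffle_slices_independently` and its `rng` parameter are hypothetical, and it assumes a NumPy recent enough to provide `np.random.default_rng` and `np.moveaxis`.

```python
import numpy as np

def shuffle_slices_independently(a, axis=0, rng=None):
    """Shuffle each 1-d slice of `a` along `axis` in place, independently.

    Hypothetical helper for illustration only; not a NumPy function.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Move the target axis to the end; `moved` is a view, so writes reach `a`.
    moved = np.moveaxis(a, axis, -1)
    # Visit every 1-d slice along the target axis and shuffle it on its own.
    for idx in np.ndindex(moved.shape[:-1]):
        lane = moved[idx]      # 1-d view into `a`
        rng.shuffle(lane)      # in-place shuffle of this slice only

a = np.arange(15).reshape(5, 3)
shuffle_slices_independently(a, axis=0, rng=np.random.default_rng(0))
```

After the call, each column of `a` still holds the same values it started with, but in its own random order, which mirrors how `sort` treats the `axis` keyword.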

On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
> I created an issue on github for an enhancement to numpy.random.shuffle: https://github.com/numpy/numpy/issues/5173

I like this idea. I was a bit surprised there wasn't something like this already.

> A small wart in this API is the meaning of
>
>     shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the array unchanged. (The current behavior can be interpreted as shuffling a 1-d sequence of monolithic blobs; the axis argument specifies which axis of the array corresponds to the sequence index. Then `axis=None` means the argument is a single monolithic blob, so there is nothing to shuffle.) Or an error could be raised.

Let's think about it from the other direction: if a user wants to shuffle all the elements as if it were 1-d, as you point out they could do this:

    shuffle(a, axis=None, independent=True)

But that's a lot of typing. Maybe we should just let this do the same thing:

    shuffle(a, axis=None)

That seems to be in keeping with the other APIs taking axis as you mentioned. To me, "independent" has no relevance when the array is 1-d; it can simply be ignored.

John Zwinck

Thanks Warren, I think these are sensible additions.

I would argue to treat the None-False condition as an error. Indeed I agree one might argue the correct behavior is to 'shuffle' the singleton block of data, which does nothing; but it's more likely to come up as an unintended error than as a natural outcome of parametrized behavior.

On Sun, Oct 12, 2014 at 3:31 AM, John Zwinck <jzwinck@gmail.com> wrote:
> Let's think about it from the other direction: if a user wants to shuffle all the elements as if it were 1-d, as you point out they could do this:
>
>     shuffle(a, axis=None, independent=True)
>
> But that's a lot of typing. Maybe we should just let this do the same thing:
>
>     shuffle(a, axis=None)

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Sun, Oct 12, 2014 at 3:51 PM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:
> I would argue to treat the None-False condition as an error. Indeed I agree one might argue the correct behavior is to 'shuffle' the singleton block of data, which does nothing; but it's more likely to come up as an unintended error than as a natural outcome of parametrized behavior.

I'm interested to know why you think axis=None should raise an error if independent=False when independent=False is the default. What I mean is, if someone uses this function and wants axis=None (which seems not totally unusual), why force them to always type in the boilerplate independent=True to make it work?

John Zwinck

Hi Warren

On 2014-10-12 00:51:56, Warren Weckesser <warren.weckesser@gmail.com> wrote:
> A small wart in this API is the meaning of
>
>     shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the array unchanged.

I like the suggested changes. Since "independent" loses its meaning when axis is None, I would expect this to have the same effect as `shuffle(a, independent=True, axis=None)`. I think a shuffle function that doesn't shuffle will confuse a lot of people!

Stéfan

Yeah, a shuffle function that does not shuffle indeed seems like a major source of bugs to me.

Indeed one could argue that setting axis=None should suffice to give a clear enough declaration of intent; though I wouldn't mind typing the extra bit to ensure consistent semantics.

On Sun, Oct 12, 2014 at 10:56 AM, Stefan van der Walt <stefan@sun.ac.za> wrote:
> I like the suggested changes. Since "independent" loses its meaning when axis is None, I would expect this to have the same effect as `shuffle(a, independent=True, axis=None)`. I think a shuffle function that doesn't shuffle will confuse a lot of people!

On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
> A small wart in this API is the meaning of
>
>     shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the array unchanged. (The current behavior can be interpreted as shuffling a 1-d sequence of monolithic blobs; the axis argument specifies which axis of the array corresponds to the sequence index. Then `axis=None` means the argument is a single monolithic blob, so there is nothing to shuffle.) Or an error could be raised.
>
> What do you think?

It seems to me a perfectly good reason to have two methods instead of one. I can't imagine when I wouldn't be using a literal True or False for this, so it really should be two different methods.

That said, I would just make the axis=None behavior the same for both methods. axis=None does *not* mean "treat this like a single monolithic blob" in any of the axis=-having methods; it means "flatten the array and do the operation on the single flattened axis". I think the latter behavior is a reasonable interpretation of axis=None for both methods.

--
Robert Kern
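
The "flatten the array and operate on the single flattened axis" reading of axis=None could look roughly like the sketch below. `shuffle_flat` and its `rng` parameter are hypothetical names used only for illustration, and the permutation-index approach is one possible choice, not necessarily how NumPy would implement it.

```python
import numpy as np

def shuffle_flat(a, rng=None):
    """Shuffle every element of `a` in place, treating it as one flat axis.

    Hypothetical helper for illustration only; not a NumPy function.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = a.ravel()                     # a view if contiguous, else a copy
    perm = rng.permutation(flat.size)    # random ordering of the flat indices
    # flat[perm] is always a new array, so writing it back into `a` is safe
    # even when `flat` is a view of `a`.
    a[...] = flat[perm].reshape(a.shape)

a = np.arange(15).reshape(3, 5)
shuffle_flat(a, rng=np.random.default_rng(0))
```

Writing the result back through `a[...]` rather than shuffling `a.ravel()` directly keeps the sketch correct for non-contiguous inputs, where `ravel()` returns a copy.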

On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <robert.kern@gmail.com> wrote:
> It seems to me a perfectly good reason to have two methods instead of one. I can't imagine when I wouldn't be using a literal True or False for this, so it really should be two different methods.

I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: "disarrange", "scramble", "disorder", "randomize", "ashuffle", some other variation of the word "shuffle", ...), but I figured the first thing folks would say is "Why not just add options to shuffle?" So, choose your battles and all that.

What do other folks think of making a separate method?

> That said, I would just make the axis=None behavior the same for both methods. axis=None does *not* mean "treat this like a single monolithic blob" in any of the axis=-having methods; it means "flatten the array and do the operation on the single flattened axis". I think the latter behavior is a reasonable interpretation of axis=None for both methods.

Sounds good to me.

Warren

On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <robert.kern@gmail.com> wrote:
>> It seems to me a perfectly good reason to have two methods instead of one. I can't imagine when I wouldn't be using a literal True or False for this, so it really should be two different methods.
>
> I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: "disarrange", "scramble", "disorder", "randomize", "ashuffle", some other variation of the word "shuffle", ...), but I figured the first thing folks would say is "Why not just add options to shuffle?" So, choose your battles and all that.
>
> What do other folks think of making a separate method?

I'm not a fan of many similar functions.

What's the difference between permute, shuffle and scramble?

And how do I find or remember which is which?

>> That said, I would just make the axis=None behavior the same for both methods. axis=None does *not* mean "treat this like a single monolithic blob" in any of the axis=-having methods; it means "flatten the array and do the operation on the single flattened axis". I think the latter behavior is a reasonable interpretation of axis=None for both methods.
>
> Sounds good to me.

+1 (since all the arguments have been already given)

Josef

- Why does sort treat columns independently instead of sorting rows?
- because there is lexsort
- Oh, lexsort, I haven't thought about it in 5 years. It's not even next to sort in the pop-up code completion

On Sun, Oct 12, 2014 at 11:20 AM, <josef.pktd@gmail.com> wrote:
> I'm not a fan of many similar functions.
>
> What's the difference between permute, shuffle and scramble?

The difference between `shuffle` and the new method being proposed is explained in the first email in this thread. `np.random.permutation` with an array argument returns a shuffled copy of the array; it does not modify its argument. (It should also get an `axis` argument when `shuffle` gets an `axis` argument.)

> And how do I find or remember which is which?

You could start with `doc(np.random)` (or `np.random?` in ipython).

Warren
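
The copy-versus-in-place distinction between `np.random.permutation` and `np.random.shuffle` mentioned above can be checked with a short snippet. The variable names here are only illustrative.

```python
import numpy as np

a = np.arange(6)

# permutation returns a shuffled copy and leaves its argument alone.
b = np.random.permutation(a)
a_unchanged = np.array_equal(a, np.arange(6))

# shuffle reorders its argument in place and returns None.
result = np.random.shuffle(a)
```

Here `a_unchanged` records that `a` was untouched by `permutation`, while `result` is `None` because `shuffle` works in place.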

On Sun, Oct 12, 2014 at 11:33 AM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
> On Sun, Oct 12, 2014 at 11:20 AM, <josef.pktd@gmail.com> wrote:
>> And how do I find or remember which is which?
>
> You could start with `doc(np.random)` (or `np.random?` in ipython).

If you have to check the docstring each time, then there is something wrong. In my opinion all docstrings should be read only once.

It's like a Windows program where the GUI menus are not **self-explanatory**. What did Save-As do?

Josef

On 2014-10-12 16:54, Warren Weckesser wrote:
> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <robert.kern@gmail.com> wrote:
>> It seems to me a perfectly good reason to have two methods instead of one. I can't imagine when I wouldn't be using a literal True or False for this, so it really should be two different methods.
>
> I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: "disarrange", "scramble", "disorder", "randomize", "ashuffle", some other variation of the word "shuffle", ...), but I figured the first thing folks would say is "Why not just add options to shuffle?" So, choose your battles and all that.
>
> What do other folks think of making a separate method?

I'm not a fan of more methods with similar functionality in Numpy. It's already hard to keep track of the existing functions and all their possible applications and variants. The axis=None proposal for shuffling all items is very intuitive.

I think we don't want to take the path of matlab: a huge number of powerful functions, but few people know of their powerful possibilities.

regards,
Sebastian

On Sun, Oct 12, 2014 at 5:14 PM, Sebastian <sebix@sebix.at> wrote:
> I'm not a fan of more methods with similar functionality in Numpy. It's already hard to keep track of the existing functions and all their possible applications and variants. The axis=None proposal for shuffling all items is very intuitive.
>
> I think we don't want to take the path of matlab: a huge number of powerful functions, but few people know of their powerful possibilities.

I totally agree with this principle, but I think this is an exception to the rule, b/c unfortunately in this case the function that we *do* have is weird and inconsistent with how most other functions in numpy work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc (k,)->(k,) would work. Also, it's easy to implement the current 'shuffle' in terms of any 1d shuffle function, with no explicit loops, whereas Warren's disarrange requires an explicit loop. So, we really implemented the wrong one, oops. What this means going forward, though, is that our only options are either to implement both behaviours with two functions, or else to give up on having the more natural behaviour altogether. I think the former is the lesser of two evils.

Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. reversed, etc.

So, how about:

scramble + scrambled
    shuffle individual entries within each row/column/..., as in Warren's suggestion.

shuffle + shuffled
    to do what shuffle, permutation do now (mnemonic: these break a 2d array into a bunch of 1d "cards", and then shuffle those cards).

permuted
    remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'."

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
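
To make the proposed naming concrete, here is a rough sketch of the two copy-returning variants. The names `shuffled` and `scrambled` come from the proposal above, but these implementations, the `axis` default and the `rng` parameter are assumptions for illustration only; neither function exists in NumPy under these names. The whole-slice version needs only one permutation of indices, while the per-slice version relies on `np.apply_along_axis`, which iterates over the 1-d slices.

```python
import numpy as np

def shuffled(a, axis=0, rng=None):
    """Return a copy of `a` with whole slices reordered along `axis`.

    Hypothetical sketch: the "cards" stay intact, only their order changes.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    perm = rng.permutation(a.shape[axis])   # one permutation for all slices
    return np.take(a, perm, axis=axis)

def scrambled(a, axis=0, rng=None):
    """Return a copy of `a` with each 1-d slice along `axis` shuffled on its own.

    Hypothetical sketch: every slice gets an independent permutation.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    # rng.permutation returns a shuffled copy of each 1-d slice it is given.
    return np.apply_along_axis(rng.permutation, axis, a)

a = np.arange(15).reshape(5, 3)
rng = np.random.default_rng(0)
s = shuffled(a, axis=0, rng=rng)    # rows reordered, each row intact
t = scrambled(a, axis=0, rng=rng)   # each column shuffled independently
```

Both functions leave `a` untouched; in-place counterparts (`shuffle`, `scramble`) would follow the same pattern but write back into the input.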

On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
> Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. reversed, etc.
>
> So, how about:
>
> scramble + scrambled
>     shuffle individual entries within each row/column/..., as in Warren's suggestion.
>
> shuffle + shuffled
>     to do what shuffle, permutation do now (mnemonic: these break a 2d array into a bunch of 1d "cards", and then shuffle those cards).
>
> permuted
>     remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'."

That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

Warren

On Thu, Oct 16, 2014 at 8:39 AM, Warren Weckesser < warren.weckesser@gmail.com> wrote:
On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Oct 12, 2014 at 5:14 PM, Sebastian <sebix@sebix.at> wrote:
On 2014-10-12 16:54, Warren Weckesser wrote:
On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <robert.kern@gmail.com <mailto:robert.kern@gmail.com>> wrote:
On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser <warren.weckesser@gmail.com <mailto:warren.weckesser@gmail.com>> wrote:
> A small wart in this API is the meaning of > > shuffle(a, independent=False, axis=None) > > It could be argued that the correct behavior is to leave the > array unchanged. (The current behavior can be interpreted as > shuffling a 1-d sequence of monolithic blobs; the axis argument > specifies which axis of the array corresponds to the > sequence index. Then `axis=None` means the argument is > a single monolithic blob, so there is nothing to shuffle.) > Or an error could be raised. > > What do you think?
It seems to me a perfectly good reason to have two methods instead
of
one. I can't imagine when I wouldn't be using a literal True or
False
for this, so it really should be two different methods.
I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: "disarrange", "scramble", "disorder", "randomize", "ashuffle", some other variation of the word "shuffle", ...), but I figured the first thing folks would say is "Why not just add options to shuffle?" So, choose your battles and all that.
What do other folks think of making a separate method
I'm not a fan of more methods with similar functionality in Numpy. It's already hard to overlook the existing functions and all their possible applications and variants. The axis=None proposal for shuffling all items is very intuitive.
I think we don't want to take the path of matlab: a huge amount of powerful functions, but few people know of their powerful possibilities.
I totally agree with this principle, but I think this is an exception to the rule, b/c unfortunately in this case the function that we *do* have is weird and inconsistent with how most other functions in numpy work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc (k,)->(k,) would work. Also, it's easy to implement the current 'shuffle' in terms of any 1d shuffle function, with no explicit loops, Warren's disarrange requires an explicit loop. So, we really implemented the wrong one, oops. What this means going forward, though, is that our only options are either to implement both behaviours with two functions, or else to give up on have the more natural behaviour altogether. I think the former is the lesser of two evils.
Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. reversed, etc.
So, how about:
scramble + scrambled shuffle individual entries within each row/column/..., as in Warren's suggestion.
shuffle + shuffled to do what shuffle, permutation do now (mnemonic: these break a 2d array into a bunch of 1d "cards", and then shuffle those cards).
permuted remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'."
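The naming pattern proposed here mirrors Python's `list.sort` / `sorted`. A minimal sketch for 1-d sequences follows; `scramble` and `scrambled` are only the names floated in this thread, not existing NumPy functions, and for 1-d input the two shuffling styles coincide.

```python
import random


def scramble(seq, rng=random):
    """In-place variant: mutates `seq` and returns None, like `list.sort`."""
    rng.shuffle(seq)


def scrambled(seq, rng=random):
    """Copying variant: returns a new list and leaves `seq` alone, like `sorted`."""
    result = list(seq)
    scramble(result, rng)
    return result


cards = [1, 2, 3, 4, 5]
new_cards = scrambled(cards)   # cards is unchanged
scramble(cards)                # cards itself is reordered
```

Building the copying variant on top of the in-place one is the same relationship the thread describes between `permutation` and `shuffle`.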
That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)
So the only little detail left is someone actually rolling up his/her sleeves and creating a PR... ;-) The current shuffle and permutation are implemented here: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L4... It's in Cython, so it is a good candidate for anyone wanting to contribute to numpy, but wary of C code. Jaime
Warren
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.

On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)
I hesitate to use names like "randomize" because they're less informative than they seem -- if asked what this operation does to an array, then it would be natural to say "it randomizes the array". But if told that the random module has a function called randomize, then that's not very informative -- everything in random randomizes something somehow.
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
I hesitate to use names like "randomize" because they're less informative than they seem -- if asked what this operation does to an array, then it would be natural to say "it randomizes the array". But if told that the random module has a function called randomize, then that's not very informative -- everything in random randomizes something somehow.
I had some similar concerns (hence my original "disarrange"), but "randomize" seemed more likely to be found when searching or browsing the docs, and while it might be a bit too generic-sounding, it does feel like a natural verb for the process. On the other hand, "permute" and "permuted" are even more natural and unambiguous. Any objections to those? (The existing function is "permutation".)
Whatever the names, the docstrings for the four functions should be cross-referenced in their "See Also" sections to help users find the appropriate function.
By the way, "permutation" has a feature not yet mentioned here: if the argument is an integer 'n', it generates a permutation of arange(n). In this case, it acts like matlab's "randperm" function. Unless we replicate that in the new function, we shouldn't deprecate "permutation".
Warren

On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
I had some similar concerns (hence my original "disarrange"), but "randomize" seemed more likely to be found when searching or browsing the docs, and while it might be a bit too generic-sounding, it does feel like a natural verb for the process. On the other hand, "permute" and "permuted" are even more natural and unambiguous. Any objections to those? (The existing function is "permutation".)
[...]
By the way, "permutation" has a feature not yet mentioned here: if the argument is an integer 'n', it generates a permutation of arange(n). In this case, it acts like matlab's "randperm" function. Unless we replicate that in the new function, we shouldn't deprecate "permutation".
I guess we could do something like:
permutation(n):
    Return a random permutation on n items. Equivalent to permuted(arange(n)).
    Note: for backwards compatibility, a call like permutation(an_array) currently returns the same as shuffled(an_array). (This is *not* equivalent to permuted(an_array).) This functionality is deprecated.
OTOH "np.random.permute" as a name does have a downside: someday we'll probably add a function called "np.permute" (for applying a given permutation in place -- the O(n) algorithm for this is useful and tricky), and having two functions with the same name and very different semantics would be pretty confusing.
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
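A rough sketch of the dispatch described in that draft docstring, using plain Python lists so it runs without NumPy. `permuted` is a stand-in for the proposed function (a hypothetical name from this thread); for 1-d input it behaves the same as the proposed `shuffled`, so one helper serves both here. The deprecation warning only illustrates how the sequence branch could be flagged.

```python
import random
import warnings


def permuted(seq, rng=random):
    """Stand-in for the proposed copying shuffle of a 1-d sequence."""
    result = list(seq)
    rng.shuffle(result)
    return result


def permutation(x, rng=random):
    """permutation(n) -> random permutation of range(n).

    Passing a sequence is kept for backwards compatibility only.
    """
    if isinstance(x, int):
        return permuted(range(x), rng)
    warnings.warn(
        "permutation(sequence) is deprecated; use shuffled(sequence)",
        DeprecationWarning,
        stacklevel=2,
    )
    return permuted(x, rng)  # for 1-d input this matches shuffled(x)


print(permutation(5))  # some ordering of 0..4
```

Keeping the integer form separate from the sequence form is what would let the "randperm"-style use survive even if the sequence form were eventually retired.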

On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith <njs@pobox.com> wrote:
I guess we could do something like:
permutation(n):
Return a random permutation on n items. Equivalent to permuted(arange(n)).
Note: for backwards compatibility, a call like permutation(an_array) currently returns the same as shuffled(an_array). (This is *not* equivalent to permuted(an_array).) This functionality is deprecated.
OTOH "np.random.permute" as a name does have a downside: someday we'll probably add a function called "np.permute" (for applying a given permutation in place -- the O(n) algorithm for this is useful and tricky), and having two functions with the same name and very different semantics would be pretty confusing.
I like `permute`. That's the one term I'm looking for first. If np.permute does some kind of deterministic permutation or pivoting, then I wouldn't find it confusing if np.random.permute does "random" permutation. (I definitely don't like scrambled, sounds like eggs or cable TV that needs to be unscrambled.) Josef

On Fri, Oct 17, 2014 at 2:35 AM, <josef.pktd@gmail.com> wrote:
I like `permute`. That's the one term I'm looking for first.
If np.permute does some kind of deterministic permutation or pivoting, then I wouldn't find it confusing if np.random.permute does "random" permutation.
Yeah, but:
    from ... import permute
    # 500 lines later
    def foo(...):
        permute(...)  # what the heck is this
It definitely *can* be confusing; basically everything else in np.random has a name that suggests randomness even without seeing the full path. It's not a huge deal, though.
(I definitely don't like scrambled, sounds like eggs or cable TV that needs to be unscrambled.)
I vote that in this kind of bikeshed we try to restrict ourselves to arguments that we can at least pretend are motivated by some technical/UX concern ;-). (I guess unscrambling eggs would be technically impressive tho ;-)) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On Thu, Oct 16, 2014 at 10:50 PM, Nathaniel Smith <njs@pobox.com> wrote:
I like `permute`. That's the one term I'm looking for first.
If np.permute does some kind of deterministic permutation or pivoting, then I wouldn't find it confusing if np.random.permute does "random" permutation.
Yeah, but:
    from ... import permute
    # 500 lines later
    def foo(...):
        permute(...)  # what the heck is this
It definitely *can* be confusing; basically everything else in np.random has a name that suggests randomness even without seeing the full path.
I usually/always avoid importing names from random into the module namespace:
    np.random.xxx
    from numpy.random import power
    power(...)
    >>> power(5, 3)
    array([ 0.93771162,  0.96180884,  0.80191961])
???
and f and beta and gamma, ...
    >>> bytes(10)
    '\xa3\xf0%\x88\x11\xda\x0e\x81\x0c\x8e'
    >>> bytes(5)
    '\xb0B\x8e\xa1\x80'
It's not a huge deal, though.
(I definitely don't like scrambled, sounds like eggs or cable TV that needs to be unscrambled.)
I vote that in this kind of bikeshed we try to restrict ourselves to arguments that we can at least pretend are motivated by some technical/UX concern ;-). (I guess unscrambling eggs would be technically impressive tho ;-))
Ignoring the eggs, it still sounds like a cheap encryption and is a word I would never look for when looking for something to implement a permutation test. Josef

On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser < warren.weckesser@gmail.com> wrote:
I created an issue on github for an enhancement to numpy.random.shuffle: https://github.com/numpy/numpy/issues/5173 I'd like to get some feedback on the idea.
[...]
It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`). Forget I ever suggested doing nothing or raising an error. :)
Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array). This function should also get an `axis` argument. `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy. If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array. Then we would have four methods:
                         In-place      Copy
Current shuffle style    shuffle       permutation
New shuffle style        (name TBD)    (name TBD)
(All of them will have an `axis` argument.)
I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.
Warren
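The agreed meaning of `axis=None` (shuffle every element, keep the shape) can be sketched without NumPy by treating a rectangular list of lists as the array. `shuffle_all` is a hypothetical helper name used only for this illustration.

```python
import random


def shuffle_all(rows, rng=random):
    """Shuffle every element of a rectangular list of lists in place.

    This mimics axis=None: flatten, shuffle once, and write the values
    back in the original shape.
    """
    flat = [value for row in rows for value in row]
    rng.shuffle(flat)
    values = iter(flat)
    for row in rows:
        for j in range(len(row)):
            row[j] = next(values)


a = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]
shuffle_all(a)   # shape stays 3x5; any value may land in any position
```

With a real array the same effect is what `np.random.shuffle(a.flat)` produced in the original post.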

On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser <warren.weckesser@gmail.com> wrote:
> It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`). Forget I ever suggested doing nothing or raising an error. :)
> Josef's comment reminded me that `numpy.random.permutation`
which kind of proves my point. I sometimes have problems finding `shuffle` because I want a function that does permutation.
Josef
> returns a shuffled copy of the array (when its argument is an array). This function should also get an `axis` argument. `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy. If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array. Then we would have four methods:
>                          In-place      Copy
> Current shuffle style    shuffle       permutation
> New shuffle style        (name TBD)    (name TBD)
> (All of them will have an `axis` argument.)
> I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.
> Warren

On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser < warren.weckesser@gmail.com> wrote:
It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`). Forget I ever suggested doing nothing or raising an error. :)
Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array). This function should also get an `axis` argument. `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy. If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array. Then we would have four methods:
                         In-place      Copy
Current shuffle style    shuffle       permutation
New shuffle style        (name TBD)    (name TBD)
(All of them will have an `axis` argument.)
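As an illustration of the "Copy" column, a copy-returning counterpart with an `axis` argument might look like the sketch below. The name `permutation_sketch` and the `rng` parameter are hypothetical; instead of copying and then shuffling in place, it indexes with a random permutation via `np.take`, which has the same effect and always returns a new array.

```python
import numpy as np

def permutation_sketch(a, axis=0, rng=None):
    """Return a shuffled copy of `a` along `axis`; `a` is left untouched."""
    if rng is None:
        rng = np.random.default_rng()
    a = np.asarray(a)
    # One permutation of the indices along `axis`, applied to whole slices
    # (the current shuffle style).
    perm = rng.permutation(a.shape[axis])
    return np.take(a, perm, axis=axis)

a = np.arange(15).reshape(3, 5)
b = permutation_sketch(a, axis=1)  # columns of `a` in random order
```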
That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined. That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion.
Warren
I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.
Warren

On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser < warren.weckesser@gmail.com> wrote:
That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined. That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion.
Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'. Jaime

On Sun, Oct 12, 2014 at 10:56 AM, Jaime Fernández del Río < jaime.frio@gmail.com> wrote:
Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'.
+1. Unfortunately, `shuffle` has the better name, but `permutation` has the better default behavior. (Also, I think "inplace" might be a less ambiguous name for the argument than "copy".)
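A single function with such a flag could be sketched roughly as follows. This is only an illustration of the suggestion: `shuffle_with_flag`, the `inplace` keyword and the `rng` parameter are hypothetical and do not exist in NumPy.

```python
import numpy as np

def shuffle_with_flag(a, axis=0, inplace=True, rng=None):
    """Shuffle along `axis`; modify `a` or return a copy, depending on `inplace`."""
    if rng is None:
        rng = np.random.default_rng()
    perm = rng.permutation(a.shape[axis])
    shuffled = np.take(a, perm, axis=axis)  # always a new array
    if inplace:
        a[...] = shuffled  # write the result back into the caller's array
        return None
    return shuffled

a = np.arange(12).reshape(4, 3)
b = shuffle_with_flag(a, inplace=False)  # `a` unchanged, `b` is the shuffled copy
shuffle_with_flag(a)                     # `a` shuffled in place, returns None
```

With this shape, `permutation` would reduce to a call with `inplace=False`.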
participants (10)
- Eelco Hoogendoorn
- Jaime Fernández del Río
- John Zwinck
- josef.pktd@gmail.com
- Nathaniel Smith
- Robert Kern
- Sebastian
- Stefan van der Walt
- Stephan Hoyer
- Warren Weckesser