<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser <span dir="ltr"><<a href="mailto:warren.weckesser@gmail.com" target="_blank">warren.weckesser@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser <span dir="ltr"><<a href="mailto:warren.weckesser@gmail.com" target="_blank">warren.weckesser@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><div><div><br><div class="gmail_quote">On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser <span dir="ltr"><<a href="mailto:warren.weckesser@gmail.com" target="_blank">warren.weckesser@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div>I created an issue on github for an enhancement<br>to numpy.random.shuffle:<br> <a href="https://github.com/numpy/numpy/issues/5173" target="_blank">https://github.com/numpy/numpy/issues/5173</a><br>I'd like to get some feedback on the idea.<br><br>Currently, `shuffle` shuffles the first dimension of an array<br>in-place. For example, shuffling a 2D array shuffles the rows:<br><br><span style="font-family:courier new,monospace">In [227]: a<br>Out[227]: <br>array([[ 0, 1, 2],<br> [ 3, 4, 5],<br> [ 6, 7, 8],<br> [ 9, 10, 11]])<br><br>In [228]: np.random.shuffle(a)<br><br>In [229]: a<br>Out[229]: <br>array([[ 0, 1, 2],<br> [ 9, 10, 11],<br> [ 3, 4, 5],<br> [ 6, 7, 8]])</span><br><br><br>To add an axis keyword, we could (in effect) apply `shuffle` to<br>`a.swapaxes(axis, 0)`. 
For a 2-D array, `axis=1` would shuffle<br>the columns:<br><span style="font-family:courier new,monospace"><br>In [232]: a = np.arange(15).reshape(3,5)<br><br>In [233]: a<br>Out[233]: <br>array([[ 0, 1, 2, 3, 4],<br> [ 5, 6, 7, 8, 9],<br> [10, 11, 12, 13, 14]])<br><br>In [234]: axis = 1<br><br>In [235]: np.random.shuffle(a.swapaxes(axis, 0))<br><br>In [236]: a<br>Out[236]: <br>array([[ 3, 2, 4, 0, 1],<br> [ 8, 7, 9, 5, 6],<br> [13, 12, 14, 10, 11]])</span><br><br>So that's the first part--adding an `axis` keyword.<br><br>The other part of the enhancement request is to add a shuffle<br>behavior that shuffles the 1-d slices *independently*. That is,<br>for a 2-d array, shuffling with `axis=0` would apply a different<br>shuffle to each column. In the GitHub issue, I defined a<br>function called `disarrange` that implements this behavior:<br><br><span style="font-family:courier new,monospace">In [240]: a<br>Out[240]: <br>array([[ 0, 1, 2],<br> [ 3, 4, 5],<br> [ 6, 7, 8],<br> [ 9, 10, 11],<br> [12, 13, 14]])<br><br>In [241]: disarrange(a, axis=0)<br><br>In [242]: a<br>Out[242]: <br>array([[ 6, 1, 2],<br> [ 3, 13, 14],<br> [ 9, 10, 5],<br> [12, 7, 8],<br> [ 0, 4, 11]])</span><br><br>Note that each column has been shuffled independently.<br><br>This behavior is analogous to how `sort` handles the `axis`<br>keyword. `sort` sorts the 1-d slices along the given axis<br>independently.<br><br>In the GitHub issue, I suggested the following signature<br>for `shuffle` (but I'm not too fond of the name `independent`):<br><br><span style="font-family:courier new,monospace"> def shuffle(a, independent=False, axis=0)</span><br><br>If `independent` is False, the current behavior of `shuffle`<br>is used. If `independent` is True, each 1-d slice is shuffled<br>independently (in the same way that `sort` sorts each 1-d<br>slice).<br><br>Like most functions that take an `axis` argument, `axis=None`<br>means to shuffle the flattened array. 
With `independent=True`,<br>it would act like `np.random.shuffle(a.flat)`, e.g.<br><span style="font-family:courier new,monospace"><br>In [247]: a<br>Out[247]: <br>array([[ 0, 1, 2, 3, 4],<br> [ 5, 6, 7, 8, 9],<br> [10, 11, 12, 13, 14]])<br><br>In [248]: np.random.shuffle(a.flat)<br><br>In [249]: a<br>Out[249]: <br>array([[ 0, 14, 9, 1, 13],<br> [ 2, 8, 5, 3, 4],<br> [ 6, 10, 7, 12, 11]])</span><br><br><br>A small wart in this API is the meaning of<br><br><span style="font-family:courier new,monospace"> shuffle(a, independent=False, axis=None)</span><br><br>It could be argued that the correct behavior is to leave the<br></div></div>array unchanged. (The current behavior can be interpreted as<br>shuffling a 1-d sequence of monolithic blobs; the axis argument<br>specifies which axis of the array corresponds to the<br>sequence index. Then `axis=None` means the argument is<br>a single monolithic blob, so there is nothing to shuffle.)<br>Or an error could be raised.<br><div><div><br>What do you think?<span><font color="#888888"><br><br>Warren<br><br></font></span></div></div></div>
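<div>A minimal sketch of the proposed <span style="font-family:courier new,monospace">shuffle(a, independent=False, axis=0)</span> semantics, assuming NumPy's <span style="font-family:courier new,monospace">np.random.Generator</span> API. The function and its extra <span style="font-family:courier new,monospace">rng</span> parameter are hypothetical, and this is not the <span style="font-family:courier new,monospace">disarrange</span> code from the GitHub issue: for the independent case it swaps in an argsort of i.i.d. random keys, which gives every 1-d slice its own uniformly random permutation, the same way <span style="font-family:courier new,monospace">sort</span> handles each slice separately. <span style="font-family:courier new,monospace">axis=None</span> shuffles all elements for both styles.</div>
<pre>
```python
import numpy as np


def shuffle(a, independent=False, axis=0, rng=None):
    """Hypothetical sketch of the proposed ``shuffle`` (not a NumPy API).

    Shuffles the ndarray ``a`` in place and returns None.
    ``rng`` is an illustrative extra parameter (a ``np.random.Generator``).
    """
    rng = np.random.default_rng() if rng is None else rng
    if axis is None:
        # Both styles shuffle every element of the flattened array.
        # ravel() returns a copy for non-contiguous input, so write it back.
        flat = a.ravel()
        rng.shuffle(flat)
        a[...] = flat.reshape(a.shape)
    elif not independent:
        # Current style: permute whole sub-arrays along ``axis``.
        # swapaxes returns a view, so shuffling it modifies ``a``.
        rng.shuffle(a.swapaxes(axis, 0))
    else:
        # New style: every 1-d slice along ``axis`` gets its own permutation,
        # the way ``sort`` treats each slice.  Argsorting i.i.d. uniform keys
        # gives a uniformly random permutation of each slice.
        keys = rng.random(a.shape)
        idx = np.argsort(keys, axis=axis)
        a[...] = np.take_along_axis(a, idx, axis=axis)


# Usage: shuffle each column of a 5x3 array on its own.
a = np.arange(15).reshape(5, 3)
shuffle(a, independent=True, axis=0)
print(a)
```
</pre>
<div>The argsort trick costs an O(n log n) sort per slice rather than an O(n) Fisher-Yates pass; a per-slice Fisher-Yates loop would avoid the temporary key and index arrays at the price of more code.</div>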
</blockquote></div><br><br><br></div></div>It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`). Forget I ever suggested doing nothing or raising an error. :)<br><br>Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array). This function should also get an `axis` argument. `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy. If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array. We would then have four methods:<br><br><span style="font-family:courier new,monospace">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;In-place&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Copy<br>Current shuffle style&nbsp;&nbsp;&nbsp;shuffle&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;permutation<br>New shuffle style&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(name TBD)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(name TBD)</span><br><br>(All of them will have an `axis` argument.)<br><br></div></div></blockquote><div><br></div><div><br></div></div></div><div>That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined. 
That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion.</div></div></div></div></blockquote><div><br></div><div>Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'.</div><div><br></div><div>Jaime</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><span class="HOEnZb"><font color="#888888"><br><br></font></span></div><span class="HOEnZb"><font color="#888888"><div>Warren<br> <br></div></font></span><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra">I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.<span><font color="#888888"><br><br>Warren<br><br></font></span></div></div>
</blockquote></span></div><br></div></div>
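<div>A sketch of the 'copy' flag suggested above, again assuming NumPy's <span style="font-family:courier new,monospace">np.random.Generator</span> API. Both functions are hypothetical and only borrow NumPy's names for illustration; they are not existing NumPy signatures. With the flag in place, <span style="font-family:courier new,monospace">permutation</span> reduces to a thin wrapper around <span style="font-family:courier new,monospace">shuffle</span>, which covers the "Copy" column of the table without new names.</div>
<pre>
```python
import numpy as np


def shuffle(a, axis=0, copy=False, rng=None):
    """Hypothetical ``shuffle`` with a ``copy`` flag (not a NumPy API).

    copy=False: shuffle the ndarray ``a`` in place and return None,
    like the current ``np.random.shuffle``.
    copy=True: leave ``a`` untouched and return a shuffled copy.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(a, copy=True) if copy else a
    # swapaxes returns a view, so shuffling it permutes ``out`` along ``axis``.
    rng.shuffle(out.swapaxes(axis, 0))
    return out if copy else None


def permutation(a, axis=0, rng=None):
    """Hypothetical ``permutation`` written in terms of the new ``shuffle``."""
    if isinstance(a, (int, np.integer)):
        # Like np.random.permutation(n): permute np.arange(n).
        a = np.arange(a)
    return shuffle(a, axis=axis, copy=True, rng=rng)


# Usage: a shuffled copy along axis 1, then an in-place shuffle.
x = np.arange(12).reshape(4, 3)
y = permutation(x, axis=1)  # x is unchanged
shuffle(x, axis=1)          # x is shuffled in place
```
</pre>
<div>Routing both functions through one code path keeps the copy and in-place behaviors from drifting apart; an <span style="font-family:courier new,monospace">independent</span>-style argument could be threaded through the same way.</div>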
<br>_______________________________________________<br>
NumPy-Discussion mailing list<br>
<a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br>
<a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>(\__/)<br>( O.o)<br>( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
</div></div>