[Numpy-discussion] Request for enhancement to numpy.random.shuffle

Jaime Fernández del Río jaime.frio at gmail.com
Thu Oct 16 11:58:53 EDT 2014


On Thu, Oct 16, 2014 at 8:39 AM, Warren Weckesser
<warren.weckesser at gmail.com> wrote:

>
>
> On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Sun, Oct 12, 2014 at 5:14 PM, Sebastian <sebix at sebix.at> wrote:
>> >
>> > On 2014-10-12 16:54, Warren Weckesser wrote:
>> >>
>> >>
>> >> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern
>> >> <robert.kern at gmail.com> wrote:
>> >>
>> >>     On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
>> >>     <warren.weckesser at gmail.com> wrote:
>> >>
>> >>     > A small wart in this API is the meaning of
>> >>     >
>> >>     >   shuffle(a, independent=False, axis=None)
>> >>     >
>> >>     > It could be argued that the correct behavior is to leave the
>> >>     > array unchanged. (The current behavior can be interpreted as
>> >>     > shuffling a 1-d sequence of monolithic blobs; the axis argument
>> >>     > specifies which axis of the array corresponds to the
>> >>     > sequence index.  Then `axis=None` means the argument is
>> >>     > a single monolithic blob, so there is nothing to shuffle.)
>> >>     > Or an error could be raised.
>> >>     >
>> >>     > What do you think?
>> >>
>> >>     It seems to me a perfectly good reason to have two methods instead
>> >>     of one. I can't imagine when I wouldn't be using a literal True or
>> >>     False for this, so it really should be two different methods.
>> >>
>> >>
>> >>
>> >> I agree, and my first inclination was to propose a different method
>> >> (and I had the bikeshedding conversation with myself about the name:
>> >> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some
>> >> other variation of the word "shuffle", ...), but I figured the first
>> >> thing folks would say is "Why not just add options to shuffle?"  So,
>> >> choose your battles and all that.
>> >>
>> >> What do other folks think of making a separate method?
>> >
>> > I'm not a fan of more methods with similar functionality in Numpy. It's
>> > already hard to keep track of the existing functions and all their
>> > possible applications and variants. The axis=None proposal for shuffling
>> > all items is very intuitive.
>> >
>> > I think we don't want to take the path of matlab: a huge number of
>> > powerful functions, but few people know what they are capable of.
>>
>> I totally agree with this principle, but I think this is an exception
>> to the rule, b/c unfortunately in this case the function that we *do*
>> have is weird and inconsistent with how most other functions in numpy
>> work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc
>> (k,)->(k,) would work. Also, it's easy to implement the current
>> 'shuffle' in terms of any 1d shuffle function, with no explicit loops,
>> whereas Warren's disarrange requires an explicit loop. So, we really
>> implemented the wrong one, oops. What this means going forward,
>> though, is that our only options are either to implement both
>> behaviours with two functions, or else to give up on having the more
>> natural behaviour altogether. I think the former is the lesser of two
>> evils.
>>
>> Regarding names: shuffle/permutation is a terrible naming convention
>> IMHO and shouldn't be propagated further. We already have a good
>> naming convention for in-place vs. copy: sort vs. sorted, reverse vs.
>> reversed, etc.
>>
>> So, how about:
>>
>> scramble + scrambled: shuffle individual entries within each
>> row/column/..., as in Warren's suggestion.
>>
>> shuffle + shuffled: do what shuffle and permutation do now (mnemonic:
>> these break a 2d array into a bunch of 1d "cards", and then shuffle
>> those cards).
>>
>> permutation remains indefinitely, with the docstring: "Deprecated alias
>> for 'shuffled'."
>>
>>
>
> That sounds good to me.  (I might go with 'randomize' instead of
> 'scramble', but that's a second-order decision for the API.)
>

So the only little detail left is someone actually rolling up his/her
sleeves and creating a PR... ;-)
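
For anyone picking this up, here is a minimal sketch of the difference
between the two behaviours discussed above. It is not a proposed
implementation: `scramble` is only the hypothetical name from Nathaniel's
list, the `axis` and `rng` parameters are assumptions made for illustration,
and it uses exactly the kind of explicit Python loop that a real version
would want to avoid.

```python
import numpy as np

def scramble(a, axis=-1, rng=None):
    """Hypothetical: shuffle entries independently within each 1-d slice
    of the ndarray `a` along `axis`, in place."""
    rng = np.random.RandomState() if rng is None else rng
    # moveaxis returns a view, so shuffling its 1-d slices modifies `a`.
    view = np.moveaxis(a, axis, -1)
    for idx in np.ndindex(view.shape[:-1]):
        rng.shuffle(view[idx])

a = np.arange(12).reshape(3, 4)
b = a.copy()

np.random.shuffle(a)   # current behaviour: reorders whole rows ("cards")
scramble(b, axis=1)    # sketch: each row gets its own permutation
```

After this, every row of `a` is still one of the original rows, while each
row of `b` holds the same values as before but in its own random order.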

The current shuffle and permutation are implemented here:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L4551

It's in Cython, so it is a good candidate for anyone wanting to contribute
to numpy, but wary of C code.
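
On Nathaniel's point that the current behaviour needs nothing more than a
1-d shuffle: a rough sketch, assuming we shuffle an index array and then
reorder with np.take. The name `shuffled` follows the naming proposed above;
the `axis` and `rng` arguments are assumptions for illustration, not an
existing API.

```python
import numpy as np

def shuffled(a, axis=0, rng=None):
    """Hypothetical out-of-place variant: return a copy of `a` with whole
    slices along `axis` reordered by a single random permutation."""
    rng = np.random.RandomState() if rng is None else rng
    a = np.asarray(a)
    order = np.arange(a.shape[axis])
    rng.shuffle(order)                    # the only 1-d shuffle needed
    return np.take(a, order, axis=axis)   # no explicit loop over slices

x = np.arange(12).reshape(3, 4)
rows_moved = shuffled(x, axis=0)   # rows reordered, each row intact
cols_moved = shuffled(x, axis=1)   # columns reordered, each column intact
```

Because one permutation is applied to every slice, this cannot express the
per-row independent shuffling that Warren asked for.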

Jaime



>
>
> Warren
>
>
>> -n
>>
>> --
>> Nathaniel J. Smith
>> Postdoctoral researcher - Informatics - University of Edinburgh
>> http://vorpus.org
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>


-- 
(\__/)
( O.o)
( > <) This is Rabbit. Copy Rabbit into your signature and help him with his
plans for world domination.