[Numpy-discussion] A bug in numpy.random.shuffle?

Thu Sep 5 16:06:44 EDT 2013

On Thu, Sep 5, 2013 at 1:45 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:

>
>
>
> On Thu, Sep 5, 2013 at 1:34 PM, Bradley M. Froehle
> <brad.froehle at gmail.com> wrote:
>
>> I put this test case through `git bisect run` and here's what came
>> back.  I haven't confirmed this manually yet, but the blamed commit
>> does seem reasonable:
>>
>> b26c675e2a91e1042f8f8d634763942c87fbbb6e is the first bad commit
>> commit b26c675e2a91e1042f8f8d634763942c87fbbb6e
>> Author: Nathaniel J. Smith <njs at pobox.com>
>> Date:   Thu Jul 12 13:20:20 2012 +0100
>>
>>     [FIX] Make np.random.shuffle less brain-dead
>>
>>     The logic in np.random.shuffle was... not very sensible. Fixes trac
>>     ticket #2074.
>>
>>     This patch also exposes a completely unrelated issue in
>>     numpy.testing. Filed as Github issue #347 and marked as knownfail for
>>     now.
>>
>> :040000 040000 6f3cf0c85a64664db6a71bd59909903f18b51639
>> 0b6c8571dd3c9de8f023389f6bd963e42b12cc26 M numpy
>> bisect run success
>>
>> On Thu, Sep 5, 2013 at 11:58 AM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> >
>> > On Thu, Sep 5, 2013 at 12:50 PM, Fernando Perez <fperez.net at gmail.com>
>> > wrote:
>> >>
>> >> On Thu, Sep 5, 2013 at 11:43 AM, Charles R Harris
>> >> <charlesr.harris at gmail.com> wrote:
>> >>
>> >>
>> >> > Oh, nice one ;) Should be fixable if you want to submit a patch.
>> >>
>> >> Strategy? One option is to do, for structured arrays, a shuffle of the
>> >> indices and then an in-place
>> >>
>> >> arr = arr[shuffled_indices]
>> >>
>> >> But there may be a cleaner/faster way to do it.
>> >>
>> >> I'm happy to submit a patch, but I'm not familiar enough with the
>> >> internals to know what the best approach should be.
>> >>
>> >
>> > Better open an issue. It looks like a bug in the indexing code.
>> >
>>
>
> Also fails for string arrays.
>
> In [6]: x = np.zeros(5, dtype=[('n', 'S1'), ('s', 'S1')])
>
> In [7]: x['s'] = [c for c in 'abcde']
>
> In [8]: x
> Out[8]:
> array([('', 'a'), ('', 'b'), ('', 'c'), ('', 'd'), ('', 'e')],
>       dtype=[('n', 'S1'), ('s', 'S1')])
>
> In [9]: x[0], x[1] = x[1], x[0]
>
> In [10]: x
> Out[10]:
> array([('', 'b'), ('', 'b'), ('', 'c'), ('', 'd'), ('', 'e')],
>       dtype=[('n', 'S1'), ('s', 'S1')])
>
>
This behavior is not new; it is also present in 1.6.x:

In [1]: x = np.zeros(5, dtype=[('n', 'S1'), ('s', 'S1')])

In [2]: x['s'] = [c for c in 'abcde']

In [3]: x
Out[3]:
array([('', 'a'), ('', 'b'), ('', 'c'), ('', 'd'), ('', 'e')],
      dtype=[('n', '|S1'), ('s', '|S1')])

In [4]: x[0], x[1] = x[1], x[0]

In [5]: x
Out[5]:
array([('', 'b'), ('', 'b'), ('', 'c'), ('', 'd'), ('', 'e')],
      dtype=[('n', '|S1'), ('s', '|S1')])

In [6]: np.__version__
Out[6]: '1.6.3.dev-3f58621'

So it needs to be decided whether this is a bug or not. I think the
scalars returned by indexing should be copies of the data, not views
into the array.
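
In the meantime, a possible workaround is to go through integer-array
(fancy) indexing, which copies the selected records before assigning
them back. The sketch below is only an illustration of that idea and of
the index-shuffle strategy Fernando suggested; the helper names
swap_rows and shuffle_rows are made up for this example and are not
part of numpy, and np.random.default_rng assumes a recent numpy.

```python
import numpy as np


def swap_rows(arr, i, j):
    """Swap records i and j of a 1-D array in place (hypothetical helper).

    arr[[j, i]] uses integer-array indexing, so the right-hand side is a
    copy of both records rather than views into arr.
    """
    arr[[i, j]] = arr[[j, i]]


def shuffle_rows(arr, rng):
    """Shuffle a 1-D array in place via a shuffled index (hypothetical helper).

    arr[perm] is a permuted copy, which is then written back into the
    original buffer.
    """
    perm = rng.permutation(len(arr))
    arr[:] = arr[perm]


x = np.zeros(5, dtype=[('n', 'S1'), ('s', 'S1')])
x['s'] = list('abcde')

swap_rows(x, 0, 1)
print(x['s'])  # first two entries exchanged, nothing duplicated

shuffle_rows(x, np.random.default_rng(0))
print(x['s'])  # same five letters in a permuted order
```

This costs a temporary copy of the selected records, so it may not be
the fastest option for large arrays, but it does not depend on whether
the scalars returned by x[0] are views or copies.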

Chuck