[Numpy-discussion] 1.6.2 no more unique for rows

josef.pktd at gmail.com josef.pktd at gmail.com
Wed May 30 19:08:16 EDT 2012


On Wed, May 30, 2012 at 5:55 PM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Wed, May 30, 2012 at 5:39 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>>
>> On Wed, May 30, 2012 at 4:59 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>>
>>> On Tue, May 29, 2012 at 7:42 PM, Charles R Harris
>>> <charlesr.harris at gmail.com> wrote:
>>> >
>>> >
>>> > On Mon, May 28, 2012 at 9:18 PM, <josef.pktd at gmail.com> wrote:
>>> >>
>>> >>
>>> >>
>>> >> https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892
>>> >> broke return_inverse for structured arrays, because of the use of
>>> >> mergesort
>>> >>
>>> >> I'm using structured dtypes to get uniques and return_inverse by rows
>>> >>
>>> >> >>> groups = np.random.randint(0,4,size=(10,2))
>>> >> >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten()
>>> >> >>> groups
>>> >> array([[0, 2],
>>> >>       [1, 2],
>>> >>       [1, 1],
>>> >>       [3, 1],
>>> >>       [3, 1],
>>> >>       [2, 1],
>>> >>       [1, 0],
>>> >>       [3, 3],
>>> >>       [3, 2],
>>> >>       [0, 0]])
>>> >> >>> groups_
>>> >> array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3),
>>> >>       (3, 2), (0, 0)],
>>> >>      dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> >>
>>> >> >>> np.argsort(groups_)
>>> >> array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7])
>>> >>
>>> >> >>> np.argsort(groups_, kind='mergesort')
>>> >> Traceback (most recent call last):
>>> >>  File "<stdin>", line 1, in <module>
>>> >>  File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line 679, in argsort
>>> >>    return argsort(axis, kind, order)
>>> >> TypeError: requested sort not available for type
>>> >>
>>> >> >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
>>> >> ...                                   return_inverse=True)
>>> >> >>> uni_inv
>>> >> array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0])
>>> >>
>>> >> This raises an exception in numpy 1.6.2rc2 (as reported by Debian for statsmodels).
>>> >>
>>> >
>>> > I've been putting off, um, planning to implement the different sort
>>> > kinds for
>>> > object/structured arrays for a while, sounds like it needs to get done.
>>>
>>> So I guess this is a 1.6.1 -> 1.6.2 regression, and presumably we
>>> won't be landing any new sort implementations in the 1.6 branch.
>>> Should we be thinking about reverting this and releasing a 1.6.3? (I
>>> don't know if it's worth it, but it seems like something we should
>>> think about either way.)
>>>
>>> Same question applies to 1.7 too -- obviously the change to unique()
>>> is a good one, but maybe it has to wait until mergesort can handle
>>> structured dtypes?
>>>
>>
>> Should definitely be reverted if a 1.6.3 goes out.
>
>
> But is a 1.6.3 required for this issue alone? It's a regression, but it
> looks like a corner case and is already fixed in statsmodels. If there are
> more users who are running into this problem though, I'm OK with doing a
> 1.6.3 release just for this.

For statsmodels it doesn't make much difference anymore when this gets
changed. Once it is released in a numpy version that we support, we
are pretty much stuck with numpy compatibility files.
Fortunately the function was easy to copy.
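For illustration, here is a minimal sketch of such a compatibility helper. It is not the code statsmodels actually ships, and the name `unique_rows` is hypothetical. It uses the same structured-view trick as the original report, but sorts with the default argsort kind instead of mergesort, which numpy 1.6.2 cannot apply to structured dtypes. An unstable sort is enough for `return_inverse`, because equal records still end up in one contiguous run; stability only matters if `return_index` must point at the first occurrence.

```python
import numpy as np

def unique_rows(a):
    """Return (unique_rows, inverse) for a 2-D array, treating each row as one item.

    Hypothetical compatibility helper: avoids kind='mergesort', which
    numpy 1.6.2 does not support for structured dtypes.
    """
    a = np.ascontiguousarray(a)
    # View each row as a single structured record; records compare
    # field by field, so sorting them orders the rows lexicographically.
    rows = a.view([('', a.dtype)] * a.shape[1]).ravel()
    perm = rows.argsort()  # default (unstable) kind is fine for the inverse
    srt = rows[perm]
    # True at the first element of each run of equal records.
    flag = np.concatenate(([True], srt[1:] != srt[:-1]))
    # Index into the list of uniques for each sorted position.
    iflag = np.cumsum(flag) - 1
    # Scatter back so inv[i] is the unique index of original row i.
    inv = np.empty(rows.shape, dtype=np.intp)
    inv[perm] = iflag
    # Convert the unique records back into a plain 2-D array.
    uniq = srt[flag].view(a.dtype).reshape(-1, a.shape[1])
    return uniq, inv
```

With the `groups` array printed in the report, `uniq[inv]` reproduces `groups` row for row, and `inv` matches the `uni_inv` output shown there.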

Unfortunately, we either had no test coverage for this or didn't run the
tests before numpy 1.6.2 came out.
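A regression test along the following lines would catch this kind of breakage. It is a hypothetical pytest-style sketch, not a test taken from statsmodels or numpy. It uses a fixed copy of the random `groups` array printed in the report and mirrors the report's `np.unique` call. On numpy 1.6.2 it would be expected to fail with the `TypeError` from the traceback; on numpy versions that can perform a stable sort of structured dtypes it should pass.

```python
import numpy as np

def test_unique_rows_return_inverse():
    # Fixed copy of the random data from the bug report.
    groups = np.array([[0, 2], [1, 2], [1, 1], [3, 1], [3, 1],
                       [2, 1], [1, 0], [3, 3], [3, 2], [0, 0]])
    # One structured record per row, as in the report.
    groups_ = groups.view([('', groups.dtype)] * groups.shape[1]).flatten()
    uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
                                      return_inverse=True)
    # Uniques indexed by the inverse must reproduce every original row.
    assert (uni[uni_inv] == groups_).all()
    # Each returned index must point at a row equal to its unique record.
    assert (groups_[uni_idx] == uni).all()
    # Same inverse as the pre-1.6.2 output in the report.
    assert uni_inv.tolist() == [1, 4, 3, 6, 6, 5, 2, 8, 7, 0]

test_unique_rows_return_inverse()
```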

Josef

>
> Ralf
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


