On Wed, May 30, 2012 at 5:39 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:


On Wed, May 30, 2012 at 4:59 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, May 29, 2012 at 7:42 PM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
>
> On Mon, May 28, 2012 at 9:18 PM, <josef.pktd@gmail.com> wrote:
>>
>>
>> https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892
>> broke return_inverse for structured arrays, because of the use of
>> mergesort
>>
>> I'm using structured dtypes to get uniques and return_inverse by rows
>>
>> >>> groups = np.random.randint(0,4,size=(10,2))
>> >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten()
>> >>> groups
>> array([[0, 2],
>>       [1, 2],
>>       [1, 1],
>>       [3, 1],
>>       [3, 1],
>>       [2, 1],
>>       [1, 0],
>>       [3, 3],
>>       [3, 2],
>>       [0, 0]])
>> >>> groups_
>> array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3),
>>       (3, 2), (0, 0)],
>>      dtype=[('f0', '<i4'), ('f1', '<i4')])
>>
>> >>> np.argsort(groups_)
>> array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7])
>>
>> >>> np.argsort(groups_, kind='mergesort')
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line 679, in argsort
>>    return argsort(axis, kind, order)
>> TypeError: requested sort not available for type
>>
>> >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
>> ...                                   return_inverse=True)
>> >>> uni_inv
>> array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0])
>>
>> The np.unique call raises an exception in numpy 1.6.2rc2 (as reported by
>> Debian for statsmodels)
>>
>
> I've been putting off, um, planning to implement the different sort kinds for
> object/structured arrays for a while, sounds like it needs to get done.

So I guess this is a 1.6.1 -> 1.6.2 regression, and presumably we
won't be landing any new sort implementations in the 1.6 branch.
Should we be thinking about reverting this and releasing a 1.6.3? (I
don't know if it's worth it, but it seems like something we should
think about either way.)

Same question applies to 1.7 too -- obviously the change to unique()
is a good one, but maybe it has to wait until mergesort can handle
structured dtypes?


Should definitely be reverted if a 1.6.3 goes out.

But is a 1.6.3 required for this issue alone? It's a regression, but it looks like a corner case and is already fixed in statsmodels. If there are more users who are running into this problem though, I'm OK with doing a 1.6.3 release just for this.
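
For anyone hitting this in the meantime, here is a minimal sketch of one way to get row-wise uniques and the inverse without going through a structured dtype, so the sort kind used inside np.unique never comes into play. It relies on np.lexsort on the plain 2-D integer array. The helper name unique_rows_inverse is hypothetical, and this is not necessarily how statsmodels worked around the problem.

```python
import numpy as np


def unique_rows_inverse(a):
    """Hypothetical helper: unique rows of a 2-D array plus inverse indices.

    Avoids structured dtypes entirely, so it does not depend on which
    sort kinds are available for them.
    """
    a = np.asarray(a)
    # lexsort treats the LAST key as the primary one, so pass the
    # columns in reverse order to sort by column 0 first.
    order = np.lexsort(a.T[::-1])
    sorted_rows = a[order]

    # Mark each sorted row that differs from the row before it.
    is_new = np.ones(len(a), dtype=bool)
    is_new[1:] = np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)
    uniques = sorted_rows[is_new]

    # Running count of new rows gives each sorted row its group label;
    # scatter the labels back to the original row positions.
    labels = np.cumsum(is_new) - 1
    inverse = np.empty(len(a), dtype=np.intp)
    inverse[order] = labels
    return uniques, inverse


groups = np.array([[0, 2], [1, 2], [1, 1], [3, 1], [3, 1],
                   [2, 1], [1, 0], [3, 3], [3, 2], [0, 0]])
uni, uni_inv = unique_rows_inverse(groups)
```

With the groups array from Josef's example, uni_inv should match the uni_inv shown there, and uni[uni_inv] reconstructs groups.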

Ralf