[Numpy-discussion] Multiplicity of an entry

Tue Oct 27 16:04:08 EDT 2009

On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote:

> Christopher Barker wrote:
>> Nadav Horesh wrote:
>>
>>> np.equal(a,a).sum(0)
>>>
>>> but, for unknown reason, np.equal operates only on "normal" arrays.
>>>
>>
>> true:
>>
>> In [25]: a
>> Out[25]:
>> array(['abc', 'def', 'abc', 'ghij'],
>>       dtype='|S4')
>>
>> In [27]: np.equal(a,a)
>> Out[27]: NotImplemented
>>
>> however:
>>
>> In [28]: a == a
>> Out[28]: array([ True,  True,  True,  True], dtype=bool)
>>
>> don't they use the same code? or is "==" reverting to plain old generic
>> python sequence comparison, which would partly explain why it is so slow.
>>
> It looks as if "a == a" (that is array_richcompare) is triggering
> special case code for strings, so it is fast.  However, IMHO np.equal
> should be made to work as well.  Can you file a bug and assign it to me
> (I'm dealing with a number of other string-related things, so I might as
> well take this too).

array_richcompare special-cases strings not for speed but for actual
functionality.

Making np.equal work with strings requires changes to the ufunc code
itself, which was never written to work with "variable-length"
data-types (like strings, unicode, and records). There are several
things that would have to be fixed. Some of the changes we made to
allow for date-time data-types also made it possible to support
variable-length strings, but this is non-trivial to implement. It's
certainly possible, but I would want to look at any changes you make
before committing them, to make sure all the issues are understood.
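
In the meantime, the original question (the multiplicity of each entry)
does not need np.equal. Below is a minimal sketch, assuming a 1-D array.
The function names multiplicity_pairwise and multiplicity_unique are made
up for illustration and are not part of NumPy. The first relies only on
the "==" operator, which the session quoted above shows working on string
arrays; the second swaps in np.unique plus np.bincount to avoid building
an N x N comparison matrix.

```python
import numpy as np


def multiplicity_pairwise(a):
    """Count occurrences of each entry by comparing every pair with ==.

    Uses O(N**2) memory, so it is only reasonable for small arrays.
    """
    a = np.asarray(a)
    # a[:, None] has shape (N, 1); broadcasting against a (shape (N,))
    # yields an (N, N) boolean matrix of pairwise equality.
    return (a[:, None] == a).sum(axis=0)


def multiplicity_unique(a):
    """Count occurrences of each entry via sorting (np.unique)."""
    a = np.asarray(a)
    # inverse[i] is the index of a[i] within the sorted unique values.
    _, inverse = np.unique(a, return_inverse=True)
    # counts[k] is how many times the k-th unique value occurs.
    counts = np.bincount(inverse)
    # Map the per-unique-value counts back onto the original positions.
    return counts[inverse]


a = np.array(['abc', 'def', 'abc', 'ghij'])
print(multiplicity_pairwise(a))  # [2 1 2 1]
print(multiplicity_unique(a))    # [2 1 2 1]
```

Both return one count per element, in the original order. The np.unique
variant is likely the better choice for large inputs, since it sorts
rather than comparing all pairs.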

Thanks,

-Travis

--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant at enthought.com