[Numpy-discussion] Multiplicity of an entry
Travis Oliphant
oliphant at enthought.com
Tue Oct 27 16:04:08 EDT 2009
On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote:
> Christopher Barker wrote:
>> Nadav Horesh wrote:
>>
>>> np.equal(a,a).sum(0)
>>>
>>> but, for unknown reason, np.equal operates only on "normal" arrays.
>>>
>>
>> true:
>>
>> In [25]: a
>> Out[25]:
>> array(['abc', 'def', 'abc', 'ghij'],
>> dtype='|S4')
>>
>> In [27]: np.equal(a,a)
>> Out[27]: NotImplemented
>>
>> however:
>>
>> In [28]: a == a
>> Out[28]: array([ True, True, True, True], dtype=bool)
>>
>> Don't they use the same code? Or is "==" reverting to plain old
>> generic Python sequence comparison, which would partly explain why
>> it is so slow?
>>
> It looks as if "a == a" (that is, array_richcompare) is triggering
> special-case code for strings, so it is fast. However, IMHO np.equal
> should be made to work as well. Can you file a bug and assign it to
> me? (I'm dealing with a number of other string-related things, so I
> might as well take this too.)
array_richcompare special-cases strings not for speed but so that
string comparison works at all.
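A minimal sketch of what this means in practice, assuming Python 3 (so the `'S4'` array from the quoted session holds `bytes`) and a NumPy version that broadcasts `==` over string arrays: the rich comparison operator handles strings without going through `np.equal`, and comparing a column view against a row view gives the pairwise matrix needed to count how often each entry occurs.

```python
import numpy as np

# Fixed-width byte-string array, matching the quoted session's dtype 'S4'.
a = np.array([b'abc', b'def', b'abc', b'ghij'], dtype='S4')

# Elementwise comparison via the array's rich comparison (not np.equal).
same = a == a  # all True

# Broadcast a (4, 1) column against a (1, 4) row to get a (4, 4) matrix
# where pairwise[i, j] is True when a[i] equals a[j].
pairwise = a[:, np.newaxis] == a[np.newaxis, :]

# Summing each column counts how many entries equal a[j]:
# multiplicity is [2, 1, 2, 1] for this input.
multiplicity = pairwise.sum(axis=0)
```

The pairwise matrix needs O(n²) memory, so this approach only suits small arrays.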
Making np.equal work with strings requires changes to the ufunc code
itself, which was never written to work with "variable-length" data
types (such as strings, unicode, and records). Several things would
have to be fixed. Some of the changes we made to support date-time
data types also make it possible to support variable-length strings,
but this is non-trivial to implement. It's certainly possible, but I
would want to review any changes you make before committing them, to
make sure all the issues are understood.
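One workaround for the original question (the multiplicity of each entry) that avoids calling `np.equal` on strings is to let `np.unique` do the grouping. This is a sketch, assuming NumPy 1.9 or later, where `np.unique` accepts `return_counts`:

```python
import numpy as np

a = np.array(['abc', 'def', 'abc', 'ghij'])

# np.unique returns the sorted distinct values; return_inverse maps each
# original element to the index of its value in `uniq`, and return_counts
# gives how often each distinct value occurs.
uniq, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)

# Look up each element's count through the inverse mapping:
# multiplicity is [2, 1, 2, 1] for this input.
multiplicity = counts[inverse]
```

Because `np.unique` sorts its input, this scales as O(n log n) and needs only O(n) extra memory.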
Thanks,
-Travis
--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant at enthought.com