On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote:
Christopher Barker wrote:
Nadav Horesh wrote:
np.equal(a,a).sum(0)
but, for some unknown reason, np.equal operates only on "normal" arrays.
true:
In [25]: a
Out[25]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4')

In [27]: np.equal(a,a)
Out[27]: NotImplemented
however:
In [28]: a == a
Out[28]: array([ True, True, True, True], dtype=bool)
Don't they use the same code? Or is "==" reverting to plain old generic Python sequence comparison, which would partly explain why it is so slow?
It looks as if "a == a" (that is, array_richcompare) is triggering special-case code for strings, so it is fast. However, IMHO np.equal should be made to work as well. Can you file a bug and assign it to me? I'm dealing with a number of other string-related things, so I might as well take this too.
array_richcompare special-cases strings not for speed but for actual functionality.
Making np.equal work with strings requires changes to the ufunc code itself, which was never written to work with "variable-length" data-types (like strings, unicode, and records). There are several things that would have to be fixed. Some of the changes we made to allow for date-time data-types also made it possible to support variable-length strings, but this is non-trivial to implement. It's certainly possible, but I would want to look at any changes you make before committing them to make sure all the issues are being understood.
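For anyone following along, here is a minimal sketch of the comparisons that go through the string special case rather than through the np.equal ufunc. It reuses the array from the session quoted above. The variable names (eq_operator, eq_char, matches_first) are made up for illustration, and np.char.equal is offered only as one explicit elementwise alternative, not as how array_richcompare is implemented internally.

```python
import numpy as np

# The array from the quoted session: fixed-width byte strings.
a = np.array(['abc', 'def', 'abc', 'ghij'], dtype='S4')

# Unlike float64, whose items are always 8 bytes, the item size of a
# string dtype is part of the dtype itself and differs between arrays.
# This is the "variable-length" property discussed above.
size_s4 = a.dtype.itemsize
size_s10 = np.dtype('S10').itemsize

# The == operator is dispatched through the string special case.
eq_operator = (a == a)

# np.char.equal performs an explicit elementwise string comparison.
eq_char = np.char.equal(a, a)

# Counting matches, in the spirit of the .sum(0) suggestion:
# how many entries are equal to the first element?
matches_first = (a == a[0]).sum(0)

print(size_s4, size_s10)
print(eq_operator)
print(eq_char)
print(matches_first)
```

Both forms return a boolean array, so the usual reductions such as .sum(0) can be applied to the result.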
Thanks,
-Travis
--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant@enthought.com