On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote:
Christopher Barker wrote:
Nadav Horesh wrote:
but, for unknown reason, np.equal operates only on "normal" arrays.
In : a
Out: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4')
In : np.equal(a,a)
Out: NotImplemented
In : a == a
Out: array([ True, True, True, True], dtype=bool)
Don't they use the same code? Or is "==" reverting to plain old generic Python sequence comparison, which would partly explain why it is so slow?
It looks as if "a == a" (that is, array_richcompare) is triggering special-case code for strings, so it is fast. However, IMHO np.equal should be made to work as well. Can you file a bug and assign it to me? (I'm dealing with a number of other string-related things, so I might as well take this too.)
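For anyone following along, here is a minimal sketch of the comparison paths that do work on string arrays. It assumes a NumPy version that provides np.char.equal; the variable names are only illustrative. It deliberately does not call np.equal on the string array, since the result of that call depends on the NumPy version.

```python
import numpy as np

# The same fixed-width byte-string array as in the session quoted above.
a = np.array([b'abc', b'def', b'abc', b'ghij'], dtype='S4')

# The == operator goes through the array's rich-comparison code,
# which handles string arrays elementwise.
eq_operator = (a == a)

# np.char.equal is an elementwise string comparison function.
eq_char = np.char.equal(a, a)

# Comparing against a single byte string broadcasts over the array.
mixed = (a == b'abc')

print(eq_operator)
print(eq_char)
print(mixed)
```

Either form gives a boolean array, so both can be used as a stand-in where np.equal does not accept string input.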
array_richcompare special-cases strings not for speed but for actual functionality.
Making np.equal work with strings requires changes to the ufunc code itself, which was never written to work with "variable-length" data-types (like strings, unicode, and records). There are several things that would have to be fixed. Some of the changes we made to allow for date-time data-types also made it possible to support variable-length strings, but this is non-trivial to implement. It's certainly possible, but I would want to look at any changes you make before committing them to make sure all the issues are understood.
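To illustrate what "variable-length" means here, a small sketch using only dtype introspection: for numeric types the element size is fixed by the type, while for strings, unicode, and records it is a parameter of the particular dtype instance. The field names in the record dtype are made up for the example.

```python
import numpy as np

# A numeric dtype has a single, fixed element size.
float_size = np.dtype(np.float64).itemsize      # 8 bytes

# Two string dtypes share the same kind but differ in element size.
s4 = np.dtype('S4')
s10 = np.dtype('S10')
same_kind = (s4.kind == s10.kind == 'S')
string_sizes = (s4.itemsize, s10.itemsize)      # (4, 10)

# Unicode dtypes behave the same way: 4 bytes per character.
unicode_size = np.dtype('U4').itemsize          # 16 bytes

# A record dtype's size depends on its fields (hypothetical field names).
rec = np.dtype([('name', 'S4'), ('value', np.float64)])
record_size = rec.itemsize                      # 4 + 8 = 12 bytes, unaligned

print(float_size, same_kind, string_sizes, unicode_size, record_size)
```

So code that handles these types cannot assume one element size per type; it has to read the size from the dtype of the arrays it is given.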
-- Travis Oliphant Enthought Inc. 1-512-536-1057 http://www.enthought.com firstname.lastname@example.org