Nadav Horesh wrote:
np.equal(a,a).sum(0)
but, for unknown reason, np.equal operates only on "normal" arrays.
true:
In [25]: a
Out[25]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4')

In [27]: np.equal(a,a)
Out[27]: NotImplemented
however:
In [28]: a == a
Out[28]: array([ True, True, True, True], dtype=bool)
don't they use the same code? or is "==" falling back to plain old generic Python sequence comparison? that would partly explain why it is so slow.
maybe you can transform the array into an array of numbers, for example by hashing each string.
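
A rough sketch of the hashing idea, assuming Python's built-in hash() is good enough as the string-to-number mapping. The name `codes` is just for illustration. Two different strings could in principle hash to the same value, so a match here means "almost certainly equal" rather than "equal":

```python
import numpy as np

a = np.array([b'abc', b'def', b'abc', b'ghij'], dtype='S4')

# Map every string to an integer with Python's built-in hash().
# Hash values are only stable within one Python process, so do not store them.
codes = np.array([hash(s) for s in a], dtype=np.int64)

# Now an ordinary numeric ufunc can do the comparison.
same_as_first = np.equal(codes, codes[0])
```

The list comprehension is a Python-level loop, so this only pays off if the integer codes are reused for many comparisons.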
or even easier:
In [32]: a2 = a.view(dtype=np.int32)

In [33]: a2
Out[33]: array([1633837824, 1684366848, 1633837824, 1734895978])

In [34]: np.equal(a2, a2[0])
Out[34]: array([ True, False, True, False], dtype=bool)
though that only works if your strings are a handy length like 4 bytes...
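
For other string lengths, one possible workaround (a sketch, not something I have timed) is to view the array as raw bytes, one row per string, and compare whole rows. The names `rows`, `match_int` and `match_bytes` are made up for this example. Note that the integers you get from the int32 view depend on the machine's byte order, so they may not match the numbers shown above:

```python
import numpy as np

a = np.array([b'abc', b'def', b'abc', b'ghij'], dtype='S4')

# The 4-byte case: itemsize matches int32, so each string becomes one integer.
a2 = a.view(np.int32)
match_int = np.equal(a2, a2[0])

# Any fixed length: view as bytes and reshape to one row per string.
# This assumes a 1-D, contiguous array.
rows = a.view(np.uint8).reshape(len(a), a.dtype.itemsize)

# Two strings are equal when every byte in their rows matches.
match_bytes = (rows == rows[0]).all(axis=1)
```

Both results should flag the first and third entries, the two 'abc' strings.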
-Chris