[Numpy-discussion] Boolean arrays
Robert Kern
robert.kern at gmail.com
Fri Aug 27 16:35:07 EDT 2010
On Fri, Aug 27, 2010 at 15:21, Nathaniel Smith <njs at pobox.com> wrote:
> On Fri, Aug 27, 2010 at 1:17 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> But in any case, that would be very slow for large arrays since it
>> would invoke a Python function call for every value in ar. Instead,
>> iterate over the valid array, which is much shorter:
>>
>> mask = np.zeros(ar.shape, dtype=bool)
>> for good in valid:
>>     mask |= (ar == good)
>>
>> Wrap that up into a function and you're good to go. That's about as
>> efficient as it gets unless the valid array gets large.
>
> Probably even more efficient if 'ar' is large and 'valid' is small,
> and shorter to boot:
>
> np.in1d(ar, valid)
Not according to my timings:
[~]
|2> def kern_in(x, valid):
..>     mask = np.zeros(x.shape, dtype=bool)
..>     for good in valid:
..>         mask |= (x == good)
..>     return mask
..>
[~]
|6> ar = np.random.randint(100, size=1000000)
[~]
|7> valid = np.arange(0, 100, 5)
[~]
|8> %timeit kern_in(ar, valid)
10 loops, best of 3: 115 ms per loop
[~]
|9> %timeit np.in1d(ar, valid)
1 loops, best of 3: 279 ms per loop
As valid gets larger, in1d() will catch up, but for smallish sizes of
valid, which I suspect is the case here given the "non-numeric" nature
of the OP's (Hi, Brett!) request, kern_in() is usually better.
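
For anyone who wants to rerun the comparison as a plain script rather than
in an IPython session, here is a minimal sketch. It is not part of the
timings above: the fixed seed, the smaller array size and the use of
np.isin (which assumes NumPy 1.13 or later) are my own choices. It only
checks that the two approaches produce the same mask and prints rough
timings, which will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def kern_in(x, valid):
    """Return a boolean mask marking elements of x that equal any value in valid."""
    mask = np.zeros(x.shape, dtype=bool)
    # One vectorized comparison per valid value, so the Python-level loop
    # runs len(valid) times rather than once per element of x.
    for good in valid:
        mask |= (x == good)
    return mask


# Assumed test data: same value range as in the session, but seeded and smaller.
rng = np.random.RandomState(0)
ar = rng.randint(100, size=100000)
valid = np.arange(0, 100, 5)

loop_mask = kern_in(ar, valid)
isin_mask = np.isin(ar, valid)  # assumes NumPy >= 1.13

# Both approaches should agree element for element.
same = bool(np.array_equal(loop_mask, isin_mask))
print("masks agree:", same)

# Rough timings only; no particular ratio is implied here.
t_loop = timeit.timeit(lambda: kern_in(ar, valid), number=5)
t_isin = timeit.timeit(lambda: np.isin(ar, valid), number=5)
print("kern_in: %.4f s, np.isin: %.4f s (5 runs each)" % (t_loop, t_isin))
```

Changing the size of valid in the script is a quick way to see where the
two approaches cross over on a given machine.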
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco