[Numpy-discussion] strange divergence in performance
robert.kern at gmail.com
Wed Jan 20 17:17:26 EST 2010
2010/1/20 Ernest Adrogué <eadrogue at gmx.net>:
> I have a function where an array of integers (1-d) is compared
> element-wise to an integer using the greater-than operator.
> I noticed that when the integer is 0 it takes about 75% more time
> than when it's 1 or 2. Is there an explanation?
> Here is a stripped-down version which does (sort of) show what I mean:
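> import numpy as np
> from functools import reduce  # a builtin on Python 2, needs the import on Python 3
>
> def filter_array(array, f1, f2, flag=False):
>     # k is the threshold on how many of the two fields must match
>     if flag:
>         k = 1
>     else:
>         k = 0
>     # m1/m2: True where the field equals any of the requested values
>     m1 = reduce(np.add, [(array['f1'] == i).astype(int) for i in f1]) > 0
>     m2 = reduce(np.add, [(array['f2'] == i).astype(int) for i in f2]) > 0
>     mask = reduce(np.add, (i.astype(int) for i in (m1, m2))) > k
>     return array[mask]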
> Now let's create an array with two fields:
> a = np.array(list(zip(np.random.random_integers(0, 10, size=5000),
>                       np.random.random_integers(0, 10, size=5000))),
>              dtype=[('f1', int), ('f2', int)])
> Now call the function with flag=True and flag=False, and see what happens:
> In : %timeit filter_array(a, (6,), (0,), flag=False)
> 1000 loops, best of 3: 536 us per loop
> In : %timeit filter_array(a, (6,), (0,), flag=True)
> 1000 loops, best of 3: 245 us per loop
> In this example the difference seems to be 1:2. In my program
> it is 1:4. I am at a loss about what causes this.
It is not the > operator that exhibits the difference.
In : x = np.random.random_integers(0,10,size=5000)
In : %timeit m = x > 0
100000 loops, best of 3: 19.1 us per loop
In : %timeit m = x > 1
100000 loops, best of 3: 19.3 us per loop
The difference is in the array[mask] step. There are necessarily fewer True
elements in the mask for >1 than for >0, so fewer elements have to be
copied into the result.
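
One way to check this is to count the True elements in each mask and time only the indexing step. The following is a minimal sketch, not the original poster's code: the seeded RandomState, the helper name best_us, and the mask names mask_any and mask_all are assumptions made for illustration, and the absolute timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Assumption: a seeded RandomState stands in for the thread's random data.
rng = np.random.RandomState(0)
a = np.empty(5000, dtype=[('f1', int), ('f2', int)])
a['f1'] = rng.randint(0, 11, size=5000)  # integers 0..10 inclusive
a['f2'] = rng.randint(0, 11, size=5000)

m1 = a['f1'] == 6
m2 = a['f2'] == 0
total = m1.astype(int) + m2.astype(int)

mask_any = total > 0  # k = 0: at least one field matches
mask_all = total > 1  # k = 1: both fields match

n_any = int(mask_any.sum())
n_all = int(mask_all.sum())
print("True elements: >0 -> %d, >1 -> %d" % (n_any, n_all))


def best_us(func, number=200, repeat=3):
    """Best per-call time in microseconds over `repeat` runs (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


# Time only the boolean-indexing step, which is where the masks differ.
print("a[mask_any]: %.1f us" % best_us(lambda: a[mask_any]))
print("a[mask_all]: %.1f us" % best_us(lambda: a[mask_all]))
```

Since each element of the sum is 0, 1 or 2, the >0 mask is the logical OR of the two field tests and the >1 mask is their logical AND, so the second mask can never select more rows than the first.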
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco