Mailman 3 index partition - NumPy-Discussion - python.org

index partition

Alan G Isaac

12 Apr 2014

4:47 p.m.

From a 1d array, I want two arrays of indexes: the first for elements that satisfy a criterion, and the second for elements that do not. Naturally there are many ways to do this. Is there a preferred way?

As a simple example, suppose for array `a` I want np.flatnonzero(a>0) and np.flatnonzero(a<=0). Can I get them both in one go?

Thanks,
Alan Isaac

Alexander Belopolsky

12 Apr

5:01 p.m.

On Sat, Apr 12, 2014 at 4:47 PM, Alan G Isaac wrote:

As a simple example, suppose for array `a` I want np.flatnonzero(a>0) and np.flatnonzero(a<=0). Can I get them both in one go?

I don't think you can do better than

    x = a > 0
    p, q = np.flatnonzero(x), np.flatnonzero(~x)
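
A minimal runnable sketch of this suggestion, assuming a small example array `a` made up purely for illustration. The criterion is evaluated once and the resulting mask is reused for both index arrays.

```python
import numpy as np

# Example data (hypothetical, for illustration only).
a = np.array([3, -1, 0, 7, -5, 2])

# Evaluate the criterion once and reuse the boolean mask.
x = a > 0
p = np.flatnonzero(x)    # indexes where the criterion holds
q = np.flatnonzero(~x)   # indexes where it does not

print(p)  # [0 3 5]
print(q)  # [1 2 4]
```

Together `p` and `q` cover every index of `a` exactly once, so they form a partition of `range(a.size)`.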

Sebastian Berg

5:03 p.m.

On Sat, 2014-04-12 at 16:47 -0400, Alan G Isaac wrote:

From a 1d array, I want two arrays of indexes: the first for elements that satisfy a criterion, and the second for elements that do not. Naturally there are many ways to do this. Is there a preferred way?

As a simple example, suppose for array `a` I want np.flatnonzero(a>0) and np.flatnonzero(a<=0). Can I get them both in one go?

Might be missing something, but I don't think there is a way to do it in one go. The result is irregularly structured and there are few functions like nonzero which give something like that.

- Sebastian

Alexander Belopolsky

5:20 p.m.

On Sat, Apr 12, 2014 at 5:03 PM, Sebastian Berg wrote:

...
As a simple example, suppose for array `a` I want np.flatnonzero(a>0) and np.flatnonzero(a<=0). Can I get them both in one go?

Might be missing something, but I don't think there is a way to do it in one go. The result is irregularly structured and there are few functions like nonzero which give something like that.

The "set routines" [1] are in this category and may help you deal with partitions, but I would recommend using boolean arrays instead. If you commonly deal with both a subset and a complement, set representation does not give you a memory advantage over a boolean mask.

[1] http://docs.scipy.org/doc/numpy/reference/routines.set.html
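
To illustrate the two representations side by side, here is a rough sketch, assuming a made-up array `a` and a second criterion (`even`) invented for the example. It uses `np.setdiff1d` and `np.intersect1d` from the set routines and checks that they agree with the equivalent boolean-mask operations.

```python
import numpy as np

# Example data (hypothetical, for illustration only).
a = np.array([3, -1, 0, 7, -5, 2])
all_idx = np.arange(a.size)

# Boolean-mask representation of two criteria.
mask = a > 0
even = a % 2 == 0

# Index-set representation of the same subsets.
p = np.flatnonzero(mask)
e = np.flatnonzero(even)

# Complement: set difference versus mask negation.
q_set = np.setdiff1d(all_idx, p)
q_mask = np.flatnonzero(~mask)

# Intersection: set routine versus elementwise "and".
both_set = np.intersect1d(p, e)
both_mask = np.flatnonzero(mask & even)

print(q_set, q_mask)        # [1 2 4] [1 2 4]
print(both_set, both_mask)  # [5] [5]
```

With masks, complement and intersection are single elementwise operators (`~`, `&`), which is part of why they tend to be the more convenient choice here.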

Alan G Isaac

14 Apr

12:17 p.m.

On 4/12/2014 5:20 PM, Alexander Belopolsky wrote:

The "set routines" [1] are in this category and may help you deal with partitions, but I would recommend using boolean arrays instead. If you commonly deal with both a subset and a complement, set representation does not give you a memory advantage over a boolean mask.

I take it that by a lack of a memory advantage you mean because boolean arrays are 8-bit representations. That makes sense.

I find it rather more convenient to use boolean arrays, but I wonder if arrays of indexes might have other advantages (which would suggest using the set operations instead). In particular, might a[boolean_array] be slower than a[indexes]? (I'm just asking, not suggesting.)

Thanks!
Alan
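
The memory point can be checked directly. The sketch below assumes a random example array (the size and seed are arbitrary choices for illustration) and compares the bytes held by one boolean mask with the bytes held by the pair of index arrays describing the same partition. The index item size depends on the platform; it is typically 8 bytes on 64-bit builds.

```python
import numpy as np

# Arbitrary example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)

mask = a > 0
p = np.flatnonzero(mask)
q = np.flatnonzero(~mask)

# A boolean mask stores one byte per element of `a`.
mask_bytes = mask.nbytes

# The subset and its complement together store one integer per element.
index_bytes = p.nbytes + q.nbytes

print("mask item size: ", mask.itemsize)
print("index item size:", p.itemsize)
print("mask bytes:", mask_bytes, "index bytes:", index_bytes)
```

When only a small subset is needed and the complement is never used, a single index array can be smaller than the mask; the comparison above applies to the case where both halves are kept.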

Daπid

15 Apr

4:34 a.m.

On 14 April 2014 18:17, Alan G Isaac wrote:

I find it rather more convenient to use boolean arrays, but I wonder if arrays of indexes might have other advantages (which would suggest using the set operations instead). In particular, might a[boolean_array] be slower than a[indexes]? (I'm just asking, not suggesting.)

Indexing with integer indexes is generally faster, but converting from a boolean mask to indexes first makes the whole operation more expensive:

    In [2]: arr = np.random.random(1000)

    In [3]: mask = arr > 0.7

    In [4]: mask.sum()
    Out[4]: 290

    In [5]: %timeit arr[mask]
    100000 loops, best of 3: 4.01 µs per loop

    In [6]: %%timeit
       ...: wh = np.where(mask)
       ...: arr[wh]
       ...:
    100000 loops, best of 3: 6.47 µs per loop

    In [8]: wh = np.where(mask)

    In [9]: %timeit arr[wh]
    100000 loops, best of 3: 2.57 µs per loop

    In [10]: %timeit np.where(mask)
    100000 loops, best of 3: 3.89 µs per loop

    In [14]: np.all(arr[wh] == arr[mask])
    Out[14]: True

If you want to apply the same mask to several arrays, it is then worth it (performance-wise) to convert to indexes once and reuse them.

/David.
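
For readers without IPython, the "same mask, several arrays" case can be sketched with the standard `timeit` module. The list `others`, the helper functions, and the repeat count are assumptions made for this example; actual timings depend on the machine and NumPy version, so none are shown.

```python
import timeit
import numpy as np

arr = np.random.random(1000)
mask = arr > 0.7

# Convert the mask to indexes once, then reuse the result.
wh = np.where(mask)

# Hypothetical companion arrays that share the same mask.
others = [np.random.random(1000) for _ in range(5)]

def with_mask():
    # Boolean indexing on every array.
    return [o[mask] for o in others]

def with_indexes():
    # Integer indexing with the precomputed indexes.
    return [o[wh] for o in others]

t_mask = timeit.timeit(with_mask, number=1000)
t_idx = timeit.timeit(with_indexes, number=1000)
print(f"mask: {t_mask:.4f} s, indexes: {t_idx:.4f} s")
```

Both helpers select the same elements; only the cost of selecting them differs.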
