[Numpy-discussion] Fast decrementation of indices

Mon Feb 3 08:36:21 EST 2014

I think you'll find the algorithm below to be a lot faster, especially if the arrays are big.  Checking each array index against the list of included or excluded elements is much slower than simply creating a secondary boolean array, indexed by atom number, and looking up whether each element is included or not.

import numpy as np

b = np.array([
 [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7, 8, 9, 10, 11],
 [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3,  4,  5]
])

a = [1,2,3,7,8]

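# Boolean mask over the 12 atoms: True = keep, False = deleted
keepdata = np.ones(12, dtype=bool)
keepdata[a] = False
# Columns of b whose two end atoms both survive
w = np.where(keepdata[b[0]] & keepdata[b[1]])
# Running count of kept atoms, minus one, is each kept atom's new index
newindex = keepdata.cumsum()-1
# Drop the deleted bonds and renumber the rest with one lookup
c = newindex[b[:,w[0]]]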
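
If you want to reuse this, the same steps could be wrapped in a small function. This is only a sketch: the name remove_atoms and the n_atoms argument are made up here for illustration, and it assumes b is a 2 x N integer array in which every atom index is below n_atoms. The expected array is the result quoted from your message below.

```python
import numpy as np

def remove_atoms(bonds, deleted, n_atoms):
    """Drop bonds touching deleted atoms and renumber the remaining atoms.

    Hypothetical helper: bonds is a 2 x N integer array of atom indices,
    deleted is a sequence of atom indices, n_atoms is the atom count
    before deletion.
    """
    bonds = np.asarray(bonds)
    # Lookup table indexed by atom number: True = atom is kept
    keep = np.ones(n_atoms, dtype=bool)
    keep[np.asarray(deleted, dtype=int)] = False
    # A bond survives only if both of its end atoms are kept
    surviving = keep[bonds[0]] & keep[bonds[1]]
    # Cumulative count of kept atoms, minus one, gives the new indices
    new_index = np.cumsum(keep) - 1
    return new_index[bonds[:, surviving]]

b = np.array([
    [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7, 8, 9, 10, 11],
    [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3,  4,  5],
])

c = remove_atoms(b, [1, 2, 3, 7, 8], n_atoms=12)

# Result quoted in the original question
expected = np.array([
    [0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
    [2, 3, 5, 2, 1, 6, 0, 0, 1, 2],
])
print(np.array_equal(c, expected))
```

The work is one pass over the atoms plus one pass over the bonds, with no Python-level loop over the deleted indices.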

Cheers,
Rick

On 2 February 2014 20:58, Mads Ipsen <mads.ipsen at gmail.com> wrote:

> Hi,
> 
> I have run into a potential 'for loop' bottleneck. Let me outline:
> 
> The following array describes bonds (connections) in a benzene molecule
> 
>    b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7, 8, 9, 10, 11],
>         [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3,  4,  5]]
> 
> i.e., bond 0 connects atoms 0 and 5, bond 1 connects atoms 0 and 6, etc. In
> practical examples, the list can be much larger (N > 100,000 connections).
> 
> Suppose atoms with indices a = [1,2,3,7,8] are deleted; then all bonds
> connecting those atoms must be deleted as well. I achieve this by doing
> 
> i_0 = numpy.in1d(b[0], a)
> i_1 = numpy.in1d(b[1], a)
> b_i = numpy.where(i_0 | i_1)[0]
> b = b[:,~(i_0 | i_1)]
> 
> If you find this approach lacking, feel free to comment.
> 
> This results in the following updated bond list
> 
> b = [[0,  0,  4,  4,  5,  5,  5,  6, 10, 11],
>     [5,  6, 10,  5,  4, 11,  0,  0,  4,  5]]
> 
> This list is however not correct: Since atoms [1,2,3,7,8] have been
> deleted, the remaining atoms with indices larger than the deleted atoms
> must be decremented. I do this as follows:
> 
> for i in sorted(a, reverse=True):
>    b = numpy.where(b > i, b-1, b)  (*)
> 
> yielding the correct result
> 
> b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
>     [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]
> 
> The Python for loop in (*) may easily contain 50,000 iterations. Is there
> a smart way to utilize numpy functionality to avoid this?
> 
> Thanks and best regards,
> 
> Mads
> 
> -- 
> +---------------------------------------------------------+
> | Mads Ipsen                                              |
> +----------------------+----------------------------------+
> | Gåsebæksvej 7, 4. tv | phone:              +45-29716388 |
> | DK-2500 Valby        | email:      mads.ipsen at gmail.com |
> | Denmark              | map  :   www.tinyurl.com/ns52fpa |
> +----------------------+----------------------------------+