[Numpy-discussion] Fast decrementation of indices
Rick White
rlw at stsci.edu
Mon Feb 3 08:36:21 EST 2014
I think you'll find the algorithm below to be a lot faster, especially if the arrays are big. Checking each array index against the list of included or excluded elements is must slower than simply creating a secondary array and looking up whether the elements are included or not.
b = np.array([
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
[5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]
])
a = [1,2,3,7,8]
keepdata = np.ones(12, dtype=np.bool)
keepdata[a] = False
w = np.where(keepdata[b[0]] & keepdata[b[1]])
newindex = keepdata.cumsum()-1
c = newindex[b[:,w[0]]]
Cheers,
Rick
On 2 February 2014 20:58, Mads Ipsen <mads.ipsen at gmail.com> wrote:
> Hi,
>
> I have run into a potential 'for loop' bottleneck. Let me outline:
>
> The following array describes bonds (connections) in a benzene molecule
>
> b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7,
> 8, 9, 10, 11],
> [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1,
> 2, 3, 4, 5]]
>
> ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In
> practical examples, the list can be much larger (N > 100.000 connections.
>
> Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds
> connecting those atoms must be deleted. I achieve this doing
>
> i_0 = numpy.in1d(b[0], a)
> i_1 = numpy.in1d(b[1], a)
> b_i = numpy.where(i_0 | i_1)[0]
> b = b[:,~(i_0 | i_1)]
>
> If you find this approach lacking, feel free to comment.
>
> This results in the following updated bond list
>
> b = [[0, 0, 4, 4, 5, 5, 5, 6, 10, 11]
> [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]]
>
> This list is however not correct: Since atoms [1,2,3,7,8] have been
> deleted, the remaining atoms with indices larger than the deleted atoms
> must be decremented. I do this as follows:
>
> for i in a:
> b = numpy.where(b > i, bonds-1, bonds) (*)
>
> yielding the correct result
>
> b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
> [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]
>
> The Python for loop in (*) may easily contain 50.000 iteration. Is there
> a smart way to utilize numpy functionality to avoid this?
>
> Thanks and best regards,
>
> Mads
>
> --
> +---------------------------------------------------------+
> | Mads Ipsen |
> +----------------------+----------------------------------+
> | G?seb?ksvej 7, 4. tv | phone: +45-29716388 |
> | DK-2500 Valby | email: mads.ipsen at gmail.com |
> | Denmark | map : www.tinyurl.com/ns52fpa |
> +----------------------+----------------------------------+
More information about the NumPy-Discussion
mailing list