Hi,

I have run into a potential 'for loop' bottleneck. Let me outline:

The following array describes the bonds (connections) in a benzene molecule

    b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
         [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]]

i.e. bond 0 connects atoms 0 and 5, bond 1 connects atoms 0 and 6, etc. In practical examples, the list can be much larger (N > 100,000 connections).

Suppose atoms with indices a = [1,2,3,7,8] are deleted; then all bonds connecting those atoms must be deleted. I achieve this by doing

    i_0 = numpy.in1d(b[0], a)
    i_1 = numpy.in1d(b[1], a)
    b_i = numpy.where(i_0 | i_1)[0]
    b = b[:,~(i_0 | i_1)]

If you find this approach lacking, feel free to comment.

This results in the following updated bond list

    b = [[0, 0,  4, 4, 5,  5, 5, 6, 10, 11],
         [5, 6, 10, 5, 4, 11, 0, 0,  4,  5]]

This list is however not correct: since atoms [1,2,3,7,8] have been deleted, the remaining atoms with indices larger than the deleted atoms must be decremented. I do this as follows:

    # highest index first, so earlier shifts do not affect later comparisons
    for i in sorted(a, reverse=True):
        b = numpy.where(b > i, b-1, b)     (*)

yielding the correct result

    b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
         [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]

The Python for loop in (*) may easily run for 50,000 iterations. Is there a smart way to utilize numpy functionality to avoid this?

Thanks and best regards,

Mads

--
+---------------------------------------------------------+
| Mads Ipsen                                              |
+----------------------+----------------------------------+
| Gåsebæksvej 7, 4. tv | phone: +45-29716388              |
| DK-2500 Valby        | email: mads.ipsen@gmail.com      |
| Denmark              | map  : www.tinyurl.com/ns52fpa   |
+----------------------+----------------------------------+
On Sun, Feb 2, 2014 at 2:58 PM, Mads Ipsen <mads.ipsen@gmail.com> wrote:
Since atoms [1,2,3,7,8] have been deleted, the remaining atoms with indices larger than the deleted atoms must be decremented.
Let

    >>> from numpy import arange, zeros_like
    >>> x = arange(12).reshape(3, 4)
    >>> x
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])

and

    >>> i = [1, 0, 2]

Create a matrix of the same shape as x with 1's at (k, i[k] + 1) and zeros elsewhere

    >>> b = zeros_like(x)
    >>> b.put(i + arange(3)*4 + 1, 1)  # there must be a simpler way

    >>> x - b.cumsum(1)
    array([[ 0,  1,  1,  2],
           [ 4,  4,  5,  6],
           [ 8,  9, 10, 10]])

seems to be the result you want.
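The same cumulative-sum idea can be carried over to the bond array from the question. The following is only a sketch under assumptions, not code posted in this thread: it counts, for every old atom index, how many deleted atoms lie at or below it, and subtracts that count from each surviving index. The names `n_atoms`, `deleted`, `shift` and `renumbered` are made up for the illustration, and `n_atoms = 12` assumes the atom count of the benzene example.

```python
import numpy as np

# Bond list after the bonds touching deleted atoms were removed (from the question).
b = np.array([[0, 0, 4, 4, 5, 5, 5, 6, 10, 11],
              [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]])
a = np.array([1, 2, 3, 7, 8])   # deleted atom indices
n_atoms = 12                    # assumption: number of atoms before deletion

# Flag the deleted atoms, then count the deleted atoms at or below each index.
deleted = np.zeros(n_atoms, dtype=int)
deleted[a] = 1
shift = deleted.cumsum()

# Each surviving index drops by the number of deleted atoms below it.
renumbered = b - shift[b]
print(renumbered)
```

This replaces the Python loop over `a` with one cumulative sum over the atoms and one fancy-indexing lookup over the bonds.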
Cannot test right now, but np.unique(b, return_inverse=True)[1].reshape(2, -1) should do what you are after, I think.
Seconding Jaime; I use this trick in mesh manipulations a lot as well. There are a lot of graph-type manipulations you can express effectively in numpy using np.unique and related functionality.
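As one illustration of the kind of graph-type manipulation np.unique can express (this example is an assumption chosen for the benzene data, not code from the thread): the bond array in the question lists every bond in both directions, and the undirected bonds can be recovered by ordering each pair and keeping the unique columns. The names `ordered` and `undirected` are hypothetical.

```python
import numpy as np

# Full bond list from the question; each bond appears as i-j and as j-i.
b = np.array([[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
              [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]])

# Order each pair as (low, high) so that i-j and j-i become identical columns.
ordered = np.sort(b, axis=0)

# The unique columns are the undirected bonds.
undirected = np.unique(ordered, axis=1)
print(undirected.shape)
```

The `axis` argument of np.unique requires NumPy 1.13 or later.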
On 2 February 2014 20:58, Mads Ipsen <mads.ipsen@gmail.com> wrote:
i.e. bond 0 connects atoms 0 and 5, bond 1 connects atoms 0 and 6, etc. In practical examples, the list can be much larger (N > 100,000 connections).
Perhaps you should consider an alternative approach. You could treat it as a graph and use NetworkX or SciPy to work with it (provided that actually fits well with the rest of your problem).

In the case of SciPy, the graph is described by its adjacency matrix, and deleting an atom just means deleting a row and a column.

But, in any case, without knowing anything about your overall project: renumbering nodes is not something one usually has to do when working with graphs, except for final results. The labels are just that, labels, with no further meaning.

/David.
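The adjacency-matrix view can be sketched as follows. This is a hedged illustration rather than David's code: a dense NumPy boolean array stands in for the SciPy sparse matrix he mentions, which keeps the example dependency-free but would not scale to N > 100,000 atoms. The names `adj`, `keep`, `n_atoms` and `renumbered` are assumptions made for the example.

```python
import numpy as np

# Full bond list from the question, and the atoms to delete.
b = np.array([[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
              [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]])
a = [1, 2, 3, 7, 8]
n_atoms = 12  # assumption: number of atoms before deletion

# Dense adjacency matrix: adj[i, j] is True when atoms i and j are bonded.
adj = np.zeros((n_atoms, n_atoms), dtype=bool)
adj[b[0], b[1]] = True

# Deleting an atom removes its row and its column.
keep = np.setdiff1d(np.arange(n_atoms), a)
adj = adj[np.ix_(keep, keep)]

# Reading the bonds back gives compact indices without a renumbering step.
renumbered = np.array(np.nonzero(adj))
print(renumbered)
```

The bonds come back sorted by row and column, so their order can differ from the list in the question while describing the same set of connections.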
participants (5)
- Alexander Belopolsky
- Daπid
- Eelco Hoogendoorn
- Jaime Fernández del Río
- Mads Ipsen