I finally got it to work, but the Numpyized version runs slower than the plain Python one. I think that I can transpose the NodeCord matrix once in the program and feed that in, rather than the scratch matrix that is generated here. Evidently transpose is not very fast even for smal matrices. Here is my test program:
from Numeric import *
Qpnt=array(([20,21,22],[23,24,25],[26,27,28])) NodeCord=array(([1,2,3],[4,5,6],[7,8,9])) TrngleNode=array((1,2,0))
#the original routine def ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord):
SrcPointCol=zeros((3))
SrcPointCol[0] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],0]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],0]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],0]
SrcPointCol[1] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],1]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],1]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],1]
SrcPointCol[2] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],2]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],2]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],2]
return SrcPointCol
#the yettobefaster routine
def Quad(QuadPoint, TrngleNode, Qpnt,NodeCord): s = Qpnt[QuadPoint,:] c= take(NodeCord, TrngleNode) SrcPointCol= add.reduce(s * transpose(c),1)
return SrcPointCol
QuadPoint=1
print "The Correct:" print ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord)
print "The New" print Quad(QuadPoint,TrngleNode,Qpnt,NodeCord)
Hi Rob,
It looks like you are trying to use the function below to integrate over a single triangle. I'm almost certain that you will _not_ be able to get this to run fast; there's just too much overhead per triangle from all the function calls and what not. Instead you need to structure things so that you do more work per call. One way is to pass a list of triangles and get back a list of answers. This way you spread your area over many triangles. Another possibility, assuming that you need to evaluate the integral for all seven possible values of QPoint each time, is to compute the answer for all values of QPoint at once so that you reduce the overhead per triangle by seven. For example (minimally tested):
Qpnt=reshape(arange(7*3), (7,3)) # Other code from other messages elided....
def Quads(TrngleNode, Qpnt, NodeCord): c = take(NodeCord, TrngleNode) SrcPointCols= add.reduce(Qpnt[...,NewAxis] * c[NewAxis,],1) return SrcPointCols
# Initialize stuff to minimize effects of timing.... from time import clock q1 = [None]*7 q2 = [None]*7 rng = range(7)
# Time the three methods....
t0 = clock() for i in rng: q1[i] = ComputeGaussQuadPoint(i,TrngleNode,Qpnt,NodeCord) t1 = clock() for i in rng: q2[i] = Quad(i,TrngleNode,Qpnt,NodeCord) t2 = clock() q3 = Quads(TrngleNode, Qpnt, NodeCord) t3 = clock()
# And the results...
print "Times (us):", (t1t0)*1e6, (t2t1)*1e6, (t3t2)*1e6 for i in range(7): print "The Answers:", q1[i], q2[i], q3[i]
If this code is still a bottleneck, you could do both (compute the values for all nodes and all values of QPoint at once, but this may be enough to move the bottleneck elsewhere; by my timing it's over 7 times as fast.
tim
[snip]
Evidently transpose is not very fast even for smal matrices.
Actually, transpose should be fast, and should look progressively faster for larger matrices. Typically all that happens is that the strides are changed.
 Original Message  From: "Rob" europax@home.com To: numpydiscussion@lists.sourceforge.net Sent: Monday, August 27, 2001 7:49 AM Subject: [Numpydiscussion] Gaussian Quadrature routine Numpyization :)
I finally got it to work, but the Numpyized version runs slower than the plain Python one. I think that I can transpose the NodeCord matrix once in the program and feed that in, rather than the scratch matrix that is generated here. Evidently transpose is not very fast even for smal matrices. Here is my test program:
from Numeric import *
Qpnt=array(([20,21,22],[23,24,25],[26,27,28])) NodeCord=array(([1,2,3],[4,5,6],[7,8,9])) TrngleNode=array((1,2,0))
#the original routine def ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord):
SrcPointCol=zeros((3)) SrcPointCol[0] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],0]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],0]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],0] SrcPointCol[1] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],1]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],1]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],1] SrcPointCol[2] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],2]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],2]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],2] return SrcPointCol
#the yettobefaster routine
def Quad(QuadPoint, TrngleNode, Qpnt,NodeCord): s = Qpnt[QuadPoint,:] c= take(NodeCord, TrngleNode) SrcPointCol= add.reduce(s * transpose(c),1)
return SrcPointCol
QuadPoint=1
print "The Correct:" print ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord)
print "The New" print Quad(QuadPoint,TrngleNode,Qpnt,NodeCord)
 The Numeric Python EM Project
www.members.home.net/europax
Numpydiscussion mailing list Numpydiscussion@lists.sourceforge.net http://lists.sourceforge.net/lists/listinfo/numpydiscussion
Hi Tim, Yes this is part of a FEMMOM hybrid EM simulator. It was originally written in C, so I've had to deal with the C way of doing things. I'd like to consolidate many of these types of operations so that the program becomes streamlined and easier to understand "The Python Way" TM. :)
I will try your method to see how it works. By contrast, I have a FDTD simulator which is a true speed demon with Numpy. But it was organized in a way that I would easily use slicing to time march the 6 scalar Maxwell's equations.
In any case I'm having fun. I do this at home. At work I have Ansoft Serenade and Agilent ADS at my disposal.
Rob.
Tim Hochberg wrote:
Hi Rob,
It looks like you are trying to use the function below to integrate over a single triangle. I'm almost certain that you will _not_ be able to get this to run fast; there's just too much overhead per triangle from all the function calls and what not. Instead you need to structure things so that you do more work per call. One way is to pass a list of triangles and get back a list of answers. This way you spread your area over many triangles. Another possibility, assuming that you need to evaluate the integral for all seven possible values of QPoint each time, is to compute the answer for all values of QPoint at once so that you reduce the overhead per triangle by seven. For example (minimally tested):
Qpnt=reshape(arange(7*3), (7,3)) # Other code from other messages elided....
def Quads(TrngleNode, Qpnt, NodeCord): c = take(NodeCord, TrngleNode) SrcPointCols= add.reduce(Qpnt[...,NewAxis] * c[NewAxis,],1) return SrcPointCols
# Initialize stuff to minimize effects of timing.... from time import clock q1 = [None]*7 q2 = [None]*7 rng = range(7)
# Time the three methods....
t0 = clock() for i in rng: q1[i] = ComputeGaussQuadPoint(i,TrngleNode,Qpnt,NodeCord) t1 = clock() for i in rng: q2[i] = Quad(i,TrngleNode,Qpnt,NodeCord) t2 = clock() q3 = Quads(TrngleNode, Qpnt, NodeCord) t3 = clock()
# And the results...
print "Times (us):", (t1t0)*1e6, (t2t1)*1e6, (t3t2)*1e6 for i in range(7): print "The Answers:", q1[i], q2[i], q3[i]
If this code is still a bottleneck, you could do both (compute the values for all nodes and all values of QPoint at once, but this may be enough to move the bottleneck elsewhere; by my timing it's over 7 times as fast.
tim
[snip]
Evidently transpose is not very fast even for smal matrices.
Actually, transpose should be fast, and should look progressively faster for larger matrices. Typically all that happens is that the strides are changed.
 Original Message  From: "Rob" europax@home.com To: numpydiscussion@lists.sourceforge.net Sent: Monday, August 27, 2001 7:49 AM Subject: [Numpydiscussion] Gaussian Quadrature routine Numpyization :)
I finally got it to work, but the Numpyized version runs slower than the plain Python one. I think that I can transpose the NodeCord matrix once in the program and feed that in, rather than the scratch matrix that is generated here. Evidently transpose is not very fast even for smal matrices. Here is my test program:
from Numeric import *
Qpnt=array(([20,21,22],[23,24,25],[26,27,28])) NodeCord=array(([1,2,3],[4,5,6],[7,8,9])) TrngleNode=array((1,2,0))
#the original routine def ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord):
SrcPointCol=zeros((3)) SrcPointCol[0] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],0]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],0]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],0] SrcPointCol[1] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],1]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],1]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],1] SrcPointCol[2] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],2]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],2]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],2] return SrcPointCol
#the yettobefaster routine
def Quad(QuadPoint, TrngleNode, Qpnt,NodeCord): s = Qpnt[QuadPoint,:] c= take(NodeCord, TrngleNode) SrcPointCol= add.reduce(s * transpose(c),1)
return SrcPointCol
QuadPoint=1
print "The Correct:" print ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord)
print "The New" print Quad(QuadPoint,TrngleNode,Qpnt,NodeCord)
 The Numeric Python EM Project
www.members.home.net/europax
Numpydiscussion mailing list Numpydiscussion@lists.sourceforge.net http://lists.sourceforge.net/lists/listinfo/numpydiscussion
Numpydiscussion mailing list Numpydiscussion@lists.sourceforge.net http://lists.sourceforge.net/lists/listinfo/numpydiscussion
Tim, I used your routine, computing all of the triangles at once and then feeding the resulting array back into the needy functions. I shaved 30% off the execution time!! There is another routine which computes correction terms which can use the "take()" method. I'll work on that tomorrow. Thanks alot!
Rob.
ps. I'll have to study the Numpy manual to figure out how your routine works.
Tim Hochberg wrote:
Hi Rob,
It looks like you are trying to use the function below to integrate over a single triangle. I'm almost certain that you will _not_ be able to get this to run fast; there's just too much overhead per triangle from all the function calls and what not. Instead you need to structure things so that you do more work per call. One way is to pass a list of triangles and get back a list of answers. This way you spread your area over many triangles. Another possibility, assuming that you need to evaluate the integral for all seven possible values of QPoint each time, is to compute the answer for all values of QPoint at once so that you reduce the overhead per triangle by seven. For example (minimally tested):
Qpnt=reshape(arange(7*3), (7,3)) # Other code from other messages elided....
def Quads(TrngleNode, Qpnt, NodeCord): c = take(NodeCord, TrngleNode) SrcPointCols= add.reduce(Qpnt[...,NewAxis] * c[NewAxis,],1) return SrcPointCols
# Initialize stuff to minimize effects of timing.... from time import clock q1 = [None]*7 q2 = [None]*7 rng = range(7)
# Time the three methods....
t0 = clock() for i in rng: q1[i] = ComputeGaussQuadPoint(i,TrngleNode,Qpnt,NodeCord) t1 = clock() for i in rng: q2[i] = Quad(i,TrngleNode,Qpnt,NodeCord) t2 = clock() q3 = Quads(TrngleNode, Qpnt, NodeCord) t3 = clock()
# And the results...
print "Times (us):", (t1t0)*1e6, (t2t1)*1e6, (t3t2)*1e6 for i in range(7): print "The Answers:", q1[i], q2[i], q3[i]
If this code is still a bottleneck, you could do both (compute the values for all nodes and all values of QPoint at once, but this may be enough to move the bottleneck elsewhere; by my timing it's over 7 times as fast.
tim
[snip]
Evidently transpose is not very fast even for smal matrices.
Actually, transpose should be fast, and should look progressively faster for larger matrices. Typically all that happens is that the strides are changed.
 Original Message  From: "Rob" europax@home.com To: numpydiscussion@lists.sourceforge.net Sent: Monday, August 27, 2001 7:49 AM Subject: [Numpydiscussion] Gaussian Quadrature routine Numpyization :)
I finally got it to work, but the Numpyized version runs slower than the plain Python one. I think that I can transpose the NodeCord matrix once in the program and feed that in, rather than the scratch matrix that is generated here. Evidently transpose is not very fast even for smal matrices. Here is my test program:
from Numeric import *
Qpnt=array(([20,21,22],[23,24,25],[26,27,28])) NodeCord=array(([1,2,3],[4,5,6],[7,8,9])) TrngleNode=array((1,2,0))
#the original routine def ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord):
SrcPointCol=zeros((3)) SrcPointCol[0] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],0]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],0]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],0] SrcPointCol[1] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],1]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],1]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],1] SrcPointCol[2] = Qpnt[QuadPoint,0]*NodeCord[TrngleNode[0],2]\ + Qpnt[QuadPoint,1]*NodeCord[TrngleNode[1],2]\ + Qpnt[QuadPoint,2]*NodeCord[TrngleNode[2],2] return SrcPointCol
#the yettobefaster routine
def Quad(QuadPoint, TrngleNode, Qpnt,NodeCord): s = Qpnt[QuadPoint,:] c= take(NodeCord, TrngleNode) SrcPointCol= add.reduce(s * transpose(c),1)
return SrcPointCol
QuadPoint=1
print "The Correct:" print ComputeGaussQuadPoint(QuadPoint,TrngleNode,Qpnt,NodeCord)
print "The New" print Quad(QuadPoint,TrngleNode,Qpnt,NodeCord)
 The Numeric Python EM Project
www.members.home.net/europax
Numpydiscussion mailing list Numpydiscussion@lists.sourceforge.net http://lists.sourceforge.net/lists/listinfo/numpydiscussion
participants (2)

Rob

Tim Hochberg