Fw: [SciPy-dev] scipy.weave versus simple C++.

Hey Prabhu,

Here is a near verbatim conversion of your laplace to use Python/Inline.  I
didn't spend much time, and there are errors.  The results don't match.
You'll get the general idea from it though.

I used blitz arrays because their indexing is kinda like double** indexing,
so the conversion was easy.  Standard Numeric arrays don't have the pointer
arrays to the rows that C uses, so your code wouldn't translate easily to
them without macros or calculated offsets -- you could do this though.  I'm
not sure that indexing blitz arrays is as fast as indexing normal arrays --
this example would suggest not:

    500x500, 100 iterations, PII 850 laptop, W2K, gcc-2.95.2
    laplace.py:  3.47 seconds
    laplace.cxx: 2.34 seconds

The Python calling overhead should be close to nil, so there is no reason
why the Python version should be slower, other than the fact that I used
blitz arrays and indexing.  But the current implementation still pays a 50%
penalty for using Python.

eric

---------------------------------------------------------
# laplace.py
import time

try:
    from scipy import weave
    from scipy import *
except:
    import weave
    from Numeric import *
from weave.blitz_tools import blitz_type_factories

def BC(x, y):
    return x**2 - y**2

class Grid:
    def __init__(self, shape, typecode=Float64):
        self.shape = shape
        # should really handle typecode here.
        self.dx = 1.0 / shape[0] - 1    # note: one of the errors -- probably meant 1.0/(shape[0]-1)
        self.dy = 1.0 / shape[1] - 1    # ditto
        self.u = zeros(shape, typecode)

    def setBCFunc(self, f):
        xmin, ymin, xmax, ymax = 0.0, 0.0, 1.0, 1.0
        x = arange(self.shape[0])*self.dy    # note: probably meant self.dx
        y = arange(self.shape[1])*self.dy
        self.u[0 ,:] = f(xmin, y)
        self.u[-1,:] = f(xmax, y)
        self.u[:, 0] = f(x, ymin)
        self.u[:,-1] = f(x, ymax)

class LaplaceSolver:
    def __init__(self, grid):
        self.g = grid

    def timeStep(self, dt=0.0):
        dx2 = self.g.dx**2
        dy2 = self.g.dy**2
        dnr_inv = .5/(dx2 + dy2)
        nx, ny = self.g.shape
        u = self.g.u
        code = """
               #line 39 "laplace.py"
               double tmp, err, diff;  // note: err is never initialized
               for (int i=1; i<nx-1; ++i) {
                   for (int j=1; j<ny-1; ++j) {
                       tmp = u(i,j);
                       u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2 +
                                 (u(i,j-1) + u(i,j+1))*dx2)*dnr_inv;
                       diff = u(i,j) - tmp;
                       err += diff*diff;
                   }
               }
               return_val = Py::new_reference_to(Py::Float(sqrt(err)));
               """
        # compiler keyword only needed on windows with MSVC installed
        err = weave.inline(code,
                           ['u', 'dx2', 'dy2', 'dnr_inv', 'nx', 'ny'],
                           type_factories=blitz_type_factories,
                           compiler='gcc')
        return err

    def solve(self, n_iter=0, eps=1e-16):
        err = self.timeStep()
        #print err
        count = 1
        while err > eps:
            if n_iter and (count > n_iter):
                return err
            err = self.timeStep()
            #print err
            count += 1
        return count

if __name__ == "__main__":
    grid = Grid((500, 500))
    grid.setBCFunc(BC)
    s = LaplaceSolver(grid)
    t1 = time.time()
    err = s.solve(20)
    t2 = time.time()
    print "Iterations took ", t2 - t1, " seconds."
    print "Error: ", err
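To make the offset point above concrete, here is a tiny sketch (added for
this write-up, not part of Eric's original mail) of the arithmetic a C
routine must do by hand when given a bare Numeric data pointer: the array
is one contiguous, row-major block, so element (i,j) of an array with ny
columns lives at flat offset i*ny + j.

# Sketch: emulating u[i,j] on a flat, row-major buffer.
def flat_index(i, j, ny):
    # offset of element (i,j) in a row-major array with ny columns
    return i*ny + j

def get_elem(u_flat, i, j, ny):
    return u_flat[flat_index(i, j, ny)]

def set_elem(u_flat, i, j, ny, value):
    u_flat[flat_index(i, j, ny)] = value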

Hi,
"eric" == eric <eric@scipy.org> writes:
eric> Here is a near verbatim conversion of your laplace to use
eric> Python/Inline.  I didn't spend much time, and there are
eric> errors.  The results don't match.  You'll get the general
eric> idea from it though.

This is wonderful!  Thanks *very* much.  I spent the last hour and a half
looking over this code over a text terminal (since my desktop's keyboard is
a pain to use).  I fixed the errors.  I didn't realize that it would be so
easy to inline the innermost loop.  The Numeric/copy/transpose example got
me confused a bit.

Later on today I'll try to create a simple Python version of your and my
code and maybe write up a small document on the different ways to solve
this problem using Python (i.e. slow, numeric, inline and blitz).  I'll
remove all the fancy stuff, clean up the Python code, and keep the sample
c++ example the way it is.  I think it should make for a decent benchmark
and introductory document for anyone interested in Python and speed.  A
comparison with c++ would really prove the point as to why developing with
Python is a great idea. :)

eric> I used blitz arrays because their indexing is kinda like
eric> double** indexing so the conversion was easy.  Standard
eric> Numeric arrays don't have the pointer arrays to the rows
eric> that C uses, so your code wouldn't translate easily to it
eric> without macros or calculating offsets -- you could do this
eric> though.  I'm not sure that indexing blitz arrays is as fast
eric> as indexing normal arrays -- this example would suggest not:

Yes, at this point in time I'm happy with the simplest solution.  The
inline code is pretty much next to trivial -- just a translation to
straight C++.  That's really cool!  Great job, Eric!!

eric> 500x500, 100 iterations, PII 850 laptop, W2K, gcc-2.95.2
eric> laplace.py:  3.47 seconds
eric> laplace.cxx: 2.34 seconds

Wow!  That's not bad at all, considering that the example is completely
scriptable and trivially extendible with a zillion other Python tools (the
standard lib, scipy.plt, gracePlot, mayavi, whatnot).

eric> The Python calling overhead should be close to nil, so there
eric> is no reason why the python version should be slower, other
eric> than the fact that I used blitz arrays and indexing.  But
eric> the current implementation still pays a 50% penalty for
eric> using Python.

Hmm, maybe it has something to do with the way we are looping.  I'll
experiment with all this later and mail the list.  However, a 50% penalty
isn't so bad at all; I guess most folks can live with it.

I have more generic questions which I'll leave for later.  For now I have
enough to do and think about.  Eric, my hearty congratulations to you on
weave!  Great job!!

prabhu
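For reference, here is a rough sketch (our reconstruction -- Prabhu's
actual script is not in this thread) of the "slow Python" variant he plans
to write: the same update as Eric's inline C++ loop, just in plain Python.
It assumes the Grid/LaplaceSolver classes from Eric's listing above.

# Sketch of a pure-Python time step, usable in place of the inline one.
import math

def slowTimeStep(self, dt=0.0):
    dx2 = self.g.dx**2
    dy2 = self.g.dy**2
    dnr_inv = 0.5/(dx2 + dy2)
    nx, ny = self.g.shape
    u = self.g.u
    err = 0.0
    for i in range(1, nx-1):
        for j in range(1, ny-1):
            tmp = u[i, j]
            u[i, j] = ((u[i-1, j] + u[i+1, j])*dy2 +
                       (u[i, j-1] + u[i, j+1])*dx2)*dnr_inv
            diff = u[i, j] - tmp
            err += diff*diff
    return math.sqrt(err)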

Hi Prabhu and Eric,

Just out of curiosity I tried your test also using Fortran, and here are
the results (now comparing only cxx and Fortran):

    500x500, 100 iterations, PII 400 laptop, Linux, gcc-2.95.4
    Iterations took 6.12530398369 seconds.
    Fortran iterations took 3.15447306633 seconds.

So, using Fortran you get a speed up of approx. _two_ times compared to
cxx!!!  If one would use optimized blas and lapack functions, the speed up
could be even greater.

Below is the testing code that I used.  Here is how I used it, step by
step, to illustrate that using Fortran from Python is very simple:

1) Build the Fortran wrapper:

    f2py -c flaplace.f -m flaplace   # here you need f2py from CVS

As a result you'll get flaplace.so in the current directory.

2) Run the tests:

    $ python laplace.py
    Iterations took 6.51088702679 seconds.
    Error:  38047.7567709
    Fortran iterations took 3.22211503983 seconds.
    Error:  21778.5117188

Regards,
Pearu

--------------------
c File flaplace.f
      subroutine timestep(u,n,m,dx,dy,error)
      double precision u(n,m)
      double precision dx,dy,dx2,dy2,dnr_inv,tmp,err,diff
      integer n,m,i,j
cf2py intent(in) :: dx,dy
cf2py intent(in,out) :: u
cf2py intent(out) :: error
cf2py intent(hide) :: n,m
c     note: "error" is never declared, so implicit typing makes it a
c     single precision real -- a likely source of the small precision
c     differences noted later in this thread
      dx2 = dx*dx
      dy2 = dy*dy
      dnr_inv = 0.5d0 / (dx2+dy2)
      error = 0
      do 200,i=2,n-1
         do 100,j=2,m-1
            tmp = u(i,j)
            u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2 +
     &                (u(i,j-1) + u(i,j+1))*dx2)*dnr_inv
            diff = u(i,j) - tmp
            error = error + diff*diff
 100     continue
 200  continue
      error = sqrt(error)
      end
-------------------
# modifications to laplace.py

import flaplace

# New method for the LaplaceSolver class:
def ftimeStep(self, dt=0.0):
    u, err = flaplace.timestep(self.g.u, self.g.dx, self.g.dy)
    return err

# Slightly modified solve method:
def solve(self, n_iter=0, eps=1e-16, fortran=0):
    if fortran:
        timeStep = self.ftimeStep
    else:
        timeStep = self.timeStep
    err = timeStep()
    #print err
    count = 1
    while err > eps:
        if n_iter and (count > n_iter):
            return err
        err = timeStep()
        #print err
        count += 1
    return count

# Slightly modified runner:
t1 = time.time()
err = s.solve(100)
t2 = time.time()
print "Iterations took ", t2 - t1, " seconds."
print "Error: ", err

t1 = time.time()
err = s.solve(100, fortran=1)
t2 = time.time()
print "Fortran iterations took ", t2 - t1, " seconds."
print "Error: ", err
--------------- EOF message

Hi,
"PP" == Pearu Peterson <pearu@cens.ioc.ee> writes:
PP> Just from a curiosity I tried your test also using Fortran
PP> and here are the results (now comparing only cxx and Fortran):

[snip]

PP> So, using Fortran you get a speed up of approx. _two_ times
PP> compared to cxx!!!  If one would use optimized blas and
PP> lapack functions, the speed up could be even greater.

PP> Below is the testing code that I used.  Here is how I used it,
PP> step by step, to illustrate that using Fortran from python is
PP> very simple:

Sorry for the delayed response.  I was out for most of the day.

Your example is pretty awesome!  Thanks!  f2py is really very cool!  I'll
include this example also in my test script.  One point to note is that my
cxx example was pretty straightforward.  I'm pretty sure that there are
ways to improve the speed.  I'll try and see if I can get that done.

I think it would be very nice if there were a way to write inline Fortran
code and use f2py to generate the extension on the fly, and then also use
weave's catalog etc. to handle this transparently.  I guess that this
should not be hard to do since f2py seems to do most of the dirty work.  If
Eric could do it for C++, I guess Fortran should be very easy for him. :)

I know too little of f2py or weave to actually write this, but FWIW here is
some pseudo code that is a start.  I looked at inline briefly, and it looks
like the following does something like what it does.

def magic_generate_cf2py(in_=[], out=[], in_out=[], hide=[]):
    d = {'in': in_, 'out': out, 'in,out': in_out, 'hide': hide}
    cf2py = ""
    for key, val in d.items():
        if val:
            cf2py += 'cf2py intent(%s) :: ' % key
            for i in val:
                cf2py += i + ', '
            cf2py += '\n'
    return cf2py

def inlinef(code, arg_names=[], local_dict=None, global_dict=None,
            in_=[], out=[], in_out=[], hide=[]):
    # check if already catalogued.  If so, simply return the
    # catalogued function.
    function = magic_check_catalog(code)
    if function:
        return function

    # figure out the type of the args and add the declarations at the top.
    declare = magic_declare_headers()

    # get local dict and global dict
    local_dict, global_dict = magic_code()

    # get arg list.
    args = ""
    for name in arg_names:
        args += name + ", "

    # get cf2py stuff
    cf2py = magic_generate_cf2py(in_, out, in_out, hide)

    f_code = """c Inline Fortran code.
      subroutine func_call(%(args)s)
%(declare)s
%(cf2py)s
%(code)s""" % locals()

    # write fortran code to file and run f2py on it.
    file = open(file_name, "w")
    print >> file, f_code
    os.system("f2py -c %s -m %s" % (file_name, mod_name))

    # load the module and get the function.
    function = magic_load_func()

    # catalog the function for future use.
    magic_catalog(code, function)
    return function

prabhu
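To make the intent of the sketch concrete, here is a hypothetical call.
Everything here is Prabhu's pseudo-code plus made-up names -- none of it
is a real weave or f2py API:

# Hypothetical usage of the sketched inlinef(): wrap a Fortran loop
# that sums an array.  inlinef() returns the compiled function.
fcode = """
      tmp = 0.0d0
      do 10, i=1,n
         tmp = tmp + u(i)
 10   continue
      usum = tmp
"""
sum_func = inlinef(fcode, arg_names=['u', 'n', 'usum'],
                   in_=['u'], hide=['n'], out=['usum'])
usum = sum_func(u)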

Iterations took 6.12530398369 seconds.
Fortran iterations took 3.15447306633 seconds.
So, using Fortran you get a speed up of approx. _two_ times compared to cxx!!!
Hey!  That is larger than I would have expected.  Fortran often has a
slight advantage (a few percent), but not a factor of 2.  Several things
could be in play here:

(1) Prabhu and I have no clue how to set compiler switches in gcc for best
    results.  (very possible)
(2) You're using a better optimizing Fortran compiler (I wonder what the
    Intel or KAI compiler would do on the C++ code).  Which one are you
    using?  If it is g77, then this is exhibit A for supporting (1).
(3) If comparisons are made against Prabhu's C++ code instead of my
    extension module, which (mis)uses blitz++ arrays for indexing, Fortran
    would only show a 33% improvement.  Correct?  This is starting to be
    more what I'd expect.

Anyway, I'd be surprised if the difference is this great in the final
analysis -- I'll bet less than 10% difference either way.
If one would use optimized blas and lapack functions, the speed up could be even greater.
Not sure about the difference here.  Which blas functions would you use for
this?  Also, you can use optimized blas/lapack from C just as easily.  Keep
in mind that ATLAS is pretty much as fast as it gets for BLAS/LAPACK, and
it is all written in C with special assembly for some CPUs
(Intel/AMD/PowerPC).

Still, in the end, I would like to have a version of inline that works for
Fortran.  It's not a large project, but also not one I have time for now.
If anyone is interested, drop me a line, and I'll point out how to do it.
On the usability side of inline Fortran, the array transpose issue is
confusing to people (including me), and, for speed, you have to leave the
details of this up to the user.  We're just stuck with this.

eric

On Sat, 12 Jan 2002, eric wrote:
Iterations took 6.12530398369 seconds.
Fortran iterations took 3.15447306633 seconds.
Hey!  That is larger than I would have expected.  Fortran often has a slight advantage (a few percent), but not a factor of 2.  Several things could be in play here:
(1) Prabhu and I have no clue how to set compiler switches in gcc for best
    results.  (very possible)
(2) You're using a better optimizing Fortran compiler (I wonder what the
    Intel or KAI compiler would do on the C++ code).  Which one are you
    using?  If it is g77, then this is exhibit A for supporting (1).
(3) If comparisons are made against Prabhu's C++ code instead of my
    extension module, which (mis)uses blitz++ arrays for indexing, Fortran
    would only show a 33% improvement.  Correct?  This is starting to be
    more what I'd expect.
I used the same code (laplace.py) that was in your message to this list.
Anyway, I'd be surprised if the difference is this great in the final analysis -- I'll bet less than 10% difference either way.
I used g77 in these tests, but indeed with heavy optimization switches
turned on.  Here are other results with different optimization flags
enabled:

1) g77 -O2 (standard)
Iterations took 6.34938693047 seconds.
Error:  38047.7567709
Fortran iterations took 3.69506299496 seconds.
Error:  21778.5117188

2) g77 (no optimization)
Iterations took 6.44155204296 seconds.
Error:  38047.7567709
Fortran iterations took 3.82356095314 seconds.
Error:  21778.5117188

3) g77 -O3 -funroll-loops -march=i686 -malign-double (machine dependent)
Iterations took 6.34858202934 seconds.
Error:  38047.7567709
Fortran iterations took 3.29500699043 seconds.
Error:  21778.5117188

4) ifc -O3 -tpp6 -xi (this is the Intel Fortran 90 compiler for Linux)
Iterations took 6.38630092144 seconds.
Error:  38047.7567709
Fortran iterations took 2.84432804585 seconds.
Error:  21779.0996094

Note that even without any optimization and with g77, the factor is still
quite high: 1.6.  And with the Intel compiler it is 2.2.  So, I would not
be so pessimistic about Fortran's capabilities.  I guess that these numbers
also depend on the particular test, and the current one just suits Fortran.
Anyway, I don't think that people would use Fortran if the gain in speed
were only a few percent.  It must be higher than that.

Regards,
Pearu
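For reference, here is one plausible way such switches get handed through
f2py.  The --f77flags option exists in current f2py; whether the 2002 CVS
snapshot spelled it the same way is an assumption on our part:

    f2py -c --f77flags='-O3 -funroll-loops -march=i686 -malign-double' \
         flaplace.f -m flaplace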

Hey Pearu, <snip Fortran compared to Blitz>
Iterations took 6.12530398369 seconds.
Fortran iterations took 3.15447306633 seconds.
I've rewritten the code in C++ as quickly as I know how and added a few
optimizations.  It has the same errors as the previous example but, again,
is useful for benchmarking.  (Maybe Prabhu will fix my bugs in this code
also. :)  The results are as follows:

    Blitz indexing:  iterations took 3.54500007629 seconds.
    After re-write:  iterations took 1.77300000191 seconds.

With the re-write, C++ is in the same performance range as the Fortran
code.  My timestep loop is included below.  It uses pointer math instead of
blitz arrays to reduce the cost of array element lookups down to
incrementing pointers in the inner loop.  It isn't exactly pretty, but we
were looking for speed, not beauty.  I expect there are some ways to clean
it up and still get about the same speed.

I used the following optimizations:

    extra_compile_args = ['-O3','-malign-double','-funroll-loops']

The difference between using and not using -funroll-loops was significant
(maybe 25%).
Anyway, I don't think that people would use Fortran if the gain in speed were only a few percent.  It must be higher than that.
I think it used to be, but recent years have been kinder to C than Fortran.
Among other things, market forces have pushed C/C++ compilers to evolve
more quickly than Fortran compilers, simply because more people are
interested in fast C/C++ compilers.  The result is that C/C++ can be made
to execute algorithms at close to the same speed as Fortran in most cases
-- at least that has been my experience.  As for gcc and g77, I think only
the front ends of the compilers are different.  The same back-end is shared
by both.

Still, there are lots of good reasons besides speed for scientists to use
Fortran:

1. It is the language you know.  (probably the main reason many still use it)
2. There are tons of scientific libraries available.
3. All your lab's legacy code is in Fortran.
4. You don't have to mess with memory management.  (debugging dangling
   pointers stinks)
5. You don't have to resort to tricks for speed.
6. You just like the language better.

Of course, there are reasons not to use it also (number 4 above can go on
both the "for" and "against" lists), but all the above reasons are valid.
Scientists are interested in the fastest way to results -- not in writing
elegant, full featured, re-usable, scriptable programs.  For one or more of
the above reasons, some still consider Fortran their best choice to
optimize this interest.  Other scientists choose Matlab or IDL or even C.
Hopefully we can increase the share that see
Python+Numeric+SciPy+f2py+weave as a good choice also.

This thread definitely needs to get summarized and put into a web page.

see ya,
eric

----------------------------------------------------------------------------
# just replace the timeStep algorithm from the previous post with this one.
def timeStep(self, dt=0.0):
    dx2 = self.g.dx**2
    dy2 = self.g.dy**2
    dnr_inv = .5/(dx2 + dy2)
    nx, ny = self.g.shape
    u = self.g.u
    code = """
           #line 44 "laplace.py"
           double tmp, err, diff;  // note: err should be set to zero here
           double *uc, *uu, *ud, *ul, *ur;
           for (int i=1; i<nx-1; ++i) {
               uc = u_data+i*ny+1;      // center
               ur = u_data+i*ny+2;      // right
               ul = u_data+i*ny;        // left
               ud = u_data+(i+1)*ny+1;  // down
               uu = u_data+(i-1)*ny+1;  // up
               for (int j=1; j<ny-1; ++j) {
                   tmp = *uc;
                   *uc = ((*ul + *ur)*dy2 +
                          (*uu + *ud)*dx2)*dnr_inv;
                   diff = *uc - tmp;
                   err += diff*diff;
                   uc++; ur++; ul++; ud++; uu++;
               }
           }
           return_val = Py::new_reference_to(Py::Float(sqrt(err)));
           """
    #extra_compile_args = ['-O3','-funroll-loops','-march=i686','-malign-double']
    # compiler keyword only needed on windows with MSVC installed
    err = weave.inline(code,
                       ['u', 'dx2', 'dy2', 'dnr_inv', 'nx', 'ny'],
                       compiler='gcc',
                       extra_compile_args=['-O3', '-malign-double',
                                           '-funroll-loops'])
    return err

"eric" == eric <eric@scipy.org> writes:
eric> I've rewritten the code in C++ as quickly as I know how and
eric> added a few optimizations.  It has the same errors as the
eric> previous example but, again, is useful for benchmarking.
eric> (Maybe Prabhu will fix my bugs in this code also. :)  The
eric> results are as follows:

Yes, I'll try to fix this one also.  I'll stop with this, though.  I hope
no one decides to send in an inline assembler comparison. ;)

eric> Blitz indexing:  iterations took 3.54500007629 seconds.
eric> After re-write:  iterations took 1.77300000191 seconds.

eric> I think it used to be, but recent years have been kinder
eric> to C than Fortran.  Among other things, market forces have
eric> pushed C/C++ compilers to evolve more quickly than Fortran
eric> compilers, simply because more people are interested in fast
eric> C/C++ compilers.  The result is that C/C++ can be made to
eric> execute algorithms at close to the same speed as Fortran in
eric> most cases -- at least that has been my experience.  As for
eric> gcc and g77, I think only the front ends of the compilers
eric> are different.  The same back-end is shared by both.

I also think it is a known fact that if one uses something like blitz++,
one can achieve speed as good as, if not better than, Fortran.

eric> the above reasons are valid.  Scientists are interested in
eric> the fastest way to results -- not in writing elegant, full
eric> featured, re-usable, scriptable programs.  For one or more
eric> of the above reasons, some still consider Fortran their best
eric> choice to optimize this interest.  Other scientists choose
eric> Matlab or IDL or even C.  Hopefully we can increase the
eric> share that see Python+Numeric+SciPy+f2py+weave as a good
eric> choice also.

Yes.  Unfortunately, Dylan is not as complete or mature as Python is.  To
me it seems the best possible approach, where you can rapidly prototype and
later speed up your code all in the same language.

eric> This thread definitely needs to get summarized and put into
eric> a web page.

Well, I'll work on a draft sometime next week and send it to you folks.
Meanwhile I'm stuck pretty badly on f2py.  Yeah, this discussion might not
belong here, but I need to get Pearu's example working if I am to include
it.  More on that later.

prabhu

hi,
"eric" == eric <eric@scipy.org> writes:
eric> Hey Pearu, <snip Fortran compared to Blitz>

eric> > Iterations took 6.12530398369 seconds.
eric> > Fortran iterations took 3.15447306633 seconds.

eric> I've rewritten the code in C++ as quickly as I know how and
eric> added a few optimizations.  It has the same errors as the
eric> previous example but, again, is useful for benchmarking.
eric> (Maybe Prabhu will fix my bugs in this code also. :)  The
eric> results are as follows:

I've added all the stuff to my version of the code -- I still have to
remove the mayavi dependency and maybe clean it up a bit, but here are some
preliminary results.  I have the following solution procedures:

(a) slow Python -- normal for loops and stuff.
(b) numeric -- just there for testing the speed; the results will be
    different since it uses temporaries.
(c) blitz -- this uses the silly way to compute the error.
(d) inline -- thanks to Eric's code, slightly modified.
(e) fastinline -- Eric's new version (the only bug was that err should be
    set to zero before the loop).
(f) fortran -- this is Pearu's version pretty much verbatim.

(1) I made pretty sure that the outputs of weave/inline/fastinline/fortran
are all very much alike.  I found that there is some loss of precision in
the fortran code, but the results are pretty much identical.  The errors
differed by very small numbers.  For a 500 by 500 grid I computed a few
iterations using weave/inline/fastinline and compared them with the fortran
results, and they are the same to within 1 part in 1e-15 or something.

(2) Here are the results of the speed tests.  I create a 500x500 grid and
run it for 100 iterations.  I do two runs and average the results.  Just
for the heck of it I ran the slow time step for 5 iterations and multiplied
the result by 20 to get the total time for 100 iterations.

    slow:       1847.60 s  (== time for 5 iterations * 20)
    numeric:      29.5  s
    blitz:         9.75 s
    inline:        4.60 s
    fortran:       3.24 s
    fastinline:    2.92 s

Fastinline is about 630 times faster than the normal slow for loop!  Wow!
Unbelievable!!  Obviously more speedup can be achieved, but it's clear that
blitz, inline, fortran and fastinline are big winners and that they are all
reasonably close.  inline and fortran are about 40% off, which is not bad
at all.  fastinline is about 10% faster than fortran, but the code is
dirty.

prabhu
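For readers following along, here is a rough sketch (our reconstruction --
Prabhu's actual script is not in this thread) of what the "numeric" and
"blitz" variants above typically looked like.  Both assume the imports
from Eric's first listing (Numeric and scipy.weave).  Because Numeric
evaluates the right-hand side into temporaries, updated neighbours are not
re-used within a sweep, which is why this variant's error differs from the
loop-based versions; weave.blitz compiles the same expression so no
temporaries are built.

# "numeric" variant: one vectorized update over the interior points.
def numericTimeStep(self, dt=0.0):
    g = self.g
    dx2, dy2 = g.dx**2, g.dy**2
    dnr_inv = 0.5/(dx2 + dy2)
    u = g.u
    old = array(u)    # copy, so we can measure the change
    u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +
                     (u[1:-1, 0:-2] + u[1:-1, 2:])*dx2)*dnr_inv
    return sqrt(sum(sum((u - old)**2)))

# "blitz" variant: the same expression handed to weave.blitz; the error
# is computed the "silly way", via a full copy of the old grid.
def blitzTimeStep(self, dt=0.0):
    g = self.g
    dx2, dy2 = g.dx**2, g.dy**2
    dnr_inv = 0.5/(dx2 + dy2)
    u = g.u
    old = array(u)
    expr = "u[1:-1,1:-1] = ((u[0:-2,1:-1] + u[2:,1:-1])*dy2 + " \
           "(u[1:-1,0:-2] + u[1:-1,2:])*dx2)*dnr_inv"
    weave.blitz(expr, check_size=0)
    return sqrt(sum(sum((u - old)**2)))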

Hi Eric,

here's a report from Mandrake again (a Mandrake 8.0 laptop with 512 Mb of
ram, and stock packages for the most part -- python updated to 2.1.1 from
Mandrake 8.1).  Some system info:

[~/scipy]> uname -a
Linux maqroll 2.4.3-20mdk #1 Sun Apr 15 23:03:10 CEST 2001 i686 unknown

[~/scipy]> gcc -v
Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.96/specs
gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)

[~/scipy]> python
Python 2.1.1 (#1, Aug 30 2001, 17:36:05)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.61mdk)] on linux-i386
Type "copyright", "credits" or "license" for more information.

First, I tried using the updated tarballs, and failed with:

dep: ['scipy_distutils', 'scipy_test']
Traceback (most recent call last):
  File "setup.py", line 44, in ?
    stand_alone_package(with_dependencies)
  File "setup.py", line 23, in stand_alone_package
    config_dict = package_config(primary,dependencies)
  File "scipy_distutils/misc_util.py", line 174, in package_config
    config.extend([get_package_config(x) for x in primary])
  File "scipy_distutils/misc_util.py", line 161, in get_package_config
    mod = __import__('setup_'+package_name)
ImportError: No module named setup_weave

Ron reported the same problem before with no answer, so I went for the cvs
files instead.  It might be worth fixing this though, as I suspect a lot of
people are reluctant to use cvs.

Now, using the cvs setup, I may well be doing something wrong.  I just
grabbed the whole scipy cvs tree, cd'ed into weave and tried

    import weave; weave.test()

after removing my .python21_* stuff and all files in ~/tmp/ related to
this.  I made sure no other weave is available in sys.path to me.  I don't
get the 'Abort' I was getting before anymore, so that's good news :)  But
it still fails, though.  Here's a summary.

First, I get the following error message a zillion times:

E/usr/local/home/fperez/.python21_compiled/44731/ext_string_and_int.cpp:
In function `FILE *convert_to_file (PyObject *, char *)':
/usr/local/home/fperez/.python21_compiled/44731/ext_string_and_int.cpp:53:
`handle_conversion_error_type' undeclared (first use this function)
/usr/local/home/fperez/.python21_compiled/44731/ext_string_and_int.cpp:53:
(Each undeclared identifier is reported only once for each function it
appears in.)

This happens for many many files.  I don't know how critical this
particular problem is, and what incidence it has on the later stuff.

Later, I get:

Ewarning: specified build_dir '_bad_path_' does not exist or is or is not
writable. Trying default locations
....warning: specified build_dir '_bad_path_' does not exist or is or is
not writable. Trying default locations
..................................................................................................
======================================================================
ERROR: result[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "weave/tests/test_blitz_tools.py", line 153, in check_5point_avg_2d
    self.generic_2d(expr)
  File "weave/tests/test_blitz_tools.py", line 127, in generic_2d
    mod_location)
  File "weave/tests/test_blitz_tools.py", line 84, in generic_test
    blitz_tools.blitz(expr,arg_dict,{},verbose=0)
  File "weave/blitz_tools.py", line 99, in blitz
    type_factories = blitz_type_factories)
  File "weave/inline_tools.py", line 402, in compile_function
    verbose=verbose, **kw)
  File "weave/ext_tools.py", line 347, in compile
    verbose = verbose, **kw)
  File "weave/build_tools.py", line 176, in build_extension
    setup(name = module_name, ext_modules = [ext],verbose=verb)
  File "/usr/lib/python2.1/distutils/core.py", line 157, in setup
    raise SystemExit, "error: " + str(msg)
CompileError: error: command 'gcc' failed with exit status 1
From here on it's a long string of ERROR:... all of which end with a 'failed gcc' message. Things end with:
----------------------------------------------------------------------
Ran 184 tests in 267.801s

FAILED (errors=28)

I hope this is more useful than what I had before, and let me know if I can
help more, or what I may be doing wrong.  Since I now have the cvs tree I
can easily update as needed and rerun the tests.

I'm really looking forward to weave stabilizing, as I suspect I might have
immediate need for it in my thesis.  So I'll try to help as much as
possible.

Cheers,
F.

Hey Fernando,

You caught the CVS in a transitional state -- that is the source of the
error you're getting with the FILE stuff.  I started fixing an issue with
exception handling that was bugging me this afternoon.  It will be
completed tonight.

Also, instead of running from the scipy/weave directory of the CVS, try
running from the scipy/ directory so that the weave package is in the
current directory.  Without installing weave explicitly, this is the best
way to test it.  Of course, wait till the above changes are finished before
trying again (I'll announce this).
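Concretely, the suggestion amounts to something like this (the checkout
path is illustrative, not from the original mail):

    cd ~/cvs/scipy            # top of the CVS checkout; contains weave/
    python -c "import weave; weave.test()"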
dep: ['scipy_distutils', 'scipy_test']
Traceback (most recent call last):
  File "setup.py", line 44, in ?
    stand_alone_package(with_dependencies)
  File "setup.py", line 23, in stand_alone_package
    config_dict = package_config(primary,dependencies)
  File "scipy_distutils/misc_util.py", line 174, in package_config
    config.extend([get_package_config(x) for x in primary])
  File "scipy_distutils/misc_util.py", line 161, in get_package_config
    mod = __import__('setup_'+package_name)
ImportError: No module named setup_weave
Hmmm. I'll look into this. I haven't had the problem when installing from the tar-ball, but *can* think of a reason that it might happen. I'll do a little investigation -- especially since I'd like to open it up to wider inspection tomorrow.
I hope this is more useful than what I had before, and let me know if I can
help more, or what I may be doing wrong.  Since I now have the cvs tree I
can easily update as needed and rerun the tests.
I'll bet a nickel you're not getting the Abort because the test that was
causing the Abort fails even earlier, and doesn't allow the Abort to
happen.  Call me a pessimist... :)
I'm really looking forward to weave stabilizing, as I suspect I might have
immediate need for it in my thesis.  So I'll try to help as much as
possible.

I'd like to work this out also.  The cross platform issues are proving to
be a headache.  Also, last night we learned that the Python 2.1.1 version
of dumbdbm (which Prabhu was using on Debian) is buggy.  The new code
hopefully solves these issues, but I'll bet it remains a trouble spot.

Two things could help as far as Mandrake is concerned.  (1) If I had a
temporary shell account on a Mandrake machine, I could diagnose the issue
the quickest.  Barring that, (2) we could set up a time to chat via Yahoo
tomorrow or the next day so that you and I could have a co-debugging
session.  Let me know what works best for you.

eric

On Sun, 13 Jan 2002, eric wrote:
Hey Fernando,
You caught the CVS in a transitional state -- that is the source of the error you're getting with the FILE stuff.  I started fixing an issue with exception handling that was bugging me this afternoon.  It will be completed tonight.
Ok. Updated cvs now.
dep: ['scipy_distutils', 'scipy_test']
[snip traceback]
ImportError: No module named setup_weave
Hmmm. I'll look into this. I haven't had the problem when installing from the tar-ball, but *can* think of a reason that it might happen. I'll do a little investigation -- especially since I'd like to open it up to wider inspection tomorrow.
The new tarballs are fine, this problem is now gone.
I'll bet a nickel you're not getting the Abort because the test that was causing the Abort fails even earlier, and doesn't allow the Abort to happen.  Call me a pessimist... :)
But a correct pessimist, unfortunately! Now that the tarballs are fine, the Abort is back!
I'd like to work this out also.  The cross platform issues are proving to
be a headache.  Also, last night we learned that the Python 2.1.1 version
of dumbdbm (which Prabhu was using on Debian) is buggy.  The new code
hopefully solves these issues, but I'll bet it remains a trouble spot.  Two
things could help as far as Mandrake is concerned.  (1) If I had a
temporary shell account on a Mandrake machine, I could diagnose the issue
the quickest.  Barring that, (2) we could set up a time to chat via Yahoo
tomorrow or the next day so that you and I could have a co-debugging
session.  Let me know what works best for you.
Unfortunately my office machine (on the net) is an alpha with python 1.5,
and on my laptop I only have modem access.  But Yahoo is fine, and if you
want to chance ssh over a modem, I'll give you an account on my laptop :).
Maybe this evening we could try something on yahoo?  I live in Colorado
(Mountain time), and anything after 5 or 6 pm would be fine by me.

Cheers,
f.

"eric" == eric <eric@scipy.org> writes:
eric> Still, in the end, I would like to have a version of inline
eric> that works for Fortran.  It's not a large project, but also
eric> not one I have time for now.  If anyone is interested, drop
eric> me a line, and I'll point out how to do it.

I'm not volunteering, but it sure is tempting.  Why don't you post the
"howto" here?  If someone has enough time they could do it.  I know that it
most probably is painful for you to write it up at this time, but I guess
now, when you are fresh from coding, is the best time to write the doc. :)

Also, I am not familiar with f2py or fortran, so I'll most probably stumble
quite often (if not all the time).  I think we need to get together on
this.  Pearu is obviously the f2py expert and you the weave expert (code
weaver?).  I'm the only(?) tester so far.  If it won't take long I don't
mind blowing a day on it helping (with coding/testing, whatever), but no
more.

eric> usability side of inline Fortran, the array transpose issue
eric> is confusing to people (including me), and, for speed, you
eric> have to leave the details of this up to the user.  We're
eric> just stuck with this.

Yes, maybe a nice document with a few examples on what happens and what to
watch out for would be good.  Let me first get that draft on the thread
done.

prabhu

"PP" == Pearu Peterson <pearu@cens.ioc.ee> writes:
PP> 1) Build fortran wrapper:
PP>    f2py -c flaplace.f -m flaplace   # here you need f2py from CVS

I've been having a hard time getting my version of f2py (2.5.391) to do
this but can't seem to get it right.  Does f2py 2.5.393 have this feature,
or is the CVS version absolutely necessary?

PP> As a result you'll get flaplace.so in the current directory.

Also, what about the transposing of the arrays??  Eric was talking about
transposing the array.  I quickly read some of your examples, and
internally you seem to do the necessary adjustments (swapping n and m) to
get the right array, right?  If so, technically the result of your fortran
code should be identical to the c++ version?  Please remember that the
order of execution is important, i.e. the array must be accessed in the
same sequence; if not, the intermediate step results will be different.
Both versions will converge, but I'm just curious about the intermediate
steps too.

My apologies if these questions sound dumb, but I come from a C background
with very little fortran experience.  I'll learn fortran soon though, not
because it's faster but because I believe that it's worth knowing.

prabhu

On Sun, 13 Jan 2002, Prabhu Ramachandran wrote:
"PP" == Pearu Peterson <pearu@cens.ioc.ee> writes:
PP> 1) Build fortran wrapper:
PP>    f2py -c flaplace.f -m flaplace   # here you need f2py from CVS
I've been having a hard time getting my version of f2py (2.5.391) to do this but can't seem to get it right.  Does f2py 2.5.393 have this feature, or is the CVS version absolutely necessary?
Yes, currently it is absolutely necessary.  The corresponding code in 2.5.x is broken due to the recent fast changes in scipy_distutils.  In fact, you'll also need to get scipy_distutils from its CVS.  After I check some things, I'll make a new snapshot available soon.
PP> As a result you'll get flaplace.so in the current directory.
Also, what about the transposing of the arrays??  Eric was talking about transposing the array.  I quickly read some of your examples, and internally you seem to do the necessary adjustments (swapping n and m) to get the right array, right?  If so, technically the result of your fortran code should be identical to the c++ version?  Please remember that the order of execution is important, i.e. the array must be accessed in the same sequence; if not, the intermediate step results will be different.  Both versions will converge, but I'm just curious about the intermediate steps too.
I think that the intermediate steps are different (this shows also in the
different error results).  I just ignored the transposing stuff, as the
test problem is symmetric, right?  It's a mess and a headache to deal with
different array orderings, even if you are fully aware of the issue.  Some
kind of rules need to be worked out, double-checked, and documented in
order to ease the pain ;)

And I didn't care about bugs either, as we are studying the speed
differences, and here only the numbers of operations are relevant -- they
should be the same in all test cases for the different approaches so that
the results will be comparable.  So, I think we should count operations,
not bugs, though it would be nice to get those fixed as well.

Pearu

"PP" == Pearu Peterson <pearu@cens.ioc.ee> writes:
PP> Yes, currently it is absolutely necessary.  The corresponding
PP> code in 2.5.x is broken due to the recent fast changes in
PP> scipy_distutils.  In fact, you'll also need to get
PP> scipy_distutils from its CVS.  After I check some things,
PP> I'll make a new snapshot available soon.

I just checked out f2py2e from cvs.  I had goofed earlier.  I have a local
install of Python and also a global one (both the same version), and I had
accidentally installed f2py as root somewhere else.  It works beautifully
now.  There was just one hitch: I had to link cvs/f2py2e/scipy_distutils to
cvs/scipy/scipy_distutils.  Once I did that, f2py2e installed just fine,
and now it also builds flaplace.so very nicely.

[Prabhu on transposing arrays ...]

PP> I think that the intermediate steps are different (this shows
PP> also in the different error results).  I just ignored the
PP> transposing stuff, as the test problem is symmetric, right?
PP> It's a mess and a headache to deal with different array
PP> orderings, even if you are fully aware of the issue.  Some
PP> kind of rules need to be worked out, double-checked, and
PP> documented in order to ease the pain ;)  And I didn't care
PP> about bugs either, as we are

Yes, that is what I was asking about.  I think we'll do this instead: I'll
work on a quick draft document -- "The newbie guide to scipy.weave" or
something.  I'll try to cover what I've done with the test problem and try
to illustrate numeric, weave, inline and f2py.  I'll let the experts
correct it.  So maybe you can write a section on array transposing and
stuff.

PP> studying the speed differences, and here only the numbers of
PP> operations are relevant -- they should be the same in all test
PP> cases for the different approaches so that the results will be
PP> comparable.  So, I think we should count operations, not bugs,
PP> though it would be nice to get those fixed as well.

Of course -- I wasn't saying that it was a bug.  Actually, if you look at
my code, the numeric version of timeStep will not re-use computed values.
In fact there is no way to do this in pure numeric.  Only inline, weave,
f2py and pure Python let you do this, i.e. once u(i,j) is computed, the
next (i or j) computation re-uses this latest value.  Numeric uses
temporaries, so you can't do it.  So, I know that the numeric results
*will* be different from those of the rest.  However, I only wanted
clarifications on f2py and what one must worry about when one uses f2py.

Anyway, my question is: u(i,j) as referred to in either Python (u[i,j]) or
in blitz or f2py -- do they mean the same thing?  Or is it that when I
write a loop in fortran that uses a numeric array, I must make sure that
u(i,j) == u[j,i]??

I ask this because there is hope that someday inline fortran will be
possible, and it looks to me that inline fortran is just as easy (if not
easier) to write as inline c++.  Also, if we are sure of getting more speed
with easier to write/understand code, it's worthwhile writing the loops in
fortran.  I'd really like to know what one should watch out for when one
writes fortran code that is used from Python.

Thanks,
prabhu

On Sun, 13 Jan 2002, Prabhu Ramachandran wrote:
Anyway, my question is: u(i,j) as referred to in either Python (u[i,j]) or in blitz or f2py -- do they mean the same thing?  Or is it that when I write a loop in fortran that uses a numeric array, I must make sure that u(i,j) == u[j,i]??
I am currently reconsidering the Fortran/C array issues.  What I have
worked out is looking promising: u[i,j] in Python and u(i,j) in Fortran
will be the same.  And one will not need to think about how matrices are
stored in memory, as a first instance at least.  However, if you are aware
of the fact that Fortran and C arrays have different storage orders, then
you can be clever and create arrays with the proper storage order for good
performance.  The approach will be similar to the one in Pyfort, but I
think it can be more efficient.

Let me say that with this new approach there will be no performance hit
compared to the current f2py approach.  Just the appearance of Fortran and
C multidimensional arrays in Python will be unified, and that is a very
important step, as then matrices in the mathematical sense will coincide.

However, to code this approach efficiently I need a
PyArray_TransposedContiguousFromObject function that does not exist in
Numeric.  If the Object is a PyArray already, then I can easily write that
part.  However, if the Object is a List or any other Sequence, then one
needs to repeat lots of code from Numeric.  Basically, Array_FromSequence
needs to be rewritten as TransposedArray_FromSequence, which requires quite
good knowledge of Numeric internals, which I lack.  A good thing is that
PyArrayObject need not be changed at all.  If someone has already coded all
of the above, or at least parts of it, I would appreciate code examples
very much.

Pearu
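To make the storage-order issue concrete, here is a small illustrative
sketch (ours, not Pearu's -- it uses only standard Numeric functions):

# A Numeric array is stored row-major (C order): for a 2x3 array the
# flat buffer is u[0,0], u[0,1], u[0,2], u[1,0], ...  Fortran assumes
# column-major order, so a routine handed this buffer as u(n,m) walks
# it so that Python's u[i,j] shows up in Fortran as u(j+1,i+1).  A
# wrapper can compensate by passing a transposed, contiguous copy:
from Numeric import *

u = reshape(arange(6), (2, 3))   # C order
v = transpose(u)                 # view with swapped strides, no copy
w = array(v)                     # contiguous copy -- safe to hand to Fortran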

"PP" == Pearu Peterson <pearu@cens.ioc.ee> writes:
PP> I am currently reconsidering the Fortran/C array issues.  What
PP> I have worked out is looking promising: u[i,j] in Python and
PP> u(i,j) in Fortran will be the same.  And one will not need to
PP> think about how matrices are stored in memory, as a first
PP> instance at least.  However, if you are aware of the fact
PP> that Fortran and C arrays have different storage orders, then
PP> you can be clever and create arrays with the proper storage
PP> order for good performance.  The approach will be similar to
PP> the one in Pyfort, but I think it can be more efficient.

PP> Let me say that with this new approach there will be no
PP> performance hit compared to the current f2py approach.  Just
PP> the appearance of Fortran and C multidimensional arrays in
PP> Python will be unified, and that is a very important step, as
PP> then matrices in the mathematical sense will coincide.

Sounds great!  Unfortunately I don't have much experience at all with the
internals of Numeric or its C API.  Good luck!

prabhu

Hi!

I made a new snapshot of F2PY available.  It is a beta version of the 5th
F2PY release.  It includes many updates, bug fixes, the latest
scipy_distutils, etc.  See NEWS.txt for details.  The most important new
feature is:

*** New -c switch that will run the Fortran wrapping process from Alpha
    to Omega.

For example, if foo.f contains a subroutine bar, then running

    f2py -c foo.f -m car

will create car.so in the current directory, which can be readily imported
to Python:

    >>> import car
    >>> print car.bar.__doc__
    <snip>

The snapshot is available for download at

    http://cens.ioc.ee/projects/f2py2e/2.x/

Regards,
Pearu
participants (4): eric, Fernando Pérez, Pearu Peterson, Prabhu Ramachandran