
On Sat, 19 Jan 2002, eric wrote:
Hey Pearu,
Jeeze. You work fast. I haven't even had a chance to grab the new f2py yet... I'm guessing I need to update f2py from CVS to try all this out. I'll try and spend some time with this today, but it may be tomorrow before I have a chance to take a look.
I made also a tarball available with the latest changes.
3. LU,piv,X,info = sgesv(A,B,rowmajor=1,copy_A=1,copy_B=1) LU,piv,info = sgetrf(A,rowmajor=1,copy_A=1)
A few points about naming.
1. SciPy has settled on all lower case convention for consistency across interfaces. So using:
lu,piv,x,info = sgesv(a,b,rowmajor=1,copy_a=1,copy_b=1)
will fit the scheme.
Ok, that's not a problem. It is easy to fix by query-replace in emacs... Just thought that capitalized arguments will look more like matrices.
2. Second, is "copy_A" a f2py generated name or can we specify it? I prefer overwrite = 0 to copy=1 because it is more explicit about what is happening. It really warns people that they are clobbering data when they choose to set the flag. Also, the "A" part of copy_A will be ambiguous during usage. In the worst case for the sgesv example, imagine someone does the following:
a = rhs_vector b = lhs_matrix lu,piv,x,info = sgesv(b,a,copy_a=1,copy_b=0)
If the person is trying to save their variable "a" here, they didn't do it, because the "copy_a" is refering to the function's argument "a", not the variable bound to it which is "b" in this example. I ran into this conundrum once when writing an interface generator in the past.
It be better if the names somehow were more descriptive. For sgetrf, the following seems fine:
lu,piv,info = sgetrf(a,overwrite=0,rowmajor=1)
For sgesv, the naming isn't quite as obvious. Maybe,
lu,piv,x,info = sgesv(a,b,overwrite_rhs=0,overwrite_lhs=0)
copy_A is a f2py generated name. overwrite certainly is more suitable to warn users of an in situ changes (I assume that intent(copy) is not used for arguments that are not changed in situ by the wrapped function). However, intent(copy), or should I call it now intent(overwrite), may have wider applications than wrapping lapack. There overwrite arguments may be used differently, that is, for one argument this flag is set to 0, and for the other one, to 1. So, I would like to keep overwrite_<name> rather than having a global overwrite keyword argument.
3. In the C API's, I think the overwrite argument will be used much more than the rowmajor flag. It should come first in the argument list. In fact, in the "old" wrappers, rowmajor was just hardcoded to 1 because the flalpack routines were considered the colummajor functions. Keeping it is fine, it just should go to the end.
>>> lu,piv,info = sgetrf(a, overwrite=0, rowmajor=1)
Not sure how hard these changes are -- I expect they might require some enhancements to f2py. I'm not opposed to having Python wrappers around the routines that help with this stuff. We used to have to do that for the clapack wrappers (until your additions to f2py's capabilities made this unnecessary). If it is easy to get f2py to do this stuff, then, of course, that is better.
Since overwrite argument is f2py generated and rowmajor is specified in the routine signature, you see that rowmajor becomes first (f2py generated arguments are alway appended to the end of argument list). I agree that rowmajor will be probably rarely used but I don't have any good ideas how to tell f2py about user-specified argument ordering. I looked to it and found that the required changes are not just quick hacks. On the other hand. I don't think that lapack or blas routines will be often used in interactive work. They will be more like to be used in nice interfaces such as solve() etc that can have arguments ordered as we want. In using f2py wrappers, I would always recommend
lu,piv,info = sgetrf(a, overwrite_a=0)
instead of
lu,piv,info = sgetrf(a, 0)
that is less informative.
Comment on testing. There was a fairly complete set of tests written for the *interfaces* of the blas routines. These guys only tested how the interfaces worked and didn't do much testing of the functions accuracy. We need both. I don't remember if the LAPACK tests were as complete -- I certainly didn't write them, but Travis O. may have.
Based on the errors we saw a week or two ago in the CLAPACK functions, we really do need to get some tests implemented for this stuff. You and I probably handle writing the interface tests (very tedious, but maybe not as bad with all your changes to f2py), but we really should find an expert to put the linear algebra routines through their paces. We can write some "smoke tests" though that make sure LU and solvers, etc. work for general problems. I'd stick this way up there as one of the most important packages in SciPy, so we should really try and make sure it works right!
Maybe there is a test set of matrices laying around somewhere to help us out in all this. Anyone know of such a thing? -- perhaps in the LAPACK distribution itself.
Yes, testing is important. I don't dare to claim that f2py is bug free ;-) but testing whether the interface works should not be the main goal for linalg, indeed. More important is that the interface acts according to its documentation. I mean here things like whether getrf really returns mathematically correct LU decomposition (leaving aside rounding errors). If it would return, for example, a transpose LU (which would be incorrect), then passing LU to getrs (that may have the same bug), the final result may be correct. I looked already the tests in the lapack distribution but they seem rather difficult to use for us. These tests seem to be "hardcoded" to the Fortran files and extracting the test data from there is not simple. Testing linalg stuff is not a small or trivial project. Any one wants to volunteer?
Few remarks: 1) The sources to these extension modules are quite big (more than 5000 lines in both). Should we factor these modules to cslapack.so cclapack.so cdlapack.so czlapack.so fslapack.so fclapack.so fdlapack.so fzlapack.so (any better name convention?). What do you think?
Well, I think in practice, they'll rarely be needed separately. Having big wrappers isn't that big a deal (heck, I've had SWIG wrappers that were 60000 lines...). I'd vote for keeping them all together unless there is a compelling reason besides size to split them. This keeps things a little simpler (import once, get everything).
That issue is then solved. Good.
2) How shall I commit all this to the scipy CVS repository? In a subdirectory of linalg? Eventually it should be in linalg directory, I think.
We'll since we don't know how to do CVS branches reliably... :| Why don't you make a directory called linalg2. We'll work in there until it gets stable.
Ok.
Pearu, this all looking very promising. Thanks a ton.
Your are very welcome. Pearu