Hi all, as some of you know, I'm a big fan of weave.inline() for the ease of development it offers. Essentially C++ becomes a scripting part of python, with only the occasional recompile overhead (but this is transparent to the users). inline() makes heavy use of the blitz++ library, to represent numpy arrays as blitz arrays and therefore index them as Arr(i,j,k...). This seems to me like a very sound decision, as the blitz library was specifically written to provide simultaneously fortran-like performance with high-level object orientation. With this in mind, I have a few questions: - Is there anything in Numarray's design which prevents this same trick from being used? The core of getting a Numpy array into a blitz one is simply a call to: // Convert a Numpy array to a blitz one, using the original's data (no copy) template<class T, int N> static blitz::Array<T,N> py_to_blitz(PyArrayObject *arr_obj) { blitz::TinyVector<int,N> shape(0); blitz::TinyVector<int,N> strides(0); for (int i=0;i<N;i++) { shape[i] = arr_obj->dimensions[i]; strides[i] = arr_obj->strides[i]/sizeof(T); } return blitz::Array<T,N>((T*) arr_obj->data,shape,strides, blitz::neverDeleteData); } I would not be surprised if something much like this would work for Numarrays, but I'd like to know for sure. - If the above is indeed possible for Numarray, do you guys feel that blitz should be considered an integral part of python's low-level arsenal, or not? I view its advantages especially in the field of an extremely clean, simple syntax (compared to the low-level Numpy API, I'm not familiar with Numarray yet). I've seen examples where a numpy algorithm can be recoded using blitz in a _far_ smaller amount of code, and with performance gains to boot (in some cases, not all). Please note that I am NOT second-guessing any of the design decisions made by the Numarray crowd. I just want to clarify if it's possible to have a C++ API which gives access to numarray with the simplest, highest-level syntax and behavior which is reasonable. This is important for teams where not everyone is an expert on C pointer manipulation details, but they may be perfectly ok with writing loops over things which can be indexed as A(i,j,k). - If any of this seems worthy of more detailed discussion, I told Eric Jones that I'd be happy to coordinate something along these lines at Scipy'03. If there's time in the schedule I'd be glad to do it officially, otherwise we can just try to keep it in mind for a lunch/dinner meeting. Documenting a 'Best Practices for low-level numerical work in python' would probably be a beneficial thing to do for the community. Right now there are so many options that I'm sure I'm not the only one to feel a bit confused. - If Blitz++ is indeed deemed a worthy piece of this puzzle, I wonder if we could contact some of its developers to bring them into this discussion. I worry a bit that activity on its mailing lists seems rather slow, but I noticed that an apparently central guy, Julian C. Cummings, is at CACR/Caltech. Perhaps he could stop by and let us know what the future for blitz looks like. I can fire off an email to him if there is interest in the blitz thing. Anyway, I'm just trying to 'pre-soak' a few ideas for discussion at scipy'03 on these topics. Best regards, Fernando.
Fernando Perez wrote:
- Is there anything in Numarray's design which prevents this same trick from being used?
It's not entirely clear to me what you have in mind. ARE you proposing that Blitz++ arrays be the low-level implimentaiton of Numarray?, or just that there be an easy way to switch between them. In the first case, it was considered a coupl eof years ago, and rejected (as far as I could tell) mainly for reasons of portability: Blitz++ makes heavy use of templates, which are not well supported by the very wide variety of compilers that Python is supported on. One of the goals of Numarray is to make it into the Python standard library, and use of C++ and templates would preclude that, at least in the foreseeable future. Here's one discussion about it: http://www.geocrawler.com/mail/thread.php3?subject=%5BNumpy-discussion%5D+Meta%3A+too+many+numerical+libraries+doing+the&list=1329 If that url doesn't work, I found it by searching google for: Numpy-discussion blitz++ If you are proposing the second: that there could be an easy bridge between Numarray and Blitz++ (maybe using Boost Python somehow?) so that Blitz++ arrays could be used to write Numarray extensions, I'm all for it! I think the community would gain a great deal from an easier to use API for writing compiled Numarray functions. Besides making it easier for those of us that find the need for custom extensions, it would make it much easier for people to write and contibute high performance general purpose code--resulting in a much expanded library for Numarray and SciPy. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Chris Barker wrote:
Fernando Perez wrote:
- Is there anything in Numarray's design which prevents this same trick from being used?
It's not entirely clear to me what you have in mind. ARE you proposing that Blitz++ arrays be the low-level implimentaiton of Numarray?, or just that there be an easy way to switch between them.
Well, it was a bit of both. I'm glad to have been ambiguous, since you answered both :)
In the first case, it was considered a coupl eof years ago, and rejected (as far as I could tell) mainly for reasons of portability: Blitz++ makes heavy use of templates, which are not well supported by the very wide variety of compilers that Python is supported on. One of the goals of Numarray is to make it into the Python standard library, and use of C++ and templates would preclude that, at least in the foreseeable future.
I sort of suspected this, so while this option sounds very nice in theory, I didn't expect it to be a realistic possibility.
If you are proposing the second: that there could be an easy bridge between Numarray and Blitz++ (maybe using Boost Python somehow?) so that Blitz++ arrays could be used to write Numarray extensions, I'm all for it! I think the community would gain a great deal from an easier to use API for writing compiled Numarray functions. Besides making it easier for those of us that find the need for custom extensions, it would make it much easier for people to write and contibute high performance general purpose code--resulting in a much expanded library for Numarray and SciPy.
This is more realistic, and in fact today it _is_ a reality. As I posted before, making a blitz array out of a numpy one is trivial, and you get fast, easy to read C++. I initially learned how to do it by reverse-engineering weave.inline's auto-generated C++ files, but once that got me off the ground, it was trivial to write my own extension modules. And now that I'm starting to learn a bit more about the capabilities of the blitz library, I'm very excited. To indirectly answer Perry's email in this same thread: after his comments, it seems to me that the Numarray->blitz operation would work just as easily as the example I showed for Numpy->blitz. Perry says that in memory, numarrays are very similar to Numpys, so this shouldn't be a problem. Which means we are already quite a ways there! I think to some extent, it's more a matter of making better known to people that this approach exists, and how easy to use it is. I only learned about it after being impressed with inline's super-simple syntax as compared to the horrors of handling numpy arrays by hand. But the more I read the blitz manual, the more I see that blitz arrays behave _a lot_ like numpy arrays even inside the C++ code. So the bridge approach, as long as the constructor call is very cheap, seems like a perfectly reasonable balance between aesthetic purity and real-world practicality to me. This doesn't mean that there are no issues to be discussed, so I'll try to outline the ones that come to mind (as per Perry's request). Perhaps others can add to this list. 1- Cheap constructors: For a rank-N array, the current constructor still requires the creation of 2 TinyVector blitz arrays of size N (shape and strides). It would be great if these copies could be avoided and the existing shape & strides data from the Numpy array could be used directly. The Numpy-> blitz functions would then be almost zero-overhead operations. 2- Access to fortran libraries: I don't know how easy it is to use say BLAS with objects defined in C++ (and with row-major memory layout). This is just my ignorance, but I'd like to clarify this point, since it would be nice to be able to use BLAS/LAPACK from within the C++ code with blitz arrays with no transposition overhead. 3- Benchmark: I think it would be valuable to draft a small but reasonable set of benchmarks comparing the raw numeric/numarray C api, the blitz version and fortran versions of some common algorithms. Blitz relies heavily on templates and C++ compilers are getting better, so it would be nice to know where things stand. I have a preliminary version of this, which I've posted here, but it only covers innerproduct(). As part of that testing, I found more numerical discrepancy than seemed reasonable to me. I would really need to understand the origin of this before feeling comfortable with the blitz approach for production codes. 4- Multiple precision: For a certain class of problems, beign able to work with Numerical arrays whose precision goes beyond double is critical. At http://crd.lbl.gov/~dhbailey/mpdist/index.html there is a good example of a modern, C++ library for extended precision which offers a very simple syntax. I wonder how easy (or not) it would be for Numarray to offer optional support for this kind of arrays. This last topic is a bit more ambitious than the others (which I think seem to be very reasonabl). However, I hope that we can at least discuss it, since there is a class of problems where it is a really important issue. There may be more things I can't think of now, but hopefully this may serve as a first outline for a discussion, both on the mailing list and in person, at scipy. Best, f
I just compiled/installed numarray-0.7 on (latest) Cygwin. _ufunc.dll needs to be linked against /usr/lib/libmingwex.a. Doing that manually seems to work fine. Adding it to the setup files is not as simple, as libnumarray.dll does not link any more as some wrong functions seem to be picked up from the MINGW library as well (and the pow cannot be resolved. Explicitely adding -lm does not help). How is this planned to work. Where needs the sequence "-L/usr/lib/mingw -lmingwex" be put? Is it save to use this mingw library in "normal" Cygwin programs? Same happens with cvs HEAD. Moreover I have the follwoing test error: ,---- | Failure in example: log10([1+1j]) | from line #2267 of numarray.numtest | Expected: array([ 0.150515+0.34109409j]) | Got: array([ 0.150515+5.21807245j]) `---- Greetings, Jochen -- Einigkeit und Recht und Freiheit http://www.Jochen-Kuepper.de Liberté, Égalité, Fraternité GnuPG key: CC1B0B4D (Part 3 you find in my messages before fall 2003.)
participants (3)
-
Chris Barker
-
Fernando Perez
-
Jochen Küpper