Chris Barker wrote:
Fernando Perez wrote:
- Is there anything in Numarray's design which prevents this same trick
from being used?
It's not entirely clear to me what you have in mind. ARE you proposing that Blitz++ arrays be the low-level implimentaiton of Numarray?, or just that there be an easy way to switch between them.
Well, it was a bit of both. I'm glad to have been ambiguous, since you answered both :)
In the first case, it was considered a coupl eof years ago, and rejected (as far as I could tell) mainly for reasons of portability: Blitz++ makes heavy use of templates, which are not well supported by the very wide variety of compilers that Python is supported on. One of the goals of Numarray is to make it into the Python standard library, and use of C++ and templates would preclude that, at least in the foreseeable future.
I sort of suspected this, so while this option sounds very nice in theory, I didn't expect it to be a realistic possibility.
If you are proposing the second: that there could be an easy bridge between Numarray and Blitz++ (maybe using Boost Python somehow?) so that Blitz++ arrays could be used to write Numarray extensions, I'm all for it! I think the community would gain a great deal from an easier to use API for writing compiled Numarray functions. Besides making it easier for those of us that find the need for custom extensions, it would make it much easier for people to write and contibute high performance general purpose code--resulting in a much expanded library for Numarray and SciPy.
This is more realistic, and in fact today it _is_ a reality. As I posted before, making a blitz array out of a numpy one is trivial, and you get fast, easy to read C++. I initially learned how to do it by reverse-engineering weave.inline's auto-generated C++ files, but once that got me off the ground, it was trivial to write my own extension modules. And now that I'm starting to learn a bit more about the capabilities of the blitz library, I'm very excited.
To indirectly answer Perry's email in this same thread: after his comments, it seems to me that the Numarray->blitz operation would work just as easily as the example I showed for Numpy->blitz. Perry says that in memory, numarrays are very similar to Numpys, so this shouldn't be a problem.
Which means we are already quite a ways there! I think to some extent, it's more a matter of making better known to people that this approach exists, and how easy to use it is. I only learned about it after being impressed with inline's super-simple syntax as compared to the horrors of handling numpy arrays by hand. But the more I read the blitz manual, the more I see that blitz arrays behave _a lot_ like numpy arrays even inside the C++ code.
So the bridge approach, as long as the constructor call is very cheap, seems like a perfectly reasonable balance between aesthetic purity and real-world practicality to me. This doesn't mean that there are no issues to be discussed, so I'll try to outline the ones that come to mind (as per Perry's request). Perhaps others can add to this list.
1- Cheap constructors: For a rank-N array, the current constructor still requires the creation of 2 TinyVector blitz arrays of size N (shape and strides). It would be great if these copies could be avoided and the existing shape & strides data from the Numpy array could be used directly. The Numpy-> blitz functions would then be almost zero-overhead operations.
2- Access to fortran libraries: I don't know how easy it is to use say BLAS with objects defined in C++ (and with row-major memory layout). This is just my ignorance, but I'd like to clarify this point, since it would be nice to be able to use BLAS/LAPACK from within the C++ code with blitz arrays with no transposition overhead.
3- Benchmark: I think it would be valuable to draft a small but reasonable set of benchmarks comparing the raw numeric/numarray C api, the blitz version and fortran versions of some common algorithms. Blitz relies heavily on templates and C++ compilers are getting better, so it would be nice to know where things stand. I have a preliminary version of this, which I've posted here, but it only covers innerproduct().
As part of that testing, I found more numerical discrepancy than seemed reasonable to me. I would really need to understand the origin of this before feeling comfortable with the blitz approach for production codes.
4- Multiple precision: For a certain class of problems, beign able to work with Numerical arrays whose precision goes beyond double is critical. At http://crd.lbl.gov/~dhbailey/mpdist/index.html there is a good example of a modern, C++ library for extended precision which offers a very simple syntax. I wonder how easy (or not) it would be for Numarray to offer optional support for this kind of arrays.
This last topic is a bit more ambitious than the others (which I think seem to be very reasonabl). However, I hope that we can at least discuss it, since there is a class of problems where it is a really important issue.
There may be more things I can't think of now, but hopefully this may serve as a first outline for a discussion, both on the mailing list and in person, at scipy.