Re: [Numpy-discussion] Blitz++, NumArray...

22 Aug 2003

      Chris Barker wrote:
...
Fernando Perez wrote:
...
- Is there anything in Numarray's design which prevents this same trick
from being used?
It's not entirely clear to me what you have in mind. ARE you proposing that
Blitz++ arrays be the low-level implimentaiton of Numarray?, or just that
there be an easy way to switch between them.
Well, it was a bit of both. I'm glad to have been ambiguous, since you 
answered both :)
...
In the first case, it was considered a coupl eof years ago, and rejected 
(as far as I could tell) mainly for reasons of portability: Blitz++ makes
heavy use of templates, which are not well supported by the very wide
variety of compilers that Python is supported on. One of the goals of
Numarray is to make it into the Python standard library, and use of C++ and
templates would preclude that, at least in the foreseeable future.
I sort of suspected this, so while this option sounds very nice in theory, I
didn't expect it to be a realistic possibility.
...
If you are proposing the second: that there could be an easy bridge between
Numarray and Blitz++ (maybe using Boost Python somehow?) so that Blitz++
arrays could be used to write Numarray extensions, I'm all for it! I think
the community would gain a great deal from an easier to use API for writing
compiled Numarray functions. Besides making it easier for those of us that
find the need for custom extensions, it would make it much easier for
people to write and contibute high performance general purpose
code--resulting in a much expanded library for Numarray and SciPy.
This is more realistic, and in fact today it _is_ a reality.  As I posted
before, making a blitz array out of a numpy one is trivial, and you get
fast, easy to read C++.  I initially learned how to do it by
reverse-engineering weave.inline's auto-generated C++ files, but once that
got me off the ground, it was trivial to write my own extension modules.
And now that I'm starting to learn a bit more about the capabilities of the
blitz library, I'm very excited.

To indirectly answer Perry's email in this same thread: after his comments,
it seems to me that the Numarray->blitz operation would work just as easily
as the example I showed for Numpy->blitz.  Perry says that in memory,
numarrays are very similar to Numpys, so this shouldn't be a problem.

Which means we are already quite a ways there!  I think to some extent, it's
more a matter of making better known to people that this approach exists,
and how easy to use it is.  I only learned about it after being impressed
with inline's super-simple syntax as compared to the horrors of handling
numpy arrays by hand.  But the more I read the blitz manual, the more I see
that blitz arrays behave _a lot_ like numpy arrays even inside the C++ code.

So the bridge approach, as long as the constructor call is very cheap, seems
like a perfectly reasonable balance between aesthetic purity and real-world
practicality to me.  This doesn't mean that there are no issues to be
discussed, so I'll try to outline the ones that come to mind (as per Perry's
request).  Perhaps others can add to this list.

1- Cheap constructors:  For a rank-N array, the current constructor still
requires the creation of 2 TinyVector blitz arrays of size N (shape and
strides).  It would be great if these copies could be avoided and the
existing shape & strides data from the Numpy array could be used directly.
The Numpy-> blitz functions would then be almost zero-overhead operations.

2- Access to fortran libraries:  I don't know how easy it is to use say BLAS
with objects defined in C++ (and with row-major memory layout).   This is
just my ignorance, but I'd like to clarify this point, since it would be
nice to be able to use BLAS/LAPACK from within the C++ code with blitz
arrays with no transposition overhead.

3- Benchmark:  I think it would be valuable to draft a small but reasonable
set of benchmarks comparing the raw numeric/numarray C api, the blitz
version and fortran versions of some common algorithms.  Blitz relies
heavily on templates and C++ compilers are getting better, so it would be
nice to know where things stand.  I have a preliminary version of this, which 
I've posted here, but it only covers innerproduct().

As part of that testing, I found more numerical discrepancy than seemed 
reasonable to me.  I would really need to understand the origin of this before 
feeling comfortable with the blitz approach for production codes.

4- Multiple precision:  For a certain class of problems, beign able to work 
with Numerical arrays whose precision goes beyond double is critical.  At 
http://crd.lbl.gov/~dhbailey/mpdist/index.html there is a good example of a 
modern, C++ library for extended precision which offers a very simple syntax. 
  I wonder how easy (or not) it would be for Numarray to offer optional 
support for this kind of arrays.

This last topic is a bit more ambitious than the others (which I think seem to 
be very reasonabl).  However, I hope that we can at least discuss it, since 
there is a class of problems where it is a really important issue.

There may be more things I can't think of now, but hopefully this may serve as 
a first outline for a discussion, both on the mailing list and in person, at 
scipy.

Best,

f