[Numpy-discussion] Insights / lessons learned from NumPy design

Chris Barker - NOAA Federal chris.barker at noaa.gov
Mon Jan 7 13:08:03 EST 2013


On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson
<mike.r.anderson.13 at gmail.com> wrote:
> In the Clojure community there has been some discussion about creating a
> common matrix maths library / API. Currently there are a few different
> fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
> effort to unify them and have a common base on which to build on.
>
> NumPy has been something of an inspiration for this, so I thought I'd ask
> here to see what lessons have been learned.

A few thoughts:

> We're thinking of a matrix library

First -- is this a "matrix" library, or a general-use nd-array
library? That will drive your design a great deal. For my part, I came
from MATLAB, which started out very focused on matrices, then was
extended to be more generally useful. Personally, I found the matrix
focus got in the way more than it helped -- in any "real" code, the
actual matrix operations are likely to be a tiny fraction of the code.

One reason I like numpy is that it is array-first, with secondary
support for matrix stuff.

That being said, there is the numpy matrix type, and there are those
that find it very useful, particularly in teaching situations. It
feels a bit "tacked-on", though, and that does get in the way. So if
you want a "real" matrix object, but also a general-purpose array lib,
thinking about both up front will be helpful.
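
To make the array-first vs. matrix-first distinction concrete, here is a
minimal sketch (the variable names are just for illustration). With a
plain ndarray, `*` is element-by-element and the matrix product is an
explicit call; with the numpy matrix type, `*` means the matrix product:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# ndarray: * multiplies element by element
elementwise = a * b            # [[0, 2], [3, 0]]

# ndarray: the matrix product is an explicit function call
matrix_prod = np.dot(a, b)     # [[2, 1], [4, 3]]

# numpy matrix type: * is the matrix product instead
ma, mb = np.matrix(a), np.matrix(b)
matrix_star = ma * mb          # same values as matrix_prod
```

Having the same operator mean two different things depending on the
type is a good part of why the matrix type feels tacked-on.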

> - Support for multi-dimensional matrices (but with fast paths for 1D vectors
> and 2D matrices as the common cases)

What is a multi-dimensional matrix? Is a 3-d something a stack of
matrices, or something else? (Note: numpy lacks this kind of object,
but it is sometimes asked for -- i.e., a way to do fast matrix
multiplication with a lot of small matrices.)
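
For what it's worth, the "stack of matrices" case can be sketched with
np.einsum on a 3-d array, treating the first axis as the stack index.
This is one way to do it, not a dedicated stacked-matrix object, and
the shapes and names here are made up for the example:

```python
import numpy as np

# Two stacks of four 2x2 matrices each: shape (n, 2, 2)
stack_a = np.arange(16, dtype=float).reshape(4, 2, 2)
stack_b = np.arange(16, dtype=float).reshape(4, 2, 2) + 1.0

# One call: multiply matrix n of stack_a by matrix n of stack_b
products = np.einsum('nij,njk->nik', stack_a, stack_b)

# Reference version: the same thing with a Python-level loop
looped = np.array([np.dot(x, y) for x, y in zip(stack_a, stack_b)])
```

Later NumPy releases also let np.matmul broadcast over the leading
dimensions, which covers the same use case.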

I think fast paths for 1-D and 2-D are secondary, though you may want
"easy paths" for those. In particular, if you want good support for
linear algebra (matrices), then having clean and natural "row vector"
and "column vector" objects would be nice. See the archives of this
list for a bunch of discussion about that -- and about the weaknesses
of the numpy matrix object.
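
A small sketch of the row/column vector issue, with names chosen only
for illustration: a 1-D ndarray is neither a row nor a column, so you
have to reshape to get one, while the matrix type goes the other way
and never gives you anything but 2-D:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # 1-D, shape (3,): neither row nor column
v_t = v.T                       # transposing a 1-D array changes nothing

row = v.reshape(1, 3)           # explicit row vector, shape (1, 3)
col = v.reshape(3, 1)           # explicit column vector, shape (3, 1)

inner = np.dot(row, col)        # shape (1, 1), value 14
outer = np.dot(col, row)        # shape (3, 3)

# The matrix type is always 2-D: indexing out a row gives a 1x3 matrix
m = np.matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
m_row = m[0]                    # shape (1, 3), not (3,)
```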

> - Immutability by default, i.e. matrix operations are pure functions that
> create new matrices.

I'd be careful about this -- the purity and predictability are nice,
but these days a lot of time is spent allocating and moving memory
around. The mutability of numpy arrays is a key feature -- indeed,
the main performance issues with numpy surround the fact that many
copies may be made unnecessarily (note: Dag's suggestion of lazy
evaluation may mitigate this to some extent).
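
To illustrate what mutability buys you, here is a rough sketch of the
difference between operations that allocate a new array and ones that
reuse existing memory (array sizes and names are arbitrary):

```python
import numpy as np

a = np.ones(1000)
b = np.full(1000, 2.0)

c = a + b                    # allocates a brand-new array for the result

same_buffer = a              # second reference to the same array object
a += b                       # in place: no new array, a is now all 3.0
np.multiply(a, 2.0, out=a)   # out= also writes into the existing buffer

view = a[::2]                # slicing gives a view, not a copy
view[0] = -1.0               # so this write shows up in a as well
```

With pure functions only, every one of those steps would have to
allocate and fill a fresh array.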

> - Support for 64-bit double precision floats only (this is the standard
> float type in Clojure)

Not a bad start, but another major strength of numpy is its multiple
data types -- you may want to design that concept in from the start.
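
As a quick sketch of what the data type concept looks like in numpy
(the particular types picked here are just examples): every array
carries a dtype, which sets the storage per element, and mixed
operations promote to a common type:

```python
import numpy as np

doubles = np.array([1, 2, 3], dtype=np.float64)   # 8 bytes per element
singles = doubles.astype(np.float32)              # 4 bytes per element
small = np.array([1, 2, 3], dtype=np.uint8)       # 1 byte per element

sizes = (doubles.itemsize, singles.itemsize, small.itemsize)

# Mixing float32 and float64 arrays promotes the result to float64
mixed = singles + doubles
```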

> - Ability to support multiple different back-end matrix implementations
> (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)

This ties in to another major strength of numpy -- ndarrays are both
powerful python objects, and wrappers around standard C arrays -- that
makes it pretty darn easy to interface with external libraries for
core computation.
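
A minimal sketch of the "wrapper around a C array" idea, using a
standard-library array.array as a stand-in for memory owned by some
external library (that stand-in is my assumption for the example, not
a particular back end):

```python
import array
import numpy as np

# Stand-in for a block of C doubles owned by some other library
raw = array.array('d', [1.0, 2.0, 3.0, 4.0])

# Wrap the existing memory as an ndarray: no data is copied
wrapped = np.frombuffer(raw, dtype=np.float64)

# Writes through the ndarray land in the original buffer
wrapped[0] = 10.0

# Reshaping is also just a new view on the same memory
matrix2d = wrapped.reshape(2, 2)
matrix2d[1, 1] = 40.0

owns_data = wrapped.flags.owndata   # False: numpy did not allocate this
```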

HTH,
  -Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


