Mailman 3 A disconnected numarray rant - NumPy-Discussion

Oct. 11, 2004

Hi,

I'm taking a 1 month break from computers (i.e. I will be completely
off-line), and I have to catch a train in an hour; but I've recently bitten
the bullet and made a matrix class I've been using for some time work with
numarray; I've written down a number of things that occurred to me while I was
doing it, including some things which I think are bugs in numarray, so I
thought at least posting the bugs would be a useful service; the rest is very
raw and essentially unedited cut-and-paste of these notes -- sorry about that
and I hope it doesn't contain anything particularly offensive.

P.S. just dumped the code for the matrix class (nummat) at
http://www.dcs.ex.ac.uk/~aschmolc/Stuff/

'as

The following are my notes:

Things that fairly clearly seem to be bugs:
    - numarray.Int32 etc. can't be pickled
    - ``a = array(1+0j); a.imag = a.real * 10`` => IndexError
    - array(0, type=Float64) + 1e3000  => `inf` with right error modes
      but  array(0, type=Float32) + 1e3000 => `OverflowError`
    - numarray.array(10)/numarray.array(0) => 0 
    - numarray.array(10000000000000L) => array(1316134912)
    - numarray.where(0,1,0) => array([0])
    - l = [1,2,3]; numarray.put(l,numarray.array([1,2,0]),[0,0,0]); l => [1, 2, 3]
      a = array([1,2,3]); numarray.put(a,numarray.array([1,2,0]),[0,0,0]); a => array([0, 0, 0])
    - repr(numarray.array([],typecode='i')) (etc. etc.) => "numarray.array([])"
    - getattr(array([1,2,3]), '_aligned') => SystemError
    - obscure: numarray.where(0, matrix(568, convert_scalars=True),2) =>
      ValueError (tries __len__ which fails, as len(array(568)) also fails)
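
The long-integer result in the list above is at least consistent with silent truncation to a signed 32-bit integer instead of an overflow error. A minimal pure-Python sketch of that arithmetic (the helper name `wrap_int32` is made up for illustration and is not part of numarray):

```python
def wrap_int32(value):
    """Reduce an arbitrary Python integer to a signed 32-bit value (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


# The value from the bug list: 10**13 does not fit into 32 bits.
print(wrap_int32(10000000000000))  # 1316134912
```

Whether numarray actually takes this code path is an assumption; the sketch only shows that the reported number is what plain wraparound would give.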

Numeric incompatibilities (that are either undocumented or bug-like)

- numarray.array('a', typecode='O') => TypeError (object arrays)
- for extra fun try: numarray.array(1, type=numarray.Object) => RuntimeError
  something entirely different
- nonzero is completely incompatible
- shape(None) etc. no longer works (IMHO a bug)
- cross_correlate & average missing
- left_shift et al missing
- numarray.sqrt(a,a) is None (*not* the result, as it used to be)
- num.put(a, [0,1,2,3], [10,20]) style behavior seems unavailable (without numarray.numeric)
  put(array([[ 0.,  1.,  2.], [ 3.,  4.,  5.]]), [1, 4], [10,40]) fails
- boolean testing (not even bool(array(0)) works; I'm not sure this is good)

- Generally different handling of rank0-arrays; e.g. ``type(num.array(1.0) +
  0) is float``; one potentially very nasty gotcha is in-place operations
  (e.g. a**=2) which have totally different semantics for python scalars and
  rank0 arrays, which, unlike Attribute errors on ``a.shape``, can lead to
  nasty bugs in corner cases (e.g. when a reduction just infrequently yields
  scalar ``a``) -- I think this should be mentioned in a gotchas section
  (another possible entry would be the need to use .copy() to **save** memory
  on slicing and 1xN, Nx1 matrices versus vectors (people are not used to
  thinking properly about rank from mathematical training or matlab
  exposure)).

- asarray downcasts arrays (e.g.: asarray(array([1.,2.,3.]),'i'))

- numarray.ones(-5) => MemoryError (ValueError would be nicer)
- numarray.ones(2.0), numarray.ones([2]) fail (cf. numarray.range(2.0))
      b=num.array([[1,2,3,4],[5,6,7,8]]*2)
      assert eq(num.diagonal(b), [1,6,3,8])
      assert eq(num.diagonal(b, -1), [5,2,7])
      c = num.array([b,b])
      assert eq(num.diagonal(c,1), [[2,7,4], [2,7,4]])
- no a.toscalar() !!!
- matrixmultiply in the docs
- what's the point of swapaxes (i.e. why not have a generalized in-place
  transpose?)
- what's the point of innerproduct?

- indexing by a list is different from indexing by tuple (I haven't had time
  to look closely at the docs whether that's intentional)

- doesn't know about Numeric's bizarre '\x0b' typecode
- numarray.sqrt.reduce([]) raises (sensibly) TypeError, not ValueError

- len(array(1)) or array(1)[0] won't work anymore (understandable, but
  should be documented)
- (should maximum, minimum reduce to -inf and inf?)
- <built-in method reduce of _BinaryUFunc object at 0x82dfc9c> is not
  a very helpful repr; should be possible to get to the ufunc itself
- as in Numeric numarray.maximum.reduce(numarray.array([0,-0.])) => -0.0
- __array__ protocol no longer supported (how can a non-derived class convert
  itself efficiently to an array?)
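
The in-place gotcha noted above for rank-0 arrays can be illustrated without numarray at all. The sketch below uses a hypothetical `Rank0` stand-in class (not numarray's implementation) whose `**=` mutates the object, as in-place operators on arrays do, and compares it with a plain Python float, where `**=` merely rebinds the local name:

```python
class Rank0:
    """Hypothetical stand-in for a mutable rank-0 array holding one value."""

    def __init__(self, value):
        self.value = value

    def __ipow__(self, exponent):
        # Mutate in place, as array in-place operators do.
        self.value **= exponent
        return self


def square_in_place(a):
    """Looks harmless, but whether the caller sees a change depends on type(a)."""
    a **= 2
    return a


x = 3.0
square_in_place(x)
print(x)          # 3.0: the float is immutable, only the local name was rebound

r = Rank0(3.0)
square_in_place(r)
print(r.value)    # 9.0: the caller's object was modified
```

A function like `square_in_place` therefore behaves differently depending on whether a reduction happened to hand it a Python scalar or a rank-0 array.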

Documentation Gotchas
- p. 34 IMO row vector is used incorrectly; row and column vectors are really
     matrices (i.e. have rank 2) so ``array([[1,2,3]])`` would be a row vector

- No proper explanation of the differences between Numeric and numarray, or
  of how the numarray.numeric module differs from numarray proper (e.g. argmin)

- No migration and best-practice advice (e.g. there should be a standard way
  for packages which work with both numarray and numeric as backends to let
  the user choose his preference; how about setting an environment var NumPy
  or something?)
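
One way the environment-variable idea could look in practice is sketched below. The variable name `NUMERIC_BACKEND`, the function names and the default are all assumptions for illustration; nothing like this is defined by Numeric or numarray:

```python
import importlib
import os

KNOWN_BACKENDS = ("numarray", "Numeric")


def choose_backend_name(environ=os.environ, default="numarray"):
    """Return the backend name requested via the (hypothetical) NUMERIC_BACKEND variable."""
    name = environ.get("NUMERIC_BACKEND", default)
    if name not in KNOWN_BACKENDS:
        raise ValueError("unknown array backend: %r" % (name,))
    return name


def load_backend(environ=os.environ):
    """Import and return the chosen backend module (ImportError if it is not installed)."""
    return importlib.import_module(choose_backend_name(environ))
```

A package would then call `load_backend()` once at import time and route all array calls through the returned module, so the user's preference is honoured in one place.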

Waffle
------

- there *really* ought to be an array equality function (with optional
  tolerance); it's quite difficult to get right for a normal user (nans;
  zero-size arrays etc.) and it's often required, especially for testing

- rank-preserving reduction would be nice as an option -- e.g. to
  subtract out or divide by the reduced portion (which currently won't work,
  e.g. for columns, without adding a unit dimension by hand).
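
To make the corner cases of the equality function wished for above concrete, here is a rough sketch on plain nested lists instead of arrays. The name `seq_equal` and the `nan_equal` switch are illustrative assumptions, not an existing numarray function:

```python
import math


def seq_equal(a, b, tol=0.0, nan_equal=False):
    """Compare two (possibly nested) sequences of numbers element-wise.

    Returns False on any shape mismatch; two empty sequences compare equal.
    NaNs only match each other, and only if nan_equal is set.
    """
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq != b_is_seq:
        return False
    if a_is_seq:
        if len(a) != len(b):
            return False
        return all(seq_equal(x, y, tol, nan_equal) for x, y in zip(a, b))
    # Scalars from here on.
    if math.isnan(a) or math.isnan(b):
        return nan_equal and math.isnan(a) and math.isnan(b)
    if a == b:  # also covers equal infinities
        return True
    return abs(a - b) <= tol
```

The explicit NaN and shape checks are the parts that a naive "compare everything and reduce" approach tends to get wrong, which is why a library-provided version would help for testing.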

Design
------

  The (AFAICS) benefit-free but downside-rich introduction of `type`
  ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

  Is there any reason that Typecode objects that compare as desired to the
  relevant strings ("i", "d") wouldn't have done? Now there is an explosion
  and confusion of interfaces -- some numpy code will now only accept
  type(code)s as "typecode" keyword parameter (even in numarray! see
  numarray.mlab!) and other stuff

  Never mind that type already is a highly overused word in the python world.
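
What "Typecode objects that compare as desired to the relevant strings" might look like is sketched here. The class and its fields are hypothetical and are not how numarray's type objects are implemented:

```python
class Typecode:
    """Hypothetical type object that also compares equal to its one-letter code."""

    def __init__(self, name, code):
        self.name = name
        self.code = code

    def __eq__(self, other):
        if isinstance(other, Typecode):
            return self.code == other.code
        if isinstance(other, str):
            return self.code == other
        return NotImplemented

    def __hash__(self):
        # Hash like the code string so that dict lookups by "i" or "d" work too.
        return hash(self.code)

    def __repr__(self):
        return self.name


Int32 = Typecode("Int32", "i")
Float64 = Typecode("Float64", "d")

print(Int32 == "i", "d" == Float64, Int32 == Float64)  # True True False
```

With objects like these, old code that passes or tests for `"i"` and `"d"` would keep working alongside code that uses the named objects.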

  The big method bloat.
  '''''''''''''''''''''

  As it says in the Numeric manual introductions there were "good reasons" for
  "very few array methods" -- now there are **56** public methods and 8 public
  attributes (public == not starting with '_'); of those 56 methods about 11
  are accessors and of the rest about half are redundant or worse (i.e. they
  either also exist as numarray functions (argmin, argmax, diagonal, ...) or
  they really ought to be functions (mean, stddev) or they are quite confusing
  (``a.min``, ``a.max`` which behave quite differently from ``a.argmin`` and
  ``a.argmax``, never mind ``numarray.minimum``) or simply utterly pointless
  (``a.nelements`` == ``a.size``)).
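
A tally by the "public == not starting with '_'" convention can be reproduced for any object along these lines. The helper name `public_members` is illustrative, and it is shown on a plain list since numarray may not be installed:

```python
def public_members(obj):
    """Split the public attribute names of obj into (methods, attributes)."""
    methods, attributes = [], []
    for name in dir(obj):
        if name.startswith("_"):
            continue
        if callable(getattr(obj, name)):
            methods.append(name)
        else:
            attributes.append(name)
    return methods, attributes


methods, attributes = public_members([])
print(len(methods), "public methods,", len(attributes), "public attributes")
```

Passing an array instance instead of `[]` gives the corresponding counts for the array class.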

  - argmin, argmax : what's wrong with numarray.argmin, numarray.argmax??? Why
    do argmin/argmax and max/min have completely different interfaces??? If
    there really is a need for these (there isn't), then if anything a.min
    and a.max should be called a.flatmin, a.flatmax

  - diagonal, mean, nelements, nonzero, ...

  - perversely the **only** function that I can think of that could have
    sensibly become a method hasn't: ``put`` (it used to work only on arrays
    under Numeric and not without reason, so making it a method would have
    been sensible; numarray.put of course also "works" on non-arrays, it just
    doesn't do anything with them)

  Test Code
  '''''''''
  numtest.py doesn't inspire full confidence (it's about 1000 lines of actual
  code but it doesn't seem that clearly structured and AFAICT contains no
  single loop (and that despite the diversity of shapes, types etc. that exist
  in numarray -- why not try something slightly more systematic?)).
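
As a sketch of what "slightly more systematic" could mean, the driver below loops one check over every combination of shape and typecode. The particular shapes, typecodes and function names are assumptions chosen for illustration and are not taken from numtest.py:

```python
import itertools

SHAPES = [(), (0,), (1,), (3,), (2, 3), (0, 4), (2, 1, 3)]
TYPECODES = ["i", "l", "f", "d"]


def n_elements(shape):
    """Number of elements implied by a shape tuple (1 for rank 0)."""
    count = 1
    for extent in shape:
        count *= extent
    return count


def run_cases(check, shapes=SHAPES, typecodes=TYPECODES):
    """Call check(shape, typecode) for every combination; return the failing cases."""
    failures = []
    for shape, typecode in itertools.product(shapes, typecodes):
        try:
            check(shape, typecode)
        except Exception as exc:
            failures.append((shape, typecode, exc))
    return failures
```

A check could, for instance, build an array of ones with the given shape and type and compare its element count with `n_elements(shape)`; rank-0 and zero-size shapes are included on purpose because they are the usual corner cases.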

Alexander Schmolck
