[Numpy-discussion] Overloading numpy's ufuncs for better type coercion?

Wed Jul 22 08:06:02 EDT 2009

Hi!

(This mail is a reply to a personal conversation with Ullrich Köthe, but the
issue is obviously of broader concern.  This is about VIGRA's new NumPy-based
python bindings.)  Ulli considers this behaviour of NumPy to be a bug:

In [1]: a = numpy.array([200], numpy.uint8)

In [2]: a + a
Out[2]: array([144], dtype=uint8)
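The wrap-around is plain modular arithmetic: uint8 stores values modulo
2**8 = 256, so 200 + 200 = 400 ends up as 400 - 256 = 144.  A minimal sketch,
assuming a current NumPy release, that reproduces this and shows the usual
workaround of explicitly widening one operand before adding:

```python
import numpy as np

a = np.array([200], np.uint8)

# uint8 + uint8 selects the uint8 loop, so the sum wraps modulo 256.
wrapped = a + a
print(wrapped, (200 + 200) % 256)

# Widening one operand first makes NumPy promote uint16 + uint8 to uint16,
# which is wide enough to hold 400.
widened = a.astype(np.uint16) + a
print(widened, widened.dtype)
```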

However, this is well-known, often-discussed, and IMHO not really unexpected
for programmers who have ever worked with C-like languages (even Java has
many such overflow problems).  Christian even said this is what he wants.

OTOH, I agree that NumPy is missing a feature here: it should perform "coercion
before the operation".  More precisely, the data type of the temporary result
should be obtained by promoting the operand types, and only /then/ should the
coercion to the output type (which may also reduce the number of bits) happen.
That would fix this strange behaviour:

In [3]: numpy.add(a, a, numpy.empty((1, ), dtype = numpy.uint32))
Out[3]: array([144], dtype=uint32)
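As far as I understand NumPy's ufunc machinery, passing an output array does
not influence which inner loop is chosen: the loop is still selected from the
uint8 inputs, and only the finished result is cast into the uint32 buffer.
Current NumPy releases also accept a dtype= keyword on ufuncs, which does
select the computation type.  A sketch under that assumption:

```python
import numpy as np

a = np.array([200], np.uint8)
out = np.empty((1,), dtype=np.uint32)

# Only out= is given: the uint8 loop still runs, so the sum wraps to 144
# before it is cast into the uint32 output buffer.
np.add(a, a, out=out)

# dtype= requests the uint32 loop, so both inputs are cast to uint32 first
# and the addition itself no longer overflows.
wide = np.add(a, a, dtype=np.uint32)

print(out, wide)
```

This is essentially "promotion before the operation", but the caller has to
ask for it explicitly; the default behaviour stays the same.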

Now, our opinions differ on how to deal with this: Ulli planned to override
(more or less) all ufuncs in vigranumpy so that they return float32 (which is
a sort of common denominator, and the type that nearly all other vigranumpy
functions should accept).  I see two main disadvantages here:

a) Choosing float32 seems arbitrary, and I'd like as much of vigranumpy as
possible to be independent of VIGRA and its particular needs.  I have seen
so many requests (e.g. on the c++-sig mailing list) for *good*
C++/boost-python <-> numpy bindings that it would be a pity IMO to add
VIGRA-specific special cases by overloading __add__ etc.

b) Also, I find it unexpected and undesirable to change the behaviour of such
basic operations as addition on our ndarray-derived image types.  IMO this
risks confusing new users with two different behaviours, and even experienced
vigranumpy users might eventually fall into the trap when dealing with plain
ndarrays and our derived types side-by-side.
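For reference, Ulli's approach could look roughly like the sketch below.  It
uses the __array_ufunc__ protocol, which only appeared in NumPy 1.13, long
after this discussion; the class name Float32Image is hypothetical, and this
is not how vigranumpy actually implements its image types.  It also shows the
trap from b): the same expression gives different results on a plain ndarray
and on the derived type.

```python
import numpy as np


class Float32Image(np.ndarray):
    """Hypothetical image type whose ufunc results are always float32."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Explicit out= arguments are not supported in this sketch;
        # returning NotImplemented makes NumPy raise a TypeError.
        if "out" in kwargs:
            return NotImplemented
        # np.asarray() returns plain ndarrays, so calling the ufunc below
        # does not re-enter this method; every input is cast to float32.
        converted = [np.asarray(x, dtype=np.float32) for x in inputs]
        result = getattr(ufunc, method)(*converted, **kwargs)

        def wrap(r):
            return r.view(Float32Image) if isinstance(r, np.ndarray) else r

        if isinstance(result, tuple):
            return tuple(wrap(r) for r in result)
        return wrap(result)


a = np.array([200], np.uint8)
img = a.view(Float32Image)

plain_sum = a + a        # plain ndarray: uint8 loop, wraps to 144
image_sum = img + img    # derived type: computed in float32, gives 400.0
print(plain_sum.dtype, image_sum.dtype)
```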

Ideally, I'd like numpy to be "fixed" - I hope that the "only" obstacle is 
that someone needs to do it, but I am afraid of someone mentioning the term 
"backward-compatibility" (Ulli would surely call it "bug-compatibility" here 
;-) ).

But in the end, I wonder how bad this really is for VIGRA.  AFAICS, the
main problem is that one needs to decide upon the pixel types for which to
export algorithms (in most cases, we'll use just float32, at least for a
start), and that when one loads images into arrays of the data type used in
the image file, one will often end up with uint8 arrays which cannot be passed
into many algorithms without an explicit conversion.  However, is this really
a serious problem?  For example, the conversion would typically have to be
performed only once (after loading), no?  Then why not simplify things
further by adding a dtype= parameter to importImage()?  This could even
default to float32.
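A minimal sketch of the suggested dtype= parameter, assuming a hypothetical
import_image() wrapper and a hypothetical read_raw callable that returns the
pixels in the file's native type (neither is an existing vigranumpy
function):

```python
import numpy as np


def import_image(path, read_raw, dtype=np.float32):
    """Hypothetical importImage() variant with a dtype= parameter.

    read_raw is an assumed callable that loads `path` and returns the
    pixels in the file's native type (typically uint8).
    """
    raw = np.asarray(read_raw(path))
    if dtype is None:
        # Keep the pixel type stored in the file.
        return raw
    # Convert exactly once, right after loading; copy=False skips the copy
    # when the data already has the requested type.
    return raw.astype(dtype, copy=False)


def fake_reader(path):
    # Stand-in for a real image reader, used only for this example.
    return np.array([[200, 55], [0, 255]], dtype=np.uint8)


img = import_image("example.png", fake_reader)
native = import_image("example.png", fake_reader, dtype=None)
print(img.dtype, native.dtype, (img + img)[0, 0])
```

Callers who really want the file's native pixel type could still pass
dtype=None, so nothing is lost by making float32 the default.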

Looking forward to your opinions,
  Hans