[Numpy-discussion] CASTABLE flag
Scott Ransom
sransom at nrao.edu
Mon Jan 7 14:51:21 EST 2008
On Monday 07 January 2008 02:13:56 pm Charles R Harris wrote:
> On Jan 7, 2008 12:00 PM, Travis E. Oliphant <oliphant at enthought.com> wrote:
> > Charles R Harris wrote:
> > > Hi All,
> > >
> > > I'm thinking that one way to make the automatic type conversion a
> > > bit safer to use would be to add a CASTABLE flag to arrays. Then
> > > we could write something like
> > >
> > > a[...] = typecast(b)
> > >
> > > where typecast returns a view of b with the CASTABLE flag set so
> > > that the assignment operator can check whether to implement the
> > > current behavior or to raise an error. Maybe this could even be
> > > part of the dtype scalar, although that might cause a lot of
> > > problems with the current default conversions. What do folks
> > > think?
> >
> > That is an interesting approach. The issue raised earlier, of having
> > to convert lines of code that currently work and rely on implicit
> > casting (like ndimage), would still be there, but it would not cause
> > unnecessary data copying, and it would help with this complaint
> > (which I've heard before and have some sympathy towards).
> >
> > I'm intrigued.
>
> Maybe we could also set a global flag, typesafe, that in the current
> Numpy version would default to false, giving current behavior, but
> could be set true to get the new behavior. Then when Numpy 1.1 comes
> out we could make the global default true. That way folks could keep
> the current code working until they are ready to use the typesafe
> feature.
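To make the proposal concrete, here is a rough, hypothetical sketch of how such a typecast wrapper could behave. Neither `typecast` nor a CASTABLE flag exists in NumPy; every name below is invented for illustration, and the "is this cast safe?" check is approximated with the real np.can_cast:

```python
import numpy as np

# Hypothetical sketch only: NumPy has no `typecast` or CASTABLE flag.
# The wrapper marks an array as "okay to downcast"; safe_assign refuses
# lossy (non-same-kind) casts unless the source was wrapped.

class _Castable:
    """Wraps an array the user has explicitly allowed to be downcast."""
    def __init__(self, arr):
        self.arr = np.asarray(arr)

def typecast(arr):
    return _Castable(arr)

def safe_assign(dest, src):
    """Assign src into dest, raising on implicit lossy casts."""
    if isinstance(src, _Castable):
        dest[...] = src.arr          # user opted in: allow the cast
        return
    src = np.asarray(src)
    if not np.can_cast(src.dtype, dest.dtype, casting='same_kind'):
        raise TypeError(f"implicit cast from {src.dtype} to {dest.dtype}")
    dest[...] = src

a = np.zeros(3, dtype=np.int64)
safe_assign(a, typecast([0.5, 1.5, 2.5]))   # explicit: truncates to ints
print(a)                                    # [0 1 2]
```

Unwrapped assignment of the same float data would raise, which is the "typesafe" behavior being discussed.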
I'm a bit confused as to which types of casting you are proposing to
change. As several people have pointed out, users very often _want_ to
"lose information". And as I pointed out, that is one of the reasons we
are all using numpy today as opposed to Numeric!
I'd bet that the vast majority of the people on this list believe that
the OP's problem of complex numbers being silently cast to floats is a
real problem. Fine. We should be able to raise an exception in that
case.
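For what it's worth, NumPy's existing np.can_cast already distinguishes that lossy complex-to-float case from the two cases below; a check along these lines (just a sketch of how such an exception could be backed) works today:

```python
import numpy as np

# complex -> float drops the imaginary part and is not a "same kind" cast:
print(np.can_cast(np.complex128, np.float64, casting='same_kind'))  # False
# float64 -> float32 (case 1 below) stays within the float kind:
print(np.can_cast(np.float64, np.float32, casting='same_kind'))     # True
# float64 -> int64 (case 2 below) crosses kinds as well:
print(np.can_cast(np.float64, np.int64, casting='same_kind'))       # False
```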
However, two other very common cases of "lost information" are not
obviously problems; for many of us they are the _preferred_ actions.
The examples are:
1. Performing floating-point math in higher precision, but casting to a
lower-precision float when that is the type of the array on the lhs of
the assignment. For example:
In [22]: a = arange(5, dtype=float32)
In [23]: a += arange(5.0)
In [24]: a
Out[24]: array([ 0., 2., 4., 6., 8.], dtype=float32)
To me, that is fantastic. I've obviously explicitly requested that I
want "a" to hold 32-bit floats. And if I'm careful and use in-place
math, I get 32-bit floats at the end (and no problems with large
temporaries or memory doubling by an automatic cast to float64).
In [25]: a = a + arange(5.0)
In [26]: a
Out[26]: array([ 0., 3., 6., 9., 12.])
In this case, "a" is rebound from 32 bits to 64 bits because I'm not
using in-place math. The temporary array created on the rhs determines
the type of the new assignment. Once again, I think this is good.
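The two transcripts above condense into a small runnable script (plain numpy, nothing assumed beyond what the sessions show):

```python
import numpy as np

a = np.arange(5, dtype=np.float32)
a += np.arange(5.0)            # float64 rhs cast down into the float32 lhs
print(a.dtype)                 # float32
print(a)                       # [0. 2. 4. 6. 8.]

a = a + np.arange(5.0)         # not in-place: a new array is created,
print(a.dtype)                 # and its dtype is float64
print(a)                       # [ 0.  3.  6.  9. 12.]
```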
2. Similarly, if I want to stuff floats into an int array:
In [28]: a
Out[28]: array([0, 1, 2, 3, 4])
In [29]: a += 2.5
In [30]: a
Out[30]: array([2, 3, 4, 5, 6])
Here, I get C-like truncation/casting of my originally integer array
because I'm using in-place math. This is often a very useful behavior.
In [31]: a = a + 2.5
In [32]: a
Out[32]: array([ 4.5, 5.5, 6.5, 7.5, 8.5])
But here, without the in-place math, "a" gets converted to doubles.
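As a runnable sketch of case 2: note that later NumPy releases did turn the silent in-place int-plus-float into an error, so reproducing the truncating transcript above now requires an explicit casting='unsafe'; the non-in-place form is unchanged:

```python
import numpy as np

a = np.arange(5)                     # int array: [0 1 2 3 4]
# In the NumPy of this thread, `a += 2.5` truncated back to int; modern
# NumPy raises instead, so the old behavior needs an explicit unsafe cast:
np.add(a, 2.5, out=a, casting='unsafe')
print(a)                             # [2 3 4 5 6]

b = a + 2.5                          # not in-place: new float64 array
print(b)                             # [4.5 5.5 6.5 7.5 8.5]
```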
I can certainly say that in my code (which is used by a fair number of
people in my field), each of these use cases is common. And I think
they are among the _strengths_ of numpy.
I will be very disappointed if this default behavior changes.
Scott
--
Scott M. Ransom             Address: NRAO
Phone: (434) 296-0320                520 Edgemont Rd.
email: sransom at nrao.edu            Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989