[Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Andrew Collette andrew.collette at gmail.com
Tue Jan 8 16:14:12 EST 2013


Hi Nathaniel,

(Responding to both your emails)

> The problem is that the rules for arrays - and for every other part of
> numpy in general - are that we *don't* pick types based on values.
> Numpy always uses input types to determine output types, not input
> values.

Yes, of course... array operations are governed exclusively by their
dtypes.  It seems to me that, using the language of the bug report
(2878), if we have this:

result = arr + scalar

I would argue that our job is not to pick result.dtype directly, but
rather to pick scalar.dtype and then apply the normal rules for array
operations.
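In fact, if I understand np.result_type (new in 1.6) correctly, it
already expresses exactly that two-step rule:

>>> import numpy as np
>>> a = np.array([1], dtype=np.int8)
>>> np.result_type(a, 1)
dtype('int8')
>>> np.result_type(a, 1000)
dtype('int16')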

> So it's pretty unambiguous that
> "using the same rules for arrays and scalars" would mean, ignore the
> value of the scalar, and in expressions like
>   np.array([1], dtype=np.int8) + 1
> we should always upcast to int32/int64.

Ah, but that's my point: we already, in 1.6, ignore the intrinsic
width of the scalar and effectively substitute one based on its
value:

>>> a = np.array([1], dtype=np.int8)
>>> (a + 1).dtype
dtype('int8')
>>> (a + 1000).dtype
dtype('int16')
>>> (a + 90000).dtype
dtype('int32')
>>> (a + 2**40).dtype
dtype('int64')
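
If I'm reading 1.6 right, this even applies to scalars that arrive
with an explicit wide dtype - the declared width is ignored in favor
of the value:

>>> (a + np.int64(1)).dtype
dtype('int8')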

> 1.6, your proposal: in a binary operation, if one operand has ndim==0
> and the other has ndim>0, downcast the ndim==0 item to the smallest
> width that is consistent with its value and the other operand's type.

Yes, exactly.  I'm not trying to propose a completely new behavior: as
I mentioned (although very far upthread), this is the mental model I
had of how things worked in 1.6 already.
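
In code, that mental model might look something like this rough sketch
(function names are hypothetical; the signed-kind fixup is my guess at
the detail that keeps int8 + 1 at int8):

import numpy as np

def effective_scalar_dtype(value, other_dtype):
    # Narrowest dtype that can hold the value at all (may be unsigned).
    dt = np.min_scalar_type(value)
    # If the array operand is signed, prefer the narrowest signed type
    # that still holds the value, so that int8 + 1 stays int8.
    if other_dtype.kind == 'i' and dt.kind == 'u':
        for candidate in (np.int8, np.int16, np.int32, np.int64):
            info = np.iinfo(candidate)
            if info.min <= value <= info.max:
                dt = np.dtype(candidate)
                break
    return dt

def scalar_op_dtype(arr, value):
    # Pick scalar.dtype from the value, then apply the ordinary
    # array-vs-array promotion rules.
    return np.promote_types(arr.dtype, effective_scalar_dtype(value, arr.dtype))

a = np.array([1], dtype=np.int8)
scalar_op_dtype(a, 1)       # dtype('int8')
scalar_op_dtype(a, 1000)    # dtype('int16')
scalar_op_dtype(a, 2**40)   # dtype('int64')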

> New users don't use narrow-width dtypes... it's important to remember
> in this discussion that in numpy, non-standard dtypes only arise when
> users explicitly request them, so there's some expressed intention
> there that we want to try and respect.

I would respectfully disagree.  One example I cited was that when
dealing with HDF5, it's very common to get int16's (and even int8's)
when reading from a file, because they are used to save disk space.
All a new user has to do to end up with int8's from a file someone
else created is:

>>> data = some_hdf5_file['MyDataset'][...]
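
For concreteness, a minimal round trip with h5py (file and dataset
names hypothetical) shows the narrow dtype arriving without the new
user ever asking for it:

import h5py
import numpy as np

# Someone else wrote int8 data to keep the file small.
with h5py.File('example.h5', 'w') as f:
    f.create_dataset('MyDataset', data=np.arange(10, dtype=np.int8))

# The new user just reads it back and gets int8's.
with h5py.File('example.h5', 'r') as f:
    data = f['MyDataset'][...]

print(data.dtype)   # int8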

This is a general issue that applies to data read from real-world
external sources.  For example, digitizers routinely represent their
samples as int8's or int16's, and you apply a scale and offset to get
a reading in volts.
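
A toy version of that digitizer workflow (all numbers made up) shows
where the upcast naturally happens:

import numpy as np

# Raw 16-bit samples straight from a hypothetical digitizer.
raw = np.array([-1024, 0, 1024], dtype=np.int16)

# Calibration constants, as floats: volts per count, and an offset.
scale, offset = 2.5e-4, -0.1

volts = raw * scale + offset   # the float scalars upcast the int16 samples
print(volts.dtype)             # float64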

As you say, the proposed change will prevent accidental upcasting by
people who selected int8/int16 on purpose to save memory, by notifying
them with a ValueError.  But another assumption we could make is that
people who choose narrow types for performance reasons can be expected
to use caution when performing operations that might upcast, and that
the default behavior should follow the normal array rules as closely
as possible, as it does in 1.6.
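
For what it's worth, the cautious idioms are already available to that
audience today; a couple of sketches:

import numpy as np

a = np.array([1], dtype=np.int8)

# In-place operations never upcast, so the dtype stays pinned
# (overflow then becomes the user's responsibility):
a += 10
print(a.dtype)                     # int8

# Or pin the output dtype of the ufunc explicitly:
b = np.add(a, 10, dtype=np.int8)
print(b.dtype)                     # int8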

Andrew


