[Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Wed Jan 9 13:07:16 EST 2013

On 01/09/2013 06:22 PM, Chris Barker - NOAA Federal wrote:
> On Wed, Jan 9, 2013 at 7:09 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>> This is a general issue applying to data which is read from real-world
>>> external sources.  For example, digitizers routinely represent their
>>> samples as int8's or int16's, and you apply a scale and offset to get
>>> a reading in volts.
>>
>> This particular case is actually handled fine by numpy 1.5, because int
>> array + float scalar *does* upcast to float. It's width that's ignored
>> (int8 versus int32), not the basic "kind" of data (int versus float).
>>
>> But overall this does sound like a problem -- but it's not a problem
>> with the scalar/array rules, it's a problem with working with narrow
>> width data in general.
>
> Exactly -- this is key. Details aside, we essentially have a choice
> between an approach that makes it easy to preserve your values --
> upcasting liberally, or making it easy to preserve your dtype --
> requiring users to specifically upcast where needed.
>
> IIRC, our experience with earlier versions of numpy (and Numeric
> before that) is that all too often folks would choose a small dtype
> quite deliberately, then have it accidentally upcast for them -- this
> was determined to be not-so-good behavior.
>
> I think the HDF (and also netcdf...) case is a special case -- the
> small dtype+scaling has been chosen deliberately by whoever created
> the data file (to save space), but we would want it generally opaque
> to the consumer of the file -- to me, that means the issue should be
> addressed by the file reading tools, not numpy. If your HDF5 reader
> chooses the resulting dtype explicitly, it doesn't matter what
> numpy's defaults are. If the user wants to work with the raw, unscaled
> arrays, then they should know what they are doing.
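A minimal NumPy sketch of the distinction drawn in the quoted exchange: a float scalar changes the *kind* of the result, while an int scalar does not widen a narrow integer array. The sample values, `scale` and `offset` are made up for illustration. Every scalar used here fits in int8, so the dtypes noted in the comments should come out the same under both the older value-based promotion rules and the NEP 50 rules of NumPy 2.x.

```python
import numpy as np

# Digitizer-style samples deliberately stored in a narrow integer type.
raw = np.array([100, -100, 7], dtype=np.int8)

# A float scalar changes the kind of the result: int8 -> float64.
as_float = raw + 0.5

# An int scalar of the same kind does not widen the array: the result stays
# int8, so 100 + 100 silently wraps around to -56 instead of giving 200.
same_kind = raw + 100

# Preserving values rather than the dtype means upcasting explicitly first.
widened = raw.astype(np.int16) + 100   # int16, first element is 200

# What a file reader could do with scaled data: pick the output dtype itself.
scale, offset = 0.01, -0.5             # made-up calibration values
volts = raw.astype(np.float64) * scale + offset   # about [0.5, -1.5, -0.43]
```

The last line is the approach described above for file readers: the code that knows the data is scaled picks float64 explicitly, so numpy's scalar casting defaults never come into play.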

+1. I think h5py should consider:

File("my.h5")['int8_dset'].dtype == int64
File("my.h5", preserve_dtype=True)['int8_dset'].dtype == int8
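One way to read this proposal is as a reader-side policy. The sketch below is hypothetical: `read_dataset` is an illustrative helper, `preserve_dtype` is the keyword proposed here rather than an existing h5py parameter, and a plain NumPy array stands in for the HDF5 dataset. Only signed integer data is widened to int64; unsigned and float data are returned unchanged in this sketch.

```python
import numpy as np


def read_dataset(raw: np.ndarray, preserve_dtype: bool = False) -> np.ndarray:
    """Hypothetical reader policy: widen signed integer data to int64 unless
    the caller asks to keep the stored dtype."""
    if preserve_dtype:
        return raw  # hand back the stored (possibly narrow) dtype untouched
    if np.issubdtype(raw.dtype, np.signedinteger):
        return raw.astype(np.int64)  # widen so later arithmetic keeps values
    return raw  # unsigned and float data are left alone in this sketch


# A plain int8 array stands in for File("my.h5")['int8_dset'].
stored = np.array([100, -100], dtype=np.int8)

widened = read_dataset(stored)                         # dtype int64
raw_view = read_dataset(stored, preserve_dtype=True)   # dtype int8
total = widened + 100                                  # no wraparound
```

With the default, adding 100 is done in int64 and gives 200 rather than wrapping, while `preserve_dtype=True` returns the int8 data for users who want the raw array and know what they are doing.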

Dag Sverre