[Numpy-discussion] Moving forward with value based casting

Fri Jun 7 17:41:36 EDT 2019

On Fri, 2019-06-07 at 13:19 -0500, Sebastian Berg wrote:
> On Fri, 2019-06-07 at 07:18 +0200, Ralf Gommers wrote:
> > 
> > On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs at pobox.com> wrote:
> > > My intuition is that what users actually want is for *native Python
> > > types* to be treated as having 'underspecified' dtypes, e.g. int is
> > > happy to coerce to int8/int32/int64/whatever, float is happy to
> > > coerce to float32/float64/whatever, but once you have a
> > > fully-specified numpy dtype, it should stay.
> > 
> > Thanks Nathaniel, I think this expresses a possible solution better
> > than anything I've seen on this list before. An explicit
> > "underspecified types" concept could make casting understandable.
> 
> Yes, there is one small additional annoyance (but maybe it is just
> that): 127 has the 'underspecified' dtype `uint7` (it can be safely
> cast to both uint8 and int8).
> 
> > > In any case, it would probably be helpful to start by just writing
> > > down the whole set of rules we have now, because I'm not sure
> > > anyone understands all the details...
> > 
> > +1
> 
> OK, let me try to sketch the details below:
> 
> 0. "Scalars" means scalars or 0-D arrays here.
> 
> 1. The logic below will only be used if we have a mix of arrays and
> scalars. If all are scalars, the logic is never used. (Plus one
> additional tricky case within ufuncs, which is more hypothetical [0])
> 

And of course I just realized that, trying to be simple, I forgot an
important point there:

The logic in 2. is only used when there is a mix of scalars and arrays,
and the arrays are in the same or higher category. As an example:

np.array([1, 2, 3], dtype=np.uint8) + np.float64(12.)

will not demote the float64, because the scalar's "float" category is
higher than the array's "integer" category.
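
To make the condition concrete, here is a minimal sketch of the category
check in plain Python. It is a toy model of the rule as described in this
thread, not NumPy's implementation; the names `CATEGORY_RANK` and
`value_based_logic_applies` are made up for illustration, and the "Others"
category is left out because it has no place in the ordering.

```python
# Toy model of the category check; not NumPy's implementation.
# Categories ordered from lowest to highest, following the rules above.
CATEGORY_RANK = {"boolean": 0, "integer": 1, "float": 2, "complex": 3}


def value_based_logic_applies(array_categories, scalar_categories):
    """Return True if scalars may be demoted based on their value.

    The logic is only used for a mix of arrays and scalars in which the
    highest array category is the same as or higher than the highest
    scalar category.
    """
    if not array_categories or not scalar_categories:
        # Only scalars or only arrays: the value-based logic is not used.
        return False
    highest_array = max(CATEGORY_RANK[c] for c in array_categories)
    highest_scalar = max(CATEGORY_RANK[c] for c in scalar_categories)
    return highest_array >= highest_scalar


# uint8 array + float64 scalar: the scalar is in a higher category.
print(value_based_logic_applies(["integer"], ["float"]))  # False
# float32 array + int64 scalar: the array is in a higher category.
print(value_based_logic_applies(["float"], ["integer"]))  # True
```

In the first call the float64 scalar keeps its dtype, which matches the
uint8 example above.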

- Sebastian

> 2. Scalars will only be demoted within their category. The categories
> and casting rules within the category are as follows:
> 
> Boolean:
>     Casts safely to all (nothing surprising).
> 
> Integers:
>     Casting is possible if output can hold the value.
>     This includes uint8(127) casting to an int8.
>     (unsigned and signed integers are the same "category")
> 
> Floats:
>     Scalars can be demoted based on value, roughly this
>     avoids overflows:
>         float16:     -65000 < value < 65000
>         float32:    -3.4e38 < value < 3.4e38
>         float64:   -1.7e308 < value < 1.7e308
>         float128 (largest type, does not apply).
> 
> Complex: Same logic as floats (applied to .real and .imag).
> 
> Others: Anything else.
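
As a rough illustration of the float and complex rules, the sketch below
picks the smallest float type whose range contains a value. It is a toy
model in plain Python, not NumPy's code: the helper names are invented, it
uses the exact IEEE 754 maxima instead of the rounded bounds in the list,
it looks only at range (not precision), and it leaves non-finite values
out of scope. Mapping float16 to complex64 assumes complex64 is the
smallest complex type available.

```python
import math

# Toy model of value-based float demotion; not NumPy's implementation.
# Largest finite values of the IEEE 754 binary16/32/64 formats.
FLOAT_MAX = [
    ("float16", 65504.0),
    ("float32", 3.4028234663852886e38),
    ("float64", 1.7976931348623157e308),
]

# Assumption: there is no complex type built from two float16 values.
COMPLEX_FOR_FLOAT = {
    "float16": "complex64",
    "float32": "complex64",
    "float64": "complex128",
}


def min_float_dtype(value):
    """Name of the smallest float type whose range contains `value`."""
    if not math.isfinite(value):
        raise ValueError("inf and nan are out of scope for this sketch")
    for name, largest in FLOAT_MAX:
        if -largest <= value <= largest:
            return name
    raise ValueError("value does not fit in float64")


def min_complex_dtype(value):
    """Apply the float rule to .real and .imag and keep the larger type."""
    order = [name for name, _ in FLOAT_MAX]
    parts = [min_float_dtype(value.real), min_float_dtype(value.imag)]
    widest = max(parts, key=order.index)
    return COMPLEX_FOR_FLOAT[widest]


print(min_float_dtype(1.0))          # float16
print(min_float_dtype(70000.0))      # float32
print(min_float_dtype(1e39))         # float64
print(min_complex_dtype(1 + 1e39j))  # complex128
```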
> 
> ---
> 
> Ufuncs, as well as `result_type`, will use this liberally, which
> basically means finding the smallest type for each category and using
> that. Of course for floats we cannot do the actual cast until later,
> since initially we do not know if the cast will actually be
> performed.
> 
> This is only tricky for uint vs. int, because uint8(127) is a "small
> unsigned". I.e. with our current dtypes there is no strict type
> hierarchy: uint8(x) may or may not cast to int8, depending on x.
> 
> ---
> 
> We could think of doing:
> 
> arr, min_dtype = np.asarray_and_min_dtype(pyobject)
> 
> which could even fix the list example Nathaniel had. That would work
> if you implemented the dtype hierarchy.
> 
> This is where the `uint7` came from: a hypothetical `uint7` would fix
> the integer dtype hierarchy by representing the numbers `0-127`, which
> can be cast to both uint8 and int8.
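
The integer ambiguity can be seen in a small plain-Python sketch. It is a
toy model of "casting is possible if the output can hold the value", not
NumPy's code; `INT_RANGES`, `can_hold` and `narrowest_types` are names
invented for this example.

```python
# Toy model of the integer rule; not NumPy's implementation.
# Value ranges of the fixed-width signed and unsigned integer types.
INT_RANGES = {}
for bits in (8, 16, 32, 64):
    INT_RANGES[f"int{bits}"] = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    INT_RANGES[f"uint{bits}"] = (0, 2 ** bits - 1)


def can_hold(dtype_name, value):
    """True if `value` lies within the range of the named integer type."""
    low, high = INT_RANGES[dtype_name]
    return low <= value <= high


def narrowest_types(value):
    """Return the narrowest integer types that can hold `value`.

    Two names come back when the signed and unsigned type of the same
    width both fit, which is the case a `uint7` would describe at 8 bits.
    """
    for bits in (8, 16, 32, 64):
        fits = [n for n in (f"int{bits}", f"uint{bits}") if can_hold(n, value)]
        if fits:
            return fits
    return []


print(narrowest_types(127))  # ['int8', 'uint8']
print(narrowest_types(128))  # ['uint8']
print(narrowest_types(-1))   # ['int8']
```

Values from 0 to 127 are exactly those for which both 8-bit types are
returned, so no single existing dtype sits below both of them.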
> 
> Best,
> 
> Sebastian
> 
> 
> [0] Amendment for point 1:
> 
> There is one detail (bug?) here in the logic though, that I missed
> before. If a ufunc (or result_type) sees a mix of scalars and arrays,
> it will try to decide whether or not to use value based logic. Value
> based logic will be skipped if the scalars are in a higher category
> (based on the ones above) than the highest array, for optimization I
> assume.
> Plausibly, this could cause incorrect logic when the dtype signature
> of a ufunc is mixed:
>   float32, int8 -> float32
>   float32, int64 -> float64
> 
> This may choose the second loop unnecessarily. Or, for example, if we
> have a datetime64 in the inputs, there would be no way for value based
> casting to be used.
> 
> 
> 
> > Ralf
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion