[Numpy-discussion] Moving forward with value based casting
Sebastian Berg
sebastian at sipsolutions.net
Fri Jun 7 17:41:36 EDT 2019
On Fri, 2019-06-07 at 13:19 -0500, Sebastian Berg wrote:
> On Fri, 2019-06-07 at 07:18 +0200, Ralf Gommers wrote:
> >
> > On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs at pobox.com> wrote:
> > > My intuition is that what users actually want is for *native Python
> > > types* to be treated as having 'underspecified' dtypes, e.g. int is
> > > happy to coerce to int8/int32/int64/whatever, float is happy to
> > > coerce to float32/float64/whatever, but once you have a
> > > fully-specified numpy dtype, it should stay.
> >
> > Thanks Nathaniel, I think this expresses a possible solution better
> > than anything I've seen on this list before. An explicit
> > "underspecified types" concept could make casting understandable.
>
> Yes, there is one small additional annoyance (but maybe it is just
> that): 127 would have the 'underspecified' dtype `uint7` (it can be
> safely cast to both uint8 and int8).
>
> > > In any case, it would probably be helpful to start by just writing
> > > down the whole set of rules we have now, because I'm not sure anyone
> > > understands all the details...
> >
> > +1
>
> OK, let me try to sketch the details below:
>
> 0. "Scalars" means scalars or 0-D arrays here.
>
> 1. The logic below will only be used if we have a mix of arrays and
> scalars. If all are scalars, the logic is never used. (Plus one
> additional tricky case within ufuncs, which is more hypothetical [0])
>
And of course I just realized that, trying to be simple, I forgot an
important point there:
The logic in 2. is only used when there is a mix of scalars and arrays,
and the arrays are in the same or higher category. As an example:
np.array([1, 2, 3], dtype=np.uint8) + np.float64(12.)
will not demote the float64, because the scalar's category ("float") is
higher than the array's ("integer").
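
As a rough sketch of that check (pure Python, not NumPy's actual
implementation; the names `CATEGORY_ORDER` and `uses_value_based_logic`
are made up for illustration, and the "Others" category is left out):

```python
# Hypothetical model of the category check described above. This is not
# NumPy's implementation, only an illustration of the rule.
CATEGORY_ORDER = {"boolean": 0, "integer": 1, "float": 2, "complex": 3}


def uses_value_based_logic(array_categories, scalar_categories):
    """Return True if scalars may be demoted based on their value.

    This requires a mix of arrays and scalars, with the highest array
    category at least as high as the highest scalar category.
    """
    if not array_categories or not scalar_categories:
        # Not a mix of arrays and scalars: the logic is never used.
        return False
    highest_array = max(CATEGORY_ORDER[c] for c in array_categories)
    highest_scalar = max(CATEGORY_ORDER[c] for c in scalar_categories)
    return highest_array >= highest_scalar


# uint8 array + float64 scalar: the scalar is in a higher category,
# so it is not demoted.
print(uses_value_based_logic(["integer"], ["float"]))  # False
# float32 array + float64 scalar: same category, demotion is possible.
print(uses_value_based_logic(["float"], ["float"]))  # True
```
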
- Sebastian
> 2. Scalars will only be demoted within their category. The categories
> and casting rules within the category are as follows:
>
> Boolean:
> Casts safely to all (nothing surprising).
>
> Integers:
> Casting is possible if output can hold the value.
> This includes uint8(127) casting to an int8.
> (unsigned and signed integers are the same "category")
>
> Floats:
> Scalars can be demoted based on value, roughly this
> avoids overflows:
> float16: -65000 < value < 65000
> float32: -3.4e38 < value < 3.4e38
> float64: -1.7e308 < value < 1.7e308
> float128 (largest type, does not apply).
>
> Complex: Same logic as floats (applied to .real and .imag).
>
> Others: Anything else.
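
To make the integer and float rules above concrete, here is a minimal
pure-Python sketch. It only models the limits listed above; the names
`INT_RANGES`, `FLOAT_LIMITS`, `int_value_fits` and `smallest_float_for`
are hypothetical, only a few integer types are included, and inf/nan
are not treated specially.

```python
# Hypothetical sketch of value based demotion within a category.
# Inclusive value ranges for a few integer dtypes.
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "uint8": (0, 2**8 - 1),
    "int16": (-2**15, 2**15 - 1),
    "uint16": (0, 2**16 - 1),
}

# Rough limits as listed above, smallest type first. float128 is the
# largest type, so it has no entry and acts as the fallback.
FLOAT_LIMITS = [("float16", 65000.0), ("float32", 3.4e38), ("float64", 1.7e308)]


def int_value_fits(value, dtype_name):
    """Integers: casting is possible if the output can hold the value."""
    lo, hi = INT_RANGES[dtype_name]
    return lo <= value <= hi


def smallest_float_for(value):
    """Floats: pick the smallest type whose rough limit avoids overflow."""
    for name, limit in FLOAT_LIMITS:
        if -limit < value < limit:
            return name
    return "float128"


print(int_value_fits(127, "int8"))   # True: uint8(127) can become int8
print(int_value_fits(128, "int8"))   # False
print(smallest_float_for(12.0))      # float16
print(smallest_float_for(1e5))       # float32
```
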
>
> ---
>
> Ufuncs, as well as `result_type`, will use this liberally, which
> basically means finding the smallest type for each category and using
> that. Of course for floats we cannot do the actual cast until later,
> since initially we do not know if the cast will actually be
> performed.
>
> This is only tricky for uint vs. int, because uint8(127) is a "small
> unsigned". I.e. with our current dtypes there is no strict type
> hierarchy: uint8(x) may or may not cast to int8.
>
> ---
>
> We could think of doing:
>
> arr, min_dtype = np.asarray_and_min_dtype(pyobject)
>
> which could even fix the list example Nathaniel had. That would work
> if we had a strict dtype hierarchy.
>
> This is where the `uint7` came from: a hypothetical `uint7` would fix
> the integer dtype hierarchy by representing the numbers `0-127`, which
> can be cast to both uint8 and int8.
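
A toy version of such a hierarchy, restricted to 8-bit integers, might
look like the sketch below. Everything in it is hypothetical: `uint7`
does not exist in NumPy, and `SAFE_CASTS`, `min_int_dtype` and
`can_cast_safely` are names made up for illustration.

```python
# Hypothetical integer hierarchy including the made-up `uint7`.
# Each key maps to the set of dtypes it can be safely cast to.
SAFE_CASTS = {
    "uint7": {"uint7", "uint8", "int8"},
    "uint8": {"uint8"},
    "int8": {"int8"},
}


def min_int_dtype(value):
    """Return the smallest dtype in this toy hierarchy holding `value`."""
    if 0 <= value <= 127:
        return "uint7"
    if 128 <= value <= 255:
        return "uint8"
    if -128 <= value < 0:
        return "int8"
    raise ValueError("value outside the 8-bit range covered by this sketch")


def can_cast_safely(value, target):
    """Decide the cast from the value's minimal dtype, not the value itself."""
    return target in SAFE_CASTS[min_int_dtype(value)]


print(min_int_dtype(127))            # uint7
print(can_cast_safely(127, "int8"))  # True
print(can_cast_safely(200, "int8"))  # False
```

With `uint7` in place, the decision depends only on the minimal dtype,
so the "may or may not cast" ambiguity of uint8(x) disappears in this
toy model.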
>
> Best,
>
> Sebastian
>
>
> [0] Amendment for point 1:
>
> There is one detail (bug?) here in the logic though, that I missed
> before. If a ufunc (or result_type) sees a mix of scalars and arrays,
> it will try to decide whether or not to use value based logic. Value
> based logic will be skipped if the scalars are in a higher category
> (based on the ones above) than the highest array (for optimization, I
> assume).
> Plausibly, this could cause incorrect logic when the dtype signature of
> a ufunc is mixed:
> float32, int8 -> float32
> float32, int64 -> float64
>
> Here the second loop may be chosen unnecessarily. Or, for example, if
> we have a datetime64 in the inputs, there would be no way for value
> based casting to be used.
>
>
>
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>