Re: [Numpy-discussion] Moving forward with value based casting

June 7, 2019

      On Fri, 2019-06-07 at 07:18 +0200, Ralf Gommers wrote:
...
On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs@pobox.com> wrote:
...
My intuition is that what users actually want is for *native Python
types* to be treated as having 'underspecified' dtypes, e.g. int is
happy to coerce to int8/int32/int64/whatever, float is happy to coerce
to float32/float64/whatever, but once you have a fully-specified numpy
dtype, it should stay.
Thanks Nathaniel, I think this expresses a possible solution better
than anything I've seen on this list before. An explicit
"underspecified types" concept could make casting understandable.
Yes, there is one small additional annoyance (but maybe it is just
that): 127 would have the 'underspecified' dtype `uint7`, since it can
be safely cast to both uint8 and int8.
...
...
In any case, it would probably be helpful to start by just writing
down the whole set of rules we have now, because I'm not sure anyone
understands all the details...
+1
OK, let me try to sketch the details below:

0. "Scalars" means scalars or 0-D arrays here.

1. The logic below will only be used if we have a mix of arrays and
scalars. If all are scalars, the logic is never used. (Plus one
additional tricky case within ufuncs, which is more hypothetical [0])

2. Scalars will only be demoted within their category. The categories
and casting rules within the category are as follows:

Boolean:
    Casts safely to all (nothing surprising).

Integers:
    Casting is possible if output can hold the value.
    This includes uint8(127) casting to an int8.
    (unsigned and signed integers are the same "category")

Floats:
    Scalars can be demoted based on value; roughly, these ranges
    avoid overflows:
        float16:     -65000 < value < 65000
        float32:    -3.4e38 < value < 3.4e38
        float64:   -1.7e308 < value < 1.7e308
        float128 (largest type, does not apply).

Complex: Same logic as floats (applied to .real and .imag).

Others: Anything else.
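
To make the per-category rules above concrete, here is a rough pure-Python
sketch. It does not call NumPy. The function name `min_dtype_name`, the
lookup tables, and the exact float limits (the largest finite half and single
precision values, where the list above only gives rounded numbers) are
assumptions made for illustration. Returning the unsigned type for
non-negative integers and float16 for inf/nan are also choices of this sketch.

```python
import math

# Hypothetical model of value-based demotion within a category.
# Not NumPy code; names and tables are illustrative assumptions.
UNSIGNED = [("uint8", 2**8 - 1), ("uint16", 2**16 - 1),
            ("uint32", 2**32 - 1), ("uint64", 2**64 - 1)]
SIGNED = [("int8", -2**7), ("int16", -2**15),
          ("int32", -2**31), ("int64", -2**63)]
FLOATS = [("float16", 65504.0), ("float32", (2 - 2**-23) * 2.0**127)]
FLOAT_ORDER = ["float16", "float32", "float64"]
COMPLEX_OF = {"float16": "complex64", "float32": "complex64",
              "float64": "complex128"}


def min_dtype_name(value):
    """Return the name of the smallest dtype in value's own category."""
    if isinstance(value, bool):  # bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        if value >= 0:
            # Non-negative: smallest unsigned type (127 would also fit int8).
            for name, high in UNSIGNED:
                if value <= high:
                    return name
        else:
            for name, low in SIGNED:
                if value >= low:
                    return name
        raise OverflowError("value does not fit any modeled integer type")
    if isinstance(value, float):
        if not math.isfinite(value):
            return "float16"  # assumption: inf/nan exist in every float type
        for name, limit in FLOATS:
            if abs(value) <= limit:
                return name
        return "float64"  # a Python float always fits a double
    if isinstance(value, complex):
        # Same logic as floats, applied to .real and .imag separately.
        parts = [min_dtype_name(value.real), min_dtype_name(value.imag)]
        return COMPLEX_OF[max(parts, key=FLOAT_ORDER.index)]
    raise TypeError("category 'Others': no value-based demotion modeled")
```

For example, `min_dtype_name(1e5)` gives "float32" because 1e5 would overflow
float16, while `min_dtype_name(-1)` stays within the integer category and
gives "int8".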

---

Ufuncs, as well as `result_type`, use this liberally, which basically
means finding the smallest type within each category and using that.
Of course, for floats we cannot do the actual cast until later, since
initially we do not know whether the cast will actually be performed.

This is only tricky for uint vs. int, because uint8(127) is a "small
unsigned". I.e. with our current dtypes there is no strict type
hierarchy: uint8(x) may or may not cast to int8, depending on x.

---

We could think of doing:

arr, min_dtype = np.asarray_and_min_dtype(pyobject)

which could even fix the list example Nathaniel had. That would work
if we had a strict dtype hierarchy.

This is where the `uint7` came from: a hypothetical `uint7` would fix
the integer dtype hierarchy by representing the numbers `0-127`, which
can be cast to both uint8 and int8.
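
The hierarchy problem can be sketched by describing each integer dtype only
by the range of values it holds. This is a toy model in plain Python, not
NumPy code. The names `RANGES`, `casts_safely` and `minimal_types` are
hypothetical, and `uint7` is of course not a real dtype.

```python
# Toy model: an integer dtype is just the (min, max) range it can hold.
RANGES = {
    "uint7": (0, 127),  # hypothetical, not a real NumPy dtype
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int16": (-32768, 32767),
}


def casts_safely(src, dst):
    """True if every value representable in src is representable in dst."""
    src_low, src_high = RANGES[src]
    dst_low, dst_high = RANGES[dst]
    return dst_low <= src_low and src_high <= dst_high


def minimal_types(value, names):
    """Types in names that hold value and have no smaller holder below them."""
    holders = [n for n in names if RANGES[n][0] <= value <= RANGES[n][1]]
    return [n for n in holders
            if not any(m != n and casts_safely(m, n) for m in holders)]


# Without uint7, 127 has two incomparable minimal types.
print(minimal_types(127, ["int8", "uint8", "int16"]))
# With uint7, there is a single minimal type that casts safely to both.
print(minimal_types(127, ["uint7", "int8", "uint8", "int16"]))
```

In this model neither int8 nor uint8 casts safely to the other, so 127 has no
unique smallest type until `uint7` is added underneath both.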

Best,

Sebastian

[0] Amendment for point 1:

There is one detail (bug?) here in the logic though, that I missed
before. If a ufunc (or result_type) sees a mix of scalars and arrays,
it will try to decide whether or not to use value based logic. Value
based logic will be skipped if the scalars are in a higher category
(based on the ones above) than the highest array, for optimization I
assume.
Plausibly, this could cause the wrong loop to be picked when the dtype
signatures of a ufunc are mixed:
  float32, int8 -> float32
  float32, int64 -> float64

Such a ufunc may choose the second loop unnecessarily. Or, for example,
if we have a datetime64 in the inputs, there would be no way for value
based casting to be used.
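
The shortcut described in this amendment can be sketched as a single
comparison. This is only a model of the description above, not the actual
NumPy implementation. The function name `uses_value_based_logic` and the
category ordering (taken from the order of the list in point 2) are
assumptions.

```python
# Assumed ordering, lowest to highest, following the list in point 2.
CATEGORY_ORDER = ["boolean", "integer", "float", "complex", "other"]


def uses_value_based_logic(array_categories, scalar_categories):
    """Model of the shortcut: decide whether scalars are inspected by value."""
    if not array_categories or not scalar_categories:
        # Point 1: the logic only applies to a mix of arrays and scalars.
        return False
    rank = CATEGORY_ORDER.index
    highest_array = max(rank(c) for c in array_categories)
    highest_scalar = max(rank(c) for c in scalar_categories)
    # Skipped when a scalar sits in a higher category than every array.
    return highest_scalar <= highest_array


# float32 array with an integer scalar: the scalar is inspected by value.
print(uses_value_based_logic(["float"], ["integer"]))
# int8 array with a float scalar: value-based logic is skipped.
print(uses_value_based_logic(["integer"], ["float"]))
```

Under this model a datetime64 scalar (category "other") next to an integer
array also skips the value-based path. How a datetime64 array input is
treated is not modeled here.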
...
Ralf

Sebastian Berg