
On Fri, 2019-06-07 at 07:18 +0200, Ralf Gommers wrote:
On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs@pobox.com> wrote:
My intuition is that what users actually want is for *native Python types* to be treated as having 'underspecified' dtypes, e.g. int is happy to coerce to int8/int32/int64/whatever, float is happy to coerce to float32/float64/whatever, but once you have a fully-specified numpy dtype, it should stay.
Thanks Nathaniel, I think this expresses a possible solution better than anything I've seen on this list before. An explicit "underspecified types" concept could make casting understandable.
Yes, there is one small additional annoyance (but maybe it is just that): 127 is effectively the 'underspecified' dtype `uint7` (it can be safely cast to both uint8 and int8).
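A minimal pure-Python sketch of the two ideas above. It assumes dtypes are represented by name strings and only covers a few integer dtypes. A Python int adopts the fully specified dtype of the array it meets, and the values 0 to 127 fit both int8 and uint8 (the hypothetical `uint7` case). The helpers `fits`, `coerce_python_int` and `eight_bit_targets` are illustrative names, not NumPy API. Raising `OverflowError` when the value does not fit is this sketch's own choice; the thread leaves that case open.

```python
# Toy model only (not NumPy's implementation); dtypes are name strings.
INT_BOUNDS = {
    "int8": (-2**7, 2**7 - 1),
    "uint8": (0, 2**8 - 1),
    "int16": (-2**15, 2**15 - 1),
    "uint16": (0, 2**16 - 1),
}


def fits(value, dtype):
    """True if the Python int `value` is representable in integer `dtype`."""
    lo, hi = INT_BOUNDS[dtype]
    return lo <= value <= hi


def coerce_python_int(value, array_dtype):
    """'Underspecified' Python int: adopt the array's fully specified dtype.

    Raising on overflow is an assumption of this sketch; the thread does
    not settle what should happen when the value does not fit.
    """
    if not fits(value, array_dtype):
        raise OverflowError(f"{value} does not fit in {array_dtype}")
    return array_dtype


def eight_bit_targets(value):
    """8-bit dtypes that can hold `value`; 0..127 fits both ('uint7')."""
    return [dt for dt in ("int8", "uint8") if fits(value, dt)]


print(coerce_python_int(100, "int8"))  # int8
print(eight_bit_targets(127))          # ['int8', 'uint8']
print(eight_bit_targets(200))          # ['uint8']
print(eight_bit_targets(-1))           # ['int8']
```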
In any case, it would probably be helpful to start by just writing down the whole set of rules we have now, because I'm not sure anyone understands all the details...
+1
OK, let me try to sketch the details below:

0. "Scalars" means scalars or 0-D arrays here.

1. The logic below is only used if we have a mix of arrays and scalars. If all inputs are scalars, the logic is never used. (Plus one additional tricky case within ufuncs, which is more hypothetical [0].)

2. Scalars will only be demoted within their category. The categories and the casting rules within each category are as follows:

   Boolean: Casts safely to all (nothing surprising).

   Integers: Casting is possible if the output can hold the value. This includes uint8(127) casting to int8. (Unsigned and signed integers are the same "category".)

   Floats: Scalars can be demoted based on value; roughly, this avoids overflows:
       float16:  -65000 < value < 65000
       float32:  -3.4e38 < value < 3.4e38
       float64:  -1.7e308 < value < 1.7e308
       float128: largest type, does not apply.

   Complex: Same logic as floats (applied to .real and .imag).

   Others: Anything else.

---

Ufuncs, as well as `result_type`, use this liberally, which basically means finding the smallest type for each category and using that. Of course, for floats we cannot do the actual cast until later, since initially we do not know whether the cast will actually be performed.

This is only tricky for uint vs. int, because uint8(127) is a "small unsigned". That is, with our current dtypes there is no strict type hierarchy: uint8(x) may or may not cast to int8.

---

We could think of doing:

    arr, min_dtype = np.asarray_and_min_dtype(pyobject)

which could even fix the list example Nathaniel had. That would only work if we also had a proper dtype hierarchy. This is where the `uint7` came from: a hypothetical `uint7` would fix the integer dtype hierarchy by representing the numbers `0-127`, which can be cast to both uint8 and int8.

Best,

Sebastian

[0] Amendment for point 1: There is one detail (bug?) in the logic here that I missed before. If a ufunc (or result_type) sees a mix of scalars and arrays, it will try to decide whether or not to use value based logic.
Value based logic will be skipped if the scalars are in a higher category (based on the ones above) than the highest array, I assume as an optimization. Plausibly, this could cause incorrect logic when the dtype signature of a ufunc is mixed:

    float32, int8  -> float32
    float32, int64 -> float64

may choose the second loop unnecessarily. Or, for example, if we have a datetime64 in the inputs, there would be no way for value based casting to be used.
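A rough pure-Python model of the rules above, for illustration only. It is not NumPy's implementation: dtypes are plain name strings, complex values are left out, and only finite floats are handled. `min_int_dtype` stands in for the hypothetical `np.asarray_and_min_dtype` applied to a list of Python ints, and its ladder includes the hypothetical `uint7`. `value_based_applies` encodes the footnote's skip condition, assuming "higher category" compares the highest scalar category against the highest array category. All function and constant names here are hypothetical.

```python
# Toy model of the value-based casting rules sketched above.
# Not NumPy's implementation; "uint7" is the hypothetical dtype from the thread.

# Category ranks: unsigned and signed integers share the "int" category.
CATEGORY = {"bool": 0, "int": 1, "float": 2, "complex": 3}

# Integer dtypes from smallest to largest, as (name, min, max).
INT_LADDER = [
    ("uint7", 0, 2**7 - 1),  # 0..127: casts safely to uint8 AND int8
    ("uint8", 0, 2**8 - 1),
    ("int8", -2**7, 2**7 - 1),
    ("uint16", 0, 2**16 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("uint32", 0, 2**32 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("uint64", 0, 2**64 - 1),
    ("int64", -2**63, 2**63 - 1),
]

# Rough float limits quoted in point 2 (finite values only).
FLOAT_LADDER = [("float16", 65000.0), ("float32", 3.4e38), ("float64", 1.7e308)]


def min_int_dtype(*values):
    """Smallest integer dtype in the ladder that holds every value."""
    lo, hi = min(values), max(values)
    for name, dt_min, dt_max in INT_LADDER:
        if dt_min <= lo and hi <= dt_max:
            return name
    # Sketch's own choice; real NumPy behaviour for huge ints differs.
    raise OverflowError("no integer dtype in this sketch holds these values")


def min_scalar_dtype(value):
    """Smallest dtype *within the scalar's own category* (point 2)."""
    if isinstance(value, bool):  # check first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return min_int_dtype(value)
    if isinstance(value, float):
        for name, limit in FLOAT_LADDER:
            if -limit < value < limit:
                return name
        return "float128"  # largest type; the value rule does not apply
    raise TypeError("complex and other types are left out of this sketch")


def value_based_applies(array_categories, scalar_categories):
    """Footnote [0]: value-based logic is skipped when the scalars are in
    a higher category than the highest-category array."""
    highest_array = max(CATEGORY[c] for c in array_categories)
    highest_scalar = max(CATEGORY[c] for c in scalar_categories)
    return highest_scalar <= highest_array


print(min_scalar_dtype(127), min_scalar_dtype(-1), min_scalar_dtype(1e10))
# uint7 int8 float32
print(min_int_dtype(127, -1), min_int_dtype(200, -1))  # int8 int16
print(value_based_applies(["int"], ["float"]))  # False
```

With `uint7` in the ladder, `min_int_dtype(127, 200)` gives uint8 and `min_int_dtype(127, -1)` gives int8, so a small positive value no longer forces a choice between the signed and unsigned type before the other operands are known.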
Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion