
On Wed, 2019-06-05 at 21:35 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> Tricky! It seems a balance between unexpected memory blow-up and unexpected wrapping (the latter mostly for integers).
> Some comments specifically on your message first, then some more general related ones.
> 1. I'm very much against letting `a + b` do anything else than `np.add(a, b)`.
Well, I tend to agree. But just to put it out there:

    [1] + [2] == [1, 2]
    np.add([1], [2]) == np.array([3])

So that is already far from true, since coercion has to occur. Of course it is true that:

    arr + something_else

will at some point force coercion of `something_else`, so that point is only half valid if either `a` or `b` is already a numpy array/scalar.
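Just to spell the coercion point out, a minimal sketch (plain NumPy, nothing else assumed):

    import numpy as np

    # Plain Python lists: + means concatenation, not addition.
    print([1] + [2])             # [1, 2]

    # np.add coerces both lists to arrays first, then adds elementwise.
    print(np.add([1], [2]))      # [3], i.e. np.array([3])

    # Once one operand is already an ndarray, + dispatches to np.add,
    # so the other operand gets coerced at that point.
    print(np.array([1]) + [2])   # [3]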
> 2. For python values, an argument for casting by value is that a python int can be arbitrarily long; the only reasonable course of action for those seems to make them float, and once you do that one might as well cast to whatever type can hold the value (at least approximately).
To be honest, the "arbitrarily long" thing is another issue, namely the silent conversion to `object` dtype, something that is also still on the list of things to sort out: maybe we should deprecate it. In other words, we would freeze Python int to one clear type; if you have an arbitrarily large int, you would need to use `object` dtype (or preferably a new `pyint/arbitrary_precision_int` dtype) explicitly.
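For illustration, the silent fallback in question looks like this (a small sketch; the exact behaviour may vary a bit between NumPy versions):

    import numpy as np

    # A Python int that fits a machine integer becomes a normal integer array.
    print(np.array(2**62).dtype)    # int64

    # An arbitrarily large Python int silently falls back to object dtype.
    print(np.array(2**100).dtype)   # object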
> 3. Not necessarily preferred, but for casting of scalars, one can get more consistent behaviour also by extending the casting by value to any array that has size=1.
That sounds just as horrible as the current mismatch to me, to be honest.
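For reference, a sketch of the current mismatch; the commented results assume the legacy (pre-NEP 50) value-based promotion rules that were in place at the time, where scalars and 0-d arrays are special but size-1 arrays are not (newer NumPy behaves differently for the 0-d case):

    import numpy as np

    a = np.array([1, 2], dtype=np.uint8)

    # Python scalars and 0-d arrays are promoted "by value",
    # so the small dtype survives:
    print((a + 1).dtype)              # uint8
    print((a + np.array(1)).dtype)    # uint8 (0-d array treated like a scalar)

    # ...but a 1-d array of size one is not, so its dtype wins the promotion:
    print((a + np.array([1])).dtype)  # int64 (the default integer dtype)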
> Overall, just on the narrow question, I'd be quite happy with your suggestion of using type information if available, i.e., only cast python values to a minimal dtype. If one uses numpy types, those mostly will have come from previous calculations with the same arrays, so things will work as expected. And in most memory-limited applications, one would do calculations in-place anyway (or, as Tyler noted, for power users one can assume awareness of memory and thus the incentive to tell explicitly what dtype is wanted - just `np.add(a, b, dtype=...)`, no need to create `out`).
> More generally, I guess what I don't like about the casting rules generally is that there is a presumption that if the value can be cast, the operation will generally succeed. For `np.add` and `np.subtract`, this perhaps is somewhat reasonable (though for unsigned a bit more dubious), but for `np.multiply` or `np.power` it is much less so. (Indeed, we had a long discussion about what to do with `int ** power` - now special-casing negative integer powers.) Changing this, however, probably really is a bridge too far!
Indeed that is right. But that is a different point. E.g. there is nothing wrong with `np.power` deciding that `int ** power` should always _promote_ (not cast) `int` to some larger integer type if one is available. The only place where we seriously have such logic right now is np.add.reduce (sum) and np.multiply.reduce (prod), which always use at least `long` precision (and actually upcast bool -> int, although np.add(True, True) does not; another difference from True + True...).
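For concreteness, a small sketch of both behaviours, plus the explicit `dtype=` override mentioned above; the commented results are what I would expect on a typical 64-bit build:

    import numpy as np

    a = np.array([1, 2, 3], dtype=np.int8)
    print(np.add(a, a).dtype)   # int8: the binary ufunc keeps the input dtype
    print(a.sum().dtype)        # int64: np.add.reduce uses at least `long`

    b = np.array([True, False, True])
    print(np.add(b, b).dtype)   # bool: np.add on booleans stays boolean
    print(b.sum())              # 2, an integer: the reduction upcasts bool -> int
    print(True + True)          # 2: plain Python, for comparison

    # The explicit override: ask the ufunc for a dtype directly, no `out` needed.
    print(np.add(a, a, dtype=np.int32).dtype)   # int32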
> Finally, somewhat related: I think the largest confusion actually results from the `uint64 + int64 -> float64` casting. Should this cast to int64 instead?
Not sure, but yes, it is the other quirk in our casting that should be discussed...

- Sebastian
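To spell that quirk out for anyone following along, a minimal illustration: no integer dtype can hold both ranges, so the current rules fall back to float64.

    import numpy as np

    # uint64 and int64 have no common integer dtype, so promotion picks
    # float64 (which silently loses precision above 2**53).
    print(np.result_type(np.uint64, np.int64))   # float64

    a = np.array([2**63], dtype=np.uint64)
    b = np.array([-1], dtype=np.int64)
    print((a + b).dtype)                         # float64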
> All the best,
> Marten