[Numpy-discussion] Moving forward with value based casting

Thu Jun 6 11:43:44 EDT 2019

On Wed, 2019-06-05 at 21:35 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> Tricky! It seems a balance between unexpected memory blow-up and
> unexpected wrapping (the latter mostly for integers). 
> 
> Some comments specifically on your message first, then some more
> general related ones. 
> 
> 1. I'm very much against letting `a + b` do anything other than
> `np.add(a, b)`.

Well, I tend to agree. But just to put it out there:

[1] + [2] == [1, 2]
np.add([1], [2]) == [3]

So that is already far from true, since coercion has to occur. Of
course it is true that:

arr + something_else

will at some point force coercion of `something_else`, so that point is
only half valid if either `a` or `b` is already a numpy array/scalar.
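
A small sketch of that difference, assuming only that NumPy is importable
as `np` (the variable names are illustrative):

```python
import numpy as np

a, b = [1], [2]

# Plain Python lists: `+` concatenates.
list_sum = a + b              # [1, 2]

# np.add first coerces both lists to arrays, then adds elementwise.
ufunc_sum = np.add(a, b)      # array([3])

# Once one operand is already an ndarray, `+` coerces the other
# operand and behaves like np.add.
mixed_sum = np.array(a) + b   # array([3])

print(list_sum, ufunc_sum, mixed_sum)
```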

> 2. For python values, an argument for casting by value is that a
> python int can be arbitrarily long; the only reasonable course of
> action for those seems to be to make them float, and once you do that one
> might as well cast to whatever type can hold the value (at least
> approximately).

To be honest, the "arbitrarily long" thing is a separate issue, namely the
silent conversion to `object` dtype. That is also on the not-yet-done
list of things we should maybe deprecate.

In other words, we would freeze python int to one clear type; if you
have an arbitrarily large int, you would need to use `object` dtype (or
preferably a new `pyint/arbitrary_precision_int` dtype) explicitly.
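
To make the silent conversion concrete, here is a minimal sketch. The
values `2**40` and `2**100` are arbitrary choices for illustration, and the
exact integer dtype chosen for the small value may depend on the platform:

```python
import numpy as np

small = np.array(2**40)    # fits in a fixed-width integer dtype
huge = np.array(2**100)    # too large for any fixed-width integer dtype

print(small.dtype)         # some integer dtype
print(huge.dtype)          # object, chosen silently

# The explicit alternative: ask for object dtype up front.
explicit = np.array([2**100, 1], dtype=object)
total = explicit.sum()     # uses Python int arithmetic, so no overflow
print(total)
```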

> 3. Not necessarily preferred, but for casting of scalars, one can get
> more consistent behaviour also by extending the casting by value to
> any array that has size=1.
> 

That sounds just as horrible as the current mismatch to me, to be
honest.
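
For reference, the mismatch in question can be probed with
`np.result_type`. This is only a sketch: the result of the scalar case
depends on the NumPy version (under value-based casting the scalar's value
is inspected, so it would not force an upcast), so no output is claimed
for it here.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int8)
scalar = np.int64(1)                       # NumPy scalar
one_elem = np.array([1], dtype=np.int64)   # size-1 array, same value and dtype

# Scalar operand: result depends on whether value-based casting applies.
with_scalar = np.result_type(arr, scalar)

# Array operand (ndim >= 1): only the dtypes matter, not the value.
with_array = np.result_type(arr, one_elem)

print(np.__version__, with_scalar, with_array)
```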

> Overall, just on the narrow question, I'd be quite happy with your
> suggestion of using type information if available, i.e., only cast
> python values to a minimal dtype. If one uses numpy types, those
> mostly will have come from previous calculations with the same
> arrays, so things will work as expected. And in most memory-limited
> applications, one would do calculations in-place anyway (or, as Tyler
> noted, for power users one can assume awareness of memory and thus
> the incentive to tell explicitly what dtype is wanted - just
> `np.add(a, b, dtype=...)`, no need to create `out`).
> 
> More generally, I guess what I don't like about the casting rules
> generally is that there is a presumption that if the value can be
> cast, the operation will generally succeed. For `np.add` and
> `np.subtract`, this perhaps is somewhat reasonable (though for
> unsigned a bit more dubious), but for `np.multiply` or `np.power` it
> is much less so. (Indeed, we had a long discussion about what to do
> with `int ** power` - now special-casing negative integer powers.)
> Changing this, however, probably really is a bridge too far!

Indeed, that is right, but it is a separate point. For example, there
would be nothing wrong with `np.power` deciding that `int**power` should
always _promote_ (not cast) `int` to some larger integer type, if one is
available.
The only place where we seriously have such logic right now is
np.add.reduce (sum) and np.multiply.reduce (prod), which always use at
least `long` precision (and actually upcast bool->int, although
np.add(True, True) does not; yet another difference from True + True...).
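
A quick sketch of the reduction behaviour described above. The array
contents are arbitrary, and the exact integer width of the reduced results
is platform dependent:

```python
import numpy as np

flags = np.array([True, True])
small = np.ones(3, dtype=np.int8)

python_sum = True + True                 # Python int: 2
ufunc_sum = np.add(True, True)           # stays boolean (True)

reduced = np.add.reduce(flags)           # bool is upcast to an integer: 2
reduced_small = np.add.reduce(small)     # int8 is upcast to at least `long`
prod_small = np.multiply.reduce(small)   # same upcast for prod

print(python_sum, ufunc_sum, reduced)
print(small.dtype, reduced_small.dtype, prod_small.dtype)
```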

> 
> Finally, somewhat related: I think the largest confusion actually
> results from the `uint64+int64 -> float64` casting.  Should this cast
> to int64 instead?

Not sure, but yes, it is the other quirk in our casting that should be
discussed….
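
A minimal sketch of that quirk, using arrays so that only the dtypes are
involved. The value `2**63 + 1` is chosen for illustration because it
cannot be represented exactly in float64:

```python
import numpy as np

promoted = np.result_type(np.uint64, np.int64)
print(promoted)                      # float64

u = np.array([2**63 + 1], dtype=np.uint64)
i = np.array([1], dtype=np.int64)

result = u + i                       # computed in float64
exact = (2**63 + 1) + 1              # exact answer with Python ints

print(result.dtype, int(result[0]), exact)
```

The float64 result silently loses the low bits, which is the kind of
surprise the question is about.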

- Sebastian

> 
> All the best,
> 
> Marten
> 