Hi Sebastian,

Tricky! It seems to be a balance between unexpected memory blow-up and unexpected wrapping (the latter mostly for integers).

Some comments specifically on your message first, then some more general related ones.

1. I'm very much against letting `a + b` do anything other than `np.add(a, b)`.
2. For python values, an argument for casting by value is that a python int can be arbitrarily long; the only reasonable course of action for those seems to be to make them float, and once one does that, one might as well cast to whatever type can hold the value (at least approximately). (A small sketch of this follows after the list.)
3. Not necessarily preferred, but for casting of scalars, one could also get more consistent behaviour by extending casting by value to any array that has size=1.
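
To make the second point concrete, here is a small sketch of the value-based casting of python ints under the current rules (the result dtypes are what I'd expect from a recent NumPy; they may of course change if the rules do):

```python
import numpy as np

a = np.arange(3, dtype=np.uint8)

# A small python int is cast "by value": 1 fits in uint8,
# so the result stays uint8.
print((a + 1).dtype)      # uint8

# 1000 does not fit in uint8, so the minimal dtype that can hold it
# (uint16) is used and the result is promoted accordingly.
print((a + 1000).dtype)   # uint16

# An arbitrarily long python int fits no integer dtype at all;
# the argument above is that float is then the only reasonable target.
# (e.g. a + 10**100)
```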

Overall, just on the narrow question, I'd be quite happy with your suggestion of using type information if available, i.e., only cast python values to a minimal dtype. If one uses numpy types, those will mostly have come from previous calculations with the same arrays, so things will work as expected. And in most memory-limited applications, one would do the calculations in-place anyway (or, as Tyler noted, for power users one can assume awareness of memory and thus the incentive to state explicitly what dtype is wanted - just `np.add(a, b, dtype=...)`, no need to create `out`).
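
I.e., something along these lines (just a sketch; the arrays are made up, the point is only that the ufunc call itself gives enough control):

```python
import numpy as np

a = np.ones(1_000_000, dtype=np.float32)
b = np.ones(1_000_000, dtype=np.float64)

# Default promotion would give float64 and double the memory use;
# a power user who cares can simply ask for the dtype on the ufunc call,
c = np.add(a, b, dtype=np.float32)

# instead of having to allocate and pass an output array,
out = np.empty_like(a)
np.add(a, b, out=out)

# or just do the calculation in-place.
a += b
```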

More generally, I guess what I don't like about the casting rules is the presumption that if the value can be cast, the operation will generally succeed. For `np.add` and `np.subtract`, this is perhaps somewhat reasonable (though a bit more dubious for unsigned), but for `np.multiply` or `np.power` it is much less so. (Indeed, we had a long discussion about what to do with `int ** power` - we now special-case negative integer powers.) Changing this, however, probably really is a bridge too far!
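
For instance (with current behaviour as I understand it; the wrapped values are just what int8 arithmetic gives):

```python
import numpy as np

a = np.array([100, 120], dtype=np.int8)

# The python int 2 casts to int8 without any problem, but the
# multiplication itself overflows and silently wraps around.
print(a * 2)                 # [-56 -16]

# Negative integer powers are the special case mentioned above:
# they now raise rather than produce nonsense.
try:
    np.arange(1, 4) ** -1
except ValueError as exc:
    print(exc)               # Integers to negative integer powers are not allowed.
```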

Finally, somewhat related: I think the largest confusion actually results from the `uint64 + int64 -> float64` casting. Should this cast to int64 instead?
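
That is, currently one gets (a quick check, so possibly dependent on the NumPy version at hand):

```python
import numpy as np

a = np.array([1], dtype=np.uint64)
b = np.array([1], dtype=np.int64)

# No integer dtype can represent both uint64 and int64 exactly, so
# the result is promoted to float64 - losing precision for large values.
print((a + b).dtype)   # float64
```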

All the best,

Marten