Re: NEP draft for the future behaviour of scalar promotion

On Tue, 22 Feb 2022, at 1:01 AM, Stefan van der Walt wrote:
> it is easier to explain away `x + 1` behaving oddly over `x[0] + 1` behaving oddly

Is it? I find the two equivalent, honestly.

> given that we pretend like NumPy scalars do not exist.
> This is the leaky abstraction that I think should be plugged in this revamp.

Yup. I would be in favour of such a repr change. (And to be clear, it is *only* a repr change, not a behaviour change!) I have indeed run across this a few times, e.g. trying to encode a single value in json only to find that it was a NumPy int64 rather than an int.
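[For readers who have not hit this: a minimal sketch of the json failure mode described above. The array contents and dict key are made up for illustration.]

```python
import json

import numpy as np

x = np.array([1, 2, 3], dtype=np.int64)

# x[0] is a NumPy scalar, not a Python int, even though it prints as "1".
value = x[0]

try:
    json.dumps({"count": value})
except TypeError as exc:
    # The stdlib encoder rejects np.int64 because it is not a subclass
    # of Python int on Python 3.
    print(exc)

# Converting to a plain Python int first makes it serializable.
print(json.dumps({"count": int(value)}))
```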
Is it really much more common than arithmetic combining arrays and literals? I'd say it's much *less* common, especially in "idiomatic" NumPy which tries to avoid Python looping over elements.
> now? It becomes: x[0] + np.int64(1).
I would write it as x[0].astype(np.int64) + 1, and indeed I think I would find that less confusing, reading the code years later, because it would allow me to not even have to think about type promotion.
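[A sketch of the explicit-cast style described above; the array contents are made up. The point is that the explicit cast pins the result dtype regardless of which promotion rules are in effect.]

```python
import numpy as np

x = np.array([250, 251, 252], dtype=np.uint8)

# Implicit promotion: the result dtype of x[0] + 1 depends on the
# promotion rules in effect, which is exactly what is hard to read
# back years later.

# Explicit cast first: the result is int64 under both the old
# value-based rules and the proposed dtype-only rules, so there is
# nothing to think about.
result = x[0].astype(np.int64) + 1
print(result, result.dtype)  # 251 int64
```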
> The reason we had value inspection was that it gave us a cushy "best of both worlds"; when going with dtype-only casting, you have to give something up.
Yes yes, we agree we are giving something up, we merely disagree about what is better to give up long term for our community. For me, the attractiveness of unified scalar and array semantics, together with unified type promotion, beats the attractiveness of hiding overflow from users, especially since the hiding can only ever be patchy.*

I 100% agree with you that it is a tradeoff. But, imho, one worth making.

* e.g. the same user might initially be happy about the result of x[0] + 1 matching their infinite-precision expectation, but then be surprised by:

    x[0] + 1    -> 256
    y[0] = 1
    x[0] + y[0] -> 0  # WTH

Juan
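[The footnote's surprise can be reproduced as a short sketch with concrete, made-up values. Only the all-NumPy cases are asserted, since x[0] + 1 with a bare Python int is exactly the case whose result depends on which promotion rules are in effect.]

```python
import warnings

import numpy as np

x = np.array([255], dtype=np.uint8)
y = np.array([1], dtype=np.uint8)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # scalar overflow warns

    # Pure-NumPy operands: uint8 + uint8 stays uint8 and wraps around,
    # under both the old and the proposed rules.
    assert x[0] + y[0] == 0

    # The array case also wraps: the Python int 1 fits in uint8.
    assert (x + 1)[0] == 0

    # The contested case: under value-based promotion x[0] + 1 could be
    # upcast to match the infinite-precision result (256), while under
    # dtype-only promotion it wraps to 0 like the cases above.
    print(x[0] + 1)
```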

I'll go even further: I would say a common situation where people use syntax like x[0] + 1 is in sanity checks/tests. In which case, it's *very bad* to have different behaviour between x[0] + 1 (e.g. when checking code) and x + 1 (e.g. in production code).

On Tue, 2022-02-22 at 01:43 -0600, Juan Nunez-Iglesias wrote:
I think there are a few use-cases for this (one that comes to mind is integration, where the integration function is sometimes called on scalar values). Especially if you look to new users, who may be using scalars for lack of experience writing vectorized code. But mainly, I think it is the sneakiest backcompat break...

The one "middle ground" possibility I see here is that we could limit the weak logic to Python operators in principle (I know this seems unpopular). The main arguments are:

* It seems somewhat straightforward to explain that `np.add(x, 1)` behaves more like `np.add(x, np.asarray(1))`.
* We can give warnings for operators: at least integer overflows will give a warning, notifying users of a potential problem.
* The long notation `np.add(x, np.uint8(1))` isn't so bad if you don't have operators (or `dtype=x.dtype`).

(I may well be missing a reason for why this doesn't add up at all.)

Unfortunately, there will always be strange cases. No matter what we do, it will not always be clear if a library function calls `np.asarray()` on the input first, or first uses the input directly. I do not think that `asarray` should drag around the information that it was "weak" as JAX at least can (to me this seems prone to errors, and unlike JAX our arrays are not immutable). So if you want "weak" logic for function input you need to take care to handle it before calling `np.asarray()`.

Cheers,

Sebastian

participants (2): Juan Nunez-Iglesias, Sebastian Berg