[Numpy-discussion] Value based promotion and user DTypes
Ralf Gommers
ralf.gommers at gmail.com
Tue Jan 26 00:11:46 EST 2021
On Tue, Jan 26, 2021 at 2:01 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:
> Hi all,
>
> does anyone have a thought about how user DTypes (i.e. DTypes not
> currently part of NumPy) should interact with the "value based
> promotion" logic we currently have?
> For now I can just do anything, and we will find out later. And I will
> have to do something for now, basically with the hope that it all turns
> out all right.
>
> But there are multiple options for both what to offer to user DTypes
> and where we want to move (I am using `bfloat16` as a potential DType
> here).
>
> 1. The "weak" dtype option (this is what JAX does), where:
>
> np.array([1], dtype=bfloat16) + 4.
>
> returns a bfloat16, because 4. is "lower" than all floating
> point types.
> In this scheme the user defined `bfloat16` knows that the input
> is a Python float, but it does not know its value (if an
> overflow occurs during conversion, it could warn or error but
> not upcast). For example `np.array([1], dtype=uint4) + 2**5`
> will try `uint4(2**5)` assuming it works.
> NumPy is different: `2.**300` would ensure the result is a `float64`.
>
> If a DType does not make use of this, it would get the behaviour
> of option 2.
>
> 2. The "default" DType option: np.array([1], dtype=bfloat16) + 4. is
> always the same as `bfloat16 + float64 -> float64`.
>
> 3. Use whatever NumPy considers the "smallest appropriate dtype".
> This will not always work correctly for unsigned integers, and for
> floats this would be float16, which doesn't help with bfloat16.
>
> 4. Try to expose the actual value. (I do not want to do this, but it
> is probably a plausible extension with most other options, since
> the other options can be the "default".)
>
>
> Within these options, there is one more difficulty. NumPy currently
> applies the same logic for:
>
> np.array([1], dtype=bfloat16) + np.array(4., dtype=np.float64)
>
> which in my opinion is wrong (the second array is typed). We do have
> the same issue with deciding what to do in the future for NumPy itself.
> Right now I feel that new (user) DTypes should live in the future
> (whatever that future is).
>
I agree, and I have a preference for option 1. Option 2 is too greedy in
upcasting; value-based casting is problematic in multiple ways (e.g., it is
hard for Numba because the output dtype cannot be predicted from the input
dtypes); and it is hard to see a rationale for option 4 (maybe so the user
dtype itself can implement option 3?).
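To make the contrast between options 1 and 2 concrete, here is a minimal
sketch. It is a toy model, not NumPy's or JAX's actual implementation:
`option1_weak` and `option2_default` are hypothetical helper names, `float32`
stands in for a user DType such as `bfloat16`, and the fallback for a
"higher kind" scalar (e.g. a Python float combined with an integer array) is a
simplification of mine. Both helpers look only at the Python type of the
scalar, never at its value, so the result dtype is predictable from the input
types alone.

```python
import numpy as np

# Rank of each dtype kind in this toy model: bool < integer < floating < complex.
KIND_RANK = {"b": 0, "i": 1, "u": 1, "f": 2, "c": 3}

# Python scalar type -> (rank, dtype assumed as its "default" in this sketch).
PY_SCALARS = {
    bool: (0, np.dtype(np.bool_)),
    int: (1, np.dtype(np.int64)),
    float: (2, np.dtype(np.float64)),
    complex: (3, np.dtype(np.complex128)),
}


def option2_default(array_dtype, scalar):
    """Option 2: treat the Python scalar as its default dtype, then promote."""
    _, default = PY_SCALARS[type(scalar)]
    return np.result_type(np.dtype(array_dtype), default)


def option1_weak(array_dtype, scalar):
    """Option 1: the Python scalar is "weak" and adopts the array dtype."""
    array_dtype = np.dtype(array_dtype)
    rank, _ = PY_SCALARS[type(scalar)]
    if rank <= KIND_RANK[array_dtype.kind]:
        return array_dtype
    # The scalar is of a higher kind than the array: fall back to option 2.
    return option2_default(array_dtype, scalar)


print(option1_weak(np.float32, 4.0))     # float32
print(option2_default(np.float32, 4.0))  # float64
print(option1_weak(np.int8, 1))          # int8
```

Neither helper ever inspects the value, which is the property that matters for
tools like Numba.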
> I have said previously that we could distinguish this for universal
> functions. But calls like `np.asarray(4.)` are common, and they would
> lose the information that `4.` was originally a Python float.
>
Hopefully the future will have way fewer asarray calls in it. Rejecting
scalar input to functions would be nice. This is what most other
array/tensor libraries do.
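As a small illustration of the point quoted above: once a Python float has
passed through `np.asarray`, nothing distinguishes it from an explicitly typed
0-D `float64` array. The `require_array` helper is hypothetical, just a sketch
of what "rejecting scalar input" could look like in user code.

```python
import numpy as np

converted = np.asarray(4.0)               # Python float -> 0-D array
typed = np.array(4.0, dtype=np.float64)   # explicitly typed 0-D array

# Same dtype and same shape: the "was a Python float" information is gone.
print(converted.dtype == typed.dtype, converted.ndim, typed.ndim)


def require_array(x):
    """Hypothetical strict entry point: accept arrays only, convert nothing."""
    if not isinstance(x, np.ndarray):
        raise TypeError(f"expected an ndarray, got {type(x).__name__}")
    return x
```

With such a check, `require_array(4.0)` raises a `TypeError` instead of
silently producing a typed array.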
>
> So, recently, I was considering that a better option may be to limit
> this to math Python operators: +, -, /, **, ...
>
+1
This discussion may be relevant:
https://github.com/data-apis/array-api/issues/14.
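A rough sketch of what "weak Python scalars in operators only" could mean,
written as a thin wrapper so that it does not depend on NumPy's own promotion
rules. `WeakOpArray` is a hypothetical class for illustration only; it handles
just `+`, and only the case where the Python scalar's kind fits the array's
kind.

```python
import numpy as np


class WeakOpArray:
    """Toy wrapper: Python scalars adopt the array dtype in `+` only."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def _coerce(self, other):
        kind = self.data.dtype.kind
        is_int = isinstance(other, int) and not isinstance(other, bool)
        is_float = isinstance(other, float)
        if (is_int and kind in "iufc") or (is_float and kind in "fc"):
            # Convert using the array dtype. Out-of-range values are left to
            # the conversion itself; they never trigger an upcast.
            return np.asarray(other, dtype=self.data.dtype)
        return other  # anything else is left to NumPy's normal rules

    def __add__(self, other):
        if isinstance(other, WeakOpArray):
            other = other.data
        return WeakOpArray(np.add(self.data, self._coerce(other)))


int8_arr = WeakOpArray(np.array([1, 2, 3], dtype=np.int8))
f32_arr = WeakOpArray(np.array([1.0], dtype=np.float32))

print((int8_arr + 1).data.dtype)   # int8
print((f32_arr + 4.0).data.dtype)  # float32
```

A direct call such as `np.add(int8_arr.data, 1)` bypasses the wrapper and is
left to whatever NumPy itself does, which is exactly the operator vs. ufunc
difference under discussion.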
> Those are the places where it may make a difference to write:
>
> arr + 4. vs. arr + bfloat16(4.)
> int8_arr + 1 vs. int8_arr + np.int8(1)
> arr += 4. (in-place may be the most significant use-case)
>
> while:
>
> np.add(int8_arr, 1) vs. np.add(int8_arr, np.int8(1))
>
> is maybe less significant. On the other hand, it would add a subtle
> difference between operators vs. direct ufunc calls...
>
>
> In general, it may not matter: We can choose option 1 (which the
> bfloat16 does not have to use), and modify it if we ever change the
> logic in NumPy itself. Basically, I will probably pick option 1 for
> now and press on, and we can reconsider later. And hope that it does
> not make things even more complicated than they are now.
>
> Or maybe better just limit it completely to always use the default for
> user DTypes?
>
I'm not sure I understand why you like option 1 but want to give
user-defined dtypes the choice of opting out of it. Upcasting will rarely
make sense for user-defined dtypes anyway.
>
> But I would be interested if the "limit to Python operators" is
> something we should aim for here. This does make a small difference,
> because user DTypes could "live" in the future if we have an idea of
> what that future may look like.
>
A future with:
- no array scalars
- 0-D arrays have the same casting rules as >=1-D arrays
- no value-based casting
would be quite nice, with "same kind" casting rules like those in
https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html.
Mixed-kind casting isn't specified there, because it's too different
between libraries. The JAX design (
https://jax.readthedocs.io/en/latest/type_promotion.html) seems sensible
there.
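For reference, promotion that depends only on dtypes (never on values or on
dimensionality) is what `np.promote_types` already provides, and for the pairs
below it matches, as far as I can tell, the tables in the array API link above.
A short sketch; the specific pairs are just examples I picked:

```python
import numpy as np

pairs = [
    (np.int8, np.int16),
    (np.int8, np.uint8),
    (np.float32, np.float64),
]

# promote_types takes two dtypes and nothing else, so the result cannot
# depend on any value stored in an array or scalar.
promoted = [np.promote_types(a, b) for a, b in pairs]

for (a, b), result in zip(pairs, promoted):
    print(np.dtype(a).name, "+", np.dtype(b).name, "->", result.name)
# int8 + int16 -> int16
# int8 + uint8 -> int16
# float32 + float64 -> float64
```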
Cheers,
Ralf