
On Wed, 2021-01-27 at 18:16 +0100, Ralf Gommers wrote:
On Wed, Jan 27, 2021 at 5:44 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Wed, 2021-01-27 at 10:33 +0100, Ralf Gommers wrote:
On Tue, Jan 26, 2021 at 10:21 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
<snip>
Thanks for all the other comments, they are helpful. I am considering writing a (hopefully short) NEP, to define the direction of thinking here (and clarify what user DTypes can expect). I don't like doing that, but the issue turns out to have a lot of traps and confusing points. (Our current logic alone is confusing enough...)
Sounds good, thanks.
The other tricky example I have was:
The following becomes problematic (order does not matter):

    uint24 + int16 + uint32 -> int64
    <== (uint24 + int16) + uint32 -> int64
    <== int32 + uint32 -> int64
With the additional rule `uint24 + int32 -> int48` defined, the first expression could be expected to return `int48`, but actually getting there is tricky (and my current code will not).
If the promotion result of a user DType with a builtin one can itself be a builtin one, then "amending" the promotion table with rules like `uint24 + int32 -> int48` can lead to slightly surprising promotion results. This happens when the result of promotion with another "category" (here the builtins) can be either a larger or a lower category.
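A toy model of the chain above may make it concrete (all names here, including `uint24` and `int48`, are hypothetical stand-ins, not real NumPy dtypes; the table simply encodes the rules stated in this thread):

```python
# Toy model: a user package "amends" the builtin promotion rules only for
# pairs involving its own dtypes (uint24, int48); the builtin pairs know
# nothing about int48 and promote among themselves as NumPy does today.
from functools import reduce
from itertools import permutations

TABLE = {
    frozenset({"uint24", "int16"}): "int32",
    frozenset({"uint24", "int32"}): "int48",   # the amended rule
    frozenset({"uint24", "uint32"}): "uint32",
    frozenset({"uint24", "int64"}): "int64",
    frozenset({"uint24", "int48"}): "int48",
    frozenset({"int16", "int32"}): "int32",
    frozenset({"int16", "uint32"}): "int64",   # builtin rule: int48 unknown
    frozenset({"int16", "int64"}): "int64",
    frozenset({"int16", "int48"}): "int48",
    frozenset({"int32", "uint32"}): "int64",   # builtin rule: int48 unknown
    frozenset({"int32", "int64"}): "int64",
    frozenset({"int32", "int48"}): "int48",
    frozenset({"uint32", "int64"}): "int64",
    frozenset({"uint32", "int48"}): "int48",
    frozenset({"int48", "int64"}): "int64",
}

def promote(a, b):
    """Binary promotion via the (symmetric) table."""
    return a if a == b else TABLE[frozenset({a, b})]

# Pairwise (binary) reduction gives int64 for every operand order...
for order in permutations(["uint24", "int16", "uint32"]):
    assert reduce(promote, order) == "int64"

# ...even though int48 can hold all three inputs, so an n-ary scheme
# that sees every dtype at once could have returned int48 instead.
```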
I'm not sure I follow this. If uint24 and int48 both come from the same third-party package, there is still a problem here?
Yes, at least unless you ask `uint24` to take over all of the work (i.e. pass in all DTypes at once). So with a binary-operator design it is "problematic" (in the sense that you have to live with the above result). Of course, a binary-operator base probably does not preclude a more complex design. I like a binary operator (it seems much easier to reason about and is a common design pattern). But it would be plausible to have an n-ary design where you pass all DTypes to each and ask them to handle it (similar to `__array_ufunc__`). We could even have both (the binary version for most things, plus the ability to hook into the n-ary "reduction").
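The binary-operator design might look roughly like the following sketch, which mirrors Python's binary-operator protocol with a `NotImplemented` fallback (the method name `__common_dtype__` and the classes here are illustrative, not a fixed API):

```python
# Sketch of a binary common-dtype protocol: ask the first operand, and
# if it defers (NotImplemented), ask the second, like Python's __add__/__radd__.
class DType:
    def __common_dtype__(self, other):
        return NotImplemented  # defer to the other operand by default

class Int32(DType):
    pass

class Int16(DType):
    pass

class UInt24(DType):
    # A user dtype that knows how to promote with a builtin category.
    def __common_dtype__(self, other):
        if isinstance(other, Int16):
            return Int32()  # uint24 + int16 -> int32
        return NotImplemented

def common_dtype(a, b):
    res = a.__common_dtype__(b)
    if res is NotImplemented:
        res = b.__common_dtype__(a)
    if res is NotImplemented:
        raise TypeError(f"no common dtype for {a!r} and {b!r}")
    return res

# Works regardless of which side the user dtype is on:
assert isinstance(common_dtype(Int16(), UInt24()), Int32)
assert isinstance(common_dtype(UInt24(), Int16()), Int32)
```

An n-ary hook would instead hand the full tuple of DTypes to one implementation (as `__array_ufunc__` does for operands), at the cost of each dtype having to reason about arbitrary mixtures.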
I'd say just document it and recommend that if >1 custom dtypes are used, then users who really care about the issue you bring up should determine the output dtype they want via some use of `result_type` and then cast explicitly.
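With builtin dtypes standing in for the custom ones (no real `uint24` exists to demonstrate with), that workaround might look like:

```python
import numpy as np

a = np.array([1, 2], dtype=np.int16)
b = np.array([3, 4], dtype=np.uint32)

# result_type is n-ary: it sees all inputs at once, so the outcome does
# not depend on the order in which a binary reduction would pair them up.
dt = np.result_type(a, b)          # int16 + uint32 -> int64
res = a.astype(dt) + b.astype(dt)  # cast explicitly, then operate
assert res.dtype == np.int64
```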
Right, this is a problem that keeps giving... Maybe it is a sign of how tricky units are, but similar things will also apply to other "families" of dtypes.

If you have units (which can be based on any other NumPy numerical type), you can break my scheme for working around the associativity issue in the same way: `Unit[int16] + uint16 + float16` has no clear hierarchy (Unit is the highest category, but `float16` dictates the precision).

So probably we just shouldn't care too much about this (for now), but if we want the above to return `Unit[float16]`, we need additional logic (beyond a binary operation) to do this reasonably...

I agree that these are all "insignificant" issues in many ways, since most users will never even notice the subtleties. So in some ways my meandering towards binary-op only is because it feels at least small enough in complexity that it hopefully doesn't make solutions for the above much more complicated.

Cheers,

Sebastian
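The two-level Unit promotion described above can be sketched as follows (the `Unit` wrapper and its rule are hypothetical; only `np.promote_types` is real NumPy API):

```python
# Hypothetical two-level rule: the Unit "category" always wins, while the
# numeric precision inside it follows NumPy's builtin binary rules.
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class Unit:
    base: np.dtype  # the numerical dtype the unit is based on

def promote(a, b):
    if isinstance(a, Unit) and isinstance(b, Unit):
        return Unit(np.promote_types(a.base, b.base))
    if isinstance(a, Unit):
        return Unit(np.promote_types(a.base, b))
    if isinstance(b, Unit):
        return Unit(np.promote_types(a, b.base))
    return np.promote_types(a, b)

# Pairwise reduction: Unit[int16] + uint16 -> Unit[int32],
# then Unit[int32] + float16 -> Unit[float64].
step1 = promote(Unit(np.dtype(np.int16)), np.dtype(np.uint16))
step2 = promote(step1, np.dtype(np.float16))
assert step2 == Unit(np.dtype(np.float64))
# Returning Unit[float16] instead would need logic that sees all three
# dtypes at once (an n-ary hook), not just a chain of binary steps.
```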
Cheers,
Ralf

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion