[Numpy-discussion] Fixing/implementing value based Casting

Fri Dec 11 06:09:28 EST 2020

On Wed, Dec 9, 2020 at 5:22 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> Sorry that this will be a bit complicated again :(. In brief:
>
> * I would like to pass around scalars in some (partially new) C-API
>   to implement value-based promotion.
> * There are some subtle commutativity issues with promotion.
>   Commutativity may change in that case (with respect to value-based
>   promotion, probably for the better in most cases). [0]
>
>
> In the past days, I have been looking into implementing value-based
> promotion the way I had done it for the prototype before.
> The idea was that NEP 42 allows DTypes to be created dynamically,
> which enables very powerful value-based promotion/casting.
>
> But I decided there are too many quirks with creating type instances
> dynamically (potentially very often) just to pass around one additional
> piece of information.
> That approach was far more powerful, but it is power and complexity
> that we do not require, given that:
>
> * Value based promotion is only used for a mix of scalars and arrays
>   (where "scalar" is annoyingly defined as 0-D at the moment)
> * I assume it is only relevant for `np.result_type` and promotion
>   in ufuncs (which often uses `np.result_type`).
>   `np.can_cast` has such behaviour, but I think it is easier to handle [1].
>   We could implement more powerful "value based" logic, but I doubt
>   it is worthwhile.
> * This is already stretching the Python C-API beyond its limits.
>
>
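For reference, here is a toy, pure-Python model of when the current (legacy) rules consult values at all. It is a reconstruction, not NumPy code. The `KIND_ORDER` table and the `uses_value_based` helper are hypothetical names. Each operand is reduced to a `(kind, ndim)` pair, where `ndim == 0` stands for a "scalar". The sketch assumes that values matter only when scalars are mixed with arrays, and only when the scalars are not of a "higher" kind than the arrays.

```python
# Toy model (hypothetical names, not NumPy's API) of when legacy promotion
# looks at scalar *values*.  Kinds are simplified to an ordering:
# bool < integers < inexact (float/complex) < anything else.
KIND_ORDER = {"b": 0, "i": 1, "u": 1, "f": 2, "c": 2}


def uses_value_based(operands):
    """Return True if scalar values would influence the result dtype.

    `operands` is a list of (kind, ndim) pairs; ndim == 0 marks a "scalar"
    (any 0-D array counts, which is the annoying part).
    """
    scalar_kinds = [KIND_ORDER.get(kind, 3) for kind, ndim in operands if ndim == 0]
    array_kinds = [KIND_ORDER.get(kind, 3) for kind, ndim in operands if ndim > 0]
    # Only a *mix* of scalars and arrays triggers value-based logic.
    if not scalar_kinds or not array_kinds:
        return False
    # Values are used only if the scalars are not of a "higher" kind.
    return max(array_kinds) >= max(scalar_kinds)


# A float array combined with an int scalar: the int's value matters.
print(uses_value_based([("f", 1), ("i", 0)]))  # True
# An int array combined with a float scalar: the float's kind wins, values are ignored.
print(uses_value_based([("i", 1), ("f", 0)]))  # False
```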
> So I suggest the following instead, which *must* modify some (poorly
> defined) current behaviour:
>
> 1. We always evaluate concrete DTypes first in promotion. This means
>    that in rare cases the non-commutativity of promotion may change
>    the result dtype:
>
>        np.result_type(-1, 2**16, np.float32)
>
>    The same can also happen when you reorder the normal dtypes:
>
>        np.result_type(np.int8, np.uint16, np.float32)
>        np.result_type(np.float32, np.int8, np.uint16)
>
>    In both cases, `np.float32` is moved to the front.
>
> 2. If we reorder the above operation, we can define that we never
>    promote two "scalar values". Instead, we convert both to a
>    concrete DType first. This makes it effectively like:
>
>        np.result_type(np.array(-1).dtype, np.array(2**16).dtype)
>
>    This means that we never have to deal with promoting two values.
>
> 3. We need additional private API (we were always going to need some
>    additional API); that API could become public:
>
>    * Convert a single value into a concrete dtype. You could say this
>      is the same as `self.common_dtype(None)`, but a dedicated function
>      seems simpler. A dtype like this will never use `common_dtype()`.
>    * `common_dtype_with_scalar(self, other, scalar)` (note that
>      only one of the DTypes can have a scalar).
>      As a fallback, this function can be implemented by converting
>      to the concrete DType and retrying with the normal `common_dtype`.
>
>    (At least the second slot must be made public if we are to allow
>    value-based promotion for user DTypes. I expect we will, but it is
>    not particularly important to me right now.)
>
> 4. Our public API (including new C-API) has to expose and take the
>    scalar values. That means promotion in ufuncs will get DTypes and
>    `scalar_values`, although those should normally be `NULL` (or None).
>
>    In a future Python API, this is probably acceptable:
>
>         np.result_type(*[t if v is None else v
>                          for t, v in zip(dtypes, scalar_values)])
>
>    In C, we need to expose a function below `result_type` which
>    accepts both the scalar values and DTypes explicitly.
>
> 5. For the future: As said many times, I would like to deprecate
>    using value based promotion for anything except Python core types.
>    That just seems wrong and confusing.
>

I agree with this. Value-based promotion was never a great idea, so let's
try to keep it as minimal as possible. I'm not even sure what kind of
value-based promotion for non-Python-builtin types is happening now (?).

>    My only problem is that while I can warn (possibly sometimes too
>    often) when behaviour will change, I do not have a good idea about
>    silencing that warning.
>

Do you see a real issue with this somewhere, or is it all just corner
cases? If it is only corner cases, no warning seems okay.
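The order dependence described in point 1 can be reproduced with a toy, pure-Python promotion table. `DTYPES` and `promote` are hypothetical stand-ins. They were written to agree with `np.promote_types` for the handful of types listed, but they are not NumPy's implementation. The sketch also assumes that multi-argument promotion is a strict left-to-right pairwise reduction.

```python
from functools import reduce

# Toy dtype table: name -> (kind, itemsize in bytes).
# "i" = signed int, "u" = unsigned int, "f" = float.  Hypothetical stand-in
# for NumPy's dtypes, restricted to types where the rules below hold.
DTYPES = {
    "int8": ("i", 1), "int16": ("i", 2), "int32": ("i", 4), "int64": ("i", 8),
    "uint8": ("u", 1), "uint16": ("u", 2), "uint32": ("u", 4),
    "float16": ("f", 2), "float32": ("f", 4), "float64": ("f", 8),
}


def promote(a, b):
    """Toy pairwise promotion, modelled on np.promote_types for DTYPES."""
    (ka, sa), (kb, sb) = DTYPES[a], DTYPES[b]
    if ka == kb:
        # Same kind: the larger type wins.
        return a if sa >= sb else b
    if "f" in (ka, kb):
        # Float with integer: smallest float that holds every integer value.
        float_size, int_size = (sa, sb) if ka == "f" else (sb, sa)
        needed = {1: 2, 2: 4, 4: 8, 8: 8}[int_size]
        return f"float{8 * max(float_size, needed)}"
    # Signed with unsigned: need a signed type wider than the unsigned one.
    signed_size, unsigned_size = (sa, sb) if ka == "i" else (sb, sa)
    if signed_size > unsigned_size:
        return f"int{8 * signed_size}"
    return f"int{16 * unsigned_size}"


def result_type(*dtypes):
    """Left-to-right pairwise reduction, so the argument order matters."""
    return reduce(promote, dtypes)


print(result_type("int8", "uint16", "float32"))  # float64
print(result_type("float32", "int8", "uint16"))  # float32
```

In this toy, `int8` and `uint16` first meet as `int32`, which then needs `float64` to combine with `float32`. When `float32` comes first, it absorbs each small integer on its own.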

>
> Note that this affects NEP 42 (a little bit). NEP 42 currently makes a
> nod towards the dynamic type creation, but falls short of actually
> defining it.
>
> So these rules have to be incorporated, but IMO they do not affect the
> general design choices in the NEP.
>
>
> There is probably even more complexity to be found here, but for now
> the above seems to be at least good enough to make headway...
>
>
> Any thoughts or clarity remaining that I can try to confuse? :)
>

My main question is why you're considering both deprecating and expanding
public API (in points 3 and 4). If you have a choice, keep everything
private I'd say.
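The following pure-Python sketch shows one way points 2 to 4 could fit together in a private helper. Every name in it is hypothetical: `discover_concrete`, `common_dtype_with_scalar` and `result_type_with_scalars` exist only for this illustration, and the string `"pyint"` stands in for an abstract Python-int DType. The sketch assumes that Python ints map to `int64`, although NumPy's default integer is platform dependent. The toy `promote` rules are repeated so the block runs on its own.

```python
from functools import reduce

# Toy dtype table and pairwise promotion (hypothetical stand-ins, modelled
# on np.promote_types for these types only).
DTYPES = {
    "int8": ("i", 1), "int16": ("i", 2), "int32": ("i", 4), "int64": ("i", 8),
    "uint8": ("u", 1), "uint16": ("u", 2), "uint32": ("u", 4),
    "float16": ("f", 2), "float32": ("f", 4), "float64": ("f", 8),
}


def promote(a, b):
    (ka, sa), (kb, sb) = DTYPES[a], DTYPES[b]
    if ka == kb:
        return a if sa >= sb else b
    if "f" in (ka, kb):
        float_size, int_size = (sa, sb) if ka == "f" else (sb, sa)
        return f"float{8 * max(float_size, {1: 2, 2: 4, 4: 8, 8: 8}[int_size])}"
    signed_size, unsigned_size = (sa, sb) if ka == "i" else (sb, sa)
    if signed_size > unsigned_size:
        return f"int{8 * signed_size}"
    return f"int{16 * unsigned_size}"


def discover_concrete(value):
    """Hypothetical: map a Python scalar to a concrete dtype name.

    Assumes Python ints become int64 and Python floats become float64.
    """
    if isinstance(value, bool):
        raise TypeError("bool is left out of this toy model")
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    raise TypeError(f"unsupported scalar: {value!r}")


def common_dtype_with_scalar(dtype, other, scalar):
    """Fallback from point 3: make the scalar concrete, then promote normally.

    `other` would be the (abstract) DType of `scalar`; this fallback ignores
    it.  A DType wanting real value-based behaviour would inspect `scalar`.
    """
    return promote(dtype, discover_concrete(scalar))


def result_type_with_scalars(dtypes, scalar_values):
    """Concrete DTypes first (point 1), then one scalar at a time (point 2)."""
    concrete = [t for t, v in zip(dtypes, scalar_values) if v is None]
    scalars = [(t, v) for t, v in zip(dtypes, scalar_values) if v is not None]
    if concrete:
        result = reduce(promote, concrete)
    else:
        # Only scalars: convert the first one, so two values never meet.
        _, first_value = scalars.pop(0)
        result = discover_concrete(first_value)
    for other, value in scalars:
        result = common_dtype_with_scalar(result, other, value)
    return result


# A float32 array combined with two Python ints.
print(result_type_with_scalars(["float32", "pyint", "pyint"], [None, -1, 2**16]))  # float64
```

With only the fallback in place, each scalar behaves like its default concrete dtype, similar to the `np.array(value).dtype` form in point 2. In this sketch, value-based behaviour would come from overriding `common_dtype_with_scalar`, which is why that slot, rather than the conversion function, would need to be public.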

My other question is: this is a complex story, it all sounds reasonable but
do you need more feedback than "sounds reasonable"?

Cheers,
Ralf

> Cheers,
>
> Sebastian
>
>
>
> [0] We could also use the reordering trick for concrete DTypes,
> although that would require introducing some kind of priority... I do
> not like that much as public API, but it might be something to look at
> internally or for types deriving from the builtin abstract DTypes:
>     * inexact
>     * other
>
> Just evaluating all `inexact` first would probably solve our
> commutativity issues.
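A toy version of "just evaluating all `inexact` first" fits in a few lines of pure Python. `DTYPES` and `promote` are the same hypothetical stand-ins as in the earlier toy (repeated so the block runs on its own), and only floats count as inexact here. The sort key is the simplest possible form of the "priority" mentioned above.

```python
from functools import reduce

DTYPES = {
    "int8": ("i", 1), "int16": ("i", 2), "int32": ("i", 4), "int64": ("i", 8),
    "uint8": ("u", 1), "uint16": ("u", 2), "uint32": ("u", 4),
    "float16": ("f", 2), "float32": ("f", 4), "float64": ("f", 8),
}


def promote(a, b):
    """Toy pairwise promotion, modelled on np.promote_types for DTYPES."""
    (ka, sa), (kb, sb) = DTYPES[a], DTYPES[b]
    if ka == kb:
        return a if sa >= sb else b
    if "f" in (ka, kb):
        float_size, int_size = (sa, sb) if ka == "f" else (sb, sa)
        return f"float{8 * max(float_size, {1: 2, 2: 4, 4: 8, 8: 8}[int_size])}"
    signed_size, unsigned_size = (sa, sb) if ka == "i" else (sb, sa)
    if signed_size > unsigned_size:
        return f"int{8 * signed_size}"
    return f"int{16 * unsigned_size}"


def result_type_inexact_first(*dtypes):
    """Promote float ("inexact") DTypes before all others.

    sorted() is stable, so the relative order inside each group is kept.
    """
    ordered = sorted(dtypes, key=lambda name: DTYPES[name][0] != "f")
    return reduce(promote, ordered)


print(reduce(promote, ["int8", "uint16", "float32"]))         # float64
print(result_type_inexact_first("int8", "uint16", "float32"))  # float32
print(result_type_inexact_first("float32", "int8", "uint16"))  # float32
```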
>
> [1] NumPy also supports value-based `np.can_cast(value, dtype)`. For example:
>
>     np.can_cast(np.array(1., dtype=np.float64), np.float32, casting="safe")
>
> returns True. My working hypothesis is that `np.can_cast` as above is
> just a side battle, i.e. we can either:
>
> * Flip the switch on it (`np.can_cast` does no value-based logic; even
>   though we use it internally, we do not need it).
> * Or, we can implement those cases of `np.can_cast` by using promotion.
>
> The first one is tempting, but I assume we should go with the second
> since it preserves behaviour and is slightly more powerful.
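The second option, implementing value-based `np.can_cast` through promotion, can be read as "the cast is safe if promoting the value's minimal dtype with the target gives back the target". The pure-Python toy below covers only floats. `FLOAT_MAX`, `min_float_type` and `can_cast_value` are hypothetical names, and the range-only check (no precision check) is an assumption loosely modelled on `np.min_scalar_type`.

```python
# Toy float-only model (hypothetical names).  Largest finite values of the
# IEEE half, single and double formats, smallest type first.
FLOAT_MAX = {
    "float16": 65504.0,
    "float32": 3.4028234663852886e38,
    "float64": 1.7976931348623157e308,
}
FLOAT_SIZE = {"float16": 2, "float32": 4, "float64": 8}


def min_float_type(value):
    """Smallest float type whose range covers `value` (range only, not precision)."""
    for name, max_value in FLOAT_MAX.items():
        if abs(value) <= max_value:
            return name
    # Non-finite values are not handled specially in this toy.
    return "float64"


def can_cast_value(value, target):
    """Value-based "safe" cast via promotion: promoting the value's minimal
    type with `target` must give back `target`."""
    found = min_float_type(value)
    promoted = found if FLOAT_SIZE[found] > FLOAT_SIZE[target] else target
    return promoted == target


print(can_cast_value(1.0, "float32"))    # True
print(can_cast_value(1e300, "float32"))  # False
```

In this toy, `1.0` needs only `float16`, so it is accepted for `float32` just like the `np.can_cast(np.array(1., dtype=np.float64), np.float32, casting="safe")` example. A value outside the `float32` range promotes to `float64` and is rejected.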
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>