[Numpy-discussion] Fixing/implementing value based Casting

Sebastian Berg sebastian at sipsolutions.net
Fri Dec 11 11:31:08 EST 2020


On Fri, 2020-12-11 at 12:09 +0100, Ralf Gommers wrote:
> On Wed, Dec 9, 2020 at 5:22 PM Sebastian Berg <   
> sebastian at sipsolutions.net>
> wrote:
> 
> > Hi all,
> > 
> > Sorry that this will be a bit complicated again :(. In brief:
> > 
> > * I would like to pass around scalars in some (partially new) C-API
> >   to implement value-based promotion.
> > * There are some subtle commutativity issues with promotion.
> >   Commutativity may change in that case (with respect to value-based
> >   promotion, normally for the better). [0]
> > 
> > 
> > In the past days, I have been looking into implementing value-based
> > promotion in the way I had prototyped it before.
> > The idea was that NEP 42 allows for creating DTypes dynamically,
> > which enables very powerful value-based promotion/casting.
> > 
> > But I decided there are too many quirks with creating type instances
> > dynamically (potentially very often) just to pass around one
> > additional piece of information.
> > That approach was far more powerful, but it is power and complexity
> > that we do not require, given that:
> > 
> > * Value-based promotion is only used for a mix of scalars and
> >   arrays (where "scalar" is annoyingly defined as 0-D at the
> >   moment).
> > * I assume it is only relevant for `np.result_type` and promotion
> >   in ufuncs (which often use `np.result_type`).
> >   `np.can_cast` has such behaviour as well, but I think it is an
> >   easier case [1].  We could implement more powerful "value based"
> >   logic, but I doubt it is worthwhile.
> > * This is already stretching the Python C-API beyond its limits.
> > 
> > 
> > So I will suggest this instead, which *must* modify some (poorly
> > defined) current behaviour:
> > 
> > 1. We always evaluate concrete DTypes first in promotion. This
> >    means that in rare cases the non-commutativity of promotion may
> >    change the result dtype:
> > 
> >        np.result_type(-1, 2**16, np.float32)
> > 
> >    The same can also happen when you reorder the normal dtypes:
> > 
> >        np.result_type(np.int8, np.uint16, np.float32)
> >        np.result_type(np.float32, np.int8, np.uint16)
> > 
> >    In both cases the `np.float32` is effectively moved to the front.
> > 
> > 2. If we reorder the above operation, we can define that we never
> >    promote two "scalar values". Instead we convert both to a
> >    concrete one first.  This makes it effectively like:
> > 
> >        np.result_type(np.array(-1).dtype, np.array(2**16).dtype)
> > 
> >    This means that we never have to deal with promoting two values.
> > 
> > 3. We need additional private API (we were always going to need
> >    some additional API); that API could become public:
> > 
> >    * Convert a single value into a concrete dtype.  You could say
> >      this is the same as `self.common_dtype(None)`, but a dedicated
> >      function seems simpler.  A dtype like this will never use
> >      `common_dtype()`.
> >    * `common_dtype_with_scalar(self, other, scalar)` (note that
> >      only one of the DTypes can have a scalar).
> >      As a fallback, this function can be implemented by converting
> >      to the concrete DType and retrying with the normal
> >      `common_dtype`.
> > 
> >    (At least the second slot must be made public if we are to allow
> >    value-based promotion for user DTypes.  I expect we will, but it
> >    is not particularly important to me right now.)
> > 
> > 4. Our public API (including new C-API) has to expose and take the
> >    scalar values.  That means promotion in ufuncs will get DTypes
> >    and `scalar_values`, although those should normally be `NULL`
> >    (or None).
> > 
> >    In future Python API, this is probably acceptable:
> > 
> >        np.result_type([t if v is None else v
> >                        for t, v in zip(dtypes, scalar_values)])
> > 
> >    In C, we need to expose a function below `result_type` which
> >    accepts both the scalar values and DTypes explicitly.
> > 
> > 5. For the future: As said many times, I would like to deprecate
> >    using value-based promotion for anything except Python core
> >    types.  That just seems wrong and confusing.
> > 
> 
> I agree with this. 


It is tempting to wonder what would happen if we dropped it entirely,
but my current assumption is that it has to keep working largely
unchanged, with careful deprecations hopefully added soon...


> Value-based promotion was never a great idea, so let's
> try to keep it as minimal as possible. I'm not even sure what kind of
> value-based promotion for non Python builtin types is happening now
> (?).


It is (roughly?) identical for all zero-dimensional objects:

    arr1 = np.array(1, dtype=np.int64)
    arr2 = np.array([1, 2], dtype=np.int32)

    (arr1 + arr2).dtype == np.int32
    (1 + arr2).dtype == np.int32

In the first addition `arr1` behaves like the Python `1` even though it
has a dtype attached.

The reason for this is probably that our entry points greedily convert
their inputs to arrays.  And it shows one caveat: if we/SciPy call
`np.asarray` on a Python integer input, we lose the value-based
behaviour; this may actually be a bigger pain point (see below for an
example).


> 
> >    My only problem is that while I can warn (possibly sometimes too
> >    often) when behaviour will change, I do not have a good idea
> >    about silencing that warning.
> > 
> 
> Do you see a real issue with this somewhere, or is it all just corner
> cases? In that case no warning seems okay.
> 


Probably it is mostly corner cases.  If you do:

    arr_uint16 + int32(1) + 1.

We would warn for the first case, but not for:

    arr_uint16 + (int32(1) + 1.)

even though it gives identical results. The same might happen in
`np.concatenate` where all arguments are passed at once.
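Concretely, both groupings currently land on the same result, so a warning on only one of them would be inconsistent (a minimal sketch; the particular dtypes are incidental, the point is only that the grouping does not change the outcome):

```python
import numpy as np

arr_uint16 = np.array([1, 2], dtype=np.uint16)

left = arr_uint16 + np.int32(1) + 1.0     # the grouping that would warn
right = arr_uint16 + (np.int32(1) + 1.0)  # scalars combined first

# Both groupings give the same dtype and the same values.
assert left.dtype == right.dtype
assert np.array_equal(left, right)
```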

I can think of one bigger pain point for this type of function:

    def function(arr1, arr2):
        arr1 = np.asarray(arr1)
        arr2 = np.asarray(arr2)
        return arr1 + arr2  # some complex code

we could add a cast inside the function, I guess.  But for the
end-user it might be tricky to realize that they need to cast the
input to that function.
And those types of functions are abundant...
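A minimal illustration of how the eager conversion pins a dtype (the dtype choices here are just for the example):

```python
import numpy as np

arr = np.array([1.0, 2.0], dtype=np.float32)

# The Python float passed directly behaves "value-like": the float32
# array dtype wins the promotion.
assert (arr + 2.0).dtype == np.float32

# An eager `np.asarray` pins a concrete float64 dtype onto the scalar
# before it ever reaches the actual operation -- which is how a
# `function` of the above shape can lose the scalar behaviour.
pinned = np.asarray(2.0)
assert pinned.dtype == np.float64
```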

> 
> > 
> > Note that this affects NEP 42 (a little bit).  NEP 42 currently
> > makes a nod towards the dynamic type creation, but falls short of
> > actually defining it.
> > So these rules have to be incorporated, but IMO they do not affect
> > the general design choices in the NEP.
> > 
> > 
> > There is probably even more complexity to be found here, but for
> > now the above seems to be at least good enough to make headway...
> > 
> > 
> > Any thoughts or clarity remaining that I can try to confuse? :)
> > 
> 
> My main question is why you're considering both deprecating and
> expanding public API (in points 3 and 4). If you have a choice, keep
> everything private I'd say.
> 

I had to realize that the non-associativity is trickier to solve. 
Still digging into that...

But, I guess we can probably live with it if user DTypes can show some
non-associativity even in a single call to `np.result_type` or
`np.concatenate`.  Generally I don't have many qualms, as long as
things don't get worse (they are already broken).
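The non-associativity already exists for concrete dtypes when promotion is reduced pairwise; a small sketch with today's `np.promote_types` (`functools.reduce` is used only to make the left-to-right order explicit):

```python
from functools import reduce

import numpy as np

# Pairwise (left-to-right) promotion is order dependent:
# int8 with uint16 needs int32, and int32 with float32 needs float64;
# starting from float32 instead, both integer types fit into float32.
order_a = [np.int8, np.uint16, np.float32]
order_b = [np.float32, np.int8, np.uint16]

res_a = reduce(np.promote_types, order_a)  # int32, then float64
res_b = reduce(np.promote_types, order_b)  # float32 all the way

assert res_a == np.dtype(np.float64)
assert res_b == np.dtype(np.float32)
```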


The premise for requiring some new public API is that for us:

    int16(1) + 1 == int16(2)  # value based for Python 1

A user implements int24; if it is to fit in perfectly, we would like:

    int24(1) + 1 == int24(2)

Which requires some way to pass `int24` the information that it is a
Python `1` in some form (there are probably many options for how to
pass it).
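A rough Python sketch of how the `common_dtype_with_scalar` slot from point 3 could behave for such an `int24` (the class and its logic are purely hypothetical, not an actual NumPy or user-DType API):

```python
import numpy as np

class Int24DType:
    """Toy stand-in for a user-defined int24 DType (hypothetical)."""

    def common_dtype_with_scalar(self, other, scalar):
        # `other` would be the abstract DType of the Python scalar;
        # this toy only handles Python integers and ignores it.
        if isinstance(scalar, int) and -2**23 <= scalar < 2**23:
            # The Python integer fits into 24 bits: stay in int24.
            return self
        # Fallback described above: drop to a concrete dtype and use
        # the normal promotion machinery.
        return np.result_type(np.int32, scalar)

int24 = Int24DType()
assert int24.common_dtype_with_scalar(int, 1) is int24
```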


Exposure in promotion might be interesting for weirdly complex ufuncs,
like `scipy.special.eval_jacobi`, which have mixed-type inputs.  Again
a corner case of a corner case, but I would prefer if there was a
(possible) future solution.

Cheers,

Sebastian


> My other question is: this is a complex story, it all sounds
> reasonable, but do you need more feedback than "sounds reasonable"?
> 
> Cheers,
> Ralf
> 
> 
> 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > 
> > [0] We could use the reordering trick also for concrete DTypes,
> > although that would require introducing some kind of priority... I
> > do not like that much as public API, but it might be something to
> > look at internally or for types deriving from the builtin abstract
> > DTypes:
> >     * inexact
> >     * other
> > 
> > Just evaluating all `inexact` first would probably solve our
> > commutativity issues.
> > 
> > [1] NumPy uses `np.can_cast(value, dtype)` as well. For example:
> > 
> >     np.can_cast(np.array(1., dtype=np.float64), np.float32,
> >                 casting="safe")
> > 
> > returns True.  My working hypothesis is that `np.can_cast` as above
> > is just a side battle.  I.e. we can either:
> > 
> > * Flip the switch on it (can-cast does no value-based logic; even
> >   though we use it internally, we do not need it).
> > * Or, we can implement those cases of `np.can_cast` by using
> >   promotion.
> > 
> > The first one is tempting, but I assume we should go with the
> > second, since it preserves behaviour and is slightly more powerful.
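The second option could look roughly like the following sketch (`can_cast_value` is a made-up helper name, not NumPy API):

```python
import numpy as np

def can_cast_value(value, dtype):
    """Hypothetical helper: value-based can-cast via promotion.

    A value can be safely cast to `dtype` exactly when promoting the
    value together with `dtype` does not force a wider result dtype.
    """
    return np.result_type(value, dtype) == np.dtype(dtype)

assert can_cast_value(1.5, np.float32)     # 1.5 is representable in float32
assert not can_cast_value(1.5, np.int32)   # a float never safely casts to int
```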
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
