On Fri, 2021-11-19 at 10:59 +0000, Matthew Rahtz via Typing-sig wrote:
Thanks for this thorough explanation, Sebastian!
I fear that because of my own inexperience with the subtleties of type promotion, there were a few details I didn't understand:
Well, I don't know typing well. And could use a bit more formalism about how to describe all of this...
We have a "common DType". For NumPy, this should be based on simple
binary overload stubs:

  * UInt8 + Int8 -> NotImplemented (well, no overload I guess)
  * Int8 + UInt8 -> Int16
If I'm understanding correctly, 'common DType' is a type operator, and is represented by '+' in these two lines?
How come these aren't symmetric - how come UInt8 + Int8 isn't also Int16?
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:

    float.__add__(int) -> float
    float.__radd__(int) -> float

But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.

DETOUR: unfortunately (or fortunately), that information turns out to be somewhat useful for retaining backwards compatibility in NumPy by resolving:

    common(float16, int16, uint16) -> float16
    common(int16, uint16, float16) -> float16

even though:

    common(float16, common(int16, uint16))
        -> common(float16, int32) -> float32

But these are the gory, mind-boggling inanities :( that maybe you can get away with ignoring, at least for now. Basically, NumPy tried to be smart about it in the past, and I didn't want to break it...
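The same asymmetry can be reproduced with plain Python classes. A minimal sketch, with made-up `Int`/`Float` stand-ins (not NumPy types), showing how only one side needs to know about the other:

```python
# Minimal sketch: `Float` knows about `Int`, but not the other way around,
# mirroring how Int8 is the side that knows about UInt8 in the thread.
class Int:
    def __add__(self, other):
        if isinstance(other, Int):
            return Int()
        return NotImplemented  # Int does not know about Float


class Float:
    def __add__(self, other):
        if isinstance(other, (Int, Float)):
            return Float()
        return NotImplemented

    # The reflected overload is what makes `Int() + Float()` work:
    # Python falls back to Float.__radd__ after Int.__add__ bails out.
    __radd__ = __add__


assert isinstance(Int() + Int(), Int)
assert isinstance(Float() + Int(), Float)
assert isinstance(Int() + Float(), Float)  # resolved via Float.__radd__
```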
2. For math functions NumPy has [2]:
* A list of DType signatures that is supported
* An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
So the "list of DType signatures" is essentially a list of return types given homogeneous argument types? Like this - literally just a list of signatures in the codebase somewhere, *not* auto-generated using logic?
Yes, although they are not required to be homogeneous. That is the case that covers the vast majority, though. E.g. SciPy has special math functions that have both float and int inputs, or real and complex.
    subtract(uint8, uint8) → uint8
    ...
    subtract(datetime, datetime) → timedelta
    ...
And the "additional promotion for mixed type-signatures" is for cases where the argument types are heterogeneous - and these are *not* just a list of signatures in the code (because it would be massive), but some logic that determines on the fly what the result type should be?
Yes (where heterogeneous is not necessarily the only possibility).
And uses that for the result. The ugly part is that in some cases we probably need to check, e.g. whether:

    common + common -> common
Is '+' here an actual addition or the 'common DType' operator? Could you give an example of a case where this is true?
In that case I actually meant the original addition. A (probably bad) example is that `remainder()` is not defined for complex types, but resolving:

    remainder(float64, complex128)

using promotion might end up with:

    remainder(complex128, complex128) -> complex128

Of course that is easy to solve if you define:

    # A union, or something more explicit?
    Real = Union[int8, ..., float32, float64]

    @overload
    def remainder(DT1: Real, DT2: Real) -> CommonDType[DT1, DT2]

and we define up-front that `complex` can't work, by typing it as real only. This alone should cover all but the dark or dusty corners!

At runtime, it seemed easier to just offer a blanket "try the promotion and see if it finds something valid", rather than attempting to limit it to known valid cases. Also, some promotions might clearly be correct but lack an implementation... Maybe the only useful part :).

In terms of typing, I think if you wanted to cover even that with the same approach, you would get:

    @overload
    def remainder(DT1: Float32, DT2: Float32) -> Float32
    @overload
    ...
    @overload  # or @promotion?
    def remainder(DT1: Any, DT2: Any) \
        -> remainder(CommonDType[DT1, DT2], CommonDType[DT1, DT2])

which is a re-entrant check, i.e. the "promotion". I don't know typing well enough to say whether promotion is something that fits in even remotely.

Cheers,

Sebastian
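The runtime fallback described here ("try the promotion and see if it finds something valid") can be sketched with toy tables. Everything below is an illustrative stand-in, not NumPy API: dtype names are plain strings, and the signature table and common-dtype ladder are deliberately tiny.

```python
# Toy signature table: remainder has float loops but no complex loop.
SIGNATURES = {
    ("remainder", "float32", "float32"): "float32",
    ("remainder", "float64", "float64"): "float64",
}

# Toy common-dtype rule: just take the "larger" entry of a fixed ladder.
_LADDER = ["float32", "float64", "complex128"]


def common_dtype(a, b):
    return max(a, b, key=_LADDER.index)


def resolve(func, dt1, dt2):
    exact = SIGNATURES.get((func, dt1, dt2))
    if exact is not None:
        return exact
    # Promotion fallback: promote both arguments to their common dtype,
    # then re-check that a matching loop actually exists (the re-entrant
    # check from the thread).
    common = common_dtype(dt1, dt2)
    result = SIGNATURES.get((func, common, common))
    if result is None:
        raise TypeError(f"no loop for {func}({dt1}, {dt2})")
    return result


assert resolve("remainder", "float32", "float64") == "float64"

# remainder(float64, complex128) promotes to complex128, but no complex
# loop exists, so the re-entrant check correctly rejects it.
try:
    resolve("remainder", "float64", "complex128")
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")
```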
Cheers, Matthew
On Thu, 18 Nov 2021 at 17:46, Sebastian Berg
wrote: On Thu, 2021-11-18 at 07:53 -0800, Guido van Rossum wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
Ah, fair enough. Maybe it is simply OK to duplicate most of this.
As Stephan Hoyer pointed out, you can obviously start by cutting corners and focus only on the clear-cut Numerical cases which should have no surprises.
Sorry for the long mail, for risking repeating some things, and for not being deep into typing...
I will try to outline what from my perspective you would need to cover (almost?) everything in NumPy in a way that matches well enough with what we got. I expect the worst part here is the "promoter".
Note that this is a bit angled at a "complete" solution, which may just be utopian given that users can extend NumPy DTypes.
1. We have a "common DType". For NumPy, this should be based on simple binary overload stubs:
  * UInt8 + Int8 -> NotImplemented (well, no overload I guess)
  * Int8 + UInt8 -> Int16
The elephant in the room is that for NumPy (or JAX, ...), "common DType" is not actually a binary operation (it is not associative when applied to more than two DTypes). So in theory, you need trickier logic. [1]
Of course you should probably just cut corners here (at least for now). I see two ways to do that:
  * Refuse typing of any rules that might end up non-associative (this is mainly anything mixing unsigned and signed integers; in a sense, this is included in what Stephan Hoyer suggested).
  * Limit common DType to two DTypes (repeated DTypes are OK, of course).

Both options seem reasonable and will probably cover the majority of use cases.
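To make the non-associativity concrete, here is a toy sketch. The pairwise values loosely follow the float16/int16/uint16 example from earlier in the thread; they are illustrative, not NumPy's actual promotion tables, and the special case is a hypothetical stand-in for NumPy's backwards-compatibility quirk.

```python
from functools import reduce

# Toy pairwise common-dtype table (unordered pairs as frozensets).
PAIRWISE = {
    frozenset(["int16", "uint16"]): "int32",
    frozenset(["float16", "int16"]): "float32",
    frozenset(["float16", "uint16"]): "float32",
    frozenset(["float16", "int32"]): "float32",
    frozenset(["float32", "uint16"]): "float32",
}


def common2(a, b):
    return a if a == b else PAIRWISE[frozenset([a, b])]


def common_many(*dtypes):
    # Hypothetical special case mimicking the quirk from the thread:
    # mixed int16/uint16/float16 stays float16 for backwards compatibility,
    # even though no pairwise reduction produces that answer.
    if set(dtypes) == {"float16", "int16", "uint16"}:
        return "float16"
    return reduce(common2, dtypes)


# The n-ary result...
assert common_many("float16", "int16", "uint16") == "float16"
# ...differs from the pairwise reduction, so "common DType" cannot be
# treated as a plain associative binary operator:
assert common2("float16", common2("int16", "uint16")) == "float32"
```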
2. For math functions NumPy has [2]:
* A list of DType signatures that is supported
* An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
Exporting the list of both should be OK (users could extend it!).
Further, typing what the promotion does should also be fine in practice.
Now, when a user has `uint8 + int8`, we have no type signature for it but we do find the "promoter" (based on an abstract type hierarchy if push comes to shove). And it finds:
common = CommonDType[UInt8, Int8]
And uses that for the result. The ugly part is that in some cases we probably need to check, e.g. whether:
common + common -> common
exists. How much "logic" could/would such a "promoter" have access to? Is that even possible, or am I just summoning more dragons?
Cutting corners? I think limiting this to only ever trying `CommonDType` and nothing else would cover the vast majority of cases. But this is the only "idea" I have with respect to not listing an infinite number of combinations :(.
Cheers,
Sebastian
[1] JAX solves this by brute-forcing and caching, IIRC. In NumPy I resolve (most of) the weirdness by inspecting how the binary operation has non-commutative overloads. (The idea is that, `float + int -> float`, but `int + float -> undefined`, so "float" is, in a sense, more important.)
So it is very annoying/ugly logic, but it is spelled out...
[2] I probably have _some_ wiggle room to change this, but its complicated... and I am not sure there is much wiggle room. But, I am not smart enough to find an easier solution that is backwards and forwards compatible and extensible...
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
Thanks for reaching out, Sebastian!
My question is: Is there any chance you could call back into existing
Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
NumPy promotion is potentially more complex than the above common-DType operation.
Oh, gosh, right, I'd forgotten about all the other types that NumPy can interact with. I guess this is mainly an argument against the 'brute force' solutions 1 and 2? (And I guess extra weight towards the point that duplicating the logic for this would not be great?)
On Wed, 17 Nov 2021 at 22:24, Sebastian Berg
wrote: Hey all,
A quick summary of the talk is below.
- How should we handle data type promotion in stubs?
  - Option 1: One overload for each possible combination of dtypes
  - Option 2: One overload for each result dtype
  - Option 3: Don't handle type promotion
  - Option 4: Use a dtype class hierarchy and exploit Union type operator behaviour
  - Option 5: Propose a type promotion operator
  - Option 6: Propose a 'nearest common parent' operator
  - Option 7: Propose a type lookup table operator
- Consensus during discussion was that since it looks like a new type operator *would* be required, we should probably hold off on dealing with this until the community shows a strong desire for this feature, and in the meantime just not handle data type promotion in stubs.
Sorry for missing the discussion, coming over from NumPy to here. Hopefully, my terminology matches yours pretty well and the below is helpful or at least interesting :).
One thing I would like to note is that, unlike Python types, DTypes are already typed, in the sense that a library doing "promotion" should have the logic for it explicitly spelled out somewhere. The library needs to find the right C loop and result dtype before it can do the actual operation, so this is a fairly distinct step [1]. That seems very unlike typical Python, where you do not have a distinct "promotion" implementation.
My question is: Is there any chance you could call back into existing Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
About "nearest common parent" -----------------------------
What I/we now use in NumPy is a `__common_dtype__` binary operator (only internal/available in C right now). There are a few things to note about it though:
* It is not always commutative/associative in NumPy (I do some extra stuff to hide that sometimes).
* "Common dtype" is not always equivalent to "promotion" in math functions. (See below.)
So, effectively NumPy has this. And it is based on a binary classmethod. This would probably solve the majority of annotations, but it would be nice to not have to implement the logic twice?
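A `__common_dtype__`-style binary classmethod might look roughly like this in pure Python. This is a sketch only: per the thread, the real hook is internal/C-level, and the class names and fallback rules here are illustrative stand-ins.

```python
# Sketch of a `__common_dtype__`-style binary classmethod protocol,
# dispatched like __add__/__radd__: try one side, then the reflected side.
class DType:
    @classmethod
    def __common_dtype__(cls, other):
        return cls if other is cls else NotImplemented


class UInt8(DType):
    pass


class Int16(DType):
    pass


class Int8(DType):
    @classmethod
    def __common_dtype__(cls, other):
        if other is cls:
            return cls
        if other is UInt8:
            return Int16  # Int8 is the side that knows about UInt8
        return NotImplemented


def common_dtype(a, b):
    result = a.__common_dtype__(b)
    if result is NotImplemented:
        # Try the reflected direction, like __radd__.
        result = b.__common_dtype__(a)
    if result is NotImplemented:
        raise TypeError(f"no common dtype for {a.__name__} and {b.__name__}")
    return result


assert common_dtype(Int8, UInt8) is Int16
assert common_dtype(UInt8, Int8) is Int16  # via the reflected lookup
assert common_dtype(UInt8, UInt8) is UInt8
```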
About "promotion" in functions ------------------------------
NumPy promotion is potentially more complex than the above common-DType operation. Although I could provide API like:
np.add.resolve_dtypes(DType1, DType2) -> DTypeX
but the general rules are tricky, and there are complicated corners:
np.divide.resolve_dtypes(Timedelta, Number) -> Timedelta
np.divide.resolve_dtypes(Timedelta, Timedelta) -> Number
Maybe "Timedelta" is a terrible outlier... But the point is, that there is a big potential complexity. (I.e. also math functions that have mixed float and integer inputs and/or outputs.)
Right now, I can't guarantee that the above would not have mild side effects (for NumPy).
About "value based" logic -------------------------
In the discussion, NumPy's "value based" logic was mentioned: `1000` might be considered an `int16` or `uint16`, but not an `int8`.
The one good part about this: We would _really_ like to get rid of that in NumPy. Although that will still leave you with special promotion/common-dtype rules when a Python integer/float/complex is involved. (Just the value should not matter, except that a bad value could lead to an error, e.g. if the Python integer is too large.)
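The value-based idea can be illustrated with a toy range check. This is a sketch only: the helper and dtype list below are made up for illustration, and NumPy's actual rules are more involved (and, as noted above, something NumPy would like to drop).

```python
# Toy "value based" candidate dtypes for a Python int: every dtype whose
# range contains the value. Ranges are the standard fixed-width ones.
RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "uint8": (0, 2**8 - 1),
    "int16": (-2**15, 2**15 - 1),
    "uint16": (0, 2**16 - 1),
}


def candidate_dtypes(value):
    return [name for name, (lo, hi) in RANGES.items() if lo <= value <= hi]


# 1000 fits int16/uint16 but not the 8-bit dtypes, as in the example above.
assert candidate_dtypes(1000) == ["int16", "uint16"]
assert "int8" not in candidate_dtypes(1000)
# A value too large for every listed dtype yields no candidates (-> error),
# matching the note that a bad value can only lead to an error.
assert candidate_dtypes(10**10) == []
```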
Cheers,
Sebastian
[1] OK, I may be extrapolating from NumPy. I suppose some libraries may not promote explicitly, but have `ndarray[Int8]` be an actual type. But probably such a library could auto-generate a table of overloads?
[2] I am intentionally not using `np.result_type`, because NumPy dtypes can be parametric, and `np.result_type("S3", "S4")` is "S4", but you are primarily interested in `String, String -> String`.

_______________________________________________
Typing-sig mailing list -- typing-sig@python.org
To unsubscribe send an email to typing-sig-leave@python.org
https://mail.python.org/mailman3/lists/typing-sig.python.org/
Member address: mrahtz@google.com