Hi all,
Our next tensor typing meeting will be next week, *Monday the 15th of November at 10am San Francisco time / 6pm London time*, at http://meet.google.com/fft-dzjq-ksu.
The main thing on our agenda will be a *discussion of how to handle dtype promotion in stubs*. That is, how can we handle the fact that a tensor with dtype uint8 + a tensor with dtype int8 = a tensor with dtype int16? I'll do a short presentation with my current understanding of what the options are, then we can discuss the pros and cons of each approach. (Having said that, if anyone else has already worked out a clean solution to this, please let me know!)
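To make the problem concrete, here is a minimal stdlib-only sketch of the kind of promotion table a library consults at runtime. The `promote` helper and the table are hypothetical, but the uint8 + int8 -> int16 entry mirrors NumPy's documented behaviour:

```python
# Hypothetical sketch of a dtype promotion table. Neither uint8 (0..255)
# nor int8 (-128..127) can represent all values of the other, so the
# result must be the smallest signed type holding both ranges: int16.
PROMOTION_TABLE = {
    frozenset({"uint8", "int8"}): "int16",
    frozenset({"uint8"}): "uint8",
    frozenset({"int8"}): "int8",
    frozenset({"int8", "int16"}): "int16",
}

def promote(a: str, b: str) -> str:
    """Look up the result dtype for a binary op on dtypes a and b."""
    return PROMOTION_TABLE[frozenset({a, b})]

print(promote("uint8", "int8"))  # int16
```

The question for the meeting is how (or whether) a stub file can express this lookup to a static type checker.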
If anyone has anything else they'd like to talk about, do give me a shout.
See you then! Matthew
Reminder that we'll be meeting today at *10am San Francisco time / 6pm London time* at http://meet.google.com/fft-dzjq-ksu.
Our agenda for this month consists of a discussion of how to handle data type promotion in stubs. I'll present the 7 possible solutions I've been able to think of, and then we'll discuss the pros and cons of each/whether there's something better we've overlooked.
See you then!
On Mon, 8 Nov 2021 at 11:27, Matthew Rahtz mrahtz@google.com wrote:
Thanks to everyone for coming! Recording is here https://drive.google.com/file/d/1ywE7sHM--9QEZ_LhcALFBVF_nsO601NB/view?usp=sharing, and the slides are at Dealing with DTypes https://docs.google.com/presentation/d/14hHWlpMOuijNEhF7fwvANa8Ainu4R2LtBt3WO3df-ww/edit?usp=sharing&resourcekey=0-oLrZPOXcAXvGvNaEOQahJQ. Details of this and previous tensor typing meetings are available here https://docs.google.com/document/d/1oaG0V2ZE5BRDjd9N-Tr1N0IKGwZQcraIlZ0N8ayqVg8/edit#.
Our next tensor typing meeting will tentatively be on *Monday the 13th of December*. See you then!
A quick summary of the talk is below.
- How should we handle data type promotion in stubs?
   - Option 1: One overload for each possible combination of dtypes
   - Option 2: One overload for each result dtype
   - Option 3: Don't handle type promotion
   - Option 4: Use a dtype class hierarchy and exploit Union type operator behaviour
   - Option 5: Propose a type promotion operator
   - Option 6: Propose a 'nearest common parent' operator
   - Option 7: Propose a type lookup table operator
- Consensus during discussion was that since it looks like a new type operator *would* be required, we should probably hold off on dealing with this until the community shows a strong desire for this feature, and in the meantime just not handle data type promotion in stubs.
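As a concrete illustration of what Option 1 would look like in a stub, here is a sketch using `typing.overload` with one overload per input dtype combination. `Tensor`, `UInt8`, `Int8`, and `Int16` are hypothetical names, not any particular library's API:

```python
# Sketch of Option 1: one @overload per dtype combination.
# All class names here are illustrative stand-ins.
from typing import Generic, TypeVar, overload

class UInt8: ...
class Int8: ...
class Int16: ...

DT = TypeVar("DT")

class Tensor(Generic[DT]):
    @overload
    def __add__(self: "Tensor[UInt8]", other: "Tensor[UInt8]") -> "Tensor[UInt8]": ...
    @overload
    def __add__(self: "Tensor[UInt8]", other: "Tensor[Int8]") -> "Tensor[Int16]": ...
    @overload
    def __add__(self: "Tensor[Int8]", other: "Tensor[UInt8]") -> "Tensor[Int16]": ...
    @overload
    def __add__(self: "Tensor[Int8]", other: "Tensor[Int8]") -> "Tensor[Int8]": ...
    def __add__(self, other):
        # Runtime stand-in; in a real stub file the overloads have no body.
        return Tensor()
```

The obvious drawback, and part of why the consensus leaned away from this, is combinatorial blow-up: with N dtypes and M binary operations you need on the order of N²·M overloads.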
On Mon, 15 Nov 2021 at 11:37, Matthew Rahtz mrahtz@google.com wrote:
Hey all,
Sorry for missing the discussion; I'm coming over from the NumPy side. Hopefully my terminology matches yours well enough, and the below is helpful, or at least interesting :).
One thing I would like to note is that, unlike Python types, DTypes are already typed, in the sense that a library doing "promotion" should have the logic for it explicitly spelled out somewhere. The library needs to find the right C loop and result dtype before it can do the actual operation, so this is a fairly distinct step [1]. That seems very unlike typical Python, where you do not have a distinct "promotion" implementation.
My question is: Is there any chance you could call back into existing Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
About "nearest common parent"
-----------------------------
What I/we now use in NumPy is a `__common_dtype__` binary operator (only internal/available in C right now). There are a few things to note about it though:
* It is not always commutative/associative in NumPy (I do some extra stuff to hide that sometimes)
* "Common dtype" is not always equivalent to "promotion" in math functions. (See below.)
So, effectively NumPy has this. And it is based on a binary classmethod. This would probably solve the majority of annotations, but it would be nice to not have to implement the logic twice?
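The binary-classmethod idea above can be sketched in plain Python. This is an illustrative stand-in, not NumPy's actual `__common_dtype__` implementation; the class names and the simple "ladder" ordering are assumptions:

```python
# Stdlib-only sketch of a binary "common dtype" classmethod, loosely
# modelled on the operator described above. The promotion ladder here is
# deliberately simplistic: higher position wins.
class DType:
    _order = ["int8", "int16", "int32", "float32", "float64"]
    name = ""

    @classmethod
    def common_dtype(cls, other: "type[DType]") -> "type[DType]":
        # Return whichever operand sits higher on the ladder.
        if cls._order.index(cls.name) >= cls._order.index(other.name):
            return cls
        return other

class Int8(DType): name = "int8"
class Int16(DType): name = "int16"
class Float32(DType): name = "float32"

print(Int8.common_dtype(Float32).name)  # float32
```

A total-order ladder like this is exactly what real promotion rules are *not* (e.g. the uint8/int8 case has no winner among the two inputs), which is why the real operator needs a table or pairwise logic rather than a single ranking.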
About "promotion" in functions
------------------------------
NumPy promotion is potentially more complex than the above common-DType operation. Although I could provide API like:
np.add.resolve_dtypes(DType1, DType2) -> DTypeX
but the general rules are tricky, and there are complicated corners:
np.divide.resolve_dtypes(Timedelta, Number) -> Timedelta
np.divide.resolve_dtypes(Timedelta, Timedelta) -> Number
Maybe "Timedelta" is a terrible outlier... But the point is that there is big potential complexity. (I.e. also math functions that have mixed float and integer inputs and/or outputs.)
Right now, I can't guarantee that the above would not have mild side-effects (for NumPy).
About "value based" logic
-------------------------
In the discussion, NumPy's "value based" logic was mentioned: `1000` might be considered an `int16` or `uint16`, but not an `int8`.
The one good part about this: We would _really_ like to get rid of that in NumPy. Although that will still leave you with special promotion/common-dtype rules when a Python integer/float/complex is involved. (Just the value should not matter, except that a bad value could lead to an error, e.g. if the Python integer is too large.)
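The "value based" part can be illustrated with a small stdlib-only helper that finds which dtypes can hold a given Python int. The helper name is hypothetical; the ranges are the usual two's-complement ranges:

```python
# Sketch of "value based" reasoning: which dtypes can represent a value?
# The dtype list and helper are illustrative, not a library API.
RANGES = [
    ("int8", -128, 127),
    ("uint8", 0, 255),
    ("int16", -32768, 32767),
    ("uint16", 0, 65535),
    ("int32", -2**31, 2**31 - 1),
]

def fitting_dtypes(value: int) -> list[str]:
    """All listed dtypes able to hold `value` exactly."""
    return [name for name, lo, hi in RANGES if lo <= value <= hi]

print(fitting_dtypes(1000))  # ['int16', 'uint16', 'int32']
```

This is exactly the kind of rule that is awkward for a static checker, since the *type* of a literal would depend on its *value*.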
Cheers,
Sebastian
[1] OK, I may be extrapolating from NumPy. I suppose some libraries may not promote explicitly, but have `ndarray[Int8]` be an actual type. But probably such a library could auto-generate a table of overloads?
[2] I am intentionally not using `np.result_type`, because NumPy dtypes can be parametric, and `np.result_type("S3", "S4")` is "S4", but you are primarily interested in `String, String -> String`.
Thanks for reaching out, Sebastian!
My question is: Is there any chance you could call back into existing Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
NumPy promotion is potentially more complex than the above common-DType operation.
Oh, gosh, right, I'd forgotten about all the other types that NumPy can interact with. I guess this is mainly an argument against the 'brute force' solutions 1 and 2? (And I guess extra weight towards the point that duplicating the logic would not be great?)
On Wed, 17 Nov 2021 at 22:24, Sebastian Berg seberg@berkeley.edu wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
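The extract-a-table-at-build-time idea could look roughly like this: query the runtime for each dtype pair, then emit `@overload` stubs from the resulting table. This is a sketch; the hand-written `TABLE` stands in for what e.g. `np.result_type` could produce at stub-generation time:

```python
# Sketch of generating overload stubs from a runtime promotion table.
# TABLE is a hypothetical stand-in for results queried from the library.
TABLE = {
    ("UInt8", "Int8"): "Int16",
    ("Int8", "UInt8"): "Int16",
    ("Int8", "Int8"): "Int8",
}

def emit_overloads(table: dict) -> str:
    """Render one @overload stub line per table entry."""
    lines = []
    for (left, right), result in sorted(table.items()):
        lines.append("@overload")
        lines.append(
            f'def __add__(self: "Tensor[{left}]", other: "Tensor[{right}]")'
            f' -> "Tensor[{result}]": ...'
        )
    return "\n".join(lines)

print(emit_overloads(TABLE))
```

This sidesteps calling Python during checking, at the cost of baking one library's rules into generated stubs, which matches Guido's point that per-library differences make a shared solution harder.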
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
There has been a recent community effort to standardize the different NumPy-like array APIs in Python. The edge cases of dtype promotion are still pretty heterogeneous, but a subset of type promotion cases (mostly dtypes within the same family, like promotion between different float types) have shared semantics across all libraries: https://data-apis.org/array-api/latest/API_specification/type_promotion.html
These are the cases that I think would make sense for Python type checkers.
On Thu, Nov 18, 2021 at 7:54 AM Guido van Rossum guido@python.org wrote:
https://data-apis.org/array-api/latest/API_specification/type_promotion.html
Oh, super nice - I hadn't seen these!
On Thu, 18 Nov 2021 at 16:58, Stephan Hoyer shoyer@gmail.com wrote:
There has been a recent community effort to standarize different NumPy-like array APIs in Python. The edge cases of dtype promotion are still pretty heterogeneous, but a subset of type promotion cases (mostly dtypes within the same family, like promotion between different float types) have shared semantics across all libraries:
https://data-apis.org/array-api/latest/API_specification/type_promotion.html
These are the cases that I think would make sense for Python type checkers.
On Thu, Nov 18, 2021 at 7:54 AM Guido van Rossum guido@python.org wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
Thanks for reaching out, Sebastian!
My question is: Is there any chance you could call back into existing
Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
NumPy promotion is potentially more complex than the above common-DType
operation.
Oh, gosh, right, I'd forgotten about all the other types the NumPy can interact with. I guess this is mainly an argument against the 'brute force' solutions 1 and 2? (And I guess extra weight towards the point that duplicate logic for this would not be great?)
On Wed, 17 Nov 2021 at 22:24, Sebastian Berg seberg@berkeley.edu wrote:
Hey all,
A quick summary of the talk is below.
- How should we handle data type promotion in stubs?
- Option 1: One overload for each possible combination of dtypes
- Option 2: One overload for each result dtype
- Option 3: Don't handle type promotion
- Option 4: Use a dtype class hierarchy and exploit Union type
operator behaviour
- Option 5: Propose a type promotion operator
- Option 6: Propose a 'nearest common parent' operator
- Option 7: Propose a type lookup table operator
- Consensus during discussion was that since it looks like a new
type operator *would* be required, we should probably hold off on
dealing
with this until the community shows a strong desire for this feature, and in the meantime just not handle data type promotion in stubs.
Sorry for missing the discussion, coming over from NumPy to here. Hopefully, my terminology matches yours pretty well and the below is helpful or at least interesting :).
One I would like to note is that, unlike Python types, DTypes are already typed. In the sense that a library doing "promotion" should have the logic for it explicitly spelled out somewhere. The library needs to find the right C loop and result dtype before it can do the actual operation, so this is a fairly distinct step [1]. That seems very unlike typical Python where you do not have a distinct "promotion" implementation.
My question is: Is there any chance you could call back into existing Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
About "nearest common parent"
What I/we now use in NumPy is a `__common_dtype__` binary operator (only internal/available in C right now). There are a few things to note about it though:
- It is not always commutative/associative in NumPy (I do some extra stuff to hide that sometimes)
- "Common dtype" is not always equivalent to "promotion" in math functions. (See below.)
So, effectively NumPy has this. And it is based on a binary classmethod. This would probably solve the majority of annotations, but it would be nice to not have to implement the logic twice?
About "promotion" in functions
NumPy promotion is potentially more complex than the above common-DType operation. Although I could provide API like:
np.add.resolve_dtypes(DType1, DType2) -> DTypeX
but the general rules are tricky, and there are complicated corners:
np.divide.resolve_dtypes(Timedelta, Number) -> Timedelta np.divide.resolve_dtypes(Timedelta, Timedelta) -> Number
Maybe "Timedelta" is a terrible outlier... But the point is, that there is a big potential complexity. (I.e. also math functions that have mixed float and integer inputs and/or outputs.)
Right now, I can't guarantee that the above would not have mild side- effects (for NumPy)
About "value based" logic
In the discussion it was mentioned that NumPy's "value based" logic. `1000` might be considered an `int16` or `uint16` but not an `int8`.
The one good part about this: We would _really_ like to get rid of that in NumPy. Although that will still leave you with special promotion/common-dtype rules when a Python integer/float/complex is involved. (Just the value should not matter, except that a bad value could lead to an error, e.g. if the Python integer is too large.)
Cheers,
Sebastian
[1] OK, I may be extrapolating from NumPy. I suppose some libraries may not promote explicitly, but have `ndarray[Int8]` be an actual type. But probably such a library could auto-generate a table of overloads?
[2] I am intentionally not using `np.result_type`, because NumPy dtypes can be parametric, and `np.result_type("S3", "S4")` is "S4", but you are primarily interested in `String, String -> String`.
_______________________________________________
Typing-sig mailing list -- typing-sig@python.org
To unsubscribe send an email to typing-sig-leave@python.org
https://mail.python.org/mailman3/lists/typing-sig.python.org/
Member address: mrahtz@google.com
On Thu, 2021-11-18 at 07:53 -0800, Guido van Rossum wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
Ah, fair enough. Maybe it is simply OK to duplicate most of this.
As Stephan Hoyer pointed out, you can obviously start by cutting corners and focusing only on the clear-cut numerical cases, which should hold no surprises.
Sorry for the long mail, for the risk of repeating some things, and for not being deep into typing...
I will try to outline what from my perspective you would need to cover (almost?) everything in NumPy in a way that matches well enough with what we got. I expect the worst part here is the "promoter".
Note that this is a bit angled at a "complete" solution, which may be utopian given that users can extend NumPy DTypes.
1. We have a "common DType". For NumPy, this should be based on simple binary overload stubs:
* UInt8 + Int8 -> NotImplemented (well, no overload I guess)
* Int8 + UInt8 -> Int16
The elephant in the room is that for NumPy (or JAX, ...), "common DType" is not actually a binary operation (it is not associative when called on more DTypes). So in theory, you need trickier logic. [1]
Of course you should probably just cut corners here (at least for now). I see two ways to do that:
* Refuse to type any rules that might end up non-associative (this is mainly anything mixing unsigned and signed integers; in a sense it is included in what Stephan Hoyer suggested.)
* Limit common DType to two DTypes (repeated DTypes are OK, of course).
and both seem reasonable and will probably cover the majority of use cases.
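Such a corner-cutting table could be written as ordinary binary overloads in a stub. A hypothetical sketch (`Tensor`, `DType`, and the dtype classes are illustrative names, not any library's real API):

```python
from __future__ import annotations
from typing import Generic, TypeVar, overload

class DType: ...
class Int8(DType): ...
class UInt8(DType): ...
class Int16(DType): ...

DT = TypeVar("DT", bound=DType)

class Tensor(Generic[DT]):
    @overload
    def __add__(self: Tensor[Int8], other: Tensor[Int8]) -> Tensor[Int8]: ...
    @overload
    def __add__(self: Tensor[Int8], other: Tensor[UInt8]) -> Tensor[Int16]: ...
    @overload
    def __add__(self: Tensor[UInt8], other: Tensor[UInt8]) -> Tensor[UInt8]: ...
    # Deliberately no (UInt8, Int8) overload: mirroring the missing
    # UInt8.__add__(Int8), that order is simply left untyped.
    def __add__(self, other):
        # Runtime stand-in; the real promotion logic lives in the library.
        return Tensor()

a: Tensor[Int8] = Tensor()
b: Tensor[UInt8] = Tensor()
c = a + b  # a type checker would infer Tensor[Int16]
```

Restricting the table to two operands sidesteps the associativity problem, at the cost of leaving n-ary promotion to repeated pairwise application.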
2. For math functions NumPy has [2]:
* A list of supported DType signatures
* An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
Exporting the list of both should be OK (users could extend it!).
Further, typing what the promotion does should also be fine in practice.
Now, when a user has `uint8 + int8`, we have no type signature for it but we do find the "promoter" (based on an abstract type hierarchy if push comes to shove). And it finds:
common = CommonDType[UInt8, Int8]
And uses that for the result. The ugly part is that in some cases we probably need to check, e.g. whether:
common + common -> common
exists. How much "logic" could/would such a "promoter" have access to? Is that even possible, or am I just summoning more dragons?
Cutting corners? I think limiting this to only ever trying `CommonDType` and nothing else would cover the vast majority of cases. But this is the only "idea" I have with respect to not listing an infinite amount of combinations :(.
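Guido's suggestion of extracting a table from the runtime at build / install time could look roughly like this (a sketch over a hand-picked dtype subset; a real generator would emit overload stubs from the table):

```python
import itertools
import numpy as np

DTYPES = [np.int8, np.int16, np.int32, np.uint8, np.uint16,
          np.float32, np.float64]

# Dump the pairwise promotion table that generated stubs would encode.
table = {
    (a.__name__, b.__name__): np.promote_types(a, b).name
    for a, b in itertools.product(DTYPES, repeat=2)
}

assert table[("uint8", "int8")] == "int16"
assert table[("float32", "int32")] == "float64"
```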
Cheers,
Sebastian
[1] JAX solves this by brute-forcing and caching, IIRC. In NumPy I resolve (most of) the weirdness by inspecting how the binary operation has non-commutative overloads. (The idea is that, `float + int -> float`, but `int + float -> undefined`, so "float" is, in a sense, more important.)
So it is very annoying/ugly logic, but it is spelled out...
[2] I probably have _some_ wiggle room to change this, but it's complicated... and I am not sure there is much wiggle room. But I am not smart enough to find an easier solution that is backwards- and forwards-compatible and extensible...
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
Thanks for reaching out, Sebastian!
My question is: Is there any chance you could call back into existing
Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
NumPy promotion is potentially more complex than the above common-DType operation.
Oh, gosh, right - I'd forgotten about all the other types that NumPy can interact with. I guess this is mainly an argument against the 'brute force' solutions 1 and 2? (And I guess it adds extra weight to the point that duplicating the logic would not be great?)
On Wed, 17 Nov 2021 at 22:24, Sebastian Berg seberg@berkeley.edu wrote:
Hey all,
A quick summary of the talk is below.
- How should we handle data type promotion in stubs?
- Option 1: One overload for each possible combination of dtypes
- Option 2: One overload for each result dtype
- Option 3: Don't handle type promotion
- Option 4: Use a dtype class hierarchy and exploit Union type operator behaviour
- Option 5: Propose a type promotion operator
- Option 6: Propose a 'nearest common parent' operator
- Option 7: Propose a type lookup table operator
- Consensus during discussion was that since it looks like a new type operator *would* be required, we should probably hold off on dealing with this until the community shows a strong desire for this feature, and in the meantime just not handle data type promotion in stubs.
Sorry for missing the discussion, coming over from NumPy to here. Hopefully, my terminology matches yours pretty well and the below is helpful or at least interesting :).
One thing I would like to note is that, unlike Python types, DTypes are already typed, in the sense that a library doing "promotion" should have the logic for it explicitly spelled out somewhere. The library needs to find the right C loop and result dtype before it can do the actual operation, so this is a fairly distinct step [1]. That seems very unlike typical Python, where you do not have a distinct "promotion" implementation.
Thanks for this thorough explanation, Sebastian!
I fear that because of my own inexperience with the subtleties of type promotion, there were a few details I didn't understand:
We have a "common DType". For NumPy, this should be based on simple binary overload stubs:
* UInt8 + Int8 -> NotImplemented (well, no overload I guess)
* Int8 + UInt8 -> Int16
If I'm understanding correctly, 'common DType' is a type operator, and is represented by '+' in these two lines?
How come these aren't symmetric - how come UInt8 + Int8 isn't also Int16?
2. For math functions NumPy has [2]:
- A list of DType signatures that is supported
- An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
So the "list of DType signatures" is essentially a list of return types given homogeneous argument types? Like this - literally just a list of signatures in the codebase somewhere, *not* auto-generated using logic?
subtract(uint8, uint8) → uint8
...
subtract(datetime, datetime) → timedelta
...
And the "additional promotion for mixed type-signatures" is for cases where the argument types are heterogeneous - and these are *not* just a list of signatures in the code (because it would be massive), but some logic that determines on the fly what the result type should be?
And uses that for the result. The ugly part is, that in some cases we probably need to check, e.g. whether:
common + common -> common
Is '+' here an actual addition or the 'common DType' operator? Could you give an example of a case where this is true?
Cheers, Matthew
On Fri, 2021-11-19 at 10:59 +0000, Matthew Rahtz via Typing-sig wrote:
Thanks for this thorough explanation, Sebastian!
I fear that because of my own inexperience with the subtleties of type promotion, there were a few details I didn't understand:
Well, I don't know typing well. And could use a bit more formalism about how to describe all of this...
We have a "common DType". For NumPy, this should be based on simple binary overload stubs:
* UInt8 + Int8 -> NotImplemented (well, no overload I guess)
* Int8 + UInt8 -> Int16
If I'm understanding correctly, 'common DType' is a type operator, and is represented by '+' in these two lines?
How come these aren't symmetric - how come UInt8 + Int8 isn't also Int16?
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:
float.__add__(int) -> float
float.__radd__(int) -> float
But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.
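The same one-sided dispatch can be shown with two toy classes (hypothetical `Narrow`/`Wide`, standing in for `UInt8`/`Int8`): only `Wide` implements the mixed case, and Python's `NotImplemented` protocol routes `Narrow + Wide` through `Wide.__radd__`.

```python
class Narrow:
    def __add__(self, other):
        if isinstance(other, Narrow):
            return Narrow()
        return NotImplemented  # Narrow does not know about Wide

class Wide:
    def __add__(self, other):
        if isinstance(other, (Narrow, Wide)):
            return Wide()  # Wide knows about both
        return NotImplemented

    def __radd__(self, other):
        # Reached after Narrow.__add__ returned NotImplemented.
        if isinstance(other, Narrow):
            return Wide()
        return NotImplemented

assert isinstance(Narrow() + Wide(), Wide)  # resolved by Wide.__radd__
assert isinstance(Wide() + Narrow(), Wide)
```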
DETOUR: unfortunately (or fortunately), that information turns out to be somewhat useful to retain backwards compatibility in NumPy by resolving that:
common(float16, int16, uint16) -> float16
common(int16, uint16, float16) -> float16
even though:
common(float16, common(int16, uint16)) -> common(float16, int32) -> float32
but these are the gory, mind boggling inanities :(. That maybe you can get away with ignoring, at least for now. Basically, NumPy tried to be smart about it in the past, and I didn't want to break it...
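The pairwise part of this is easy to check with `np.promote_types` (which applies none of the three-argument smoothing, so only the folded result is shown here):

```python
import numpy as np

inner = np.promote_types(np.int16, np.uint16)
assert inner == np.int32  # a signed/unsigned mix jumps to the next size

# Folding float16 against that intermediate widens past float16, so a
# pairwise reduction cannot reproduce the three-argument
# common(float16, int16, uint16) -> float16 described above.
assert np.promote_types(np.float16, inner) != np.float16
```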
- For math functions NumPy has [2]:
- A list of supported DType signatures
- An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
So the "list of DType signatures" is essentially a list of return types given homogeneous argument types? Like this - literally just a list of signatures in the codebase somewhere, *not* auto-generated using logic?
Yes, although they are not required to be homogeneous. That is the case that covers the vast majority, though. E.g. SciPy has special math functions that have both float and int inputs, or real and complex.
subtract(uint8, uint8) → uint8
...
subtract(datetime, datetime) → timedelta
...
And the "additional promotion for mixed type-signatures" is for cases where the argument types are heterogeneous - and these are *not* just a list of signatures in the code (because it would be massive), but some logic that determines on the fly what the result type should be?
Yes (where heterogeneous is not necessarily the only possibility).
And uses that for the result. The ugly part is, that in some cases we probably need to check, e.g. whether:
common + common -> common
Is '+' here an actual addition or the 'common DType' operator? Could you give an example of a case where this is true?
In that case I actually meant the original addition.
A (probably bad) example is that `remainder()` is not defined for complex types, but resolving:
remainder(float64, complex128)
using promotion might end up with:
remainder(complex128, complex128) -> complex128
Of course that is easy to solve, if you define:
# A union, or something more explicit?
Real = Union[int8, ..., float32, float64]
@overload
def remainder(DT1: Real, DT2: Real) -> CommonDType[DT1, DT2]
And we define up-front that `complex` can't work, by typing it as real only.
This alone should cover all but the dark or dusty corners!
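This matches what NumPy does at runtime today: `np.remainder` has no complex loop, and promotion does not invent one. For example:

```python
import numpy as np

# Real inputs promote normally...
assert np.remainder(np.float64(5.0), np.float32(2.0)) == 1.0

# ...but complex inputs are rejected outright.
try:
    np.remainder(np.complex128(5), np.complex128(2))
except TypeError:
    print("remainder has no complex loop")
```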
At runtime, it seemed easier to just offer a blanket "try the promotion and see if it finds something valid", rather than attempting to limit it to known valid cases. Also, some promotions might clearly be correct but lack an implementation...
Maybe the only useful part :). In terms of typing, I think if you wanted to cover even that with the same approach, you would get:
@overload
def remainder(DT1: Float32, DT2: Float32) -> Float32
@overload
...
@overload  # or @promotion?
def remainder(DT1: Any, DT2: Any) -> remainder(CommonDType[DT1, DT2], CommonDType[DT1, DT2])
which is a re-entrant check, i.e. the "promotion". I don't know typing well enough whether promotion is something that fits in even remotely.
Cheers,
Sebastian
Cheers, Matthew
On Thu, 18 Nov 2021 at 17:46, Sebastian Berg seberg@berkeley.edu wrote:
On Thu, 2021-11-18 at 07:53 -0800, Guido van Rossum wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
Ah, fair enough. Maybe it is simply OK to duplicate most of this.
As Stephan Hoyer pointed out, you can obviously start by cutting corners and focus only on the clear-cut Numerical cases which should have no surprises.
Sorry for the long mail, and risking to repeat some things and a lack of being deep into typing...
I will try to outline what from my perspective you would need to cover (almost?) everything in NumPy in a way that matches well enough with what we got. I expect the worst part here is the "promoter".
Note that this is a bit angled at a "complete" solution, that may just be utopic for the fact that users could extend NumPy DTypes.
- We have a "common DType". For NumPy, this should be based on
simple binary overload stubs:
* UInt8 + Int8 -> NotImplemented (well, no overload I guess) * Int8 + UInt8 -> Int16
The elephant in the room is that for NumPy (or JAX, ...), "common DType" is not actually a binary operation (it is not associative when called on more DTypes). So in theory, you need trickier logic. [1]
Of course you should probably just cut corners here (at least for now). I see two ways to do that:
- Refuse typing of any rules that might end up non-associative
(This mainly is anything mixing unsigned and signed integers, in a sense included in what Stephan Hoyer suggested.)
- Limit common DType two DTypes (repeated DTypes are OK of course).
and both seem reasonable and will probably cover the majority of use cases.
- For math functions NumPy has [2]:
- A list of DType signatures that is supported
- An additional promotion for mixed type-signatures or as a
fallback (these also have a signature themselves)
Exporting the list of both should be OK (users could extend it!).
Further, typing what the promotion does should also be fine in practice.
Now, when a user has `uint8 + int8`, we have no type signature for it but we do find the "promoter" (based on an abstract type hierarchy if push comes to shove). And it finds:
common = CommonDType[UInt8, Int8]
And uses that for the result. The ugly part is, that in some cases we probably need to check, e.g. whether:
common + common -> common
exists. How much "logic" could/would such a "promoter" have access too? Is that even possible or am I just summoning more dragons?
Cutting corners? I think limiting this to only ever trying `CommonDType` and nothing else would cover the vast majority of cases. But this is the only "idea" I have with respect to not listing an infinite amount of combinations :(.
Cheers,
Sebastian
[1] JAX solves this by brute-forcing and caching, IIRC. In NumPy I resolve (most of) the weirdness by inspecting how the binary operation has non-commutative overloads. (The idea is that, `float + int -> float`, but `int + float -> undefined`, so "float" is, in a sense, more important.)
So it is very annoying/ugly logic, but it is spelled out...
[2] I probably have _some_ wiggle room to change this, but its complicated... and I am not sure there is much wiggle room. But, I am not smart enough to find an easier solution that is backwards and forwards compatible and extensible...
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
Thanks for reaching out, Sebastian!
My question is: Is there any chance you could call back into existing
Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
NumPy promotion is potentially more complex than the above common- DType
operation.
Oh, gosh, right, I'd forgotten about all the other types the NumPy can interact with. I guess this is mainly an argument against the 'brute force' solutions 1 and 2? (And I guess extra weight towards the point that duplicate logic for this would not be great?)
On Wed, 17 Nov 2021 at 22:24, Sebastian Berg seberg@berkeley.edu wrote:
Hey all,
A quick summary of the talk is below.
- How should we handle data type promotion in stubs?
- Option 1: One overload for each possible combination of dtypes - Option 2: One overload for each result dtype - Option 3: Don't handle type promotion - Option 4: Use a dtype class hierarchy and exploit Union type operator behaviour - Option 5: Propose a type promotion operator - Option 6: Propose a 'nearest common parent' operator - Option 7: Propose a type lookup table operator - Consensus during discussion was that since it looks like a new type operator *would* be required, we should probably hold off on dealing with this until the community shows a strong desire for this feature, and in the meantime just not handle data type promotion in stubs.
Sorry for missing the discussion, coming over from NumPy to here. Hopefully, my terminology matches yours pretty well and the below is helpful or at least interesting :).
One I would like to note is that, unlike Python types, DTypes are already typed. In the sense that a library doing "promotion" should have the logic for it explicitly spelled out somewhere. The library needs to find the right C loop and result dtype before it can do the actual operation, so this is a fairly distinct step [1]. That seems very unlike typical Python where you do not have a distinct "promotion" implementation.
My question is: Is there any chance you could call back into existing Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
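For NumPy specifically, that runtime promotion logic is already exposed through public functions, so it can at least be queried directly:

```python
import numpy as np

# promote_types answers the binary "common dtype" question;
# result_type is the more general operation-level resolver.
assert np.promote_types(np.uint8, np.int8) == np.dtype("int16")
assert np.result_type(np.uint8, np.int8) == np.dtype("int16")

# The actual operation agrees with the resolver:
assert (np.zeros(2, np.uint8) + np.zeros(2, np.int8)).dtype == np.dtype("int16")
```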
About "nearest common parent"
What I/we now use in NumPy is a `__common_dtype__` binary operator (only internal/available in C right now). There are a few things to note about it though:
- It is not always commutative/associative in NumPy (I do some extra stuff to hide that sometimes)
- "Common dtype" is not always equivalent to "promotion" in math functions. (See below.)
So, effectively NumPy has this. And it is based on a binary classmethod. This would probably solve the majority of annotations, but it would be nice to not have to implement the logic twice?
About "promotion" in functions
NumPy promotion is potentially more complex than the above common-DType operation. I could provide an API like:
np.add.resolve_dtypes(DType1, DType2) -> DTypeX
but the general rules are tricky, and there are complicated corners:
np.divide.resolve_dtypes(Timedelta, Number) -> Timedelta
np.divide.resolve_dtypes(Timedelta, Timedelta) -> Number
Maybe "Timedelta" is a terrible outlier... But the point is, that there is a big potential complexity. (I.e. also math functions that have mixed float and integer inputs and/or outputs.)
Right now, I can't guarantee that the above would not have mild side-effects (for NumPy).
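The Timedelta corner described above is observable at runtime today with plain arrays, no new API needed:

```python
import numpy as np

td = np.array([10, 20], dtype="timedelta64[s]")

# timedelta / number -> timedelta
assert (td / 2).dtype == np.dtype("timedelta64[s]")

# timedelta / timedelta -> plain number (float64)
assert (td / np.timedelta64(2, "s")).dtype == np.dtype("float64")
```

So the same function, `divide`, maps the same dtype to two different result kinds depending on the other argument, which is exactly what makes a simple pairwise "common dtype" operator insufficient here.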
About "value based" logic
In the discussion, NumPy's "value based" logic was mentioned: `1000` might be considered an `int16` or `uint16`, but not an `int8`.
The one good part about this: We would _really_ like to get rid of that in NumPy. Although that will still leave you with special promotion/common-dtype rules when a Python integer/float/complex is involved. (Just the value should not matter, except that a bad value could lead to an error, e.g. if the Python integer is too large.)
Cheers,
Sebastian
[1] OK, I may be extrapolating from NumPy. I suppose some libraries may not promote explicitly, but have `ndarray[Int8]` be an actual type. But probably such a library could auto-generate a table of overloads?
[2] I am intentionally not using `np.result_type`, because NumPy dtypes can be parametric, and `np.result_type("S3", "S4")` is "S4", but you are primarily interested in `String, String -> String`.
_______________________________________________
Typing-sig mailing list -- typing-sig@python.org
To unsubscribe send an email to typing-sig-leave@python.org
https://mail.python.org/mailman3/lists/typing-sig.python.org/
Member address: mrahtz@google.com
I feel like I might be missing some key piece of context - I'm still confused about this:
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:
float.__add__(int) -> float
float.__radd__(int) -> float
But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.
In what sense is it undefined? `1 + 1.0` works at runtime, and the type seems to be inferred fine in pytype and mypy. What am I missing?
remainder(float64, complex128)
Interesting. So it sounds like the takeaway is that a lot of the edge cases are to do with complex types?
On Fri, 19 Nov 2021 at 16:01, Sebastian Berg seberg@berkeley.edu wrote:
On Fri, 2021-11-19 at 10:59 +0000, Matthew Rahtz via Typing-sig wrote:
Thanks for this thorough explanation, Sebastian!
I fear that because of my own inexperience with the subtleties of type promotion, there were a few details I didn't understand:
Well, I don't know typing well. And could use a bit more formalism about how to describe all of this...
We have a "common DType". For NumPy, this should be based on simple
binary overload stubs: * UInt8 + Int8 -> NotImplemented (well, no overload I guess) * Int8 + UInt8 -> Int16
If I'm understanding correctly, 'common DType' is a type operator, and is represented by '+' in these two lines?
How come these aren't symmetric - how come UInt8 + Int8 isn't also Int16?
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:
float.__add__(int) -> float
float.__radd__(int) -> float
But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.
DETOUR: unfortunately (or fortunately), that information turns out to be somewhat useful to retain backwards compatibility in NumPy by resolving that:
common(float16, int16, uint16) -> float16
common(int16, uint16, float16) -> float16
even though:
common(float16, common(int16, uint16)) -> common(float16, int32) -> float32
but these are the gory, mind boggling inanities :(. That maybe you can get away with ignoring, at least for now. Basically, NumPy tried to be smart about it in the past, and I didn't want to break it...
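The grouping-dependence can be seen with NumPy's public pairwise `np.promote_types` (note this is the plain pairwise operator, without the backwards-compatibility special-casing described above; the exact dtypes below are what recent NumPy versions produce):

```python
import numpy as np

# Pairwise "common dtype" is not associative: the grouping changes the result.
left = np.promote_types(np.promote_types(np.float16, np.int16), np.uint16)
right = np.promote_types(np.float16, np.promote_types(np.int16, np.uint16))

assert np.promote_types(np.float16, np.int16) == np.dtype("float32")
assert np.promote_types(np.int16, np.uint16) == np.dtype("int32")
assert left == np.dtype("float32")   # float32 can already hold uint16 exactly
assert right == np.dtype("float64")  # float16 with int32 forces float64
assert left != right
```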
- For math functions NumPy has [2]:
  - A list of DType signatures that is supported
  - An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
So the "list of DType signatures" is essentially a list of return types given homogeneous argument types? Like this - literally just a list of signatures in the codebase somewhere, *not* auto-generated using logic?
Yes, although they are not required to be homogeneous. That is the case that covers the vast majority, though. E.g. SciPy has special math functions that have both float and int inputs, or real and complex.
subtract(uint8, uint8) → uint8
...
subtract(datetime, datetime) → timedelta
...
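Both kinds of entries in such a signature list can be checked against the runtime:

```python
import numpy as np

# Homogeneous entry: subtract(uint8, uint8) -> uint8.
assert np.subtract(np.uint8(3), np.uint8(1)).dtype == np.dtype("uint8")

# Non-homogeneous entry from the same list: datetime - datetime -> timedelta.
delta = np.subtract(np.datetime64("2021-11-18"), np.datetime64("2021-11-17"))
assert delta == np.timedelta64(1, "D")
assert delta.dtype == np.dtype("timedelta64[D]")
```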
And the "additional promotion for mixed type-signatures" is for cases where the argument types are heterogeneous - and these are *not* just a list of signatures in the code (because it would be massive), but some logic that determines on the fly what the result type should be?
Yes (where heterogeneous is not necessarily the only possibility).
And uses that for the result. The ugly part is, that in some cases we probably need to check, e.g. whether:

common + common -> common
Is '+' here an actual addition or the 'common DType' operator? Could you give an example of a case where this is true?
In that case I actually meant the original addition.
A (probably bad) example is that `remainder()` is not defined for complex types, but resolving:
remainder(float64, complex128)
using promotion might end up with:
remainder(complex128, complex128) -> complex128
Of course that is easy to solve, if you define:
# A union, or something more explicit?
Real = Union[int8, ..., float32, float64]

@overload
def remainder(DT1: Real, DT2: Real) -> CommonDType[DT1, DT2]
And we define up-front that `complex` can't work, by typing it as real only.
This alone should cover all but the dark or dusty corners!
At runtime, it seemed easier to just offer a blanket "try the promotion and see if it finds something valid", rather than attempting to limit it to known valid cases. Also, some promotions might clearly be correct, but lack an implementation...
Maybe the only useful part :). In terms of typing, I think if you wanted to cover even that with the same approach, you would get:
@overload
def remainder(DT1: Float32, DT2: Float32) -> Float32
@overload
...
@overload  # or @promotion?
def remainder(DT1: Any, DT2: Any) \
    -> remainder(CommonDType[DT1, DT2], CommonDType[DT1, DT2])
which is a re-entrant check, i.e. the "promotion". I don't know typing well enough whether promotion is something that fits in even remotely.
Cheers,
Sebastian
Cheers, Matthew
On Thu, 18 Nov 2021 at 17:46, Sebastian Berg seberg@berkeley.edu wrote:
On Thu, 2021-11-18 at 07:53 -0800, Guido van Rossum wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
Ah, fair enough. Maybe it is simply OK to duplicate most of this.
As Stephan Hoyer pointed out, you can obviously start by cutting corners and focus only on the clear-cut Numerical cases which should have no surprises.
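A minimal sketch of the build/install-time table extraction idea, using `np.promote_types` as the oracle (the `Tensor[...]` stub layout and the `add` name are made up for illustration):

```python
import numpy as np

# Dtypes to cover in the generated stub; extend as needed.
DTYPES = ["uint8", "int8", "int16", "int32", "float32", "float64"]

def generate_overloads() -> str:
    """Emit one @overload line per dtype pair, asking the runtime for results."""
    lines = []
    for a in DTYPES:
        for b in DTYPES:
            result = np.promote_types(a, b).name
            lines.append("@overload")
            lines.append(
                f"def add(x: Tensor[{a}], y: Tensor[{b}]) -> Tensor[{result}]: ..."
            )
    return "\n".join(lines)

stub = generate_overloads()
assert "def add(x: Tensor[uint8], y: Tensor[int8]) -> Tensor[int16]: ..." in stub
```

This sidesteps duplicating the promotion logic by hand, though it bakes one library's (and one version's) rules into the stubs, which is exactly the per-library divergence problem noted above.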
Sorry for the long mail, for risking repeating some things, and for not being deep into typing...
I will try to outline what from my perspective you would need to cover (almost?) everything in NumPy in a way that matches well enough with what we got. I expect the worst part here is the "promoter".
Note that this is a bit angled at a "complete" solution, which may just be utopian given that users can extend NumPy DTypes.
- We have a "common DType". For NumPy, this should be based on
simple binary overload stubs:
* UInt8 + Int8 -> NotImplemented (well, no overload I guess) * Int8 + UInt8 -> Int16
The elephant in the room is that for NumPy (or JAX, ...), "common DType" is not actually a binary operation (it is not associative when applied to more than two DTypes). So in theory, you need trickier logic. [1]
Of course you should probably just cut corners here (at least for now). I see two ways to do that:
- Refuse typing of any rules that might end up non-associative (This mainly is anything mixing unsigned and signed integers, in a sense included in what Stephan Hoyer suggested.)
- Limit common DType to two DTypes (repeated DTypes are OK of course).
and both seem reasonable and will probably cover the majority of use cases.
- For math functions NumPy has [2]:
  - A list of DType signatures that is supported
  - An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
Exporting the list of both should be OK (users could extend it!).
Further, typing what the promotion does should also be fine in practice.
Now, when a user has `uint8 + int8`, we have no type signature for it but we do find the "promoter" (based on an abstract type hierarchy if push comes to shove). And it finds:
common = CommonDType[UInt8, Int8]
And uses that for the result. The ugly part is, that in some cases we probably need to check, e.g. whether:
common + common -> common
exists. How much "logic" could/would such a "promoter" have access to? Is that even possible, or am I just summoning more dragons?
Cutting corners? I think limiting this to only ever trying `CommonDType` and nothing else would cover the vast majority of cases. But this is the only "idea" I have with respect to not listing an infinite amount of combinations :(.
Cheers,
Sebastian
[1] JAX solves this by brute-forcing and caching, IIRC. In NumPy I resolve (most of) the weirdness by inspecting how the binary operation has non-commutative overloads. (The idea is that, `float + int -> float`, but `int + float -> undefined`, so "float" is, in a sense, more important.)
So it is very annoying/ugly logic, but it is spelled out...
[2] I probably have _some_ wiggle room to change this, but it's complicated... and I am not sure there is much wiggle room. But, I am not smart enough to find an easier solution that is backwards and forwards compatible and extensible...
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
Thanks for reaching out, Sebastian!
My question is: Is there any chance you could call back into existing Python exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...

But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
The thing to know is that `x + y` first calls `x.__add__(y)` and if that returns a specific singleton, `NotImplemented`, it calls `y.__radd__(x)`. (There are some corner cases where `__radd__` is called first due to one class being a subclass of the other, and if they are both the same class `__radd__` may not be called at all.)
So there's a difference between `+` and `__add__`, and that's what's confusing people and complicating matters. But in the end it all works like your intuition tells you.
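A toy demonstration of the protocol (the classes here are made up, not dtypes):

```python
class A:
    def __add__(self, other):
        if isinstance(other, A):
            return "A.__add__"
        # Returning NotImplemented tells Python to try the reflected method.
        return NotImplemented

class B:
    def __radd__(self, other):
        return "B.__radd__"

# A.__add__(B()) returns NotImplemented, so Python falls back to B.__radd__:
assert A() + B() == "B.__radd__"
```

This is the sense in which `int.__add__(float)` is "undefined": the method exists but returns `NotImplemented`, and `1 + 1.0` only works because `float.__radd__` then handles it.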
On Tue, Nov 23, 2021 at 4:11 AM Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
I feel like I might be missing some key piece of context - I'm still confused about this:
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:
float.__add__(int) -> int float.__radd__(int) -> int
But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.
In what sense is it undefined? `1 + 1.0` works at runtime, and the type seems to be inferred fine in pytype and mypy. What am I missing?
remainder(float64, complex128)
Interesting. So it sounds like the takeaway is that a lot of the edge cases are to do with complex types?
On Fri, 19 Nov 2021 at 16:01, Sebastian Berg seberg@berkeley.edu wrote:
On Fri, 2021-11-19 at 10:59 +0000, Matthew Rahtz via Typing-sig wrote:
Thanks for this thorough explanation, Sebastian!
I fear that because of my own inexperience with the subtleties of type promotion, there were a few details I didn't understand:
Well, I don't know typing well. And could use a bit more formalism about how to describe all of this...
We have a "common DType". For NumPy, this should be based on simple
binary overload stubs: * UInt8 + Int8 -> NotImplemented (well, no overload I guess) * Int8 + UInt8 -> Int16
If I'm understanding correctly, 'common DType' is a type operator, and is represented by '+' in these two lines?
How come these aren't symmetric - how come Uint8 + Int8 isn't also Int16?
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:
float.__add__(int) -> int float.__radd__(int) -> int
But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.
DETOUR: unfortunately (or fortunately), that information turns out to be somewhat useful to retain backwards compatibility in NumPy by resolving that:
common(float16, int16, uint16) -> float16 common(int16, uint16, float16) -> float16
even though:
common(float16, common(int16, uint16)) -> common(float16, int32) -> float32
but these are the gory, mind boggling inanities :(. That maybe you can get away with ignoring, at least for now. Basically, NumPy tried to be smart about it in the past, and I didn't want to break it...
- For math functions NumPy has [2]:
- A list of DType signatures that is supported
- An additional promotion for mixed type-signatures or as a
fallback (these also have a signature themselves)
So the "list of DType signatures" is essentially a list of return types given homogeneous argument types? Like this - literally just a list of signatures in the codebase somewhere, *not* auto-generated using logic?
Yes, although they are not required to be homogeneous. That is the case that covers the vast majority, though. E.g. SciPy has special math functions that have both float and int inputs, or real and complex.
subtract(uint8, uint8) → uint8 ... subtract(datetime, datetime) → timedelta ...
And the "additional promotion for mixed type-signatures" is for cases where the argument types are heterogeneous - and these are *not* just a list of signatures in the code (because it would be massive), but some logic that determines on the fly what the result type should be?
Yes (where heterogeneous is not necessarily the only possibility).
And uses that for the result. The ugly part is, that in some cases we
probably need to check, e.g. whether: common + common -> common
Is '+' here an actual addition or the 'common DType' operator? Could you give an example of a case where this is true?
In that case I actually meant the original addition.
A (probably bad) example is that `remainder()` is not defined for complex types, but resolving:
remainder(float64, complex128)
using promotion might end up with:
remainder(complex128, complex128) -> complex128
Of course that is easy to solve, if you define:
# A union, or something more explicit? Real = Union[int8, ..., float32, float64] @overload def remainder(DT1 : Real, DT2 : Real) -> CommonDType[DT1, DT2]
And we define up-front that `complex` can't work, by typing it as real only.
This alone should cover all but the dark or dusty corners!
At runtime, it seemed easier to just offer a blanket "try the promotion and see if it finds something valid", rather than attempting to limit it to known valid cases. Also some promotions might clearly be correct, but lacks an implementation...
Maybe the only useful part :). In terms of typing, I think if you wanted to cover even that with the same approach, you would get:
@overload def remainder(DT1 : Float32, DT2 : Float32) -> Float32 @overload ... @overload # or @promotion? def remainder(DT1 : Any, DT2 : Any) \ -> remainder(CommonDType[DT1, DT2], CommonDType[DT1, DT2])
which is a re-entrant check, i.e. the "promotion". I don't know typing well enough whether promotion is something that fits in even remotely.
Cheers,
Sebastian
Cheers, Matthew
On Thu, 18 Nov 2021 at 17:46, Sebastian Berg seberg@berkeley.edu wrote:
On Thu, 2021-11-18 at 07:53 -0800, Guido van Rossum wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
Ah, fair enough. Maybe it is simply OK to duplicate most of this.
As Stephan Hoyer pointed out, you can obviously start by cutting corners and focus only on the clear-cut Numerical cases which should have no surprises.
Sorry for the long mail, and risking to repeat some things and a lack of being deep into typing...
I will try to outline what from my perspective you would need to cover (almost?) everything in NumPy in a way that matches well enough with what we got. I expect the worst part here is the "promoter".
Note that this is a bit angled at a "complete" solution, that may just be utopic for the fact that users could extend NumPy DTypes.
- We have a "common DType". For NumPy, this should be based on
simple binary overload stubs:
* UInt8 + Int8 -> NotImplemented (well, no overload I guess) * Int8 + UInt8 -> Int16
The elephant in the room is that for NumPy (or JAX, ...), "common DType" is not actually a binary operation (it is not associative when called on more DTypes). So in theory, you need trickier logic. [1]
Of course you should probably just cut corners here (at least for now). I see two ways to do that:
- Refuse typing of any rules that might end up non-associative (This mainly is anything mixing unsigned and signed integers, in a sense included in what Stephan Hoyer suggested.)
- Limit common DType two DTypes (repeated DTypes are OK of course).
and both seem reasonable and will probably cover the majority of use cases.
- For math functions NumPy has [2]:
- A list of DType signatures that is supported
- An additional promotion for mixed type-signatures or as a
fallback (these also have a signature themselves)
Exporting the list of both should be OK (users could extend it!).
Further, typing what the promotion does should also be fine in practice.
Now, when a user has `uint8 + int8`, we have no type signature for it but we do find the "promoter" (based on an abstract type hierarchy if push comes to shove). And it finds:
common = CommonDType[UInt8, Int8]
And uses that for the result. The ugly part is, that in some cases we probably need to check, e.g. whether:
common + common -> common
exists. How much "logic" could/would such a "promoter" have access too? Is that even possible or am I just summoning more dragons?
Cutting corners? I think limiting this to only ever trying `CommonDType` and nothing else would cover the vast majority of cases. But this is the only "idea" I have with respect to not listing an infinite amount of combinations :(.
Cheers,
Sebastian
[1] JAX solves this by brute-forcing and caching, IIRC. In NumPy I resolve (most of) the weirdness by inspecting how the binary operation has non-commutative overloads. (The idea is that, `float + int -> float`, but `int + float -> undefined`, so "float" is, in a sense, more important.)
So it is very annoying/ugly logic, but it is spelled out...
[2] I probably have _some_ wiggle room to change this, but its complicated... and I am not sure there is much wiggle room. But, I am not smart enough to find an easier solution that is backwards and forwards compatible and extensible...
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
Thanks for reaching out, Sebastian!
My question is: Is there any chance you could call back into existing > Python exposed functionality to implement these operations? > Yes, > that > could even have harmless "side effects" very occasionally. > And > it > would mean using the library to type the library...
> But to me it seems daunting to duplicate potentially very > complex > logic > that, at least for NumPy, is often available at runtime.
Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
NumPy promotion is potentially more complex than the above common- DType > operation.
Oh, gosh, right, I'd forgotten about all the other types the NumPy can interact with. I guess this is mainly an argument against the 'brute force' solutions 1 and 2? (And I guess extra weight towards the point that duplicate logic for this would not be great?)
On Wed, 17 Nov 2021 at 22:24, Sebastian Berg seberg@berkeley.edu wrote:
> Hey all,
>
> > A quick summary of the talk is below.
> >
> > - How should we handle data type promotion in stubs?
> >   - Option 1: One overload for each possible combination of dtypes
> >   - Option 2: One overload for each result dtype
> >   - Option 3: Don't handle type promotion
> >   - Option 4: Use a dtype class hierarchy and exploit Union type operator behaviour
> >   - Option 5: Propose a type promotion operator
> >   - Option 6: Propose a 'nearest common parent' operator
> >   - Option 7: Propose a type lookup table operator
> > - Consensus during discussion was that since it looks like a new type operator *would* be required, we should probably hold off on dealing with this until the community shows a strong desire for this feature, and in the meantime just not handle data type promotion in stubs.
>
> Sorry for missing the discussion, coming over from NumPy to here. Hopefully, my terminology matches yours pretty well and the below is helpful or at least interesting :).
>
> One thing I would like to note is that, unlike Python types, DTypes are already typed, in the sense that a library doing "promotion" should have the logic for it explicitly spelled out somewhere. The library needs to find the right C loop and result dtype before it can do the actual operation, so this is a fairly distinct step [1]. That seems very unlike typical Python, where you do not have a distinct "promotion" implementation.
>
> My question is: is there any chance you could call back into existing Python-exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
>
> But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
>
> About "nearest common parent"
> -----------------------------
>
> What I/we now use in NumPy is a `__common_dtype__` binary operator (only internal/available in C right now). There are a few things to note about it though:
>
> * It is not always commutative/associative in NumPy (I do some extra stuff to hide that sometimes)
> * "Common dtype" is not always equivalent to "promotion" in math functions. (See below.)
>
> So, effectively NumPy has this. And it is based on a binary classmethod. This would probably solve the majority of annotations, but it would be nice to not have to implement the logic twice?
>
> About "promotion" in functions
> ------------------------------
>
> NumPy promotion is potentially more complex than the above common-DType operation. Although I could provide an API like:
>
>     np.add.resolve_dtypes(DType1, DType2) -> DTypeX
>
> but the general rules are tricky, and there are complicated corners:
>
>     np.divide.resolve_dtypes(Timedelta, Number) -> Timedelta
>     np.divide.resolve_dtypes(Timedelta, Timedelta) -> Number
>
> Maybe "Timedelta" is a terrible outlier... But the point is that there is a big potential complexity. (I.e. also math functions that have mixed float and integer inputs and/or outputs.)
>
> Right now, I can't guarantee that the above would not have mild side effects (for NumPy).
>
> About "value based" logic
> -------------------------
>
> In the discussion, NumPy's "value based" logic was mentioned: `1000` might be considered an `int16` or `uint16` but not an `int8`.
>
> The one good part about this: we would _really_ like to get rid of that in NumPy. Although that will still leave you with special promotion/common-dtype rules when a Python integer/float/complex is involved. (Just the value should not matter, except that a bad value could lead to an error, e.g. if the Python integer is too large.)
>
> Cheers,
>
> Sebastian
>
> [1] OK, I may be extrapolating from NumPy. I suppose some libraries may not promote explicitly, but have `ndarray[Int8]` be an actual type. But probably such a library could auto-generate a table of overloads?
>
> [2] I am intentionally not using `np.result_type`, because NumPy dtypes can be parametric, and `np.result_type("S3", "S4")` is "S4", but you are primarily interested in `String, String -> String`.
>
> _______________________________________________
> Typing-sig mailing list -- typing-sig@python.org
> To unsubscribe send an email to typing-sig-leave@python.org
> https://mail.python.org/mailman3/lists/typing-sig.python.org/
> Member address: mrahtz@google.com
Oh, I see! Thanks Guido for clarifying.
Alright, I think I understand the rest of your explanation now, Sebastian. Thanks again for laying it all out - I'll keep this thread in mind for when we get round to implementing it :)
On Tue, 23 Nov 2021 at 17:02, Guido van Rossum guido@python.org wrote:
The thing to know is that `x + y` first calls `x.__add__(y)` and if that returns a specific singleton, `NotImplemented`, it calls `y.__radd__(x)`. (There are some corner cases where `__radd__` is called first due to one class being a subclass of the other, and if they are both the same class `__radd__` may not be called at all.)
So there's a difference between `+` and `__add__`, and that's what's confusing people and complicating matters. But in the end it all works like your intuition tells you.
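As a runnable sketch of the protocol Guido describes (the `Int8`/`UInt8`/`Int16` classes below are hypothetical stand-ins for dtype-carrying types, not any real library's API):

```python
class Int16:
    pass

class Int8:
    def __add__(self, other):
        if isinstance(other, Int8):
            return Int8()
        if isinstance(other, UInt8):
            return Int16()  # mixing signed and unsigned widens to int16
        return NotImplemented

    def __radd__(self, other):
        # Reached only after the left operand's __add__ returned NotImplemented.
        if isinstance(other, UInt8):
            return Int16()
        return NotImplemented

class UInt8:
    def __add__(self, other):
        if isinstance(other, UInt8):
            return UInt8()
        return NotImplemented  # UInt8 knows nothing about Int8

print(type(Int8() + UInt8()).__name__)   # Int16, via Int8.__add__
print(type(UInt8() + Int8()).__name__)   # Int16, via fallback to Int8.__radd__
```

Only `Int8` knows about the mixed case, yet both operand orders produce `Int16` at runtime, because `+` falls back to `__radd__` when `__add__` returns `NotImplemented`.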
On Tue, Nov 23, 2021 at 4:11 AM Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
I feel like I might be missing some key piece of context - I'm still confused about this:
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:
float.__add__(int) -> float
float.__radd__(int) -> float
But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.
In what sense is it undefined? `1 + 1.0` works at runtime, and the type seems to be inferred fine in pytype and mypy. What am I missing?
remainder(float64, complex128)
Interesting. So it sounds like the takeaway is that a lot of the edge cases are to do with complex types?
On Fri, 19 Nov 2021 at 16:01, Sebastian Berg seberg@berkeley.edu wrote:
On Fri, 2021-11-19 at 10:59 +0000, Matthew Rahtz via Typing-sig wrote:
Thanks for this thorough explanation, Sebastian!
I fear that because of my own inexperience with the subtleties of type promotion, there were a few details I didn't understand:
Well, I don't know typing well. And could use a bit more formalism about how to describe all of this...
We have a "common DType". For NumPy, this should be based on simple binary overload stubs:

* UInt8 + Int8 -> NotImplemented (well, no overload I guess)
* Int8 + UInt8 -> Int16
If I'm understanding correctly, 'common DType' is a type operator, and is represented by '+' in these two lines?
How come these aren't symmetric - how come Uint8 + Int8 isn't also Int16?
Sorry, I was conflating things. The asymmetry that I mean is really the same as Python `float + int`, where you have:
float.__add__(int) -> float
float.__radd__(int) -> float
But `int.__add__(float)` is undefined. In this case `Int8` is the one that knows about `UInt8`, just like float knows about int and not the other way around.
DETOUR: unfortunately (or fortunately), that information turns out to be somewhat useful to retain backwards compatibility in NumPy by resolving that:
common(float16, int16, uint16) -> float16
common(int16, uint16, float16) -> float16
even though:
common(float16, common(int16, uint16)) -> common(float16, int32) -> float32
but these are the gory, mind-boggling insanities :(. That maybe you can get away with ignoring, at least for now. Basically, NumPy tried to be smart about it in the past, and I didn't want to break it...
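To see why a binary operator struggles to reproduce the n-ary behaviour here, reduce a pairwise table left to right (the table below just encodes the examples from this thread; treat it as illustrative, not NumPy's real tables):

```python
from functools import reduce

# Pairwise "common dtype" results for the dtypes in the example above.
PAIRWISE = {
    frozenset({"int16", "uint16"}): "int32",    # no 16-bit type holds both
    frozenset({"float16", "int16"}): "float32",
    frozenset({"float16", "uint16"}): "float32",
    frozenset({"float16", "int32"}): "float32",
    frozenset({"float32", "uint16"}): "float32",
    frozenset({"float32", "int16"}): "float32",
}

def common(a, b):
    return a if a == b else PAIRWISE[frozenset({a, b})]

# Any pairwise reduction of (float16, int16, uint16) yields float32 ...
print(reduce(common, ["float16", "int16", "uint16"]))  # float32
print(reduce(common, ["int16", "uint16", "float16"]))  # float32
# ... but NumPy's n-ary resolution special-cases this combination to
# float16, so no binary __common_dtype__ alone can reproduce it.
```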
- For math functions NumPy has [2]:
  - A list of DType signatures that are supported
  - An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
So the "list of DType signatures" is essentially a list of return types given homogeneous argument types? Like this - literally just a list of signatures in the codebase somewhere, *not* auto-generated using logic?
Yes, although they are not required to be homogeneous. That is the case that covers the vast majority, though. E.g. SciPy has special math functions that have both float and int inputs, or real and complex.
subtract(uint8, uint8) → uint8
...
subtract(datetime, datetime) → timedelta
...
And the "additional promotion for mixed type-signatures" is for cases where the argument types are heterogeneous - and these are *not* just a list of signatures in the code (because it would be massive), but some logic that determines on the fly what the result type should be?
Yes (where heterogeneous is not necessarily the only possibility).
And uses that for the result. The ugly part is that in some cases we probably need to check, e.g. whether:

common + common -> common
Is '+' here an actual addition or the 'common DType' operator? Could you give an example of a case where this is true?
In that case I actually meant the original addition.
A (probably bad) example is that `remainder()` is not defined for complex types, but resolving:
remainder(float64, complex128)
using promotion might end up with:
remainder(complex128, complex128) -> complex128
Of course that is easy to solve, if you define:
# A union, or something more explicit?
Real = Union[int8, ..., float32, float64]

@overload
def remainder(DT1: Real, DT2: Real) -> CommonDType[DT1, DT2]
And we define up-front that `complex` can't work, by typing it as real only.
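A minimal sketch of this "type it as real-only" idea with typing that exists today (the dtype classes are toys, and since there is no `CommonDType` operator, a constrained `TypeVar` only covers the homogeneous case):

```python
from typing import TypeVar

class Float32: ...
class Float64: ...
class Complex128: ...

# Complex128 is deliberately absent from the constraints, so a type
# checker rejects remainder() on complex dtypes up front.
R = TypeVar("R", Float32, Float64)

def remainder(x: R, y: R) -> R:
    # Toy runtime behaviour: the result has the same dtype class as the inputs.
    return type(x)()

r = remainder(Float64(), Float64())      # inferred as Float64
# remainder(Complex128(), Complex128())  # rejected by a type checker
```

Mixed-dtype calls like `remainder(Float32(), Float64())` are exactly what this can't express without something like `CommonDType`.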
This alone should cover all but the dark or dusty corners!
At runtime, it seemed easier to just offer a blanket "try the promotion and see if it finds something valid", rather than attempting to limit it to known valid cases. Also, some promotions might clearly be correct, but lack an implementation...
Maybe the only useful part :). In terms of typing, I think if you wanted to cover even that with the same approach, you would get:
@overload
def remainder(DT1: Float32, DT2: Float32) -> Float32
@overload
...
@overload  # or @promotion?
def remainder(DT1: Any, DT2: Any) \
    -> remainder(CommonDType[DT1, DT2], CommonDType[DT1, DT2])
which is a re-entrant check, i.e. the "promotion". I don't know typing well enough to say whether promotion is something that fits in even remotely.
Cheers,
Sebastian
Cheers, Matthew
On Thu, 18 Nov 2021 at 17:46, Sebastian Berg seberg@berkeley.edu wrote:
On Thu, 2021-11-18 at 07:53 -0800, Guido van Rossum wrote:
Calling out to Python during checking is not an option. But maybe we could extract a table with all the possibilities from the runtime at build / install time?
It’s disturbing that the rules are different for different libraries though. That will make this much harder.
Ah, fair enough. Maybe it is simply OK to duplicate most of this.
As Stephan Hoyer pointed out, you can obviously start by cutting corners and focusing only on the clear-cut numerical cases, which should have no surprises.
Sorry for the long mail, at the risk of repeating some things and of not being deep into typing...
I will try to outline what from my perspective you would need to cover (almost?) everything in NumPy in a way that matches well enough with what we got. I expect the worst part here is the "promoter".
Note that this is a bit angled at a "complete" solution, that may just be utopic for the fact that users could extend NumPy DTypes.
- We have a "common DType". For NumPy, this should be based on simple binary overload stubs:

  * UInt8 + Int8 -> NotImplemented (well, no overload I guess)
  * Int8 + UInt8 -> Int16
The elephant in the room is that for NumPy (or JAX, ...), "common DType" is not actually a binary operation (it is not associative when called on more than two DTypes). So in theory, you need trickier logic. [1]
Of course you should probably just cut corners here (at least for now). I see two ways to do that:
- Refuse typing of any rules that might end up non-associative (this is mainly anything mixing unsigned and signed integers; in a sense included in what Stephan Hoyer suggested.)
- Limit common DType to two DTypes (repeated DTypes are OK of course).
and both seem reasonable and will probably cover the majority of use cases.
- For math functions NumPy has [2]:
  - A list of DType signatures that are supported
  - An additional promotion for mixed type-signatures or as a fallback (these also have a signature themselves)
Exporting the list of both should be OK (users could extend it!).
Further, typing what the promotion does should also be fine in practice.
Now, when a user has `uint8 + int8`, we have no type signature for it but we do find the "promoter" (based on an abstract type hierarchy if push comes to shove). And it finds:
common = CommonDType[UInt8, Int8]
And uses that for the result. The ugly part is that in some cases we probably need to check, e.g. whether:
common + common -> common
exists. How much "logic" could/would such a "promoter" have access to? Is that even possible or am I just summoning more dragons?
Cutting corners? I think limiting this to only ever trying `CommonDType` and nothing else would cover the vast majority of cases. But this is the only "idea" I have with respect to not listing an infinite amount of combinations :(.
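One way to picture that promoter at runtime is an exact-signature table plus a common-DType fallback that re-checks the table (all names and table entries below are illustrative toys, not NumPy internals):

```python
# Exact loop signatures a toy "add" supports.
ADD_SIGNATURES = {
    ("int8", "int8"): "int8",
    ("uint8", "uint8"): "uint8",
    ("int16", "int16"): "int16",
}

# Pairwise common-DType results (order-insensitive here for simplicity).
COMMON = {
    frozenset({"uint8", "int8"}): "int16",
}

def resolve_add(dt1, dt2):
    """Exact signature first; otherwise promote to the common DType and
    verify that a (common, common) -> common loop actually exists."""
    if (dt1, dt2) in ADD_SIGNATURES:
        return ADD_SIGNATURES[(dt1, dt2)]
    common = COMMON.get(frozenset({dt1, dt2}))
    if common is not None and (common, common) in ADD_SIGNATURES:
        return ADD_SIGNATURES[(common, common)]
    raise TypeError(f"no loop for add({dt1}, {dt2})")

print(resolve_add("uint8", "int8"))  # int16: promoted, then the int16 loop is found
```

The `common + common -> common` existence check Sebastian mentions is the `(common, common) in ADD_SIGNATURES` line; dropping it would silently "resolve" signatures that no loop can actually compute.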
Cheers,
Sebastian
[1] JAX solves this by brute-forcing and caching, IIRC. In NumPy I resolve (most of) the weirdness by inspecting how the binary operation has non-commutative overloads. (The idea is that, `float + int -> float`, but `int + float -> undefined`, so "float" is, in a sense, more important.)
So it is very annoying/ugly logic, but it is spelled out...
[2] I probably have _some_ wiggle room to change this, but it's complicated... and I am not sure there is much wiggle room. But I am not smart enough to find an easier solution that is backwards- and forwards-compatible and extensible...
On Thu, Nov 18, 2021 at 02:36 Matthew Rahtz via Typing-sig < typing-sig@python.org> wrote:
> Thanks for reaching out, Sebastian!
>
> > My question is: is there any chance you could call back into existing Python-exposed functionality to implement these operations? Yes, that could even have harmless "side effects" very occasionally. And it would mean using the library to type the library...
> >
> > But to me it seems daunting to duplicate potentially very complex logic that, at least for NumPy, is often available at runtime.
>
> Hmm, that's a good point. I agree having to duplicate the logic is not ideal.
>
> Enabling the type checker to call out to existing Python code is definitely an option we should consider, but I suspect there be dragons. Pradeep - as someone who actually knows how type checkers work, how viable do you think it would be?
>
> > NumPy promotion is potentially more complex than the above common-DType operation.
>
> Oh, gosh, right, I'd forgotten about all the other types that NumPy can interact with. I guess this is mainly an argument against the 'brute force' solutions 1 and 2? (And I guess extra weight towards the point that duplicate logic for this would not be great?)
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/