Type bounds on PEP 646
Hi,

I have been thinking about the implications of not supporting bounds in PEP 646. Although I did not give it much importance the last time I mentioned it, after thinking about it more and hearing about more use cases, I wonder whether leaving bounds for another PEP would limit the value of this PEP too much.

The main motivation of the PEP is to use variadics to model the shape of a tensor, either by specifying the exact size of each dimension or by specifying its name. However, if a library like NumPy or PyTorch wanted to take advantage of this feature, it would need to restrict the bound of the variadic. For example, if it typed exact dimensions, it would want to accept Tensor[L[20], L[40]] but not Tensor[L[20], str, ClassFoo] (L = Literal). Similarly, if it preferred the naming route (Tensor[Width, Height]), it would not want arbitrary types in there.

Therefore, I was wondering whether leaving this for a future PEP is the best approach. The initial reason was to keep the PEP simple, but I think the additional complexity would come from supporting variance (which can be left out); bounds would not introduce any new behavior, they would just replicate the logic of bounds on TypeVar.

Any thoughts?

Alfonso.
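To make the gap concrete, here is a minimal runnable sketch of a Tensor class using PEP 646 exactly as specified, i.e. with no bound. The Tensor class is a stand-in, not a real NumPy/PyTorch class, and TypeVarTuple requires Python 3.11 (or typing_extensions on older versions):

```python
from typing import Generic, Literal

try:
    from typing import TypeVarTuple, Unpack  # Python 3.11+
except ImportError:
    from typing_extensions import TypeVarTuple, Unpack

# PEP 646 as specified: TypeVarTuple accepts no bound= parameter.
Ts = TypeVarTuple("Ts")

class Tensor(Generic[Unpack[Ts]]):
    """Stand-in tensor class whose shape is encoded in the type."""

# The intended usage...
GoodTensor = Tensor[Literal[20], Literal[40]]

# ...but nothing constrains the element types, so this is equally legal
# to a type checker, even though it is meaningless as a shape:
BadTensor = Tensor[str, bytes]

assert GoodTensor.__origin__ is Tensor
assert BadTensor.__origin__ is Tensor
```

A hypothetical bound on Ts is exactly what would let a checker reject the second alias while accepting the first.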
Could you elaborate with more examples?

On Thu, Jul 15, 2021 at 9:52 AM Alfonso L. Castaño <alfonsoluis.castanom@um.es> wrote:
[snip]
_______________________________________________
Typing-sig mailing list -- typing-sig@python.org
To unsubscribe send an email to typing-sig-leave@python.org
https://mail.python.org/mailman3/lists/typing-sig.python.org/
Member address: guido@python.org
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...
Sure.

The main motivation of the PEP is to allow numerical libraries to use a variadic to represent the shape of a tensor. Depending on the library (NumPy, TensorFlow, PyTorch) the tensor class will be slightly different, but overall the types of the variadic will represent something about each dimension.

    Ts = TypeVarTuple('Ts', bound=...)

    class Tensor(Generic[*Ts]): ...

The main idea that we have in mind is that they will specify the size of the dimensions, which would be documented in the class signature. In that case the bound would be int, so that they can write things like:

    # L = Literal
    x: Tensor[L[20], L[40]]

If the types of the variadic are supposed to represent the size of the dimensions, it would not make sense to allow users to annotate a tensor with a variadic that does not represent numbers. For example, it would not make sense to write:

    x: Tensor[RandomClass, None, L['foo']] = ...

Or in a function signature:

    def f(x: Tensor[RandomClass, None, L['foo']]): ...

The same issue arises if they decide to go for the approach of using the variadic to describe what a dimension represents. In that case a reasonable approach would be the following:

    Dimension = NewType('Dimension', int)
    Height = NewType('Height', Dimension)
    Width = NewType('Width', Dimension)
    Channels = NewType('Channels', Dimension)

    Ts = TypeVarTuple("Ts", bound=Dimension)

    class Tensor(Generic[*Ts]): ...

This way it would limit what can represent a dimension, so that one can write:

    x: Tensor[Width, Height] = ...

but not something different that does not make sense.

Overall, these libraries will define the Tensor class with an idea in mind of what the types of the variadic represent, and it would be very strange if users were allowed to annotate tensors with types that don't represent anything that makes sense according to the class definition.

It is also relevant that this problem does not arise with the old approach of having a different class for each number of dimensions:

    class Tensor1d(Generic[A]): ...
    class Tensor2d(Generic[A, B]): ...

Finally, it might be worth keeping in mind that some of the new features that have been proposed, like type arithmetic or broadcasting, might need to rely on bound.

Best,
Alfonso.
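For contrast, the fixed-rank approach mentioned above already supports per-dimension bounds today, because each dimension is an ordinary TypeVar. A minimal runnable sketch (class names follow the example; nothing here is a real library API):

```python
from typing import Generic, TypeVar

# With one class per rank, each dimension is an ordinary TypeVar,
# so bound=int works with no change to the type system.
A = TypeVar("A", bound=int)
B = TypeVar("B", bound=int)

class Tensor1d(Generic[A]): ...
class Tensor2d(Generic[A, B]): ...

# A type checker would accept Tensor2d[Literal[20], Literal[40]]
# and reject Tensor2d[str, bytes], since str is not a subtype of int.
assert A.__bound__ is int
assert Tensor2d[int, int].__origin__ is Tensor2d
```

The cost, of course, is one class per rank, which is exactly what PEP 646 was designed to avoid.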
Ah, so this is about allowing the 'bound=...' parameter for TypeVarTuple and how it should constrain the allowable types in each dimension.

I guess there might be two ways to go about it -- the way you showed, or an alternative that makes clear that a TypeVarTuple stands for a *tuple* of type variables: TypeVarTuple("Ts", bound=Tuple[Dimension, ...]). The tuple form would simplify a transition to just using TypeVar instead of TypeVarTuple. It is technically redundant, although it probably clarifies things for the user -- but if your proposal for bound=Dimension is accepted, the two spellings are no longer equivalent.

On Thu, Jul 15, 2021 at 3:30 PM Alfonso L. Castaño <alfonsoluis.castanom@um.es> wrote:
[snip]
Ah, so this is about allowing the 'bound=...' parameter for TypeVarTuple and how it should constrain the allowable types in each dimension.
Yes indeed, sorry for not being clear enough.

I see your point about specifying that it is bound to a tuple; however, I see a few drawbacks, some of which you already mentioned:

- It is already called TypeVarTuple, so writing a tuple again in the bound is somewhat redundant.
- Anything that is not a tuple passed as the bound would have to raise an error, so unlike with TypeVar you can't write just any type, only tuples. So why not just write the type of the tuple's elements?

However, in my opinion the main issue is the following. With the approach of bound=int it is clear that the variadic is a list, of any length, of type variables bounded by int. If we use a tuple, it would suggest to users that they can use tuples of a fixed length, which goes against the idea of a variadic, for example:

    TypeVarTuple("Ts", bound=Tuple[int, str])

In that case we would need to raise an error for anything that is not of the form Tuple[Foo, ...].

Therefore, I would suggest not using a tuple, since it is more redundant and increases the number of ways a user might declare an invalid TypeVarTuple.
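The distinction being drawn here -- a homogeneous Tuple[int, ...] versus a fixed-length Tuple[int, str] -- is visible even in how typing represents the two forms at runtime. A small runnable illustration (the bound= spellings in the comments are the hypothetical ones under discussion, not real parameters):

```python
from typing import Tuple

# Hypothetical spellings under discussion (neither exists in PEP 646):
#   Ts = TypeVarTuple("Ts", bound=int)              # bound on each element
#   Ts = TypeVarTuple("Ts", bound=Tuple[int, ...])  # bound as a tuple type
#
# The tuple spelling admits fixed-length forms that a variadic cannot
# honor, so a checker would have to reject those specially:
homogeneous = Tuple[int, ...]   # "any number of ints" -- variadic-friendly
fixed = Tuple[int, str]         # exactly two elements -- not variadic

# typing marks the homogeneous form with a trailing Ellipsis:
assert homogeneous.__args__ == (int, Ellipsis)
assert fixed.__args__ == (int, str)
```

With bound=int there is only one shape of bound to validate; with the tuple spelling, every bound except the Tuple[Foo, ...] shape would be an error.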
Thanks for bringing this up, Alfonso!
There are two reasons I think we should hold off on adding bound just yet:

1. In the short term, I'm not sure we necessarily want libraries using bound=int on their shape TypeVarTuples; leaving the bound off lets users choose whether they want semantic annotations (example A below) or size annotations (example B below).

A:

    class Width: pass
    class Height: pass

    Tensor[Height, Width]

B:

    Tensor[L[480], L[640]]

2. In the longer term, I feel like there could be subtleties in how we want bound to behave that we'll only discover after we've used TypeVarTuple in the field a bit more. In particular, having bound=int constrain all of the types seems potentially too inflexible. Maybe in the future we'll find cases where we want to, say, constrain the first type but leave the rest unconstrained. If so, that will probably also involve decisions about a more flexible syntax for describing partially-unknown shapes, which is going to be complicated and involve a lot of discussion, so I definitely want to leave it for the future.

Overall, I expect the whole business of using generics to annotate shapes is still going to be (and should be) in an experimental phase for a while even if the PEP is accepted, and I think it's fine to let users shoot themselves in the foot while that's the case, to avoid locking ourselves into things we might regret later.
On Fri, 16 Jul 2021 at 09:15, Alfonso L. Castaño wrote:
[snip]
Hi Matthew,

Regarding 1: It makes total sense, and I think it reinforces the idea of supporting bounds. I was not only putting bound=int as an example, but also bound=Dimension or anything that the library authors (or users) might consider appropriate -- even bound=Union[int, Dimension]! This should be a decision of the library, and we should offer library authors a mechanism for specifying what they want to allow users to put in the shape types.

A:

    class Dim: pass
    class Width(Dim): pass
    class Height(Dim): pass

    Ts = TypeVarTuple("Ts", bound=Dim)

    Tensor[Height, Width]

B:

    Ts = TypeVarTuple("Ts", bound=int)

    Tensor[L[480], L[640]]

Regarding 2: I wonder if you could give more concrete examples of what other behaviors bound=... could have apart from what we have discussed, since this feature is rather simple and cannot have many incompatible alternatives. As for the example of limiting only the first type, the user would combine a type variable with a variadic.

I agree with the importance of giving users the tools to explore how they want to use types to model tensor shapes, but the problem is that without bounds it will be so unrestrictive that most libraries/users will not be able to write something consistent and will not really have a chance to experiment.
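Since bound= on TypeVarTuple is hypothetical, one way to see the intended semantics of example A is to mimic the per-element check a type checker would perform. A small runnable sketch (check_bound is an illustrative helper, not a typing API):

```python
class Dim: pass
class Width(Dim): pass
class Height(Dim): pass

def check_bound(args, bound):
    # Mimic what a checker would do for a hypothetical
    # TypeVarTuple("Ts", bound=Dim): every element of the
    # variadic must be a subtype of the bound.
    return all(isinstance(a, type) and issubclass(a, bound) for a in args)

# Tensor[Width, Height] would be accepted...
assert check_bound((Width, Height), Dim) is True
# ...while Tensor[Width, str] would be rejected, since str is not a Dim.
assert check_bound((Width, str), Dim) is False
```

This is exactly the logic TypeVar bounds already apply to a single type variable, replicated across each element of the variadic.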
On 1: OK, I agree, you're right. On 2:
As of the example of limiting the first type the user would combine a type variable + variadic.
Oh, hmm, that's a good point.
I wonder if you could give more concrete examples about what other behaviors could bound=... have apart from what we have commented
The specific thing I was thinking of is: are there going to be any situations where we'd use a TypeVarTuple but in practice always pass in, say, exactly 4 types? In cases like that one might want to set, say, bound=Tuple[int, str, str, float]. Admittedly, now that I say this out loud it seems like a perverse use case -- why *wouldn't* you just use normal TypeVars in that case? -- but maybe there could be situations where it would somehow be more convenient to use a TypeVarTuple...? I still have a vague worry about use cases we're not anticipating, but having thought about this a bit more, the strength of my conviction is significantly weakened.
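The fixed-arity case described here is already expressible with ordinary TypeVars, which is why the use case looks perverse. A minimal sketch (Record and the TypeVar names are illustrative):

```python
from typing import Generic, TypeVar

# Exactly four parameters, each individually bounded -- no variadic needed.
# This is what a hypothetical bound=Tuple[int, str, str, float] on a
# TypeVarTuple would pin down.
T1 = TypeVar("T1", bound=int)
T2 = TypeVar("T2", bound=str)
T3 = TypeVar("T3", bound=str)
T4 = TypeVar("T4", bound=float)

class Record(Generic[T1, T2, T3, T4]): ...

alias = Record[int, str, str, float]
assert alias.__origin__ is Record
```

A fixed-length tuple bound would therefore add nothing over plain TypeVars, unless the variadic is also needed for something else.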
I agree with the importance of giving users the tools to explore how they want to use types to model tensor shapes, but the problem is that without bounds it will be so unrestrictive that most libraries/users will not be able to write something consistent and will not really have a chance to experiment.
What kinds of concrete problems are you imagining?
On Fri, 16 Jul 2021 at 10:38, Alfonso L. Castaño wrote:
[snip]
On Mon, Jul 19, 2021 at 12:16 AM Matthew Rahtz via Typing-sig <typing-sig@python.org> wrote:
[Alfonso]
I wonder if you could give more concrete examples about what other behaviors could bound=... have apart from what we have commented
The specific thing I was thinking of is: are there going to be any situations where we'd use a TypeVarTuple but in practice always pass in, say, exactly 4 types? In cases like that one might want to set, say, bound=Tuple[int, str, str, float]. Admittedly, now that I say this out loud it seems like a perverse use case -- why *wouldn't* you just use normal TypeVars in that case? -- but maybe there could be situations where it would somehow be more convenient to use a TypeVarTuple...?
A use case might involve constructing other types using the * operator. That doesn't work with TypeVar, only with TypeVarTuple.
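Guido's point -- that only a TypeVarTuple can be unpacked into another type -- can be sketched as follows. The prepend function is illustrative; TypeVarTuple/Unpack need Python 3.11 or typing_extensions:

```python
try:
    from typing import TypeVarTuple, Unpack  # Python 3.11+
except ImportError:
    from typing_extensions import TypeVarTuple, Unpack

Ts = TypeVarTuple("Ts")

# The return type reuses the whole variadic inside another type
# (tuple[int, *Ts]) -- something a single TypeVar cannot express,
# which is why one might reach for TypeVarTuple even at fixed arity.
def prepend(head: int, *rest: Unpack[Ts]) -> "tuple[int, Unpack[Ts]]":
    return (head, *rest)

assert prepend(1, "a", 2.0) == (1, "a", 2.0)
```

A checker infers Ts = (str, float) at the call site and gives the result the type tuple[int, str, float].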
Maybe Matthew can construct an example? If Matthew doesn't think there is one, it's a moot point.

FWIW, I just talked to a member of the steering council. They sent a reminder that the PEP needs more work on the proposed syntax changes (Matthew will have seen their exact words) before they can review it further. I expect that if we revise the PEP beyond that at this point, there will be many more months of delay in the review.

On Mon, Jul 19, 2021 at 12:49 PM Alfonso L. Castaño <alfonsoluis.castanom@um.es> wrote:
Could you write an example Guido? I am not sure that I understand what you mean.
I agree with Guido that we don't want to block the PEP for a non-critical feature. Bounds start to be useful only in the type arithmetic PEP; that is when we would care about performing `int` operations on the dimensions. Given that we are only talking about obscure edge cases like `Tensor[int, str, bool]`, I think it's completely fair to defer this to the type arithmetic PEP.

We'd discussed adding a bound earlier, but there was the question of whether we would want `bound=Tuple[int, ...]`, and possibly `bound=Union[<3D tuple>, <4D tuple>]` in certain cases. There's also the bikeshedding around `bound=int` vs `element_bound=int`, since technically the bound is a Tuple. Overall, I think we need more real-world use cases to decide one way or another. Otherwise, the PEP is pretty indifferent to Tensor operations in particular.

--
S Pradeep Kumar
Sorry for forgetting to reply to this.
In the end, given that revisions would likely imply further delay, I agree
with Pradeep - let's leave this for a future PEP.
On Mon, 19 Jul 2021 at 22:23, S Pradeep Kumar wrote:
[snip]
participants (4)
- Alfonso L. Castaño
- Guido van Rossum
- Matthew Rahtz
- S Pradeep Kumar