Let's Fix Class Annotations -- And Maybe Annotations Generally
Howdy howdy. While working on my PEP I stumbled over a lot of behavior by annotations that I found inconsistent and inconvenient. I think there are several problems here that need fixing. This discussion will probably evolve into a PEP, and I'll be happy to steer that process. But I'm less certain about what the right thing to do is. (Although I do know what I'd prefer!) So let's talk about it!

Annotations are represented in Python as a dictionary. They can be present on functions, classes, and modules as an attribute called "__annotations__".

We start with: how do you get the annotations from one of these objects? Surely it's as easy as this line from Lib/inspect.py shows us:

    return func.__annotations__

And yes, that's best practice for getting an annotation from a function object. But consider this line from Lib/functools.py:

    ann = getattr(cls, '__annotations__', {})

Huh. Why doesn't it simply look at cls.__annotations__? It's because the language declares that __annotations__ on a class or module is optional. Since cls.__annotations__ may not be defined, evaluating that might throw an exception. Three-argument getattr() is much safer, and I assert it's best practice for getting the annotations from a module.

But consider this line from Lib/dataclasses.py:

    cls_annotations = cls.__dict__.get('__annotations__', {})

And a very similar line from Lib/typing.py:

    ann = base.__dict__.get('__annotations__', {})

Huh! Why is this code skipping the attribute entirely, and examining cls.__dict__? It's because the getattr() approach has a subtle bug when dealing with classes. Consider this example:

    class A:
        ax: int = 3

    class B(A):
        pass

    print(getattr(B, '__annotations__', {}))

That's right, B *inherits* A.__annotations__! So this prints {'ax': int}. This *can't* be the intended behavior of __annotations__ on classes. It's only supposed to contain annotations for the fields of B itself, not those of one of its randomly-selected base classes. But that's how it behaves today--and people have had to work around this behavior for years. Examining the class dict is, sadly, best practice for getting __annotations__ from a class.

So, already: three different objects can have __annotations__, and there are three different best practices for getting their __annotations__.

Let's zoom out for a moment. Here's the list of predefined data fields you can find on classes:

    __annotations__
    __bases__
    __class__
    __dict__
    __doc__
    __module__
    __mro__
    __name__
    __qualname__

All of these describe metadata about the class. In every case *except one*, the field is mandatory, which also means it's never inherited. And in every case *except one*, you cannot delete the field. (Though you *are* allowed to overwrite some of them.) You guessed it: __annotations__ is the exception. It's optional, and you're allowed to delete it. And these exceptions are causing problems. It seems to me that, if the only way to correctly use a language-defined attribute of classes is by rooting around in its __dict__, the design is a misfire.

(Much of the above applies to modules, too. The big difference: since modules lack inheritance, you don't need to look in their __dict__.)

Now consider what happens if my "delayed evaluation of annotations using descriptors" PEP is accepted. If that happens, pulling __annotations__ out of the class dict won't work if they haven't been generated yet. So today's "best practice" becomes tomorrow's "this code doesn't work".
To correctly examine class annotations, code would have to do something like this, which should work correctly in any Python 3.x version:

    if (getattr(cls, '__co_annotations__', None)
            or ('__annotations__' in cls.__dict__)):
        ann = cls.__annotations__
    else:
        ann = {}

This is getting ridiculous.

Let's move on to a related topic. For each of the objects that can have annotations, what happens if o.__annotations__ is set, and you "del o.__annotations__", then you access o.__annotations__? It depends on what the object is, because each of them behaves differently.

You already know what happens with classes: if any of the base classes has __annotations__ set, you'll get the first one you find in the MRO. If none of the bases have __annotations__ set, you'll get an AttributeError.

For a module, if you delete it then try to access it, you'll always get an AttributeError.

For a function, if you delete it then try to get it, the function will create a new empty dict, store it as its new annotations dict, and return that. Why does it do that? I'm not sure. The relevant PEP (3107) doesn't specify this behavior.

So, annotations can be set on three different object types, and each of those three has a different behavior when you delete the annotations then try to get them again.

As a final topic: what are the permitted types for __annotations__? If you say "o.__annotations__ = <x>", what types are and aren't allowed for <x>?

For functions, __annotations__ may be assigned to either None or a dict (an object that passes PyDict_Check). Anything else throws a TypeError. For classes and modules, no checking is done whatsoever, and you can set __annotations__ on those two to any Python object. While "a foolish consistency is the hobgoblin of little minds", I don't see the benefit of setting a module's __annotations__ to 2+3j.

I think it's long past time that we cleaned up the behavior of annotations. They should be simple and consistent across all objects that support them.

At the very least, I think we should make cls.__annotations__ required rather than optional, so that it's never inherited. What should its default value be? An empty dict would be more compatible, but None would be cheaper. Note that creating the empty dict on the fly, the way function objects do, wouldn't really help--because current best practice means looking in cls.__dict__.

I also think you shouldn't be able to delete __annotations__ on any of the three objects (function, class, module). It should always be set, so that the best practice for accessing annotations on an object is always o.__annotations__.

If I could wave my magic wand and do whatever I wanted, I'd change the semantics for __annotations__ to the following:

* Functions, classes, and modules always have an __annotations__ member set.
* "del o.__annotations__" always throws a TypeError.
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
* You may set __annotations__, but you can only set it to either None or a dict (passes PyDict_Check).
* You may only access __annotations__ as an attribute, and because it's always set, best practice is to use "o.__annotations__" (though getattr will always work too).
* How __annotations__ is stored is implementation-specific behavior; looking in the relevant __dict__ is unsupported.

This would grant sanity and consistency to __annotations__ in a way it's never so far enjoyed. The problem is, it's a breaking change.
But the existing semantics are kind of terrible, so at this point my goal is to break them. I think the best practice needs to stop requiring examining cls.__dict__; in fact I'd prefer people stop doing it altogether.

If we change the behavior as part of a new release of Python, code that examines annotations on classes can do a version check:

    if (sys.version_info.major >= 3
            and sys.version_info.minor >= 10):
        def get_annotations(o):
            return o.__annotations__ or {}
    else:
        def get_annotations(o):
            # eight or ten lines of complex code goes here
            ...

Or code can just use typing.get_type_hints(), which is tied to the Python version anyway and should always do the right thing.

//arry/
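For concreteness, here is a sketch of what such a version-portable helper might look like, with the "eight or ten lines of complex code" filled in using the per-type best practices described above. It's an illustration under today's semantics, not code from the PEP:

```python
import sys
import types

def get_annotations(o):
    """Sketch only: return o's own annotations, or an empty dict."""
    if sys.version_info >= (3, 10):
        # Under the proposed semantics, __annotations__ is always set;
        # it may be None (or empty), so normalize that to {}.
        return o.__annotations__ or {}
    if isinstance(o, type):
        # Classes: look only in the class's own __dict__, otherwise a base
        # class's annotations can leak through via inheritance.
        return o.__dict__.get('__annotations__', {})
    if isinstance(o, types.ModuleType):
        # Modules: the attribute is optional, so use three-argument getattr().
        return getattr(o, '__annotations__', {})
    # Functions: the attribute always exists (lazily re-created if deleted).
    return o.__annotations__
```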
My code pretty much does what you suggest at the end of your message: On Mon, 2021-01-11 at 09:22 -0800, Larry Hastings wrote:
Or code can just use typing.get_type_hints(), which is tied to the Python version anyway and should always do the right thing.
So far, this has proven mostly[1] sufficient for my needs in a runtime type validation and encoding/decoding library.

[1] A pain point for me is the runtime cost of evaluating 3.10 style type hints, as they're (re-)evaluated for every call to get_type_hints. I've worked around this for now with my own function, affix_type_hints, which evaluates get_type_hints once and replaces __annotations__ with the evaluated value. It also addresses a scoping problem where a type hint may reference a value that's not globally scoped for the object being annotated; the hint can be evaluated and affixed within that scope.

Paul
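For readers wondering what such a helper might look like: the name affix_type_hints is Paul's, but the body below is only a guess at the general approach, not his actual code. The localns parameter of typing.get_type_hints() is what makes the "evaluate within that scope" part possible.

```python
import typing

def affix_type_hints(obj=None, *, localns=None):
    """Hypothetical sketch: evaluate an object's type hints once and store
    the result back on __annotations__, so later introspection is cheap."""
    if obj is None:
        # Support usage as @affix_type_hints(localns=...) with arguments.
        return lambda o: affix_type_hints(o, localns=localns)
    obj.__annotations__ = typing.get_type_hints(obj, localns=localns)
    return obj

@affix_type_hints
def greet(name: "str") -> "str":
    return f"hello {name}"

print(greet.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}
```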
On Mon, Jan 11, 2021 at 9:23 AM Larry Hastings <larry@hastings.org> wrote:
[SNIP - background info]
If I could wave my magic wand and do whatever I wanted, I'd change the semantics for __annotations__ to the following:
* Functions, classes, and modules always have an __annotations__ member set.
* "del o.__annotations__" always throws a TypeError.
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
* You may set __annotations__, but you can only set it to either None or a dict (passes PyDict_Check).
* You may only access __annotations__ as an attribute, and because it's always set, best practice is to use "o.__annotations__" (though getattr will always work too).
* How __annotations__ is stored is implementation-specific behavior; looking in the relevant __dict__ is unsupported.
This would grant sanity and consistency to __annotations__ in a way it's never so far enjoyed. The problem is, it's a breaking change. But the existing semantics are kind of terrible, so at this point my goal is to break them. I think the best practice needs to stop requiring examining cls.__dict__; in fact I'd prefer people stop doing it altogether.
So the biggest potential breakages are code that:

1. Directly gets the attribute from __dict__
2. The fact that it would no longer be inherited

Am I missing anything else?

For issue #1, it seems that typing.get_type_hints(), as you point out below, resolves that. As well, code could be updated appropriately without much effort to check different places if the attribute was not found in __dict__.

For issue #2, if the default was `None`, then couldn't that be used as an implicit feature marker that you can't (incorrectly) rely on inheritance to percolate up the annotations of the superclass if the subclass happens to not define any annotations?

This all seems reasonable to me. Since it's a change to the object model it will probably need a PEP, but I would suspect it would mostly revolve around guiding people on how to update their code to work across Python versions.

-Brett
If we change the behavior as part of a new release of Python, code that examines annotations on classes can do a version check:
if (sys.version_info.major >= 3
        and sys.version_info.minor >= 10):
    def get_annotations(o):
        return o.__annotations__ or {}
else:
    def get_annotations(o):
        # eight or ten lines of complex code goes here
        ...
Or code can just use typing.get_type_hints(), which is tied to the Python version anyway and should always do the right thing.
*/arry*
At last, a nibble on the other fishing line! ;-) On 1/11/21 1:47 PM, Brett Cannon wrote:
So the biggest potential breakages are code that:
1. Directly gets the attribute from __dict__
2. The fact that it would no longer be inherited
Am I missing anything else?
Those are the big ones, the ones I expect people to actually experience. I can name three more breakages, though these get progressively more obscure:

* Nobody expects o.__annotations__ to ever be None (unless they assigned None to it themselves). If the attribute is set, they expect its value to be a dict.
* "del o.__annotations__" currently works on modules and classes if the attribute is set. "del fn.__annotations__" always works.
* On modules and classes you can set o.__annotations__ to any Python value. (Functions already only permit you to set it to None or a dict.)

I have no idea if anybody is depending on these behaviors. The lesson that years of Python core dev has taught me is: if Python exhibits a behavior, somebody out there depends on it, and you'll break their code if you change it. Or, expressed more succinctly, any change is a breaking change for somebody.

So the question is, is the improvement this brings worth the breakage it also brings? In this case, I sure hope so!
For issue #2, if the default was `None`, then couldn't that be used as an implicit feature marker that you can't (incorrectly) rely on inheritance to percolate up the annotations of the superclass if the subclass happens to not define any annotations?
Currently Python never sets o.__annotations__ to None on any object. So yes, assuming the user doesn't set it to None themselves, this would be new behavior.

If I understand your question correctly, yes, users could write new code that says

    if o.__annotations__ is None:
        # great, we're in Python 3.10+ and no annotation was set on o!
        ...
    else:
        ...

Or they could just look at sys.version_info ;-)

Thanks for your feedback,

//arry/
I think your analysis of the problems is great. I don't worry about people deleting `__annotations__` (and those that do can switch to calling its .clear() method instead) but I worry about people not expecting to get None. So if you can avoid that in your solution that would be great.

The easiest thing would be just to create an empty `__annotations__` for classes that have no annotated variables, and to hell with the cost. People who delete it are already living dangerously.

(I noticed that `__slots__` is missing from your list. Maybe because it follows yet another pattern?)

On Mon, Jan 11, 2021 at 4:07 PM Larry Hastings <larry@hastings.org> wrote:
At last, a nibble on the other fishing line! ;-)
On 1/11/21 1:47 PM, Brett Cannon wrote:
So the biggest potential breakages are code that:
1. Directly gets the attribute from __dict__
2. The fact that it would no longer be inherited
Am I missing anything else?
Those are the big ones, the ones I expect people to actually experience. I can name three more breakages, though these get progressively more obscure:
- Nobody expects o.__annotations__ to ever be None (unless they assigned None to it themselves). If the attribute is set, they expect its value to be a dict.
- "del o.__annotations__" currently works on modules and classes if the attribute is set. "del fn.__annotations__" always works.
- On modules and classes you can set o.__annotations__ to any Python value. (Functions already only permit you to set it to None or a dict.)
I have no idea if anybody is depending on these behaviors. The lesson that years of Python core dev has taught me is: if Python exhibits a behavior, somebody out there depends on it, and you'll break their code if you change it. Or, expressed more succinctly, any change is a breaking change for somebody. So the question is, is the improvement this brings worth the breakage it also brings? In this case, I sure hope so!
For issue #2, if the default was `None`, then couldn't that be used as an implicit feature marker that you can't (incorrectly) rely on inheritance to percolate up the annotations of the superclass if the subclass happens to not define any annotations?
Currently Python never sets o.__annotations__ to None on any object. So yes, assuming the user doesn't set it to None themselves, this would be new behavior. If I understand your question correctly, yes, users could write new code that says
if o.__annotations__ is None:
    # great, we're in Python 3.10+ and no annotation was set on o!
    ...
else:
    ...
Or they could just look at sys.version_info ;-)
Thanks for your feedback,
*/arry*
--
--Guido van Rossum (python.org/~guido)
On 1/11/21 4:39 PM, Guido van Rossum wrote:
The easiest thing would be just to create an empty `__annotations__` for classes that have no annotated variables, and to hell with the cost.
I assume you'd keep the existing behavior where functions lazy-create an empty dict if they have no annotations too? That all would work fine and be consistent, but you'd probably have to set the empty __annotations__ dict on modules too. I've noticed that code that examines annotations tends to handle two classes of objects: "functions" and "not-functions". Modules also store their __annotations__ in their __dict__, so the same code path works fine for examining the annotations of both classes and modules.
(I noticed that `__slots__` is missing from your list. Maybe because it follows yet another pattern?)
I forgot about __slots__! Yup, it's optional, and you can even delete it, though after the class is defined I'm not sure how much difference that makes. Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now. Cheers, //arry/
On Mon, Jan 11, 2021 at 5:21 PM Larry Hastings <larry@hastings.org> wrote:
On 1/11/21 4:39 PM, Guido van Rossum wrote:
The easiest thing would be just to create an empty `__annotations__` for classes that have no annotated variables, and to hell with the cost.
I assume you'd keep the existing behavior where functions lazy-create an empty dict if they have no annotations too?
Indeed -- that's trying to provide a uniform interface.
That all would work fine and be consistent, but you'd probably have to set the empty __annotations__ dict on modules too. I've noticed that code that examines annotations tends to handle two classes of objects: "functions" and "not-functions". Modules also store their __annotations__ in their __dict__, so the same code path works fine for examining the annotations of both classes and modules.
I'm not against giving all modules an empty `__annotations__` by default.
(I noticed that `__slots__` is missing from your list. Maybe because it follows yet another pattern?)
I forgot about __slots__! Yup, it's optional, and you can even delete it, though after the class is defined I'm not sure how much difference that makes.
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Anyway, `__slots__` doesn't behave that way -- seems it behaves similar to `__annotations__`.

```
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class A: __slots__ = ['a']
...
>>> class B(A): __slots__ = ['b']
...
>>> B.__slots__
['b']
>>> class X(A): pass
...
>>> X.__slots__
['a']
>>> A().__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__dict__'
>>> X().__dict__
{}
```
--
--Guido van Rossum (python.org/~guido)
On 1/11/21 5:28 PM, Guido van Rossum wrote:
On Mon, Jan 11, 2021 at 5:21 PM Larry Hastings <larry@hastings.org <mailto:larry@hastings.org>> wrote:
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Anyway, `__slots__` doesn't behave that way -- seems it behaves similar to `__annotations__`.
__slots__ itself doesn't behave that way, but subclasses do inherit the slots defined on their parent:

    class C:
        __slots__ = ['a']

    class D(C):
        __slots__ = ['b']

    d = D()
    d.a = 5
    d.b = "foo"
    print(f"{d.a=} {d.b=}")

prints

    d.a=5 d.b='foo'

That's the inheritance behavior I was referring to.

Cheers,

//arry/
Oh, but the behavior of annotations in e.g. mypy is the same. They are cumulative. On Mon, Jan 11, 2021 at 17:42 Larry Hastings <larry@hastings.org> wrote:
On 1/11/21 5:28 PM, Guido van Rossum wrote:
On Mon, Jan 11, 2021 at 5:21 PM Larry Hastings <larry@hastings.org> wrote:
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Anyway, `__slots__` doesn't behave that way -- seems it behaves similar to `__annotations__`.
__slots__ itself doesn't behave that way, but subclasses do inherit the slots defined on their parent:
class C: __slots__ = ['a']
class D(C): __slots__ = ['b']
d = D() d.a = 5 d.b = "foo" print(f"{d.a=} {d.b=}")
prints
d.a=5 d.b='foo'
That's the inheritance behavior I was referring to.
Cheers,
*/arry*
-- --Guido (mobile)
On 12/01/21 2:21 pm, Larry Hastings wrote:
Slots intelligently support inheritance, too.
Are you sure about that? My experiments suggest that it has the same problem as __annotations__:

Python 3.8.2 (default, Mar 23 2020, 11:36:18)
[Clang 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class C:
...     __slots__ = ['a', 'b']
...
>>> class D(C):
...     __slots__ = ['c', 'd']
...
>>> class E(D):
...     pass
...
>>> C.__slots__
['a', 'b']
>>> D.__slots__
['c', 'd']
>>> E.__slots__
['c', 'd']
-- Greg
On 1/11/21 6:31 PM, Greg Ewing wrote:
On 12/01/21 2:21 pm, Larry Hastings wrote:
Slots intelligently support inheritance, too.
Are you sure about that? My experiments suggest that it has the same problem as __annotations__:
Python 3.8.2 (default, Mar 23 2020, 11:36:18) [Clang 8.1.0 (clang-802.0.42)] on darwin Type "help", "copyright", "credits" or "license" for more information.
>>> class C:
...     __slots__ = ['a', 'b']
...
>>> class D(C):
...     __slots__ = ['c', 'd']
...
>>> class E(D):
...     pass
...
>>> C.__slots__
['a', 'b']
>>> D.__slots__
['c', 'd']
>>> E.__slots__
['c', 'd']
Guido said the same thing. I did say "Slots", not "__slots__", though. You'll find that your class D supports attributes "a", "b", "c", and "d", and that's the inheritance I was referring to. Cheers, //arry/
12.01.21 03:21, Larry Hastings wrote:
I forgot about __slots__! Yup, it's optional, and you can even delete it, though after the class is defined I'm not sure how much difference that makes.
It affects pickling if __slotnames__ is not set yet. The latter is set when you pickle or copy an instance the first time.
On 12/01/21 6:22 am, Larry Hastings wrote:

* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.

If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.

-- Greg
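A rough sketch of the kind of helper Greg describes, under today's (pre-3.10) semantics; it's essentially the pre-3.10 branch of the version-check helper sketched earlier, wrapped as a single entry point that also manufactures the empty dict:

```python
def annotations(o):
    """Hypothetical helper: return o's own annotations as a dict, never None
    and never inherited from a base class; manufacture {} if there are none."""
    if isinstance(o, type):
        # Classes: only the class's own __dict__, to avoid inheriting a
        # base class's annotations.
        return o.__dict__.get('__annotations__') or {}
    # Functions and modules: the attribute may be missing (modules) or
    # lazily created (functions); either way, fall back to an empty dict.
    return getattr(o, '__annotations__', None) or {}
```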
Isn't that just typing.get_type_hints()? On Mon, Jan 11, 2021 at 5:11 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 12/01/21 6:22 am, Larry Hastings wrote:
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.
If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.
-- Greg
--
--Guido van Rossum (python.org/~guido)
On 1/11/21 5:05 PM, Greg Ewing wrote:
On 12/01/21 6:22 am, Larry Hastings wrote:
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:

    if o.__annotations__:
If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.
I guess I'm marginally against this, just because it seems like a needless change. We don't need the flexibility of a function with optional parameters and such, and with a data descriptor we can already put code behind __annotations__ (as I have already done). Plus, the function should probably cache its result--you wouldn't want code that called it ten times to generate ten fresh dicts, would you?--and already we're most of the way to what I proposed in PEP 649. Cheers, //arry/
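To make "put code behind __annotations__ with a data descriptor" concrete, here is a minimal sketch; the names are hypothetical and it uses a side table rather than anything like the real PEP 649 machinery, but it shows a descriptor lazily creating a per-class dict, never inheriting one, and enforcing the dict-or-None rule:

```python
import weakref

class LazyAnnotations:
    """Hypothetical data descriptor: every class sees its *own* annotations
    dict -- the one from its class body if any, otherwise one created
    lazily -- and never a dict inherited from a base class."""

    def __init__(self):
        self._store = weakref.WeakKeyDictionary()

    def __get__(self, cls, metacls=None):
        if cls is None:
            return self
        if cls in self._store:                   # explicitly set earlier
            return self._store[cls]
        if '__annotations__' in cls.__dict__:    # defined by the class body
            return cls.__dict__['__annotations__']
        return self._store.setdefault(cls, {})  # lazily create an empty dict

    def __set__(self, cls, value):
        if not (value is None or isinstance(value, dict)):
            raise TypeError("__annotations__ must be set to a dict or None")
        self._store[cls] = value

class Meta(type):
    __annotations__ = LazyAnnotations()

class C(metaclass=Meta):
    x: int = 0

class D(C):
    pass

print(C.__annotations__)  # {'x': <class 'int'>}
print(D.__annotations__)  # {} -- lazily created, not inherited from C
```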
On Tue, Jan 12, 2021 at 12:56 PM Larry Hastings <larry@hastings.org> wrote:
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:
if o.__annotations__:
Does it have to be mutable? If not, maybe there could be a singleton "immutable empty dict-like object", in the same way that an empty tuple can be put anywhere that expects a sequence. That'd be as cheap as None (modulo a once-per-interpreter cost for the additional static object). ChrisA
On 1/11/21 6:09 PM, Chris Angelico wrote:
On Tue, Jan 12, 2021 at 12:56 PM Larry Hastings <larry@hastings.org> wrote:
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:
if o.__annotations__:
Does it have to be mutable? If not, maybe there could be a singleton "immutable empty dict-like object", in the same way that an empty tuple can be put anywhere that expects a sequence. That'd be as cheap as None (modulo a once-per-interpreter cost for the additional static object).
Historically, annotations dicts are mutable. I don't know how often people mutate them, but I would assume it's uncommon. So technically this would be a breaking change. But it does seem low-risk. Cheers, //arry/
On 12/01/21 2:46 pm, Larry Hastings wrote:
Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against.
If __annotations__ were to return a read-only mapping object instead of a dict, the same empty object could be used for all annotationless objects. -- Greg
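For reference, the standard library already has a suitable read-only mapping type: types.MappingProxyType. A sketch of how one shared empty object could serve every annotation-less object (the helper name is illustrative, and the inheritance wrinkle discussed earlier is set aside here):

```python
import types

# One shared, read-only "no annotations" object per interpreter,
# instead of a fresh empty dict per class/module.
EMPTY_ANNOTATIONS = types.MappingProxyType({})

def annotations_or_empty(o):
    # Hypothetical accessor: always returns a mapping, never None.
    return getattr(o, '__annotations__', None) or EMPTY_ANNOTATIONS

class C:
    pass

print(dict(annotations_or_empty(C)))  # {}
# EMPTY_ANNOTATIONS['x'] = int        # a mappingproxy is read-only: TypeError
```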
On Mon, Jan 11, 2021 at 5:57 PM Larry Hastings <larry@hastings.org> wrote:
On 1/11/21 5:05 PM, Greg Ewing wrote:
On 12/01/21 6:22 am, Larry Hastings wrote:
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:
if o.__annotations__:
But if things could fall through in the default case, such that the result can be used directly in a `for` loop, that is nice, as Guido pointed out.

The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much. I bet you save more memory running with -OO than what this will cost users in memory.

And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import, the version check just means you then need to *potentially* worry about the semantic shift (until the change becomes permanent). It seems the changes are all still easy enough to have fallthrough and semantic checks that it won't be much of a problem. I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.
I guess I'm marginally against this, just because it seems like a needless change. We don't need the flexibility of a function with optional parameters and such, and with a data descriptor we can already put code behind __annotations__ (as I have already done). Plus, the function should probably cache its result--you wouldn't want code that called it ten times to generate ten fresh dicts, would you?--and already we're most of the way to what I proposed in PEP 649.
I also don't think introspection on annotations is common enough to warrant a built-in function; this stuff is meant for tools, not for the average developer to be dynamically playing with. As Guido pointed out, typing.get_type_hints() already covers this.
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that. If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import
Would it?

My original proposal would make breaking changes to how you examine __annotations__. Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?

It occurs to me that you could take kls.__module__, pull out the module from sys.modules, then look inside to see if it contains the correct "future" object imported from the __future__ module. Is that an approach we would suggest to our users? (A rough sketch of that check appears below.)

Also, very little code ever examines annotations; most code with annotations merely defines them. So I suspect most annotations users wouldn't care either way--which also means a "from __future__ import" that changes the semantics of examining or modifying annotations isn't going to see a lot of uptake, because it doesn't really affect them. The change in semantics only affects people whose code examines annotations, which I suspect is very few.

So I wasn't really joking when I proposed making these changes without a from __future__ import, and suggested users use a version check. The library code would know based on the Python version number which semantics were active, no peeking in modules to find future objects. They could literally write what I suggested:

    if you know you're running python 3.10 or higher:
        examine using the new semantics
    else:
        examine using the old semantics

I realize that's a pretty aggressive approach, which is why I prefaced it with "if I could wave my magic wand". But if we're going to make breaking changes, then whatever we do, it's going to break some people's code until it gets updated to cope with the new semantics. In that light this approach seemed reasonable.

But really this is why I started this thread in the first place. My idea of what's reasonable is probably all out of whack. So I wanted to start the conversation, to get feedback on how much breakage is allowable and how best to mitigate it. If it wasn't a controversial change, then we wouldn't need to talk about it!

And finally: if we really do set a default of an empty dict on classes and modules, then my other in-theory breaking changes:

* you can't delete __annotations__
* you can only set __annotations__ to a dict or None (this is already true of functions, but not of classes or modules)

will, I expect, in practice break exactly zero code. Who deletes __annotations__? Who ever sets __annotations__ to something besides a dict? So if the practical breakage is zero, why bother gating it with "from __future__ import" at all?
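For what it's worth, the per-module check described (and dismissed) above would look roughly like this sketch; it relies on a from __future__ import binding a _Feature object in the importing module's namespace. The feature name "annotations" is used purely as an illustration, since it's an existing __future__ feature with the same mechanics:

```python
import sys
import __future__

def module_uses_new_semantics(cls):
    # Find the module the class was defined in, then check whether it
    # performed the (here: illustrative) future import.
    module = sys.modules.get(cls.__module__)
    feature = getattr(module, 'annotations', None)
    return feature is __future__.annotations
```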
I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
What I find frustrating about that answer--and part of what motivated me to work on this in the first place--is that typing.get_type_hints() requires your annotations to be type hints. All type hints are annotations, but not all annotations are type hints, and it's entirely plausible for users to have reasonable uses for non-type-hint annotations that typing.get_type_hints() wouldn't like.

The two things typing.get_type_hints() does, that I know of, that can impede such non-type-hint annotations are:

* It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
* It regards all string annotations as "forward references", which means they get eval()'d and the result returned as the annotation. typing.get_type_hints() doesn't catch any exceptions here, so if the eval fails, typing.get_type_hints() fails and you can't use it to examine your annotations.

PEP 484 "explicitly does NOT prevent other uses of annotations". But if you force everyone to use typing.get_type_hints() to examine their annotations, then you have de facto prevented any use of annotations that isn't compatible with type hints.

Cheers,

//arry/
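A small demonstration of the two behaviors listed above (the exact exception raised for the failing eval may vary between Python versions):

```python
import typing

def returns_nothing() -> None: ...

print(returns_nothing.__annotations__)         # {'return': None}
print(typing.get_type_hints(returns_nothing))  # {'return': <class 'NoneType'>}

def uses_a_label(step: "not a type, just a note"): ...

# The string annotation is treated as a forward reference and eval()'d,
# so get_type_hints() raises instead of returning the annotation.
try:
    typing.get_type_hints(uses_a_label)
except Exception as exc:
    print(type(exc).__name__)  # SyntaxError (on current versions)
```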
On Tue, Jan 12, 2021 at 6:35 PM Larry Hastings <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
I would like that very much. And the exception for functions is especially helpful.
And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import
Would it?
My original proposal would make breaking changes to how you examine __annotations__. Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?
It occurs to me that you could take kls.__module__, pull out the module from sys.modules, then look inside to see if it contains the correct "future" object imported from the __future__ module. Is that an approach we would suggest to our users?
You're kidding, right?

Also, very little code ever examines annotations; most code with annotations merely defines them. So I suspect most annotations users wouldn't care either way--which also means a "from __future__ import" that changes the semantics of examining or modifying annotations isn't going to see a lot of uptake, because it doesn't really affect them. The change in semantics only affects people whose code examines annotations, which I suspect is very few.
I agree, but they're pretty vocal -- the breakage in get_type_hints() due to the scope issue in 3.10 (which isn't even in beta) has drawn plenty of complaints. Also, dataclasses (which I have to assume is fairly popular :-) introspects `__annotations__`, and even mutates and sets it.

So I wasn't really joking when I proposed making these changes without a from __future__ import, and suggested users use a version check. The library code would know based on the Python version number which semantics were active, no peeking in modules to find future objects. They could literally write what I suggested:
    if you know you're running python 3.10 or higher:
        examine using the new semantics
    else:
        examine using the old semantics
I realize that's a pretty aggressive approach, which is why I prefaced it with "if I could wave my magic wand". But if we're going to make breaking changes, then whatever we do, it's going to break some people's code until it gets updated to cope with the new semantics. In that light this approach seemed reasonable.
Is there a way that such code could be written without a version check? E.g. for modules we could recommend `getattr(m, "__annotations__", None) or {}`, and that would work in earlier versions too. I'm not sure what would work for classes, since most code will want to combine the annotations for all classes in the MRO, and the way to do that would change -- before 3.10, you *must* use `cls.__dict__.get("__annotations__")` whereas for 3.10+ you *must* use `cls.__annotations__`.

Note, for a moment I thought that for modules we don't need to evaluate annotations lazily (I know that's your other PEP/thread, but still, it seems related). But we do, because there's an idiom where people write

```
from __future__ import annotations
import typing

if typing.TYPE_CHECKING:
    from somewhere import Class

a: Class
```

Here introspecting the annotations would fail, but clearly the intention was to use them purely for static type checking, so the user presumably doesn't care. (But does that mean that if a single annotation cannot be evaluated, the entire annotations dict becomes inaccessible? That's a general weakness of the PEP 649 scheme, right?)

But really this is why I started this thread in the first place. My idea of what's reasonable is probably all out of whack. So I wanted to start the conversation, to get feedback on how much breakage is allowable and how best to mitigate it. If it wasn't a controversial change, then we wouldn't need to talk about it!
And finally: if we really do set a default of an empty dict on classes and modules, then my other in-theory breaking changes:
- you can't delete __annotations__
- you can only set __annotations__ to a dict or None (this is already true of functions, but not of classes or modules)

will, I expect, in practice break exactly zero code. Who deletes __annotations__? Who ever sets __annotations__ to something besides a dict? So if the practical breakage is zero, why bother gating it with "from __future__ import" at all?
Maybe for the benefit of users who rely on some specific library that gets the annotations out of a class dict. The library could document "don't use that annotations future import, because then your annotations won't work", which would give that library a few releases' time to come up with an alternative strategy.
I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
What I find frustrating about that answer--and part of what motivated me to work on this in the first place--is that typing.get_type_hints() requires your annotations to be type hints. All type hints are annotations, but not all annotations are type hints, and it's entirely plausible for users to have reasonable uses for non-type-hint annotations that typing.get_type_hints() wouldn't like.
The two things typing.get_type_hints() does, that I know of, that can impede such non-type-hint annotations are:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
- It regards all string annotations as "forward references", which means they get eval()'d and the result returned as the annotation. typing.get_type_hints() doesn't catch any exceptions here, so if the eval fails, typing.get_type_hints() fails and you can't use it to examine your annotations.
PEP 484 "explicitly does NOT prevent other uses of annotations". But if you force everyone to use typing.get_type_hints() to examine their annotations, then you have de facto prevented any use of annotations that isn't compatible with type hints.
I suspect that the most common use of annotation introspection is to implement some kind of runtime type checking scheme (there are many of those, I think even JSON schema verifiers based on typing.TypedDict) and those users would presumably be fine with get_type_hints(). Note that PEP 593 introduces a way to attach arbitrary extra data to an annotation, e.g.

```
UnsignedShort = Annotated[int, struct2.ctype('H')]
name: Annotated[str, struct2.ctype("<10s")]
```

--
--Guido van Rossum (python.org/~guido)
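As an illustration of how a runtime tool might consume such metadata: typing.get_type_hints(..., include_extras=True) (available since 3.9) preserves the Annotated wrapper, and typing.get_args() splits it into the base type and the attached data. The ctype class below is only a stand-in for the struct2.ctype in Guido's example, which isn't a real module:

```python
from typing import Annotated, get_args, get_type_hints

class ctype:
    # Stand-in for struct2.ctype: just records a struct format string.
    def __init__(self, fmt):
        self.fmt = fmt

UnsignedShort = Annotated[int, ctype('H')]

class Record:
    count: UnsignedShort
    name: Annotated[str, ctype("<10s")]

for field, hint in get_type_hints(Record, include_extras=True).items():
    base, *extras = get_args(hint)
    print(field, base, [e.fmt for e in extras])
# count <class 'int'> ['H']
# name <class 'str'> ['<10s']
```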
As long as I'm gravedigging old conversations...! Remember this one, also from January of this year? Here's a link to the thread in the c.l.p-d Mailman archive. The first message in the thread is a good overview of the problem: https://mail.python.org/archives/list/python-dev@python.org/thread/AWKVI3NRC... Here's kind of where we left it: On 1/12/21 7:48 PM, Guido van Rossum wrote:
On Tue, Jan 12, 2021 at 6:35 PM Larry Hastings <larry@hastings.org <mailto:larry@hastings.org>> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
I would like that very much. And the exception for functions is especially helpful.
First of all, I've proposed a function that should also help a lot:

https://bugs.python.org/issue43817

The function will be called inspect.get_annotations(o). It's like typing.get_type_hints(o) except less opinionated. This function would become the best practice for everybody who wants annotations**, like so:

    import inspect

    if hasattr(inspect, "get_annotations"):
        how_i_get_annotations = inspect.get_annotations
    else:
        # do whatever it was I did in Python 3.9 and before...
        ...

** Everybody who specifically wants /type hints/ should instead call typing.get_type_hints(), and good news!, /that/ function has existed for several versions now. So they probably already /do/ call it.

I'd still like to add a default empty __annotations__ dict to all classes and modules for Python 3.10, for everybody who doesn't switch to using this as-yet-unwritten inspect.get_annotations() function. The other changes I propose in that thread (e.g. deleting __annotations__ always throws TypeError) would be nice, but honestly they aren't high priority. They can wait until after Python 3.10. Just these two things (inspect.get_annotations() and always populating __annotations__ for classes and modules) would go a long way to cleaning up how people examine annotations.

Long-term, hopefully we can fold the desirable behaviors of inspect.get_annotations() into the language itself, at which point we could probably deprecate the function. That wouldn't be until a long time from now of course.

Does this need a lot of discussion, or can I just go ahead with the bpo and PR and such? I mean, I'd JFDI, as Barry always encourages, but given how much debate we've had over annotations in the last two weeks, I figured I should first bring it up here.

Happy two-weeks'-notice,

//arry/

p.s. I completely forgot about this until just now--sorry. At least I remembered before Python 3.10b1!
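For illustration, here's how the proposed function behaves on 3.10+, where it exists: inspect.get_annotations() returns only the object's own annotations, so nothing is inherited.

```python
import inspect

class A:
    ax: int = 3

class B(A):
    pass

print(inspect.get_annotations(A))  # {'ax': <class 'int'>}
print(inspect.get_annotations(B))  # {} -- B's own annotations only
```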
This is happening, right? Adding a default `__annotations__ = {}` to modules and classes. (Though https://bugs.python.org/issue43901 seems temporarily stuck.)

On Mon, Apr 19, 2021 at 10:10 PM Larry Hastings <larry@hastings.org> wrote:
As long as I'm gravedigging old conversations...! Remember this one, also from January of this year? Here's a link to the thread in the c.l.p-d Mailman archive. The first message in the thread is a good overview of the problem:
https://mail.python.org/archives/list/python-dev@python.org/thread/AWKVI3NRC...
Here's kind of where we left it:
On 1/12/21 7:48 PM, Guido van Rossum wrote:
On Tue, Jan 12, 2021 at 6:35 PM Larry Hastings <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
I would like that very much. And the exception for functions is especially helpful.
First of all, I've proposed a function that should also help a lot:
https://bugs.python.org/issue43817
The function will be called inspect.get_annotations(o). It's like typing.get_type_hints(o) except less opinionated. This function would become the best practice for everybody who wants annotations**, like so:
    import inspect

    if hasattr(inspect, "get_annotations"):
        how_i_get_annotations = inspect.get_annotations
    else:
        # do whatever it was I did in Python 3.9 and before...
        ...
** Everybody who specifically wants *type hints* should instead call typing.get_type_hints(), and good news!, *that* function has existed for several versions now. So they probably already *do* call it.
I'd still like to add a default empty __annotations__ dict to all classes and modules for Python 3.10, for everybody who doesn't switch to using this as-yet-unwritten inspect.get_annotations() function. The other changes I propose in that thread (e.g. deleting __annotations__ always throws TypeError) would be nice, but honestly they aren't high priority. They can wait until after Python 3.10. Just these two things (inspect.get_annotations() and always populating __annotations__ for classes and modules) would go a long way to cleaning up how people examine annotations.
Long-term, hopefully we can fold the desirable behaviors of inspect.get_annotations() into the language itself, at which point we could probably deprecate the function. That wouldn't be until a long time from now of course.
Does this need a lot of discussion, or can I just go ahead with the bpo and PR and such? I mean, I'd JFDI, as Barry always encourages, but given how much debate we've had over annotations in the last two weeks, I figured I should first bring it up here.
Happy two-weeks'-notice,
*/arry*
p.s. I completely forgot about this until just now--sorry. At least I remembered before Python 3.10b1!
--
--Guido van Rossum (python.org/~guido)
On 4/23/21 9:26 PM, Guido van Rossum wrote:
This is happening, right? Adding a default `__annotations = {}` to modules and classes. (Though https://bugs.python.org/issue43901 <https://bugs.python.org/issue43901> seems temporarily stuck.)
It's happening, and I wouldn't say it's stuck. I'm actively working on it--currently puzzling my way through some wild unit test failures. I expect to ship my first PR over the weekend. Cheers, //arry/
I've hit a conceptual snag in this.

What I thought I needed to do: set __annotations__ = {} in the module dict, and set __annotations__ = {} in user class dicts. The latter was more delicate than the former but I think I figured out a good spot for both. I have this much working, including fixing the test suite.

But now I realize (*head-slap* here): if *every* class is going to have annotations, does that mean builtin classes too? StructSequence classes like float? Bare-metal type objects like complex? Heck, what about type itself?!

My knee-jerk initial response: yes, those too. Which means adding a new getsetdef to the type object. But that's slightly complicated. The point of doing this is to preserve the existing best-practice of peeking in the class dict for __annotations__, to avoid inheriting it. If I'm to preserve that, the get/set for __annotations__ on a type object would need to get/set it on tp_dict if tp_dict was not NULL, and use internal storage somewhere if there is no tp_dict.

It's worth noticing that builtin types don't currently have __annotations__ set, and you can't set them. (Or, at least, float, complex, and type didn't have them set, and wouldn't let me set annotations on them.) So presumably people using current best practice--peek in the class dict--aren't having problems.

So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?

Cheers,

//arry/
On Sat, 24 Apr 2021, 5:53 pm Larry Hastings, <larry@hastings.org> wrote:
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I *just* make the change for user classes and leave builtin classes untouched? What do you think?
I'd suggest kicking the can down the road: leave builtin classes alone for now, but file a ticket to reconsider the question for 3.11. In the meantime, inspect.get_annotations can help hide the discrepancy. Cheers, Nick.
Cheers,
*/arry*
On 4/24/21 7:11 AM, Nick Coghlan wrote:
On Sat, 24 Apr 2021, 5:53 pm Larry Hastings, <larry@hastings.org <mailto:larry@hastings.org>> wrote:
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?
I'd suggest kicking the can down the road: leave builtin classes alone for now, but file a ticket to reconsider the question for 3.11.
In the meantime, inspect.get_annotations can help hide the discrepancy.
The good news: inspect.get_annotations() absolutely can handle it. inspect.get_annotations() is so paranoid about examining the object you pass in, I suspect you could pass in an old boot and it would pull out the annotations--if it had any. Cheers, //arry/
On 24. 04. 21 9:52, Larry Hastings wrote:
I've hit a conceptual snag in this.
What I thought I needed to do: set __annotations__= {} in the module dict, and set __annotations__= {} in user class dicts. The latter was more delicate than the former but I think I figured out a good spot for both. I have this much working, including fixing the test suite.
But now I realize (*head-slap* here): if *every* class is going to have annotations, does that mean builtin classes too? StructSequence classes like float? Bare-metal type objects like complex? Heck, what about type itself?!
My knee-jerk initial response: yes, those too. Which means adding a new getsetdef to the type object. But that's slightly complicated. The point of doing this is to preserve the existing best-practice of peeking in the class dict for __annotations__, to avoid inheriting it. If I'm to preserve that, the get/set for __annotations__ on a type object would need to get/set it on tp_dict if tp_dict was not NULL, and use internal storage somewhere if there is no tp_dict.
It's worth noticing that builtin types don't currently have __annotations__ set, and you can't set them. (Or, at least, float, complex, and type didn't have them set, and wouldn't let me set annotations on them.) So presumably people using current best practice--peek in the class dict--aren't having problems.
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?
Beware of adding mutable state to built-in (C static) type objects: these are shared across interpreters, so changing them can “pollute” unwanted contexts.

This has been so for a long time [0]. There are some subinterpreter efforts underway that might eventually lead to making __annotations__ on static types easier to add, but while you're certainly welcome to explore the neighboring rabbit hole as well, I do think you're going in too far for now :)

[0] https://mail.python.org/archives/list/python-dev@python.org/message/KLCZIA6F...
On 4/24/21 8:09 AM, Petr Viktorin wrote:
On 24. 04. 21 9:52, Larry Hastings wrote:
I've hit a conceptual snag in this.
What I thought I needed to do: set __annotations__= {} in the module dict, and set __annotations__= {} in user class dicts. The latter was more delicate than the former but I think I figured out a good spot for both. I have this much working, including fixing the test suite.
But now I realize (*head-slap* here): if *every* class is going to have annotations, does that mean builtin classes too? StructSequence classes like float? Bare-metal type objects like complex? Heck, what about type itself?!
My knee-jerk initial response: yes, those too. Which means adding a new getsetdef to the type object. But that's slightly complicated. The point of doing this is to preserve the existing best-practice of peeking in the class dict for __annotations__, to avoid inheriting it. If I'm to preserve that, the get/set for __annotations__ on a type object would need to get/set it on tp_dict if tp_dict was not NULL, and use internal storage somewhere if there is no tp_dict.
It's worth noticing that builtin types don't currently have __annotations__ set, and you can't set them. (Or, at least, float, complex, and type didn't have them set, and wouldn't let me set annotations on them.) So presumably people using current best practice--peek in the class dict--aren't having problems.
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?
Beware of adding mutable state to built-in (C static) type objects: these are shared across interpreters, so changing them can “pollute” unwanted contexts.
This has been so for a long time [0]. There are some subinterpreter efforts underway that might eventually lead to making __annotations__ on static types easier to add, but while you're certainly welcome to explore the neighboring rabbit hole as well, I do think you're going in too far for now :)
[0] https://mail.python.org/archives/list/python-dev@python.org/message/KLCZIA6F...
That's a good point! The sort of detail one forgets in the rush of the moment. Given that the lack of annotations on builtin types already isn't a problem, and given this wrinkle, and generally given the "naw you don't have to" vibe I got from you and Nick (and the lack of "yup you gotta" I got from anybody else), I'm gonna go with not polluting the builtin types for now.

This is not to say that, in the fullness of time, those objects should never have annotations. Even in the three random types I picked in my example, there's at least one example: float.imag is a data member and might theoretically be annotated. But we can certainly kick this can down the road too. Maybe by the time we get around to it, we'll have a read-only dictionary we can use for the purpose.

Cheers,

//arry/
On Sat, Apr 24, 2021 at 2:25 PM Larry Hastings <larry@hastings.org> wrote:
This is not to say that, in the fullness of time, those objects should never have annotations. Even in the three random types I picked in my example, there's at least one example: float.imag is a data member and might theoretically be annotated. But we can certainly kick this can down the road too. Maybe by the time we get around to it, we'll have a read-only dictionary we can use for the purpose.
We already have one -- the mappingproxy type you get back from a class' __dict__ attribute. (Though in the fullness of times, type objects presumably won't be shared between multiple interpreters, which solves the problem in a different way.) --Guido van Rossum (python.org/~guido)
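(That type is already reachable from Python today, e.g.:)

    from types import MappingProxyType

    d = {'x': int}
    proxy = MappingProxyType(d)   # same type as SomeClass.__dict__
    print(proxy['x'])             # <class 'int'>
    proxy['y'] = str              # raises TypeError: 'mappingproxy' object
                                  #   does not support item assignment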
On Tue, Jan 12, 2021 at 6:31 PM Larry Hastings <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
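(A quick illustration of that point:)

    def f(x: int) -> str:
        return str(x)

    print(f.__annotations__)                 # {'x': <class 'int'>, 'return': <class 'str'>}
    print('__annotations__' in f.__dict__)   # False -- stored in a dedicated slot,
                                             #   not in the function's attribute dict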
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
Great!
And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import
Would it?
I thought you had proposed that initially, but it appears I mixed this with your PEP email. 😅 Sorry about that!
My original proposal would make breaking changes to how you examine __annotations__. Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?
It occurs to me that you could take kls.__module__, pull out the module from sys.modules, then look inside to see if it contains the correct "future" object imported from the __future__ module. Is that an approach we would suggest to our users?
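(A sketch of what that check might look like -- here using the existing PEP 563 "annotations" feature as a stand-in for the hypothetical new future-import, and with an invented function name:)

    import sys
    import __future__

    def module_opted_in(kls):
        # "from __future__ import annotations" binds the feature object as a
        # global in the importing module, so look for exactly that object.
        mod = sys.modules.get(kls.__module__)
        return (mod is not None
                and getattr(mod, 'annotations', None) is __future__.annotations)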
Also, very little code ever examines annotations; most code with annotations merely defines them. So I suspect most annotations users wouldn't care either way--which also means a "from __future__ import" that changes the semantics of examining or modifying annotations isn't going to see a lot of uptake, because it doesn't really affect them. The change in semantics only affects people whose code examines annotations, which I suspect is very few.
So I wasn't really joking when I proposed making these changes without a from __future__ import, and suggested users use a version check. The library code would know based on the Python version number which semantics were active, no peeking in modules to find a future object. They could literally write what I suggested:
    if you know you're running python 3.10 or higher:
        examine using the new semantics
    else:
        examine using the old semantics
I realize that's a pretty aggressive approach, which is why I prefaced it with "if I could wave my magic wand". But if we're going to make breaking changes, then whatever we do, it's going to break some people's code until it gets updated to cope with the new semantics. In that light this approach seemed reasonable.
But really this is why I started this thread in the first place. My idea of what's reasonable is probably all out of whack. So I wanted to start the conversation, to get feedback on how much breakage is allowable and how best to mitigate it. If it wasn't a controversial change, then we wouldn't need to talk about it!
And finally: if we really do set a default of an empty dict on classes and modules, then my other in-theory breaking changes:
- you can't delete __annotations__
- you can only set __annotations__ to a dict or None (this is already true of functions, but not of classes or modules)
will, I expect, in practice break exactly zero code. Who deletes __annotations__? Who ever sets __annotations__ to something besides a dict? So if the practical breakage is zero, why bother gating it with "from __future__ import" at all?
I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
What I find frustrating about that answer--and part of what motivated me to work on this in the first place--is that typing.get_type_hints() requires your annotations to be type hints. All type hints are annotations, but not all annotations are type hints, and it's entirely plausible for users to have reasonable uses for non-type-hint annotations that typing.get_type_hints() wouldn't like.
You and I have talked about this extensively, so I'm aware. 😉
The two things typing.get_type_hints() does, that I know of, that can impede such non-type-hint annotations are:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
Huh, I wasn't aware of that.
-Brett
- It regards all string annotations as "forward references", which means they get eval()'d and the result returned as the annotation. typing.get_type_hints() doesn't catch any exceptions here, so if the eval fails, typing.get_type_hints() fails and you can't use it to examine your annotations.
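(Both behaviors are easy to demonstrate:)

    import typing

    def f(x: None): ...
    print(typing.get_type_hints(f))   # {'x': <class 'NoneType'>} -- the None became type(None)

    def g(x: "this is not valid python"): ...
    typing.get_type_hints(g)          # raises SyntaxError -- the string is treated as a
                                      #   forward reference and eval()'d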
PEP 484 "explicitly does NOT prevent other uses of annotations". But if you force everyone to use typing.get_type_hints() to examine their annotations, then you have de facto prevented any use of annotations that isn't compatible with type hints.
Cheers,
*/arry*
On Tue, Jan 12, 2021 at 8:00 PM Brett Cannon <brett@python.org> wrote:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
Huh, I wasn't aware of that.
This has tripped up many people. Maybe we should just bite the bullet and change this? --Guido van Rossum (python.org/~guido)
On Tue, 2021-01-12 at 20:09 -0800, Guido van Rossum wrote:
On Tue, Jan 12, 2021 at 8:00 PM Brett Cannon <brett@python.org> wrote:
* It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
Huh, I wasn't aware of that.
This has tripped up many people. Maybe we should just bite the bullet and change this?
+1, FWIW.
On 13/01/21 3:31 pm, Larry Hastings wrote:
Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?
This implies that __future__ is the wrong mechanism to use. It's only appropriate when the changes it triggers are confined to the module that uses it, which is not the case here. -- Greg
On Wed, 13 Jan 2021, 12:35 pm Larry Hastings, <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
Could you get the best of both worlds by making __annotations__ an auto-populating descriptor on "type", the way it is on functions? Continue to add a non-empty annotations dict to the class dict eagerly, but only add the empty dict when "cls.__annotations__" is accessed. Then your co_annotations PEP would only be changing the way the non-empty case was handled, rather than introducing the descriptor in the first place. Cheers, Nick.
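(Here's a rough Python-level toy of that idea -- the real thing would be a getset on "type" implemented in C, and unlike this toy it wouldn't fall back to a base class's annotations -- but it shows the lazy create-and-cache behavior:)

    class LazyAnnotations(type):
        def __getattr__(cls, name):
            # Only reached when normal lookup fails, i.e. nothing in the class,
            # its bases, or its metaclass already provides '__annotations__'.
            if name == '__annotations__':
                ann = {}
                type.__setattr__(cls, '__annotations__', ann)  # cache in the class dict
                return ann
            raise AttributeError(name)

    class C(metaclass=LazyAnnotations):
        pass

    print('__annotations__' in C.__dict__)   # False -- nothing stored eagerly
    print(C.__annotations__)                 # {} -- created on first access...
    print('__annotations__' in C.__dict__)   # True -- ...and now cached in the class dict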
On 1/16/21 8:41 AM, Nick Coghlan wrote:
Could you get the best of both worlds by making __annotations__ an auto-populating descriptor on "type", the way it is on functions?
Continue to add a non-empty annotations dict to the class dict eagerly, but only add the empty dict when "cls.__annotations__" is accessed.
I think that'll work though it's a little imprecise. Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:

    cls_annotations = cls.__dict__.get('__annotations__', {})

What happens when that current best practice code meets your proposed "lazy-populate the empty dict" approach?

* If a class has annotations set, cls.__dict__['__annotations__'] will be set, so the code works fine.

* If a class doesn't have annotations set, then cls.__dict__['__annotations__'] won't be set yet. So people peering in cls.__dict__['__annotations__'] will get the right /answer/, that no annotations are set. But they'll see the wrong /specifics/: they'll think annotations are unset, when in fact it has an empty dict as its value.

So the code will continue to work, even though it's arguably a little misguided. If anybody distinguished between "annotations are unset" and "annotations are set to an empty dict", that code would fail, but I expect nobody ever does that.

Two notes about this idea. First, I think most people who use this best-practices code above use it for modules as well as classes. (They have two code paths: one for functions, the other for not-functions.) But everything I said above is true for both classes and modules.

Second, I think this is only sensible if, at the same time, we make it illegal to delete cls.__annotations__. If we lazy-populate the empty dict, and a user deletes cls.__annotations__, and we don't remember some extra state, we'd just re-"lazy" create the empty dict the next time they asked for it. Which is actually what functions do, just lazy-repopulate the empty annotations dict every time, and I'm not keen to bring those semantics to classes and modules.

Cheers,

//arry/
On 17/01/21 12:31 pm, Larry Hastings wrote:
Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
Isn't that going to get broken anyway? It won't trigger the calling of __co_annotations__. -- Greg
On 1/16/21 4:09 PM, Greg Ewing wrote:
On 17/01/21 12:31 pm, Larry Hastings wrote:
Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
Isn't that going to get broken anyway? It won't trigger the calling of __co_annotations__.
I proposed these as two separate conversations, because I wanted to clean up the semantics of annotations whether or not PEP 649 was accepted. But, yes, if PEP 649 is accepted (in some form), this current-best-practice would no longer work, and the new best practice would likely become much more complicated. Cheers, //arry/
On Sat, Jan 16, 2021 at 3:32 PM Larry Hastings <larry@hastings.org> wrote:
[...] If anybody distinguished between "annotations are unset" and "annotations are set to an empty dict", that code would fail, but I expect nobody ever does that.
I agree, since I can't think of differing semantics. Given that `__annotations__` is filled from annotated class variables, the only reason someone might care about the difference would be if they are aware of code that manually *sets* `X.__annotations__ = {}` and they have some kind of shared understanding that that means something special. I find that highly unlikely, and frankly, if someone needs such a shared understanding, let them just pick a unique key to set. I do worry about the best practice getting worse if your PEP 649 is accepted. --Guido van Rossum (python.org/~guido)
On 1/18/21 12:16 PM, Guido van Rossum wrote:
I do worry about the best practice getting worse if your PEP 649 is accepted.
A good part of what motivated me to start this second thread ("Let's Fix ...") was how much worse best practice would become if PEP 649 is accepted. But if we accept PEP 649, /and/ take steps to fix the semantics of annotations, I think the resulting best practice will be excellent in the long-run.

Let's assume for a minute that PEP 649 is accepted more-or-less like it is now. (The name resolution approach is clearly going to change but that won't affect the discussion here.) And let's assume that we also change the semantics so annotations are always defined (you can't delete them) and they're guaranteed to be either a dict or None. (Permitting __annotations__ to be None isn't settled yet, but it's most similar to current semantics, so let's just assume it for now.)

Because the current semantics are kind of a mess, most people who examine annotations already have a function that gets the annotations for them. Given that, I really do think the best approach is to gate the code on version 3.10, like I've described before:

    if python version >= 3.10:
        def get_annotations(o):
            return o.__annotations__
    else:
        def get_annotations(o):
            if isinstance(o, (type, types.ModuleType)):
                return o.__dict__.get("__annotations__", None)
            else:
                return o.__annotations__

This assumes returning None is fine. If it had to always return a valid dict, I'd add "or {}" to the end of every return statement.

Given that it already has to be a function, I think this approach is readable and performant. And, at some future time when the code can drop support for Python < 3.10, we can throw away the if statement and the whole else block, keeping just the one-line function. At which point maybe we'd refactor away the function and just use "o.__annotations__" everywhere.

I concede that, in the short term, now we've got nine lines and two if statements to do something that /should/ be relatively straightforward--accessing the annotations on an object. But that's where we find ourselves. Current best practice is kind of a mess, and unfortunately PEP 649 breaks current best practice anyway. My goal is to fix the semantics so that long-term best practice is sensible, easy, and obvious.

Cheers,

//arry/
Hm. It's unfortunate that this would break code using what is *currently* the best practice. The saving grace seems to be that for *many* use cases the best practice is to call typing.get_type_hints(). This is particularly useful for classes because it includes annotations from base classes. Also, for functions and modules I would recommend `getattr(o, "__annotations__", None)` (perhaps with `or {}` added). I would also honestly discount what dataclasses.py and typing.py have to do. But what do 3rd party packages do when they don't want to use get_type_hints() and they want to get it right for classes? That would give an indication of how seriously we should take breaking current best practice.

On Mon, Jan 18, 2021 at 1:10 PM Larry Hastings <larry@hastings.org> wrote:
On 1/18/21 12:16 PM, Guido van Rossum wrote:
I do worry about the best practice getting worse if your PEP 649 is accepted.
A good part of what motivated me to start this second thread ("Let's Fix ...") was how much worse best practice would become if PEP 649 is accepted. But if we accept PEP 649, *and* take steps to fix the semantics of annotations, I think the resulting best practice will be excellent in the long-run.
Let's assume for a minute that PEP 649 is accepted more-or-less like it is now. (The name resolution approach is clearly going to change but that won't affect the discussion here.) And let's assume that we also change the semantics so annotations are always defined (you can't delete them) and they're guaranteed to be either a dict or None. (Permitting __annotations__ to be None isn't settled yet, but it's most similar to current semantics, so let's just assume it for now.)
Because the current semantics are kind of a mess, most people who examine annotations already have a function that gets the annotations for them. Given that, I really do think the best approach is to gate the code on version 3.10, like I've described before:
    if python version >= 3.10:
        def get_annotations(o):
            return o.__annotations__
    else:
        def get_annotations(o):
            if isinstance(o, (type, types.ModuleType)):
                return o.__dict__.get("__annotations__", None)
            else:
                return o.__annotations__
This assumes returning None is fine. If it had to always return a valid dict, I'd add "or {}" to the end of every return statement.
Given that it already has to be a function, I think this approach is readable and performant. And, at some future time when the code can drop support for Python < 3.10, we can throw away the if statement and the whole else block, keeping just the one-line function. At which point maybe we'd refactor away the function and just use "o.__annotations__" everywhere.
I concede that, in the short term, now we've got nine lines and two if statements to do something that *should* be relatively straightforward--accessing the annotations on an object. But that's where we find ourselves. Current best practice is kind of a mess, and unfortunately PEP 649 breaks current best practice anyway. My goal is to fix the semantics so that long-term best practice is sensible, easy, and obvious.
Cheers,
*/arry*
--Guido van Rossum (python.org/~guido)
On 1/18/21 2:39 PM, Guido van Rossum wrote:
Hm. It's unfortunate that this would break code using what is *currently* the best practice.
I can't figure out how to avoid it. The problem is, current best practice sidesteps the class and goes straight to the dict. How do we intercept that and run the code to lazy-calculate the annotations?

I mean, let's consider something crazy. What if we change cls.__dict__ from a normal dict to a special dict that handles the __co_annotations__ machinery? That might work, except, we literally allow users to supply their own cls.__dict__ via __prepare__. So we can't rely on our special dict.

What if we change cls.__dict__ to a getset? The user is allowed to set cls.__dict__, but when you get __dict__, we wrap the actual internal dict object with a special object that intercepts accesses to __annotations__ and handles the __co_annotations__ mechanism. That might work but it's really crazy and unfortunate. And it's remotely possible that a user might override __dict__ as a property, in a way that breaks this mechanism too. So it's not guaranteed to always work.

I'm not suggesting we should do these things, I'm just trying to illustrate how hard I think the problem is. If someone has a good idea how we can add the __co_annotations__ machinery without breaking current best practice I'd love to hear it.
Also, for functions and modules I would recommend `getattr(o, "__annotations__", None)` (perhaps with `or {}` added).
For functions you don't need to bother; fn.__annotations__ is guaranteed to always be set, and be either a dict or None. (Python will only ever set it to a dict, but the user is permitted to set it to None.)

I agree with your suggested best practice for modules as it stands today.

And actually, let me walk back something I've said before. I believe I've said several times that "people treat classes and modules the same". Actually that's wrong.

* Lib/typing.py treats functions and modules the same; it uses getattr(o, '__annotations__', None). It treats classes separately and uses cls.__dict__.get('__annotations__', {}).
* Lib/dataclasses.py uses fn.__annotations__ for functions and cls.__dict__.get('__annotations__', {}) for classes. It doesn't handle modules at all.
* Lib/inspect.py calls Lib/typing.py to get annotations. Which in retrospect I think is a bug, because annotations and type hints aren't the same thing. (typing.get_type_hints changes None to type(None), it evaluates strings, etc).

So, for what it's worth, I literally have zero examples of people treating classes and modules the same when it comes to annotations. Sorry for the confusion!
I would also honestly discount what dataclasses.py and typing.py have to do. But what do 3rd party packages do when they don't want to use get_type_hints() and they want to get it right for classes? That would give an indication of how seriously we should take breaking current best practice.
I'm not sure how to figure that out. Off the top of my head, the only current third-party packages I can think of that use annotations are mypy and attrs. I took a quick look at mypy but I can't figure out what it's doing.

attrs does something a little kooky. It accesses __annotations__ using a function called _has_own_attributes(), which detects whether or not the object is inheriting an attribute. But it doesn't peek in __dict__; instead it walks the mro and sees if any of its base classes have the same (non-False) value for that attribute.

https://github.com/python-attrs/attrs/blob/a025629e36440dcc27aee0ee5b04d6523...

Happily, that seems like it would continue to work even if PEP 649 is accepted. That's good news!

Cheers,

//arry/
On Mon, Jan 18, 2021 at 4:34 PM Larry Hastings <larry@hastings.org> wrote:
On 1/18/21 2:39 PM, Guido van Rossum wrote:
Hm. It's unfortunate that this would break code using what is *currently* the best practice.
I can't figure out how to avoid it. The problem is, current best practice sidesteps the class and goes straight to the dict. How do we intercept that and run the code to lazy-calculate the annotations?
I mean, let's consider something crazy. What if we change cls.__dict__ from a normal dict to a special dict that handles the __co_annotations__ machinery? That might work, except, we literally allow users to supply their own cls.__dict__ via __prepare__. So we can't rely on our special dict.
There's a secret though. `cls.__dict__` is not actually a dict -- it's a mappingproxy. The proxy exists because we want to be able to intercept changes to class attributes such as `__add__` or `__getattribute__` in order to manipulate the C-level wrappers that implement such overloads.

So *perhaps* we could expand the mappingproxy class to trap read access to `__annotations__` as a key to do your bidding. (The trick might be exposed by things like .keys() but that doesn't bother me as much.)

I honestly don't know how the mappingproxy and `__prepare__` interact. I have to admit I've never used the latter. Presumably the mappingproxy still plays a role because we'd still want to intercept e.g. `cls.__add__ = <some function>`.
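(You can see both halves of that from Python:)

    class C:
        pass

    print(type(C.__dict__))   # <class 'mappingproxy'>
    C.__dict__['x'] = 1       # raises TypeError: 'mappingproxy' object does not
                              #   support item assignment
    # ...so every change has to go through setattr on the class, which is
    # exactly where CPython gets its chance to fix up slots like __add__:
    setattr(C, 'x', 1)
    print(C.__dict__['x'])    # 1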
What if we change cls.__dict__ to a getset? The user is allowed to set cls.__dict__, but when you get __dict__, we wrap the actual internal dict object with a special object that intercepts accesses to __annotations__ and handles the __co_annotations__ mechanism. That might work but it's really crazy and unfortunate. And it's remotely possible that a user might override __dict__ as a property, in a way that breaks this mechanism too. So it's not guaranteed to always work.
Maybe such guarantees are overrated; in any case it looks like a rare second-order effect (and we're already talking about esoteric usage patterns).
I'm not suggesting we should do these things, I'm just trying to illustrate
how hard I think the problem is. If someone has a good idea how we can add the __co_annotations__ machinery without breaking current best practice I'd love to hear it.
Also, for functions and modules I would recommend `getattr(o, "__annotations__", None)` (perhaps with `or {}` added).
For functions you don't need to bother; fn.__annotations__ is guaranteed to always be set, and be either a dict or None. (Python will only ever set it to a dict, but the user is permitted to set it to None.)
I agree with your suggested best practice for modules as it stands today.
And actually, let me walk back something I've said before. I believe I've said several times that "people treat classes and modules the same". Actually that's wrong.
- Lib/typing.py treats functions and modules the same; it uses getattr(o, '__annotations__', None). It treats classes separately and uses cls.__dict__.get('__annotations__', {}).
- Lib/dataclasses.py uses fn.__annotations__ for functions and cls.__dict__.get('__annotations__', {}) for classes. It doesn't handle modules at all.
- Lib/inspect.py calls Lib/typing.py to get annotations. Which in retrospect I think is a bug, because annotations and type hints aren't the same thing. (typing.get_type_hints changes None to type(None), it evaluates strings, etc).
So, for what it's worth, I literally have zero examples of people treating classes and modules the same when it comes to annotations. Sorry for the confusion!
Yeah, that part felt fishy -- basically classes are the only complicated case here, because in order to construct the full set of annotations you must walk the MRO. Honestly *if* you are walking the MRO anyways, it probably doesn't matter much if you use cls.__dict__.get('__annotations__') or getattr(cls, '__annotations__') -- you might see some duplicates but you should generally end up with the same overall set of annotations (though presumably one could construct a counter-example using multiple inheritance).
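(To make the different access patterns concrete:)

    import typing

    class A:
        a: int

    class B(A):
        b: str

    print(typing.get_type_hints(B))               # {'a': <class 'int'>, 'b': <class 'str'>} -- walks the MRO
    print(B.__dict__.get('__annotations__', {}))  # {'b': <class 'str'>} -- this class only
    print(getattr(B, '__annotations__', {}))      # {'b': <class 'str'>} here, but under today's semantics
                                                  #   it would be A's dict if B had no annotations of its own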
I would also honestly discount what dataclasses.py and typing.py have to do. But what do 3rd party packages do when they don't want to use get_type_hints() and they want to get it right for classes? That would give an indication of how seriously we should take breaking current best practice.
I'm not sure how to figure that out. Off the top of my head, the only current third-party packages I can think of that use annotations are mypy and attrs. I took a quick look at mypy but I can't figure out what it's doing.
Mypy is irrelevant because it reads your source code -- it doesn't ever run your code to inspect `__annotations__`. attrs does something a little kooky. It accesses __annotations__ using a
function called _has_own_attributes(), which detects whether or not the object is inheriting an attribute. But it doesn't peek in __dict__, instead it walks the mro and sees if any of its base classes have the same (non-False) value for that attribute.
https://github.com/python-attrs/attrs/blob/a025629e36440dcc27aee0ee5b04d6523...
Happily, that seems like it would continue to work even if PEP 649 is accepted. That's good news!
I wonder how much pain it cost to develop that. Another example of a well-known library that presumably does something clever with annotations at runtime is Pydantic. I've not looked into it more. There are people who routinely search many GitHub repos for various patterns. Maybe one of them can help? (I've never tried this but IIRC Irit showed me some examples.) --Guido van Rossum (python.org/~guido)
On 1/18/21 5:33 PM, Guido van Rossum wrote:
There's a secret though. `cls.__dict__` is not actually a dict -- it's a mappingproxy. The proxy exists because we want to be able to intercept changes to class attributes such as `__add__` or `__getattribute__` in order to manipulate the C-level wrappers that implement such overloads.
So *perhaps* we could expand the mappingproxy class to trap read access to `__annotations__` as a key to do your bidding. (The trick might be exposed by things like .keys() but that doesn't bother me as much.)
I honestly don't know how the mappingproxy and `__prepare__` interact.
`__prepare__` returns a dict-like namespace that is used as is. `EnumMeta` uses `__prepare__` to return an instance of `_EnumDict`. When `type.__new__` is called, whatever the namespace used to be is then converted into a normal Python dict, and a mappingproxy is returned for all further `cls.__dict__` requests. -- ~Ethan~
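(A minimal sketch of that protocol, for anyone who hasn't used it:)

    class Meta(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwds):
            # Whatever mapping we return here is what the class body populates.
            return {}

        def __new__(mcls, name, bases, namespace, **kwds):
            # type.__new__ copies the namespace into an ordinary dict;
            # cls.__dict__ is then a read-only mappingproxy over that copy.
            return super().__new__(mcls, name, bases, dict(namespace))

    class C(metaclass=Meta):
        x: int = 1

    print(type(C.__dict__))                # <class 'mappingproxy'>
    print(C.__dict__['__annotations__'])   # {'x': <class 'int'>}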
On 1/18/21 5:53 PM, Ethan Furman wrote:
`__prepare__` returns a dict-like namespace that is used as is. `EnumMeta` uses `__prepare__` to return an instance of `_EnumDict`.
When `type.__new__` is called, whatever the namespace used to be is then converted into a normal Python dict, and a mappingproxy is returned for all further `cls.__dict__` requests.
To be more precise, when `__prepare__` is called, there is no class, and therefore no `class.__dict__`. -- ~Ethan~
participants (10): Brett Cannon, Chris Angelico, Ethan Furman, Greg Ewing, Guido van Rossum, Larry Hastings, Nick Coghlan, Paul Bryan, Petr Viktorin, Serhiy Storchaka