Howdy howdy. While working on my PEP I stumbled over a lot of behavior
of annotations that I found inconsistent and inconvenient. I think
there are several problems here that need fixing. This discussion will
probably evolve into a PEP, and I'll be happy to steer that process.
But I'm less certain about what the right thing to do is. (Although I
do know what I'd prefer!) So let's talk about it!
Annotations are represented in Python as a dictionary. They can
be present
on functions, classes, and modules as an attribute called
"__annotations__".
We start with: how do you get the annotations from one of these
objects?
Surely it's as easy as this line from Lib/inspect.py shows us:
return func.__annotations__
And yes, that's best practice for getting the annotations from a
function object.
But consider this line from Lib/functools.py:
ann = getattr(cls, '__annotations__', {})
Huh. Why doesn't it simply look at cls.__annotations__? It's
because the
language declares that __annotations__ on a class or module is
optional.
Since cls.__annotations__ may not be defined, evaluating it can throw
AttributeError. Three-argument getattr() is much safer, and I assert
it's best
practice for getting the annotations from a module.
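For instance, here's a minimal sketch of why the guarded lookup matters
for modules (behavior as observed on Python 3.9 and earlier; the module
name here is made up):

import types

mod = types.ModuleType('example')    # a fresh module, no annotations
# mod.__annotations__                # would throw AttributeError here
ann = getattr(mod, '__annotations__', {})   # safe: returns {}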
But consider this line from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
And a very similar line from Lib/typing.py:
ann = base.__dict__.get('__annotations__', {})
Huh! Why is this code skipping the attribute entirely, and
examining
cls.__dict__? It's because the getattr() approach has a subtle
bug when
dealing with classes. Consider this example:
class A:
    ax: int = 3

class B(A):
    pass

print(getattr(B, '__annotations__', {}))
That's right, B *inherits* A.__annotations__! So this prints
{'ax': int}.
This *can't* be the intended behavior of __annotations__ on classes.
It's only
supposed to contain annotations for the fields of B itself, not
those of one
of its randomly-selected base classes. But that's how it behaves
today--and
people have had to work around this behavior for years. Examining
the class
dict is, sadly, best practice for getting __annotations__ from a
class.
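Continuing the A/B example, here's a quick sketch of how the class-dict
approach sidesteps the inheritance problem:

B_ann = B.__dict__.get('__annotations__', {})
print(B_ann)    # {} -- B truly has no annotations of its own
A_ann = A.__dict__.get('__annotations__', {})
print(A_ann)    # {'ax': <class 'int'>}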
So, already: three different objects can have __annotations__, and
there are
three different best practices for getting their __annotations__.
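If you wanted a single function that encodes all three of today's best
practices, it would have to look something like this hypothetical
sketch:

import inspect

def best_practice_annotations(o):
    # each branch is today's "best practice" for that kind of object
    if inspect.isclass(o):
        return o.__dict__.get('__annotations__', {})  # dodge inheritance
    if inspect.isfunction(o):
        return o.__annotations__                      # always present
    return getattr(o, '__annotations__', {})          # modules: optional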
Let's zoom out for a moment. Here's the list of predefined data
fields
you can find on classes:
__annotations__
__bases__
__class__
__dict__
__doc__
__module__
__mro__
__name__
__qualname__
All of these describe metadata about the class. In every case
*except one*,
the field is mandatory, which also means it's never inherited. And
in every
case *except one*, you cannot delete the field. (Though you *are*
allowed
to overwrite some of them.)
You guessed it: __annotations__ is the exception. It's optional,
and
you're allowed to delete it. And these exceptions are causing
problems.
It seems to me that, if the only way to correctly use a
language-defined
attribute of classes is by rooting around in its __dict__, the
design is
a misfire.
(Much of the above applies to modules, too. The big difference: since
modules lack inheritance, you don't need to look in their __dict__.)
Now consider what happens if my "delayed evaluation of annotations
using descriptors" PEP is accepted. If that happens, pulling
__annotations__
out of the class dict won't work if they haven't been generated
yet. So
today's "best practice" becomes tomorrow's "this code doesn't
work".
To correctly examine class annotations, code would have to do
something
like this, which should work correctly in any Python 3.x version:
if (getattr(cls, '__co_annotations__', None)
        or ('__annotations__' in cls.__dict__)):
    ann = cls.__annotations__
else:
    ann = {}
This is getting ridiculous.
Let's move on to a related topic. For each of the objects that
can
have annotations, what happens if o.__annotations__ is set, and
you
"del o.__annotations__", then you access o.__annotations__? It
depends on
what the object is, because each of them behaves differently.
You already know what happens with classes: if any of the base
classes has
__annotations__ set, you'll get the first one you find in the
MRO. If none
of the bases have __annotations__ set you'll get an
AttributeError.
For a module, if you delete it then try to access it, you'll
always get
an AttributeError.
For a function, if you delete it then try to get it, the function
will
create a new empty dict, store it as its new annotations dict, and
return
that. Why does it do that? I'm not sure. The relevant PEP (3107)
doesn't specify this behavior.
So, annotations can be set on three different object types, and
each of
those three have a different behavior when you delete the
annotations
then try to get them again.
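A minimal demonstration of the divergence, as observed on Python 3.9
and earlier:

def f(a: int): pass
del f.__annotations__
print(f.__annotations__)   # {} -- the function quietly grows a new dict

import types
mod = types.ModuleType('example')
mod.__annotations__ = {'x': int}
del mod.__annotations__
# mod.__annotations__      # would throw AttributeError

class C:
    x: int
del C.__annotations__
# C.__annotations__        # AttributeError here too (no base defines it)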
As a final topic: what are the permitted types for
__annotations__?
If you say "o.__annotations__ = <x>", what types are and
aren't
allowed for <x>?
For functions, __annotations__ may be assigned to either None or
a dict (an object that passes PyDict_Check). Anything else throws
a
TypeError. For classes and modules, no checking is done
whatsoever,
and you can set __annotations__ on those two to any Python object.
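A short sketch of the asymmetry:

def f(): pass
f.__annotations__ = None     # fine
f.__annotations__ = {}       # fine
# f.__annotations__ = 2+3j   # would throw TypeError

class C: pass
C.__annotations__ = 2+3j     # accepted without complaint

import types
mod = types.ModuleType('example')
mod.__annotations__ = 'surprise!'   # also accepted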
While "a foolish consistency is the hobgoblin of little minds",
I don't see the benefit of setting a module's __annotations__ to
2+3j.
I think it's long past time that we cleaned up the behavior of
annotations.
They should be simple and consistent across all objects that
support them.
At the very least, I think we should make cls.__annotations__
required rather
than optional, so that it's never inherited. What should its
default value
be? An empty dict would be more compatible, but None would be
cheaper.
Note that creating the empty dict on the fly, the way function
objects do,
wouldn't really help--because current best practice means looking
in
cls.__dict__.
I also think you shouldn't be able to delete __annotations__ on
any of the
three objects (function, class, module). It should always be set,
so that
the best practice for accessing annotations on an object is always
o.__annotations__.
If I could wave my magic wand and do whatever I wanted, I'd change
the
semantics for __annotations__ to the following:
* Functions, classes, and modules always have an __annotations__
member set.
* "del o.__annotations__" always throws a TypeError.
* The language will set __annotations__ to a dict if the object
has
annotations, or None if it has no annotations.
* You may set __annotations__, but you can only set it to either
None or a
dict (passes PyDict_Check).
* You may only access __annotations__ as an attribute, and because
it's
always set, best practice is to use "o.__annotations__" (though
getattr
will always work too).
* How __annotations__ is stored is implementation-specific
behavior;
looking in the relevant __dict__ is unsupported.
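Under those rules, the whole genre of workarounds above collapses into
a couple of obvious lines. A sketch, assuming the semantics proposed
here:

def get_annotations(o):
    # works identically for functions, classes, and modules
    ann = o.__annotations__        # always set, never inherited
    return ann if ann is not None else {}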
This would grant sanity and consistency to __annotations__ in a
way it's
never so far enjoyed. The problem is, it's a breaking change.
But the
existing semantics are kind of terrible, so at this point my goal
is to
break them. I think the best practice needs to stop requiring
examining
cls.__dict__; in fact I'd prefer people stop doing it altogether.
If we change the behavior as part of a new release of Python,
code that examines annotations on classes can do a version check:
import sys

if sys.version_info >= (3, 10):
    def get_annotations(o):
        return o.__annotations__ or {}
else:
    def get_annotations(o):
        # eight or ten lines of complex code goes here
        ...
Or code can just use typing.get_type_hints(), which is tied to the
Python version anyway and should always do the right thing.
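For example, with the A and B classes from earlier (note that
typing.get_type_hints deliberately merges annotations along the MRO,
which is sometimes what you want and sometimes not):

import typing

print(typing.get_type_hints(A))   # {'ax': <class 'int'>}
print(typing.get_type_hints(B))   # {'ax': <class 'int'>} -- merged from A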
/arry