Let's Fix Class Annotations -- And Maybe Annotations Generally
Howdy howdy. While working on my PEP I stumbled over a lot of behavior by annotations that I found inconsistent and inconvenient. I think there are several problems here that need fixing. This discussion will probably evolve into a PEP, and I'll be happy to steer that process. But I'm less certain about what the right thing to do is. (Although I do know what I'd prefer!) So let's talk about it!

Annotations are represented in Python as a dictionary. They can be present on functions, classes, and modules as an attribute called "__annotations__".

We start with: how do you get the annotations from one of these objects? Surely it's as easy as this line from Lib/inspect.py shows us:

    return func.__annotations__

And yes, that's best practice for getting an annotation from a function object. But consider this line from Lib/functools.py:

    ann = getattr(cls, '__annotations__', {})

Huh. Why doesn't it simply look at cls.__annotations__? It's because the language declares that __annotations__ on a class or module is optional. Since cls.__annotations__ may not be defined, evaluating that might throw an exception. Three-argument getattr() is much safer, and I assert it's best practice for getting the annotations from a module.

But consider this line from Lib/dataclasses.py:

    cls_annotations = cls.__dict__.get('__annotations__', {})

And a very similar line from Lib/typing.py:

    ann = base.__dict__.get('__annotations__', {})

Huh! Why is this code skipping the attribute entirely, and examining cls.__dict__? It's because the getattr() approach has a subtle bug when dealing with classes. Consider this example:

    class A:
        ax: int = 3

    class B(A):
        pass

    print(getattr(B, '__annotations__', {}))

That's right, B *inherits* A.__annotations__! So this prints {'ax': int}. This *can't* be the intended behavior of __annotations__ on classes. It's only supposed to contain annotations for the fields of B itself, not those of one of its randomly-selected base classes. But that's how it behaves today--and people have had to work around this behavior for years. Examining the class dict is, sadly, best practice for getting __annotations__ from a class.

So, already: three different objects can have __annotations__, and there are three different best practices for getting their __annotations__.

Let's zoom out for a moment. Here's the list of predefined data fields you can find on classes:

    __annotations__
    __bases__
    __class__
    __dict__
    __doc__
    __module__
    __mro__
    __name__
    __qualname__

All of these describe metadata about the class. In every case *except one*, the field is mandatory, which also means it's never inherited. And in every case *except one*, you cannot delete the field. (Though you *are* allowed to overwrite some of them.) You guessed it: __annotations__ is the exception. It's optional, and you're allowed to delete it. And these exceptions are causing problems. It seems to me that, if the only way to correctly use a language-defined attribute of classes is by rooting around in its __dict__, the design is a misfire.

(Much of the above applies to modules, too. The big difference: since modules lack inheritance, you don't need to look in their __dict__.)

Now consider what happens if my "delayed evaluation of annotations using descriptors" PEP is accepted. If that happens, pulling __annotations__ out of the class dict won't work if they haven't been generated yet. So today's "best practice" becomes tomorrow's "this code doesn't work".
To correctly examine class annotations, code would have to do something like this, which should work correctly in any Python 3.x version:

    if (getattr(cls, '__co_annotations__', None)
            or ('__annotations__' in cls.__dict__)):
        ann = cls.__annotations__
    else:
        ann = {}

This is getting ridiculous.

Let's move on to a related topic. For each of the objects that can have annotations, what happens if o.__annotations__ is set, and you "del o.__annotations__", then you access o.__annotations__? It depends on what the object is, because each of them behaves differently.

You already know what happens with classes: if any of the base classes has __annotations__ set, you'll get the first one you find in the MRO. If none of the bases have __annotations__ set, you'll get an AttributeError.

For a module, if you delete it then try to access it, you'll always get an AttributeError.

For a function, if you delete it then try to get it, the function will create a new empty dict, store it as its new annotations dict, and return that. Why does it do that? I'm not sure. The relevant PEP (3107) doesn't specify this behavior.

So, annotations can be set on three different object types, and each of those three has a different behavior when you delete the annotations then try to get them again.

As a final topic: what are the permitted types for __annotations__? If you say "o.__annotations__ = <x>", what types are and aren't allowed for <x>?

For functions, __annotations__ may be assigned to either None or a dict (an object that passes PyDict_Check). Anything else throws a TypeError. For classes and modules, no checking is done whatsoever, and you can set __annotations__ on those two to any Python object. While "a foolish consistency is the hobgoblin of little minds", I don't see the benefit of setting a module's __annotations__ to 2+3j.

I think it's long past time that we cleaned up the behavior of annotations. They should be simple and consistent across all objects that support them.

At the very least, I think we should make cls.__annotations__ required rather than optional, so that it's never inherited. What should its default value be? An empty dict would be more compatible, but None would be cheaper. Note that creating the empty dict on the fly, the way function objects do, wouldn't really help--because current best practice means looking in cls.__dict__.

I also think you shouldn't be able to delete __annotations__ on any of the three objects (function, class, module). It should always be set, so that the best practice for accessing annotations on an object is always o.__annotations__.

If I could wave my magic wand and do whatever I wanted, I'd change the semantics for __annotations__ to the following:

* Functions, classes, and modules always have an __annotations__ member set.
* "del o.__annotations__" always throws a TypeError.
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
* You may set __annotations__, but you can only set it to either None or a dict (passes PyDict_Check).
* You may only access __annotations__ as an attribute, and because it's always set, best practice is to use "o.__annotations__" (though getattr will always work too).
* How __annotations__ is stored is implementation-specific behavior; looking in the relevant __dict__ is unsupported.

This would grant sanity and consistency to __annotations__ in a way it's never so far enjoyed. The problem is, it's a breaking change.
But the existing semantics are kind of terrible, so at this point my goal is to break them. I think the best practice needs to stop requiring examining cls.__dict__; in fact I'd prefer people stop doing it altogether.

If we change the behavior as part of a new release of Python, code that examines annotations on classes can do a version check:

    if (sys.version_info.major >= 3
            and sys.version_info.minor >= 10):
        def get_annotations(o):
            return o.__annotations__ or {}
    else:
        def get_annotations(o):
            # eight or ten lines of complex code goes here
            ...

Or code can just use typing.get_type_hints(), which is tied to the Python version anyway and should always do the right thing.

//arry/
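For concreteness, here is a sketch of what such a version-portable helper might look like, with the "eight or ten lines of complex code" filled in using the per-type best practices described above. It's an illustration under today's semantics, not code from the PEP:

```python
import sys
import types

def get_annotations(o):
    """Sketch only: return o's own annotations, or an empty dict."""
    if sys.version_info >= (3, 10):
        # Under the proposed semantics, __annotations__ is always set;
        # it may be None (or empty), so normalize that to {}.
        return o.__annotations__ or {}
    if isinstance(o, type):
        # Classes: look only in the class's own __dict__, otherwise a base
        # class's annotations can leak through via inheritance.
        return o.__dict__.get('__annotations__', {})
    if isinstance(o, types.ModuleType):
        # Modules: the attribute is optional, so use three-argument getattr().
        return getattr(o, '__annotations__', {})
    # Functions: the attribute always exists (lazily re-created if deleted).
    return o.__annotations__
```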
My code pretty much does what you suggest at the end of your message: On Mon, 2021-01-11 at 09:22 -0800, Larry Hastings wrote:
Or code can just use typing.get_type_hints(), which is tied to the Python version anyway and should always do the right thing.
So far, this has proven mostly[1] sufficient for my needs in a runtime type validation and encoding/decoding library.

[1] A pain point for me is the runtime cost of evaluating 3.10 style type hints, as they're (re-)evaluated for every call to get_type_hints. I've worked around this for now with my own function, affix_type_hints, which evaluates get_type_hints once and replaces __annotations__ with the evaluated value. It also addresses a scoping problem where a type hint may reference a value that's not globally scoped for the object being annotated; the hint can be evaluated and affixed within that scope.

Paul
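For readers wondering what such a helper might look like: the name affix_type_hints is Paul's, but the body below is only a guess at the general approach, not his actual code. The localns parameter of typing.get_type_hints() is what makes the "evaluate within that scope" part possible.

```python
import typing

def affix_type_hints(obj=None, *, localns=None):
    """Hypothetical sketch: evaluate an object's type hints once and store
    the result back on __annotations__, so later introspection is cheap."""
    if obj is None:
        # Support usage as @affix_type_hints(localns=...) with arguments.
        return lambda o: affix_type_hints(o, localns=localns)
    obj.__annotations__ = typing.get_type_hints(obj, localns=localns)
    return obj

@affix_type_hints
def greet(name: "str") -> "str":
    return f"hello {name}"

print(greet.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}
```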
On Mon, Jan 11, 2021 at 9:23 AM Larry Hastings <larry@hastings.org> wrote:
[SNIP - background info]
If I could wave my magic wand and do whatever I wanted, I'd change the semantics for __annotations__ to the following:
* Functions, classes, and modules always have an __annotations__ member set.
* "del o.__annotations__" always throws a TypeError.
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
* You may set __annotations__, but you can only set it to either None or a dict (passes PyDict_Check).
* You may only access __annotations__ as an attribute, and because it's always set, best practice is to use "o.__annotations__" (though getattr will always work too).
* How __annotations__ is stored is implementation-specific behavior; looking in the relevant __dict__ is unsupported.
This would grant sanity and consistency to __annotations__ in a way it's never so far enjoyed. The problem is, it's a breaking change. But the existing semantics are kind of terrible, so at this point my goal is to break them. I think the best practice needs to stop requiring examining cls.__dict__; in fact I'd prefer people stop doing it altogether.
So the biggest potential breakages are code that:

1. Directly gets the attribute from __dict__
2. The fact that it would no longer be inherited

Am I missing anything else?

For issue #1, it seems that typing.get_type_hints(), as you point out below, resolves that. As well, code could be updated appropriately without much effort to check different places if the attribute was not found in __dict__.

For issue #2, if the default was `None`, then couldn't that be used as an implicit feature marker that you can't (incorrectly) rely on inheritance to percolate up the annotations of the superclass if the subclass happens to not define any annotations?

This all seems reasonable to me. Since it's a change to the object model it will probably need a PEP, but I would suspect it would mostly revolve around guiding people on how to update their code to work across Python versions.

-Brett
If we change the behavior as part of a new release of Python, code that examines annotations on classes can do a version check:
if (sys.version_info.major >= 3
        and sys.version_info.minor >= 10):
    def get_annotations(o):
        return o.__annotations__ or {}
else:
    def get_annotations(o):
        # eight or ten lines of complex code goes here
        ...
Or code can just use typing.get_type_hints(), which is tied to the Python version anyway and should always do the right thing.
*/arry*
At last, a nibble on the other fishing line! ;-) On 1/11/21 1:47 PM, Brett Cannon wrote:
So the biggest potential breakages are code that:
1. Directly gets the attribute from __dict__
2. The fact that it would no longer be inherited
Am I missing anything else?
Those are the big ones, the ones I expect people to actually experience. I can name three more breakages, though these get progressively more obscure:

* Nobody expects o.__annotations__ to ever be None (unless they assigned None to it themselves). If the attribute is set, they expect its value to be a dict.
* "del o.__annotations__" currently works on modules and classes if the attribute is set. "del fn.__annotations__" always works.
* On modules and classes you can set o.__annotations__ to any Python value. (Functions already only permit you to set it to None or a dict.)

I have no idea if anybody is depending on these behaviors. The lesson that years of Python core dev has taught me is: if Python exhibits a behavior, somebody out there depends on it, and you'll break their code if you change it. Or, expressed more succinctly, any change is a breaking change for somebody.

So the question is, is the improvement this brings worth the breakage it also brings? In this case, I sure hope so!
For issue #2, if the default was `None`, then couldn't that be used as an implicit feature marker that you can't (incorrectly) rely on inheritance to percolate up the annotations of the superclass if the subclass happens to not define any annotations?
Currently Python never sets o.__annotations__ to None on any object. So yes, assuming the user doesn't set it to None themselves, this would be new behavior.

If I understand your question correctly, yes, users could write new code that says

    if o.__annotations__ is None:
        # great, we're in Python 3.10+ and no annotation was set on o!
        ...
    else:
        ...

Or they could just look at sys.version_info ;-)

Thanks for your feedback,

//arry/
I think your analysis of the problems is great. I don't worry about people deleting `__annotations__` (and those that do can switch to calling its .clear() method instead) but I worry about people not expecting to get None. So if you can avoid that in your solution that would be great.

The easiest thing would be just to create an empty `__annotations__` for classes that have no annotated variables, and to hell with the cost. People who delete it are already living dangerously.

(I noticed that `__slots__` is missing from your list. Maybe because it follows yet another pattern?)

On Mon, Jan 11, 2021 at 4:07 PM Larry Hastings <larry@hastings.org> wrote:
At last, a nibble on the other fishing line! ;-)
On 1/11/21 1:47 PM, Brett Cannon wrote:
So the biggest potential breakages are code that:
1. Directly gets the attribute from __dict__
2. The fact that it would no longer be inherited
Am I missing anything else?
Those are the big ones, the ones I expect people to actually experience. I can name three more breakages, though these get progressively more obscure:
- Nobody expects o.__annotations__ to ever be None (unless they assigned None to it themselves). If the attribute is set, they expect its value to be a dict.
- "del o.__annotations__" currently works on modules and classes if the attribute is set. "del fn.__annotations__" always works.
- On modules and classes you can set o.__annotations__ to any Python value. (Functions already only permit you to set it to None or a dict.)
I have no idea if anybody is depending on these behaviors. The lesson that years of Python core dev has taught me is: if Python exhibits a behavior, somebody out there depends on it, and you'll break their code if you change it. Or, expressed more succinctly, any change is a breaking change for somebody. So the question is, is the improvement this brings worth the breakage it also brings? In this case, I sure hope so!
For issue #2, if the default was `None`, then couldn't that be used as an implicit feature marker that you can't (incorrectly) rely on inheritance to percolate up the annotations of the superclass if the subclass happens to not define any annotations?
Currently Python never sets o.__annotations__ to None on any object. So yes, assuming the user doesn't set it to None themselves, this would be new behavior. If I understand your question correctly, yes, users could write new code that says
if o.__annotations__ is None:
    # great, we're in Python 3.10+ and no annotation was set on o!
    ...
else:
    ...
Or they could just look at sys.version_info ;-)
Thanks for your feedback,
*/arry*
--
--Guido van Rossum (python.org/~guido)
On 1/11/21 4:39 PM, Guido van Rossum wrote:
The easiest thing would be just to create an empty `__annotations__` for classes that have no annotated variables, and to hell with the cost.
I assume you'd keep the existing behavior where functions lazy-create an empty dict if they have no annotations too? That all would work fine and be consistent, but you'd probably have to set the empty __annotations__ dict on modules too. I've noticed that code that examines annotations tends to handle two classes of objects: "functions" and "not-functions". Modules also store their __annotations__ in their __dict__, so the same code path works fine for examining the annotations of both classes and modules.
(I noticed that `__slots__` is missing from your list. Maybe because it follows yet another pattern?)
I forgot about __slots__! Yup, it's optional, and you can even delete it, though after the class is defined I'm not sure how much difference that makes. Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now. Cheers, //arry/
On Mon, Jan 11, 2021 at 5:21 PM Larry Hastings <larry@hastings.org> wrote:
On 1/11/21 4:39 PM, Guido van Rossum wrote:
The easiest thing would be just to create an empty `__annotations__` for classes that have no annotated variables, and to hell with the cost.
I assume you'd keep the existing behavior where functions lazy-create an empty dict if they have no annotations too?
Indeed -- that's trying to provide a uniform interface.
That all would work fine and be consistent, but you'd probably have to set the empty __annotations__ dict on modules too. I've noticed that code that examines annotations tends to handle two classes of objects: "functions" and "not-functions". Modules also store their __annotations__ in their __dict__, so the same code path works fine for examining the annotations of both classes and modules.
I'm not against giving all modules an empty `__annotations__` by default.
(I noticed that `__slots__` is missing from your list. Maybe because it follows yet another pattern?)
I forgot about __slots__! Yup, it's optional, and you can even delete it, though after the class is defined I'm not sure how much difference that makes.
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Anyway, `__slots__` doesn't behave that way -- seems it behaves similar to `__annotations__`.

```
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class A: __slots__ = ['a']
...
>>> class B(A): __slots__ = ['b']
...
>>> B.__slots__
['b']
>>> class X(A): pass
...
>>> X.__slots__
['a']
>>> A().__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__dict__'
>>> X().__dict__
{}
```
--
--Guido van Rossum (python.org/~guido)
On 1/11/21 5:28 PM, Guido van Rossum wrote:
On Mon, Jan 11, 2021 at 5:21 PM Larry Hastings <larry@hastings.org <mailto:larry@hastings.org>> wrote:
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Anyway, `__slots__` doesn't behave that way -- seems it behaves similar to `__annotations__`.
__slots__ itself doesn't behave that way, but subclasses do inherit the slots defined on their parent:

    class C:
        __slots__ = ['a']

    class D(C):
        __slots__ = ['b']

    d = D()
    d.a = 5
    d.b = "foo"
    print(f"{d.a=} {d.b=}")

prints

    d.a=5 d.b='foo'

That's the inheritance behavior I was referring to.

Cheers,

//arry/
Oh, but the behavior of annotations in e.g. mypy is the same. They are cumulative. On Mon, Jan 11, 2021 at 17:42 Larry Hastings <larry@hastings.org> wrote:
On 1/11/21 5:28 PM, Guido van Rossum wrote:
On Mon, Jan 11, 2021 at 5:21 PM Larry Hastings <larry@hastings.org> wrote:
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Anyway, `__slots__` doesn't behave that way -- seems it behaves similar to `__annotations__`.
__slots__ itself doesn't behave that way, but subclasses do inherit the slots defined on their parent:
class C: __slots__ = ['a']
class D(C): __slots__ = ['b']
d = D() d.a = 5 d.b = "foo" print(f"{d.a=} {d.b=}")
prints
d.a=5 d.b='foo'
That's the inheritance behavior I was referring to.
Cheers,
*/arry*
-- --Guido (mobile)
On 12/01/21 2:21 pm, Larry Hastings wrote:
Slots intelligently support inheritance, too.
Are you sure about that? My experiments suggest that it has the same problem as __annotations__:

Python 3.8.2 (default, Mar 23 2020, 11:36:18)
[Clang 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class C:
...     __slots__ = ['a', 'b']
...
>>> class D(C):
...     __slots__ = ['c', 'd']
...
>>> class E(D):
...     pass
...
>>> C.__slots__
['a', 'b']
>>> D.__slots__
['c', 'd']
>>> E.__slots__
['c', 'd']
-- Greg
On 1/11/21 6:31 PM, Greg Ewing wrote:
On 12/01/21 2:21 pm, Larry Hastings wrote:
Slots intelligently support inheritance, too.
Are you sure about that? My experiments suggest that it has the same problem as __annotations__:
Python 3.8.2 (default, Mar 23 2020, 11:36:18) [Clang 8.1.0 (clang-802.0.42)] on darwin Type "help", "copyright", "credits" or "license" for more information.
>>> class C:
...     __slots__ = ['a', 'b']
...
>>> class D(C):
...     __slots__ = ['c', 'd']
...
>>> class E(D):
...     pass
...
>>> C.__slots__
['a', 'b']
>>> D.__slots__
['c', 'd']
>>> E.__slots__
['c', 'd']
Guido said the same thing. I did say "Slots", not "__slots__", though. You'll find that your class D supports attributes "a", "b", "c", and "d", and that's the inheritance I was referring to. Cheers, //arry/
12.01.21 03:21, Larry Hastings wrote:
I forgot about __slots__! Yup, it's optional, and you can even delete it, though after the class is defined I'm not sure how much difference that makes.
It affects pickling if __slotnames__ is not set yet. The latter is set when you pickle or copy an instance the first time.
On 12/01/21 6:22 am, Larry Hastings wrote:

* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.

If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.

-- Greg
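A rough sketch of the kind of helper Greg describes, under today's (pre-3.10) semantics; it's essentially the pre-3.10 branch of the version-check helper sketched earlier, wrapped as a single entry point that also manufactures the empty dict:

```python
def annotations(o):
    """Hypothetical helper: return o's own annotations as a dict, never None
    and never inherited from a base class; manufacture {} if there are none."""
    if isinstance(o, type):
        # Classes: only the class's own __dict__, to avoid inheriting a
        # base class's annotations.
        return o.__dict__.get('__annotations__') or {}
    # Functions and modules: the attribute may be missing (modules) or
    # lazily created (functions); either way, fall back to an empty dict.
    return getattr(o, '__annotations__', None) or {}
```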
Isn't that just typing.get_type_hints()? On Mon, Jan 11, 2021 at 5:11 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 12/01/21 6:22 am, Larry Hastings wrote:
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.
If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.
-- Greg
--
--Guido van Rossum (python.org/~guido)
On 1/11/21 5:05 PM, Greg Ewing wrote:
On 12/01/21 6:22 am, Larry Hastings wrote:
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:

    if o.__annotations__:
If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.
I guess I'm marginally against this, just because it seems like a needless change. We don't need the flexibility of a function with optional parameters and such, and with a data descriptor we can already put code behind __annotations__ (as I have already done). Plus, the function should probably cache its result--you wouldn't want code that called it ten times to generate ten fresh dicts, would you?--and already we're most of the way to what I proposed in PEP 649. Cheers, //arry/
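To make "put code behind __annotations__ with a data descriptor" concrete, here is a minimal sketch; the names are hypothetical and it uses a side table rather than anything like the real PEP 649 machinery, but it shows a descriptor lazily creating a per-class dict, never inheriting one, and enforcing the dict-or-None rule:

```python
import weakref

class LazyAnnotations:
    """Hypothetical data descriptor: every class sees its *own* annotations
    dict -- the one from its class body if any, otherwise one created
    lazily -- and never a dict inherited from a base class."""

    def __init__(self):
        self._store = weakref.WeakKeyDictionary()

    def __get__(self, cls, metacls=None):
        if cls is None:
            return self
        if cls in self._store:                   # explicitly set earlier
            return self._store[cls]
        if '__annotations__' in cls.__dict__:    # defined by the class body
            return cls.__dict__['__annotations__']
        return self._store.setdefault(cls, {})  # lazily create an empty dict

    def __set__(self, cls, value):
        if not (value is None or isinstance(value, dict)):
            raise TypeError("__annotations__ must be set to a dict or None")
        self._store[cls] = value

class Meta(type):
    __annotations__ = LazyAnnotations()

class C(metaclass=Meta):
    x: int = 0

class D(C):
    pass

print(C.__annotations__)  # {'x': <class 'int'>}
print(D.__annotations__)  # {} -- lazily created, not inherited from C
```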
On Tue, Jan 12, 2021 at 12:56 PM Larry Hastings <larry@hastings.org> wrote:
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:
if o.__annotations__:
Does it have to be mutable? If not, maybe there could be a singleton "immutable empty dict-like object", in the same way that an empty tuple can be put anywhere that expects a sequence. That'd be as cheap as None (modulo a once-per-interpreter cost for the additional static object). ChrisA
On 1/11/21 6:09 PM, Chris Angelico wrote:
On Tue, Jan 12, 2021 at 12:56 PM Larry Hastings <larry@hastings.org> wrote:
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:
if o.__annotations__:
Does it have to be mutable? If not, maybe there could be a singleton "immutable empty dict-like object", in the same way that an empty tuple can be put anywhere that expects a sequence. That'd be as cheap as None (modulo a once-per-interpreter cost for the additional static object).
Historically, annotations dicts are mutable. I don't know how often people mutate them, but I would assume it's uncommon. So technically this would be a breaking change. But it does seem low-risk. Cheers, //arry/
On 12/01/21 2:46 pm, Larry Hastings wrote:
Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against.
If __annotations__ were to return a read-only mapping object instead of a dict, the same empty object could be used for all annotationless objects. -- Greg
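For reference, the standard library already has a suitable read-only mapping type: types.MappingProxyType. A sketch of how one shared empty object could serve every annotation-less object (the helper name is illustrative, and the inheritance wrinkle discussed earlier is set aside here):

```python
import types

# One shared, read-only "no annotations" object per interpreter,
# instead of a fresh empty dict per class/module.
EMPTY_ANNOTATIONS = types.MappingProxyType({})

def annotations_or_empty(o):
    # Hypothetical accessor: always returns a mapping, never None.
    return getattr(o, '__annotations__', None) or EMPTY_ANNOTATIONS

class C:
    pass

print(dict(annotations_or_empty(C)))  # {}
# EMPTY_ANNOTATIONS['x'] = int        # a mappingproxy is read-only: TypeError
```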
On Mon, Jan 11, 2021 at 5:57 PM Larry Hastings <larry@hastings.org> wrote:
On 1/11/21 5:05 PM, Greg Ewing wrote:
On 12/01/21 6:22 am, Larry Hastings wrote:
* The language will set __annotations__ to a dict if the object has annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing __annotations__ has to guard against the possibility of it being None.
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy:
if o.__annotations__:
But if things could fall through in the default case, such that the result can be used directly in a `for` loop, that is nice, as Guido pointed out.

The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much. I bet you save more memory running with -OO than what this will cost users in memory.

And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import, the version check just means you then need to *potentially* worry about the semantic shift (until the change becomes permanent). It seems the changes are all still easy enough to have fallthrough and semantic checks that it won't be much of a problem. I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
If we're changing things, I'm wondering if the best thing would be to introduce an annotations() function as the new best practice for getting an object's annotations. It would know how to handle all the type-specific peculiarities, and could take care of things such as manufacturing an empty dict if the object doesn't have any annotations.
I guess I'm marginally against this, just because it seems like a needless change. We don't need the flexibility of a function with optional parameters and such, and with a data descriptor we can already put code behind __annotations__ (as I have already done). Plus, the function should probably cache its result--you wouldn't want code that called it ten times to generate ten fresh dicts, would you?--and already we're most of the way to what I proposed in PEP 649.
I also don't think introspection on annotations is common enough to warrant a built-in function; this stuff is meant for tools, not for the average developer to be dynamically playing with. As Guido pointed out, typing.get_type_hints() already covers this.
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that. If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import
Would it?

My original proposal would make breaking changes to how you examine __annotations__. Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?

It occurs to me that you could take kls.__module__, pull out the module from sys.modules, then look inside to see if it contains the correct "future" object imported from the __future__ module. Is that an approach we would suggest to our users? (A rough sketch of that check appears below.)

Also, very little code ever examines annotations; most code with annotations merely defines them. So I suspect most annotations users wouldn't care either way--which also means a "from __future__ import" that changes the semantics of examining or modifying annotations isn't going to see a lot of uptake, because it doesn't really affect them. The change in semantics only affects people whose code examines annotations, which I suspect is very few.

So I wasn't really joking when I proposed making these changes without a from __future__ import, and suggested users use a version check. The library code would know based on the Python version number which semantics were active, no peeking in modules to find future objects. They could literally write what I suggested:

    if you know you're running python 3.10 or higher:
        examine using the new semantics
    else:
        examine using the old semantics

I realize that's a pretty aggressive approach, which is why I prefaced it with "if I could wave my magic wand". But if we're going to make breaking changes, then whatever we do, it's going to break some people's code until it gets updated to cope with the new semantics. In that light this approach seemed reasonable.

But really this is why I started this thread in the first place. My idea of what's reasonable is probably all out of whack. So I wanted to start the conversation, to get feedback on how much breakage is allowable and how best to mitigate it. If it wasn't a controversial change, then we wouldn't need to talk about it!

And finally: if we really do set a default of an empty dict on classes and modules, then my other in-theory breaking changes:

* you can't delete __annotations__
* you can only set __annotations__ to a dict or None (this is already true of functions, but not of classes or modules)

will, I expect, in practice break exactly zero code. Who deletes __annotations__? Who ever sets __annotations__ to something besides a dict? So if the practical breakage is zero, why bother gating it with "from __future__ import" at all?
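For what it's worth, the per-module check described (and dismissed) above would look roughly like this sketch; it relies on a from __future__ import binding a _Feature object in the importing module's namespace. The feature name "annotations" is used purely as an illustration, since it's an existing __future__ feature with the same mechanics:

```python
import sys
import __future__

def module_uses_new_semantics(cls):
    # Find the module the class was defined in, then check whether it
    # performed the (here: illustrative) future import.
    module = sys.modules.get(cls.__module__)
    feature = getattr(module, 'annotations', None)
    return feature is __future__.annotations
```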
I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
What I find frustrating about that answer--and part of what motivated me to work on this in the first place--is that typing.get_type_hints() requires your annotations to be type hints. All type hints are annotations, but not all annotations are type hints, and it's entirely plausible for users to have reasonable uses for non-type-hint annotations that typing.get_type_hints() wouldn't like.

The two things typing.get_type_hints() does, that I know of, that can impede such non-type-hint annotations are:

* It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
* It regards all string annotations as "forward references", which means they get eval()'d and the result returned as the annotation. typing.get_type_hints() doesn't catch any exceptions here, so if the eval fails, typing.get_type_hints() fails and you can't use it to examine your annotations.

PEP 484 "explicitly does NOT prevent other uses of annotations". But if you force everyone to use typing.get_type_hints() to examine their annotations, then you have de facto prevented any use of annotations that isn't compatible with type hints.

Cheers,

//arry/
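A small demonstration of the two behaviors listed above (the exact exception raised for the failing eval may vary between Python versions):

```python
import typing

def returns_nothing() -> None: ...

print(returns_nothing.__annotations__)         # {'return': None}
print(typing.get_type_hints(returns_nothing))  # {'return': <class 'NoneType'>}

def uses_a_label(step: "not a type, just a note"): ...

# The string annotation is treated as a forward reference and eval()'d,
# so get_type_hints() raises instead of returning the annotation.
try:
    typing.get_type_hints(uses_a_label)
except Exception as exc:
    print(type(exc).__name__)  # SyntaxError (on current versions)
```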
On Tue, Jan 12, 2021 at 6:35 PM Larry Hastings <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
I would like that very much. And the exception for functions is especially helpful.
And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import
Would it?
My original proposal would make breaking changes to how you examine __annotations__. Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?
It occurs to me that you could take kls.__module__, pull out the module from sys.modules, then look inside to see if it contains the correct "future" object imported from the __future__ module. Is that an approach we would suggest to our users?
You're kidding, right?

Also, very little code ever examines annotations; most code with annotations merely defines them. So I suspect most annotations users wouldn't care either way--which also means a "from __future__ import" that changes the semantics of examining or modifying annotations isn't going to see a lot of uptake, because it doesn't really affect them. The change in semantics only affects people whose code examines annotations, which I suspect is very few.
I agree, but they're pretty vocal -- the breakage in get_type_hints() due to the scope issue in 3.10 (which isn't even in beta) has drawn plenty of complaints. Also, dataclasses (which I have to assume is fairly popular :-) introspects `__annotations__`, and even mutates and sets it.

So I wasn't really joking when I proposed making these changes without a from __future__ import, and suggested users use a version check. The library code would know based on the Python version number which semantics were active, no peeking in modules to find future objects. They could literally write what I suggested:
    if you know you're running python 3.10 or higher:
        examine using the new semantics
    else:
        examine using the old semantics
I realize that's a pretty aggressive approach, which is why I prefaced it with "if I could wave my magic wand". But if we're going to make breaking changes, then whatever we do, it's going to break some people's code until it gets updated to cope with the new semantics. In that light this approach seemed reasonable.
Is there a way that such code could be written without a version check? E.g. for modules we could recommend `getattr(m, "__annotations__", None) or {}`, and that would work in earlier versions too. I'm not sure what would work for classes, since most code will want to combine the annotations for all classes in the MRO, and the way to do that would change -- before 3.10, you *must* use `cls.__dict__.get("__annotations__")` whereas for 3.10+ you *must* use `cls.__annotations__`.

Note, for a moment I thought that for modules we don't need to evaluate annotations lazily (I know that's your other PEP/thread, but still, it seems related). But we do, because there's an idiom where people write

```
from __future__ import annotations
import typing

if typing.TYPE_CHECKING:
    from somewhere import Class

a: Class
```

Here introspecting the annotations would fail, but clearly the intention was to use them purely for static type checking, so the user presumably doesn't care. (But does that mean that if a single annotation cannot be evaluated, the entire annotations dict becomes inaccessible? That's a general weakness of the PEP 649 scheme, right?)

But really this is why I started this thread in the first place. My idea of what's reasonable is probably all out of whack. So I wanted to start the conversation, to get feedback on how much breakage is allowable and how best to mitigate it. If it wasn't a controversial change, then we wouldn't need to talk about it!
And finally: if we really do set a default of an empty dict on classes and modules, then my other in-theory breaking changes:
- you can't delete __annotations__
- you can only set __annotations__ to a dict or None (this is already true of functions, but not of classes or modules)

will, I expect, in practice break exactly zero code. Who deletes __annotations__? Who ever sets __annotations__ to something besides a dict? So if the practical breakage is zero, why bother gating it with "from __future__ import" at all?
Maybe for the benefit of users who rely on some specific library that gets the annotations out of a class dict. The library could document "don't use that annotations future import, because then your annotations won't work", which would give that library a few releases' time to come up with an alternative strategy.
I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
What I find frustrating about that answer--and part of what motivated me to work on this in the first place--is that typing.get_type_hints() requires your annotations to be type hints. All type hints are annotations, but not all annotations are type hints, and it's entirely plausible for users to have reasonable uses for non-type-hint annotations that typing.get_type_hints() wouldn't like.
The two things typing.get_type_hints() does, that I know of, that can impede such non-type-hint annotations are:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
- It regards all string annotations as "forward references", which means they get eval()'d and the result returned as the annotation. typing.get_type_hints() doesn't catch any exceptions here, so if the eval fails, typing.get_type_hints() fails and you can't use it to examine your annotations.
PEP 484 "explicitly does NOT prevent other uses of annotations". But if you force everyone to use typing.get_type_hints() to examine their annotations, then you have de facto prevented any use of annotations that isn't compatible with type hints.
I suspect that the most common use of annotation introspection is to implement some kind of runtime type checking scheme (there are many of those, I think even JSON schema verifiers based on typing.TypedDict) and those users would presumably be fine with get_type_hints(). Note that PEP 593 introduces a way to attach arbitrary extra data to an annotation, e.g.

```
UnsignedShort = Annotated[int, struct2.ctype('H')]
name: Annotated[str, struct2.ctype("<10s")]
```

--
--Guido van Rossum (python.org/~guido)
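As an illustration of how a runtime tool might consume such metadata: typing.get_type_hints(..., include_extras=True) (available since 3.9) preserves the Annotated wrapper, and typing.get_args() splits it into the base type and the attached data. The ctype class below is only a stand-in for the struct2.ctype in Guido's example, which isn't a real module:

```python
from typing import Annotated, get_args, get_type_hints

class ctype:
    # Stand-in for struct2.ctype: just records a struct format string.
    def __init__(self, fmt):
        self.fmt = fmt

UnsignedShort = Annotated[int, ctype('H')]

class Record:
    count: UnsignedShort
    name: Annotated[str, ctype("<10s")]

for field, hint in get_type_hints(Record, include_extras=True).items():
    base, *extras = get_args(hint)
    print(field, base, [e.fmt for e in extras])
# count <class 'int'> ['H']
# name <class 'str'> ['<10s']
```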
As long as I'm gravedigging old conversations...! Remember this one, also from January of this year? Here's a link to the thread in the c.l.p-d Mailman archive. The first message in the thread is a good overview of the problem: https://mail.python.org/archives/list/python-dev@python.org/thread/AWKVI3NRC... Here's kind of where we left it: On 1/12/21 7:48 PM, Guido van Rossum wrote:
On Tue, Jan 12, 2021 at 6:35 PM Larry Hastings <larry@hastings.org <mailto:larry@hastings.org>> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
I would like that very much. And the exception for functions is especially helpful.
First of all, I've proposed a function that should also help a lot:

https://bugs.python.org/issue43817

The function will be called inspect.get_annotations(o). It's like typing.get_type_hints(o) except less opinionated. This function would become the best practice for everybody who wants annotations**, like so:

    import inspect

    if hasattr(inspect, "get_annotations"):
        how_i_get_annotations = inspect.get_annotations
    else:
        # do whatever it was I did in Python 3.9 and before...
        ...

** Everybody who specifically wants /type hints/ should instead call typing.get_type_hints(), and good news!, /that/ function has existed for several versions now. So they probably already /do/ call it.

I'd still like to add a default empty __annotations__ dict to all classes and modules for Python 3.10, for everybody who doesn't switch to using this as-yet-unwritten inspect.get_annotations() function. The other changes I propose in that thread (e.g. deleting __annotations__ always throws TypeError) would be nice, but honestly they aren't high priority. They can wait until after Python 3.10. Just these two things (inspect.get_annotations() and always populating __annotations__ for classes and modules) would go a long way to cleaning up how people examine annotations.

Long-term, hopefully we can fold the desirable behaviors of inspect.get_annotations() into the language itself, at which point we could probably deprecate the function. That wouldn't be until a long time from now of course.

Does this need a lot of discussion, or can I just go ahead with the bpo and PR and such? I mean, I'd JFDI, as Barry always encourages, but given how much debate we've had over annotations in the last two weeks, I figured I should first bring it up here.

Happy two-weeks'-notice,

//arry/

p.s. I completely forgot about this until just now--sorry. At least I remembered before Python 3.10b1!
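For illustration, here's how the proposed function behaves on 3.10+, where it exists: inspect.get_annotations() returns only the object's own annotations, so nothing is inherited.

```python
import inspect

class A:
    ax: int = 3

class B(A):
    pass

print(inspect.get_annotations(A))  # {'ax': <class 'int'>}
print(inspect.get_annotations(B))  # {} -- B's own annotations only
```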
This is happening, right? Adding a default `__annotations__ = {}` to modules and classes. (Though https://bugs.python.org/issue43901 seems temporarily stuck.)

On Mon, Apr 19, 2021 at 10:10 PM Larry Hastings <larry@hastings.org> wrote:
As long as I'm gravedigging old conversations...! Remember this one, also from January of this year? Here's a link to the thread in the c.l.p-d Mailman archive. The first message in the thread is a good overview of the problem:
https://mail.python.org/archives/list/python-dev@python.org/thread/AWKVI3NRC...
Here's kind of where we left it:
On 1/12/21 7:48 PM, Guido van Rossum wrote:
On Tue, Jan 12, 2021 at 6:35 PM Larry Hastings <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
I would like that very much. And the exception for functions is especially helpful.
First of all, I've proposed a function that should also help a lot:
https://bugs.python.org/issue43817
The function will be called inspect.get_annotations(o). It's like typing.get_type_hints(o) except less opinionated. This function would become the best practice for everybody who wants annotations**, like so:
    import inspect

    if hasattr(inspect, "get_annotations"):
        how_i_get_annotations = inspect.get_annotations
    else:
        # do whatever it was I did in Python 3.9 and before...
        ...
** Everybody who specifically wants *type hints* should instead call typing.get_type_hints(), and good news!, *that* function has existed for several versions now. So they probably already *do* call it.
I'd still like to add a default empty __annotations__ dict to all classes and modules for Python 3.10, for everybody who doesn't switch to using this as-yet-unwritten inspect.get_annotations() function. The other changes I propose in that thread (e.g. deleting __annotations__ always throws TypeError) would be nice, but honestly they aren't high priority. They can wait until after Python 3.10. Just these two things (inspect.get_annotations() and always populating __annotations__ for classes and modules) would go a long way to cleaning up how people examine annotations.
Long-term, hopefully we can fold the desirable behaviors of inspect.get_annotations() into the language itself, at which point we could probably deprecate the function. That wouldn't be until a long time from now of course.
Does this need a lot of discussion, or can I just go ahead with the bpo and PR and such? I mean, I'd JFDI, as Barry always encourages, but given how much debate we've had over annotations in the last two weeks, I figured I should first bring it up here.
Happy two-weeks'-notice,
*/arry*
p.s. I completely forgot about this until just now--sorry. At least I remembered before Python 3.10b1!
--
--Guido van Rossum (python.org/~guido)
On 4/23/21 9:26 PM, Guido van Rossum wrote:
This is happening, right? Adding a default `__annotations = {}` to modules and classes. (Though https://bugs.python.org/issue43901 <https://bugs.python.org/issue43901> seems temporarily stuck.)
It's happening, and I wouldn't say it's stuck. I'm actively working on it--currently puzzling my way through some wild unit test failures. I expect to ship my first PR over the weekend. Cheers, //arry/
I've hit a conceptual snag in this.

What I thought I needed to do: set __annotations__ = {} in the module dict, and set __annotations__ = {} in user class dicts. The latter was more delicate than the former but I think I figured out a good spot for both. I have this much working, including fixing the test suite.

But now I realize (*head-slap* here): if *every* class is going to have annotations, does that mean builtin classes too? StructSequence classes like float? Bare-metal type objects like complex? Heck, what about type itself?!

My knee-jerk initial response: yes, those too. Which means adding a new getsetdef to the type object. But that's slightly complicated. The point of doing this is to preserve the existing best-practice of peeking in the class dict for __annotations__, to avoid inheriting it. If I'm to preserve that, the get/set for __annotations__ on a type object would need to get/set it on tp_dict if tp_dict was not NULL, and use internal storage somewhere if there is no tp_dict.

It's worth noticing that builtin types don't currently have __annotations__ set, and you can't set them. (Or, at least, float, complex, and type didn't have them set, and wouldn't let me set annotations on them.) So presumably people using current best practice--peek in the class dict--aren't having problems.

So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?

Cheers,

//arry/
On Sat, 24 Apr 2021, 5:53 pm Larry Hastings, <larry@hastings.org> wrote:
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I *just* make the change for user classes and leave builtin classes untouched? What do you think?
I'd suggest kicking the can down the road: leave builtin classes alone for now, but file a ticket to reconsider the question for 3.11. In the meantime, inspect.get_annotations can help hide the discrepancy. Cheers, Nick.
Cheers,
*/arry*
On 4/24/21 7:11 AM, Nick Coghlan wrote:
On Sat, 24 Apr 2021, 5:53 pm Larry Hastings, <larry@hastings.org <mailto:larry@hastings.org>> wrote:
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?
I'd suggest kicking the can down the road: leave builtin classes alone for now, but file a ticket to reconsider the question for 3.11.
In the meantime, inspect.get_annotations can help hide the discrepancy.
The good news: inspect.get_annotations() absolutely can handle it. inspect.get_annotations() is so paranoid about examining the object you pass in, I suspect you could pass in an old boot and it would pull out the annotations--if it had any. Cheers, //arry/
On 24. 04. 21 9:52, Larry Hastings wrote:
I've hit a conceptual snag in this.
What I thought I needed to do: set __annotations__= {} in the module dict, and set __annotations__= {} in user class dicts. The latter was more delicate than the former but I think I figured out a good spot for both. I have this much working, including fixing the test suite.
But now I realize (*head-slap* here): if *every* class is going to have annotations, does that mean builtin classes too? StructSequence classes like float? Bare-metal type objects like complex? Heck, what about type itself?!
My knee-jerk initial response: yes, those too. Which means adding a new getsetdef to the type object. But that's slightly complicated. The point of doing this is to preserve the existing best-practice of peeking in the class dict for __annotations__, to avoid inheriting it. If I'm to preserve that, the get/set for __annotations__ on a type object would need to get/set it on tp_dict if tp_dict was not NULL, and use internal storage somewhere if there is no tp_dict.
It's worth noticing that builtin types don't currently have __annotations__ set, and you can't set them. (Or, at least, float, complex, and type didn't have them set, and wouldn't let me set annotations on them.) So presumably people using current best practice--peek in the class dict--aren't having problems.
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?
Beware of adding mutable state to built-in (C static) type objects: these are shared across interpreters, so changing them can “pollute” unwanted contexts.

This has been so for a long time [0]. There are some subinterpreter efforts underway that might eventually lead to making __annotations__ on static types easier to add, but while you're certainly welcome to explore the neighboring rabbit hole as well, I do think you're going in too far for now :)

[0] https://mail.python.org/archives/list/python-dev@python.org/message/KLCZIA6F...
On 4/24/21 8:09 AM, Petr Viktorin wrote:
On 24. 04. 21 9:52, Larry Hastings wrote:
I've hit a conceptual snag in this.
What I thought I needed to do: set __annotations__= {} in the module dict, and set __annotations__= {} in user class dicts. The latter was more delicate than the former but I think I figured out a good spot for both. I have this much working, including fixing the test suite.
But now I realize (*head-slap* here): if *every* class is going to have annotations, does that mean builtin classes too? StructSequence classes like float? Bare-metal type objects like complex? Heck, what about type itself?!
My knee-jerk initial response: yes, those too. Which means adding a new getsetdef to the type object. But that's slightly complicated. The point of doing this is to preserve the existing best-practice of peeking in the class dict for __annotations__, to avoid inheriting it. If I'm to preserve that, the get/set for __annotations__ on a type object would need to get/set it on tp_dict if tp_dict was not NULL, and use internal storage somewhere if there is no tp_dict.
It's worth noticing that builtin types don't currently have __annotations__ set, and you can't set them. (Or, at least, float, complex, and type didn't have them set, and wouldn't let me set annotations on them.) So presumably people using current best practice--peek in the class dict--aren't having problems.
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?
Beware of adding mutable state to built-in (C static) type objects: these are shared across interpreters, so changing them can “pollute” unwanted contexts.
This has been so for a long time [0]. There are some subinterpreter efforts underway that might eventually lead to making __annotations__ on static types easier to add, but while you're certainly welcome to explore the neighboring rabbit hole as well, I do think you're going in too far for now :)
[0] https://mail.python.org/archives/list/python-dev@python.org/message/KLCZIA6F...
That's a good point! The sort of detail one forgets in the rush of the moment. Given that the lack of annotations on builtin types already isn't a problem, and given this wrinkle, and generally given the "naw you don't have to" vibe I got from you and Nick (and the lack of "yup you gotta" I got from anybody else), I'm gonna go with not polluting the builtin types for now.

This is not to say that, in the fullness of time, those objects should never have annotations. Even in the three random types I picked in my example, there's at least one example: float.imag is a data member and might theoretically be annotated. But we can certainly kick this can down the road too. Maybe by the time we get around to it, we'll have a read-only dictionary we can use for the purpose.

Cheers,

//arry/
On Sat, Apr 24, 2021 at 2:25 PM Larry Hastings <larry@hastings.org> wrote:
This is not to say that, in the fullness of time, those objects should never have annotations. Even in the three random types I picked in my example, there's at least one example: float.imag is a data member and might theoretically be annotated. But we can certainly kick this can down the road too. Maybe by the time we get around to it, we'll have a read-only dictionary we can use for the purpose.
We already have one -- the mappingproxy type you get back from a class' __dict__ attribute. (Though in the fullness of times, type objects presumably won't be shared between multiple interpreters, which solves the problem in a different way.) --Guido van Rossum (python.org/~guido)
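(That type is already reachable from Python today, e.g.:)

    from types import MappingProxyType

    d = {'x': int}
    proxy = MappingProxyType(d)   # same type as SomeClass.__dict__
    print(proxy['x'])             # <class 'int'>
    proxy['y'] = str              # raises TypeError: 'mappingproxy' object
                                  #   does not support item assignment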
On Tue, Jan 12, 2021 at 6:31 PM Larry Hastings <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
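(A quick illustration of that point:)

    def f(x: int) -> str:
        return str(x)

    print(f.__annotations__)                 # {'x': <class 'int'>, 'return': <class 'str'>}
    print('__annotations__' in f.__dict__)   # False -- stored in a dedicated slot,
                                             #   not in the function's attribute dict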
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
Great!
And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import
Would it?
I thought you had proposed that initially, but it appears I mixed this with your PEP email. 😅 Sorry about that!
My original proposal would make breaking changes to how you examine __annotations__. Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?
It occurs to me that you could take kls.__module__, pull out the module from sys.modules, then look inside to see if it contains the correct "future" object imported from the __future__ module. Is that an approach we would suggest to our users?
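(A sketch of what that check might look like -- here using the existing PEP 563 "annotations" feature as a stand-in for the hypothetical new future-import, and with an invented function name:)

    import sys
    import __future__

    def module_opted_in(kls):
        # "from __future__ import annotations" binds the feature object as a
        # global in the importing module, so look for exactly that object.
        mod = sys.modules.get(kls.__module__)
        return (mod is not None
                and getattr(mod, 'annotations', None) is __future__.annotations)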
Also, very little code ever examines annotations; most code with annotations merely defines them. So I suspect most annotations users wouldn't care either way--which also means a "from __future__ import" that changes the semantics of examining or modifying annotations isn't going to see a lot of uptake, because it doesn't really affect them. The change in semantics only affects people whose code examines annotations, which I suspect is very few.
So I wasn't really joking when I proposed making these changes without a from __future__ import, and suggested users use a version check. The library code would know based on the Python version number which semantics were active, no peeking in modules to find a future object. They could literally write what I suggested:
    if you know you're running python 3.10 or higher:
        examine using the new semantics
    else:
        examine using the old semantics
I realize that's a pretty aggressive approach, which is why I prefaced it with "if I could wave my magic wand". But if we're going to make breaking changes, then whatever we do, it's going to break some people's code until it gets updated to cope with the new semantics. In that light this approach seemed reasonable.
But really this is why I started this thread in the first place. My idea of what's reasonable is probably all out of whack. So I wanted to start the conversation, to get feedback on how much breakage is allowable and how best to mitigate it. If it wasn't a controversial change, then we wouldn't need to talk about it!
And finally: if we really do set a default of an empty dict on classes and modules, then my other in-theory breaking changes:
- you can't delete __annotations__
- you can only set __annotations__ to a dict or None (this is already true of functions, but not of classes or modules)
will, I expect, in practice break exactly zero code. Who deletes __annotations__? Who ever sets __annotations__ to something besides a dict? So if the practical breakage is zero, why bother gating it with "from __future__ import" at all?
I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
What I find frustrating about that answer--and part of what motivated me to work on this in the first place--is that typing.get_type_hints() requires your annotations to be type hints. All type hints are annotations, but not all annotations are type hints, and it's entirely plausible for users to have reasonable uses for non-type-hint annotations that typing.get_type_hints() wouldn't like.
You and I have talked about this extensively, so I'm aware. 😉
The two things typing.get_type_hints() does, that I know of, that can impede such non-type-hint annotations are:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
Huh, I wasn't aware of that.
-Brett
- It regards all string annotations as "forward references", which means they get eval()'d and the result returned as the annotation. typing.get_type_hints() doesn't catch any exceptions here, so if the eval fails, typing.get_type_hints() fails and you can't use it to examine your annotations.
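(Both behaviors are easy to demonstrate:)

    import typing

    def f(x: None): ...
    print(typing.get_type_hints(f))   # {'x': <class 'NoneType'>} -- the None became type(None)

    def g(x: "this is not valid python"): ...
    typing.get_type_hints(g)          # raises SyntaxError -- the string is treated as a
                                      #   forward reference and eval()'d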
PEP 484 "explicitly does NOT prevent other uses of annotations". But if you force everyone to use typing.get_type_hints() to examine their annotations, then you have de facto prevented any use of annotations that isn't compatible with type hints.
Cheers,
*/arry*
On Tue, Jan 12, 2021 at 8:00 PM Brett Cannon <brett@python.org> wrote:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
Huh, I wasn't aware of that.
This has tripped up many people. Maybe we should just bite the bullet and change this? --Guido van Rossum (python.org/~guido)
On Tue, 2021-01-12 at 20:09 -0800, Guido van Rossum wrote:
On Tue, Jan 12, 2021 at 8:00 PM Brett Cannon <brett@python.org> wrote:
* It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
Huh, I wasn't aware of that.
This has tripped up many people. Maybe we should just bite the bullet and change this?
+1, FWIW.
On 13/01/21 3:31 pm, Larry Hastings wrote:
Let's say we put those behind a from __future__ import. Now we're gonna write library code that examines annotations. A user passes in a class and asks us to examine its annotations. The old semantics might be active on it, or the new ones. How do we know which set of semantics we need to use?
This implies that __future__ is the wrong mechanism to use. It's only appropriate when the changes it triggers are confined to the module that uses it, which is not the case here. -- Greg
On Wed, 13 Jan 2021, 12:35 pm Larry Hastings, <larry@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
Could you get the best of both worlds by making __annotations__ an auto-populating descriptor on "type", the way it is on functions? Continue to add a non-empty annotations dict to the class dict eagerly, but only add the empty dict when "cls.__annotations__" is accessed. Then your co_annotations PEP would only be changing the way the non-empty case was handled, rather than introducing the descriptor in the first place. Cheers, Nick.
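(Here's a rough Python-level toy of that idea -- the real thing would be a getset on "type" implemented in C, and unlike this toy it wouldn't fall back to a base class's annotations -- but it shows the lazy create-and-cache behavior:)

    class LazyAnnotations(type):
        def __getattr__(cls, name):
            # Only reached when normal lookup fails, i.e. nothing in the class,
            # its bases, or its metaclass already provides '__annotations__'.
            if name == '__annotations__':
                ann = {}
                type.__setattr__(cls, '__annotations__', ann)  # cache in the class dict
                return ann
            raise AttributeError(name)

    class C(metaclass=LazyAnnotations):
        pass

    print('__annotations__' in C.__dict__)   # False -- nothing stored eagerly
    print(C.__annotations__)                 # {} -- created on first access...
    print('__annotations__' in C.__dict__)   # True -- ...and now cached in the class dict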
On 1/16/21 8:41 AM, Nick Coghlan wrote:
Could you get the best of both worlds by making __annotations__ an auto-populating descriptor on "type", the way it is on functions?
Continue to add a non-empty annotations dict to the class dict eagerly, but only add the empty dict when "cls.__annotations__" is accessed.
I think that'll work though it's a little imprecise. Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:

    cls_annotations = cls.__dict__.get('__annotations__', {})

What happens when that current best practice code meets your proposed "lazy-populate the empty dict" approach?

* If a class has annotations set, cls.__dict__['__annotations__'] will be set, so the code works fine.

* If a class doesn't have annotations set, then cls.__dict__['__annotations__'] won't be set yet. So people peering in cls.__dict__['__annotations__'] will get the right /answer/, that no annotations are set. But they'll see the wrong /specifics/: they'll think annotations are unset, when in fact it has an empty dict as its value.

So the code will continue to work, even though it's arguably a little misguided. If anybody distinguished between "annotations are unset" and "annotations are set to an empty dict", that code would fail, but I expect nobody ever does that.

Two notes about this idea. First, I think most people who use this best-practices code above use it for modules as well as classes. (They have two code paths: one for functions, the other for not-functions.) But everything I said above is true for both classes and modules.

Second, I think this is only sensible if, at the same time, we make it illegal to delete cls.__annotations__. If we lazy-populate the empty dict, and a user deletes cls.__annotations__, and we don't remember some extra state, we'd just re-"lazy" create the empty dict the next time they asked for it. Which is actually what functions do, just lazy-repopulate the empty annotations dict every time, and I'm not keen to bring those semantics to classes and modules.

Cheers,

//arry/
On 17/01/21 12:31 pm, Larry Hastings wrote:
Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
Isn't that going to get broken anyway? It won't trigger the calling of __co_annotations__. -- Greg
On 1/16/21 4:09 PM, Greg Ewing wrote:
On 17/01/21 12:31 pm, Larry Hastings wrote:
Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
Isn't that going to get broken anyway? It won't trigger the calling of __co_annotations__.
I proposed these as two separate conversations, because I wanted to clean up the semantics of annotations whether or not PEP 649 was accepted. But, yes, if PEP 649 is accepted (in some form), this current-best-practice would no longer work, and the new best practice would likely become much more complicated. Cheers, //arry/
On Sat, Jan 16, 2021 at 3:32 PM Larry Hastings <larry@hastings.org> wrote:
[...] If anybody distinguished between "annotations are unset" and "annotations are set to an empty dict", that code would fail, but I expect nobody ever does that.
I agree, since I can't think of differing semantics. Given that `__annotations__` is filled from annotated class variables, the only reason someone might care about the difference would be if they are aware of code that manually *sets* `X.__annotations__ = {}` and they have some kind of shared understanding that that means something special. I find that highly unlikely, and frankly, if someone needs such a shared understanding, let them just pick a unique key to set. I do worry about the best practice getting worse if your PEP 649 is accepted. --Guido van Rossum (python.org/~guido)
On 1/18/21 12:16 PM, Guido van Rossum wrote:
I do worry about the best practice getting worse if your PEP 649 is accepted.
A good part of what motivated me to start this second thread ("Let's Fix ...") was how much worse best practice would become if PEP 649 is accepted. But if we accept PEP 649, /and/ take steps to fix the semantics of annotations, I think the resulting best practice will be excellent in the long-run.

Let's assume for a minute that PEP 649 is accepted more-or-less like it is now. (The name resolution approach is clearly going to change but that won't affect the discussion here.) And let's assume that we also change the semantics so annotations are always defined (you can't delete them) and they're guaranteed to be either a dict or None. (Permitting __annotations__ to be None isn't settled yet, but it's most similar to current semantics, so let's just assume it for now.)

Because the current semantics are kind of a mess, most people who examine annotations already have a function that gets the annotations for them. Given that, I really do think the best approach is to gate the code on version 3.10, like I've described before:

    if python version >= 3.10:
        def get_annotations(o):
            return o.__annotations__
    else:
        def get_annotations(o):
            if isinstance(o, (type, types.ModuleType)):
                return o.__dict__.get("__annotations__", None)
            else:
                return o.__annotations__

This assumes returning None is fine. If it had to always return a valid dict, I'd add "or {}" to the end of every return statement.

Given that it already has to be a function, I think this approach is readable and performant. And, at some future time when the code can drop support for Python < 3.10, we can throw away the if statement and the whole else block, keeping just the one-line function. At which point maybe we'd refactor away the function and just use "o.__annotations__" everywhere.

I concede that, in the short term, now we've got nine lines and two if statements to do something that /should/ be relatively straightforward--accessing the annotations on an object. But that's where we find ourselves. Current best practice is kind of a mess, and unfortunately PEP 649 breaks current best practice anyway. My goal is to fix the semantics so that long-term best practice is sensible, easy, and obvious.

Cheers,

//arry/
Hm. It's unfortunate that this would break code using what is *currently* the best practice. The saving grace seems to be that for *many* use cases the best practice is to call typing.get_type_hints(). This is particularly useful for classes because it includes annotations from base classes. Also, for functions and modules I would recommend `getattr(o, "__annotations__", None)` (perhaps with `or {}` added). I would also honestly discount what dataclasses.py and typing.py have to do. But what do 3rd party packages do when they don't want to use get_type_hints() and they want to get it right for classes? That would give an indication of how seriously we should take breaking current best practice.

On Mon, Jan 18, 2021 at 1:10 PM Larry Hastings <larry@hastings.org> wrote:
On 1/18/21 12:16 PM, Guido van Rossum wrote:
I do worry about the best practice getting worse if your PEP 649 is accepted.
A good part of what motivated me to start this second thread ("Let's Fix ...") was how much worse best practice would become if PEP 649 is accepted. But if we accept PEP 649, *and* take steps to fix the semantics of annotations, I think the resulting best practice will be excellent in the long-run.
Let's assume for a minute that PEP 649 is accepted more-or-less like it is now. (The name resolution approach is clearly going to change but that won't affect the discussion here.) And let's assume that we also change the semantics so annotations are always defined (you can't delete them) and they're guaranteed to be either a dict or None. (Permitting __annotations__ to be None isn't settled yet, but it's most similar to current semantics, so let's just assume it for now.)
Because the current semantics are kind of a mess, most people who examine annotations already have a function that gets the annotations for them. Given that, I really do think the best approach is to gate the code on version 3.10, like I've described before:
    if python version >= 3.10:
        def get_annotations(o):
            return o.__annotations__
    else:
        def get_annotations(o):
            if isinstance(o, (type, types.ModuleType)):
                return o.__dict__.get("__annotations__", None)
            else:
                return o.__annotations__
This assumes returning None is fine. If it had to always return a valid dict, I'd add "or {}" to the end of every return statement.
Given that it already has to be a function, I think this approach is readable and performant. And, at some future time when the code can drop support for Python < 3.10, we can throw away the if statement and the whole else block, keeping just the one-line function. At which point maybe we'd refactor away the function and just use "o.__annotations__" everywhere.
I concede that, in the short term, now we've got nine lines and two if statements to do something that *should* be relatively straightforward--accessing the annotations on an object. But that's where we find ourselves. Current best practice is kind of a mess, and unfortunately PEP 649 breaks current best practice anyway. My goal is to fix the semantics so that long-term best practice is sensible, easy, and obvious.
Cheers,
*/arry*
--Guido van Rossum (python.org/~guido)
On 1/18/21 2:39 PM, Guido van Rossum wrote:
Hm. It's unfortunate that this would break code using what is *currently* the best practice.
I can't figure out how to avoid it. The problem is, current best practice sidesteps the class and goes straight to the dict. How do we intercept that and run the code to lazy-calculate the annotations?

I mean, let's consider something crazy. What if we change cls.__dict__ from a normal dict to a special dict that handles the __co_annotations__ machinery? That might work, except, we literally allow users to supply their own cls.__dict__ via __prepare__. So we can't rely on our special dict.

What if we change cls.__dict__ to a getset? The user is allowed to set cls.__dict__, but when you get __dict__, we wrap the actual internal dict object with a special object that intercepts accesses to __annotations__ and handles the __co_annotations__ mechanism. That might work but it's really crazy and unfortunate. And it's remotely possible that a user might override __dict__ as a property, in a way that breaks this mechanism too. So it's not guaranteed to always work.

I'm not suggesting we should do these things, I'm just trying to illustrate how hard I think the problem is. If someone has a good idea how we can add the __co_annotations__ machinery without breaking current best practice I'd love to hear it.
Also, for functions and modules I would recommend `getattr(o, "__annotations__", None)` (perhaps with `or {}` added).
For functions you don't need to bother; fn.__annotations__ is guaranteed to always be set, and be either a dict or None. (Python will only ever set it to a dict, but the user is permitted to set it to None.)

I agree with your suggested best practice for modules as it stands today.

And actually, let me walk back something I've said before. I believe I've said several times that "people treat classes and modules the same". Actually that's wrong.

* Lib/typing.py treats functions and modules the same; it uses getattr(o, '__annotations__', None). It treats classes separately and uses cls.__dict__.get('__annotations__', {}).
* Lib/dataclasses.py uses fn.__annotations__ for functions and cls.__dict__.get('__annotations__', {}) for classes. It doesn't handle modules at all.
* Lib/inspect.py calls Lib/typing.py to get annotations. Which in retrospect I think is a bug, because annotations and type hints aren't the same thing. (typing.get_type_hints changes None to type(None), it evaluates strings, etc).

So, for what it's worth, I literally have zero examples of people treating classes and modules the same when it comes to annotations. Sorry for the confusion!
I would also honestly discount what dataclasses.py and typing.py have to do. But what do 3rd party packages do when they don't want to use get_type_hints() and they want to get it right for classes? That would give an indication of how seriously we should take breaking current best practice.
I'm not sure how to figure that out. Off the top of my head, the only current third-party packages I can think of that use annotations are mypy and attrs. I took a quick look at mypy but I can't figure out what it's doing.

attrs does something a little kooky. It accesses __annotations__ using a function called _has_own_attributes(), which detects whether or not the object is inheriting an attribute. But it doesn't peek in __dict__; instead it walks the mro and sees if any of its base classes have the same (non-False) value for that attribute.

https://github.com/python-attrs/attrs/blob/a025629e36440dcc27aee0ee5b04d6523...

Happily, that seems like it would continue to work even if PEP 649 is accepted. That's good news!

Cheers,

//arry/
On Mon, Jan 18, 2021 at 4:34 PM Larry Hastings <larry@hastings.org> wrote:
On 1/18/21 2:39 PM, Guido van Rossum wrote:
Hm. It's unfortunate that this would break code using what is *currently* the best practice.
I can't figure out how to avoid it. The problem is, current best practice sidesteps the class and goes straight to the dict. How do we intercept that and run the code to lazy-calculate the annotations?
I mean, let's consider something crazy. What if we change cls.__dict__ from a normal dict to a special dict that handles the __co_annotations__ machinery? That might work, except, we literally allow users to supply their own cls.__dict__ via __prepare__. So we can't rely on our special dict.
There's a secret though. `cls.__dict__` is not actually a dict -- it's a mappingproxy. The proxy exists because we want to be able to intercept changes to class attributes such as `__add__` or `__getattribute__` in order to manipulate the C-level wrappers that implement such overloads.

So *perhaps* we could expand the mappingproxy class to trap read access to `__annotations__` as a key to do your bidding. (The trick might be exposed by things like .keys() but that doesn't bother me as much.)

I honestly don't know how the mappingproxy and `__prepare__` interact. I have to admit I've never used the latter. Presumably the mappingproxy still plays a role because we'd still want to intercept e.g. `cls.__add__ = <some function>`.
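(You can see both halves of that from Python:)

    class C:
        pass

    print(type(C.__dict__))   # <class 'mappingproxy'>
    C.__dict__['x'] = 1       # raises TypeError: 'mappingproxy' object does not
                              #   support item assignment
    # ...so every change has to go through setattr on the class, which is
    # exactly where CPython gets its chance to fix up slots like __add__:
    setattr(C, 'x', 1)
    print(C.__dict__['x'])    # 1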
What if we change cls.__dict__ to a getset? The user is allowed to set cls.__dict__, but when you get __dict__, we wrap the actual internal dict object with a special object that intercepts accesses to __annotations__ and handles the __co_annotations__ mechanism. That might work but it's really crazy and unfortunate. And it's remotely possible that a user might override __dict__ as a property, in a way that breaks this mechanism too. So it's not guaranteed to always work.
Maybe such guarantees are overrated; in any case it looks like a rare second-order effect (and we're already talking about esoteric usage patterns).
I'm not suggesting we should do these things, I'm just trying to illustrate
how hard I think the problem is. If someone has a good idea how we can add the __co_annotations__ machinery without breaking current best practice I'd love to hear it.
Also, for functions and modules I would recommend `getattr(o, "__annotations__", None)` (perhaps with `or {}` added).
For functions you don't need to bother; fn.__annotations__ is guaranteed to always be set, and be either a dict or None. (Python will only ever set it to a dict, but the user is permitted to set it to None.)
I agree with your suggested best practice for modules as it stands today.
And actually, let me walk back something I've said before. I believe I've said several times that "people treat classes and modules the same". Actually that's wrong.
- Lib/typing.py treats functions and modules the same; it uses getattr(o, '__annotations__', None). It treats classes separately and uses cls.__dict__.get('__annotations__', {}).
- Lib/dataclasses.py uses fn.__annotations__ for functions and cls.__dict__.get('__annotations__', {}) for classes. It doesn't handle modules at all.
- Lib/inspect.py calls Lib/typing.py to get annotations. Which in retrospect I think is a bug, because annotations and type hints aren't the same thing. (typing.get_type_hints changes None to type(None), it evaluates strings, etc).
So, for what it's worth, I literally have zero examples of people treating classes and modules the same when it comes to annotations. Sorry for the confusion!
Yeah, that part felt fishy -- basically classes are the only complicated case here, because in order to construct the full set of annotations you must walk the MRO. Honestly *if* you are walking the MRO anyways, it probably doesn't matter much if you use cls.__dict__.get('__annotations__') or getattr(cls, '__annotations__') -- you might see some duplicates but you should generally end up with the same overall set of annotations (though presumably one could construct a counter-example using multiple inheritance).
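(To make the different access patterns concrete:)

    import typing

    class A:
        a: int

    class B(A):
        b: str

    print(typing.get_type_hints(B))               # {'a': <class 'int'>, 'b': <class 'str'>} -- walks the MRO
    print(B.__dict__.get('__annotations__', {}))  # {'b': <class 'str'>} -- this class only
    print(getattr(B, '__annotations__', {}))      # {'b': <class 'str'>} here, but under today's semantics
                                                  #   it would be A's dict if B had no annotations of its own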
I would also honestly discount what dataclasses.py and typing.py have to do. But what do 3rd party packages do when they don't want to use get_type_hints() and they want to get it right for classes? That would give an indication of how seriously we should take breaking current best practice.
I'm not sure how to figure that out. Off the top of my head, the only current third-party packages I can think of that use annotations are mypy and attrs. I took a quick look at mypy but I can't figure out what it's doing.
Mypy is irrelevant because it reads your source code -- it doesn't ever run your code to inspect `__annotations__`. attrs does something a little kooky. It accesses __annotations__ using a
function called _has_own_attributes(), which detects whether or not the object is inheriting an attribute. But it doesn't peek in __dict__, instead it walks the mro and sees if any of its base classes have the same (non-False) value for that attribute.
https://github.com/python-attrs/attrs/blob/a025629e36440dcc27aee0ee5b04d6523...
Happily, that seems like it would continue to work even if PEP 649 is accepted. That's good news!
I wonder how much pain it cost to develop that. Another example of a well-known library that presumably does something clever with annotations at runtime is Pydantic. I've not looked into it more. There are people who routinely search many GitHub repos for various patterns. Maybe one of them can help? (I've never tried this but IIRC Irit showed me some examples.) --Guido van Rossum (python.org/~guido)
On 1/18/21 5:33 PM, Guido van Rossum wrote:
There's a secret though. `cls.__dict__` is not actually a dict -- it's a mappingproxy. The proxy exists because we want to be able to intercept changes to class attributes such as `__add__` or `__getattribute__` in order to manipulate the C-level wrappers that implement such overloads.
So *perhaps* we could expand the mappingproxy class to trap read access to `__annotations__` as a key to do your bidding. (The trick might be exposed by things like .keys() but that doesn't bother me as much.)
I honestly don't know how the mappingproxy and `__prepare__` interact.
`__prepare__` returns a dict-like namespace that is used as is. `EnumMeta` uses `__prepare__` to return an instance of `_EnumDict`. When `type.__new__` is called, whatever the namespace used to be is then converted into a normal Python dict, and a mappingproxy is returned for all further `cls.__dict__` requests. -- ~Ethan~
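(A minimal sketch of that protocol, for anyone who hasn't used it:)

    class Meta(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwds):
            # Whatever mapping we return here is what the class body populates.
            return {}

        def __new__(mcls, name, bases, namespace, **kwds):
            # type.__new__ copies the namespace into an ordinary dict;
            # cls.__dict__ is then a read-only mappingproxy over that copy.
            return super().__new__(mcls, name, bases, dict(namespace))

    class C(metaclass=Meta):
        x: int = 1

    print(type(C.__dict__))                # <class 'mappingproxy'>
    print(C.__dict__['__annotations__'])   # {'x': <class 'int'>}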
On 1/18/21 5:53 PM, Ethan Furman wrote:
`__prepare__` returns a dict-like namespace that is used as is. `EnumMeta` uses `__prepare__` to return an instance of `_EnumDict`.
When `type.__new__` is called, whatever the namespace used to be is then converted into a normal Python dict, and a mappingproxy is returned for all further `cls.__dict__` requests.
To be more precise, when `__prepare__` is called, there is no class, and therefore no `class.__dict__`. -- ~Ethan~
participants (10): Brett Cannon, Chris Angelico, Ethan Furman, Greg Ewing, Guido van Rossum, Larry Hastings, Nick Coghlan, Paul Bryan, Petr Viktorin, Serhiy Storchaka