[Python-ideas] Add support for external annotations in the typing module

Till till.varoquaux at gmail.com
Fri Jan 18 16:59:03 EST 2019

Thanks for the feedback Gregory.

You raise a lot of good points; this is going to help me write a clearer
PEP.

(0) Pretty much. They can be used as refinements for more advanced type
checkers (e.g.: for linear types).

(1a) I knew about the postponed evaluation but hadn't read PEP-563 yet. Thx
for the heads up.
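
For reference, a rough sketch of what a runtime consumer would have to do
once annotations are stored as strings. It assumes a Python whose `typing`
module provides `Annotated` and `get_type_hints(..., include_extras=True)`
(3.9 or later). `CType` is a made-up stand-in for the proposed
`struct.ctype`, and the string annotation imitates what PEP 563 does to
every annotation:

```python
from typing import Annotated, get_type_hints


class CType:
    """Hypothetical marker standing in for the proposed struct.ctype."""

    def __init__(self, code):
        self.code = code


# Writing the annotation as a string imitates PEP 563, which stores every
# annotation unevaluated.
def read_serial(raw: "Annotated[int, CType('H')]") -> None:
    ...


# At this point the raw annotation is only a string.
raw_annotation = read_serial.__annotations__["raw"]

# get_type_hints() evaluates the string in the function's module namespace;
# include_extras=True keeps the metadata instead of stripping it.
hints = get_type_hints(read_serial, include_extras=True)
metadata = hints["raw"].__metadata__
```

Here `metadata` is a one-element tuple holding the `CType` instance, so a
library like `struct` would not need its own evaluation logic.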

(1b) I think you meant `Intersection` type rather than `Union`
type. A value of type `Intersection[A, B]` is both of type `A` and of type
`B`. If we had Intersection and allowed passing arguments decorated with
NoTypeCheck then we could do without `Annotated`. This could be a bit messy
though because you'd probably want to make sure that NoTypeCheck only
appears in `Intersection`.
Another advantage of `Annotated` is that there's a clear "principal" type.
So you can make calls to constructors transparent. e.g.:

  class A:

  A_with_info = Annotated[A, ...]
  A_with_info(5) # create the value A(5)
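
A fuller version of that snippet, as a sketch. It assumes `Annotated`
forwards calls to the underlying class the way other generic aliases in
`typing` do (the `typing.Annotated` shipped with Python 3.9 and later
behaves this way). The `value` attribute and the `"some info"` metadata are
placeholders:

```python
from typing import Annotated


class A:
    def __init__(self, value):
        self.value = value


# "some info" is arbitrary placeholder metadata.
A_with_info = Annotated[A, "some info"]

# Calling the alias forwards to the principal type's constructor.
a = A_with_info(5)
```

The result is a plain `A` instance; the metadata plays no part in
construction.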

(2a) and (2b): I don't have any strong feelings when it comes to syntax; I
tried to be consistent with the standard library (and maybe I got it
wrong). My understanding is that [] is used to create a new type whereas ()
is used to create a new value:

  > Deque(range(2))
  deque([0, 1])

  > Deque[int]

On Thu, 17 Jan 2019 at 19:35 Gregory P. Smith <greg at krypto.org> wrote:

> On Thu, Jan 17, 2019 at 2:34 PM Till <till.varoquaux at gmail.com> wrote:
>> We started a discussion in https://github.com/python/typing/issues/600
>> about adding support for extra annotations in the typing module.
>> Since this is probably going to turn into a PEP I'm transferring the
>> discussion here to have more visibility.
>> The document below has been modified a bit from the one in GH to reflect
>> the feedback I got:
>>  + Added a small blurb about how ``Annotated`` should support being used
>> as an alias
>> Things that were raised but are not reflected in this document:
>>  + The dataclass example is confusing. I kept it for now because
>> dataclasses often come up in conversations about why we might want to
>> support annotations in the typing module. Maybe I should rework the
>> section.
>>  + `...` as a valid parameter for the first argument (if you want to add
>> an annotation but use the type inferred by your type checker). This is an
>> interesting idea, it's probably worth adding support for it if and only if
>> we decide to support it in other places. (c.f.:
>> https://github.com/python/typing/issues/276)
>> Thanks,
>> Add support for external annotations in the typing module
>> ==========================================================
>> We propose adding an ``Annotated`` type to the typing module to decorate
>> existing types with context-specific metadata. Specifically, a type ``T``
>> can be annotated with metadata ``x`` via the typehint ``Annotated[T, x]``.
>> This metadata can be used for either static analysis or at runtime. If a
>> library (or tool) encounters a typehint ``Annotated[T, x]`` and has no
>> special logic for metadata ``x``, it should ignore it and simply treat the
>> type as ``T``. Unlike the `no_type_check` functionality that currently exists
>> in the ``typing`` module which completely disables typechecking annotations
>> on a function or a class, the ``Annotated`` type allows for both static
>> typechecking of ``T`` (e.g., via MyPy or Pyre, which can safely ignore
>> ``x``)  together with runtime access to ``x`` within a specific
>> application. We believe that the introduction of this type would address a
>> diverse set of use cases of interest to the broader Python community.
>> Motivating examples:
>> ~~~~~~~~~~~~~~~~~~~~
>> reading binary data
>> +++++++++++++++++++
>> The ``struct`` module provides a way to read and write C structs directly
>> from their byte representation. It currently relies on a string
>> representation of the C type to read in values::
>>   record = b'raymond   \x32\x12\x08\x01\x08'
>>   name, serialnum, school, gradelevel = unpack('<10sHHb', record)
>> The struct documentation [struct-examples]_ suggests using a named tuple
>> to unpack the values and make this a bit more tractable::
>>   from collections import namedtuple
>>   Student = namedtuple('Student', 'name serialnum school gradelevel')
>>   Student._make(unpack('<10sHHb', record))
>>   # Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
>> However, this recommendation is somewhat problematic; as we add more
>> fields, it's going to get increasingly tedious to match the properties in
>> the named tuple with the arguments in ``unpack``.
>> Instead, annotations can provide better interoperability with a type
>> checker or an IDE without adding any special logic outside of the
>> ``struct`` module::
>>   from typing import NamedTuple
>>   UnsignedShort = Annotated[int, struct.ctype('H')]
>>   SignedChar = Annotated[int, struct.ctype('b')]
>>   @struct.packed
>>   class Student(NamedTuple):
>>     # MyPy typechecks 'name' field as 'str'
>>     name: Annotated[str, struct.ctype("<10s")]
>>     serialnum: UnsignedShort
>>     school: UnsignedShort
>>     gradelevel: SignedChar
>>   # 'unpack' only uses the metadata within the type annotations
>>   Student.unpack(record)
>>   # Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
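
One way a consumer could derive the format string from the annotations,
as a sketch only. `CType` and `unpack_record` are hypothetical names
standing in for the proposed `struct.ctype` and `Student.unpack`, and the
code assumes a `typing` module that provides `Annotated` (Python 3.9 or
later). The `school` field uses an unsigned short here to match the second
`H` in `'<10sHHb'`, and `name` is typed as `bytes` because that is what
`10s` yields:

```python
import struct
from typing import Annotated, NamedTuple, get_type_hints


class CType:
    """Hypothetical stand-in for the proposed struct.ctype marker."""

    def __init__(self, code):
        self.code = code


UnsignedShort = Annotated[int, CType("H")]
SignedChar = Annotated[int, CType("b")]


class Student(NamedTuple):
    name: Annotated[bytes, CType("10s")]
    serialnum: UnsignedShort
    school: UnsignedShort
    gradelevel: SignedChar


def unpack_record(cls, record, byte_order="<"):
    """Build a struct format from the CType metadata and unpack into cls."""
    hints = get_type_hints(cls, include_extras=True)
    codes = []
    for field_name in cls._fields:
        metadata = getattr(hints[field_name], "__metadata__", ())
        # Only CType instances are of interest; anything else is ignored.
        ctypes = [m for m in metadata if isinstance(m, CType)]
        codes.append(ctypes[0].code)
    return cls._make(struct.unpack(byte_order + "".join(codes), record))


record = b"raymond   \x32\x12\x08\x01\x08"
student = unpack_record(Student, record)
```

A type checker sees only the plain field types, while the helper reads
nothing but the metadata.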
>> dataclasses
>> ++++++++++++
>> Here's an example with dataclasses [dataclass]_ that is problematic
>> from the typechecking standpoint::
>>   from dataclasses import dataclass, field
>>   @dataclass
>>   class C:
>>     myint: int = 0
>>     # the field tells the @dataclass decorator that the default action in the
>>     # constructor of this class is to set "self.mylist = list()"
>>     mylist: List[int] = field(default_factory=list)
>> Even though one might expect that ``mylist`` is a class attribute
>> accessible via ``C.mylist`` (like ``C.myint`` is) due to the assignment
>> syntax, that is not the case. Instead, the ``@dataclass`` decorator strips
>> out the assignment to this attribute, leading to an ``AttributeError`` upon
>> access::
>>   C.myint  # Ok: 0
>>   C.mylist  # AttributeError: type object 'C' has no attribute 'mylist'
>> This can lead to confusion for newcomers to the library who may not
>> expect this behavior. Furthermore, the typechecker needs to understand the
>> semantics of dataclasses and know to not treat the above example as an
>> assignment operation (which translates to additional complexity).
>> It makes more sense to move the information contained in ``field`` to an
>> annotation::
>>   @dataclass
>>   class C:
>>       myint: int = 0
>>       mylist: Annotated[List[int], field(default_factory=list)]
>>   # now, the AttributeError is more intuitive because there is no
>>   # assignment operator
>>   C.mylist  # AttributeError
>>   # the constructor knows how to use the annotations to set the 'mylist'
>>   # attribute
>>   c = C()
>>   c.mylist  # []
>> The main benefit of writing annotations like this is that it provides a
>> way for clients to gracefully degrade when they don't know what to do with
>> the extra annotations (by just ignoring them). If you used a typechecker
>> that didn't have any special handling for dataclasses and the ``field``
>> annotation, you would still be able to run checks as though the type were
>> simply::
>>   class C:
>>       myint: int = 0
>>       mylist: List[int]
>> lowering barriers to developing new types
>> +++++++++++++++++++++++++++++++++++++++++
>> Typically when adding a new type, we need to upstream that type to the
>> typing module and change MyPy [MyPy]_, PyCharm [PyCharm]_, Pyre [Pyre]_,
>> pytype [pytype]_, etc. This is particularly important when working on
>> open-source code that makes use of our new types, seeing as the code would
>> not be immediately transportable to other developers' tools without
>> additional logic (this is a limitation of MyPy plugins [MyPy-plugins]_,
>> which allow for extending MyPy but would require a consumer of new
>> typehints to be using MyPy and have the same plugin installed). As a
>> result, there is a high cost to developing and trying out new types in a
>> codebase. Ideally, we should be able to introduce new types in a manner
>> that allows for graceful degradation when clients do not have a custom MyPy
>> plugin, which would lower the barrier to development and ensure some degree
>> of backward compatibility.
>> For example, suppose that we wanted to add support for tagged unions
>> [tagged-unions]_ to Python. One way to accomplish this would be to annotate
>> ``TypedDict`` in Python such that only one field is allowed to be set::
>>   Currency = Annotated[
>>     TypedDict('Currency', {'dollars': float, 'pounds': float}, total=False),
>>     TaggedUnion,
>>   ]
>> This is a somewhat cumbersome syntax but it allows us to iterate on this
>> proof-of-concept and have people with non-patched IDEs work in a codebase
>> with tagged unions. We could easily test this proposal and iron out the
>> kinks before trying to upstream tagged union to `typing`, MyPy, etc.
>> Moreover, tools that do not have support for parsing the ``TaggedUnion``
>> annotation would still be able to treat `Currency` as a ``TypedDict``,
>> which is still a close approximation (slightly less strict).
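
A sketch of how a runtime tool that does understand the marker might
enforce it, assuming a `typing` module with `Annotated` and `TypedDict`
(Python 3.9 or later). `TaggedUnion` and `is_valid` are hypothetical names,
and the check only counts keys, which is the simplest possible reading of
"only one field is allowed to be set":

```python
from typing import Annotated, TypedDict


class TaggedUnion:
    """Hypothetical marker: exactly one key of the dict may be present."""


Currency = Annotated[
    TypedDict("Currency", {"dollars": float, "pounds": float}, total=False),
    TaggedUnion,
]


def is_valid(value, tp):
    """Apply the tagged-union rule only when the marker is present."""
    if TaggedUnion in getattr(tp, "__metadata__", ()):
        return len(value) == 1
    # Tools unaware of the marker fall back to plain TypedDict semantics.
    return True


one_field = is_valid({"dollars": 1.0}, Currency)
two_fields = is_valid({"dollars": 1.0, "pounds": 2.0}, Currency)
```

A tool without this logic would accept both dictionaries, which is the
"slightly less strict" approximation mentioned above.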
>> Details of proposed changes to ``typing``
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> syntax
>> ++++++
>> ``Annotated`` is parameterized with a type and an arbitrary list of
>> Python values that represent the annotations. Here are the specific details
>> of the syntax:
>> * The first argument to ``Annotated`` must be a valid ``typing`` type or
>> ``...`` (to use the inferred type).
>> * Multiple type annotations are supported (Annotated supports variadic
>> arguments): ``Annotated[int, ValueRange(3, 10), ctype("char")]``
>> * ``Annotated`` must be called with at least two arguments
>> (``Annotated[int]`` is not valid)
>> * The order of the annotations is preserved and matters for equality
>> checks::
>>    Annotated[int, ValueRange(3, 10), ctype("char")] != \
>>     Annotated[int, ctype("char"), ValueRange(3, 10)]
>> * Nested ``Annotated`` types are flattened, with metadata ordered
>> starting with the innermost annotation::
>>    Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] ==\
>>     Annotated[int, ValueRange(3, 10), ctype("char")]
>> * Duplicated annotations are not removed: ``Annotated[int, ValueRange(3,
>> 10)] != Annotated[int, ValueRange(3, 10), ValueRange(3, 10)]``
>> * ``Annotated`` can be used as a higher-order alias::
>>     T = TypeVar('T')
>>     Vec = Annotated[List[Tuple[T, T]], MaxLen(10)]
>>     # Vec[int] == Annotated[List[Tuple[int, int]], MaxLen(10)]
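
The rules in this list can be checked against the `Annotated` that ships in
`typing` from Python 3.9 onwards, which is assumed here. `ValueRange`,
`CType` and `MaxLen` are illustrative metadata classes, written as frozen
dataclasses so that they compare by value:

```python
from dataclasses import dataclass
from typing import Annotated, List, Tuple, TypeVar


@dataclass(frozen=True)
class ValueRange:
    lo: int
    hi: int


@dataclass(frozen=True)
class CType:
    name: str


@dataclass(frozen=True)
class MaxLen:
    value: int


# Order of the metadata matters for equality.
order_matters = (
    Annotated[int, ValueRange(3, 10), CType("char")]
    != Annotated[int, CType("char"), ValueRange(3, 10)]
)

# Nested Annotated types are flattened, innermost metadata first.
nested = Annotated[Annotated[int, ValueRange(3, 10)], CType("char")]
flattened = nested == Annotated[int, ValueRange(3, 10), CType("char")]

# Duplicated metadata is kept.
duplicates_kept = (
    Annotated[int, ValueRange(3, 10)]
    != Annotated[int, ValueRange(3, 10), ValueRange(3, 10)]
)

# A type on its own, without metadata, is rejected.
try:
    Annotated[int]
    single_argument_rejected = False
except TypeError:
    single_argument_rejected = True

# Annotated can be used as a generic alias.
T = TypeVar("T")
Vec = Annotated[List[Tuple[T, T]], MaxLen(10)]
alias_ok = Vec[int] == Annotated[List[Tuple[int, int]], MaxLen(10)]
```

Every flag above evaluates to `True` under that implementation.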
>> consuming annotations
>> ++++++++++++++++++++++
>> Ultimately, how to interpret the annotations (if at all) is the
>> responsibility of the tool or library encountering the
>> `Annotated` type. A tool or library encountering an `Annotated` type can
>> scan through the annotations to determine if they are of interest (e.g.,
>> using `isinstance`).
>> **Unknown annotations**
>>   When a tool or a library does not support annotations or encounters an
>> unknown annotation it should just ignore it and treat the annotated type as the
>> underlying type. For example, if we were to add an annotation that is not
>> an instance of `struct.ctype` to the annotation for name (e.g.,
>> `Annotated[str, 'foo', struct.ctype("<10s")]`), the unpack method should
>> ignore it.
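
A minimal sketch of both sides of this rule, assuming a `typing` module
with `Annotated` (Python 3.9 or later). `CType` is a hypothetical stand-in
for `struct.ctype` and `find_ctype` is an illustrative helper:

```python
from typing import Annotated, get_type_hints


class CType:
    """Hypothetical stand-in for the proposed struct.ctype marker."""

    def __init__(self, code):
        self.code = code


def describe(name: Annotated[str, "foo", CType("10s")]) -> None:
    ...


def find_ctype(hint):
    """Return the first CType in the metadata, skipping anything unknown."""
    for meta in getattr(hint, "__metadata__", ()):
        if isinstance(meta, CType):
            return meta
    return None


# A consumer that knows about CType keeps the extras and skips 'foo'.
ctype = find_ctype(get_type_hints(describe, include_extras=True)["name"])

# A consumer that knows nothing about annotations just sees the plain type.
plain = get_type_hints(describe)["name"]
```

The unknown `'foo'` entry is skipped by the `isinstance` filter, and the
default `get_type_hints` call degrades to `str`.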
>> **Namespacing annotations**
>>   We do not need namespaces for annotations since the class used by the
>> annotations acts as a namespace.
>> **Multiple annotations**
>>   It's up to the tool consuming the annotations to decide whether the
>> client is allowed to have several annotations on one type and how to merge
>> those annotations.
>>   Since the ``Annotated`` type allows you to put several annotations of
>> the same (or different) type(s) on any node, the tools or libraries
>> consuming those annotations are in charge of dealing with potential
>> duplicates. For example, if you are doing value range analysis you might
>> allow this::
>>     T1 = Annotated[int, ValueRange(-10, 5)]
>>     T2 = Annotated[T1, ValueRange(-20, 3)]
>>   Flattening nested annotations, this translates to::
>>     T2 = Annotated[int, ValueRange(-10, 5), ValueRange(-20, 3)]
>>   An application consuming this type might choose to reduce these
>> annotations via an intersection of the ranges, in which case ``T2`` would
>> be treated equivalently to ``Annotated[int, ValueRange(-10, 3)]``.
>>   An alternative application might reduce these via a union, in which
>> case ``T2`` would be treated equivalently to ``Annotated[int,
>> ValueRange(-20, 5)]``.
>>   Other applications may decide to not support multiple annotations and
>> throw an exception.
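
The two reductions described above can be sketched as follows, assuming a
`typing` module with `Annotated` (Python 3.9 or later). `ValueRange`,
`value_ranges`, `intersect` and `union_hull` are illustrative names, not
part of the proposal:

```python
from dataclasses import dataclass
from typing import Annotated


@dataclass(frozen=True)
class ValueRange:
    lo: int
    hi: int


def value_ranges(tp):
    """Collect the ValueRange metadata of a type, in declaration order."""
    return [m for m in getattr(tp, "__metadata__", ()) if isinstance(m, ValueRange)]


def intersect(ranges):
    """Keep only the values allowed by every range."""
    return ValueRange(max(r.lo for r in ranges), min(r.hi for r in ranges))


def union_hull(ranges):
    """Smallest single range that covers every given range."""
    return ValueRange(min(r.lo for r in ranges), max(r.hi for r in ranges))


T1 = Annotated[int, ValueRange(-10, 5)]
T2 = Annotated[T1, ValueRange(-20, 3)]

ranges = value_ranges(T2)
narrowed = intersect(ranges)
widened = union_hull(ranges)
```

With these definitions `narrowed` is `ValueRange(-10, 3)` and `widened` is
`ValueRange(-20, 5)`, the two outcomes named in the text. A stricter tool
could instead raise when `len(ranges) > 1`.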
> (0) Observation / TL;DR - This PEP really seems to be more of a way to
> declare multiple different arbitrary-purpose annotations all attached to a
> single callable/parameter/return/variable.  So that static checkers
> continue to work, but runtime users of annotations for whatever purpose can
> also work at the same time.
> (1a) A struct.unpack supporting this will then need to evaluate
> annotations in the outer scope at runtime due to our desired long term
> PEP-563 `from __future__ import annotations` behavior.  But that becomes
> true of anything else wanting to use annotations at runtime so we should
> really make a typing library function that does this for everyone to use.
> (1b) This proposal potentially expands the burden of type checkers... but
> it shouldn't.  They should be free to take the first type listed in an
> Annotated[] block as the type of the variable, raising an error if someone
> has listed multiple types (telling them to use Union[] for that).  A static
> checker *could* do useful things with multiple annotations it knows how
> to handle, but I think it'd be unwise to implement that in any manner where
> Annotated and Union could both be used for the same purpose.
> It makes me wonder if Annotated[] is meaningfully different from Union at
> all.
> (2a) At first glance I don't like that the `T1 = Annotated[int,
> SomeOtherInfo(23)]` syntax uses [] rather than () as it really is
> constructing a runtime type.  It isn't clear what should use [] and what
> should use () so I'd suggest using () for everything there.
> (2b) Ask yourself: Why should SomeOtherInfo and ValueRange and
> struct.ctype be () calls yet none of `Annotated[Union[List[bytes],
> Dict[bytes, Optional[float]]]]` be calls?  If you can come up with an
> answer to that, why _should_ anyone need to know that?
> -gps