PEP 544: Protocols - second round
Hi all, After collecting suggestions in the previous discussion on python-dev https://mail.python.org/pipermail/python-dev/2017-March/thread.html#147629 and playing with implementation, here is an updated version of PEP 544. -- Ivan A link for those who don't like reading long e-mails: https://www.python.org/dev/peps/pep-0544/ ========================= PEP: 544 Title: Protocols Version: $Revision$ Last-Modified: $Date$ Author: Ivan Levkivskyi <levkivskyi@gmail.com>, Jukka Lehtosalo < jukka.lehtosalo@iki.fi>, Łukasz Langa <lukasz@langa.pl> Discussions-To: Python-Dev <python-dev@python.org> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 05-Mar-2017 Python-Version: 3.7 Abstract ======== Type hints introduced in PEP 484 can be used to specify type metadata for static type checkers and other third party tools. However, PEP 484 only specifies the semantics of *nominal* subtyping. In this PEP we specify static and runtime semantics of protocol classes that will provide a support for *structural* subtyping (static duck typing). .. _rationale: Rationale and Goals =================== Currently, PEP 484 and the ``typing`` module [typing]_ define abstract base classes for several common Python protocols such as ``Iterable`` and ``Sized``. The problem with them is that a class has to be explicitly marked to support them, which is unpythonic and unlike what one would normally do in idiomatic dynamically typed Python code. For example, this conforms to PEP 484:: from typing import Sized, Iterable, Iterator class Bucket(Sized, Iterable[int]): ... def __len__(self) -> int: ... def __iter__(self) -> Iterator[int]: ... The same problem appears with user-defined ABCs: they must be explicitly subclassed or registered. This is particularly difficult to do with library types as the type objects may be hidden deep in the implementation of the library. Also, extensive use of ABCs might impose additional runtime costs. The intention of this PEP is to solve all these problems by allowing users to write the above code without explicit base classes in the class definition, allowing ``Bucket`` to be implicitly considered a subtype of both ``Sized`` and ``Iterable[int]`` by static type checkers using structural [wiki-structural]_ subtyping:: from typing import Iterator, Iterable class Bucket: ... def __len__(self) -> int: ... def __iter__(self) -> Iterator[int]: ... def collect(items: Iterable[int]) -> int: ... result: int = collect(Bucket()) # Passes type check Note that ABCs in ``typing`` module already provide structural behavior at runtime, ``isinstance(Bucket(), Iterable)`` returns ``True``. The main goal of this proposal is to support such behavior statically. The same functionality will be provided for user-defined protocols, as specified below. The above code with a protocol class matches common Python conventions much better. It is also automatically extensible and works with additional, unrelated classes that happen to implement the required protocol. Nominal vs structural subtyping ------------------------------- Structural subtyping is natural for Python programmers since it matches the runtime semantics of duck typing: an object that has certain properties is treated independently of its actual runtime class. However, as discussed in PEP 483, both nominal and structural subtyping have their strengths and weaknesses. Therefore, in this PEP we *do not propose* to replace the nominal subtyping described by PEP 484 with structural subtyping completely. Instead, protocol classes as specified in this PEP complement normal classes, and users are free to choose where to apply a particular solution. See section on `rejected`_ ideas at the end of this PEP for additional motivation. Non-goals --------- At runtime, protocol classes will be simple ABCs. There is no intent to provide sophisticated runtime instance and class checks against protocol classes. This would be difficult and error-prone and will contradict the logic of PEP 484. As well, following PEP 484 and PEP 526 we state that protocols are **completely optional**: * No runtime semantics will be imposed for variables or parameters annotated with a protocol class. * Any checks will be performed only by third-party type checkers and other tools. * Programmers are free to not use them even if they use type annotations. * There is no intent to make protocols non-optional in the future. Existing Approaches to Structural Subtyping =========================================== Before describing the actual specification, we review and comment on existing approaches related to structural subtyping in Python and other languages: * ``zope.interface`` [zope-interfaces]_ was one of the first widely used approaches to structural subtyping in Python. It is implemented by providing special classes to distinguish interface classes from normal classes, to mark interface attributes, and to explicitly declare implementation. For example:: from zope.interface import Interface, Attribute, implementer class IEmployee(Interface): name = Attribute("Name of employee") def do(work): """Do some work""" @implementer(IEmployee) class Employee: name = 'Anonymous' def do(self, work): return work.start() Zope interfaces support various contracts and constraints for interface classes. For example:: from zope.interface import invariant def required_contact(obj): if not (obj.email or obj.phone): raise Exception("At least one contact info is required") class IPerson(Interface): name = Attribute("Name") email = Attribute("Email Address") phone = Attribute("Phone Number") invariant(required_contact) Even more detailed invariants are supported. However, Zope interfaces rely entirely on runtime validation. Such focus on runtime properties goes beyond the scope of the current proposal, and static support for invariants might be difficult to implement. However, the idea of marking an interface class with a special base class is reasonable and easy to implement both statically and at runtime. * Python abstract base classes [abstract-classes]_ are the standard library tool to provide some functionality similar to structural subtyping. The drawback of this approach is the necessity to either subclass the abstract class or register an implementation explicitly:: from abc import ABC class MyTuple(ABC): pass MyTuple.register(tuple) assert issubclass(tuple, MyTuple) assert isinstance((), MyTuple) As mentioned in the `rationale`_, we want to avoid such necessity, especially in static context. However, in a runtime context, ABCs are good candidates for protocol classes and they are already used extensively in the ``typing`` module. * Abstract classes defined in ``collections.abc`` module [collections-abc]_ are slightly more advanced since they implement a custom ``__subclasshook__()`` method that allows runtime structural checks without explicit registration:: from collections.abc import Iterable class MyIterable: def __iter__(self): return [] assert isinstance(MyIterable(), Iterable) Such behavior seems to be a perfect fit for both runtime and static behavior of protocols. As discussed in `rationale`_, we propose to add static support for such behavior. In addition, to allow users to achieve such runtime behavior for *user-defined* protocols a special ``@runtime`` decorator will be provided, see detailed `discussion`_ below. * TypeScript [typescript]_ provides support for user-defined classes and interfaces. Explicit implementation declaration is not required and structural subtyping is verified statically. For example:: interface LabeledItem { label: string; size?: int; } function printLabel(obj: LabeledItem) { console.log(obj.label); } let myObj = {size: 10, label: "Size 10 Object"}; printLabel(myObj); Note that optional interface members are supported. Also, TypeScript prohibits redundant members in implementations. While the idea of optional members looks interesting, it would complicate this proposal and it is not clear how useful it will be. Therefore it is proposed to postpone this; see `rejected`_ ideas. In general, the idea of static protocol checking without runtime implications looks reasonable, and basically this proposal follows the same line. * Go [golang]_ uses a more radical approach and makes interfaces the primary way to provide type information. Also, assignments are used to explicitly ensure implementation:: type SomeInterface interface { SomeMethod() ([]byte, error) } if _, ok := someval.(SomeInterface); ok { fmt.Printf("value implements some interface") } Both these ideas are questionable in the context of this proposal. See the section on `rejected`_ ideas. .. _specification: Specification ============= Terminology ----------- We propose to use the term *protocols* for types supporting structural subtyping. The reason is that the term *iterator protocol*, for example, is widely understood in the community, and coming up with a new term for this concept in a statically typed context would just create confusion. This has the drawback that the term *protocol* becomes overloaded with two subtly different meanings: the first is the traditional, well-known but slightly fuzzy concept of protocols such as iterator; the second is the more explicitly defined concept of protocols in statically typed code. The distinction is not important most of the time, and in other cases we propose to just add a qualifier such as *protocol classes* when referring to the static type concept. If a class includes a protocol in its MRO, the class is called an *explicit* subclass of the protocol. If a class is a structural subtype of a protocol, it is said to implement the protocol and to be compatible with a protocol. If a class is compatible with a protocol but the protocol is not included in the MRO, the class is an *implicit* subtype of the protocol. (Note that one can explicitly subclass a protocol and still not implement it if a protocol attribute is set to ``None`` in the subclass, see Python [data-model]_ for details.) The attributes (variables and methods) of a protocol that are mandatory for other class in order to be considered a structural subtype are called protocol members. .. _definition: Defining a protocol ------------------- Protocols are defined by including a special new class ``typing.Protocol`` (an instance of ``abc.ABCMeta``) in the base classes list, typically at the end of the list. Here is a simple example:: from typing import Protocol class SupportsClose(Protocol): def close(self) -> None: ... Now if one defines a class ``Resource`` with a ``close()`` method that has a compatible signature, it would implicitly be a subtype of ``SupportsClose``, since the structural subtyping is used for protocol types:: class Resource: ... def close(self) -> None: self.file.close() self.lock.release() Apart from few restrictions explicitly mentioned below, protocol types can be used in every context where a normal types can:: def close_all(things: Iterable[SupportsClose]) -> None: for t in things: t.close() f = open('foo.txt') r = Resource() close_all([f, r]) # OK! close_all([1]) # Error: 'int' has no 'close' method Note that both the user-defined class ``Resource`` and the built-in ``IO`` type (the return type of ``open()``) are considered subtypes of ``SupportsClose``, because they provide a ``close()`` method with a compatible type signature. Protocol members ---------------- All methods defined in the protocol class body are protocol members, both normal and decorated with ``@abstractmethod``. If any parameters of a protocol method are not annotated, then their types are assumed to be ``Any`` (see PEP 484). Bodies of protocol methods are type checked. An abstract method that should not be called via ``super()`` ought to raise ``NotImplementedError``. Example:: from typing import Protocol from abc import abstractmethod class Example(Protocol): def first(self) -> int: # This is a protocol member return 42 @abstractmethod def second(self) -> int: # Method without a default implementation raise NotImplementedError Static methods, class methods, and properties are equally allowed in protocols. To define a protocol variable, one can use PEP 526 variable annotations in the class body. Additional attributes *only* defined in the body of a method by assignment via ``self`` are not allowed. The rationale for this is that the protocol class implementation is often not shared by subtypes, so the interface should not depend on the default implementation. Examples:: from typing import Protocol, List class Template(Protocol): name: str # This is a protocol member value: int = 0 # This one too (with default) def method(self) -> None: self.temp: List[int] = [] # Error in type checker class Concrete: def __init__(self, name: str, value: int) -> None: self.name = name self.value = value var: Template = Concrete('value', 42) # OK To distinguish between protocol class variables and protocol instance variables, the special ``ClassVar`` annotation should be used as specified by PEP 526. By default, protocol variables as defined above are considered readable and writable. To define a read-only protocol variable, one can use an (abstract) property. Explicitly declaring implementation ----------------------------------- To explicitly declare that a certain class implements a given protocol, it can be used as a regular base class. In this case a class could use default implementations of protocol members. ``typing.Sequence`` is a good example of a protocol with useful default methods. Static analysis tools are expected to automatically detect that a class implements a given protocol. So while it's possible to subclass a protocol explicitly, it's *not necessary* to do so for the sake of type-checking. The default implementations cannot be used if the subtype relationship is implicit and only via structural subtyping -- the semantics of inheritance is not changed. Examples:: class PColor(Protocol): @abstractmethod def draw(self) -> str: ... def complex_method(self) -> int: # some complex code here class NiceColor(PColor): def draw(self) -> str: return "deep blue" class BadColor(PColor): def draw(self) -> str: return super().draw() # Error, no default implementation class ImplicitColor: # Note no 'PColor' base here def draw(self) -> str: return "probably gray" def comlex_method(self) -> int: # class needs to implement this nice: NiceColor another: ImplicitColor def represent(c: PColor) -> None: print(c.draw(), c.complex_method()) represent(nice) # OK represent(another) # Also OK Note that there is little difference between explicit and implicit subtypes, the main benefit of explicit subclassing is to get some protocol methods "for free". In addition, type checkers can statically verify that the class actually implements the protocol correctly:: class RGB(Protocol): rgb: Tuple[int, int, int] @abstractmethod def intensity(self) -> int: return 0 class Point(RGB): def __init__(self, red: int, green: int, blue: str) -> None: self.rgb = red, green, blue # Error, 'blue' must be 'int' # Type checker might warn that 'intensity' is not defined A class can explicitly inherit from multiple protocols and also form normal classes. In this case methods are resolved using normal MRO and a type checker verifies that all subtyping are correct. The semantics of ``@abstractmethod`` is not changed, all of them must be implemented by an explicit subclass before it can be instantiated. Merging and extending protocols ------------------------------- The general philosophy is that protocols are mostly like regular ABCs, but a static type checker will handle them specially. Subclassing a protocol class would not turn the subclass into a protocol unless it also has ``typing.Protocol`` as an explicit base class. Without this base, the class is "downgraded" to a regular ABC that cannot be used with structural subtyping. The rationale for this rule is that we don't want to accidentally have some class act as a protocol just because one of its base classes happens to be one. We still slightly prefer nominal subtyping over structural subtyping in the static typing world. A subprotocol can be defined by having *both* one or more protocols as immediate base classes and also having ``typing.Protocol`` as an immediate base class:: from typing import Sized, Protocol class SizedAndClosable(Sized, Protocol): def close(self) -> None: ... Now the protocol ``SizedAndClosable`` is a protocol with two methods, ``__len__`` and ``close``. If one omits ``Protocol`` in the base class list, this would be a regular (non-protocol) class that must implement ``Sized``. Alternatively, one can implement ``SizedAndClosable`` protocol by merging the ``SupportsClose`` protocol from the example in the `definition`_ section with ``typing.Sized``:: from typing import Sized class SupportsClose(Protocol): def close(self) -> None: ... class SizedAndClosable(Sized, SupportsClose, Protocol): pass The two definitions of ``SizedAndClosable`` are equivalent. Subclass relationships between protocols are not meaningful when considering subtyping, since structural compatibility is the criterion, not the MRO. If ``Protocol`` is included in the base class list, all the other base classes must be protocols. A protocol can't extend a regular class, see `rejected`_ ideas for rationale. Note that rules around explicit subclassing are different from regular ABCs, where abstractness is simply defined by having at least one abstract method being unimplemented. Protocol classes must be marked *explicitly*. Generic protocols ----------------- Generic protocols are important. For example, ``SupportsAbs``, ``Iterable`` and ``Iterator`` are generic protocols. They are defined similar to normal non-protocol generic types:: class Iterable(Protocol[T]): @abstractmethod def __iter__(self) -> Iterator[T]: ... ``Protocol[T, S, ...]`` is allowed as a shorthand for ``Protocol, Generic[T, S, ...]``. User-defined generic protocols support explicitly declared variance. Type checkers will warn if the inferred variance is different from the declared variance. Examples:: T = TypeVar('T') T_co = TypeVar('T_co', covariant=True) T_contra = TypeVar('T_contra', contravariant=True) class Box(Protocol[T_co]): def content(self) -> T_co: ... box: Box[float] second_box: Box[int] box = second_box # This is OK due to the covariance of 'Box'. class Sender(Protocol[T_contra]): def send(self, data: T_contra) -> int: ... sender: Sender[float] new_sender: Sender[int] new_sender = sender # OK, 'Sender' is contravariant. class Proto(Protocol[T]): attr: T # this class is invariant, since it has a mutable attribute var: Proto[float] another_var: Proto[int] var = another_var # Error! 'Proto[float]' is incompatible with 'Proto[int]'. Note that unlike nominal classes, de-facto covariant protocols cannot be declared as invariant, since this can break transitivity of subtyping (see `rejected`_ ideas for details). For example:: T = TypeVar('T') class AnotherBox(Protocol[T]): # Error, this protocol is covariant in T, def content(self) -> T: # not invariant. ... Recursive protocols ------------------- Recursive protocols are also supported. Forward references to the protocol class names can be given as strings as specified by PEP 484. Recursive protocols are useful for representing self-referential data structures like trees in an abstract fashion:: class Traversable(Protocol): def leaves(self) -> Iterable['Traversable']: ... Note that for recursive protocols, a class is considered a subtype of the protocol in situations where the decision depends on itself. Continuing the previous example:: class SimpleTree: def leaves(self) -> List['SimpleTree']: ... root: Traversable = SimpleTree() # OK class Tree(Generic[T]): def leaves(self) -> List['Tree[T]']: ... def walk(graph: Traversable) -> None: ... tree: Tree[float] = Tree() walk(tree) # OK, 'Tree[float]' is a subtype of 'Traversable' Using Protocols =============== Subtyping relationships with other types ---------------------------------------- Protocols cannot be instantiated, so there are no values whose runtime type is a protocol. For variables and parameters with protocol types, subtyping relationships are subject to the following rules: * A protocol is never a subtype of a concrete type. * A concrete type ``X`` is a subtype of protocol ``P`` if and only if ``X`` implements all protocol members of ``P`` with compatible types. In other words, subtyping with respect to a protocol is always structural. * A protocol ``P1`` is a subtype of another protocol ``P2`` if ``P1`` defines all protocol members of ``P2`` with compatible types. Generic protocol types follow the same rules of variance as non-protocol types. Protocol types can be used in all contexts where any other types can be used, such as in ``Union``, ``ClassVar``, type variables bounds, etc. Generic protocols follow the rules for generic abstract classes, except for using structural compatibility instead of compatibility defined by inheritance relationships. Unions and intersections of protocols ------------------------------------- ``Union`` of protocol classes behaves the same way as for non-protocol classes. For example:: from typing import Union, Optional, Protocol class Exitable(Protocol): def exit(self) -> int: ... class Quittable(Protocol): def quit(self) -> Optional[int]: ... def finish(task: Union[Exitable, Quittable]) -> int: ... class DefaultJob: ... def quit(self) -> int: return 0 finish(DefaultJob()) # OK One can use multiple inheritance to define an intersection of protocols. Example:: from typing import Sequence, Hashable class HashableFloats(Sequence[float], Hashable, Protocol): pass def cached_func(args: HashableFloats) -> float: ... cached_func((1, 2, 3)) # OK, tuple is both hashable and sequence If this will prove to be a widely used scenario, then a special intersection type construct could be added in future as specified by PEP 483, see `rejected`_ ideas for more details. ``Type[]`` with protocols ------------------------- Variables and parameters annotated with ``Type[Proto]`` accept only concrete (non-protocol) subtypes of ``Proto``. The main reason for this is to allow instantiation of parameters with such type. For example:: class Proto(Protocol): @abstractmethod def meth(self) -> int: ... class Concrete: def meth(self) -> int: return 42 def fun(cls: Type[Proto]) -> int: return cls().meth() # OK fun(Proto) # Error fun(Concrete) # OK The same rule applies to variables:: var: Type[Proto] var = Proto # Error var = Concrete # OK var().meth() # OK Assigning an ABC or a protocol class to a variable is allowed if it is not explicitly typed, and such assignment creates a type alias. For normal (non-abstract) classes, the behavior of ``Type[]`` is not changed. ``NewType()`` and type aliases ------------------------------ Protocols are essentially anonymous. To emphasize this point, static type checkers might refuse protocol classes inside ``NewType()`` to avoid an illusion that a distinct type is provided:: from typing import NewType, Protocol, Iterator class Id(Protocol): code: int secrets: Iterator[bytes] UserId = NewType('UserId', Id) # Error, can't provide distinct type In contrast, type aliases are fully supported, including generic type aliases:: from typing import TypeVar, Reversible, Iterable, Sized T = TypeVar('T') class SizedIterable(Iterable[T], Sized, Protocol): pass CompatReversible = Union[Reversible[T], SizedIterable[T]] .. _discussion: ``@runtime`` decorator and narrowing types by ``isinstance()`` -------------------------------------------------------------- The default semantics is that ``isinstance()`` and ``issubclass()`` fail for protocol types. This is in the spirit of duck typing -- protocols basically would be used to model duck typing statically, not explicitly at runtime. However, it should be possible for protocol types to implement custom instance and class checks when this makes sense, similar to how ``Iterable`` and other ABCs in ``collections.abc`` and ``typing`` already do it, but this is limited to non-generic and unsubscripted generic protocols (``Iterable`` is statically equivalent to ``Iterable[Any]`). The ``typing`` module will define a special ``@runtime`` class decorator that provides the same semantics for class and instance checks as for ``collections.abc`` classes, essentially making them "runtime protocols":: from typing import runtime, Protocol @runtime class Closable(Protocol): def close(self): ... assert isinstance(open('some/file'), Closable) Static type checkers will understand ``isinstance(x, Proto)`` and ``issubclass(C, Proto)`` for protocols defined with this decorator (as they already do for ``Iterable`` etc.). Static type checkers will narrow types after such checks by the type erased ``Proto`` (i.e. with all variables having type ``Any`` and all methods having type ``Callable[..., Any]``). Note that ``isinstance(x, Proto[int])`` etc. will always fail in agreement with PEP 484. Examples:: from typing import Iterable, Iterator, Sequence def process(items: Iterable[int]) -> None: if isinstance(items, Iterator): # 'items' has type 'Iterator[int]' here elif isinstance(items, Sequence[int]): # Error! Can't use 'isinstance()' with subscripted protocols Note that instance checks are not 100% reliable statically, this is why this behavior is opt-in, see section on `rejected`_ ideas for examples. Using Protocols in Python 2.7 - 3.5 =================================== Variable annotation syntax was added in Python 3.6, so that the syntax for defining protocol variables proposed in `specification`_ section can't be used if support for earlier versions is needed. To define these in a manner compatible with older versions of Python one can use properties. Properties can be settable and/or abstract if needed:: class Foo(Protocol): @property def c(self) -> int: return 42 # Default value can be provided for property... @abstractproperty def d(self) -> int: # ... or it can be abstract return 0 Also function type comments can be used as per PEP 484 (for example to provide compatibility with Python 2). The ``typing`` module changes proposed in this PEP will also be backported to earlier versions via the backport currently available on PyPI. Runtime Implementation of Protocol Classes ========================================== Implementation details ---------------------- The runtime implementation could be done in pure Python without any effects on the core interpreter and standard library except in the ``typing`` module, and a minor update to ``collections.abc``: * Define class ``typing.Protocol`` similar to ``typing.Generic``. * Implement metaclass functionality to detect whether a class is a protocol or not. Add a class attribute ``_is_protocol = True`` if that is the case. Verify that a protocol class only has protocol base classes in the MRO (except for object). * Implement ``@runtime`` that allows ``__subclasshook__()`` performing structural instance and subclass checks as in ``collections.abc`` classes. * All structural subtyping checks will be performed by static type checkers, such as ``mypy`` [mypy]_. No additional support for protocol validation will be provided at runtime. * Classes ``Mapping``, ``MutableMapping``, ``Sequence``, and ``MutableSequence`` in ``collections.abc`` module will support structural instance and subclass checks (like e.g. ``collections.abc.Iterable``). Changes in the typing module ---------------------------- The following classes in ``typing`` module will be protocols: * ``Callable`` * ``Awaitable`` * ``Iterable``, ``Iterator`` * ``AsyncIterable``, ``AsyncIterator`` * ``Hashable`` * ``Sized`` * ``Container`` * ``Collection`` * ``Reversible`` * ``Sequence``, ``MutableSequence`` * ``Mapping``, ``MutableMapping`` * ``ContextManager``, ``AsyncContextManager`` * ``SupportsAbs`` (and other ``Supports*`` classes) Most of these classes are small and conceptually simple. It is easy to see what are the methods these protocols implement, and immediately recognize the corresponding runtime protocol counterpart. Practically, few changes will be needed in ``typing`` since some of these classes already behave the necessary way at runtime. Most of these will need to be updated only in the corresponding ``typeshed`` stubs [typeshed]_. All other concrete generic classes such as ``List``, ``Set``, ``IO``, ``Deque``, etc are sufficiently complex that it makes sense to keep them non-protocols (i.e. require code to be explicit about them). Also, it is too easy to leave some methods unimplemented by accident, and explicitly marking the subclass relationship allows type checkers to pinpoint the missing implementations. Introspection ------------- The existing class introspection machinery (``dir``, ``__annotations__`` etc) can be used with protocols. In addition, all introspection tools implemented in the ``typing`` module will support protocols. Since all attributes need to be defined in the class body based on this proposal, protocol classes will have even better perspective for introspection than regular classes where attributes can be defined implicitly -- protocol attributes can't be initialized in ways that are not visible to introspection (using ``setattr()``, assignment via ``self``, etc.). Still, some things like types of attributes will not be visible at runtime in Python 3.5 and earlier, but this looks like a reasonable limitation. There will be only limited support of ``isinstance()`` and ``issubclass()`` as discussed above (these will *always* fail with ``TypeError`` for subscripted generic protocols, since a reliable answer could not be given at runtime in this case). But together with other introspection tools this give a reasonable perspective for runtime type checking tools. .. _rejected: Rejected/Postponed Ideas ======================== The ideas in this section were previously discussed in [several]_ [discussions]_ [elsewhere]_. Make every class a protocol by default -------------------------------------- Some languages such as Go make structural subtyping the only or the primary form of subtyping. We could achieve a similar result by making all classes protocols by default (or even always). However we believe that it is better to require classes to be explicitly marked as protocols, for the following reasons: * Protocols don't have some properties of regular classes. In particular, ``isinstance()``, as defined for normal classes, is based on the nominal hierarchy. In order to make everything a protocol by default, and have ``isinstance()`` work would require changing its semantics, which won't happen. * Protocol classes should generally not have many method implementations, as they describe an interface, not an implementation. Most classes have many method implementations, making them bad protocol classes. * Experience suggests that many classes are not practical as protocols anyway, mainly because their interfaces are too large, complex or implementation-oriented (for example, they may include de facto private attributes and methods without a ``__`` prefix). * Most actually useful protocols in existing Python code seem to be implicit. The ABCs in ``typing`` and ``collections.abc`` are rather an exception, but even they are recent additions to Python and most programmers do not use them yet. * Many built-in functions only accept concrete instances of ``int`` (and subclass instances), and similarly for other built-in classes. Making ``int`` a structural type wouldn't be safe without major changes to the Python runtime, which won't happen. Protocols subclassing normal classes ------------------------------------ The main rationale to prohibit this is to preserve transitivity of subtyping, consider this example:: from typing import Protocol class Base: attr: str class Proto(Base, Protocol): def meth(self) -> int: ... class C: attr: str def meth(self) -> int: return 0 Now, ``C`` is a subtype of ``Proto``, and ``Proto`` is a subtype of ``Base``. But ``C`` cannot be a subtype of ``Base`` (since the latter is not a protocol). This situation would be really weird. In addition, there is an ambiguity about whether attributes of ``Base`` should become protocol members of ``Proto``. Support optional protocol members --------------------------------- We can come up with examples where it would be handy to be able to say that a method or data attribute does not need to be present in a class implementing a protocol, but if it is present, it must conform to a specific signature or type. One could use a ``hasattr()`` check to determine whether they can use the attribute on a particular instance. Languages such as TypeScript have similar features and apparently they are pretty commonly used. The current realistic potential use cases for protocols in Python don't require these. In the interest of simplicity, we propose to not support optional methods or attributes. We can always revisit this later if there is an actual need. Allow only protocol methods and force use of getters and setters ---------------------------------------------------------------- One could argue that protocols typically only define methods, but not variables. However, using getters and setters in cases where only a simple variable is needed would be quite unpythonic. Moreover, the widespread use of properties (that often act as type validators) in large code bases is partially due to previous absence of static type checkers for Python, the problem that PEP 484 and this PEP are aiming to solve. For example:: # without static types class MyClass: @property def my_attr(self): return self._my_attr @my_attr.setter def my_attr(self, value): if not isinstance(value, int): raise ValidationError("An integer expected for my_attr") self._my_attr = value # with static types class MyClass: my_attr: int Support non-protocol members ---------------------------- There was an idea to make some methods "non-protocol" (i.e. not necessary to implement, and inherited in explicit subclassing), but it was rejected, since this complicates things. For example, consider this situation:: class Proto(Protocol): @abstractmethod def first(self) -> int: raise NotImplementedError def second(self) -> int: return self.first() + 1 def fun(arg: Proto) -> None: arg.second() The question is should this be an error? We think most people would expect this to be valid. Therefore, to be on the safe side, we need to require both methods to be implemented in implicit subclasses. In addition, if one looks at definitions in ``collections.abc``, there are very few methods that could be considered "non-protocol". Therefore, it was decided to not introduce "non-protocol" methods. There is only one downside to this: it will require some boilerplate for implicit subtypes of ``Mapping`` and few other "large" protocols. But, this applies to few "built-in" protocols (like ``Mapping`` and ``Sequence``) and people are already subclassing them. Also, such style is discouraged for user-defined protocols. It is recommended to create compact protocols and combine them. Make protocols interoperable with other approaches -------------------------------------------------- The protocols as described here are basically a minimal extension to the existing concept of ABCs. We argue that this is the way they should be understood, instead of as something that *replaces* Zope interfaces, for example. Attempting such interoperabilities will significantly complicate both the concept and the implementation. On the other hand, Zope interfaces are conceptually a superset of protocols defined here, but using an incompatible syntax to define them, because before PEP 526 there was no straightforward way to annotate attributes. In the 3.6+ world, ``zope.interface`` might potentially adopt the ``Protocol`` syntax. In this case, type checkers could be taught to recognize interfaces as protocols and make simple structural checks with respect to them. Use assignments to check explicitly that a class implements a protocol ---------------------------------------------------------------------- In the Go language the explicit checks for implementation are performed via dummy assignments [golang]_. Such a way is also possible with the current proposal. Example:: class A: def __len__(self) -> float: return ... _: Sized = A() # Error: A.__len__ doesn't conform to 'Sized' # (Incompatible return type 'float') This approach moves the check away from the class definition and it almost requires a comment as otherwise the code probably would not make any sense to an average reader -- it looks like dead code. Besides, in the simplest form it requires one to construct an instance of ``A``, which could be problematic if this requires accessing or allocating some resources such as files or sockets. We could work around the latter by using a cast, for example, but then the code would be ugly. Therefore we discourage the use of this pattern. Support ``isinstance()`` checks by default ------------------------------------------ The problem with this is instance checks could be unreliable, except for situations where there is a common signature convention such as ``Iterable``. For example:: class P(Protocol): def common_method_name(self, x: int) -> int: ... class X: <a bunch of methods> def common_method_name(self) -> None: ... # Note different signature def do_stuff(o: Union[P, X]) -> int: if isinstance(o, P): return o.common_method_name(1) # oops, what if it's an X instance? Another potentially problematic case is assignment of attributes *after* instantiation:: class P(Protocol): x: int class C: def initialize(self) -> None: self.x = 0 c = C() isinstance(c1, P) # False c.initialize() isinstance(c, P) # True def f(x: Union[P, int]) -> None: if isinstance(x, P): # static type of x is P here ... else: # type of x is "int" here? print(x + 1) f(C()) # oops We argue that requiring an explicit class decorator would be better, since one can then attach warnings about problems like this in the documentation. The user would be able to evaluate whether the benefits outweigh the potential for confusion for each protocol and explicitly opt in -- but the default behavior would be safer. Finally, it will be easy to make this behavior default if necessary, while it might be problematic to make it opt-in after being default. Provide a special intersection type construct --------------------------------------------- There was an idea to allow ``Proto = All[Proto1, Proto2, ...]`` as a shorthand for:: class Proto(Proto1, Proto2, ..., Protocol): pass However, it is not yet clear how popular/useful it will be and implementing this in type checkers for non-protocol classes could be difficult. Finally, it will be very easy to add this later if needed. Prohibit explicit subclassing of protocols by non-protocols ----------------------------------------------------------- This was rejected for the following reasons: * Backward compatibility: People are already using ABCs, including generic ABCs from ``typing`` module. If we prohibit explicit subclassing of these ABCs, then quite a lot of code will break. * Convenience: There are existing protocol-like ABCs (that will be turned into protocols) that have many useful "mix-in" (non-abstract) methods. For example in the case of ``Sequence`` one only needs to implement ``__getitem__`` and ``__len__`` in an explicit subclass, and one gets ``__iter__``, ``__contains__``, ``__reversed__``, ``index``, and ``count`` for free. * Explicit subclassing makes it explicit that a class implements a particular protocol, making subtyping relationships easier to see. * Type checkers can warn about missing protocol members or members with incompatible types more easily, without having to use hacks like dummy assignments discussed above in this section. * Explicit subclassing makes it possible to force a class to be considered a subtype of a protocol (by using ``# type: ignore`` together with an explicit base class) when it is not strictly compatible, such as when it has an unsafe override. Covariant subtyping of mutable attributes ----------------------------------------- Rejected because covariant subtyping of mutable attributes is not safe. Consider this example:: class P(Protocol): x: float def f(arg: P) -> None: arg.x = 0.42 class C: x: int c = C() f(c) # Would typecheck if covariant subtyping # of mutable attributes were allowed c.x >> 1 # But this fails at runtime It was initially proposed to allow this for practical reasons, but it was subsequently rejected, since this may mask some hard to spot bugs. Overriding inferred variance of protocol classes ------------------------------------------------ It was proposed to allow declaring protocols as invariant if they are actually covariant or contravariant (as it is possible for nominal classes, see PEP 484). However, it was decided not to do this because of several downsides: * Declared protocol invariance breaks transitivity of sub-typing. Consider this situation:: T = TypeVar('T') class P(Protocol[T]): # Declared as invariant def meth(self) -> T: ... class C: def meth(self) -> float: ... class D(C): def meth(self) -> int: ... Now we have that ``D`` is a subtype of ``C``, and ``C`` is a subtype of ``P[float]``. But ``D`` is *not* a subtype of ``P[float]`` since ``D`` implements ``P[int]``, and ``P`` is invariant. There is a possibility to "cure" this by looking for protocol implementations in MROs but this will be too complex in a general case, and this "cure" requires abandoning simple idea of purely structural subtyping for protocols. * Subtyping checks will always require type inference for protocols. In the above example a user may complain: "Why did you infer ``P[int]`` for my ``D``? It implements ``P[float]``!". Normally, inference can be overruled by an explicit annotation, but here this will require explicit subclassing, defeating the purpose of using protocols. * Allowing overriding variance will make impossible more detailed error messages in type checkers citing particular conflicts in member type signatures. * Finally, explicit is better than implicit in this case. Requiring user to declare correct variance will simplify understanding the code and will avoid unexpected errors at the point of use. Support adapters and adaptation ------------------------------- Adaptation was proposed by PEP 246 (rejected) and is supported by ``zope.interface``, see https://docs.zope.org/zope.interface/adapter.html. Adapters is quite an advanced concept, and PEP 484 supports unions and generic aliases that can be used instead of adapters. This can be illustrated with an example of ``Iterable`` protocol, there is another way of supporting iteration by providing ``__getitem__`` and ``__len__``. If a function supports both this way and the now standard ``__iter__`` method, then it could be annotated by a union type:: class OldIterable(Sized, Protocol[T]): def __getitem__(self, item: int) -> T: ... CompatIterable = Union[Iterable[T], OldIterable[T]] class A: def __iter__(self) -> Iterator[str]: ... class B: def __len__(self) -> int: ... def __getitem__(self, item: int) -> str: ... def iterate(it: CompatIterable[str]) -> None: ... iterate(A()) # OK iterate(B()) # OK Since there is a reasonable alternative for such cases with existing tooling, it is therefore proposed not to include adaptation in this PEP. Backwards Compatibility ======================= This PEP is almost fully backwards compatible. Few collection classes such as ``Sequence`` and ``Mapping`` will be turned into runtime protocols, therefore results of ``isinstance()`` checks are going to change in some edge cases. For example, a class that implements the ``Sequence`` protocol but does not explicitly inherit from ``Sequence`` currently returns ``False`` in corresponding instance and class checks. With this PEP implemented, such checks will return ``True``. Implementation ============== A working implementation of this PEP for ``mypy`` type checker is found on GitHub repo at https://github.com/ilevkivskyi/mypy/tree/protocols, corresponding ``typeshed`` stubs for more flavor are found at https://github.com/ilevkivskyi/typeshed/tree/protocols. Installation steps:: git clone --recurse-submodules https://github.com/ilevkivskyi/mypy/ cd mypy && git checkout protocols && cd typeshed git remote add proto https://github.com/ilevkivskyi/typeshed git fetch proto && git checkout proto/protocols cd .. && git add typeshed && sudo python3 -m pip install -U . The runtime implementation of protocols in ``typing`` module is found at https://github.com/ilevkivskyi/typehinting/tree/protocols. The version of ``collections.abc`` with structural behavior for mappings and sequences is found at https://github.com/ilevkivskyi/cpython/tree/protocols. References ========== .. [typing] https://docs.python.org/3/library/typing.html .. [wiki-structural] https://en.wikipedia.org/wiki/Structural_type_system .. [zope-interfaces] https://zopeinterface.readthedocs.io/en/latest/ .. [abstract-classes] https://docs.python.org/3/library/abc.html .. [collections-abc] https://docs.python.org/3/library/collections.abc.html .. [typescript] https://www.typescriptlang.org/docs/handbook/interfaces.html .. [golang] https://golang.org/doc/effective_go.html#interfaces_and_types .. [data-model] https://docs.python.org/3/reference/datamodel.html#special-method-names .. [typeshed] https://github.com/python/typeshed/ .. [mypy] http://github.com/python/mypy/ .. [several] https://mail.python.org/pipermail/python-ideas/2015-September/thread.html#35... .. [discussions] https://github.com/python/typing/issues/11 .. [elsewhere] https://github.com/python/peps/pull/224 Copyright ========= This document has been placed in the public domain.
On Wed, 24 May 2017 23:31:47 +0200 Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
Hi all,
After collecting suggestions in the previous discussion on python-dev https://mail.python.org/pipermail/python-dev/2017-March/thread.html#147629 and playing with implementation, here is an updated version of PEP 544.
-- Ivan
A link for those who don't like reading long e-mails: https://www.python.org/dev/peps/pep-0544/
=========================
PEP: 544 Title: Protocols
Can you give this PEP a more explicit title? "Protocols" sound like network protocols to me. Regards Antoine.
On 25 May 2017 at 21:26, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 24 May 2017 23:31:47 +0200 Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
Hi all,
After collecting suggestions in the previous discussion on python-dev https://mail.python.org/pipermail/python-dev/2017-March/thread.html#147629 and playing with implementation, here is an updated version of PEP 544.
-- Ivan
A link for those who don't like reading long e-mails: https://www.python.org/dev/peps/pep-0544/
=========================
PEP: 544 Title: Protocols
Can you give this PEP a more explicit title? "Protocols" sound like network protocols to me.
Especially given the existing use of the term in an asyncio context: https://www.python.org/dev/peps/pep-3156/#transports-and-protocols Given the abstract, I'd suggest "Structural Subtyping" as a suitable title for the PEP. That said, I think it's fine to use "protocol" throughout the rest of the PEP as is currently the case - object protocols and network protocols are clearly different things, it's just the bare word "Protocols" appearing as a PEP title in the PEP index with no other context that's potentially confusing. I'm +1 on the general idea of the PEP, and only have one question regarding the specifics. Given: import typing class MyContainer: def __len__(self) -> int: ... def close(self) -> None: ... Would that be enough for a static typechecker to consider MyContainer a structural subtype of both typing.Sized and SupportsClose from the PEP, even though neither is imported explicitly into the module? I'm assuming the answer is "Yes", but I didn't see it explicitly stated anywhere. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
2017-05-25 7:19 GMT-07:00 Nick Coghlan <ncoghlan@gmail.com>:
On 25 May 2017 at 21:26, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 24 May 2017 23:31:47 +0200 Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
Hi all,
After collecting suggestions in the previous discussion on python-dev https://mail.python.org/pipermail/python-dev/2017- March/thread.html#147629 and playing with implementation, here is an updated version of PEP 544.
-- Ivan
A link for those who don't like reading long e-mails: https://www.python.org/dev/peps/pep-0544/
=========================
PEP: 544 Title: Protocols
Can you give this PEP a more explicit title? "Protocols" sound like network protocols to me.
Especially given the existing use of the term in an asyncio context: https://www.python.org/dev/peps/pep-3156/#transports-and-protocols
Given the abstract, I'd suggest "Structural Subtyping" as a suitable title for the PEP.
That said, I think it's fine to use "protocol" throughout the rest of the PEP as is currently the case - object protocols and network protocols are clearly different things, it's just the bare word "Protocols" appearing as a PEP title in the PEP index with no other context that's potentially confusing.
I'm +1 on the general idea of the PEP, and only have one question regarding the specifics. Given:
import typing
class MyContainer: def __len__(self) -> int: ... def close(self) -> None: ...
Would that be enough for a static typechecker to consider MyContainer a structural subtype of both typing.Sized and SupportsClose from the PEP, even though neither is imported explicitly into the module? I'm assuming the answer is "Yes", but I didn't see it explicitly stated anywhere.
Yes, that should be the case. Specifically, if you pass a MyContainer object to a function whose argument is annotated as typing.Sized or SupportsClose, a type checker should accept that call.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ jelle.zijlstra%40gmail.com
On Thu, May 25, 2017 at 7:19 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Given the abstract, I'd suggest "Structural Subtyping" as a suitable title for the PEP.
Maybe even "Structural Subtyping (a.k.a. Duck Typing)"
That said, I think it's fine to use "protocol" throughout the rest of the PEP as is currently the case - object protocols and network protocols are clearly different things, it's just the bare word "Protocols" appearing as a PEP title in the PEP index with no other context that's potentially confusing.
Agreed.
I'm +1 on the general idea of the PEP, and only have one question regarding the specifics. Given:
import typing
class MyContainer: def __len__(self) -> int: ... def close(self) -> None: ...
Would that be enough for a static typechecker to consider MyContainer a structural subtype of both typing.Sized and SupportsClose from the PEP, even though neither is imported explicitly into the module? I'm assuming the answer is "Yes", but I didn't see it explicitly stated anywhere.
Yes. Imports don't enter the matter. (Things get tied together at the call sites.) @Ivan: if there isn't an example that makes this clear we should add one. -- --Guido van Rossum (python.org/~guido)
On 24/05/17 14:31, Ivan Levkivskyi wrote:
Hi all,
After collecting suggestions in the previous discussion on python-dev https://mail.python.org/pipermail/python-dev/2017-March/thread.html#147629 and playing with implementation, here is an updated version of PEP 544.
-- Ivan
I really like this PEP in general. I think this brings the type system for type-hints closer to Python semantics. But there are a few points I disagree with. I don't think Protocol types should be tied to ABCs. It just makes things more complex with no obvious benefit. I also think all references to 'isinstance' and 'issubclass' should be removed. Type-hints should not have runtime semantics, beyond those that they have as classes. In fact, there is no need for protocol types to be classes at all. Cheers, Mark.
On Thu, May 25, 2017 at 10:49 AM, Mark Shannon <mark@hotpy.org> wrote:
I really like this PEP in general. I think this brings the type system for type-hints closer to Python semantics.
Thank you.
But there are a few points I disagree with. I don't think Protocol types should be tied to ABCs. It just makes things more complex with no obvious benefit.
There are backwards compatibility benefits -- we could make e.g. Sequence a Protocol in Python 3.7 and it would be possible to write code that inherits from Sequence and works in Python 3.6 and 3.7. For this to work we need some support for non-abstract methods in Protocols.
I also think all references to 'isinstance' and 'issubclass' should be removed. Type-hints should not have runtime semantics, beyond those that they have as classes.
Again, backwards compatibility.
In fact, there is no need for protocol types to be classes at all.
That's pretty much a separate discussion (see https://github.com/python/typing/issues/432). -- --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
Thanks everyone for interesting suggestions! @Antoine @Guido: Some of the possible options for the title are: * Protocols (structural subtyping) * Protocols (static duck typing) * Structural subtyping (static duck typing) which one do you prefer? @Nick: Yes, explicit imports are not necessary for static type checkers (I will add a short comment about this). @Mark: I agree with Guido on all points here. For example, collections.abc.Iterable is already a class, and lots of code uses isinstance(obj, collections.abc.Iterable) and similar checks with other ABCs (also in a structural manner, i.e. via __subclasshook__). So that disabling this will case many breakages. The question of whether typing.Iterable[int] should be a class is independent (orthogonal) and does not belong to this PEP. -- Ivan
Some of the possible options for the title are It seems like you're talking about something most other languages would refer to as "Interfaces". What is unique about this proposal that would call for not using the industry standard language?
Type-hints should not have runtime semantics, beyond those that they have as classes lots of code uses isinstance(obj, collections.abc.Iterable) and similar checks with other ABCs Having interfaces defined as something extended from abc doesn't necessitate their use at runtime, but it does open up a great deal of options for those of us who want to do so. I've been leveraging abc for a few years now to implement a lightweight version of what this PEP is attempting to achieve (https://github.com/kevinconway/iface). Once you start getting into dynamically loaded plugins you often lose the ability to strictly enforce the shape of the input until runtime. In those cases, I've found it exceedingly useful to add 'isinstance' and 'issubbclass' as assertions to input of untrusted types for the tests and non-production deployments. For a perf boost in prod you can throw the -O flag and strip out the assertions to remove the runtime checks. I've found that to be a valuable pattern.
On Sun, May 28, 2017 at 8:21 AM Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
Thanks everyone for interesting suggestions!
@Antoine @Guido: Some of the possible options for the title are: * Protocols (structural subtyping) * Protocols (static duck typing) * Structural subtyping (static duck typing) which one do you prefer?
@Nick: Yes, explicit imports are not necessary for static type checkers (I will add a short comment about this).
@Mark: I agree with Guido on all points here. For example, collections.abc.Iterable is already a class, and lots of code uses isinstance(obj, collections.abc.Iterable) and similar checks with other ABCs (also in a structural manner, i.e. via __subclasshook__). So that disabling this will case many breakages. The question of whether typing.Iterable[int] should be a class is independent (orthogonal) and does not belong to this PEP.
-- Ivan
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/kevinjacobconway%40gmail....
On 28 May 2017 at 16:13, Kevin Conway <kevinjacobconway@gmail.com> wrote:
Some of the possible options for the title are It seems like you're talking about something most other languages would refer to as "Interfaces". What is unique about this proposal that would call for not using the industry standard language?
Well, I would say there is no "industry standard language" about structural subtyping. There are interfaces, protocols, traits, mixins, typeclasses, roles, and probably some other terms I am not aware of - all with subtly different semantics in different languages. There are several reasons why we use term protocol and not interface. Two important reasons for me are: * The term protocol is already de-facto standard in Python for things like sequence protocol, iterator protocol, descriptor protocol, etc. * Protocols are very different from Java interfaces in one important aspect: they don't require explicit declaration of implementation, they are mainly oriented on duck-typing. Maybe we need to add a short section to rejected ideas?
lots of code uses isinstance(obj, collections.abc.Iterable) and similar checks with other ABCs Having interfaces defined as something extended from abc doesn't necessitate their use at runtime, but it does open up a great deal of
Type-hints should not have runtime semantics, beyond those that they have as classes options for those of us who want to do so. I've been leveraging abc for a few years now to implement a lightweight version of what this PEP is attempting to achieve
IIUC this is not the main goal of the PEP, the main goal is to provide support/standard for _static_ structural subtyping. Possibility to use protocols in runtime context is rather a minor bonus that exists mostly to provide a seamless transition for projects that already use ABCs. -- Ivan
On Sun, May 28, 2017 at 8:27 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 28 May 2017 at 16:13, Kevin Conway <kevinjacobconway@gmail.com> wrote:
Some of the possible options for the title are It seems like you're talking about something most other languages would refer to as "Interfaces". What is unique about this proposal that would call for not using the industry standard language?
Well, I would say there is no "industry standard language" about structural subtyping. There are interfaces, protocols, traits, mixins, typeclasses, roles, and probably some other terms I am not aware of - all with subtly different semantics in different languages. There are several reasons why we use term protocol and not interface. Two important reasons for me are: * The term protocol is already de-facto standard in Python for things like sequence protocol, iterator protocol, descriptor protocol, etc. * Protocols are very different from Java interfaces in one important aspect: they don't require explicit declaration of implementation, they are mainly oriented on duck-typing. Maybe we need to add a short section to rejected ideas?
If you feel like it. Regarding the title, I'd like to keep the word Protocol in the title too, so I'd go with "Protocols: Structural subtyping (duck typing)" -- hope that's not too long to fit in a PEP title field.
lots of code uses isinstance(obj, collections.abc.Iterable) and similar checks with other ABCs Having interfaces defined as something extended from abc doesn't necessitate their use at runtime, but it does open up a great deal of
Type-hints should not have runtime semantics, beyond those that they have as classes options for those of us who want to do so. I've been leveraging abc for a few years now to implement a lightweight version of what this PEP is attempting to achieve
IIUC this is not the main goal of the PEP, the main goal is to provide support/standard for _static_ structural subtyping. Possibility to use protocols in runtime context is rather a minor bonus that exists mostly to provide a seamless transition for projects that already use ABCs.
Is something like this already in the PEP? It deserves attention in one of the earlier sections. -- --Guido van Rossum (python.org/~guido)
On 28 May 2017 at 19:40, Guido van Rossum <guido@python.org> wrote:
On Sun, May 28, 2017 at 8:27 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
[...]
Regarding the title, I'd like to keep the word Protocol in the title too, so I'd go with "Protocols: Structural subtyping (duck typing)" -- hope that's not too long to fit in a PEP title field.
OK, this looks reasonable.
Type-hints should not have runtime semantics, beyond those that they
lots of code uses isinstance(obj, collections.abc.Iterable) and similar checks with other ABCs Having interfaces defined as something extended from abc doesn't necessitate their use at runtime, but it does open up a great deal of
have as classes options for those of us who want to do so. I've been leveraging abc for a few years now to implement a lightweight version of what this PEP is attempting to achieve
IIUC this is not the main goal of the PEP, the main goal is to provide support/standard for _static_ structural subtyping. Possibility to use protocols in runtime context is rather a minor bonus that exists mostly to provide a seamless transition for projects that already use ABCs.
Is something like this already in the PEP? It deserves attention in one of the earlier sections.
Yes, similar discussions appear in "Rationale and Goals", and "Existing approaches to structural subtyping". Maybe I need to polish the text there adding more focus on static typing. -- Ivan
From the PEP: The problem with them is that a class has to be explicitly marked to support them, which is unpythonic and unlike what one would normally do in idiomatic dynamically typed Python code. The same problem appears with user-defined ABCs: they must be explicitly subclassed or registered. Neither of these statements are entirely true. The semantics of `abc` allow for exactly the kind of detached interfaces this PEP is attempting to provide. The `abc.ABCMeta` provides the `__subclasshook__` which allows a developer to override the default check of internal `abc` registry state with virtually any logic that determines the relationship of a class with the interface. The prior art I linked to earlier in the thread uses this feature to generically support `issubclass` and `isinstance` in such a way that the PEPs goal is achieved.
The intention of this PEP is to solve all these problems by allowing users to write the above code without explicit base classes in the class definition As I understand this goal, you want to take what some of us in the community have been building ourselves and make it canonical via the stdlib. What strikes me as odd is that the focus is on 3rd party type checkers first rather than introducing this as a feature of the language runtime and then updating the type checker contract to make use of it. I see a mention of the `isinstance` check support in the postponed/rejected ideas, but the only rationale given for it being in that category is, generally, "there are edge cases". For example, the PEP lists this as an edge case: The problem with this is instance checks could be unreliable, except for situations where there is a common signature convention such as Iterable However, the sample given demonstrates precisely the expected behavior of checking if a concrete implements the protocol. It's unclear why this sample is given as a negative. The other case given is: Another potentially problematic case is assignment of attributes after instantiation Can you elaborate on how type checkers would not encounter this same issue? If there is a solution to this problem for type checkers, would that same solution not work at runtime? Also, it seems odd to use a custom initialize function rather than `__init__`. I don't think it was intentional, but this makes it seem like a bit of a strawman that doesn't represent typical Python code.
Also, extensive use of ABCs might impose additional runtime costs. I'd love to see some data around this. Given that it's a rationale for the PEP I'd expect to see some numbers behind it. For example, is memory cost of directly registering implementations to abc linear or worse? What is the runtime growth pattern of isinstance or issubclass when used with heavily registered or deeply registered abc graphs and is it different than those calls on concrete class hierarchies? Does the cost affect anything more than the initial evaluation of the code or, in the absence of isinstance/issubclass checks, does it continue to have an impact on the runtime?
On Mon, May 29, 2017 at 5:41 AM Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 28 May 2017 at 19:40, Guido van Rossum <guido@python.org> wrote:
On Sun, May 28, 2017 at 8:27 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
[...]
Regarding the title, I'd like to keep the word Protocol in the title too, so I'd go with "Protocols: Structural subtyping (duck typing)" -- hope that's not too long to fit in a PEP title field.
OK, this looks reasonable.
Type-hints should not have runtime semantics, beyond those that they
lots of code uses isinstance(obj, collections.abc.Iterable) and similar checks with other ABCs Having interfaces defined as something extended from abc doesn't necessitate their use at runtime, but it does open up a great deal of
have as classes options for those of us who want to do so. I've been leveraging abc for a few years now to implement a lightweight version of what this PEP is attempting to achieve
IIUC this is not the main goal of the PEP, the main goal is to provide support/standard for _static_ structural subtyping. Possibility to use protocols in runtime context is rather a minor bonus that exists mostly to provide a seamless transition for projects that already use ABCs.
Is something like this already in the PEP? It deserves attention in one of the earlier sections.
Yes, similar discussions appear in "Rationale and Goals", and "Existing approaches to structural subtyping". Maybe I need to polish the text there adding more focus on static typing.
-- Ivan
[I added some blank lines to separate the PEP quotes from Kevin's responses.] On Mon, May 29, 2017 at 7:51 AM, Kevin Conway <kevinjacobconway@gmail.com> wrote:
From the PEP:
The problem with them is that a class has to be explicitly marked to support them, which is unpythonic and unlike what one would normally do in idiomatic dynamically typed Python code. The same problem appears with user-defined ABCs: they must be explicitly subclassed or registered.
Neither of these statements are entirely true. The semantics of `abc` allow for exactly the kind of detached interfaces this PEP is attempting to provide. The `abc.ABCMeta` provides the `__subclasshook__` which allows a developer to override the default check of internal `abc` registry state with virtually any logic that determines the relationship of a class with the interface. The prior art I linked to earlier in the thread uses this feature to generically support `issubclass` and `isinstance` in such a way that the PEPs goal is achieved.
But that doesn't help a static type checker. You can't expect a static checker to understand the implementation of a particular `__subclasshook__`. In practice, except for a few "one-trick ponies" such as Hashable, existing ABCs rely on subclassing or registration to make isinstance() work, and for statically checking code that uses duck typing those aren't enough.
The intention of this PEP is to solve all these problems by allowing users to write the above code without explicit base classes in the class definition
As I understand this goal, you want to take what some of us in the community have been building ourselves and make it canonical via the stdlib.
Not really. The goal is to suggest implementation a frequently requested feature in static checkers based on the type system laid out in PEP 484, namely checks for duck types. To support this the type system from PEP 484 needs to be extended, and that's what PEP 544 is about.
What strikes me as odd is that the focus is on 3rd party type checkers first rather than introducing this as a feature of the language runtime and then updating the type checker contract to make use of it.
This seems to be a misunderstanding, or at least an attempt to change the agenda for PEP 544. The primary goal of the PEP is not to support runtime checking but static checking. This is not new -- PEP 484 and PEP 526 before it have also focused on features that are useful primarily for static checkers. (Also, a bit of history: PEP 484 intentionally focused on static checking support because there was widespread skepticism about the need for more runtime checking, but there was a subset of the community that was very interested in static checking.)
I see a mention of the `isinstance` check support in the postponed/rejected ideas, but the only rationale given for it being in that category is, generally, "there are edge cases". For example, the PEP lists this as an edge case:
The problem with this is instance checks could be unreliable, except for situations where there is a common signature convention such as Iterable
However, the sample given demonstrates precisely the expected behavior of checking if a concrete implements the protocol. It's unclear why this sample is given as a negative.
I assume we're talking about this example: class P(Protocol): def common_method_name(self, x: int) -> int: ... class X: <a bunch of methods> def common_method_name(self) -> None: ... # Note different signature def do_stuff(o: Union[P, X]) -> int: if isinstance(o, P): return o.common_method_name(1) # oops, what if it's an X instance? The problem there is that the "state of the art" for runtiming checking isinstance(o, P) boils down to hasattr(o, 'common_method_name') while the type checker takes the method signatures into account, so it will consider X objects not to be instances of P. The other case given is:
Another potentially problematic case is assignment of attributes after instantiation
Can you elaborate on how type checkers would not encounter this same issue? If there is a solution to this problem for type checkers, would that same solution not work at runtime? Also, it seems odd to use a custom initialize function rather than `__init__`. I don't think it was intentional, but this makes it seem like a bit of a strawman that doesn't represent typical Python code.
Lots of code I've seen initializes variables in a separate function (usually called from `__init__`). Mypy, at least, considers instance variables assigned through `self` in all methods of a class to be potential instance variable declarations, otherwise a lot of code could not be type-checked. Again, the example is problematic given that the runtime check for isinstance(c, P) can't do better than hasattr(c, 'x'). (I think there's a typo in the PEP here, 'c1' should be 'c'.) The need to use an explicit class decorator to add isinstance support is used as a way to encourage developers to think about whether the runtime instance check will match the picture as seen by the static checker, before they turn on this decorator. It seems reasonable to me.
Also, extensive use of ABCs might impose additional runtime costs.
I'd love to see some data around this. Given that it's a rationale for the PEP I'd expect to see some numbers behind it. For example, is memory cost of directly registering implementations to abc linear or worse? What is the runtime growth pattern of isinstance or issubclass when used with heavily registered or deeply registered abc graphs and is it different than those calls on concrete class hierarchies? Does the cost affect anything more than the initial evaluation of the code or, in the absence of isinstance/issubclass checks, does it continue to have an impact on the runtime?
Its commonly known that ABCs are expensive (though if you want to do precise measurements you're welcome). Here's one data point: https://github.com/python/mypy/commit/1be4db7ac6e06a162355c3d5f7794d21b89a10... -- it's a one-line diff that removes `metaclass=ABCMeta` from one class, and the commit message reads: Make the AST classes not ABCs This results in roughly a 20% speedup on the non-parsing steps. Here are the timings I got from running mypy on itself: Before the change: 3861.8ms (49.0%) SemanticallyAnalyzedFile 2760.3ms (35.0%) UnprocessedFile 1111.8ms (14.1%) ParsedFile 142.8ms ( 1.8%) PartiallySemanticallyAnalyzedFile After the change: 3086.1ms (45.1%) SemanticallyAnalyzedFile 2665.1ms (39.0%) UnprocessedFile 945.1ms (13.8%) ParsedFile 139.6ms ( 2.0%) PartiallySemanticallyAnalyzedFile -- --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
On 31 May 2017 at 00:58, Guido van Rossum <guido@python.org> wrote: [...] Thank you for very detailed answers! I have practically nothing to add. It seems to me that most of the Kevin's questions stem from unnecessary focus on runtime type checking. Here are two ideas about how to fix this: * Add the word "static" somewhere in the PEP title. * Add a short note at the start mentioning this is an extension of the type system proposed in PEP 484 and recommending to read PEP 484 first. What do you think? -- Ivan
Hi, I'm interested in startup time too, not only execution time. Here is very rough test: with open('with_abc.py', 'w') as f: print("import abc", file=f) for i in range(1, 1001): print(f"class A{i}(metaclass=abc.ABCMeta): pass", file=f) with open('without_abc.py', 'w') as f: print("import abc", file=f) for i in range(1, 1001): print(f"class A{i}: pass", file=f) $ time python3 -c 'import abc' real 0m0.051s user 0m0.035s sys 0m0.013s $ time python3 -c 'import with_abc' real 0m0.083s user 0m0.063s sys 0m0.017s $ time python3 -c 'import without_abc' real 0m0.055s user 0m0.042s sys 0m0.011s It seems 1000 ABC classes takes less than 30ms but 1000 normal classes takes less than 10ms. I don't know this penalty is acceptable or not. But how about making ABC optional? I don't want to use ABC so frequently when there is no real requirement of ABC. ABC implementation is very complex and sometimes ABC cause unexpected performance issue, like you fixed in https://github.com/python/typing/pull/383 If we start with "Protocol is always ABC" and we face unexpected performance penalty later, it may be difficult to find and optimize it. # If we can stop using ABC for io.IOBase, Python startup time will be few ms faster. # Maybe, I should implement weakset and abc in C. Regards,
On Wed, May 31, 2017 at 2:16 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 31 May 2017 at 00:58, Guido van Rossum <guido@python.org> wrote: [...]
Thank you for very detailed answers! I have practically nothing to add. It seems to me that most of the Kevin's questions stem from unnecessary focus on runtime type checking. Here are two ideas about how to fix this:
* Add the word "static" somewhere in the PEP title.
So the title could become "Protocols: Static structural subtyping (duck typing)" -- long, but not record-setting. * Add a short note at the start mentioning this is an extension of the type
system proposed in PEP 484 and recommending to read PEP 484 first.
Hm, the Abstract already spells that out. I suspect that many people react to the discussion without first reading the PEP itself (I do this myself :-). The only thing that could possibly be confusing about the abstract is that it claims to specify "static and runtime semantics" -- but that's reasonable, since the runtime semantics must somehow be specified even if they're minimal. -- --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
On 1 June 2017 at 00:10, Guido van Rossum <guido@python.org> wrote:
On Wed, May 31, 2017 at 2:16 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 31 May 2017 at 00:58, Guido van Rossum <guido@python.org> wrote: [...]
Thank you for very detailed answers! I have practically nothing to add. It seems to me that most of the Kevin's questions stem from unnecessary focus on runtime type checking. Here are two ideas about how to fix this:
* Add the word "static" somewhere in the PEP title.
So the title could become "Protocols: Static structural subtyping (duck typing)" -- long, but not record-setting.
I am thinking about "Protocols: Structural subtyping (static duck typing)". The reason is that subtyping is already a mostly static concept (in contrast to subclassing), while duck typing is typically associated with the runtime behaviour. This might seem minor, but this version of the title sounds much more naturally to me. -- Ivan
On Fri, Jun 2, 2017 at 3:10 PM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 1 June 2017 at 00:10, Guido van Rossum <guido@python.org> wrote:
On Wed, May 31, 2017 at 2:16 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 31 May 2017 at 00:58, Guido van Rossum <guido@python.org> wrote: [...]
Thank you for very detailed answers! I have practically nothing to add. It seems to me that most of the Kevin's questions stem from unnecessary focus on runtime type checking. Here are two ideas about how to fix this:
* Add the word "static" somewhere in the PEP title.
So the title could become "Protocols: Static structural subtyping (duck typing)" -- long, but not record-setting.
I am thinking about "Protocols: Structural subtyping (static duck typing)". The reason is that subtyping is already a mostly static concept (in contrast to subclassing), while duck typing is typically associated with the runtime behaviour.
This might seem minor, but this version of the title sounds much more naturally to me.
+1 -- --Guido van Rossum (python.org/~guido)
Hi, I have to admit I am not happy with separating the concepts of 'runtime' and 'static' types as implied by pep544. I am currently exploring a type hint generator that produces hints out of types used in unit tests. It debugs the tests and collects the parameter types of call and return events. It ignores a type when a supertype is present. Failing isinstance/issubclass calls for protocols would hurt there. I understand that any type checker code that could provide isinstance functionality for pep544 protocols would rely on method signatures that my hint generator is just producing. proof of concept implementation (writes method docstrings, no pep484 type hints yet): https://github.com/markuswissinger/ducktestpy This is currently just some personal project that some of you will consider a strange idea. I just want to mention that this use case might not play well together with pep544. Regards Markus 2017-06-05 23:59 GMT+02:00 Guido van Rossum <guido@python.org>:
On Fri, Jun 2, 2017 at 3:10 PM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 1 June 2017 at 00:10, Guido van Rossum <guido@python.org> wrote:
On Wed, May 31, 2017 at 2:16 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 31 May 2017 at 00:58, Guido van Rossum <guido@python.org> wrote: [...]
Thank you for very detailed answers! I have practically nothing to add. It seems to me that most of the Kevin's questions stem from unnecessary focus on runtime type checking. Here are two ideas about how to fix this:
* Add the word "static" somewhere in the PEP title.
So the title could become "Protocols: Static structural subtyping (duck typing)" -- long, but not record-setting.
I am thinking about "Protocols: Structural subtyping (static duck typing)". The reason is that subtyping is already a mostly static concept (in contrast to subclassing), while duck typing is typically associated with the runtime behaviour.
This might seem minor, but this version of the title sounds much more naturally to me.
+1
-- --Guido van Rossum (python.org/~guido)
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ markus.wissinger%40gmail.com
On Thu, Jun 22, 2017 at 10:44 AM, Markus Wissinger < markus.wissinger@gmail.com> wrote:
Hi,
I have to admit I am not happy with separating the concepts of 'runtime' and 'static' types as implied by pep544.
I am currently exploring a type hint generator that produces hints out of types used in unit tests. It debugs the tests and collects the parameter types of call and return events. It ignores a type when a supertype is present. Failing isinstance/issubclass calls for protocols would hurt there. I understand that any type checker code that could provide isinstance functionality for pep544 protocols would rely on method signatures that my hint generator is just producing.
proof of concept implementation (writes method docstrings, no pep484 type hints yet): https://github.com/markuswissinger/ducktestpy
This is currently just some personal project that some of you will consider a strange idea.
Not a strange idea, I've had a similar idea and played a bit with it ~10 years ago (inspired by a Java project whose name eludes me now). Also, I think PyCharm is able to do similar things (see https://blog.jetbrains.com/pycharm/2013/02/dynamic-runtime-type-inference-in... ). S. -- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, Free&OSS Group / Systematic Cluster - http://www.gt-logiciel-libre.org/ Co-Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyData Paris - http://pydata.fr/ --- “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.” — R. Buckminster Fuller
On 22 June 2017 at 10:44, Markus Wissinger <markus.wissinger@gmail.com> wrote:
I have to admit I am not happy with separating the concepts of 'runtime' and 'static' types as implied by pep544.
This is not something new, already PEP 483 makes a clear distinction between types (a static concept) and classes (a runtime concept).
Failing isinstance/issubclass calls for protocols would hurt there. I understand that any type checker code that could provide isinstance functionality for pep544 protocols would rely on method signatures that my hint generator is just producing.
isinstance(obj, T) and issubclass(Cls, T) already fail if T is a subscripted generic like List[int], so that again nothing new here. To check runtime subtyping with such types one can write a third party introspection tool based on typing_inspect package on PyPI (which potentially might in future become an official wrapper for currently internal typing API). -- Ivan
participants (11)
-
Antoine Pitrou
-
Guido van Rossum
-
INADA Naoki
-
Ivan Levkivskyi
-
Jelle Zijlstra
-
Kevin Conway
-
Mark Shannon
-
Markus Wissinger
-
Nick Coghlan
-
Stefan Richthofer
-
Stéfane Fermigier