Support Algebraic Data Types with @sealed
The original (superseded) PEP 622 proposed adding a @sealed decorator so that users could define algebraic data types (ADTs) that static type checkers could verify are `match`ed exhaustively: - https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types There is a somewhat fairly-upvoted question about ADTs in Python, indicating interest in this area: - https://stackoverflow.com/questions/16258553 @sealed would be very valuable towards making ADT usage within Python more type-safe. Is there any historical discussion about why @sealed was dropped from the final pattern matching PEP? What would be the next steps needed to push this forward? Would writing a new PEP be the next step?
El sáb, 16 abr 2022 a las 12:51, John Hagen (<johnthagen@gmail.com>) escribió:
The original (superseded) PEP 622 proposed adding a @sealed decorator so that users could define algebraic data types (ADTs) that static type checkers could verify are `match`ed exhaustively:
- https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types
There is a somewhat fairly-upvoted question about ADTs in Python, indicating interest in this area:
- https://stackoverflow.com/questions/16258553
@sealed would be very valuable towards making ADT usage within Python more type-safe. Is there any historical discussion about why @sealed was dropped from the final pattern matching PEP?
I don't recall exactly, but my sense was that the original proposal was too big and complicated, so it was slimmed down into PEP 634. A number of non-essential features were dropped.
What would be the next steps needed to push this forward? Would writing a new PEP be the next step?
Yes, this would need to be a new PEP. I'd probably be willing to sponsor such a PEP.
_______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: jelle.zijlstra@gmail.com
I'd like to understand what benefit `@sealed` has over defining an explicit union type that contains the classes that comprise the exhaustiveness set. The linked PEP says that the union approach is "fragile". I use this technique extensively, and I find it anything bug fragile. It's incredibly robust, easy to maintain, and easy to understand. It also works well with today's type system and type checkers. I'm very negative on the idea of adding `@sealed` to the type system. It would represent the first (and so far only) place where semantics of a type depend on the module in which it is declared. Such a construct would be an exceptional case in the type system and would be problematic for for lazy type evaluators like pyright, which evaluate types of individual symbols on demand rather than analyzing entire files. -Eric -- Eric Traut Contributor to pyright & pylance
El sáb, 16 abr 2022 a las 13:47, John Hagen (<johnthagen@gmail.com>) escribió:
Yes, this would need to be a new PEP. I'd probably be willing to sponsor such a PEP.
Thanks Jelle. I will try to put a PEP together, likely pulling heavily from the section already written in PEP 622.
Looking at that section again, I noticed a problem with the spec at https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types. Given a class hierarchy with `Node` as the base class and a number of leaf classes, it says "With such definition, a type checker can safely treat Node as Union[Name, Operation, Assignment, Print]". But there is nothing in the code sample that tells a type checker that an object of type `Node` can't be an instance of the base class. Presumably you could use `abc.ABCMeta` to indicate that the base class is abstract, but that may be disruptive at runtime, because it requires using a metaclass. Eric's feedback is also important to consider. What compelling benefit does this new type system feature provide over just using a Union?
_______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: jelle.zijlstra@gmail.com
On Sun, Apr 17, 2022 at 12:46 PM Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
El sáb, 16 abr 2022 a las 13:47, John Hagen (<johnthagen@gmail.com>) escribió:
Yes, this would need to be a new PEP. I'd probably be willing to sponsor such a PEP.
Thanks Jelle. I will try to put a PEP together, likely pulling heavily from the section already written in PEP 622.
Looking at that section again, I noticed a problem with the spec at https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types. Given a class hierarchy with `Node` as the base class and a number of leaf classes, it says "With such definition, a type checker can safely treat Node as Union[Name, Operation, Assignment, Print]". But there is nothing in the code sample that tells a type checker that an object of type `Node` can't be an instance of the base class. Presumably you could use `abc.ABCMeta` to indicate that the base class is abstract, but that may be disruptive at runtime, because it requires using a metaclass.
Type checkers could probably just treat the base class of a sealed class as implicitly abstract. (Just as they would treat every subclass as `@final` outside the module.)
Eric's feedback is also important to consider. What compelling benefit does this new type system feature provide over just using a Union?
Perhaps nothing other than the convenience of not having to repeat the class names in the definition of the union (at the cost of having to define the base class, adorn it with `@sealed`, and repeating that as the base class for all the case classes). I also agree with Eric that it feels awkward to require that all the case classes must be defined in the same file -- this violates a general refactoring principle. Though maybe it's okay (just like all methods of a class must be defined in the same file). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Eric's feedback is also important to consider. What compelling benefit does this new type system feature provide over just using a Union?
If you use a Union, then you lose inheritance, which is pretty useful in Python. I use abstract methods, private methods, public methods, common attributes , __init_subclass__, __subclasses__, and even isinstance(..., Base). The base class methods and attributes work with all the IDE features, like rename, jump-to-definition, and find-usage. If class A and class B do not share a common base class, then A.method and B.method are not typically considered related by the IDE, even if A and B appear in a Union somewhere else in the program. I make nominal sum types using an abstract base class in Python all the time even though I don't get exhaustiveness checking. I just add `else: raise NotImplementedError()` everywhere to keep PyCharm from complaining.
Eric's feedback is also important to consider. What compelling benefit does this new type system feature provide over just using a Union?
One of the benefits that the base-class approach has over the Union approach is that the base type can define additional state and methods that the variants can implement, or can be used when referring to the base type. I see this as being more a more general and flexible ADT. A very simplistic example: from dataclasses import dataclass from enum import Enum, auto class UsState(Enum): Alabama = auto() Alaska = auto() # --snip-- # @sealed class Coin: value: int def value_in_cents(self) -> int: return self.value @dataclass class Nickle(Coin): value = 5 @dataclass class Dime(Coin): value = 10 @dataclass class Quarter(Coin): value = 25 us_state: UsState def get_value(coin: Coin) -> int: match coin: case Quarter(state): print(f"State quarter from {state}") case _: pass return coin.value_in_cents() Also as reference, I did a language survey of how other languages handle ADTs (to help motivate using the name `sealed` or use something else). ## `sealed` - Kotlin: https://kotlinlang.org/docs/sealed-classes.html - Scala 2: https://docs.scala-lang.org/tour/pattern-matching.html - Java 17: https://openjdk.java.net/jeps/409 ## `enum` - Scala 3: https://docs.scala-lang.org/scala3/reference/enums/adts.html - Rust: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html - Swift: https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html ## Pipe (`|`) - Haskell: https://wiki.haskell.org/Algebraic_data_type - OCaml: https://cs3110.github.io/textbook/chapters/data/algebraic_data_types.html - F#: https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/discrimina... ## Manual Property - TypeScript: https://www.typescriptlang.org/docs/handbook/typescript-in-5-minutes-func.ht... ## Don't have ADTs - Go: https://go.dev/doc/faq#variant_types
On Mon, Apr 18, 2022, at 14:24, John Hagen wrote:
One of the benefits that the base-class approach has over the Union approach is that the base type can define additional state and methods that the variants can implement, or can be used when referring to the base type. I see this as being more a more general and flexible ADT.
You can achieve this without `@sealed` by changing Coin -> BaseCoin, and adding `Coin = Nickle | Dime | Quarter`. `@sealed` helps avoid the need for the `Coin`/`BaseCoin` separation, and for repeating the sub-class names, but it is not essential. Ran
You can achieve this without `@sealed` by changing Coin -> BaseCoin, and adding `Coin = Nickle | Dime | Quarter`.
There are some edge cases to this approach. For example, you can't put Coin in another Union and match on it because Unions do not implement isinstance: Coin = Nickle | Dime | Quarter def get_value(coin: int | Coin) -> int: match coin: case int(): return coin case Coin(): # TypeError: called match pattern must be a type return coin.value get_value(Dime())
You can achieve this without `@sealed` by changing Coin -> BaseCoin, and adding `Coin = Nickle | Dime | Quarter`.
Exactly. This is the pattern I use extensively in my code, and I find it to be very robust, easy to understand, and easy to maintain.
you can't put Coin in another Union and match on it because Unions do not implement isinstance
If all of the classes in the union derive from the same base class, you can use the base class for the `isinstance` check or case statement.
If all of the classes in the union derive from the same base class, you can use the base class for the `isinstance` check or case statement.
You're not wrong, but I would consider it quite annoying to have to import both Coin and BaseCoin in order to use this ADT. Conceptually, they are the same thing split into two. If this became standard, I think Python would be the first programming language in which the base class of an ADT and the type of the ADT were distinct names and objects.
Beyond the annoyance of multiple imports, there's also the potential issue of non-Union subclasses of BaseCoin. I've used this pattern a few times, and it works alright, but it feels inelegant to have 2 types that conceptually represent the same point in the type hierarchy, and can create confusing when trying to use one in an invalid way. It also creates an opportunity for those points to, unintentionally diverge. That being said, I'm not a huge fan of the `sealed` decorator suggestion. It seems like it would be easy to miss when refractoring or adding classes. I think I'd prefer something like: ``` class Tree(Sealed[Node, Leaf]): ... class Node(Tree): ... class Leaf(Tree):. ``` To keep the anticipated sub classes specific, and work more similarly to `abc.ABC` Or, if something more implicit through location is desired, to avoid repetition, then at least a shared indentation would be clearer: ``` with sealed('Tree'): class Tree: ... class Node(Tree): ... class Leaf(Tree):... ``` (Just to illustrate the idea, I'm sure there's a better representation) But I'm not convinced the relatively limited repetition is sufficiently compelling. On Tue, 19 Apr 2022, 01:17 David Hagen, <david@drhagen.com> wrote:
If all of the classes in the union derive from the same base class, you can use the base class for the `isinstance` check or case statement.
You're not wrong, but I would consider it quite annoying to have to import both Coin and BaseCoin in order to use this ADT. Conceptually, they are the same thing split into two. If this became standard, I think Python would be the first programming language in which the base class of an ADT and the type of the ADT were distinct names and objects. _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: diabolo.dan@gmail.com
I think I'd prefer something like: ``` class Tree(Sealed[Node, Leaf]): ... class Node(Tree): ... class Leaf(Tree):. ```
Some languages do make you explicitly list out of the subclasses (e.g. Java and Ceylon) rather than requiring that all the subclasses simply be defined in the same file (e.g. Kotlin and Scala). I sympathize with both designs. The same-file sealed directive avoids having to repeat the names of the subclasses, while the explicit list avoids having the weird requirement that all the subclasses be in the same file. In a statically typed language, I think I would lean toward the explicit list, but in Python, this is very problematic. The explicit list is inherently a circular reference, which is fine in a statically typed language, but in Python there is just no place to put the explicit list where the runtime can pass over it. In your example, the interpreter would fail at Sealed[Node, Leaf] because name Node is not defined. Now that I think about it, maybe you could do some weird stuff with lazy annotations. There is no natural place to put the annotation, but there is no limit to the number of unnatural places. ``` form __future__ import annotations class Tree: ... __sealed__: Node | Leaf class Node(Tree): ... class Leaf(Tree): ... ```
If we're in the bikeshedding phase, it might be worth mentioning this ( https://github.com/python/mypy/issues/2464) quite ancient Mypy issue. Near the end there are ideas to use an outer class as a namespace (like Enums do). I like it. It would also allow for primitive/enum values to be members of the ADT in addition to class instances. On Tue, Apr 19, 2022 at 5:21 PM Daniel Cooper <diabolo.dan@gmail.com> wrote:
Beyond the annoyance of multiple imports, there's also the potential issue of non-Union subclasses of BaseCoin.
I've used this pattern a few times, and it works alright, but it feels inelegant to have 2 types that conceptually represent the same point in the type hierarchy, and can create confusing when trying to use one in an invalid way. It also creates an opportunity for those points to, unintentionally diverge.
That being said, I'm not a huge fan of the `sealed` decorator suggestion. It seems like it would be easy to miss when refractoring or adding classes.
I think I'd prefer something like:
``` class Tree(Sealed[Node, Leaf]): ...
class Node(Tree): ... class Leaf(Tree):. ```
To keep the anticipated sub classes specific, and work more similarly to `abc.ABC`
Or, if something more implicit through location is desired, to avoid repetition, then at least a shared indentation would be clearer:
``` with sealed('Tree'): class Tree: ...
class Node(Tree): ... class Leaf(Tree):... ```
(Just to illustrate the idea, I'm sure there's a better representation)
But I'm not convinced the relatively limited repetition is sufficiently compelling.
On Tue, 19 Apr 2022, 01:17 David Hagen, <david@drhagen.com> wrote:
If all of the classes in the union derive from the same base class, you can use the base class for the `isinstance` check or case statement.
You're not wrong, but I would consider it quite annoying to have to import both Coin and BaseCoin in order to use this ADT. Conceptually, they are the same thing split into two. If this became standard, I think Python would be the first programming language in which the base class of an ADT and the type of the ADT were distinct names and objects. _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: diabolo.dan@gmail.com
_______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: tinchester@gmail.com
Unions do not implement isinstance
Unions do support isinstance since python 3.10 but apparently they do not support pattern matching. This feels like an oversight to me. If unions did support pattern matching, then do you think they would be a sufficient replacement for sealed classes? -Thomas
On Tue, Apr 19, 2022 at 9:43 AM <tmkehrenberg@gmail.com> wrote:
Unions do not implement isinstance
Unions do support isinstance since python 3.10 but apparently they do not support pattern matching. This feels like an oversight to me. If unions did support pattern matching, then do you think they would be a sufficient replacement for sealed classes?
What would be the syntax? The same as a class match? match x: case Nickle(): ... case Dime(): ... case Coin(): ... # Assume this is the union Would such a union support arguments? Could we write match x: case Nickle(value=pennies): ... case Coin(value=pennies): ... ? -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
I come from the FSharp community. Thanks for considering this, ADTs are particularly missing for me in today's Python. I am used to structure all my code with them, and so is the whole community at F#. One thing I like to mention regarding the naming: @choice class I how I would define it. Not only does it say, what it does (opposite to how its implemented) it also aligns nicely with @data class @sealed doesn't say much about its usage. I am also interested about the syntax for "the next stage of the object", like in: type Shape = | Rectangle of width : float * length : float | Circle of radius : float | Prism of width : float * float * height : float How would you do that?
Hi Matthias, There has been some back-and-forth during the years on how to improve sum types in Python in this community, but lately personally I've come to the realization the existing infrastructure (unions, enums, literals) is very good already. Are unions enough for your use case? ``` @dataclass class Rectangle: width: float length: float @dataclass class Circle: radius: float @dataclass class Prism: width: float height: float Shape = Rectangle | Circle | Prism ``` You can exhaustively match on this if you know the `assert_never` trick, and I find it works very well. On Mon, Mar 20, 2023 at 3:29 PM Matthias Gramberg via Typing-sig < typing-sig@python.org> wrote:
I come from the FSharp community.
Thanks for considering this, ADTs are particularly missing for me in today's Python. I am used to structure all my code with them, and so is the whole community at F#.
One thing I like to mention regarding the naming:
@choice class
I how I would define it. Not only does it say, what it does (opposite to how its implemented) it also aligns nicely with
@data class
@sealed doesn't say much about its usage.
I am also interested about the syntax for "the next stage of the object", like in:
type Shape = | Rectangle of width : float * length : float | Circle of radius : float | Prism of width : float * float * height : float
How would you do that? _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: tinchester@gmail.com
David and I are pleased to share a draft PEP. We welcome any additional feedback. It can be viewed rendered at: https://github.com/johnthagen/sealed-typing-pep PEP: <REQUIRED: pep number> Title: Adding a sealed qualifier to typing Author: John Hagen <johnthagen@gmail.com>, David Hagen <david@drhagen.com> Sponsor: Jelle Zijlstra PEP-Delegate: <PEP delegate's real name> Discussions-To: typing-sig@python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 17-Apr-2022 Python-Version: 3.12 Post-History: Resolution: <url> Abstract ======== This PEP proposes a ``@sealed`` decorator be added to the ``typing`` module to support creating versatile algebraic data types (ADTs) which type checkers can exhaustively pattern match against. Motivation ========== Quite often it is desirable to apply exhaustiveness to a set of classes without defining ad-hoc union types, which is itself fragile if a class is missing in the union definition. A design pattern where a group of record-like classes is combined into a union is popular in other languages that support pattern matching [1]_ and is known as a nominal sum type, a key instatiation of algebraic data types [2]_. We propose adding a special decorator class ``@sealed`` to the ``typing`` module [3]_, that will have no effect at runtime, but will indicate to static type checkers that all direct subclasses of this class should be defined in the same module as the base class. The idea is that, since all subclasses are known, the type checker can treat the sealed base class as a union of all its subclasses. Together with dataclasses this allows a clean and safe support of algebraic data types in Python. Consider this example, .. code-block:: python from dataclasses import dataclass from typing import sealed @sealed class Node: ... @sealed class Expression(Node): ... @sealed class Statement(Node): ... @dataclass class Name(Expression): name: str @dataclass class Operation(Expression): left: Expression op: str right: Expression @dataclass class Assignment(Statement): target: str value: Expression @dataclass class Print(Statement): value: Expression With such a definition, a type checker can safely treat ``Node`` as ``Union[Expression, Statement]``, and also safely treat ``Expression`` as ``Union[Name, Operation]`` and ``Statement`` as ``Union[Assignment, Print]``. With these declarations, a type checking error will occur in the below snippet, because ``Name`` is not handled (and the type checker can give a useful error message). .. code-block:: python def dump(node: Node) -> str: match node: case Assignment(target, value): return f"{target} = {dump(value)}" case Print(value): return f"print({dump(value)})" case Operation(left, op, right): return f"({dump(left)} {op} {dump(right)})" Note: This section was largely derived from PEP 622 [4]_. Rationale ========= Kotlin [5]_, Scala 2 [6]_, and Java 17 [7]_ all support a ``sealed`` keyword that is used to declare algebraic data types. By using the same terminology, the ``@sealed`` decorator will be familiar to developers familiar with those languages. Specification ============= The ``typing.sealed`` decorator can be applied to the declaration of any class. This decoration indicates to type checkers that all immediate subclasses of the decorated class are defined in the current file. The exhaustiveness checking features of type checkers should assume that there are no subclasses outside the current file, treating the decorated class as a ``Union`` of all its same-file subclasses. Type checkers should raise an error if a sealed class is inherited in a file different from where the sealed class is declared. A sealed class is automatically declared to be abstract. Whatever actions a type checker normally takes with abstract classes should be taken with sealed classes as well. What exactly these behaviors are (e.g. disallowing instantiation) is outside the scope of this PEP. Similar to the ``typing.final`` decorator [8]_, the only runtime behavior of this decorator is to set the ``__sealed__`` attribute of class to ``True`` so that the sealed property of the class can be introspected. There is no runtime enforcement of sealed class inheritance. Reference Implementation ======================== [Link to any existing implementation and details about its state, e.g. proof-of-concept.] Rejected Ideas ============== Generalize ``Enum`` ------------------- Rust [9]_, Scala 3 [10]_, and Swift [11]_ support algebraic data types using a generalized ``enum`` mechanism. .. code-block:: rust enum Message { Quit, Move { x: i32, y: i32 }, Write(String), ChangeColor(i32, i32, i32), } One could imagine a generalization of the Python ``Enum`` [12]_ to support variants of different shapes. But given that the Python ``Enum`` is more or less a normal class, with some magic internals, this would be a much more invasive change. .. code-block:: python from dataclasses import dataclass from enum import Enum class Message(Enum): @dataclass class Quit: ... @dataclass class Move: x: int y: int @dataclass class Write: message: str @dataclass class ChangeColor: r: int g: int b: int Explicitly list subclasses -------------------------- Java requires that subclasses be explicitly listed with the base class. .. code-block:: java public sealed interface Node permits Leaf, Branch {} public final class Leaf {} public final class Branch {} The advantage of this requirement is that subclasses can be defined anywhere, not just in the same file, eliminating the somewhat weird file dependence of this feature. Once disadvantage is that requires that all subclasses to be written twice: once when defined and once in the enumerated list on the base class. There is also an inherent circular reference when explicitly enumerating the subclasses. The subclass refers to the base class in order to inherit from it, and the base class refers to the subclasses in order to enumerate them. In statically typed languages, these kinds of circular references in the types can be managed, but in Python, it is much harder. For example, this ``Sealed`` base class that behaves like ``Generic``: .. code-block:: python from typing import Sealed class Node(Sealed[Leaf, Branch]): ... class Leaf(Node): ... class Branch(Node): ... This cannot work because ``Leaf`` must be defined before ``Node`` and ``Node`` must be defined before ``Leaf``. This is a not an annotation, so lazy annotations cannot save it. Perhaps, the subclasses in the enumerated list could be strings, but that severely hurts the ergonomics of this feature. If the enumerated list was in an annotation, it could be made to work, but there is no natural place for the annotation to live. Here is one possibility: .. code-block:: python class Node: __sealed__: Leaf | Branch class Leaf(Node): ... class Branch(Node): ... ``Union`` of independent variants --------------------------------- Some of the behavior of ``sealed`` can be emulated with ``Union`` today. .. code-block:: python class Leaf: ... class Branch: ... Node = Leaf | Branch The main problem with this is that the ADT loses all the features of inheritance, which is rather featureful in Python, to put it mildly. There can be no abstract methods, private methods to be reused by the subclasses, public methods to be exposed on all subclasses, ``__init_subclass__``, etc. Even if a specific method is implemented on each subclass, then rename, jump-to-definition, find-usage, and other IDE features are difficult to make work reliably. Adding a base class in addition to the union type alleviates some of these issues: .. code-block:: python class BaseNode: ... class Leaf(BaseNode): ... class Branch(BaseNode): ... Node = Leaf | Branch Despite being possible today, this is quite unergonomic. The base class and the union type are conceptually the same thing, but have to be defined as two separate objects. If this became standard, it seems Python would be first language to separate the definition of an ADT into two different objects. The base class is not merely passive, either. There are a number of operations that will only work when using the base class instead of the union type. For example, matching only works on the base class, not the union type: .. code-block:: python maybe_node: Node | None = ... # must be Node to enforce exhaustiveness match maybe_node: case Node(): # TypeError: called match pattern must be a type ... case None: ... match maybe_node: case BaseNode(): # no error ... case None: ... Having to remember whether to use the base class or the union type in each situation is particularly unfriendly to the user of a sealed class. Footnotes ========= .. [1] https://en.wikipedia.org/wiki/Pattern_matching .. [2] https://en.wikipedia.org/wiki/Algebraic_data_type .. [3] https://docs.python.org/3/library/typing.html .. [4] https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types .. [5] https://kotlinlang.org/docs/sealed-classes.html .. [6] https://docs.scala-lang.org/tour/pattern-matching.html .. [7] https://openjdk.java.net/jeps/409 .. [8] https://peps.python.org/pep-0591/ .. [9] https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html .. [10] https://docs.scala-lang.org/scala3/reference/enums/adts.html .. [11] https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html .. [12] https://docs.python.org/3/library/enum.html Copyright ========= This document is placed in the public domain.
Just like Eric, appreciate the work that has gone into this. I am excited by the idea of ADTs since they are a key part of making invalid state unrepresentable, which is amazing for software correctness. However, I would be disappointed if this version of ADTs was accepted since I think basing our ADTs on the Pythonized version of the Rust syntax would be more usable and elegant. The argument given against it is that this would be a more invasive change, which may not matter for us to get it right the first time? Rust-style enums containing both classes and values would be my ideal scenario. I'm very happy to share actual examples from work where I've wished I had robust ADTs if that would help. On Sat, May 7, 2022 at 7:24 PM John Hagen <johnthagen@gmail.com> wrote:
David and I are pleased to share a draft PEP. We welcome any additional feedback.
It can be viewed rendered at: https://github.com/johnthagen/sealed-typing-pep
PEP: <REQUIRED: pep number> Title: Adding a sealed qualifier to typing Author: John Hagen <johnthagen@gmail.com>, David Hagen <david@drhagen.com> Sponsor: Jelle Zijlstra PEP-Delegate: <PEP delegate's real name> Discussions-To: typing-sig@python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 17-Apr-2022 Python-Version: 3.12 Post-History: Resolution: <url>
Abstract ========
This PEP proposes a ``@sealed`` decorator be added to the ``typing`` module to support creating versatile algebraic data types (ADTs) which type checkers can exhaustively pattern match against.
Motivation ==========
Quite often it is desirable to apply exhaustiveness to a set of classes without defining ad-hoc union types, which is itself fragile if a class is missing in the union definition. A design pattern where a group of record-like classes is combined into a union is popular in other languages that support pattern matching [1]_ and is known as a nominal sum type, a key instatiation of algebraic data types [2]_.
We propose adding a special decorator class ``@sealed`` to the ``typing`` module [3]_, that will have no effect at runtime, but will indicate to static type checkers that all direct subclasses of this class should be defined in the same module as the base class.
The idea is that, since all subclasses are known, the type checker can treat the sealed base class as a union of all its subclasses. Together with dataclasses this allows a clean and safe support of algebraic data types in Python. Consider this example,
.. code-block:: python
from dataclasses import dataclass from typing import sealed
@sealed class Node: ...
@sealed class Expression(Node): ...
@sealed class Statement(Node): ...
@dataclass class Name(Expression): name: str
@dataclass class Operation(Expression): left: Expression op: str right: Expression
@dataclass class Assignment(Statement): target: str value: Expression
@dataclass class Print(Statement): value: Expression
With such a definition, a type checker can safely treat ``Node`` as ``Union[Expression, Statement]``, and also safely treat ``Expression`` as ``Union[Name, Operation]`` and ``Statement`` as ``Union[Assignment, Print]``. With these declarations, a type checking error will occur in the below snippet, because ``Name`` is not handled (and the type checker can give a useful error message).
.. code-block:: python
def dump(node: Node) -> str: match node: case Assignment(target, value): return f"{target} = {dump(value)}" case Print(value): return f"print({dump(value)})" case Operation(left, op, right): return f"({dump(left)} {op} {dump(right)})"
Note: This section was largely derived from PEP 622 [4]_.
Rationale =========
Kotlin [5]_, Scala 2 [6]_, and Java 17 [7]_ all support a ``sealed`` keyword that is used to declare algebraic data types. By using the same terminology, the ``@sealed`` decorator will be familiar to developers familiar with those languages.
Specification =============
The ``typing.sealed`` decorator can be applied to the declaration of any class. This decoration indicates to type checkers that all immediate subclasses of the decorated class are defined in the current file.
The exhaustiveness checking features of type checkers should assume that there are no subclasses outside the current file, treating the decorated class as a ``Union`` of all its same-file subclasses.
Type checkers should raise an error if a sealed class is inherited in a file different from where the sealed class is declared.
A sealed class is automatically declared to be abstract. Whatever actions a type checker normally takes with abstract classes should be taken with sealed classes as well. What exactly these behaviors are (e.g. disallowing instantiation) is outside the scope of this PEP.
Similar to the ``typing.final`` decorator [8]_, the only runtime behavior of this decorator is to set the ``__sealed__`` attribute of class to ``True`` so that the sealed property of the class can be introspected. There is no runtime enforcement of sealed class inheritance.
Reference Implementation ========================
[Link to any existing implementation and details about its state, e.g. proof-of-concept.]
Rejected Ideas ==============
Generalize ``Enum`` -------------------
Rust [9]_, Scala 3 [10]_, and Swift [11]_ support algebraic data types using a generalized ``enum`` mechanism.
.. code-block:: rust
enum Message { Quit, Move { x: i32, y: i32 }, Write(String), ChangeColor(i32, i32, i32), }
One could imagine a generalization of the Python ``Enum`` [12]_ to support variants of different shapes. But given that the Python ``Enum`` is more or less a normal class, with some magic internals, this would be a much more invasive change.
.. code-block:: python
from dataclasses import dataclass from enum import Enum
class Message(Enum): @dataclass class Quit: ...
@dataclass class Move: x: int y: int
@dataclass class Write: message: str
@dataclass class ChangeColor: r: int g: int b: int
Explicitly list subclasses --------------------------
Java requires that subclasses be explicitly listed with the base class.
.. code-block:: java
public sealed interface Node permits Leaf, Branch {}
public final class Leaf {} public final class Branch {}
The advantage of this requirement is that subclasses can be defined anywhere, not just in the same file, eliminating the somewhat weird file dependence of this feature. Once disadvantage is that requires that all subclasses to be written twice: once when defined and once in the enumerated list on the base class.
There is also an inherent circular reference when explicitly enumerating the subclasses. The subclass refers to the base class in order to inherit from it, and the base class refers to the subclasses in order to enumerate them. In statically typed languages, these kinds of circular references in the types can be managed, but in Python, it is much harder.
For example, this ``Sealed`` base class that behaves like ``Generic``:
.. code-block:: python
from typing import Sealed
class Node(Sealed[Leaf, Branch]): ...
class Leaf(Node): ... class Branch(Node): ...
This cannot work because ``Leaf`` must be defined before ``Node`` and ``Node`` must be defined before ``Leaf``. This is a not an annotation, so lazy annotations cannot save it. Perhaps, the subclasses in the enumerated list could be strings, but that severely hurts the ergonomics of this feature.
If the enumerated list was in an annotation, it could be made to work, but there is no natural place for the annotation to live. Here is one possibility:
.. code-block:: python
class Node: __sealed__: Leaf | Branch
class Leaf(Node): ... class Branch(Node): ...
``Union`` of independent variants ---------------------------------
Some of the behavior of ``sealed`` can be emulated with ``Union`` today.
.. code-block:: python
class Leaf: ... class Branch: ...
Node = Leaf | Branch
The main problem with this is that the ADT loses all the features of inheritance, which is rather featureful in Python, to put it mildly. There can be no abstract methods, private methods to be reused by the subclasses, public methods to be exposed on all subclasses, ``__init_subclass__``, etc. Even if a specific method is implemented on each subclass, then rename, jump-to-definition, find-usage, and other IDE features are difficult to make work reliably.
Adding a base class in addition to the union type alleviates some of these issues:
.. code-block:: python
class BaseNode: ...
class Leaf(BaseNode): ... class Branch(BaseNode): ...
Node = Leaf | Branch
Despite being possible today, this is quite unergonomic. The base class and the union type are conceptually the same thing, but have to be defined as two separate objects. If this became standard, it seems Python would be first language to separate the definition of an ADT into two different objects.
The base class is not merely passive, either. There are a number of operations that will only work when using the base class instead of the union type. For example, matching only works on the base class, not the union type:
.. code-block:: python
maybe_node: Node | None = ... # must be Node to enforce exhaustiveness
match maybe_node: case Node(): # TypeError: called match pattern must be a type ... case None: ...
match maybe_node: case BaseNode(): # no error ... case None: ...
Having to remember whether to use the base class or the union type in each situation is particularly unfriendly to the user of a sealed class.
Footnotes =========
.. [1] https://en.wikipedia.org/wiki/Pattern_matching
.. [2] https://en.wikipedia.org/wiki/Algebraic_data_type
.. [3] https://docs.python.org/3/library/typing.html
.. [4]
https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types
.. [5] https://kotlinlang.org/docs/sealed-classes.html
.. [6] https://docs.scala-lang.org/tour/pattern-matching.html
.. [7] https://openjdk.java.net/jeps/409
.. [8] https://peps.python.org/pep-0591/
.. [9] https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html
.. [10] https://docs.scala-lang.org/scala3/reference/enums/adts.html
.. [11] https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html
.. [12] https://docs.python.org/3/library/enum.html
Copyright =========
This document is placed in the public domain. _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: tinchester@gmail.com
On Sat, May 7, 2022 at 4:39 PM Tin Tvrtković <tinchester@gmail.com> wrote:
Just like Eric, appreciate the work that has gone into this. I am excited by the idea of ADTs since they are a key part of making invalid state unrepresentable, which is amazing for software correctness. However, I would be disappointed if this version of ADTs was accepted since I think basing our ADTs on the Pythonized version of the Rust syntax would be more usable and elegant. The argument given against it is that this would be a more invasive change, which may not matter for us to get it right the first time?
Rust-style enums containing both classes and values would be my ideal scenario. I'm very happy to share actual examples from work where I've wished I had robust ADTs if that would help.
Few people here know Rust. (At least I'll admit that *I* don't know it.) I don't believe actual examples from your work would be that helpful (without knowing Rust I wouldn't be able to understand them). But perhaps you could sketch out a counter-proposal with a Pythonic syntax and some tutorial examples? (More than foo+bar, less than real-world code.) I certainly am not married to the @sealed syntax, it's just the only thing I feel I have a decent understanding of (roughly, "final outside this module"). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Alternative to Union Types
As shown, union types can be used to represent alternatives of several different types, without requiring those types to be part of a custom-crafted class hierarchy, or requiring explicit wrapping.
Pre-planning the Class Hierarchy Other languages would require pre-planning of the class hierarchy, like
[Responding to John Hagen] Thanks for writing up the PEP. However, like Eric, I'm strongly negative on this proposal. I find it unergonomic and not worth it for the benefits it claims to add. ## Favor Composition over Inheritance On a high level, we want to favor composition over inheritance. With `sealed`, we would be restricted to classes being defined in the same module and all part of a fixed class hierarchy. With union types, we can freely mix and match library classes and user-defined classes. And we can use a class as part of any number of unions, in any module. A *very* common pattern I see with type aliases is to have a large union of types imported from various modules, including third-party libraries. (For open-source examples, see [1].) This flexible mixing would not be possible with sealed classes: ``` from foo import Foo from bar import Bar MyUnion = Foo | Bar | ... def foo(x: MyUnion) -> int: if isinstance(x, Foo): return x.foo_method() if isinstance(x, Bar): return x.bar_method() ``` ## Scala and Kotlin: Sealed Classes for the Lack of Union Types You mention Scala as a language that has sealed types. But Scala only got union types in Scala 3 [2]. They were restricted to sealed classes before that. Their own docs point out the weaknesses of sealed classes/traits compared to union types [3]: the following example illustrates:
trait UsernameOrPassword case class Username(name: String) extends UsernameOrPassword case class Password(hash: Hash) extends UsernameOrPassword def help(id: UsernameOrPassword) = ...
Pre-planning does not scale very well since, for example, requirements of
Union types would make a big difference in Kotlin. At the moment, a sealed class is the closest thing to a union type, but it has several
API users might not be foreseeable. Additionally, cluttering the type hierarchy with marker traits like UsernameOrPassword also makes the code more difficult to read. The other reason Scala needed explicit case classes was to enable pattern-matching, since case classes automatically derive the necessary `unapply` method [6]. Python's structural pattern-matching doesn't have that limitation. Kotlin also lacks union types, and the language designers seem to frequently mention the same downsides as above [4]: limitations, the biggest being that you can only have subtypes defined by you as part of a sealed class.
[PEP] A design pattern where a group of record-like classes is combined into a union is popular in other languages that support pattern matching
TypeScript's type system is closer to Python's than Scala or Java. And TypeScript heavily uses union types for exhaustive pattern-matching. ## Motivating Example I found your motivating code snippet pretty hard to understand when it was written using `sealed`. We have to make each of the classes inherit from the sealed superclass. This is cumbersome for users and ties them down to a class hierarchy. This goes against the spirit of static duck typing, where we shouldn't have to reach into a class definition and add a parent just to make it acceptable to the type system. With `sealed`, we are limited to classes defined by the user in the same module. In real-world code, I find a lot of examples of unions of types imported from different modules, but very few of the same-module classes that you have. How often do you think this will be used? The PEP's approach also requires us to define empty shell classes for `Node`, `Expression`, or `Statement`. Finally, to understand the intended meaning of the `Expression` class, I have to go over the rest of the file and hunt for its subclasses (which could be anywhere in the module). That kind of non-local reasoning is highly unintuitive. With a union, `Expression = Name | Operation`, I can see the components of `Expression` right where it is defined. Contrast the motivating example to the more idiomatic `Union` version, where we have independent classes: ``` from __future__ import annotations from typing import * from dataclasses import dataclass @dataclass class Name: name: str @dataclass class Operation: left: Expression op: str right: Expression @dataclass class Assignment: target: str value: Expression @dataclass class Print: value: Expression Expression = Name | Operation Statement = Assignment | Print Node = Expression | Statement def dump(node: Node) -> None: match node: case Assignment(target, value): reveal_type(target) reveal_type(value) case Print(value): reveal_type(value) case Operation(left, op, right): reveal_type(left) reveal_type(op) reveal_type(right) case _: assert False ``` The above works just fine in Pyright. I also tested something similar in Pyre [5]. ## Specification
Once disadvantage [of explicitly listing subclasses, as in Java] is that requires that all subclasses to be written twice: once when defined and once in the enumerated list on the base class.
Doesn't your proposal also require the base class to be written an extra time for each subclass? For example, in your motivating example, Node, Expression, and Statement are repeated for each subclass. ## Rejected Ideas
Having to remember whether to use the base class or the union type in each situation is particularly unfriendly to the user of a sealed class.
Agreed that this is a drawback of the union approach. But this seems minor given that the example was somewhat contrived: the user matched on `Node | None` and did nothing with the `Node` branch. The most common use case is where we are matching on the individual variants of `Node`: `Expression`, `Statement`, etc. As Thomas (tmkehrenberg) mentioned:
Unions do support isinstance since python 3.10 but apparently they do not support pattern matching. This feels like an oversight to me.
That should be a reasonable solution to this problem. Even without that, I don't think this is a significant drawback. ## Conclusion It looks like the main benefits proposed by this PEP are: 1. Not having to import base classes in the cases where we want to match on the whole union instead of matching the individual variants. This seems like a minor benefit and can be addressed by allowing Unions in match. 2. Automatic exhaustiveness-checking: That is, if the match statement for `Node` doesn't check for `Name`, the type checker should warn about it. This is currently already expressible in 3.11 using Jelle's `case _: assert_never()`. Yes, this is cumbersome to add, but it's not always illegal to leave out a case branch. In cases where it is illegal - say, if each case branch defines a variable, but there is no case branch for `Name` - then the type checker already complains about it. For example: ``` match node: case Assignment(target, value): x = 1 case Print(value): x = 2 case Operation(left, op, right): x = 3 print(x) # Pyright: "x" is possibly unbound (since there was no branch for Name) ``` A better solution for your use case might be to have type checkers complain about missing case branches in their "strict" mode. That should satisfy users who want `match` to check for exhaustiveness without `assert_never`, while avoiding unnecessary noise for most users. Other than these two minor benefits, I don't see any compelling advantages. There are many downsides, such as: 1. Not being able to create an ADT using existing types. 2. Tying the user to a class hierarchy. 3. Causing unidiomatic code, with empty base classes that need to be repeated for each subclass and non-local reasoning needed to understand the meaning of the sealed base class. If we want "versatile ADTs", unions are much more flexible and versatile than the `@sealed` alternative, while providing similar exhaustiveness-checking. So, I'm strongly negative on this PEP. [1]: https://grep.app/search?q=%3D%20Union%5B&case=true&filter[lang][0]=Python [2]: http://dotty.epfl.ch/docs/reference/new-types/union-types-spec.html [3]: https://docs.scala-lang.org/scala3/book/types-union.html#alternative-to-unio... [4]: https://discuss.kotlinlang.org/t/union-types/77/103 [5]: shorturl.at/acF26 [6]: https://docs.scala-lang.org/overviews/scala-book/case-classes.html#an-unappl... On Sat, May 7, 2022 at 5:16 PM Guido van Rossum <guido@python.org> wrote:
On Sat, May 7, 2022 at 4:39 PM Tin Tvrtković <tinchester@gmail.com> wrote:
Just like Eric, appreciate the work that has gone into this. I am excited by the idea of ADTs since they are a key part of making invalid state unrepresentable, which is amazing for software correctness. However, I would be disappointed if this version of ADTs was accepted since I think basing our ADTs on the Pythonized version of the Rust syntax would be more usable and elegant. The argument given against it is that this would be a more invasive change, which may not matter for us to get it right the first time?
Rust-style enums containing both classes and values would be my ideal scenario. I'm very happy to share actual examples from work where I've wished I had robust ADTs if that would help.
Few people here know Rust. (At least I'll admit that *I* don't know it.) I don't believe actual examples from your work would be that helpful (without knowing Rust I wouldn't be able to understand them). But perhaps you could sketch out a counter-proposal with a Pythonic syntax and some tutorial examples? (More than foo+bar, less than real-world code.) I certainly am not married to the @sealed syntax, it's just the only thing I feel I have a decent understanding of (roughly, "final outside this module").
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/> _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: gohanpra@gmail.com
-- S Pradeep Kumar
A *very* common pattern I see with type aliases is to have a large union of types imported from various modules, including third-party libraries
This proposal does not remove or deprecate `Union`. Anonymous sum types (`Union`) and nominal sum types (`sealed`) would live side-by-side just as anonymous product types (`tuple`) and nominal product types (`dataclass`) do today. If there is no relationship between the subtypes (like UsernameOrPassword), then by all means, use a Union, but a type hierarchy is still very appropriate when the subtypes are related (like Result).
This goes against the spirit of static duck typing, where we shouldn't have to reach into a class definition and add a parent just to make it acceptable to the type system. With `sealed`, we are limited to classes defined by the user in the same module. In real-world code, I find a lot of examples of unions of types imported from different modules, but very few of the same-module classes that you have. How often do you think this will be used?
Yep, this unashamedly goes against the spirit of duck typing. This is a nominal typing feature. I searched my current project and counted 27 instances of abstract base classes whose subclasses were all defined in the same file. Now, I am not sure how many of them would break if I added a subclass, (I am pretty sure that some of them would be fine because consumers should only use abstract methods), but that's kind of the point. A `sealed` decorator would make that contract explicit. Consumers of the class would know when exhaustive matching was allowed. I would know when I was breaking that contract and mypy would show me all the matches that needed to be fixed when I did.
Doesn't your proposal also require the base class to be written an extra time for each subclass? For example, in your motivating example, Node, Expression, and Statement are repeated for each subclass.
True. That feels like the minimum price to pay for inheritance, but I can see how other people could feel differently.
Sorry if I gave the wrong impression, I work in Python so my examples would be Python. In any case, the PEP gives this Rust example: ``` enum Message { Quit, Move { x: i32, y: i32 }, Write(String), ChangeColor(i32, i32, i32), } ``` and then it proceeds to translate it into Python a few lines below. This is useful since it's an ADT with both classes and values (Message.Quit is a value - you can't instantiate it). This is possible today in Python, like this: ``` class MessageValue(Enum): QUIT = "quit" @dataclass class Move: x: int y: int @dataclass class Write: msg: str @dataclass class ChangeColor: r: int g: int b: int Message = MessageValue | Move | Write | ChangeColor ``` but it's verbose and disjointed. (Here's how this would be used: ``` def process_message(m: Message) -> None: match m: case MessageValue.QUIT: print("quit") case Move(x, y): print(f"move: {x} {y}") case Write(msg): print(f"write: {msg}") case ChangeColor(): print("change color") case _ as x: assert_never(x) ```) I propose adding some magic, so we get this instead: ``` class Message(Enum): QUIT = "quit" @dataclass class Move: x: int y: int @dataclass class Write: msg: str @dataclass class ChangeColor: r: int g: int b: int ``` with the same semantics as `Message` in the first example. (The dataclasses are *inside* the Message enum.) Given a Python Enum, type checkers already semantically treat it as a Union of its values. We extend this union to contain instances of classes defined inside. If backwards comp is an issue, we add a different base class in the enum module (so it'd be `Message(ADT)` instead of `Message(Enum)`. On Sun, May 8, 2022 at 2:16 AM Guido van Rossum <guido@python.org> wrote:
Few people here know Rust. (At least I'll admit that *I* don't know it.) I don't believe actual examples from your work would be that helpful (without knowing Rust I wouldn't be able to understand them). But perhaps you could sketch out a counter-proposal with a Pythonic syntax and some tutorial examples? (More than foo+bar, less than real-world code.) I certainly am not married to the @sealed syntax, it's just the only thing I feel I have a decent understanding of (roughly, "final outside this module").
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Tin Tvrtković wrote:
I propose adding some magic, so we get this instead: ``` class Message(Enum): QUIT = "quit" @dataclass class Move: x: int y: int @dataclass class Write: msg: str @dataclass class ChangeColor: r: int g: int b: int ```
I am sympathetic to this design. Right now, `Enum` rewrites that `"quit"` value so that `Message.QUIT` is not actually equal to `"quit"`, but is actually an instance of `Message`. If the syntax of `Enum` was expanded to allow not just values, but classes, then `Move`, `Write`, and `ChangeColor` would have to be rebuilt to be subclasses of `Message`. I am not sure what all the implications of that would be. I remember there was hesitation toward having dataclasses add `__slots__` because it required rebuilding each class from scratch [1], but that was ultimately resolved in favor of rebuilding [2]. I have not heard any horror stories about dataclass slots, so maybe it's ok. [1] https://bugs.python.org/issue42269 [2] https://github.com/python/cpython/blob/859250cc55711f4d62b65922d3f7537826c38...
On 5/8/2022 5:01 PM, David Hagen wrote:
I am sympathetic to this design. Right now, `Enum` rewrites that `"quit"` value so that `Message.QUIT` is not actually equal to `"quit"`, but is actually an instance of `Message`. If the syntax of `Enum` was expanded to allow not just values, but classes, then `Move`, `Write`, and `ChangeColor` would have to be rebuilt to be subclasses of `Message`. I am not sure what all the implications of that would be. I remember there was hesitation toward having dataclasses add `__slots__` because it required rebuilding each class from scratch [1], but that was ultimately resolved in favor of rebuilding [2]. I have not heard any horror stories about dataclass slots, so maybe it's ok.
There are definitely some problems with returning a new class for slots=True. For example, zero-argument super is broken: https://github.com/python/cpython/issues/90562 I'm sure there are others. Eric
I'm much more interested in the typing community's feedback; runtime-wise I'm sure something will be doable. If this style is accepted I volunteer to write the runtime implementation. A tangent, since we're discussing implementation: I'm the person who initially contributed slot support to attrs, back in 2016 (phew), after which it went through several iterations by myself and other people before being ported to dataclasses. It's not possible to convert an existing dict class into a slot class so the class needs to be cloned, like you mentioned. But in attrs, cloned classes work with bare super on all Python versions we support (including PyPy) since we maintain the gnarly workaround code for it. In any case, I don't know yet if the `__mro__` of a class can be changed at runtime or if the class needs to be cloned. On Mon, May 9, 2022 at 2:18 AM Eric V. Smith <eric@trueblade.com> wrote:
On 5/8/2022 5:01 PM, David Hagen wrote:
I am sympathetic to this design. Right now, `Enum` rewrites that
`"quit"` value so that `Message.QUIT` is not actually equal to `"quit"`, but is actually an instance of `Message`. If the syntax of `Enum` was expanded to allow not just values, but classes, then `Move`, `Write`, and `ChangeColor` would have to be rebuilt to be subclasses of `Message`. I am not sure what all the implications of that would be. I remember there was hesitation toward having dataclasses add `__slots__` because it required rebuilding each class from scratch [1], but that was ultimately resolved in favor of rebuilding [2]. I have not heard any horror stories about dataclass slots, so maybe it's ok.
There are definitely some problems with returning a new class for slots=True.
For example, zero-argument super is broken: https://github.com/python/cpython/issues/90562
I'm sure there are others.
Eric
_______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: tinchester@gmail.com
Rust-style enums containing both classes and values would be my ideal scenario. I'm very happy to share actual examples from work where I've wished I had robust ADTs if that would help.
I am sympathetic to Rust-style enums (subclasses listed inside the body of the base class). The problem is getting this to work with inheritance in Python. Rust doesn't have inheritance, so it doesn't care. Scala and Swift have a compiler that handles it. The obvious design (the rejected one in the PEP) would use the `class` keyword within the base class body. The problem is that the base class does not exist at this point (at least not under its name), so something like this will not work: ``` from enum import Enum class Result(Enum): class Success(Result): ... class Failure(Result): .... ``` We could have `Enum` do some magic (on top of what it does right now) that it injects itself into the `__bases__` of any class object found among its attributes. Or `Enum` could rebuild each class object found with itself added to the list of bases. This may be worth exploring more, because there are a number of edge cases that I don't know how to deal with from an implementation perspective.
On 2022-05-08, at 13:04, David Hagen <david@drhagen.com> wrote:
Rust-style enums containing both classes and values would be my ideal scenario. I'm very happy to share actual examples from work where I've wished I had robust ADTs if that would help.
I am sympathetic to Rust-style enums (subclasses listed inside the body of the base class). The problem is getting this to work with inheritance in Python. Rust doesn't have inheritance, so it doesn't care. Scala and Swift have a compiler that handles it.
The obvious design (the rejected one in the PEP) would use the `class` keyword within the base class body. The problem is that the base class does not exist at this point (at least not under its name), so something like this will not work:
``` from enum import Enum
class Result(Enum): class Success(Result): ...
class Failure(Result): .... ```
We could have `Enum` do some magic (on top of what it does right now) that it injects itself into the `__bases__` of any class object found among its attributes. Or `Enum` could rebuild each class object found with itself added to the list of bases. This may be worth exploring more, because there are a number of edge cases that I don't know how to deal with from an implementation perspective. _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: jakub@stasiak.at
Hey all, this reminds me that I actually played with the idea of Rust-like enums in Python a while ago and this is what I came up with: https://gist.github.com/jstasiak/7f1687100ec7b2000b5206a537b1a0b0 I focused on what kind of API I wanted and the support for all types of members that Rust has: singletons, "tuple structs" and regular structs. An excerpt: @enum class WebEvent: PageLoad: None KeyPress: tuple[str] class Click: x: int y: int def inspect(event: WebEvent) -> None: match event: case WebEvent.PageLoad: print("Page loaded") case WebEvent.KeyPress(c): print(f"Pressed key {c}") case WebEvent.Click(x=x, y=y): print(f"Clicked on {x}/{y}") Best, Jakub
Thank you, John. I agree that adding first-class ADTs support is a major thing. First of all, right now this is a real problem. Some example code I took from my real project (https://github.com/dry-python/returns): ``` from typing import Union class S: def method(self) -> int: ... class F: def method(self) -> int: ... Result1 = Union[S, F]reveal_type(Result1)# note: Revealed type is "builtins.object"reveal_type(Result1.method)# error: "object" has no attribute "method"# note: Revealed type is "Any" Result2: Union[S, F] # error: Variable "ex.Result2" is not valid as a type# note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-a... some() -> Result2: ... ``` This code creates a base `Result` type with several user-facing methods and only two possible subclasses: Success and Failure. Right now, this is complicated. You can create two different objects: `Result` base class with actual methods and some kind of `ResultType = Union[Success, Failure]` to enable exhaustive pattern matching. This is not friendly at all. See related discussion: https://github.com/dry-python/returns/issues/1361 But, the current proposal is limited to a single submodule. I see this as a major weakness and quite unintuitive thing to teach. I think that `__sealed__: Success | Failure` rejected idea might be actually better. Since we have `from __future__ import annotations` and string-based annotations this works in any case. Cycle imports can be suppressed with `if TYPE_CHECKING:`. Pros: - It will solve the initial problem - We are not limited to any module structure - It is easy to teach: we have similar class-level magic things, like `__slots__` - It will support runtime introspection out-of-the-box - We don't need to implement anything on CPython's side (except some new docs about `__sealed__`) So, my example becomes: ``` from __future__ import annotations class Result: __sealed__: Success | Failure def bind(self, function) -> Result: ... class Success(Result): ... class Failure(Result): ... ``` Best, Nikita сб, 7 мая 2022 г. в 20:24, John Hagen <johnthagen@gmail.com>:
David and I are pleased to share a draft PEP. We welcome any additional feedback.
It can be viewed rendered at: https://github.com/johnthagen/sealed-typing-pep
PEP: <REQUIRED: pep number> Title: Adding a sealed qualifier to typing Author: John Hagen <johnthagen@gmail.com>, David Hagen <david@drhagen.com> Sponsor: Jelle Zijlstra PEP-Delegate: <PEP delegate's real name> Discussions-To: typing-sig@python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 17-Apr-2022 Python-Version: 3.12 Post-History: Resolution: <url>
Abstract ========
This PEP proposes a ``@sealed`` decorator be added to the ``typing`` module to support creating versatile algebraic data types (ADTs) which type checkers can exhaustively pattern match against.
Motivation ==========
Quite often it is desirable to apply exhaustiveness to a set of classes without defining ad-hoc union types, which is itself fragile if a class is missing in the union definition. A design pattern where a group of record-like classes is combined into a union is popular in other languages that support pattern matching [1]_ and is known as a nominal sum type, a key instatiation of algebraic data types [2]_.
We propose adding a special decorator class ``@sealed`` to the ``typing`` module [3]_, that will have no effect at runtime, but will indicate to static type checkers that all direct subclasses of this class should be defined in the same module as the base class.
The idea is that, since all subclasses are known, the type checker can treat the sealed base class as a union of all its subclasses. Together with dataclasses this allows a clean and safe support of algebraic data types in Python. Consider this example,
.. code-block:: python
from dataclasses import dataclass from typing import sealed
@sealed class Node: ...
@sealed class Expression(Node): ...
@sealed class Statement(Node): ...
@dataclass class Name(Expression): name: str
@dataclass class Operation(Expression): left: Expression op: str right: Expression
@dataclass class Assignment(Statement): target: str value: Expression
@dataclass class Print(Statement): value: Expression
With such a definition, a type checker can safely treat ``Node`` as ``Union[Expression, Statement]``, and also safely treat ``Expression`` as ``Union[Name, Operation]`` and ``Statement`` as ``Union[Assignment, Print]``. With these declarations, a type checking error will occur in the below snippet, because ``Name`` is not handled (and the type checker can give a useful error message).
.. code-block:: python
def dump(node: Node) -> str: match node: case Assignment(target, value): return f"{target} = {dump(value)}" case Print(value): return f"print({dump(value)})" case Operation(left, op, right): return f"({dump(left)} {op} {dump(right)})"
Note: This section was largely derived from PEP 622 [4]_.
Rationale =========
Kotlin [5]_, Scala 2 [6]_, and Java 17 [7]_ all support a ``sealed`` keyword that is used to declare algebraic data types. By using the same terminology, the ``@sealed`` decorator will be familiar to developers familiar with those languages.
Specification =============
The ``typing.sealed`` decorator can be applied to the declaration of any class. This decoration indicates to type checkers that all immediate subclasses of the decorated class are defined in the current file.
The exhaustiveness checking features of type checkers should assume that there are no subclasses outside the current file, treating the decorated class as a ``Union`` of all its same-file subclasses.
Type checkers should raise an error if a sealed class is inherited in a file different from where the sealed class is declared.
A sealed class is automatically declared to be abstract. Whatever actions a type checker normally takes with abstract classes should be taken with sealed classes as well. What exactly these behaviors are (e.g. disallowing instantiation) is outside the scope of this PEP.
Similar to the ``typing.final`` decorator [8]_, the only runtime behavior of this decorator is to set the ``__sealed__`` attribute of class to ``True`` so that the sealed property of the class can be introspected. There is no runtime enforcement of sealed class inheritance.
Reference Implementation ========================
[Link to any existing implementation and details about its state, e.g. proof-of-concept.]
Rejected Ideas ==============
Generalize ``Enum`` -------------------
Rust [9]_, Scala 3 [10]_, and Swift [11]_ support algebraic data types using a generalized ``enum`` mechanism.
.. code-block:: rust
enum Message { Quit, Move { x: i32, y: i32 }, Write(String), ChangeColor(i32, i32, i32), }
One could imagine a generalization of the Python ``Enum`` [12]_ to support variants of different shapes. But given that the Python ``Enum`` is more or less a normal class, with some magic internals, this would be a much more invasive change.
.. code-block:: python
from dataclasses import dataclass from enum import Enum
class Message(Enum): @dataclass class Quit: ...
@dataclass class Move: x: int y: int
@dataclass class Write: message: str
@dataclass class ChangeColor: r: int g: int b: int
Explicitly list subclasses --------------------------
Java requires that subclasses be explicitly listed with the base class.
.. code-block:: java
public sealed interface Node permits Leaf, Branch {}
public final class Leaf {} public final class Branch {}
The advantage of this requirement is that subclasses can be defined anywhere, not just in the same file, eliminating the somewhat weird file dependence of this feature. Once disadvantage is that requires that all subclasses to be written twice: once when defined and once in the enumerated list on the base class.
There is also an inherent circular reference when explicitly enumerating the subclasses. The subclass refers to the base class in order to inherit from it, and the base class refers to the subclasses in order to enumerate them. In statically typed languages, these kinds of circular references in the types can be managed, but in Python, it is much harder.
For example, this ``Sealed`` base class that behaves like ``Generic``:
.. code-block:: python
from typing import Sealed
class Node(Sealed[Leaf, Branch]): ...
class Leaf(Node): ... class Branch(Node): ...
This cannot work because ``Leaf`` must be defined before ``Node`` and ``Node`` must be defined before ``Leaf``. This is a not an annotation, so lazy annotations cannot save it. Perhaps, the subclasses in the enumerated list could be strings, but that severely hurts the ergonomics of this feature.
If the enumerated list was in an annotation, it could be made to work, but there is no natural place for the annotation to live. Here is one possibility:
.. code-block:: python
class Node: __sealed__: Leaf | Branch
class Leaf(Node): ... class Branch(Node): ...
``Union`` of independent variants ---------------------------------
Some of the behavior of ``sealed`` can be emulated with ``Union`` today.
.. code-block:: python
class Leaf: ... class Branch: ...
Node = Leaf | Branch
The main problem with this is that the ADT loses all the features of inheritance, which is rather featureful in Python, to put it mildly. There can be no abstract methods, private methods to be reused by the subclasses, public methods to be exposed on all subclasses, ``__init_subclass__``, etc. Even if a specific method is implemented on each subclass, then rename, jump-to-definition, find-usage, and other IDE features are difficult to make work reliably.
Adding a base class in addition to the union type alleviates some of these issues:
.. code-block:: python
class BaseNode: ...
class Leaf(BaseNode): ... class Branch(BaseNode): ...
Node = Leaf | Branch
Despite being possible today, this is quite unergonomic. The base class and the union type are conceptually the same thing, but have to be defined as two separate objects. If this became standard, it seems Python would be first language to separate the definition of an ADT into two different objects.
The base class is not merely passive, either. There are a number of operations that will only work when using the base class instead of the union type. For example, matching only works on the base class, not the union type:
.. code-block:: python
maybe_node: Node | None = ... # must be Node to enforce exhaustiveness
match maybe_node: case Node(): # TypeError: called match pattern must be a type ... case None: ...
match maybe_node: case BaseNode(): # no error ... case None: ...
Having to remember whether to use the base class or the union type in each situation is particularly unfriendly to the user of a sealed class.
Footnotes =========
.. [1] https://en.wikipedia.org/wiki/Pattern_matching
.. [2] https://en.wikipedia.org/wiki/Algebraic_data_type
.. [3] https://docs.python.org/3/library/typing.html
.. [4]
https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types
.. [5] https://kotlinlang.org/docs/sealed-classes.html
.. [6] https://docs.scala-lang.org/tour/pattern-matching.html
.. [7] https://openjdk.java.net/jeps/409
.. [8] https://peps.python.org/pep-0591/
.. [9] https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html
.. [10] https://docs.scala-lang.org/scala3/reference/enums/adts.html
.. [11] https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html
.. [12] https://docs.python.org/3/library/enum.html
Copyright =========
This document is placed in the public domain. _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: n.a.sobolev@gmail.com
I've often wished for an ADT-style construct while modeling state, but I don't think the `@sealed`, inheritance-based approach from the PEP is actually good. You'd want the final ADT to be a combination of a union and an enum, so it can contain constants too, right? On Sat, Apr 16, 2022 at 9:51 PM John Hagen <johnthagen@gmail.com> wrote:
The original (superseded) PEP 622 proposed adding a @sealed decorator so that users could define algebraic data types (ADTs) that static type checkers could verify are `match`ed exhaustively:
- https://peps.python.org/pep-0622/#sealed-classes-as-algebraic-data-types
There is a somewhat fairly-upvoted question about ADTs in Python, indicating interest in this area:
- https://stackoverflow.com/questions/16258553
@sealed would be very valuable towards making ADT usage within Python more type-safe. Is there any historical discussion about why @sealed was dropped from the final pattern matching PEP? What would be the next steps needed to push this forward? Would writing a new PEP be the next step? _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: tinchester@gmail.com
I appreciate the work that went into the proposal, but I'm strongly negative on adding `@sealed` to the Python type system for the reasons I mentioned previously in this thread. It creates new problems by introducing the first and only case where type behavior is dependent on the module in which the type is declared. The use of unions with a separate base class is discussed in the "Rejected Ideas" section, but I find the argument extremely weak.
Despite being possible today, this is quite unergonomic.
The notion of "unergonomic" is subjective, but this doesn't look at all unergonomic to me. Once you know about the pattern, it's easy to use and understand. If you haven't seen it before, it might look odd initially, but the same is true for `@sealed`. And `@sealed` requires understanding new concepts within the type system, whereas Python users (and existing Python tools) already understand the concept of base classes and unions.
If this became standard, it seems Python would be first language to separate the definition of an ADT into two different objects.
That's not true. TypeScript (which is way more prevalent than Kotlin and Scala) takes this approach. IMO, this would be a net negative contribution to the type system. It would be much cheaper to document and evangelize the technique that uses a base class and unions. -Eric
participants (15)
-
Daniel Cooper
-
David Hagen
-
david@drhagen.com
-
Eric Traut
-
Eric V. Smith
-
Guido van Rossum
-
Jakub Stasiak
-
Jelle Zijlstra
-
John Hagen
-
Ran Benita
-
S Pradeep Kumar
-
shalokshalom@protonmail.ch
-
Tin Tvrtković
-
tmkehrenberg@gmail.com
-
Никита Соболев