For a PEP to succeed it needs to show two things.
1. Exactly what problem is being solved, or need is to be fulfilled, and that it is a sufficiently large problem, or need, to merit the proposed change.
2. That the proposed change is the best known solution for the problem being addressed.
IMO, PEP 622 fails on both counts.
This email addresses point 1.
Given the positive response to the PEP, it may well be that it does address a need. However, the PEP itself fails to show that.
This PEP proposes adding pattern matching statements to Python in order to create more expressive ways of handling structured heterogeneous data. The authors take a holistic approach, providing both static and runtime specifications.
What does "static and dynamic specifications" mean? Surely, there are just specifications. Python does not have a static checking phase, so static analysis tools need to understand the dynamic behaviour of the program, not have their own alternate semantics. There is no "static specification" of `isinstance()`, yet static analysis tools understand it.
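For example, static analysis tools already narrow types from a plain dynamic `isinstance()` check, with no separate "static specification" (a minimal sketch; the function and messages are illustrative only):

```python
from typing import Union

def describe(x: Union[int, str]) -> str:
    # A type checker infers the dynamic behaviour of isinstance():
    # inside this branch, x is narrowed to int.
    if isinstance(x, int):
        return f"int with bit length {x.bit_length()}"
    # ...and here x is narrowed to str.
    return f"str of length {len(x)}"

print(describe(7))      # uses int-only API
print(describe("abc"))  # uses str-only API
```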
PEP 275 and PEP 3103 previously proposed similar constructs, and were rejected. Instead of targeting the optimization of if ... elif ... else statements (as those PEPs did), this design focuses on generalizing sequence, mapping, and object destructuring. It uses syntactic features made possible by PEP 617, which introduced a more powerful method of parsing Python source code.
Why couple the choice part (a sort of enhanced elif) with destructuring (a sort of enhanced unpacking)? We could have a "switch" statement that chooses according to value, and we could have "destructuring" that pulls values apart. Why do they need to be coupled?
Rationale and Goals
-------------------
Let us start from some anecdotal evidence: isinstance() is one of the most called functions in large scale Python code-bases (by static call count). In particular, when analyzing some multi-million line production code base, it was discovered that isinstance() is the second most called builtin function (after len()). Even taking into account builtin classes, it is still in the top ten. Most of such calls are followed by specific attribute access.
Why use anecdotal evidence? I don't doubt the numbers, but it would be better to use the standard library, or the top N most popular packages from GitHub.
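Such a count is cheap to reproduce over any corpus; a rough sketch using the ast module (the `count_calls` helper and sample source are illustrative only — the same visitor could be walked over stdlib files):

```python
import ast
from collections import Counter

def count_calls(source: str) -> Counter:
    """Count static call sites per callee name in a piece of source."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        # Only count simple-name calls like len(x), not obj.method(x).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            counts[node.func.id] += 1
    return counts

sample = """
def f(x):
    if isinstance(x, int):
        return len(str(x))
    return len(x)
"""
print(count_calls(sample))  # static call counts per builtin name
```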
There are two possible conclusions that can be drawn from this information:
1. Handling of heterogeneous data (i.e. situations where a variable can take values of multiple types) is common in real world code.
2. Python doesn't have expressive ways of destructuring object data (i.e. separating the content of an object into multiple variables).
I don't see how the second conclusion can be drawn. How does the prevalence of `isinstance()` suggest that Python doesn't have expressive ways of destructuring object data?
That `len()` is also common does suggest that some more expressive unpacking syntax might be useful. However, since `len()` only applies to sequences, it suggests to me that unpacking of non-sequences isn't generally useful.
This is in contrast with the opposite sides of both aspects:
This sentence makes no sense. What is "this"? Both aspects of what?
1. Its success in the numeric world indicates that Python is good when working with homogeneous data. It also has builtin support for homogeneous data structures such as e.g. lists and arrays, and semantic constructs such as iterators and generators.
2. Python is expressive and flexible at constructing objects. It has syntactic support for collection literals and comprehensions. Custom objects can be created using positional and keyword calls that are customized by special __init__() method.
This PEP aims at improving the support for destructuring heterogeneous data by adding a dedicated syntactic support for it in the form of pattern matching. On a very high level it is similar to regular expressions, but instead of matching strings, it will be possible to match arbitrary Python objects.
An explanation is needed of why "destructuring" needs to be so tightly coupled with matching by class or value.
We believe this will improve both readability and reliability of relevant code. To illustrate the readability improvement, let us consider an actual example from the Python standard library:
def is_tuple(node):
    if isinstance(node, Node) and node.children == [LParen(), RParen()]:
        return True
    return (isinstance(node, Node)
            and len(node.children) == 3
            and isinstance(node.children[0], Leaf)
            and isinstance(node.children[1], Node)
            and isinstance(node.children[2], Leaf)
            and node.children[0].value == "("
            and node.children[2].value == ")")
Just one example? The PEP needs to show that this sort of pattern is widespread.
With the syntax proposed in this PEP it can be rewritten as below. Note that the proposed code will work without any modifications to the definition of Node and other classes here:
Without modifying Node or Leaf, the matching code will need to access attributes. You should at least mention side effects and exceptions. E.g. matching on ORM objects might be problematic.
def is_tuple(node: Node) -> bool:
    match node:
        case Node(children=[LParen(), RParen()]):
            return True
        case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
            return True
        case _:
            return False
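The side-effect concern is easy to demonstrate with a toy class (entirely hypothetical; a stand-in for e.g. a lazily-loading ORM object):

```python
class Record:
    """Toy stand-in for an object whose attributes do work when read."""
    loads = 0  # counts how often the expensive attribute is computed

    @property
    def children(self):
        # Simulates a database hit or other side effect on attribute access.
        Record.loads += 1
        return ["(", ")"]

r = Record()
# Any attribute-based matching, however it is spelled, must read .children:
if isinstance(r, Record) and r.children == ["(", ")"]:
    pass
print(Record.loads)  # the property ran; a match statement would trigger it too
```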
Python's support for OOP provides an alternative to ADTs. For example, by adding a simple "matches" method to Node and Leaf, `is_tuple` can be rewritten as something like:
def is_tuple(node):
    if not isinstance(node, Node):
        return False
    return node.matches("(", ")") or node.matches("(", ..., ")")
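Such a `matches` method is a few lines to write. A sketch, assuming the class layout implied by the PEP's example (a Node holds a list of children, a Leaf holds a value; these minimal definitions are for illustration only):

```python
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, children):
        self.children = children

    def matches(self, *pattern):
        """True if children line up with pattern; ... matches any child."""
        if len(self.children) != len(pattern):
            return False
        return all(
            p is ... or (isinstance(c, Leaf) and c.value == p)
            for c, p in zip(self.children, pattern)
        )

node = Node([Leaf("("), Node([]), Leaf(")")])
print(node.matches("(", ..., ")"))  # True
print(node.matches("(", ")"))       # False: wrong number of children
```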
See the syntax sections below for a more detailed specification.
Similarly to how constructing objects can be customized by a user-defined __init__() method, we propose that destructuring objects can be customized by a new special __match__() method. As part of this PEP we specify the general __match__() API, its implementation for object.__match__(), and for some standard library classes (including PEP 557 dataclasses). See runtime section below.
You should mention that we already have the ability to "destructure", aka unpack, objects using __iter__.
t = 1, 2  # Creation
a, b = t  # "Destructuring"
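And a class can already opt in to this form of destructuring by defining __iter__; no new protocol is needed (a minimal sketch, with an illustrative Point class):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __iter__(self):
        # Yielding the fields makes iterable unpacking work on the object.
        yield self.x
        yield self.y

p = Point(3, 4)
x, y = p     # "destructuring" via the existing iterator protocol
print(x, y)  # 3 4
```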
Finally, we aim to provide a comprehensive support for static type checkers and similar tools. For this purpose we propose to introduce a @typing.sealed class decorator that will be a no-op at runtime, but will indicate to static tools that all subclasses of this class must be defined in the same module. This will allow effective static exhaustiveness checks, and together with dataclasses, will provide a nice support for algebraic data types. See the static checkers section for more details.
Shouldn't this be in a separate PEP? It seems only loosely related, and would have some value regardless of whether the rest of the PEP is accepted.
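The runtime half really is trivial: `typing.sealed` does not exist today, but a "no-op at runtime" decorator would amount to no more than this sketch:

```python
def sealed(cls: type) -> type:
    """Proposed @typing.sealed: returns the class unchanged at runtime;
    only static tools would use it to require that every subclass of
    the class be defined in the same module."""
    return cls

@sealed
class Expr:
    pass

class Num(Expr):
    pass

class Add(Expr):
    pass

print(issubclass(Num, Expr) and issubclass(Add, Expr))  # True
```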
In general, we believe that pattern matching has been proved to be a useful and expressive tool in various modern languages. In particular, many aspects of this PEP were inspired by how pattern matching works in Rust and Scala.
Both those languages are statically typed, which allows the compiler to perform much of the pattern matching at compile time.
You should give examples from dynamically typed languages instead, e.g. Clojure.