PEP 622 version 2 (Structural Pattern Matching)
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching. As authors we welcome Daniel F Moisset in our midst. Daniel wrote a lot of the new text in this version, which introduces the subject matter much more gently than the first version did. He also convinced us to drop the `__match__` protocol for now: the proposal stands quite well without that kind of extensibility, and postponing it will allow us to design it at a later time when we have more experience with how `match` is being used.

That said, the new version does not differ dramatically in what we propose. Apart from dropping `__match__` we’re dropping the leading dot to mark named constants, without a replacement, and everything else looks like we’re digging in our heels. Why is that? Given the firestorm of feedback we received and the numerous proposals (still coming) for alternative syntax, it seems a bad tactic not to give up something more substantial in order to get this proposal passed. Let me explain.

Language design is not like politics. It’s not like mathematics either, but I don’t think this situation is at all similar to negotiating a higher minimum wage in exchange for a lower pension, where you can definitely argue about exactly how much lower/higher you’re willing to go. So I don’t think it’s right to propose making the feature a little bit uglier just to get it accepted.

Frankly, 90% of the issue is about what among the authors we’ve dubbed the “load/store” problem (although Tobias never tires of explaining that the “load” part is really “load-and-compare”). There’s a considerable section devoted to this topic in the PEP, but I’d like to give it another try here.

In case you’ve been avoiding python-dev lately, the problem is this.
Pattern matching lets you capture values from the subject, similar to sequence unpacking, so that you can write for example

```
x = range(4)
match x:
    case (a, b, *rest):
        print(f"first={a}, second={b}, rest={rest}")  # 0, 1, [2, 3]
```

Here the `case` line captures the contents of the subject `x` in three variables named `a`, `b` and `rest`. This is easy to understand by pretending that a pattern (i.e., what follows `case`) is like the LHS of an assignment.

However, in order to make pattern matching more useful and versatile, the pattern matching syntax also allows using literals instead of capture variables. This is really handy when you want to distinguish different cases based on some value, for example

```
match t:
    case ("rect", real, imag):
        return complex(real, imag)
    case ("polar", r, phi):
        return complex(r * cos(phi), r * sin(phi))
```

You might not even notice anything funny here if I didn’t point out that `"rect"` and `"polar"` are literals -- it’s really quite natural for patterns to support this once you think about it.

The problem that everybody’s been concerned about is that Python programmers, like C programmers before them, aren’t too keen to have literals like this all over their code, and would rather give names to the literals, for example

```
USE_POLAR = "polar"
USE_RECT = "rect"
```

Now we would like to be able to replace those literals with the corresponding names throughout our code and have everything work like before:

```
match t:
    case (USE_RECT, real, imag):
        return complex(real, imag)
    case (USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))
```

Alas, the compiler doesn’t know that we want `USE_RECT` to be a constant value to be matched while we intend `real` and `imag` to be variables to be given the corresponding values captured from the subject. So various clever ways have been proposed to distinguish the two cases.
This discussion is not new to the authors: before we ever published the first version of the PEP we vigorously debated this (it is Issue 1 in our tracker!), and other languages before us have also had to come to grips with it. Even many statically compiled languages! The reason is that, for usability, it’s usually deemed important that their equivalent of `case` auto-declare the captured variables, and variable declarations may hide (override) like-named variables in outer scopes.

Scala, for example, uses several different rules: first, capture variable names must start with a lowercase letter (so it would handle the above example as intended); next, capture variables cannot be dotted names (like `mod.var`); finally, you can enclose any variable in backticks to force the compiler to see it as a load instead of a store. Elixir uses another form of markup for loads: `x` is a capture variable, but `^x` loads and compares the value of `x`.

There are a number of dead ends when looking for a solution that works for Python. Checking at runtime whether a name is defined or not is one of these: there are numerous reasons why this could be confusing, not the least of which being that the `match` may be executed in a loop and the variable may already be bound by a previous iteration. (True, this has to do with the scope we’ve adopted for capture variables. But believe me, giving each case clause its own scope is quite horrible by itself, and there are other action-at-a-distance effects that are equally bad.)

It’s been proposed to explicitly state the names of the variables bound in a header of the `match` statement; but this doesn’t scale when the number of cases becomes larger, and requires users to do bookkeeping the compiler should be able to do. We’re really looking for a solution that tells you, when you’re looking at an individual `case`, which variables are captured and which are used for load-and-compare.

Marking up the capture variables with some sigil (e.g. `$x` or `x?`) or other markup (e.g. backticks or `<x>`) makes this common case ugly and inconsistent: it’s unpleasant to see for example

```
case %x, %y:
    print(x, y)
```

No other language we’ve surveyed uses special markup for capture variables; some use special markup for load-and-compare, so we’ve explored this. In fact, in version 1 of the PEP our long-debated solution was to use a leading dot. This was however booed off the field, so for version 2 we reconsidered. In the end nothing struck our fancy (if `.x` is unacceptable, it’s unclear why `^x` would be any better), and we chose a simpler rule: named constants are only recognized when referenced via some namespace, such as `mod.var` or `Color.RED`.

We believe it’s acceptable that things looking like `mod.var` are never considered capture variables -- the common use cases for `match` are such that one would almost never want to capture into a different namespace. (Just like you very rarely see `for self.i in …` and never `except E as scope.var` -- the latter is illegal syntax and sets a precedent.) One author would dearly have seen Scala’s uppercase rule adopted, but in the end was convinced by the others that this was a bad idea, both because there’s no precedent in Python’s syntax, and because many human languages simply don’t make the distinction between lowercase and uppercase in their writing systems.

So what should you do if you have a local variable (say, a function argument) that you want to use as a value in a pattern? One solution is to capture the value in another variable and use a guard to compare that variable to the argument:

```
def foo(x, spam):
    match x:
        case Point(p, q, context=c) if c == spam:
            ...  # Match
```

If this really is a deal-breaker after all other issues have been settled, we could go back to considering some special markup for load-and-compare of simple names (even though we expect this case to be very rare).
But there’s no pressing need to decide to do this now -- we can always add new markup for this purpose in a future version, as long as we continue to support dotted names without markup, since that *is* a commonly needed case.

There’s one other issue where in the end we could be convinced to compromise: whether to add an `else` clause in addition to `case _`. In fact, we probably would already have added it, except for one detail: it’s unclear whether the `else` should be aligned with `case` or `match`. If we are to add this we would have to ask the Steering Council to decide for us, as the authors deadlocked on this question.

Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.

If you've made it this far, here are the links to check out, with an open mind. As a reminder, the introductory sections (Abstract, Overview, and Rationale and Goals) have been entirely rewritten and also serve as introduction and tutorial.

- PEP 622: https://www.python.org/dev/peps/pep-0622/
- Playground: https://mybinder.org/v2/gh/gvanrossum/patma/master?urlpath=lab/tree/playgrou...

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Thanks for explaining the open issues so well, Guido. My 2¢ on the `else` matter: On Wed, Jul 8, 2020 at 12:05 PM Guido van Rossum <guido@python.org> wrote:
[...] it’s unclear whether the `else` should be aligned with `case` or `match`.
I strongly favor the option of aligning the `else` clause with `match`, because `else` is a special clause and therefore it should look special.

"Designers need to ensure that controls and displays for different purposes are significantly different from one another." —Donald Norman, The Design of Everyday Things

As I first read a `match` statement, and I see an `else` clause, I know for sure that *something* will happen. If no `else` clause is present, I know it's possible nothing will happen. It's the same thing with `else` in `if`, `while`, `for`, `try` statements, where the `else` is aligned with the opening keyword.

Cheers, Luciano

-- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Technical Principal at ThoughtWorks | Twitter: @ramalhoorg
On 07/08/2020 08:30 AM, Luciano Ramalho wrote:
As I first read a `match` statement, and I see an `else` clause, I know for sure that *something* will happen. If no `else` clause is present, I know it's possible nothing will happen. It's the same thing with `else` in `if`, `while`, `for`, `try` statements, where the `else` is aligned with the opening keyword.
If `else` is added, I agree with aligning with `match`, for the same reasons -- especially if we can nest match statements. -- ~Ethan~
On 9/07/20 3:30 am, Luciano Ramalho wrote:
I strongly favor the option of aligning the `else` clause with `match`, because `else` is a special clause therefore it should look special.
But on the other hand, it's semantically equivalent to 'case _:', so it's not all that special. -- Greg
[... snip explanation of key sticking points ...] Thank you for an excellent write-up combining background context with possible solutions. Now I need to actually read the PEP ;) TJG
On 08/07/2020 16:02, Guido van Rossum wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching.
Thank you very much to everyone who has been working on this, it is much appreciated.

I have one suggestion for the text: could the section on Capture Patterns emphasise that only simple (i.e. not dotted) names are capture patterns? The simplified grammar is (fairly) clear and the later section on Constant Value Patterns should make it obvious, but somehow when reading version 1 I still managed to miss it. I was quite surprised when it was pointed out that

```
case (point.x, point.y):
```

wasn't going to do what I expected!

(PS: I'm still pushing for an "else" clause, and I can see arguments for it going at either indentation level. Since putting the clause at the wrong level would be a syntax error, I don't see it being a particularly big issue where it goes.)

-- Rhodri James *-* Kynesim Ltd
On Wed, 8 Jul 2020 18:38:12 +0100 Rhodri James <rhodri@kynesim.co.uk> wrote:
On 08/07/2020 16:02, Guido van Rossum wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching.
Thank you very much to everyone who has been working on this, it is much appreciated. I have one suggestion for the text: could the section on Capture Patterns emphasise that only simple (i.e not dotted) names are capture patterns? The simplified grammar is (fairly) clear and the later section on Constant Value Patterns should make it obvious, but somehow when reading version 1 I still managed to miss it. I was quite surprised when it was pointed out that
case (point.x, point.y):
wasn't going to do what I expected!
Why did you expect? It's not clear to me what it should do at all :-) Regards Antoine.
On 08/07/2020 19:08, Antoine Pitrou wrote:
On Wed, 8 Jul 2020 18:38:12 +0100 Rhodri James <rhodri@kynesim.co.uk> wrote:
On 08/07/2020 16:02, Guido van Rossum wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching.
Thank you very much to everyone who has been working on this, it is much appreciated. I have one suggestion for the text: could the section on Capture Patterns emphasise that only simple (i.e not dotted) names are capture patterns? The simplified grammar is (fairly) clear and the later section on Constant Value Patterns should make it obvious, but somehow when reading version 1 I still managed to miss it. I was quite surprised when it was pointed out that
case (point.x, point.y):
wasn't going to do what I expected!
Why did you expect? It's not clear to me what it should do at all :-)
I was expecting it to unpack a 2-tuple(/sequence?) into the x and y attributes of this point object I happened to have in my back pocket (assuming it matched, of course). It actually matches a 2-tuple against the values of those attributes (i.e. they are constant value patterns). It's obvious enough once you have time to read the PEP properly -- I blame only having so many minutes of reading time in my compile/download/test cycle!

To use code(ish) examples rather than confusable words, suppose we have:

```
class Point:
    def __init__(self):
        self.x = 0
        self.y = 0

point = Point()
INCOMING = (1, 2)

match INCOMING:
    case (point.x, point.y):
        print("Point @", point.x, point.y)
    case _:
        print("Default")
```

I expected a printout of:

```
Point @ 1 2
```

but I would actually get

```
Default
```

If on the other hand INCOMING was (0, 0), I would get

```
Point @ 0 0
```

because the first case is in fact the equivalent of

```
case (0, 0):
```

Obviously this is a pointless example (pun intended) because you would use a class pattern if you really wanted to do something like what I first thought of.

-- Rhodri James *-* Kynesim Ltd
On Wed, 8 Jul 2020 20:08:34 +0200 Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 8 Jul 2020 18:38:12 +0100 Rhodri James <rhodri@kynesim.co.uk> wrote:
On 08/07/2020 16:02, Guido van Rossum wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching.
Thank you very much to everyone who has been working on this, it is much appreciated. I have one suggestion for the text: could the section on Capture Patterns emphasise that only simple (i.e not dotted) names are capture patterns? The simplified grammar is (fairly) clear and the later section on Constant Value Patterns should make it obvious, but somehow when reading version 1 I still managed to miss it. I was quite surprised when it was pointed out that
case (point.x, point.y):
wasn't going to do what I expected!
Why did you expect? It's not clear to me what it should do at all :-)
Sorry: /what/ did you expect? Regards Antoine.
On 07/08/2020 08:02 AM, Guido van Rossum wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching.
All in all I like it a lot!
As authors we welcome Daniel F Moisset in our midst
Welcom, Daniel, and thank you!
That said, the new version does not differ dramatically in what we propose. Apart from dropping `__match__` we’re dropping the leading dot to mark named constants,
Excellent!
without a replacement,
So namespaced variables only... is there a recommendation on handling globals() and locals() type variables?

The only thing I didn't see addressed was the concern raised by Pablo:
On 06/25/2020 04:07 PM, Brandt Bucher wrote:
Pablo Galindo Salgado wrote:
...users can do a positional match against the proxy with a name pattern:
```
match input:
    case datetime.date(dt):
        print(f"The date {dt.isoformat()}")
```
...if 'datetime.date' were updated to implement a non-default __match_args__, allowing individual fields to be pulled out of it like this, then the first block would be valid, correct code before the change, but would raise an ImpossibleMatch after the change because 'dt' is not a field in __match_args__. Is this argument misinterpreting something about the PEP, or missing some important detail?
Well yeah, it's actually a fair bit worse than you describe. Since dt is matched positionally, it wouldn't raise during matching - it would just succeed as before, but instead binding the year attribute (not the whole object) to the name "dt". So it wouldn't fail until later, when your method call raises a TypeError.
Why is this no longer an issue? My apologies if I missed it in the PEP. -- ~Ethan~ P.S. Thanks for all your hard work! I am very much looking forward to using this.
Ethan Furman wrote:
Why is this no longer an issue? My apologies if I missed it in the PEP.
This problem was an artifact of the default `object.__match__` implementation, which allowed one positional argument by default when `__match_args__` was missing or `None`. Since we've removed `__match__` from the proposal (and therefore the default `__match__` implementation from `object`), this issue no longer exists. (Note that most common built-in types like `int` and `tuple` will still work this way, but this behavior is not inherited by *all* objects anymore).
On 07/08/2020 10:44 AM, Ethan Furman wrote:
So namespaced variables only... is there a recommendation on handling global() and local() type variables?
Okay, some off-list discussion clarified that for me: - easiest way is to use a guard
```
def foo(x, spam):
    match x:
        case Point(p, q, context=c) if c == spam:
            ...  # Match
```
If there's a bunch, then SimpleNamespace can be used:
```
from types import SimpleNamespace

def foo(x, spam):
    L = SimpleNamespace(**locals())
    match x:
        case Point(p, q, context=L.spam):
            ...  # Match
```
So there we have it -- two ways to do it! ;-) -- ~Ethan~
On 09/07/2020 01:27, Ethan Furman wrote:
On 07/08/2020 10:44 AM, Ethan Furman wrote:
So namespaced variables only... is there a recommendation on handling global() and local() type variables?
Okay, some off-list discussion clarified that for me:
- easiest way is to use a guard
```
def foo(x, spam):
    match x:
        case Point(p, q, context=c) if c == spam:
            ...  # Match
```
I like this one. Doesn't it also solve the issue of store vs. load? Everything is stored but the guard clause can look-up.
On 9 Jul 2020, at 17:49, Federico Salerno <salernof11@gmail.com> wrote:
On 09/07/2020 01:27, Ethan Furman wrote:
On 07/08/2020 10:44 AM, Ethan Furman wrote:
So namespaced variables only... is there a recommendation on handling global() and local() type variables?
Okay, some off-list discussion clarified that for me:
- easiest way is to use a guard
```
def foo(x, spam):
    match x:
        case Point(p, q, context=c) if c == spam:
            ...  # Match
```
I like this one. Doesn't it also solve the issue of store vs. load? Everything is stored but the guard clause can look-up.
I have to say I find this to be the most satisfactory solution – everything else (dot previously, no dot now, any other single character hypothetically) provides users with IMO too big of a footgun to shoot themselves with. Jakub
One thing I don't understand about the PEP:

```
case [x, y]:
```

IIUC matches any 2-element sequence. How would you match specifically a 2-item list (say)? Would it be

```
case list([x, y]):
```

I would appreciate it if some kind person could enlighten me. TIA Rob Cliffe
Yes, you’ve got that exactly right! I think the PEP has an example for `tuple((x, y))`. On Thu, Jul 9, 2020 at 13:08 Rob Cliffe via Python-Dev <python-dev@python.org> wrote:
One thing I don't understand about the PEP:
case [x,y]:
IIUC matches any 2-element sequence. How would you match specifically a 2-item list (say)? Would it be
case list([x,y]):
I would appreciate it if some kind person could enlighten me. TIA Rob Cliffe
-- --Guido (mobile)
One microscopic point: [Guido]
... (if `.x` is unacceptable, it’s unclear why `^x` would be any better),
As Python's self-appointed spokesperson for the elderly, there's one very clear difference: a leading "." is - literally - one microscopic point, all but invisible. A leading caret is far easier to see, on a variety of devices and using a variety of fonts. Indeed, I missed the leading dot in ".x" in your email the first two times I read that sentence.

But a caret is harder to type. So here's an off-the-wall idea: use an ellipsis. If you're still using a maximal-munch lexer, an ellipsis followed by an identifier is currently a syntax error. "...x" is far easier to see than ".x", easier to type than "^x", and retains the mnemonic connection that "something is a named load pattern if and only if it has dots".

"..." is also a mnemonic for "OK, here I want to match ... umm ... let me think ... I know! A fixed value." ;-)
On Wed, 8 Jul 2020 at 16:05, Guido van Rossum <guido@python.org> wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching. As authors we welcome Daniel F Moisset in our midst. Daniel wrote a lot of the new text in this version, which introduces the subject matter much more gently than the first version did. He also convinced us to drop the `__match__` protocol for now: the proposal stands quite well without that kind of extensibility, and postponing it will allow us to design it at a later time when we have more experience with how `match` is being used.
That said, the new version does not differ dramatically in what we propose. Apart from dropping `__match__` we’re dropping the leading dot to mark named constants, without a replacement, and everything else looks like we’re digging in our heels. Why is that? Given the firestorm of feedback we received and the numerous proposals (still coming) for alternative syntax, it seems a bad tactic not to give up something more substantial in order to get this proposal passed. Let me explain.
Language design is not like politics. It’s not like mathematics either, but I don’t think this situation is at all similar to negotiating a higher minimum wage in exchange for a lower pension, where you can definitely argue about exactly how much lower/higher you’re willing to go. So I don’t think it’s right to propose making the feature a little bit uglier just to get it accepted.
Frankly, 90% of the issue is about what among the authors we’ve dubbed the “load/store” problem (although Tobias never tires to explain that the “load” part is really “load-and-compare”). There’s a considerable section devoted to this topic in the PEP, but I’d like to give it another try here.
In case you’ve been avoiding python-dev lately, the problem is this. Pattern matching lets you capture values from the subject, similar to sequence unpacking, so that you can write for example ``` x = range(4) match x: case (a, b, *rest): print(f"first={a}, second={b}, rest={rest}") # 0, 1, [2, 3] ``` Here the `case` line captures the contents of the subject `x` in three variables named `a`, `b` and `rest`. This is easy to understand by pretending that a pattern (i.e., what follows `case`) is like the LHS of an assignment.
However, in order to make pattern matching more useful and versatile, the pattern matching syntax also allows using literals instead of capture variables. This is really handy when you want to distinguish different cases based on some value, for example ``` match t: case ("rect", real, imag): return complex(real, imag) case ("polar", r, phi): return complex(r * cos(phi), r * sin(phi)) ``` You might not even notice anything funny here if I didn’t point out that `"rect"` and `"polar"` are literals -- it’s really quite natural for patterns to support this once you think about it.
The problem that everybody’s been concerned about is that Python programmers, like C programmers before them, aren’t too keen to have literals like this all over their code, and would rather give names to the literals, for example ``` USE_POLAR = "polar" USE_RECT = "rect" ``` Now we would like to be able to replace those literals with the corresponding names throughout our code and have everything work like before: ``` match t: case (USE_RECT, real, imag): return complex(real, imag) case (USE_POLAR, r, phi): return complex(r * cos(phi), r * sin(phi)) ```
Forgive the intrusion, in case this wasn't already mentioned (I only read a fraction of emails on this): we could say that a name enclosed in parentheses would mean loading a constant, instead of storing in a variable:

```
match t:
    case ((USE_RECT), real, imag):  # matches the constant USE_RECT literal value
        return complex(real, imag)
    case (USE_POLAR, r, phi):  # the USE_POLAR portion matches anything and stores it in a USE_POLAR variable
        return complex(r * cos(phi), r * sin(phi))
```

Advantages: in Python (and most programming languages), (x) is the same thing as x. So no new syntax, or weird symbols, need to be introduced. But the parser can distinguish (I hope), and guide the match statement generation to the appropriate behaviour.

Yes, it's more typing, but hopefully the case is uncommon enough that the extra typing is not a lot of burden. Yes, it's easy to type USE_RECT when you really meant (USE_RECT). Hopefully linters can catch this case and warn you.

OK, that's it, I just thought it was worth throwing yet another idea into the pot :-)

-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert
*facepalm* This is right there in the PEP, already, as one possible alternative. Apologies for the noise. :-/
On 08/07/2020 19:27, Gustavo Carneiro wrote:
Forgive the intrusion, in case this wasn't already mentioned (I only read a fraction of emails on this), we could say that name enclosed in parenthesis would mean loading a constant, instead of storing in a variable:
It's discussed as the third bullet point under "Alternatives for constant value pattern": https://www.python.org/dev/peps/pep-0622/#id74 Basically it looks odd and we may need parentheses to manage grouping in patterns. -- Rhodri James *-* Kynesim Ltd
On 08/07/2020 16:02, Guido van Rossum wrote:
```
USE_POLAR = "polar"
USE_RECT = "rect"
```

Now we would like to be able to replace those literals with the corresponding names throughout our code and have everything work like before:

```
match t:
    case (USE_RECT, real, imag):
        return complex(real, imag)
    case (USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))
```

Alas, the compiler doesn’t know that we want `USE_RECT` to be a constant value to be matched while we intend `real` and `imag` to be variables to be given the corresponding values captured from the subject. So various clever ways have been proposed to distinguish the two cases.
I apologise for posting a second message re the same idea, but I can't contain my enthusiasm for it :-) and I want to make sure it's not overlooked: *Use '==' to mark* (when necessary) *load-and-compare items*:

match t:
    case (==USE_RECT, real, imag):
        return complex(real, imag)
    case (==USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))

allowing incidentally a possible future extension to other relational operators:

case Point(x, >YMAX):
case >= 42:
If this really is a deal-breaker after all other issues have been settled, we could go back to considering some special markup for load-and-compare of simple names (even though we expect this case to be very rare). But there’s no pressing need to decide to do this now -- we can always add new markup for this purpose in a future version, as long as we continue to support dotted names without markup, since that *is* a commonly needed case.
Except that if this idea were taken to its logical conclusion:

    mod.var would be a capture variable (contrary to the PEP)
    ==mod.var would be a load-and-compare value

Which may be controversial, but seems to have more overall consistency.
On Wed, Jul 8, 2020 at 7:23 PM Rob Cliffe <rob.cliffe@btinternet.com> wrote:
*Use '==' to mark* (when necessary) *load-and-compare items*:

match t:
    case (==USE_RECT, real, imag):
        return complex(real, imag)
    case (==USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))
allowing incidentally a possible future extension to other relational operators:

case Point(x, >YMAX):
case >= 42:
The problem with this is that value patterns don't just appear at the top level. Consider this example from the PEP's deferred ideas section:

case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):

Using your notation, this would become:

case BinaryOp(left=Number(value===x), op===op, right=Number(value===y)):

The tokenizer, which is eager, would interpret '===' as '==' followed by '=' and it would treat this as a syntax error. Also, it looks a lot like a JavaScript equivalency (?) operator. A single '=' prefix suffers from pretty much the same thing -- Python's tokenizer, as well as the tokenizer in most people's heads, would read 'x==op' as containing '=='. Please drop it. -- --Guido van Rossum (python.org/~guido) Pronouns: he/him (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Just my 2 cents: I find it kind of annoying that the whole structure requires two levels of indentation to actually reach the operational code. This would be a first in Python. I would prefer an option akin to if elif elif else where each block is only one level deep. On Wed, 8 Jul 2020 at 16:10, Guido van Rossum <guido@python.org> wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching. As authors we welcome Daniel F Moisset in our midst. Daniel wrote a lot of the new text in this version, which introduces the subject matter much more gently than the first version did. He also convinced us to drop the `__match__` protocol for now: the proposal stands quite well without that kind of extensibility, and postponing it will allow us to design it at a later time when we have more experience with how `match` is being used.
That said, the new version does not differ dramatically in what we propose. Apart from dropping `__match__` we’re dropping the leading dot to mark named constants, without a replacement, and everything else looks like we’re digging in our heels. Why is that? Given the firestorm of feedback we received and the numerous proposals (still coming) for alternative syntax, it seems a bad tactic not to give up something more substantial in order to get this proposal passed. Let me explain.
Language design is not like politics. It’s not like mathematics either, but I don’t think this situation is at all similar to negotiating a higher minimum wage in exchange for a lower pension, where you can definitely argue about exactly how much lower/higher you’re willing to go. So I don’t think it’s right to propose making the feature a little bit uglier just to get it accepted.
Frankly, 90% of the issue is about what among the authors we’ve dubbed the “load/store” problem (although Tobias never tires of explaining that the “load” part is really “load-and-compare”). There’s a considerable section devoted to this topic in the PEP, but I’d like to give it another try here.
In case you’ve been avoiding python-dev lately, the problem is this. Pattern matching lets you capture values from the subject, similar to sequence unpacking, so that you can write for example
```
x = range(4)
match x:
    case (a, b, *rest):
        print(f"first={a}, second={b}, rest={rest}")  # 0, 1, [2, 3]
```
Here the `case` line captures the contents of the subject `x` in three variables named `a`, `b` and `rest`. This is easy to understand by pretending that a pattern (i.e., what follows `case`) is like the LHS of an assignment.
However, in order to make pattern matching more useful and versatile, the pattern matching syntax also allows using literals instead of capture variables. This is really handy when you want to distinguish different cases based on some value, for example
```
match t:
    case ("rect", real, imag):
        return complex(real, imag)
    case ("polar", r, phi):
        return complex(r * cos(phi), r * sin(phi))
```
You might not even notice anything funny here if I didn’t point out that `"rect"` and `"polar"` are literals -- it’s really quite natural for patterns to support this once you think about it.
The problem that everybody’s been concerned about is that Python programmers, like C programmers before them, aren’t too keen to have literals like this all over their code, and would rather give names to the literals, for example
```
USE_POLAR = "polar"
USE_RECT = "rect"
```
Now we would like to be able to replace those literals with the corresponding names throughout our code and have everything work like before:
```
match t:
    case (USE_RECT, real, imag):
        return complex(real, imag)
    case (USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))
```
Alas, the compiler doesn’t know that we want `USE_RECT` to be a constant value to be matched while we intend `real` and `imag` to be variables to be given the corresponding values captured from the subject. So various clever ways have been proposed to distinguish the two cases.
This discussion is not new to the authors: before we ever published the first version of the PEP we vigorously debated this (it is Issue 1 in our tracker!), and other languages before us have also had to come to grips with it. Even many statically compiled languages! The reason is that for reasons of usability it’s usually deemed important that their equivalent of `case` auto-declare the captured variables, and variable declarations may hide (override) like-named variables in outer scopes.
Scala, for example, uses several different rules: first, capture variable names must start with a lowercase letter (so it would handle the above example as intended); next, capture variables cannot be dotted names (like `mod.var`); finally, you can enclose any variable in backticks to force the compiler to see it as a load instead of a store. Elixir uses another form of markup for loads: `x` is a capture variable, but `^x` loads and compares the value of `x`.
There are a number of dead ends when looking for a solution that works for Python. Checking at runtime whether a name is defined or not is one of these: there are numerous reasons why this could be confusing, not the least of which being that the `match` may be executed in a loop and the variable may already be bound by a previous iteration. (True, this has to do with the scope we’ve adopted for capture variables. But believe me, giving each case clause its own scope is quite horrible by itself, and there are other action-at-a-distance effects that are equally bad.)
It’s been proposed to explicitly state the names of the variables bound in a header of the `match` statement; but this doesn’t scale when the number of cases becomes larger, and requires users to do bookkeeping the compiler should be able to do. We’re really looking for a solution that tells you when you’re looking at an individual `case` which variables are captured and which are used for load-and-compare.
Marking up the capture variables with some sigil (e.g. `$x` or `x?`) or other markup (e.g. backticks or `<x>`) makes this common case ugly and inconsistent: it’s unpleasant to see for example
```
case %x, %y:
    print(x, y)
```
No other language we’ve surveyed uses special markup for capture variables; some use special markup for load-and-compare, so we’ve explored this. In fact, in version 1 of the PEP our long-debated solution was to use a leading dot. This was however booed off the field, so for version 2 we reconsidered. In the end nothing struck our fancy (if `.x` is unacceptable, it’s unclear why `^x` would be any better), and we chose a simpler rule: named constants are only recognized when referenced via some namespace, such as `mod.var` or `Color.RED`.
We believe it’s acceptable that things looking like `mod.var` are never considered capture variables -- the common use cases for `match` are such that one would almost never want to capture into a different namespace. (Just like you very rarely see `for self.i in …` and never `except E as scope.var` -- the latter is illegal syntax and sets a precedent.)
One author would dearly have seen Scala’s uppercase rule adopted, but in the end was convinced by the others that this was a bad idea, both because there’s no precedent in Python’s syntax, and because many human languages simply don’t make the distinction between lowercase and uppercase in their writing systems.
So what should you do if you have a local variable (say, a function argument) that you want to use as a value in a pattern? One solution is to capture the value in another variable and use a guard to compare that variable to the argument:
```
def foo(x, spam):
    match x:
        case Point(p, q, context=c) if c == spam:
            # Match
```
If this really is a deal-breaker after all other issues have been settled, we could go back to considering some special markup for load-and-compare of simple names (even though we expect this case to be very rare). But there’s no pressing need to decide to do this now -- we can always add new markup for this purpose in a future version, as long as we continue to support dotted names without markup, since that *is* a commonly needed case.
There’s one other issue where in the end we could be convinced to compromise: whether to add an `else` clause in addition to `case _`. In fact, we probably would already have added it, except for one detail: it’s unclear whether the `else` should be aligned with `case` or `match`. If we are to add this we would have to ask the Steering Council to decide for us, as the authors deadlocked on this question.
Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.
If you've made it this far, here are the links to check out, with an open mind. As a reminder, the introductory sections (Abstract, Overview, and Rationale and Goals) have been entirely rewritten and also serve as introduction and tutorial.
- PEP 622: https://www.python.org/dev/peps/pep-0622/ - Playground: https://mybinder.org/v2/gh/gvanrossum/patma/master?urlpath=lab/tree/playgrou...
-- --Guido van Rossum (python.org/~guido) Pronouns: he/him (why is my pronoun here?) _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LOXEATGF... Code of Conduct: http://python.org/psf/codeofconduct/
-- Kind regards, Stefano Borini
On 7/10/2020 1:21 AM, Stefano Borini wrote:
Just my 2 cents: I find it kind of annoying that the whole structure requires two levels of indentation to actually reach the operational code. This would be a first in Python.
I would prefer an option akin to if elif elif else where each block is only one level deep. Me too.
That would also sidestep the dilemma of whether else: (if implemented) should be indented like case: or like match:, because they would be the same.

match: t
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex(r * cos(phi), r * sin(phi))
else:
    return None

but it does make the match: block not a statement group, which was disturbing to some. On the other hand, this has a correspondence to:

try:
    raise expression
except (type of expression) as exc1:
    blah blah1
except (another type) as exc2:
    blah blah2
else:
    blah blah3

In fact, one _could_ wrap this whole feature into the try: syntax... the match statement would be tried, and the cases would be special types of exception handlers:

try:
    match expression
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex(r * cos(phi), r * sin(phi))
else:
    return None

If the expression could fail to be calculated, one could have a mix of except clauses also to catch those, rather than needing to wrap the whole match expression in a separate try to handle that case [making the nesting even deeper :( ]

There might even be a use for using case clauses to extend "normal" exception handling, where the exception object could be tested for its content as well as its class to have different handling.

try:
    raise Exception("msg", 35, things)
case Exception(x, "widgets"):
    blah blah 1
case Exception(x, "characters"):
    blah blah 2
else:
    blah blah 3

In this not-fully-thought-through scenario, maybe the keyword match isn't even needed: "raise expression" could do the job, or they could be aliases to signify intent. In other words, a match expression would always "fail". The only mismatch here is that it points out the difference between try-else and match-else: try-else is executed if there is no failure, but if match always fails, else would never be appropriate, and case _: would be.
In any case, it does seem there is a strong correlation between match processing and try processing, that I didn't see during other discussions of the possible structural similarities. "match 3 / 0:" would clearly need to be wrapped in a try:

try:
    match x / y:
case 43:
    print("wow, it is 43")
case 22:
    print("22 seemed less likely than 43 for some reason")
case _:
    print("You get what you get")
except ZeroDivisionError as exc:
    print(f"But sometimes you get an exception {exc}")

or:

try:
    raise x / y
case 43:
    print("wow, it is 43")
case 22:
    print("22 seemed less likely than 43 for some reason")
case exc := ZeroDivisionError:
    print(f"But sometimes you get an exception: {exc}")
case _:
    print("You get what you get")
If we are still not certain about the exact language to describe match then I would ask if the 'case' token is really required. It seems that I would prefer

match expr:
    pattern0:
        block0
    pattern1:
        block1
    .....
    else:
        blockdefault

where the else: clause is optional. Also for me the unusual case is the assignment to names in the pattern, and I would prefer that it be marked in some way; I didn't like .name, but ?name seems OK (or perhaps => name). Also the restriction that assigned vars should only occur once in a pattern seems wrong. I would regard it as an additional constraint on the match, but I do admit I don't fully understand what's allowed in patterns. Please disregard if the above is totally stupid.
A thought about the indentation level of a speculated "else" clause... Some people have argued that "else" should be at the outer level, because that's the way it is in all the existing compound statements. However, in those statements, all the actual code belonging to the statement is indented to the same level:

if a:
    ....
elif b:
    ....
else:
    ....
    ^
    |
    Code all indented to this level

But if we were to indent "else" to the same level as "match", the code under it would be at a different level from the rest.

match a:
    case 1:
        ....
    case 2:
        ....
else:
    ....
    ^   ^
    |   |
    Code indented to two different levels

This doesn't seem right to me, because all of the cases, including the else, are on the same footing semantically, just as they are in an "if" statement. -- Greg
On Fri, 10 Jul 2020 at 12:08, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
A thought about the indentation level of a speculated "else" clause...
Some people have argued that "else" should be at the outer level, because that's the way it is in all the existing compound statements.
However, in those statements, all the actual code belonging to the statement is indented to the same level:
if a:
    ....
elif b:
    ....
else:
    ....
    ^
    |
    Code all indented to this level
But if we were to indent "else" to the same level as "match", the code under it would be at a different level from the rest.
match a:
    case 1:
        ....
    case 2:
        ....
else:
    ....
    ^   ^
    |   |
    Code indented to two different levels
This doesn't seem right to me, because all of the cases, including the else, are on the same footing semantically, just as they are in an "if" statement.
That's a good point - and sufficiently compelling that (if "align else with match" ends up being the syntax) I'd always use "case _" rather than else. Equally, of course, it means that aligning else with match gives users a choice of which indentation they prefer: * Align with cases - use "case _" * Align with match - use "else" I've pretty much convinced myself that whatever happens, I'll ignore else and just use "case _" everywhere (and mandate it in projects I work on, where I have control over style). One thought - what will tools like black do in the "case _ vs else" debate? I can foresee some amusing flamewars if linters and formatters end up preferring one form over the other... Paul
On 10/07/2020 12:33, Greg Ewing wrote:
A thought about the indentation level of a speculated "else" clause...
Some people have argued that "else" should be at the outer level, because that's the way it is in all the existing compound statements.
However, in those statements, all the actual code belonging to the statement is indented to the same level:
if a:
    ....
elif b:
    ....
else:
    ....
    ^
    |
    Code all indented to this level
But if we were to indent "else" to the same level as "match", the code under it would be at a different level from the rest.
match a:
    case 1:
        ....
    case 2:
        ....
else:
    ....
    ^   ^
    |   |
    Code indented to two different levels
This doesn't seem right to me, because all of the cases, including the else, are on the same footing semantically, just as they are in an "if" statement.
I feel all those who aren't directly arguing against it are working off the assumption that it is needed for match and case to have different levels of indentation, but is this really true? Is there anything (bar tradition or other subjective arguments) that speaks in favour of this, especially in light of the fact that having the same indentation level would also solve other problems? A few emails ago I proposed something like this (and I'm probably only the last one to do so amongst many), but if anyone made an argument against it I must have missed it:

match:
    a
case 1:
    ...
case 2:
    ...
else:
    ...

(The a on a separate line being arguable.) I think it would look neater, be reminiscent of the if/elif/else syntax we're all familiar with, and solve the issue of where to indent the else.
Federico Salerno wrote:
Is there anything (bar tradition or other subjective arguments) that speaks in favour of this, especially in light of the fact that having the same indentation level would also solve other problems? ...if anyone made an argument against it I must have missed it:
We spend a fair bit of time discussing this exact proposal in the PEP (I just searched for "indent"): https://www.python.org/dev/peps/pep-0622/#use-a-flat-indentation-scheme
Hello, On Sat, 11 Jul 2020 00:35:39 +0200 Federico Salerno <salernof11@gmail.com> wrote: []
A few emails ago I proposed something like this (and I'm probably only the last one to do so amongst many), but if anyone made an argument against it I must have missed it:
The PEP itself in "rejected" ideas makes an argument against it: indented stuff after a line ending with ":" must be a *statement*. It would be totally nuts for that to be something else, e.g. an expression:
match:
    a
case 1:
    ...
case 2:
    ...
else:
    ...
(The a on a separate line being arguable.)
That of course leads us to the obvious idea:

match a:
case 1:
    ...
case 2:
    ...
else:
    ...

Of course, the PEP smartly has an argument against that too, in the vein of "after a line ending with ':', there should be an indented suite (list of statements)". But that's where it goes sideways. That argument is no better than the argument "there should be no normal-looking identifiers with magic behavior", but look, this very PEP does exactly that with the identifier "_". And if the above snippet looks weird to anybody, it's only because of all the "case" business. There wouldn't be such a problem if it was instead:

match a:
| 1:
    ...
| 2:
    ...
|:
    ...

The above ML-like syntax should be perfect for almost everyone, ... except the PEP authors, because they have it in "rejected ideas" too. -- Best regards, Paul mailto:pmiscml@gmail.com
I did make the following arguments about less indentation in https://github.com/gvanrossum/patma/issues/59 To recap:

1. Similarity to if/elif/else and try/except/finally statements in how code lines up
2. Less apparent complexity, since indentation is a visual signal for such
3. Smaller, more meaningful diffs when refactoring if/elif/else chains

Just to be clear, I wanted to capture these as possible objections; I'm not greatly in favor of one indentation scheme or the other -- there are good arguments for the indentation scheme of the current PEP (which it makes). - Jim On Fri, Jul 10, 2020 at 5:11 PM Paul Sokolovsky <pmiscml@gmail.com> wrote:
Hello,
On Sat, 11 Jul 2020 00:35:39 +0200 Federico Salerno <salernof11@gmail.com> wrote:
[]
A few emails ago I proposed something like this (and I'm probably only the last one to do so amongst many), but if anyone made an argument against it I must have missed it:
The PEP itself in "rejected" ideas makes an argument against it: indented stuff after a line ending with ":" must be a *statement*. It would be totally nuts for that to be something else, e.g. an expression:
match:
    a
case 1:
    ...
case 2:
    ...
else:
    ...
(The a on a separate line being arguable.)
That of course leads us to the obvious idea:
match a:
case 1:
    ...
case 2:
    ...
else:
    ...
Of course, the PEP smartly has an argument against that too, in the vein of "after a line ending with ':', there should be an indented suite (list of statements)". But that's where it goes sideways. That argument is no better than the argument "there should be no normal-looking identifiers with magic behavior", but look, this very PEP does exactly that with the identifier "_".
And if the above snippet looks weird to anybody, it's only because of all the "case" business. There wouldn't be such a problem if it was instead:
match a:
| 1:
    ...
| 2:
    ...
|:
    ...
The above ML-like syntax should be perfect for almost everyone, ... except the PEP authors, because they have it in "rejected ideas" too.
-- Best regards, Paul mailto:pmiscml@gmail.com
On 11/07/20 10:35 am, Federico Salerno wrote:
I feel all those who aren't directly arguing against it are working off the assumption that it is needed for match and case to have different levels of indentation, but is this really true? Is there anything (bar tradition or other subjective arguments) that speaks in favour of this,
I can't think of one at the moment, but I don't think you should dismiss tradition so easily. One of the arguments used to justify significant indentation in Python is that "you're going to indent it for readability anyway, so the compiler might as well take notice of it". For the most part, Python indentation follows what people would naturally do even if they didn't have to. So I think it's worth looking at what people typically do in other languages that don't have mandatory indentation. Taking C, for example, switch statements are almost always written like this:

switch (x) {
    case 1:
        ...
    case 2:
        ...
    default:
        ...
}

I've rarely if ever seen one written like this:

switch (x) {
    case 1:
        ...
    case 2:
        ...
default:
    ...
}

or like this:

switch (x) {
case 1:
    ...
case 2:
    ...
default:
    ...
}

This suggests to me that most people think of the cases as being subordinate to the switch, and the default being on the same level as the other cases. -- Greg
Hello, On Sat, 11 Jul 2020 22:49:09 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: []
For the most part, Python indentation follows what people would naturally do even if they didn't have to. So I think it's worth looking at what people typically do in other languages that don't have mandatory indentation.
Taking C, for example, switch statements are almost always written like this:
switch (x) {
    case 1:
        ...
    case 2:
        ...
    default:
        ...
}
I've rarely if ever seen one written like this:
switch (x) {
    case 1:
        ...
    case 2:
        ...
default:
    ...
}
Indeed, that's unheard of (outside of random pupil dirtcode). Actually, the whole argument in PEP 622 regarding "else:", that its placement is ambiguous, sounds like a rather artificial write-off. Individual "case"'s are aligned together, but suddenly it's unclear how to align the default case, introduced by "else"? Who in good faith would align it with "match"?
or like this:
switch (x) {
case 1:
    ...
case 2:
    ...
default:
    ...
}
Oh really, you never saw that? Well, they say that any programmer should eyeball the source code of the most popular open-source OS at least once: https://elixir.bootlin.com/linux/latest/source/kernel/sys.c#L2144 And a lot of projects follow the Linux codestyle, because it's familiar to many people and offers ready/easy to use infrastructure for code style control.
This suggests to me that most people think of the cases as being subordinate to the switch, and the default being on the same level as the other cases.
And to me it suggests that well established projects, which have thought out it all, aren't keen to use more indentation than really needed.
-- Greg
[] -- Best regards, Paul mailto:pmiscml@gmail.com
On 07/11/2020 04:20 AM, Paul Sokolovsky wrote:
Actually, the whole argument in PEP 622 regarding "else:", that its placement is ambiguous sounds like a rather artificial write-off. Individual "case"'s are aligned together, but suddenly, it's unclear how to align the default case, introduced by "else"? Who in good faith would align it with "match"?
I would. -- ~Ethan~
On Sat, 11 Jul 2020 at 14:39, Ethan Furman <ethan@stoneleaf.us> wrote:
On 07/11/2020 04:20 AM, Paul Sokolovsky wrote:
Actually, the whole argument in PEP 622 regarding "else:", that its placement is ambiguous sounds like a rather artificial write-off. Individual "case"'s are aligned together, but suddenly, it's unclear how to align the default case, introduced by "else"? Who in good faith would align it with "match"?
I would.
How do you feel about the fact that

match EXPR:
    case 1:
        print("One")
    case _:
        print("default")

and

match EXPR:
    case 1:
        print("One")
else:
    print("default")

are semantically completely identical, but syntactically must be indented differently? That for me is probably the most compelling reason for preferring to indent else to the same level as case¹. I'm curious to understand how people who prefer aligning else with match view this. (Not least, because I anticipate some "interesting" code style flamewars over this ;-)) Paul ¹ When I say "most compelling" I mean "inclines me to have a mild preference" :-) In reality I mostly don't care, and I'll probably just use "case _" in any projects I work on and ignore the existence of "else" altogether.
To me, "else:" has a slightly different meaning than "case _:" case _: essentially a default, ensuring that the match logic is complete. else: OK, the subject of this match failed, here is our fallback logic. Whether this distinction is important enough to express in code is another question, as is whether or not anyone but me would follow this "obvious" convention. So I'm not convinced the difference justifies the existence a second syntax. But I'm also not sure it doesn't, particularly if that distinction were given in the PEP and in documentation for the match statement. -jJ
On 07/11/2020 10:29 AM, Jim J. Jewett wrote:
To me, "else:" has a slightly different meaning than "case _:"
case _: essentially a default, ensuring that the match logic is complete.
else: OK, the subject of this match failed, here is our fallback logic.
Whether this distinction is important enough to express in code is another question, as is whether or not anyone but me would follow this "obvious" convention. So I'm not convinced the difference justifies the existence a second syntax. But I'm also not sure it doesn't, particularly if that distinction were given in the PEP and in documentation for the match statement.
This is exactly how I would use it. -- ~Ethan~
On Sat, Jul 11, 2020 at 2:45 PM Ethan Furman <ethan@stoneleaf.us> wrote:
On 07/11/2020 10:29 AM, Jim J. Jewett wrote:
To me, "else:" has a slightly different meaning than "case _:"
case _: essentially a default, ensuring that the match logic is complete.
else: OK, the subject of this match failed, here is our fallback logic.
Whether this distinction is important enough to express in code is another question, as is whether or not anyone but me would follow this "obvious" convention. So I'm not convinced the difference justifies the existence a second syntax. But I'm also not sure it doesn't, particularly if that distinction were given in the PEP and in documentation for the match statement.
This is exactly how I would use it.
Hm... Just the fact that people have been arguing both sides so convincingly makes me worry that something bigger is amiss. I think we're either better off without `else` (since the indentation of `case _` cannot be disputed :-), or we have to revisit the reasons for indenting `case` relative to `match`. As MRAB said, it's a case of picking the least inelegant one. Let me add that the parser can easily deal with whatever we pick -- this is purely about human factors. -- --Guido van Rossum (python.org/~guido) Pronouns: he/him (why is my pronoun here?)
On 07/11/2020 03:31 PM, Guido van Rossum wrote:
On Sat, Jul 11, 2020 at 2:45 PM Ethan Furman wrote:
On 07/11/2020 10:29 AM, Jim J. Jewett wrote:
To me, "else:" has a slightly different meaning than "case _:"
case _: essentially a default, ensuring that the match logic is complete.
else: OK, the subject of this match failed, here is our fallback logic.
Whether this distinction is important enough to express in code is another question, as is whether or not anyone but me would follow this "obvious" convention. So I'm not convinced the difference justifies the existence a second syntax. But I'm also not sure it doesn't, particularly if that distinction were given in the PEP and in documentation for the match statement.
This is exactly how I would use it.
Hm... Just the fact that people have been arguing both sides so convincingly makes me worry that something bigger is amiss.
I think it just means there are several "right" ways to do it, we just need to pick one. I'll be happy to use whatever we end up with*. -- ~Ethan~ * I hope. ;-)
On 7/11/2020 6:31 PM, Guido van Rossum wrote:
Hm... Just the fact that people have been arguing both sides so convincingly makes me worry that something bigger is amiss. I think we're either better off without `else` (since the indentation of `case _` cannot be disputed :-), or we have to revisit the reasons for indenting `case` relative to `match`. As MRAB said, it's a case of picking the least inelegant one.
The more I think about adjusting IDLE's smart indenter to indent case suites once, with 2 half-size indents, the more I would prefer to special-case 'match' to have no indent (if there were no suite allowed) than to special-case 'match' and 'case' to indent by half an indent each.

Problem 1. Indent increments can be set to an odd number of spaces! (.rst files use 3-space indents.)

Problem 2. Dedent (with Backspace) could no longer be the simple expression I expect it currently is, such as new-indent = current-indent spaces // current indent size. IDLE would have to search backwards to find the appropriate header line, which might not be the most recent one. A stack of indent (delta, size) pairs would have to be recalculated from the most recent non-indented compound header line whenever the cursor is moved around. Note that the current indent may not be a multiple of the indent size due to PEP 8 vertical alignments, such as with function parameters or arguments:

```
if a:
    def fg(param1,  # comment
           param2,  # 11-space indent
           |        # Backspace moves cursor left 3 spaces.
```

The flexibility of the PEG parser needs to be used with care because it can allow constructs that are difficult for people and non-PEG code to read and process.

-- Terry Jan Reedy
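The "simple expression" Backspace rule Terry describes can be sketched as follows (a hypothetical illustration of the arithmetic, not IDLE's actual code). Half-size indents would break its core assumption that every indent stop is a multiple of the indent size:

```python
def simple_dedent(current_indent: int, indent_size: int = 4) -> int:
    # If the cursor sits past an indent stop (e.g. a PEP 8 alignment
    # at column 11), Backspace drops back to the previous multiple
    # of the indent size.
    if current_indent % indent_size:
        return (current_indent // indent_size) * indent_size
    # Otherwise drop one full indent level (never below column 0).
    return max(current_indent - indent_size, 0)

print(simple_dedent(11))  # 8: the 11-space alignment in the example above
print(simple_dedent(8))   # 4
```

With half-size `case` indents, valid stops would no longer all be multiples of `indent_size`, so this pure function of the current column would no longer suffice; the editor would have to inspect preceding header lines.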
On 07/11/2020 10:29 AM, Jim J. Jewett wrote:
To me, "else:" has a slightly different meaning than "case _:"
case _: essentially a default, ensuring that the match logic is complete.
else: OK, the subject of this match failed, here is our fallback logic.
Is there anywhere else where Python goes out of its way to provide two ways of doing the same thing just because they might feel semantically different? -- Greg
On 11/07/20 11:20 pm, Paul Sokolovsky wrote:
On Sat, 11 Jul 2020 22:49:09 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
or like this:
```
switch (x) {
case 1:
    ...
case 2:
    ...
default:
    ...
}
```
Oh really, you never saw that? Well, they say that any programmer should eyeball the source code of the most popular open-source OS at least once: https://elixir.bootlin.com/linux/latest/source/kernel/sys.c#L2144
I stand corrected -- it seems I haven't looked at other people's switch statements all that much. I can see that being a reasonable choice if you're using 8-space indents, but I don't see that done much in Python. Also, directly translating this into Python leads to something that looks like a mistake:

```
match x:
case 1:
    ...
case 2:
    ...
```

and as has been pointed out, the alternative of putting x on the next line is unprecedented in Python.

-- Greg
On 7/11/2020 7:54 PM, Greg Ewing wrote:
I can see that being a reasonable choice if you're using 8-space indents, but I don't see that done much in Python.
Also, directly translating this into Python leads to something that looks like a mistake:
```
match x:
case 1:
    ...
case 2:
    ...
```
and as has been pointed out, the alternative of putting x on the next line is unprecedented in Python.
If the 2 levels of indenting are really offensive, surely we could teach editors, black, ourselves, etc. to indent the match statement as:

```
match pt:
  case (x, y):               # <-- indent by two spaces
    return Point3d(x, y, 0)  # <-- indent by 2 more spaces, for a total of 4
if x:
    return x                 # <-- normally indent by 4 spaces
```

I used to do something similar with C switch statements. I guess you couldn't use this trick if you were using tabs. Another reason to not use them!

Eric
On 11/07/2020 12:20, Paul Sokolovsky wrote:
Actually, the whole argument in PEP 622 regarding "else:", that its placement is ambiguous sounds like a rather artificial write-off. Individual "case"'s are aligned together, but suddenly, it's unclear how to align the default case, introduced by "else"? Who in good faith would align it with "match"?
I would, if I'd used an "else" with a "for" recently. I would have a strong tendency to align the "else" with the "case" statements, but I can see how the other way around makes sense too. (I can't see how anyone likes the Linux case indentation style at all. It's horrible to read.) -- Rhodri James *-* Kynesim Ltd
Given that case will be a keyword, what's the case (pun unintentional) for indenting the case clauses? What's wrong with 'case' and 'else' both indented the same as match? Without the keyword there'd be a case for indenting, but with it I don't see the necessity. Kind regards, Steve On Fri, Jul 10, 2020 at 12:09 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
A thought about the indentation level of a speculated "else" clause...
Some people have argued that "else" should be at the outer level, because that's the way it is in all the existing compound statements.
However, in those statements, all the actual code belonging to the statement is indented to the same level:
```
if a:
    ....
elif b:
    ....
else:
    ....

    ^
    |
    Code all indented to this level
```
But if we were to indent "else" to the same level as "match", the code under it would be at a different level from the rest.
```
match a:
    case 1:
        ....
    case 2:
        ....
else:
    ....

    ^   ^
    |   |
    Code indented to two different levels
```
This doesn't seem right to me, because all of the cases, including the else, are on the same footing semantically, just as they are in an "if" statement.
-- Greg _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ACH4QXCU... Code of Conduct: http://python.org/psf/codeofconduct/
On 2020-07-11 20:56, Steve Holden wrote:
Given that case will be a keyword, what's the case (pun unintentional) for indenting the case clauses? What's wrong with 'case' and 'else' both indented the same as match? Without the keyword there'd be a case for indenting, but with it I don't see the necessity.
Kind regards, Steve
The issue is as follows. In Python, the style is that multi-line statements start with a keyword and that logical line ends with a colon. The next line is a statement, which is indented. The other parts of the statement structure are indented the same amount as its first line:

```
if ...:
    ...
elif ...:
    ...
else:
    ...
```

The 'match' statement (or a 'switch' statement), however, can't follow the existing style. It's either:

```
match ...:
case ...:
case ...:
```

where the second line isn't indented (unlike all other structures), or:

```
match ...:
    case ...:
    case ...:
```

where the other parts of the structure (the cases) aren't indented the same as the first line. Another possibility is:

```
match:
    ...
case ...:
case ...:
```

but the second line is an expression, not a statement (unlike all other structures). An alternative is:

```
match ...
case ...:
case ...:
```

no colon ending the first line, and no indenting of the second line, but that's unlike all other structures too. None of the possibilities are formatted like the existing style. So it's a case of picking the least inelegant one.
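For comparison, the status-quo dispatch that the thread keeps measuring these layouts against (the PEP's running rect/polar example) spelled as a plain if/elif chain. This is a sketch; the `convert` helper name is mine, not from the PEP:

```python
from math import cos, sin

def convert(t):
    # Dispatch on the tag element by hand, the way Python code does
    # today without a match statement.
    if t[0] == "rect":
        _, real, imag = t
        return complex(real, imag)
    elif t[0] == "polar":
        _, r, phi = t
        return complex(r * cos(phi), r * sin(phi))
    else:
        return None

print(convert(("rect", 1.0, 2.0)))   # (1+2j)
print(convert(("polar", 2.0, 0.0)))  # (2+0j)
```

Note the if/elif spelling needs both an explicit tag test and a separate unpacking step per branch, which is exactly the duplication a `case` pattern folds into one line.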
On Fri, Jul 10, 2020 at 12:09 PM Greg Ewing <greg.ewing@canterbury.ac.nz <mailto:greg.ewing@canterbury.ac.nz>> wrote:
A thought about the indentation level of a speculated "else" clause...
Some people have argued that "else" should be at the outer level, because that's the way it is in all the existing compound statements.
However, in those statements, all the actual code belonging to the statement is indented to the same level:
```
if a:
    ....
elif b:
    ....
else:
    ....

    ^
    |
    Code all indented to this level
```
But if we were to indent "else" to the same level as "match", the code under it would be at a different level from the rest.
```
match a:
    case 1:
        ....
    case 2:
        ....
else:
    ....

    ^   ^
    |   |
    Code indented to two different levels
```
This doesn't seem right to me, because all of the cases, including the else, are on the same footing semantically, just as they are in an "if" statement.
On Jul 11, 2020, at 13:28, MRAB <python@mrabarnett.plus.com> wrote:
Another possibility is:
```
match:
    ...
case ...:
case ...:
```
It’s ugly, but you could introduce and require a (soft) keyword on the line after match, e.g.

```
match:
    as expression  # Can’t really use `with` here although I think it reads better.
    case …
```

I still wish cases lined up under match, but it’s not a deal breaker for me.

-Barry
While I understand the point of view that says that match ... : should encapsulate a sequence of indented suites, it seems to me that match/case/case/.../else has a natural affinity with try/except/except/.../finally/else, and nobody seems to think that the excepts should be indented. Or the finally. And naturally the match/else case are at the same indentation level, just as for/else, while/else and try/finally. So why, exactly, should case be indented? My apologies for being a Bear of Very Little Brain. On Fri, Jul 10, 2020 at 12:09 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
A thought about the indentation level of a speculated "else" clause...
Some people have argued that "else" should be at the outer level, because that's the way it is in all the existing compound statements.
However, in those statements, all the actual code belonging to the statement is indented to the same level:
```
if a:
    ....
elif b:
    ....
else:
    ....

    ^
    |
    Code all indented to this level
```
But if we were to indent "else" to the same level as "match", the code under it would be at a different level from the rest.
```
match a:
    case 1:
        ....
    case 2:
        ....
else:
    ....

    ^   ^
    |   |
    Code indented to two different levels
```
This doesn't seem right to me, because all of the cases, including the else, are on the same footing semantically, just as they are in an "if" statement.
-- Greg
On 16/07/2020 17:37, Steve Holden wrote:
While I understand the point of view that says that match ... : should encapsulate a sequence of indented suites, it seems to me that match/case/case/.../else has a natural affinity with try/except/except/.../finally/else, and nobody seems to think that the excepts should be indented. Or the finally. And naturally the match/else case are at the same indentation level, just as for/else, while/else and try/finally. So why, exactly, should case be indented?
My take on the difference would be that "try" tries out a suite, while "match" matches an expression. If we did:

```
match:
    <expression>
case <pattern>:
    <suite>
```

then having an indented section which must be a single expression would be unique in Python syntax. I could easily see people being confused by the slew of statements they would inevitably decide they must be able to put there, and soon we'd have cats and dogs living together and the downfall of civilisation as we know it. Alternatively:

```
match <expression>:
case <pattern>:
    <suite>
```

would be the one place in Python where you end a line with a colon and *don't* indent the following line. Writers of simple formatters and the like (such as Python-mode in Emacs) would curse your name, etc, etc.
My apologies for being a Bear of Very Little Brain.
Nah, don't apologise. This is one of those things that everyone has opinions on, because there doesn't seem to be an obvious Right Answer. -- Rhodri James *-* Kynesim Ltd
On 16/07/2020 19:00, Rhodri James wrote:
On 16/07/2020 17:37, Steve Holden wrote:
While I understand the point of view that says that match ... : should encapsulate a sequence of indented suites, it seems to me that match/case/case/.../else has a natural affinity with try/except/except/.../finally/else, and nobody seems to think that the excepts should be indented. Or the finally. And naturally the match/else case are at the same indentation level, just as for/else, while/else and try/finally. So why, exactly, should case be indented? [...] If we did:
```
match:
    <expression>
case <pattern>:
    <suite>
```
then having an indented section which must be a single expression would be unique in Python syntax.
[...]
```
match <expression>:
case <pattern>:
    <suite>
```
would be the one place in Python where you end a line with a colon and **don't** indent the following line.
It seems relevant to mention that before Python's unique syntax for a ternary operator (x = value if condition else default_value), you would never find an if or else without a colon and an indented block, and the order (value, condition, default) is different from what is usually found in other languages (condition, value, default). That hasn't stopped Python from implementing the feature in its own way, and I don't see why this PEP should be different, since Python is not other languages and match suites are not other suites.
I could easily see people being confused by the slew of statements they would inevitably decide they must be able to put there, and soon we'd have cats and dogs living together and the downfall of civilisation as we know it. [...] Writers of simple formatters and the like (such as Python-mode in Emacs) would curse your name, etc, etc.

Tools should adapt to the language, not the other way around. If things had to be done the way they had always been done, without any change, for fear of people not being used to it, we wouldn't even have Python at all. People learn and adapt. It seems like a small price to pay in exchange for consistency and removal of ambiguity, considering people will still have to learn the new feature one way or another.
On Fri, Jul 17, 2020 at 3:25 AM Federico Salerno <salernof11@gmail.com> wrote:
Tools should adapt to the language, not the other way around. If things had to be done the way they had always been done, without any change, for fear of people not being used to it, we wouldn't even have Python at all. People learn and adapt. It seems like a small price to pay in exchange for consistency and removal of ambiguity, considering people will still have to learn the new feature one way or another.
But consistency is exactly what you'd be destroying here. Python is extremely consistent in that you ALWAYS indent after a line ends with a colon, and what comes after it is logically contained within that statement. It's about whether *people* can handle it, more than whether tools can, and the consistency helps a lot with that. ChrisA
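Chris's "you ALWAYS indent after a line ends with a colon" rule is visible in the token stream itself. A small sketch using the stdlib tokenize module (my own illustration, not anything from the PEP):

```python
import io
import tokenize

# Tokenize a minimal snippet: the NEWLINE that ends a block-opening
# colon line is immediately followed by an INDENT token.
src = "if a:\n    pass\n"
toks = [tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
first_newline = toks.index("NEWLINE")  # the NEWLINE ending "if a:"
print(toks[first_newline + 1])  # INDENT
```

A `match ...:` line whose cases were not indented would be the one place where this pairing of a block-opening colon with a following INDENT breaks down.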
On Jul 16, 2020, at 1:36 PM, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Jul 17, 2020 at 3:25 AM Federico Salerno <salernof11@gmail.com> wrote:
Tools should adapt to the language, not the other way around. If things had to be done the way they had always been done, without any change, for fear of people not being used to it, we wouldn't even have Python at all. People learn and adapt. It seems like a small price to pay in exchange for consistency and removal of ambiguity, considering people will still have to learn the new feature one way or another.
But consistency is exactly what you'd be destroying here. Python is extremely consistent in that you ALWAYS indent after a line ends with a colon, and what comes after it is logically contained within that statement. It's about whether *people* can handle it, more than whether tools can, and the consistency helps a lot with that.
ChrisA
One question that comes to mind, does match NEED a colon at the end of it? If it didn’t have the colon, it wouldn’t need the indent for the following lines.
On 2020-07-16 19:05, Richard Damon wrote:
On Jul 16, 2020, at 1:36 PM, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Jul 17, 2020 at 3:25 AM Federico Salerno <salernof11@gmail.com> wrote:
Tools should adapt to the language, not the other way around. If things had to be done the way they had always been done, without any change, for fear of people not being used to it, we wouldn't even have Python at all. People learn and adapt. It seems like a small price to pay in exchange for consistency and removal of ambiguity, considering people will still have to learn the new feature one way or another.
But consistency is exactly what you'd be destroying here. Python is extremely consistent in that you ALWAYS indent after a line ends with a colon, and what comes after it is logically contained within that statement. It's about whether *people* can handle it, more than whether tools can, and the consistency helps a lot with that.
ChrisA
One question that comes to mind, does match NEED a colon at the end of it? If it didn’t have the colon, it wouldn’t need the indent for the following lines.
All of the others have the colon.
On Thu, 16 Jul 2020 at 19:09, Richard Damon <Richard@damon-family.org> wrote:
One question that comes to mind, does match NEED a colon at the end of it? If it didn’t have the colon, it wouldn’t need the indent for the following lines.
Or something like

```
match t
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex(r * cos(phi), r * sin(phi))
else:
    pass
```

-- Kind regards, Stefano Borini
On 7/16/2020 10:00 AM, Rhodri James wrote:
On 16/07/2020 17:37, Steve Holden wrote:
While I understand the point of view that says that match ... : should encapsulate a sequence of indented suites, it seems to me that match/case/case/.../else has a natural affinity with try/except/except/.../finally/else, and nobody seems to think that the excepts should be indented. Or the finally. And naturally the match/else case are at the same indentation level, just as for/else, while/else and try/finally. So why, exactly, should case be indented?
My take on the difference would be that "try" tries out a suite, while "match" matches an expression. If we did:
```
match:
    <expression>
case <pattern>:
    <suite>
```
then having an indented section which must be a single expression would be unique in Python syntax. I could easily see people being confused when the slew of statements they would inevitably decide they must be able to put there, and soon we'd have cats and dogs living together and the downfall of civilisation as we know it.
Alternatively:
```
match <expression>:
case <pattern>:
    <suite>
```
would be the one place in Python where you end a line with a colon and *don't* indent the following line. Writers of simple formatters and the like (such as Python-mode in Emacs) would curse your name, etc, etc.
My apologies for being a Bear of Very Little Brain.
Nah, don't apologise. This is one of those things that everyone has opinions on, because there doesn't seem to be an obvious Right Answer.
Speaking of things unique in Python syntax, isn't the double indentation of a single control flow structure introduced by match also unique?
On 2020-07-16 17:37, Steve Holden wrote:
While I understand the point of view that says that match ... : should encapsulate a sequence of indented suites, it seems to me that match/case/case/.../else has a natural affinity with try/except/except/.../finally/else, and nobody seems to think that the excepts should be indented. Or the finally. And naturally the match/else case are at the same indentation level, just as for/else, while/else and try/finally. So why, exactly, should case be indented?
My apologies for being a Bear of Very Little Brain.
[snip] For all other statement structures (if, while, try, etc.), the first line ends with a colon and the second line is indented (and is a statement). Therefore the cases should be indented. However, for all other statement structures (if, while, try, etc.), the other parts of the structure itself (elif, else, except, etc.) aren't indented. Therefore the cases shouldn't be indented. Either way, it's inelegant.
On 16/07/2020 18:13, MRAB wrote:
On 2020-07-16 17:37, Steve Holden wrote:
While I understand the point of view that says that match ... : should encapsulate a sequence of indented suites, it seems to me that match/case/case/.../else has a natural affinity with try/except/except/.../finally/else, and nobody seems to think that the excepts should be indented. Or the finally. And naturally the match/else case are at the same indentation level, just as for/else, while/else and try/finally. So why, exactly, should case be indented?
My apologies for being a Bear of Very Little Brain.
[snip] For all other statement structures (if, while, try, etc.), the first line ends with a colon and the second line is indented (and is a statement). Therefore the cases should be indented.
However, for all other statement structures (if, while, try, etc.), the other parts of the structure itself (elif, else, except, etc.) aren't indented. Therefore the cases shouldn't be indented.
Either way, it's inelegant. Absolutely true. However:
I think that equal indentation suggests suites of equal status. Increased indentation suggests suites are subordinate to a previous suite. Consider these examples:

(1) if...elif...else: Suites are definitely co-equal (they are alternatives, only one of which is chosen). So if/elif/else have equal indentation.

(2) for...else or while...else: It's arguable IMO (YMMV); Python has chosen to indent them equally. I don't think it would have been outrageous to indent the 'else:'; one reason not to is that the 'else' might not stand out (it would typically be indented equally with the preceding line of code), unless it was allowed/forced to be indented *less than* 'for'.

(3) try...except...else: IMO also arguable (at most one of 'except' and 'else' can be executed).

(4) try...finally: The suites have equal status (both are always executed), so they have equal indentation.

Now to me, 'case' clauses are *subordinate* to 'match'. After all, the value after 'match' applies to all the following 'case's. So I would argue in favour of indenting 'case' statements. This would make them stand out *more* (making a virtue out of 'match' *not* being followed by an indented suite). A Good Thing. One of the purposes of indentation is to make program structure clearly visible, which IMO this would do.

Or from a slightly different point of view: "match...case...case..." is a single construct, a self-contained program component. Indenting the 'case's would stop them from distracting visually from other suites that are indented the same as 'match'. (I.e. other program components of equal status.)

(If 'else' were allowed after 'match', I would argue that it should also be indented, for the same reasons, and because it is logically a case, albeit a special one.)

Rob Cliffe
On Fri, 10 Jul 2020 at 10:33, Glenn Linderman <v+python@g.nevcal.com> wrote:
On 7/10/2020 1:21 AM, Stefano Borini wrote:
Just my 2 cents, I find it kind of annoying that the whole structure requires two levels of indentation to actually reach the operational code. This would be a first in python.
I would prefer an option akin to if elif elif else where each block is only one level deep. Me too.
That would also sidestep the dilemma of whether else: (if implemented) should be indented like case: or like match: because they would be the same.
```
match: t
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex(r * cos(phi), r * sin(phi))
else:
    return None
```
but it does make the match: block not a statement group, which was disturbing to some.
On the other hand, this has a correspondence to:
```
try:
    raise expression
except (type of expression) as exc1:
    blah blah1
except (another type) as exc2:
    blah blah2
else:
    blah blah3
```
The problem of the try...except structure, with less indentation, is that, yes, it is OK for exceptions because normally you have 2 or 3 `except XXX` clauses, so it is usually easy to follow, as long as the number of vertical lines in the entire try...except block is low enough.

But I have had cases with catching many exception types, each with its own block of 4 or 5 lines, adding up to a block of try-excepts that doesn't even fit in a single window of my editor. In that case, I have always wished for except clauses to be extra indented, to more easily distinguish where the try...except block ends. Therefore, I posit that the style of try...except indentation only works when the number of cases is small.

But for the case of pattern matching, I expect the number of cases to be matched to be a lot higher than in exception handling. Having the cases to be matched indented is, IMHO, a nice visual cue to help the reader understand where the pattern matching block ends.
In fact, one _could_ wrap this whole feature into the try: syntax... the match statement would be tried, and the cases would be special types of exception handlers:
```
try:
    match expression
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex(r * cos(phi), r * sin(phi))
else:
    return None
```
If the expression could fail to be calculated, one could have a mix of except clauses also to catch those, rather than needing to wrap the whole match expression in a separate try to handle that case [making the nesting even deeper :( ]
There might even be a use for using case clauses to extend "normal" exception handling, where the exception object could be tested for its content as well as its class to have different handling.
```
try:
    raise Exception("msg", 35, things)
case Exception(x, "widgets"):
    blah blah 1
case Exception(x, "characters"):
    blah blah 2
else:
    blah blah 3
```
In this not-fully-thought-through scenario, maybe the keyword match isn't even needed: "raise expression" could do the job, or they could be aliases to signify intent.
In other words, a match expression would always "fail". The only mismatch here is that it points out the difference between try-else and match-else: try-else is executed if there is no failure, but if match always fails, else would never be appropriate, and case _: would be.
In any case, it does seem there is a strong correlation between match processing and try processing, that I didn't see during other discussions of the possible structural similarities. "match 3 / 0:" would clearly need to be wrapped in a try:
```
try:
    match x / y:
        case 43:
            print("wow, it is 43")
        case 22:
            print("22 seemed less likely than 43 for some reason")
        case _:
            print("You get what you get")
except ZeroDivisionError as exc:
    print(f"But sometimes you get an exception {exc}")
```
or:
```
try:
    raise x / y
case 43:
    print("wow, it is 43")
case 22:
    print("22 seemed less likely than 43 for some reason")
case exc := ZeroDivisionError:
    print(f"But sometimes you get an exception: {exc}")
case _:
    print("You get what you get")
```
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert
On 7/10/2020 3:15 AM, Gustavo Carneiro wrote:
On Fri, 10 Jul 2020 at 10:33, Glenn Linderman <v+python@g.nevcal.com> wrote:
On 7/10/2020 1:21 AM, Stefano Borini wrote:

> Just my 2 cents, I find it kind of annoying that the whole structure requires two levels of indentation to actually reach the operational code. This would be a first in python.
> I would prefer an option akin to if elif elif else where each block is only one level deep.

Me too.
That would also sidestep the dilemma of whether else: (if implemented) should be indented like case: or like match: because they would be the same.
```
match: t
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex(r * cos(phi), r * sin(phi))
else:
    return None
```
but it does make the match: block not a statement group, which was disturbing to some.
On the other hand, this has a correspondence to:
```
try:
    raise expression
except (type of expression) as exc1:
    blah blah1
except (another type) as exc2:
    blah blah2
else:
    blah blah3
```
The problem of the try...except structure, with less indentation, is that, yes, it is OK for exceptions because normally you have 2 or 3 `except XXX` clauses, therefore it is usually easy to follow, if the number of vertical lines in the entire block of try-catch is low enough.
But I have had cases with catching many exception types, each with its own block of 4 or 5 lines, adding up to a block of try-excepts that doesn't even fit in a single window of my editor. In that case, I always have wished for except clauses to be extra indented, to more easily distinguish where the try..except block ends.
Therefore, I posit that the style of try...except indentation only works where the number of cases is small.
But for the case of pattern matching, I expect the number of cases to be matched to be a lot higher than exception handling cases. Having cases to be matched be indented is, IMHO, a nice visual cue to help the reader understand where the pattern matching block ends.
Actually, the current if/elif/elif/else chain, used now because Python has no switch/match/case, has exactly the same issue as you describe as a problem with try if there were more cases... and if often has more cases, just like match will. So your concern seems nebulous.

You may have wished for extra indentation... but it is simple to get more indentation: use 8 spaces instead of 4. So if you really wanted it, you could have had it. It is much harder to get less indentation when the language structures prescribe it.
In fact, one _could_ wrap this whole feature into the try: syntax... the match statement would be tried, and the cases would be special types of exception handlers:
try: match expression case ("rect", real, imag): return complex(real, imag) case ("polar", r, phi): return complex( r* cos(phi), r*sin(phi) else: return None
If the expression could fail to be calculated, one could have a mix of except clauses also to catch those, rather than needing to wrap the whole match expression in a separate try to handle that case [making the nesting even deeper :( ]
There might even be a use for using case clauses to extend "normal" exception handling, where the exception object could be tested for its content as well as its class to have different handling.
try: raise Exception("msg", 35, things) case Exception( x, "widgets"): blah blah 1 case Exception( x, "characters"): blah blah 2 else: blah blah 3
In this not-fully-thought-through scenario, maybe the keyword match isn't even needed: "raise expression" could do the job, or they could be aliases to signify intent.
In other words, a match expression would always "fail". The only mismatch here is that it points out the difference between try-else and match-else: try-else is executed if there is no failure, but if match always fails, else would never be appropriate, and case _: would be.
In any case, it does seem there is a strong correlation between match processing and try processing, that I didn't see during other discussions of the possible structural similarities. "match 3 / 0:" would clearly need to be wrapped in a try:
try: match x / y: case 43: print("wow, it is 43") case 22: print("22 seemed less likely than 43 for some reason") case _: print("You get what you get") except ZeroDivisionError as exc: print(f"But sometimes you get an exception {exc}")
or:
try: raise x / y case 43: print("wow, it is 43") case 22: print("22 seemed less likely than 43 for some reason") case exc := ZeroDivisionError: print(f"But sometimes you get an exception: {exc}") case _: print("You get what you get") _______________________________________________ Python-Dev mailing list -- python-dev@python.org <mailto:python-dev@python.org> To unsubscribe send an email to python-dev-leave@python.org <mailto:python-dev-leave@python.org> https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GDP2KKB3... Code of Conduct: http://python.org/psf/codeofconduct/
--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
Glenn Linderman wrote:
On 7/10/2020 3:15 AM, Gustavo Carneiro wrote: ...
Therefore, I posit that the style of try...except indentation only works where the number of cases is small. But for the case of pattern matching, I expect the number of cases to be matched to be a lot higher than exception handling cases. Having cases to be matched be indented is, IMHO, a nice visual cue to help the reader understand where the pattern matching block ends.
Actually, the current if/elif/elif/elif/else chain, used now because Python has no switch/match/case, has exactly the same issue as you describe as a problem with try if there were more cases... and if often has more cases, just like match will.
True
So your concern seems nebulous. You may have wished for extra indentation.... but it is simple to get more indentation: use 8 spaces instead of 4. So if you really wanted it, you could have had it.
Not so true.

```
if a: ...
elif b: ...
elif c: ...
else:
```

is not valid. In practice, I interpret wanting to indent at a place that doesn't require it as a code smell suggesting I should try to break out a helper function, but ... it does happen.

-jJ
On 7/11/2020 10:36 AM, Jim J. Jewett wrote:
Glenn Linderman wrote:
On 7/10/2020 3:15 AM, Gustavo Carneiro wrote: ...
Therefore, I posit that the style of try...except indentation only works where the number of cases is small. But for the case of pattern matching, I expect the number of cases to be matched to be a lot higher than exception handling cases. Having cases to be matched be indented is, IMHO, a nice visual cue to help the reader understand where the pattern matching block ends. Actually, the current if elseif elseif elseif else, used now because Python has no switch/match/case, has exactly the same issue as you describe as a problem with try if there were more cases... and if often has more cases, just like match will. True
So your concern seems nebulous. You may have wished for extra indentation.... but it is simple to get more indentation: use 8 spaces instead of 4. So if you really wanted it, you could have had it. Not so true.
It is true, the way I meant it, but not the way you (mis-)interpreted it.
```
if a: ...
elif b: ...
elif c: ...
else:
```
is not valid. In practice, I interpret wanting to indent at a place that doesn't require it as a code smell suggesting I should try to break out a helper function, but ... it does happen.
```
if a: ...
elif b: ...
elif c: ...
else: ...
```

is perfectly valid.
I don't think anyone has asked for more indentation in that sense; it has only even come up in a suggestion of less. (2-space indents for case instead of 4) People do disagree about whether or not case statements (and possibly an else statement) should be indented more than 0 spaces compared to match.
One small question about this part of the PEP:
For the most commonly-matched built-in types (bool, bytearray, bytes, dict, float, frozenset, int, list, set, str, and tuple), a single positional sub-pattern is allowed to be passed to the call. Rather than being matched against any particular attribute on the subject, it is instead matched against the subject itself.
Correct me if I'm wrong, but I don't think the PEP currently gives us a way of enabling this behavior for classes not on this list. If so, would it be worth adding a way? It would help remove a special case and could come in handy when doing things like creating my own custom data structures, for example. After all, if `case dict(x)` makes x match the entire dict, it would be nice if I could make `case MyCustomMapping(x)` behave in the same way to keep the usage consistent.

We could maybe let classes opt-in to this behavior if they define `__match_args__ = None`? Not sure if adding the extra "is None" check when doing the match will introduce too much overhead though.

-- Michael

On Wed, Jul 8, 2020 at 8:06 AM Guido van Rossum <guido@python.org> wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching. As authors we welcome Daniel F Moisset in our midst. Daniel wrote a lot of the new text in this version, which introduces the subject matter much more gently than the first version did. He also convinced us to drop the `__match__` protocol for now: the proposal stands quite well without that kind of extensibility, and postponing it will allow us to design it at a later time when we have more experience with how `match` is being used.
That said, the new version does not differ dramatically in what we propose. Apart from dropping `__match__` we’re dropping the leading dot to mark named constants, without a replacement, and everything else looks like we’re digging in our heels. Why is that? Given the firestorm of feedback we received and the numerous proposals (still coming) for alternative syntax, it seems a bad tactic not to give up something more substantial in order to get this proposal passed. Let me explain.
Language design is not like politics. It’s not like mathematics either, but I don’t think this situation is at all similar to negotiating a higher minimum wage in exchange for a lower pension, where you can definitely argue about exactly how much lower/higher you’re willing to go. So I don’t think it’s right to propose making the feature a little bit uglier just to get it accepted.
Frankly, 90% of the issue is about what among the authors we’ve dubbed the “load/store” problem (although Tobias never tires to explain that the “load” part is really “load-and-compare”). There’s a considerable section devoted to this topic in the PEP, but I’d like to give it another try here.
In case you’ve been avoiding python-dev lately, the problem is this. Pattern matching lets you capture values from the subject, similar to sequence unpacking, so that you can write for example

```
x = range(4)
match x:
    case (a, b, *rest):
        print(f"first={a}, second={b}, rest={rest}")  # 0, 1, [2, 3]
```

Here the `case` line captures the contents of the subject `x` in three variables named `a`, `b` and `rest`. This is easy to understand by pretending that a pattern (i.e., what follows `case`) is like the LHS of an assignment.
However, in order to make pattern matching more useful and versatile, the pattern matching syntax also allows using literals instead of capture variables. This is really handy when you want to distinguish different cases based on some value, for example

```
match t:
    case ("rect", real, imag):
        return complex(real, imag)
    case ("polar", r, phi):
        return complex(r * cos(phi), r * sin(phi))
```

You might not even notice anything funny here if I didn’t point out that `"rect"` and `"polar"` are literals -- it’s really quite natural for patterns to support this once you think about it.
The problem that everybody’s been concerned about is that Python programmers, like C programmers before them, aren’t too keen to have literals like this all over their code, and would rather give names to the literals, for example

```
USE_POLAR = "polar"
USE_RECT = "rect"
```

Now we would like to be able to replace those literals with the corresponding names throughout our code and have everything work like before:

```
match t:
    case (USE_RECT, real, imag):
        return complex(real, imag)
    case (USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))
```

Alas, the compiler doesn’t know that we want `USE_RECT` to be a constant value to be matched while we intend `real` and `imag` to be variables to be given the corresponding values captured from the subject. So various clever ways have been proposed to distinguish the two cases.
This discussion is not new to the authors: before we ever published the first version of the PEP we vigorously debated this (it is Issue 1 in our tracker!), and other languages before us have also had to come to grips with it. Even many statically compiled languages! The reason is that for reasons of usability it’s usually deemed important that their equivalent of `case` auto-declare the captured variables, and variable declarations may hide (override) like-named variables in outer scopes.
Scala, for example, uses several different rules: first, capture variable names must start with a lowercase letter (so it would handle the above example as intended); next, capture variables cannot be dotted names (like `mod.var`); finally, you can enclose any variable in backticks to force the compiler to see it as a load instead of a store. Elixir uses another form of markup for loads: `x` is a capture variable, but `^x` loads and compares the value of `x`.
There are a number of dead ends when looking for a solution that works for Python. Checking at runtime whether a name is defined or not is one of these: there are numerous reasons why this could be confusing, not the least of which being that the `match` may be executed in a loop and the variable may already be bound by a previous iteration. (True, this has to do with the scope we’ve adopted for capture variables. But believe me, giving each case clause its own scope is quite horrible by itself, and there are other action-at-a-distance effects that are equally bad.)
It’s been proposed to explicitly state the names of the variables bound in a header of the `match` statement; but this doesn’t scale when the number of cases becomes larger, and requires users to do bookkeeping the compiler should be able to do. We’re really looking for a solution that tells you when you’re looking at an individual `case` which variables are captured and which are used for load-and-compare.
Marking up the capture variables with some sigil (e.g. `$x` or `x?`) or other markup (e.g. backticks or `<x>`) makes this common case ugly and inconsistent: it’s unpleasant to see for example

```
case %x, %y:
    print(x, y)
```

No other language we’ve surveyed uses special markup for capture variables; some use special markup for load-and-compare, so we’ve explored this. In fact, in version 1 of the PEP our long-debated solution was to use a leading dot. This was however booed off the field, so for version 2 we reconsidered. In the end nothing struck our fancy (if `.x` is unacceptable, it’s unclear why `^x` would be any better), and we chose a simpler rule: named constants are only recognized when referenced via some namespace, such as `mod.var` or `Color.RED`.
We believe it’s acceptable that things looking like `mod.var` are never considered capture variables -- the common use cases for `match` are such that one would almost never want to capture into a different namespace. (Just like you very rarely see `for self.i in …` and never `except E as scope.var` -- the latter is illegal syntax and sets a precedent.)
One author would dearly have seen Scala’s uppercase rule adopted, but in the end was convinced by the others that this was a bad idea, both because there’s no precedent in Python’s syntax, and because many human languages simply don’t make the distinction between lowercase and uppercase in their writing systems.
So what should you do if you have a local variable (say, a function argument) that you want to use as a value in a pattern? One solution is to capture the value in another variable and use a guard to compare that variable to the argument:

```
def foo(x, spam):
    match x:
        case Point(p, q, context=c) if c == spam:
            # Match
```

If this really is a deal-breaker after all other issues have been settled, we could go back to considering some special markup for load-and-compare of simple names (even though we expect this case to be very rare). But there’s no pressing need to decide to do this now -- we can always add new markup for this purpose in a future version, as long as we continue to support dotted names without markup, since that *is* a commonly needed case.
There’s one other issue where in the end we could be convinced to compromise: whether to add an `else` clause in addition to `case _`. In fact, we probably would already have added it, except for one detail: it’s unclear whether the `else` should be aligned with `case` or `match`. If we are to add this we would have to ask the Steering Council to decide for us, as the authors deadlocked on this question.
Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.
If you've made it so far, here are the links to check out, with an open mind. As a reminder, the introductory sections (Abstract, Overview, and Rationale and Goals) have been entirely rewritten and also serve as introduction and tutorial.
- PEP 622: https://www.python.org/dev/peps/pep-0622/ - Playground: https://mybinder.org/v2/gh/gvanrossum/patma/master?urlpath=lab/tree/playgrou...
--
--Guido van Rossum (python.org/~guido)
On Fri, Jul 10, 2020 at 9:54 AM Michael Lee <michael.lee.0x2a@gmail.com> wrote:
Hi Michael,

There is a way to do this. A class could do this:

```
class C:
    __match_args__ = ["__self__"]

    @property
    def __self__(self):
        return self
```

(You can use any name for `__self__`.) I realize this isn't particularly pretty, but we feel it's better not to add a custom `__match__` protocol at this time: the design space for that is itself quite large, and we realized that almost all "easy" applications could be had without it, while the "complicated" applications were all trying to get the `__match__` protocol to do different things.

Also, beware that if your class does this, it is stuck with this form -- if you replace `["__self__"]` with some other set of arguments, user code that is matching against your class will presumably break.

--
--Guido van Rossum (python.org/~guido)
On 7/8/20 8:02 AM, Guido van Rossum wrote:
Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.
In that case, I'd like to make a specific pitch for "don't make '_' special". (I'm going to spell it '_' as it seems to be easier to read this way; ignore the quotes.)

IIUC '_' is special in two ways:

1) we permit it to be used more than once in a single pattern, and
2) if it matches, it isn't bound.

If we forego these two exceptions, '_' can go back to behaving like any other identifier. It becomes an idiom rather than a special case.

Drilling down on what we'd need to change:

To address 1), allow using a name multiple times in a single pattern. 622 v2 already says:

    For the moment, we decided to make repeated use of names within the same pattern an error; we can always relax this restriction later without affecting backwards compatibility.

If we relax it now, then we don't need '_' to be special in this way. All in all this part seems surprisingly uncontentious.

To address 2), bind '_' when it's used as a name in a pattern. This adds an extra reference and an extra store. That by itself seems harmless. The existing implementation has optimizations here. If that's important, we could achieve the same result with a little dataflow analysis to optimize away the dead store. We could even special-case optimizing away dead stores *only* to '_' and *only* in match/case statements and all would be forgiven.

Folks point out that I18N code frequently uses a global function named '_'. The collision of these two uses is unfortunate, but I think it's survivable. I certainly don't think this collision means we should special-case this one identifier in this one context in the *language* specification. Consider:

* There's no installed base of I18N code using pattern matching, because it's a new (proposed!) syntax. Therefore, any I18N code that wants to use match/case statements will be new code, and so can be written with this (admittedly likely!) collision in mind. I18N code could address this in several ways, for example:
  o Mandate use of an alternate name for "don't care" match patterns in I18N code, perhaps '__' (two underscores). This approach seems best.
  o Use a different name for the '_' function in scopes where you're using match/case, e.g. 'gettext'.
  o Since most Python code lives inside functions, I18N code could use '_' in its match/case statements, then "del _" after the match statement. '_' would revert back to finding the global function. (This wouldn't work for code at module scope for obvious reasons. One *could* simply rebind '_', but I doubt people want to consider this approach in the first place.)
* As the PEP mentions, '_' is already a Python idiom for "I don't care about this value", e.g. "basename, _, extension = filename.partition('.')". I18N has already survived contact with this idiom.
* Similarly, '_' has a special meaning in the Python REPL. Admittedly, folks don't do a lot of I18N work in the REPL, so this isn't a problem in practice. I'm just re-making the previous point: I18N programmers already cope with other idiomatic uses of '_'.
* Static code analyzers could detect if users run afoul of this collision. "Warning: match/case using _ in module using _ for gettext" etc.

One consideration: if you *do* use '_' multiple times in a single pattern, and you *do* refer to its value afterwards, what value should it get? Consider that Python already permits multiple assignments in a single expression:

```
(x:="first", x:="middle", x:="last")
```

After this expression is evaluated, x has been bound to the value "last". I could live with "it keeps the rightmost". I could also live with "the result is implementation-defined". I suspect it doesn't matter much, because the point of the idiom is that people don't care about the value.

In keeping with this change, I additionally propose removing '*_' as a special token. '*_' would behave like any other '*identifier', binding the value to the unpacked sequence. Alternately, we could keep the special token but change it to '*' so it mirrors Python function declaration syntax. I don't have a strong opinion about this second alternative.

Cheers,
/arry
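[The rebinding behavior described here can be checked directly; CPython evaluates the tuple elements left to right, so the name ends up bound to the rightmost value:]

```python
# Multiple walrus assignments in a single expression: each := rebinds x,
# and the tuple records every intermediate value.
t = (x := "first", x := "middle", x := "last")
# t == ("first", "middle", "last"); x == "last"
```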
On 12/07/2020 11:38, Larry Hastings wrote:
+1 to everything
One consideration: if you /do/ use '_' multiple times in a single pattern, and you /do/ refer to its value afterwards, what value should it get? Consider that Python already permits multiple assignments in a single expression:
(x:="first", x:="middle", x:="last")
After this expression is evaluated, x has been bound to the value "last". I could live with "it keeps the rightmost". I could also live with "the result is implementation-defined". I suspect it doesn't matter much, because the point of the idiom is that people don't care about the value.
I'd expect it to bind to the last one. If that's in any way problematic, in order to prevent oblivious misuse, referencing an identifier that was bound more than once should raise an exception. But as you say, it doesn't really matter.
On Sun, 12 Jul 2020 at 10:47, Larry Hastings <larry@hastings.org> wrote:
In that case, I'd like to make a specific pitch for "don't make '_' special". (I'm going to spell it '_' as it seems to be easier to read this way; ignore the quotes.)
Overall, this sounds mostly reasonable. I'm cutting nearly everything here, because I don't have anything to add.
One consideration: if you do use '_' multiple times in a single pattern, and you do refer to its value afterwards, what value should it get? Consider that Python already permits multiple assignments in a single expression:
(x:="first", x:="middle", x:="last")
After this expression is evaluated, x has been bound to the value "last". I could live with "it keeps the rightmost". I could also live with "the result is implementation-defined". I suspect it doesn't matter much, because the point of the idiom is that people don't care about the value.
The problem for me is specifically with variables *other* than `_` - precisely because `_` has connotations of "don't care". If I see

```
match expr:
    case Point(x, x):  # what is x here?
```

I would very strongly expect that to mean that the two components of Point were equal, and x was set to the common value. That's not what Python does, so this would be a fairly easy mistake to make.

For what it's worth, it looks like Rust uses the same rule as the PEP - multiple occurrences of the same variable are not allowed, but _ is a wildcard that *can* be used multiple times, but isn't bound.

Paul
Hey Larry, just to clarify on a single point you make: On Sun, 12 Jul 2020 at 10:48, Larry Hastings <larry@hastings.org> wrote:
[ snip ] To address 2), bind '_' when it's used as a name in a pattern.
This adds an extra reference and an extra store. That by itself seems harmless.
This is not always just a store. for patterns like `[a, *_, b]` vs `[a, *ignore_me, b]`, the current semantics mean that the matching process has to make 2 calls to `__getitem__` on the match subject. The second case (which would be equivalent to "remove special meaning on _") will have to actually create a new list and copy most of the original one which can be arbitrarily long, so this turns an O(1) operation into O(n).
The existing implementation has optimizations here. If that's important, we could achieve the same result with a little dataflow analysis to optimize away the dead store. We could even special-case optimizing away dead stores *only* to '_' and *only* in match/case statements and all would be forgiven.
This might work, although it's quite different from what Python does in general (are you supposed to see the value of `_` in a debugger? or in `locals()`?)

Cheers,
D.
On 7/12/20 9:04 AM, Daniel Moisset wrote:
The existing implementation has optimizations here. If that's important, we could achieve the same result with a little dataflow analysis to optimize away the dead store. We could even special-case optimizing away dead stores /only/ to '_' and /only/ in match/case statements and all would be forgiven.
This might work, although it's quite different to what python does in general (are you supposed to see the value of `_` in a debugger? or in `locals()`? )
All excellent points.

The debugger question is easier to answer. Debuggers for compiled code have dealt with this for years; I'm unsure of the exact wording but gdb prints something like "<value optimized out>".

As for locals(), my first thought was "suppress the optimization in the presence of a locals() call". I dimly recall a precedent where the presence of locals() in a function body affected code generation, though sadly it escapes me at the moment**. Anyway, that seems like a nasty hack, and it only handles one method of extracting a locals dict--there's several more, including sys._getframe and inspect.getframeinfo. And then the user could rebind those and we wouldn't notice. This seems like a non-starter.

Having thought about it some, I propose it'd be acceptable to do dead store optimization if-and-only-if optimizations are explicitly enabled, e.g. with "-O". Allowing explicitly-enabled optimizations to observably affect runtime behavior does have some precedent, e.g. "-OO" which breaks doctest, docopt, etc. It'd be a shame if the existence of locals() et al meant Python could never ever perform dead store optimization.

Your other (elided) point is correct too, about sequence matching for a sequence we don't care about not being as cheap and simple as a store and an extra reference.

Cheers,
/arry

** Or maybe I'm confused and thinking of something else entirely. Maybe it was "import * inside a function body disables fast locals in Python 2"? But that doesn't seem to be true either.
On Sun, Jul 12, 2020 at 12:12 PM Larry Hastings <larry@hastings.org> wrote:
Having thought about it some, I propose it'd be acceptable to do dead store optimization if-and-only-if optimizations are explicitly enabled, e.g. with "-O". Allowing explicitly-enabled optimizations to observably affect runtime behavior does have some precedent, e.g. "-OO" which breaks doctest, docopt, etc. It'd be a shame if the existence of locals() et al meant Python could never ever perform dead store optimization.
Assuming you're still talking about how to implement wildcards, it really sounds like you're willing to add a lot of complexity just to have a "consistent" treatment of `_`. But why would you care so much about that consistency?

When I write `for x, _, _ in pts` the main point is not that I can write `print(_)` and get the z coordinate. The main point is that I am not interested in the y or the z coordinates (and showing this to the reader up front). The value assigned to `_` is uninteresting (even in a debug session, unless you're debugging Python itself). Using the same character in patterns makes intuitive sense to anyone who is familiar with this convention in Python. Furthermore it also makes sense to anyone who is familiar with patterns in other languages: *all* the languages with structural pattern matching that we surveyed use `_` -- C#, Elixir, Erlang, Scala, Rust, F#, Haskell, Mathematica, OCaml, Ruby, and Swift. (That's a much stronger precedent than the use of `?` in shell and regular expressions IMO. :-)

The need for a wildcard pattern has already been explained -- we really want to disallow `Point(x, y, y)` but we really need to allow `Point(z, _, _)`. Generating code to assign the value to `_` seems odd given the clear intent to *ignore* the value.

Using `?` as the wildcard has mostly disadvantages: it requires changes to the tokenizer, it could conflict with other future uses of `?` (it's been proposed for type annotations as a shorter version of Optional, and there's PEP 505, which I think isn't quite dead yet), and Python users have no pre-existing intuition for its meaning.

A note about i18n: it would be unfortunate if we had to teach users they couldn't use `_` as a wildcard in patterns in code that also uses `_` as part of the i18n stack (`from gettext import gettext as _` -- see gettext stdlib docs). This is a known limitation on the `for x, _, _ in ...` idiom, which I've seen people work around by writing things like `for x, __, __ in ...`.
But for patterns (because the pattern code generation needs to know about wildcards) we can't easily use that workaround. However, the solution of never assigning to `_` (by definition, rather than through dead store optimization) solves this case as well. So can we please lay this one to rest?

--
--Guido van Rossum (python.org/~guido)
On 2020-07-12 23:20, Guido van Rossum wrote: [snip]
The need for a wildcard pattern has already been explained -- we really want to disallow `Point(x, y, y)` but we really need to allow `Point(z, _, _)`. Generating code to assign the value to `_` seems odd given the clear intent to *ignore* the value.
Using `?` as the wildcard has mostly disadvantages: it requires changes to the tokenizer, it could conflict with other future uses of `?` (it's been proposed for type annotations as a shorter version of Optional, and there's PEP 505, which I think isn't quite dead yet), and Python users have no pre-existing intuition for its meaning.
FWIW, I don't think this use of '?' would conflict with the other suggested uses because this use would be initial in an expression, whereas the other uses would be non-initial. [snip]
Guido van Rossum writes:
[several reasons why not binding _ is a no-op<wink/>, and]
A note about i18n: [...]. So can we please lay this one to rest?
Yes, please! I was just about to ask about that. I could not believe that in July 2020 people were ignoring I18N, especially the fact that I18N workers are mostly language specialists, not programmers, and are quite dependent on programmers and UI/UX workers following "the usual conventions". Yes, mostly the message catalogs are produced by software that can be taught to understand other conventions. But often translators *do* need to read source to get context for the (usually) English "key" into the message catalogs. Steve
On 13/07/2020 00:20, Guido van Rossum wrote:
The need for a wildcard pattern has already been explained -- we really want to disallow `Point(x, y, y)` but we really need to allow `Point(z, _, _)`. Generating code to assign the value to `_` seems odd given the clear intent to *ignore* the value.
Would it be impossible for the parser to interpret Point(x, y, y) as "the second and third arguments of Point must have the same value in order to match. Bind that value to y"? Since the value has to be the same, it doesn't matter whether y binds to the first or the second (or the nth) instance of it. That said, in time and with all the arguments brought to the table, I personally came to accept special-casing _, although I don't especially like it. From my point of view, the biggest issue to solve is the load vs. store decision and its syntax.
On Mon, 13 Jul 2020 at 09:30, Federico Salerno <salernof11@gmail.com> wrote:
On 13/07/2020 00:20, Guido van Rossum wrote:
The need for a wildcard pattern has already been explained -- we really want to disallow `Point(x, y, y)` but we really need to allow `Point(z, _, _)`. Generating code to assign the value to `_` seems odd given the clear intent to *ignore* the value.
Would it be impossible for the parser to interpret Point(x, y, y) as "the second and third arguments of Point must have the same value in order to match. Bind that value to y"? Since the value has to be the same, it doesn't matter whether y binds to the first or the second (or the nth) instance of it.
No, that would not be impossible but fraught with problems. This is discussed in the PEP: https://www.python.org/dev/peps/pep-0622/#algebraic-matching-of-repeated-nam...
On 13/07/2020 12:28, Henk-Jaap Wagenaar wrote:
No, that would not be impossible but fraught with problems. This is discussed in the PEP: https://www.python.org/dev/peps/pep-0622/#algebraic-matching-of-repeated-nam...
The PEP goes into no detail of what these problems (or "number of subtleties") are, but it does mention how
we decided to make repeated use of names within the same pattern an error; we can always relax this restriction later without affecting backwards compatibility

which does hint at the fact that the problems are not insurmountable hurdles.
All in all not an urgent feature, but it would be nice to have from the get-go if there is agreement and it is doable.

@PEP authors: Incidentally, I am eager to start contributing and see this PEP advance—if there's anything I can do, including possibly non-code "dirty work", please let me know. I was thinking of collecting all the objections and points of contention of the PEP and the current progress on each in one place so that the pros and cons can be evaluated more clearly. Would that be useful?
On Mon, Jul 13, 2020 at 10:07 Federico Salerno <salernof11@gmail.com> wrote:
On 13/07/2020 12:28, Henk-Jaap Wagenaar wrote:
No, that would not be impossible but fraught with problems. This is discussed in the PEP:
https://www.python.org/dev/peps/pep-0622/#algebraic-matching-of-repeated-nam...
The PEP goes into no detail of what these problems (or "number of subtleties") are, but it does mention how
we decided to make repeated use of names within the same pattern an error; we can always relax this restriction later without affecting backwards compatibility
which does hint at the fact that the problems are not insurmountable hurdles.
All in all not an urgent feature, but it would be nice to have from the get-go if there is agreement and it is doable.
I find it debatable that we should have this at all, since there are other interpretations possible, and honestly I doubt that it’s a common use case. That’s why we’re holding off.
@PEP authors: Incidentally, I am eager to start contributing and see this PEP advance—if there's anything I can do, including possibly non-code "dirty work" please let me know. I was thinking of collecting all the objections and points of contention of the PEP and the current progress on each in one place so that the pros and cons can be evaluated more clearly. Would that be useful?
Thanks for offering, but we’ve got that under control. —Guido
-- --Guido (mobile)
On 13/07/2020 19:17, Guido van Rossum wrote:
I find it debatable that we should have this at all, since there are other interpretations possible, and honestly I doubt that it’s a common use case. That’s why we’re holding off.
Fair enough. All it would do is save code in a guard condition, after all.
Thanks for offering, but we’ve got that under control.

I wasn't trying to imply otherwise. :)
On 12/07/2020 23:20, Guido van Rossum wrote:
So can we please lay this one to rest?
Sure. One small thing before we leave it; I've decided I don't care about the special cases of not using _. to lead class names, but forbidding **_ in mapping patterns seems unnecessary. I know it's redundant, but I can imagine using it for emphasis. I can't think of anywhere else the language forbids something just because it isn't needed, though I didn't get a lot of sleep last night and I could well be missing something obvious :-)

Can I use pattern matching to pull byte strings apart? I thought I could, but trying it out in the playground didn't work at all. :-(

-- Rhodri James *-* Kynesim Ltd
On Mon, Jul 13, 2020 at 04:35 Rhodri James <rhodri@kynesim.co.uk> wrote:
On 12/07/2020 23:20, Guido van Rossum wrote:
So can we please lay this one to rest?
Sure. One small thing before we leave it; I've decided I don't care about the special cases of not using _. to lead class names, but forbidding **_ in mapping patterns seems unnecessary. I know it's redundant, but I can imagine using it for emphasis. I can't think of anywhere else the language forbids something just because it isn't needed, though I didn't get a lot of sleep last night and I could well be missing something obvious :-)
I’d rather not. And the argument about disallowing obviously redundant syntax seems weak. My worry about allowing this is that it’ll be cargo culted and we’ll see it used not for emphasis (of what? The obvious?) but because people think it’s needed. And that’s just clutter.

Can I use pattern matching to pull byte strings apart? I thought I could, but trying it out in the playground didn't work at all. :-(
It’s explicitly forbidden by the PEP, because we don’t want str or bytes to accidentally match sequence patterns. You could do ‘match list(b):’ if you really wanted to, but I think there are better tools for parsing bytes or strings. —Guido -- --Guido (mobile)
On 13/07/2020 15:33, Guido van Rossum wrote:
On Mon, Jul 13, 2020 at 04:35 Rhodri James <rhodri@kynesim.co.uk> wrote:
[Re: forbidding **_ in mapping patterns]
I’d rather not. And the argument about disallowing obviously redundant syntax seems weak. My worry about allowing this is that it’ll be cargo culted and we’ll see it used not for emphasis (of what? The obvious?) but because people think it’s needed. And that’s just clutter.
Fair enough. I'd likely use it to remind myself of cases when there will always be more keys in the mapping, but a comment will do just as well.
Can I use pattern matching to pull byte strings apart? I thought I could, but trying it out in the playground didn't work at all. :-(
It’s explicitly forbidden by the PEP, because we don’t want str or bytes to accidentally match sequence patterns. You could do ‘match list(b):’ if you really wanted to, but I think there are better tools for parsing bytes or strings.
:-( As an embedded engineer, pulling apart network protocols was the first use I thought of for matching. Ah well. -- Rhodri James *-* Kynesim Ltd
On 7/12/20 3:20 PM, Guido van Rossum wrote:
On Sun, Jul 12, 2020 at 12:12 PM Larry Hastings <larry@hastings.org <mailto:larry@hastings.org>> wrote:
Having thought about it some, I propose it'd be acceptable to do dead store optimization if-and-only-if optimizations are explicitly enabled, e.g. with "-O". Allowing explicitly-enabled optimizations to observably affect runtime behavior does have some precedent, e.g. "-OO" which breaks doctest, docopt, etc. It'd be a shame if the existence of locals() et al meant Python could never ever perform dead store optimization.
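For context on what "dead store optimization" would change: CPython currently performs no dead store elimination, which is exactly why `locals()` can observe such stores today. A minimal illustration (the variable name is made up):

```python
def f():
    unused = 1   # a dead store: the value is never read afterwards
    return locals()

print(f())  # {'unused': 1} -- the store survives, so locals() sees it
```

Under Larry's proposal, `-O` could legitimately drop the store, and `f()` would then return `{}`.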
Assuming you're still talking about how to implement wildcards, it really sounds like you're willing to add a lot of complexity just to have a "consistent" treatment of `_`. But why would you care so much about that consistency?
I'm a fan of the Zen guidance here: "special cases aren't special enough to break the rules". More on this topic in a moment--rather than reorder paragraphs, let me return to this below.
Using the same character in patterns makes intuitive sense to anyone who is familiar with this convention in Python. Furthermore, it also makes sense to anyone who is familiar with patterns in other languages: *all* languages with structural pattern matching that we found use `_` -- C#, Elixir, Erlang, Scala, Rust, F#, Haskell, Mathematica, OCaml, Ruby, and Swift. (That's a much stronger precedent than the use of `?` in shell and regular expressions IMO. :-)
Python hasn't been afraid to go its own way syntactically in the past. Consider the conditional (ternary) operator. Most languages I've encountered with a conditional operator just copy C's syntax, with '?' and ':' (PHP, C#, Java). Some languages don't need a conditional operator, as their existing flow control already works just fine (FORTH, Rust). Python's syntax for the conditional operator was neither, and was AFAIK unique--but this syntax was judged the most Pythonic, so it won. Similarly, AFAIK Python's "None" is unique. Most other languages I've seen use the word "null", albeit with varying capitalization.

So I'm unconcerned about Python using a different token for the wildcard pattern. Python already doesn't look like other languages, and Python's proposed syntax for pattern matching isn't exactly like the pattern matching syntax of other languages. I don't understand why it's so important that it look like other languages in this one specific respect.

As for leveraging the convention of using '_' for values you don't care about in Python--that's actually why I /don't/ like it as the wildcard pattern. To date, everyone who uses '_' understands it's just an identifier, no different from any other identifier. I imagine I18N programmers avoid this convention for exactly that reason--there's nothing special about '_', so they need to take care not to overwrite or occlude it with a don't-care value.

However, if I understand PEP 622 correctly, the places you use '_' as the wildcard pattern are also places where you could put an identifier. But in this one context, '_' doesn't behave like the other identifiers, even though in every other context in Python it still does. This is the "special case" that "breaks the rules" I alluded to above. Consistency with the longstanding semantics of '_', and consistency with other identifiers, is much more important to me than consistency with other languages for the pattern matching wildcard token.
Using `?` as the wildcard has mostly disadvantages: it requires changes to the tokenizer, it could conflict with other future uses of `?` (it's been proposed for type annotations as a shorter version of Optional, and there's PEP 505, which I think isn't quite dead yet), and Python users have no pre-existing intuition for its meaning.
One reason I prefer '?' for the wildcard pattern is precisely /because/ users have no pre-existing intuition as to its meaning. Unlike '_', the user would have no preconceived notion about its semantics to unlearn. Also, it doesn't behave like an identifier, and accordingly it doesn't look like an identifier. This strikes me as harmonious.

Is changing the tokenizer to support '?' as a token a big deal? You mention two other existing proposals to use it as a token--surely this is a bridge we'll have to cross sooner or later.

My goal in starting this discussion was to see if we could find a compromise everyone could live with. People who want to use '_' for the wildcard pattern could do so, and people who didn't like '_' having a special meaning in this one context would be appeased. The message I'm getting is "this compromise won't work". Okay, fair enough. I don't plan to pursue it any further.

Cheers, //arry/
Larry Hastings wrote:
As for leveraging the convention of using '_' for values you don't care about in Python--that's actually why I /don't/ like it as the wildcard pattern. To date, everyone who uses '_' understands it's just an identifier, no different from any other identifier.
Not quite... I understand it more like a file in /tmp: I don't use it for anything I will want later, just in case.
However, if I understand PEP 622 correctly, the places you use '_' as the wildcard pattern are also places where you could put an identifier. But in this one context, '_' doesn't behave like the other identifiers, even though in every other context in Python it still does. This is the "special case" that "breaks the rules" I alluded to above. Consistency with the longstanding semantics of '_', and consistency with other identifiers, is much more important to me than consistency with other languages for the pattern matching wildcard token.
If a normal variable name is re-used, I would expect it to have the same meaning. I know that "case x, x:" as shorthand for "case x, __x if x == __x:" has been postponed, but it could still happen later, and it would be a problem if that ever became legal without requiring the two bindings to match. I do NOT assume that they will match if the variable happens to be _, though I suppose others might. -jJ
On 07/14/2020 09:22 AM, Jim J. Jewett wrote:
Larry Hastings wrote:
As for leveraging the convention of using '_' for values you don't care about in Python--that's actually why I /don't/ like it as the wildcard pattern. To date, everyone who uses '_' understands it's just an identifier, no different from any other identifier.
However, if I understand PEP 622 correctly, the places you use '_' as the wildcard pattern are also places where you could put an identifier. But in this one context, '_' doesn't behave like the other identifiers, even though in every other context in Python it still does. This is the "special case" that "breaks the rules" I alluded to above. Consistency with the longstanding semantics of '_', and consistency with other identifiers, is much more important to me than consistency with other languages for the pattern matching wildcard token.
Looking at other languages for inspiration is great, but like Larry I think we should make sure our constructs fit with Python, not with them.
I know that "case x, x:" as shorthand for "case x, __x if x == __x:" has been postponed, but it could still happen later, and it would be a problem if that ever became legal without requiring the two bindings to match. I do NOT assume that they will match if the variable happens to be _, though I suppose others might.
If we use `?` instead of `_`, then repeated `?` won't be a problem, and repeated `_` should be disallowed. Since `_` is a normal variable name, the requirement for their values to match (when that is finally implemented) would make sense, and shouldn't be a burden to remember given that the "don't care" symbol is a `?`. -- ~Ethan~
On 13/07/20 7:12 am, Larry Hastings wrote:
I dimly recall a precedent where the presence of locals() in a function body affected code generation,
The presence of exec used to do that, which is why it was a statement rather than a function. But I don't think locals() ever did -- how would the compiler know that it was calling the builtin locals function and not something else? -- Greg
On Sun, Jul 12, 2020 at 7:36 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 13/07/20 7:12 am, Larry Hastings wrote:
I dimly recall a precedent where the presence of locals() in a function body affected code generation,
The presence of exec used to do that, which is why it was a statement rather than a function. But I don't think locals() ever did -- how would the compiler know that it was calling the builtin locals function and not something else?
super() does something similar:
>>> class A:
...     def super_locals(self):
...         super
...         return locals()
...     def superless_locals(self):
...         return locals()
...
>>> A().super_locals()
{'self': <__main__.A object at 0x000001FF53BCE6D8>, '__class__': <class '__main__.A'>}
>>> A().superless_locals()
{'self': <__main__.A object at 0x000001FF53BCE7B8>}
The compiler changes what local variables exist if there is a read from a variable named 'super', in order to support zero-argument super() calls. It presumably could do the same sort of thing for locals(). I don't think this is a good idea, since locals() is a debugging tool, and changing reality based on the presence of debugging calls may make life more difficult for the user. -- Devin
On 13/07/20 3:28 pm, Devin Jeanpierre wrote:
The compiler changes what local variables exist if there is a read from a variable named 'super',
That's fairly harmless if there's a false positive. But accidentally getting all your locals de-optimised would be annoying. -- Greg
Hello everyone,

I'm sorry if my proposition has already been made, or even withdrawn, but I think that capture variables shouldn't be as implicit as they are now. I didn't see any mention of capture variable patterns in the rejected ideas. So here is my idea:

I've looked at the PEP very quickly, jumping to the examples to get a taste and an idea of what was going on here. I saw a new kind of control structure based on structural pattern matching (patterns based on classes or compositions of classes, to make it short). A very good idea, emphasized by Tobias Kohn ("Another take on PEP 622"), is that pattern matching eases the writing of code based on matching such structures, and that capturing values stored inside these structures at match time really eases the writing of the code associated with each match.

But... looking at the examples, it wasn't very obvious that some variables were capturing variables and some others were matching ones. I then read in detail some rules about how to tell what is a captured variable. But I'm not sure everybody will do this...

The Zen of Python tells us that "explicit is better than implicit". I know this is not a rule written in stone, but I think it applies very well here. Guido said:
We’re really looking for a solution that tells you when you’re looking at an individual case which variables are captured and which are used for load-and-compare.
Marking up the capture variables with some sigil (e.g. $x or x?) or other markup (e.g. backticks or <x>) makes this common case ugly and inconsistent: it’s unpleasant to see for example
case %x, %y: print(x, y)
Guido talks about a "sigil", which seems to be a meaningless mark only there to help the parser understand what the developer was writing. I propose that this "sigil" be the assignment mark, "=". Look:

z = 42
match pt:
    case x=, y=, z:
        print(x, y, "z == 42")

Or this one:

def make_point_3d(pt):
    match pt:
        case (x=, y=):
            return Point3d(x, y, 0)
        case (x=, y=, z=):
            return Point3d(x, y, z)
        case Point2d(x=, y=):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")

On the need to be explicit:

Simple case blocks will perhaps be a bit longer to write, but will not be harder to read, since they stay in the "simple case blocks" family. More complex cases will be harder to write, but the explicit markup will help readers understand what will be captured and where, and what will be looked-up-and-matched, using already known rules: looked-up-and-matched expressions will be computed as usual, then compared with the match term, and captured expressions will be computed to an l-value (which is much more restrictive than arbitrary expressions).

Moreover, explicitly introducing a difference between "capture" and "look-and-match" will help newcomers understand what the point of a match is without having to look at a PEP or other normative document. Remember that code has to be readable, because it will be read much more often than it is written. The reader has to understand quickly, but not in detail, what will happen. Being explicit relieves them of the task of concentrating on this point.

Also remember that Python has to be taught, and that everything implicit in the code has to be made explicit when teaching. And the longer you spend teaching microdetails like the difference between "capture" and "look-and-match", the less your audience will be inclined to listen to you.

On the drawback of being explicit:

Adding a mark to every captured variable can and will be annoying. More annoying than not adding it. It's obvious.
But we don't expect to have more than a handful of captured variables per case. If there are more, the case is perhaps too complex, not to say too complicated. Using a carefully chosen mark can alleviate this drawback; using an already accepted and commonly used symbol will surely help.

I know that Python will diverge from other languages on this point. Yes, but Python has already diverged from other languages, and the community sees that as being for the better. Example: the conditional expression, aka the "ternary operator", aka "x = blabla if plop else bla". And I'd be a bit embarrassed to have to explain that "captured variables" look like simple expressions "but are not" because that's how things are written in other functional languages. I'm not sure it will convince anybody who isn't already familiar with pattern matching in functional languages (which is a big topic by itself).

On the syntax:

Using the "=" character is well known, easy to find on a keyboard, and already carries the semantics of "putting a value in a variable". So this mark is not a random sigil, but the way we write assignments, with an l-value, an "=" character, and an r-value. The r-value is given by the match and is omitted here. And even if only a few people know what "l-value" means, everybody knows what is allowed on the left side of an "=".

Moreover, this kind of syntax already exists in the Python world, in default function parameter values:

def make_point(x=0, y=0):
    return x, y

Here the r-value is present. The l-value still has a defined semantics that is easy to learn and understand without having to read the Python semantics book or any PEPs. And it is still the same semantics of "putting a value in a variable". And see how there is "x=" in the definition, but just "x" in the body of the function. Like in the case block.
That's why I think it will add value to be explicit about captured variables, and that choosing a meaningful mark can clarify many implicit, and hard to understand, patterns. Emmanuel
On Fri, 17 Jul 2020 at 12:26, <emmanuel.coirier@caissedesdepots.fr> wrote:
Hello everyone,
I'm sorry if my proposition has already been made, or even withdrawn, but I think that capture variables shouldn't be as implicit as they are now. I didn't see any mention of capture variable patterns in the rejected ideas. So here is my idea:

I've looked at the PEP very quickly, jumping to the examples to get a taste and an idea of what was going on here. I saw a new kind of control structure based on structural pattern matching (patterns based on classes or compositions of classes, to make it short). A very good idea, emphasized by Tobias Kohn ("Another take on PEP 622"), is that pattern matching eases the writing of code based on matching such structures, and that capturing values stored inside these structures at match time really eases the writing of the code associated with each match.

But... looking at the examples, it wasn't very obvious that some variables were capturing variables and some others were matching ones. I then read in detail some rules about how to tell what is a captured variable. But I'm not sure everybody will do this...

The Zen of Python tells us that "explicit is better than implicit". I know this is not a rule written in stone, but I think it applies very well here.
Guido said :
We’re really looking for a solution that tells you when you’re looking at an individual case which variables are captured and which are used for load-and-compare.
Marking up the capture variables with some sigil (e.g. $x or x?) or other markup (e.g. backticks or <x>) makes this common case ugly and inconsistent: it’s unpleasant to see for example
case %x, %y: print(x, y)
Guido talks about a "sigil", which seems to be a meaningless mark only there to help the parser understand what the developer was writing.

I propose that this "sigil" be the assignment mark, "=". Look:

z = 42
match pt:
    case x=, y=, z:
        print(x, y, "z == 42")

Or this one:

def make_point_3d(pt):
    match pt:
        case (x=, y=):
            return Point3d(x, y, 0)
        case (x=, y=, z=):
            return Point3d(x, y, z)
        case Point2d(x=, y=):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")
I kind of agree it is nicer to be more explicit, but somehow x= looks ugly. It occurred to me (and, again, apologies if this has already been mentioned) that we might use the `as` keyword here. The example above would become:

def make_point_3d(pt):
    match pt:
        case (as x, as y):
            return Point3d(x, y, 0)
        case (as x, as y, as z):
            return Point3d(x, y, z)
        case Point2d(as x, as y):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")

If having "as x" as a standalone expression without anything to the left of "as" causes confusion, we could instead mandate the use of _ thus:

case (_ as x, _ as y):
    return Point3d(x, y, 0)

On the need to be explicit:
Simple case blocks will perhaps be a bit longer to write, but will not be harder to read, since they stay in the "simple case blocks" family.

More complex cases will be harder to write, but the explicit markup will help readers understand what will be captured and where, and what will be looked-up-and-matched, using already known rules: looked-up-and-matched expressions will be computed as usual, then compared with the match term, and captured expressions will be computed to an l-value (which is much more restrictive than arbitrary expressions).

Moreover, explicitly introducing a difference between "capture" and "look-and-match" will help newcomers understand what the point of a match is without having to look at a PEP or other normative document.

Remember that code has to be readable, because it will be read much more often than it is written. The reader has to understand quickly, but not in detail, what will happen. Being explicit relieves them of the task of concentrating on this point.

Also remember that Python has to be taught, and that everything implicit in the code has to be made explicit when teaching. And the longer you spend teaching microdetails like the difference between "capture" and "look-and-match", the less your audience will be inclined to listen to you.

On the drawback of being explicit:

Adding a mark to every captured variable can and will be annoying. More annoying than not adding it. It's obvious. But we don't expect to have more than a handful of captured variables per case. If there are more, the case is perhaps too complex, not to say too complicated.

Using a carefully chosen mark can alleviate this drawback. Using an already accepted and commonly used symbol will surely help.

I know that Python will diverge from other languages on this point. Yes, but Python has already diverged from other languages, and the community sees that as being for the better. Example: the conditional expression, aka the "ternary operator", aka "x = blabla if plop else bla".

And I'd be a bit embarrassed to have to explain that "captured variables" look like simple expressions "but are not" because that's how things are written in other functional languages. I'm not sure it will convince anybody who isn't already familiar with pattern matching in functional languages (which is a big topic by itself).
On the syntax:

Using the "=" character is well known, easy to find on a keyboard, and already carries the semantics of "putting a value in a variable".

So this mark is not a random sigil, but the way we write assignments, with an l-value, an "=" character, and an r-value. The r-value is given by the match and is omitted here.

And even if only a few people know what "l-value" means, everybody knows what is allowed on the left side of an "=".

Moreover, this kind of syntax already exists in the Python world, in default function parameter values:

def make_point(x=0, y=0):
    return x, y

Here the r-value is present. The l-value still has a defined semantics that is easy to learn and understand without having to read the Python semantics book or any PEPs. And it is still the same semantics of "putting a value in a variable".

And see how there is "x=" in the definition, but just "x" in the body of the function. Like in the case block.

That's why I think it will add value to be explicit about captured variables, and that choosing a meaningful mark can clarify many implicit, and hard to understand, patterns.
Emmanuel
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert
On 7/17/20 8:26 AM, Gustavo Carneiro wrote:
I kind of agree it is nicer to be more explicit. But somehow x= looks ugly. It occurred to me (and, again, apologies if already been mentioned), we might use the `as` keyword here.
The problem with any kind of sigil/keyword is that it becomes line noise -- we would have to train ourselves to ignore them in order to see the structure and variables we are actually interested in. Once we become adept at ignoring them, we will again have difficulties when debugging as we won't easily see them. Besides which, the problem is solved: - namespace.var is a lookup - var is a store -- ~Ethan~
On Fri, 17 Jul 2020, Ethan Furman wrote:
The problem with any kind of sigil/keyword is that it becomes line noise -- we would have to train ourselves to ignore them in order to see the structure and variables we are actually interested in. Once we become adept at ignoring them, we will again have difficulties when debugging as we won't easily see them.
Besides which, the problem is solved:
- namespace.var is a lookup - var is a store
Regardless of how hard this is being pushed, I beg to disagree. Matching is easy to think of as an extension of destructuring assignment, and could easily be that, if we just don't introduce incompatible rules. Everything currently allowed in an assignment is a store, so it follows that the markup for lookups must be something that is not currently allowed in an assignment. Besides literal constants, my preference is for ``` == x ```, with the obvious opening for future extension. /Paul
Ethan Furman wrote:
The problem with any kind of sigil/keyword is that it becomes line noise -- we would have to train ourselves to ignore them in order to see the structure and variables we are actually interested in. Once we become
Every syntax element can become noise once we're used to it. This is how Groovy was built from Java: they removed everything that could be removed while still being "understandable" by the compiler. The result is a language that is counter-intuitive for people who don't do Groovy every day... Can I write a thing like this? Seems to work... And with that? Works too, but I don't know if it produces the same effect...

We can also think of syntax elements as structural elements, like pillars, helping thought to take shape while reading the code. Pillars are constraints for people in a building (they block your view, you have to walk around them, ...), but they help build bigger constructions, and we're all used to them.

In this slightly modified example from the PEP:

match entree[-1]:
    case Sides.SPAM:
        response = "Have you got anything without Spam?"
    case "lettuce":
        response = "blablabla"
    case side:
        response = f"Well, could I have their Spam instead of the {side} then?"
    case 1542 | "plop":
        response = "blablabla2"

it's difficult for someone not mastering this feature to see immediately that "side" will get its value changed and that the last case will never match.

match entree[-1]:
    case Sides.SPAM:
        response = "Have you got anything without Spam?"
    case "lettuce":
        response = "blablabla"
    case side=:
        response = f"Well, could I have their Spam instead of the {side} then?"
    case 1542 | "plop":
        response = "blablabla2"

Here we immediately see that the first two cases don't work in the same way as the third, because there is "something more". It may even indicate that the last case is useless...
adept at ignoring them, we will again have difficulties when debugging as we won't easily see them. Besides which, the problem is solved:
namespace.var is a lookup var is a store
These rules can't be deduced from a few examples, or from experience from other languages. You have to explicitly learn them. Since newcomers won't propably learn them first (if at all), they won't understand how it works, and they will propably introduce bugs hard to debug. They'll think it's a kind of "swith case" new construct, and will use it that way, completly ignoring the "capturing" property that shows in some cases and not in others. match entree[-1]: case Sides.SPAM: # newcomers will understand that entree[-1] == Sides.SPAM and write the code they need SPAM = "spam" match entree[-1]: case SPAM: # newcomers will understand that entree[-1] == "spam" and write the code they need # ignoring that now, in the following code, SPAM == anything but "spam" # introducing a bug anywhere in the following code where SPAM is expected to hold the # initial value Only a unit test case that test that SPAM has changed can detect this kind of bug. Generally speaking, unit test cases don't test values of "constants" before and after a test case. So it won't even help. Here, we can argue that match is not a "switch case" like syntax, but newcomers from C, Java, Javascript, whatever WILL use it like a "switch case", and WILL read code where it will be used like that. Even if it's not the main use case, it will be used for that, because of 50 years of history of C that we can't ignore. Adding a "=" or something else will at least ring a bell. We can argue that constants should be namespaced, but is it a general way of doing ? People can write "from module import SPAM" or "import module; module.SPAM". This is equivalent, but in one case, it may introduce a bug. Do not forget that Python will be used by many more newcomers, new learners, new developers, data scientists, people with unknow backgrounds, and perhaps few, or no experience in programming. IMHO Python strength is that it's syntax is easy to learn because it is easy to deduce. 
The some rules that are counter-intuitive like the "else" clause for the loops can't cause any harm if misused because their misuse is detected immediatly, and we can avoid writing them (and most people do). On the other hand, "capturing" variables mixed with "match" variables is counter-intuitive unless you explicitly learn the rules. You can't deduce it (there rules don't exist anywhere else). This feature is central of the PEP and will be used, and will introduce subtle bugs when misused. That's why I consider the rules you stated is not the right way for this feature, and that we should be explicit.
On Sat, Jul 18, 2020 at 3:29 AM <emmanuel.coirier@caissedesdepots.fr> wrote:
adept at ignoring them, we will again have difficulties when debugging as we won't easily see them. Besides which, the problem is solved:

    namespace.var is a lookup
    var is a store

These rules can't be deduced from a few examples, or from experience with other languages. You have to learn them explicitly. Since newcomers probably won't learn them first (if at all), they won't understand how it works, and they will probably introduce bugs that are hard to debug. They'll think it's a kind of new "switch case" construct, and will use it that way, completely ignoring the "capturing" property that shows up in some cases and not in others.

```
match entree[-1]:
    case Sides.SPAM:
        # newcomers will understand that entree[-1] == Sides.SPAM
        # and write the code they need
```

```
SPAM = "spam"
match entree[-1]:
    case SPAM:
        # newcomers will understand that entree[-1] == "spam"
        # and write the code they need, ignoring that now, in the
        # following code, SPAM == anything but "spam" -- introducing
        # a bug anywhere SPAM is expected to hold the initial value
```
If a constant's actually constant, as in `SPAM: Final = "spam"`, then it'll throw an error. Likewise, the first time it does something totally unexpected, like inserting something into what they thought held a match pattern, it'll break their initial assumptions and hopefully get them to read the documentation and form a more accurate mental model. As long as

    namespace.var is a lookup
    var is a store

is big, bold, and front & center in the docs, I think everyone will catch on very quickly and wrap their vars in a class, even if they never use it for more than a glorified switch-case. Designing an entire feature around what someone who's never encountered it before thinks it might do doesn't seem useful, since anyone could bring any number of assumptions.

-Em
Emily Bowman wrote:
SPAM: Final = "spam" then it'll throw an error. Likewise, the first time it does something totally unexpected like insert something into what they thought held a match pattern, it'll break their initial assumptions and hopefully get them to read the documentation, to form a more accurate mental model.
Currently, the following example works without any error or warning on Guido's current build. I'm aware that it is not the final version, but so far I haven't seen anything guarding Final-annotated values from being overwritten at runtime (either by an assignment or by a case clause).

```
from typing import Final

FIVE_VALUE: Final = 5

a = (7, 8)
match a:
    case (FIVE_VALUE, 8):
        print("in five value clause")
    case _:
        print("in default clause")

print(f"Value of FIVE_VALUE: {FIVE_VALUE}")
```

But I concede that overwriting names that are Final could at least produce some warnings.
As long as

    namespace.var is a lookup
    var is a store

is big, bold, and front & center in the docs, I think everyone will catch on very quickly and wrap their vars in a class, even if they never use it for more than a glorified switch-case.
My point is a bit deeper. I consider these rules a bit clumsy. I've understood why they have been designed that way, but they don't look Pythonic -- as if the scaffolding had been left in place.
Designing an entire feature around what someone who's never encountered it before thinks it might do doesn't seem useful, since anyone could bring any number of assumptions.
I'm sorry to disagree. Apple has built its brand on the fact that you didn't need the doc to successfully use their products. I don't think that all features of the language have to be that obvious, but the first look by some random dev should help them catch the thing and avoid such a pitfall. -- Emmanuel
On 7/18/2020 6:23 AM, emmanuel.coirier@caissedesdepots.fr wrote:
Ethan Furman wrote:
The problem with any kind of sigil/keyword is that it becomes line noise -- we would have to train ourselves to ignore them in order to see the structure and variables we are actually interested in. Once we become
[snip much]
On the other hand, "capturing" variables mixed with "match" variables are counter-intuitive unless you explicitly learn the rules. You can't deduce them (these rules don't exist anywhere else). This feature is central to the PEP and will be used, and will introduce subtle bugs when misused.

That's why I consider the rules you stated not the right way for this feature, and that we should be explicit.
It seems to me that whether one expects simple names in case headers to be sources or targets depends on how one analogizes the match code in case headers.

If one sees it as analogous to imperative elif conditions, where names are value sources, then one likely expects that. If one sees match code as analogous to target or parameter lists, where names declare binding targets, then one likely expects that behavior instead. Both analogies are inexact because match code needs to have both sources and targets. Different people will have different preferences and expectations.

I happen to prefer the parameter-list analogy because conditions are executable expressions while match code is not, and is by intention partly to mostly declarative, with the implementation in logic and expressions left to the compiler.

-- Terry Jan Reedy
On 18/07/2020 11:23, emmanuel.coirier@caissedesdepots.fr wrote:
Ethan Furman wrote:
The problem with any kind of sigil/keyword is that it becomes line noise -- we would have to train ourselves to ignore them in order to see the structure and variables we are actually interested in. Once we become

Every syntax element can become noise once we're used to it. This is how Groovy is built from Java: they removed everything that can be removed while staying "understandable" by the compiler. The result is a language that is counter-intuitive for people who don't do Groovy every day... Can I write a thing like this? Seems to work... And with that? Works too, but I don't know if it produces the same effect...

We can also think of syntax elements as structural elements, like pillars, helping the thought to elaborate while reading the code. Pillars are constraints for people in a building (they block your view, you have to walk around them, ...), but they help build bigger constructions, and we're all used to them.

In this slightly modified example from the PEP:

```
match entree[-1]:
    case Sides.SPAM:
        response = "Have you got anything without Spam?"
    case "letuce":
        response = "blablabla"
    case side:
        response = f"Well, could I have their Spam instead of the {side} then?"
    case 1542 | "plop":
        response = "blablabla2"
```

It's difficult for someone not mastering this feature to see immediately that "side" will get its value changed and that the last case will never match.
+1
```
match entree[-1]:
    case Sides.SPAM:
        response = "Have you got anything without Spam?"
    case "letuce":
        response = "blablabla"
    case side=:
        response = f"Well, could I have their Spam instead of the {side} then?"
    case 1542 | "plop":
        response = "blablabla2"
```

Here we immediately see that the first two cases don't work the same way as the third, because there is "something more". It may even indicate that the last case is useless...
adept at ignoring them, we will again have difficulties when debugging as we won't easily see them. Besides which, the problem is solved:

    namespace.var is a lookup
    var is a store

These rules can't be deduced from a few examples, or from experience with other languages. You have to learn them explicitly.
+1
Since newcomers probably won't learn them first (if at all), they won't understand how it works, and they will probably introduce bugs that are hard to debug. They'll think it's a kind of new "switch case" construct and will use it that way, completely ignoring the "capturing" property that shows up in some cases and not in others.

```
match entree[-1]:
    case Sides.SPAM:
        # newcomers will understand that entree[-1] == Sides.SPAM
        # and write the code they need
```

```
SPAM = "spam"
match entree[-1]:
    case SPAM:
        # newcomers will understand that entree[-1] == "spam"
        # and write the code they need, ignoring that now, in the
        # following code, SPAM == anything but "spam" -- introducing
        # a bug anywhere SPAM is expected to hold the initial value
```

Only a unit test that checks that SPAM has changed can detect this kind of bug. Generally speaking, unit tests don't check the values of "constants" before and after a test case, so that won't help either.
Here, we can argue that match is not a "switch case"-like syntax, but newcomers from C, Java, JavaScript, whatever WILL use it like a "switch case", and WILL read code where it is used like that. Even if it's not the main use case, it will be used for that, because of 50 years of C history that we can't ignore. Adding a "=" or something else will at least ring a bell.

We can argue that constants should be namespaced, but is that the general way of doing things? People can write "from module import SPAM" or "import module; module.SPAM". These are equivalent, but in one case it may introduce a bug.

Do not forget that Python will be used by many more newcomers, new learners, new developers, data scientists, people with unknown backgrounds, and perhaps little or no experience in programming. IMHO Python's strength is that its syntax is easy to learn because it is easy to deduce. Some rules that are counter-intuitive, like the "else" clause on loops, can't cause any harm if misused because their misuse is detected immediately, and we can avoid writing them (and most people do).
On the other hand, "capturing" variables mixed with "match" variables are counter-intuitive unless you explicitly learn the rules. You can't deduce them (these rules don't exist anywhere else). This feature is central to the PEP and will be used, and will introduce subtle bugs when misused.
That's why I consider the rules you stated not the right way for this feature, and that we should be explicit.

+1. Explicit is better than implicit.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HQLSHN4I...
Code of Conduct: http://python.org/psf/codeofconduct/
On Fri, 17 Jul 2020 at 12:28, Gustavo Carneiro <gjcarneiro@gmail.com> wrote:
On Fri, 17 Jul 2020 at 12:26, <emmanuel.coirier@caissedesdepots.fr> wrote:
Hello everyone, (...) But... looking at the examples, it wasn't very obvious that some variables were capturing variables and some others were matching ones. I then read in detail some rules about how to discover what a captured variable is. But I'm not sure everybody will do this...
Zen of Python tells us that "explicit is better than implicit". I know this is not a rule written in the stone, but I think here, it applies very well.
I also dislike the idea of undotted names being assigned with no extra visual clues, and the scenario described by Emmanuel in the other e-mail about people changing variables when they think they are making a match (essentially introducing the _same_ problem that `if (SPAM = 0):` had in C, which Python used to justify assignment not being an expression for over 20 years).

So, adding to the bikeshed color possibilities, alongside the "x=, y=" in this first e-mail or the "_ as x, _ as y" from Gustavo, I present the possibility of making the walrus mandatory for capture. Maybe it is a bit "too much typing" (the walrus will require 5-6 keystrokes with the surrounding spaces), but I think the final look can be pleasantly intuitive:

```
match my_point:
    case (x := _, y := _) | Point2d(x := _, y := _):
        return Point3d(x, y, 0)
```
Guido said :
We’re really looking for a solution that tells you when you’re looking at an individual case which variables are captured and which are used for load-and-compare.
Marking up the capture variables with some sigil (e.g. $x or x?) or other markup (e.g. backticks or <x>) makes this common case ugly and inconsistent: it’s unpleasant to see for example
```
case %x, %y:
    print(x, y)
```
Guido talks about a "sigil", which seems to be a meaningless mark only there to help the parser understand what the dev was writing.
I propose that this "sigil" be the assignment mark: "=". Look:

```
z = 42
match pt:
    case x=, y=, z:
        print(x, y, "z == 42")
```
(...) Gustavo Carneiro <gjcarneiro@gmail.com> wrote:
I kind of agree it is nicer to be more explicit. But somehow x= looks ugly. It occurred to me (and, again, apologies if this has already been mentioned) that we might use the `as` keyword here.
The example above would become:
```
def make_point_3d(pt):
    match pt:
        case (as x, as y):
            return Point3d(x, y, 0)
        case (as x, as y, as z):
            return Point3d(x, y, z)
        case Point2d(as x, as y):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")
```
If having "as x" as a standalone expression without anything to the left of "as" causes confusion, we could instead mandate the use of _ thus:
```
case (_ as x, _ as y):
    return Point3d(x, y, 0)
```
On 7/17/2020 7:23 AM, emmanuel.coirier@caissedesdepots.fr wrote:
Hello everyone,
I'm sorry if my proposition has already been made, or even withdrawn, but I think that capture variables shouldn't be as implicit as they are now.
I've looked at the PEP very quickly, jumping to the examples to get a taste and an idea of what was going on here. I saw a new kind of control structure based on structural pattern matching (patterns based on classes or compositions of classes, to make it short). A very good point, emphasized by Tobias Kohn ("Another take on PEP 622"), is that pattern matching eases the writing of code based on matching such structures, and that capturing values stored inside these structures at match time really eases the writing of the code associated with the match.
A major point of Kohn's post is that 'case' is analogous to 'def' and match lists are analogous to parameter lists. In parameter lists, untagged simple names ('parameter names') are binding targets. Therefore, untagged simple names in match lists -- let us call them 'match names' -- should be also. I elaborated on this in my response to Tobias. -- Terry Jan Reedy
Terry Reedy wrote:
A major point of Kohn's post is that 'case' is analogous to 'def' and match lists are analogous to parameter lists. In parameter lists,
I'm sorry to disagree, but match lists share very few things in common with today's parameter lists, and introduce a full new concept of "matching" vs "binding/capturing" that doesn't exist in function definitions.
untagged simple names ('parameter names') are binding targets. Therefore, untagged simple names in match lists, let us call them 'match names' should be also. I elaborated on this in my response to Tobias.
This approach, for me, seems to come from functional languages where pattern matching is a thing. The proposed "match" clause tends to mimic this approach, and it can be a good thing. But Python's function definition has not been inspired by functional programming from the ground up, and I think it would be an error to reason this way, because people not used to pattern matching in functional programming won't understand anything (remember that list comprehensions are a big thing for many learners). That's why I think reasoning from such a theoretical point of view will lead many Python developers to a dead end.
On Sat, Jul 18, 2020 at 09:25:45AM -0000, emmanuel.coirier@caissedesdepots.fr wrote:
This approach, for me, seems to come from functional languages where pattern matching is a thing. The proposed "match" clause tends to mimic this approach, and it can be a good thing. But Python's function definition has not been inspired by functional programming from the ground up, and I think it would be an error to reason this way, because people not used to pattern matching in functional programming won't understand anything (remember that list comprehensions are a big thing for many learners).
It is true that beginners sometimes struggle a bit to grok comprehension syntax. I know I did. And yet, despite that, comprehensions have turned out to be one of the most powerful and popular features of Python, sometimes *too* popular. It is sometimes hard to convince both beginners and even experienced devs that comprehensions are not the only tool in their toolbox, and not every problem is a nail.

You say: "people not used to pattern matching in functional programming won't understand anything", but people using Haskell weren't born knowing the language. They had to learn it.

It's been sometimes said that functional programmers are smarter, elite programmers a level above the average OOP or procedural programmer, but that's mostly said by functional programmers :-) and I'm not entirely sure that it's true. In any case, I don't think that any (actual or imaginary) gap between the ability of the average Haskell programmer and the average Python programmer is so great that we should dismiss pattern matching as beyond the grasp of Python coders.

In any case, functional languages like Haskell, F# and ML are not the only languages with pattern matching. Non-FP languages like C#, Swift, Rust and Scala have it, and even Java has an extension providing pattern matching: http://tom.loria.fr/wiki/index.php/Main_Page

-- Steven
Steven D'Aprano wrote: [...]
In any case, functional languages like Haskell, F# and ML are not the only languages with pattern matching. Non-FP languages like C#, Swift, Rust and Scala have it, and even Java has an extension providing pattern matching: http://tom.loria.fr/wiki/index.php/Main_Page
I'm not against pattern matching at all. I think it's a very nice feature, but one of its behaviors, variable capturing, should be made more explicit, following the rules of the Zen of Python.
It's been sometimes said that functional programmers are smarter, elite programmers a level above the average OOP or procedural programmer, but that's mostly said by functional programmers :-)
I would say there are fewer of them :-) -- Emmanuel
On Sat, 18 Jul 2020 at 14:12, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jul 18, 2020 at 09:25:45AM -0000, emmanuel.coirier@caissedesdepots.fr wrote:
This approach, for me, seems to come from functional languages where pattern matching is a thing. The proposed "match" clause tends to mimic this approach, and it can be a good thing. But Python's function definition has not been inspired by functional programming from the ground up, and I think it would be an error to reason this way, because people not used to pattern matching in functional programming won't understand anything (remember that list comprehensions are a big thing for many learners).
It is true that beginners sometimes struggle a bit to grok comprehension syntax. I know I did.
And yet, despite that, comprehensions have turned out to be one of the most powerful and popular features of Python, sometimes *too* popular. It is sometimes hard to convince both beginners and even experienced devs that comprehensions are not the only tool in their toolbox, and not every problem is a nail.
You say: "people not used to pattern matching in functional programming won't understand anything", but people using Haskell weren't born knowing the language. They had to learn it.
It's been sometimes said that functional programmers are smarter, elite programmers a level above the average OOP or procedural programmer, but that's mostly said by functional programmers :-) and I'm not entirely sure that it's true. In any case, I don't think that any (actual or imaginary) gap between the ability of the average Haskell programmer and the average Python programmer is so great that we should dismiss pattern matching as beyond the grasp of Python coders.
In any case, functional languages like Haskell, F# and ML are not the only languages with pattern matching. Non-FP languages like C#, Swift, Rust and Scala have it, and even Java has an extension providing pattern matching:
You do a nice job arguing that matching is a nice feature to have -- and I guess we are past this point. But I don't see anything in the above pointing to using an undifferentiated name by itself in the match/case construct being better than trying to find a way to differentiate it, with significant gains in readability and "learnability".

Yes, people aren't born knowing Haskell, but then, one of the strong points of Python is (or used to be) it _not looking_ like Haskell.

Having a differentiation sign for assignment would also allow matching against values in variables to just work in a very intuitive way, just as it would have happened with the ".variable_name" in the first version.

(I've written another e-mail on the thread, but since this scatters around: my current bikeshed color is to _require_ the walrus op for assignments, as in: `case (x := _, y := _): ...`)

js -><-
I'm very new to this mailing list so I'm not sure it's my place to email, but I'd like to weigh in and maybe it will be useful. If not, you can always ignore ;)

I think adding the walrus operator is trying to solve a problem that doesn't exist. Compare the example from the PEP:

```
def make_point_3d(pt):
    match pt:
        case (x, y):
            return Point3d(x, y, 0)
        case (x, y, z):
            return Point3d(x, y, z)
        case Point2d(x, y):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")
```

To the one with the walrus:

```
def make_point_3d(pt):
    match pt:
        case (x := _, y := _):
            return Point3d(x, y, 0)
        case (x := _, y := _, z := _):
            return Point3d(x, y, z)
        case Point2d(x := _, y := _):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")
```

It's a lot more typing, it's a lot more ugly, and I'd argue it's not any more explicit than the earlier one. We still have all the same variables, except now we have to follow them with a ritualistic ":= _" to capture them. Normally we use the underscore to discard or hide something (at least that's how I've always used it), and suddenly it is used when we want to keep the thing it stands for?!

Also, I understand Python doesn't look like Haskell or Rust or whatever, but you also have people coming from those languages to Python, and people going to those languages from Python. Having a different syntax from what literally everybody else does will lead to a lot of confusion. I think the default option should be to have it like the current proposal (and everybody else), and to update it only if there is a good reason to do so. "We don't want to look like the rest" should not be an argument. I think Python not looking like anything else is a result of Python's readability and simplicity goals, not because the goal was to look different.
Finally, I asked an actual Python newbie (our trainee) for his opinion, and he said he didn't think the walrus example was any more useful. Of course, N=1, not an experiment, doesn't measure mistakes in practice, etc. But let's make sure it's an actual problem before we go complicate the syntax.

Again, first time mailing here and I don't know if it's my place (can I even mail into this list?), but I hope the perspective is of some use.

Rik

P.S. I never had issues with list comprehensions, because it's basically how you write down sets in mathematics (which is what I studied).
Welcome to python-dev, Rik! Of course you can email to this list. On 30/07/2020 14:30, Rik de Kort via Python-Dev wrote:
I think adding the Walrus operator is trying to solve a problem that doesn't exist. Compare the example from the PEP:
[snip]
case (x, y, z):
[snip]
To the one without:
[snip]
case (x := _, y := _, z := _): It's a lot more typing, it's a lot more ugly, and I'd argue it's not any more explicit than the earlier one.
The debate is still going on as to whether "capture" variables should be marked, and if so whether with the walrus operator or in some other way, and "var := _" wouldn't be my personal preference. However,

```
case (x := _, y := _, z := _):
```

*is* more explicit, because it explicitly says that x, y and z are variables that should capture (i.e. be bound to) whatever values are found. Contrast this with, say,

```
case (x := _, y := _, z):
```

which says that z contains a value to be *matched*, and, if such a match is found, x and y should capture the relevant values.

Best wishes
Rob Cliffe
Hi Rob, thank you! :)

I think I understand the point, but I still don't agree with it. I find it hard to come up with a concrete use case where you would want to name a parameter without specifying it. Suppose we want

```
case Status(user, n_messages, replies, unicode := _)
```

Then it might be a little useful to type the non-captured arguments explicitly because it's easier to remember the signature that way. Alternatively, if you want to capture an argument like this and you have more than a few positional arguments, you should probably just match on a keyword argument (or refactor your code so your APIs are simpler). Also, what would we type if we wanted to capture a keyword argument? Something like this?

```
case Status(user, n_messages, replies, unicode=unicode := _)
```

Surely that must be a horrible joke! (N.B. I suppose this is an argument against this specific syntax rather than against capturing.)

Another potential use I came up with is checking the number of arguments (when it's a lot, so typing all the underscores becomes hard to count), like:

```
match pt:
    case (a, b, c, d, e, f, g, h):
        manage_len_8(pt)
    case (a, b, c, d, e, f, g, h, i, j, k):
        manage_len_11(pt)
```

But in that case why not use an if-else, like so:

```
if len(pt) == 8:
    manage_len_8(pt)
elif len(pt) == 11:
    manage_len_11(pt)
```

There must be use cases I haven't thought of, but I think they will all fall prey to similar objections as the above two. I'm open to being proven wrong, though!

The thing about explicitness is: sure, it is better than implicitness. But beautiful is also better than ugly, and simple better than complex, etc. etc. I think these mean nothing without specific use cases to pin down what they actually mean for this thing in this context. I think `case (x := _, y := _, z)` is exactly as explicit as `case (x, y, _)` (it names x and y explicitly), with the added drawbacks of:

- Confusing the meaning of "_", which (at least in my mind) means "discard".
- Deviating from other languages with pattern matching (which, presumably, also have bikeshedded on this point), increasing the surprise level for people who are either coming to Python from there, or going from Python to there. - Requiring extra (weird-looking) syntax for the default case of capturing variables. Again, maybe I'm just having trouble finding good use cases (either that, or I have no intuition for programming :P). Let me know if you have some! Rik P.S. If I'm out of line or violating etiquette with anything, please let me know. I'm open to help.
On 7/30/20 8:35 AM, Rob Cliffe via Python-Dev wrote:
The debate is still going on as to whether "capture" variables should be marked... I don't think the PEP authors are debating it any more. Quite frankly, I wish they would present to the SC and get accepted so we can get Pattern Matching added to 3.10. :)
-- ~Ethan~
PEP 622 is already on the SC’s agenda for review. -Barry
On Aug 5, 2020, at 09:47, Ethan Furman <ethan@stoneleaf.us> wrote:
On 7/30/20 8:35 AM, Rob Cliffe via Python-Dev wrote:
The debate is still going on as to whether "capture" variables should be marked... I don't think the PEP authors are debating it any more. Quite frankly, I wish they would present to the SC and get accepted so we can get Pattern Matching added to 3.10. :)
-- ~Ethan~ _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PGEVEI2W... Code of Conduct: http://python.org/psf/codeofconduct/
Hi Barry, How long do we have to present objections to PEP 622? I don't feel that the PEP gives adequate prominence to the objections so far raised, and there are more issues I would like to bring up. Cheers, Mark. On 05/08/2020 5:58 pm, Barry Warsaw wrote:
PEP 622 is already on the SC’s agenda for review.
-Barry
On Thu, Aug 6, 2020 at 3:46 AM Mark Shannon <mark@hotpy.org> wrote:
Hi Barry,
How long do we have to present objections to PEP 622?
We haven't discussed a timeline among ourselves yet (unless it was discussed at the last meeting, which I missed 😁).
I don't feel that the PEP gives adequate prominence to the objections so far raised, and there are more issues I would like to bring up.
I don't think we would want to keep pushing the date out every time someone has more to say, as that would mean this would never end. 😉 But I doubt we will be making a decision next week, so if you can get any comments in between now and the 17th, you will probably get them in before the earliest date at which we would, very optimistically, make a decision. -Brett
Just taking a ride on the thread here: I made a quick talk on the proposed feature for a local group, and in the process I refactored a "real world" class I have in a project, which features a complicated __init__ due to having lots of different, optional ways to be initialized.

I can tell I liked what could be done: reducing roughly 60 lines of code packed with "isinstance" calls, "if/elif" blocks, and temporary, intermediate state variables into 25 lines, including 10 case clauses that are very straightforward to read.

Sorry for whoever would like an example differing much from the "point2d" examples in the PEP, but the class in question IS geometry related, and is a Rectangle. I am not yet testing (neither in the 'normal' if/else version) _invalid_ arguments: there are a lot of ways to pass conflicting arguments to __init__, and the if/elif logic to handle those properly is not in place. The match/case version for handling these invalid combinations would be very straightforward, on the other hand.

(All said, I think I still miss a way to mark variables that are assigned in the case clauses, just for the record :-) )

Enough cheap talk -- links are here:

tests: https://github.com/jsbueno/terminedia/blob/fa5ac012a7b93a2abe26ff6ca41dbd5f5...

Branch comparison for the match/case version: https://github.com/jsbueno/terminedia/compare/patma

You will notice one "real world" pattern that was needed there: as the case clauses had to be aware of values spread across different keyword parameters, I had to prepend a "packing" of all function arguments into a mapping to match against. If I would not care for the function signature, I could just take "**kwargs" and match against that.

On Thu, 6 Aug 2020 at 14:09, Brett Cannon <brett@python.org> wrote:
On 07Aug2020 2133, Joao S. O. Bueno wrote:
Enough cheaptalk - links are here:
tests: https://github.com/jsbueno/terminedia/blob/fa5ac012a7b93a2abe26ff6ca41dbd5f5...
Branch comparison for the match/case version: https://github.com/jsbueno/terminedia/compare/patma
I haven't been following this thread too closely, but that looks pretty nice to me. Not obvious enough for me to write my own just from reading an example, and I'd hesitate before trying to modify it at all, but I can at least read the pre- and post-conditions more easily than in the original.
(all said, I think I still miss a way to mark variables that are assigned in the case clauses, just for the record :-) )
Yeah, the implicit variable assignments seem like the most confusing bit (based solely on looking at just one example). I think I'd be happy enough just knowing that "kw" matches the pattern, and then manually extracting individual values from it. (But I guess for that we'd only need a fancy "if_match(kw, 'expression')" function... hmm...) Cheers, Steve
On Fri, 7 Aug 2020 17:33:04 -0300 "Joao S. O. Bueno" <jsbueno@python.org.br> wrote:
Branch comparison for the match/case version: https://github.com/jsbueno/terminedia/compare/patma
For me, your example is bonkers from the start. Anyone who thinks `Rect(left_or_corner1=None, top_or_corner2=None, right=None, bottom=None, *, width_height=None, width=None, height=None, center=None)` is a reasonable constructor signature for a rectangle class needs to be convinced otherwise, rather than be allowed to find convenient implementation shortcuts for that signature.

I'll note that there's no checking for superfluous / exclusive arguments, so apparently I can pass all those arguments at once and the constructor will happily do what it likes, ignoring half the arguments I have without telling me? WTH? (And of course, there's no docstring anywhere, so I can only /presume/ this is supposed to be a rectangle class, based on its name.)

And as a matter of fact, while the pattern matching implementation is definitely shorter, it's still unreadable. I don't know about you, but if I see this during a code review:

```
match kw:
    case {"c1": (c1:=(_, _)), "c2": (c2:=(_, _))}:
        pass
    case {"c1": Rect(c1, c2)}:
        pass
    case {"c1": Number(), "c2": Number(), "right": Number(), "bottom": Number()}:
        c1, c2 = (kw["c1"], kw["c2"]), (right, bottom)
    case {"c1": (_, _, _, _)}:
        c1, c2 = kw["c1"][:2], kw["c1"][2:]
    case {"c1": (c1:=(_, _)), "width_height": (_, _)}:
        c2 = c1 + V2(width_height)
    case {"c1": (c1:=(_, _)), "width": Number(), "height": Number()}:
        c2 = c1 + V2(width, height)
    case {"c1": (c1:=(_, _)), "c2": None, "right": Number(), "bottom": Number()}:
        c2 = bottom, right
    case {"c1": None, "right": Number(), "bottom": Number(), "center": (_, _)}:
        c1 = 0, 0
        c2 = bottom, right
    case {"c1": (c2:=(_, _)), "c2": None}:
        c1 = 0, 0
```

then I will ask the author to think twice about the fact that they're inflicting pain on their fellow contributors by committing write-only code (in addition to the toxic API design).

So, if the aim of the example was to prove that bad APIs could be implemented more tersely using pattern matching, then congratulations; otherwise I'm not impressed at all.
Regards Antoine.
On Wed, 5 Aug 2020 09:47:30 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
On 7/30/20 8:35 AM, Rob Cliffe via Python-Dev wrote:
The debate is still going on as to whether "capture" variables should be marked... I don't think the PEP authors are debating it any more.
That would be a pity. Readability and ease of understanding should trump conciseness. Regards Antoine.
On Wed, Aug 5, 2020 at 9:48 AM Ethan Furman <ethan@stoneleaf.us> wrote:
I don't think the PEP authors are debating it any more. Quite frankly, I wish they would present to the SC and get accepted so we can get Pattern Matching added to 3.10. :)
We did, a few weeks ago, and the SC is already reviewing it. That's all I know. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Sat, Jul 18, 2020 at 3:46 AM Terry Reedy <tjreedy@udel.edu> wrote:
A major point of Kohn's post is that 'case' is analogous to 'def' and match lists are analogous to parameter lists. In parameter lists, untagged simple names ('parameter names') are binding targets. Therefore, untagged simple names in match lists (let us call them 'match names') should be also. I elaborated on this in my response to Tobias.
There are indeed analogous aspects, although not in the most straightforward/obvious ways. Still, perhaps even more so than there is an analogy with assignment targets. This is related to one of my concerns regarding PEP 622.

It may be tempting to see pattern matching as a form of assignment. However, that is quite a stretch, both conceptually and as a future direction. There is no way these 'match expressions' could be allowed in regular assignments: the way names are treated just needs to be different. And allowing them in walrus assignments doesn't make much sense either.

Conceptually, it is strange to call this match operation an assignment. Most of the added power comes from checking that the object has a certain structure or contents, and in many cases, that is the only thing it does! As a (not always) handy side product, it is also able to assign things to specified targets. Even then, the whole pattern is not assigned to; only parts of it are.

In mathematics, assignment (definition) and re-assignment is often denoted with the same sign as equality/identity, because it is usually clear from the context which one is in question. Usually, however, it matters which one is in question. Therefore, as we well know, we have = for assignment, == for equality, and := to emphasize assignment. Matching is closer to ==, or almost :==. So, in many ways, it is the assignment that is special, not the matching. The assignment is also the main thing that differentiates this from the traditional switch–case construct, which the proposed syntax certainly resembles.

—Koos
I've built the reference implementation and I'm experimenting with the new syntax in the edgedb codebase. It seems to have plenty of places where pattern matching adds clarity. I'll see if I find particularly interesting examples of that to share. So far I'm +1 on the proposal, and I like the second iteration of it. Except that I'm really sad to see the __match__ protocol gone. Quoting the PEP:
One drawback of this protocol is that the arguments to __match__ would be expensive to construct, and could not be pre-computed due to the fact that, because of the way names are bound, there are no real constants in Python.
While it's not possible to precompute the arguments ahead of time, it certainly should be possible to cache them, similarly to how I implemented the global names lookup cache in CPython. That should alleviate this particular performance consideration entirely. Having __match__ would allow some really interesting use cases. For example, for binary protocol parsers it would be possible to replicate the Erlang approach, e.g.:

```
match buffer:
    case Frame(char('X'), len := UInt32(),
               flags := Bits(0, 1, flag1, flag2, 1, 1))
```

would match a Frame of message type 'X', capture its length, and extract two bit flags. This perhaps isn't the greatest example of how a full matching protocol could be used, but it's something that I personally wanted to implement. Yury
On Fri, Jul 17, 2020 at 1:45 PM Yury Selivanov <yselivanov.ml@gmail.com> wrote:
I've built the reference implementation and I'm experimenting with the new syntax in the edgedb codebase. It seems to have plenty of places where pattern matching adds clarity. I'll see if I find particularly interesting examples of that to share.
So far I'm +1 on the proposal, and I like the second iteration of it. Except that I'm really sad to see the __match__ protocol gone.
It will be back, just not in 3.10. We need more experience with how match/case are actually used to design the right `__match__` protocol.
Quoting the PEP:
One drawback of this protocol is that the arguments to __match__ would be expensive to construct, and could not be pre-computed due to the fact that, because of the way names are bound, there are no real constants in Python.
Note: That's not referring to the `__match__` protocol from version 1 of the PEP, but to a hypothetical (and IMO sub-optimal) `__match__` protocol that was discussed among the authors prior to settling on the protocol from version 1.
While it's not possible to precompute the arguments ahead of time, it certainly should be possible to cache them similarly to how I implemented global names lookup cache in CPython. That should alleviate this particular performance consideration entirely.
Where's that global names lookup cache? I seem to have missed its introduction. (Unless you meant PEP 567?)
Having __match__ would allow some really interesting use cases. For example, for binary protocol parsers it would be possible to replicate erlang approach, e.g.:
match buffer: case Frame(char('X'), len := UInt32(), flags := Bits(0, 1, flag1, flag2, 1, 1))
would match a Frame of message type 'X', capture its length, and extract two bit flags. This perhaps isn't the greatest example of how a full matching protocol could be used, but it's something that I personally wanted to implement.
I see, you'd want the *types* of the arguments to be passed into `Frame.__match__`. That's interesting, although I have a feeling that if I had a real use case like this I'd probably be able to come up with a better DSL for specifying messages than this. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Fri, Jul 17, 2020 at 3:54 PM Guido van Rossum <guido@python.org> wrote:
On Fri, Jul 17, 2020 at 1:45 PM Yury Selivanov <yselivanov.ml@gmail.com> wrote:
I've built the reference implementation and I'm experimenting with the new syntax in the edgedb codebase. It seems to have plenty of places where pattern matching adds clarity. I'll see if I find particularly interesting examples of that to share.
So far I'm +1 on the proposal, and I like the second iteration of it. Except that I'm really sad to see the __match__ protocol gone.
It will be back, just not in 3.10. We need more experience with how match/case are actually used to design the right `__match__` protocol.
Makes sense.
Quoting the PEP:
One drawback of this protocol is that the arguments to __match__ would be expensive to construct, and could not be pre-computed due to the fact that, because of the way names are bound, there are no real constants in Python.
Note: That's not referring to the `__match__` protocol from version 1 of the PEP, but to a hypothetical (and IMO sub-optimal) `__match__` protocol that was discussed among the authors prior to settling on the protocol from version 1.
While it's not possible to precompute the arguments ahead of time, it certainly should be possible to cache them similarly to how I implemented global names lookup cache in CPython. That should alleviate this particular performance consideration entirely.
Where's that global names lookup cache? I seem to have missed its introduction. (Unless you meant PEP 567?)
Here are the related bpos of where Inada-san and I worked on this: https://bugs.python.org/issue28158 https://bugs.python.org/issue26219
Having __match__ would allow some really interesting use cases. For example, for binary protocol parsers it would be possible to replicate erlang approach, e.g.:
match buffer: case Frame(char('X'), len := UInt32(), flags := Bits(0, 1, flag1, flag2, 1, 1))
would match a Frame of message type 'X', capture its length, and extract two bit flags. This perhaps isn't the greatest example of how a full matching protocol could be used, but it's something that I personally wanted to implement.
I see, you'd want the *types* of the arguments to be passed into `Frame.__match__`. That's interesting, although I have a feeling that if I had a real use case like this I'd probably be able to come up with a better DSL for specifying messages than this.
Yeah, it's an open question if this is a good idea or not. FWIW here's a relevant quick erlang tutorial: https://dev.to/l1x/matching-binary-patterns-11kh that shows what it looks like in erlang (albeit the syntax is completely alien to Python). Yury
I'm still only intermittently keeping up on python-dev, but my main concern with the first iteration remains in this version, which is that it doesn't even *mention* that the proposed name binding syntax inherently conflicts with the existing assignment statement lvalue syntax in two areas:

* dotted names (binds an attribute in assignment, looks up a constraint value in a match case)
* underscore targets (binds in assignment, wildcard match without binding in a match case)

The latter could potentially be made internally consistent in the future by redefining "_" and "__" as soft keywords that don't get bound via normal assignment statements either (requiring that they be set via namespace dict modification instead, for use cases like i18n). https://www.python.org/dev/peps/pep-0622/#use-some-other-token-as-wildcard presents a reasonable rationale for the usage, so its only flaw is failing to mention the inconsistency.

The former syntactic conflict presents a bigger problem, though, as it means that we'd be irrevocably committed to having two different lvalue syntaxes for the rest of Python's future as a language. https://www.python.org/dev/peps/pep-0622/#alternatives-for-constant-value-pa... is nominally about this problem, but it doesn't even *mention* the single biggest benefit of putting a common prefix on value constraints: it leaves the door open to unifying the lvalue syntax again in the future, by keeping the proposed match case syntax a strict superset of the existing assignment target syntax rather than partially conflicting with it.

More incidentally, the latest write-up also leaves out "?" as a suggested constraint value prefix, when that's the single character prefix that best implies the question "Does the runtime value at this position equal the result of this value constraint expression?" without having any other existing semantic implications in Python.

Cheers, Nick.

P.S. I feel I should mention that the other reason I like "?"
as a potential prefix for value constraints is that if we require it for all value constraint expressions (both literals and name lookups), I believe it could offer a way to unblock the None-aware expressions PEP by reframing that PEP as a shorthand for particular case matches. None coalescence ("a ?? b") for example:

```
match a:
    case ?None:
        _expr_result = b
    case _match:
        _expr_result = _match
```

Or a None-severing attribute lookup ("a?.b"):

```
_match_expr = a
match _match_expr:
    case ?None:
        _expr_result = _match_expr
    case _match:
        _expr_result = _match.b
```

Since these operations would be defined in terms of *equality* (as per PEP 622), rather than identity, it would also allow other sentinels to benefit from the None-aware shorthand by defining themselves as being equal to None.

On Thu., 9 Jul. 2020, 1:07 am Guido van Rossum, <guido@python.org> wrote:
Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching. As authors we welcome Daniel F Moisset in our midst. Daniel wrote a lot of the new text in this version, which introduces the subject matter much more gently than the first version did. He also convinced us to drop the `__match__` protocol for now: the proposal stands quite well without that kind of extensibility, and postponing it will allow us to design it at a later time when we have more experience with how `match` is being used.
That said, the new version does not differ dramatically in what we propose. Apart from dropping `__match__` we’re dropping the leading dot to mark named constants, without a replacement, and everything else looks like we’re digging in our heels. Why is that? Given the firestorm of feedback we received and the numerous proposals (still coming) for alternative syntax, it seems a bad tactic not to give up something more substantial in order to get this proposal passed. Let me explain.
Language design is not like politics. It’s not like mathematics either, but I don’t think this situation is at all similar to negotiating a higher minimum wage in exchange for a lower pension, where you can definitely argue about exactly how much lower/higher you’re willing to go. So I don’t think it’s right to propose making the feature a little bit uglier just to get it accepted.
Frankly, 90% of the issue is about what among the authors we’ve dubbed the “load/store” problem (although Tobias never tires to explain that the “load” part is really “load-and-compare”). There’s a considerable section devoted to this topic in the PEP, but I’d like to give it another try here.
In case you’ve been avoiding python-dev lately, the problem is this. Pattern matching lets you capture values from the subject, similar to sequence unpacking, so that you can write for example

```
x = range(4)
match x:
    case (a, b, *rest):
        print(f"first={a}, second={b}, rest={rest}")  # 0, 1, [2, 3]
```

Here the `case` line captures the contents of the subject `x` in three variables named `a`, `b` and `rest`. This is easy to understand by pretending that a pattern (i.e., what follows `case`) is like the LHS of an assignment.
However, in order to make pattern matching more useful and versatile, the pattern matching syntax also allows using literals instead of capture variables. This is really handy when you want to distinguish different cases based on some value, for example

```
match t:
    case ("rect", real, imag):
        return complex(real, imag)
    case ("polar", r, phi):
        return complex(r * cos(phi), r * sin(phi))
```

You might not even notice anything funny here if I didn’t point out that `"rect"` and `"polar"` are literals -- it’s really quite natural for patterns to support this once you think about it.
The problem that everybody’s been concerned about is that Python programmers, like C programmers before them, aren’t too keen to have literals like this all over their code, and would rather give names to the literals, for example

```
USE_POLAR = "polar"
USE_RECT = "rect"
```

Now we would like to be able to replace those literals with the corresponding names throughout our code and have everything work like before:

```
match t:
    case (USE_RECT, real, imag):
        return complex(real, imag)
    case (USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))
```

Alas, the compiler doesn’t know that we want `USE_RECT` to be a constant value to be matched while we intend `real` and `imag` to be variables to be given the corresponding values captured from the subject. So various clever ways have been proposed to distinguish the two cases.
This discussion is not new to the authors: before we ever published the first version of the PEP we vigorously debated this (it is Issue 1 in our tracker!), and other languages before us have also had to come to grips with it. Even many statically compiled languages! The reason is that for reasons of usability it’s usually deemed important that their equivalent of `case` auto-declare the captured variables, and variable declarations may hide (override) like-named variables in outer scopes.
Scala, for example, uses several different rules: first, capture variable names must start with a lowercase letter (so it would handle the above example as intended); next, capture variables cannot be dotted names (like `mod.var`); finally, you can enclose any variable in backticks to force the compiler to see it as a load instead of a store. Elixir uses another form of markup for loads: `x` is a capture variable, but `^x` loads and compares the value of `x`.
There are a number of dead ends when looking for a solution that works for Python. Checking at runtime whether a name is defined or not is one of these: there are numerous reasons why this could be confusing, not the least of which being that the `match` may be executed in a loop and the variable may already be bound by a previous iteration. (True, this has to do with the scope we’ve adopted for capture variables. But believe me, giving each case clause its own scope is quite horrible by itself, and there are other action-at-a-distance effects that are equally bad.)
It’s been proposed to explicitly state the names of the variables bound in a header of the `match` statement; but this doesn’t scale when the number of cases becomes larger, and requires users to do bookkeeping the compiler should be able to do. We’re really looking for a solution that tells you when you’re looking at an individual `case` which variables are captured and which are used for load-and-compare.
Marking up the capture variables with some sigil (e.g. `$x` or `x?`) or other markup (e.g. backticks or `<x>`) makes this common case ugly and inconsistent: it’s unpleasant to see for example

```
case %x, %y:
    print(x, y)
```

No other language we’ve surveyed uses special markup for capture variables; some use special markup for load-and-compare, so we’ve explored this. In fact, in version 1 of the PEP our long-debated solution was to use a leading dot. This was however booed off the field, so for version 2 we reconsidered. In the end nothing struck our fancy (if `.x` is unacceptable, it’s unclear why `^x` would be any better), and we chose a simpler rule: named constants are only recognized when referenced via some namespace, such as `mod.var` or `Color.RED`.
We believe it’s acceptable that things looking like `mod.var` are never considered capture variables -- the common use cases for `match` are such that one would almost never want to capture into a different namespace. (Just like you very rarely see `for self.i in …` and never `except E as scope.var` -- the latter is illegal syntax and sets a precedent.)
One author would dearly have seen Scala’s uppercase rule adopted, but in the end was convinced by the others that this was a bad idea, both because there’s no precedent in Python’s syntax, and because many human languages simply don’t make the distinction between lowercase and uppercase in their writing systems.
So what should you do if you have a local variable (say, a function argument) that you want to use as a value in a pattern? One solution is to capture the value in another variable and use a guard to compare that variable to the argument:
```
def foo(x, spam):
    match x:
        case Point(p, q, context=c) if c == spam:
            # Match
            ...
```
If this really is a deal-breaker after all other issues have been settled, we could go back to considering some special markup for load-and-compare of simple names (even though we expect this case to be very rare). But there’s no pressing need to decide to do this now -- we can always add new markup for this purpose in a future version, as long as we continue to support dotted names without markup, since that *is* a commonly needed case.
There’s one other issue where in the end we could be convinced to compromise: whether to add an `else` clause in addition to `case _`. In fact, we probably would already have added it, except for one detail: it’s unclear whether the `else` should be aligned with `case` or `match`. If we are to add this we would have to ask the Steering Council to decide for us, as the authors deadlocked on this question.
Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.
If you've made it this far, here are the links to check out, with an open mind. As a reminder, the introductory sections (Abstract, Overview, and Rationale and Goals) have been entirely rewritten and also serve as introduction and tutorial.
- PEP 622: https://www.python.org/dev/peps/pep-0622/
- Playground: https://mybinder.org/v2/gh/gvanrossum/patma/master?urlpath=lab/tree/playgrou...
--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LOXEATGF...
Code of Conduct: http://python.org/psf/codeofconduct/
On Wed, Jul 29, 2020 at 4:34 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm still only intermittently keeping up on python-dev, but my main concern with the first iteration remains in this version, which is that it doesn't even *mention* that the proposed name binding syntax inherently conflicts with the existing assignment statement lvalue syntax in two areas:
I don't see why the PEP would be required to mention this. You make it sound like it's a bad thing (a "conflict"), whereas the PEP authors' position is that it is irrelevant.
* dotted names (binds an attribute in assignment, looks up a constraint value in a match case) * underscore targets (binds in assignment, wildcard match without binding in a match case)
The latter could potentially be made internally consistent in the future by redefining "_" and "__" as soft keywords that don't get bound via normal assignment statements either (requiring that they be set via namespace dict modification instead for use cases like i18n). https://www.python.org/dev/peps/pep-0622/#use-some-other-token-as-wildcard presents a reasonable rationale for the usage, so its only flaw is failing to mention the inconsistency.
That solution is outside the scope of the PEP -- it would be a big backward incompatibility with little payoff. Your repeated mention of consistency makes me want to quote PEP 8 (quoting Emerson, though I didn't even know who that was when I put it in my style guide :-): "A foolish consistency is the hobgoblin of little minds."
The former syntactic conflict presents a bigger problem, though, as it means that we'd be irrevocably committed to having two different lvalue syntaxes for the rest of Python's future as a language.
Things become much less "conflict-y" if you stop seeing patterns as lvalues. They just aren't, and any argument based on the idea that they are is inherently flawed. (Also note that the concept of lvalue isn't even defined in Python. There are a variety of assignment targets with different syntactic constraints depending on context, and several other syntactic constructs that bind names.)
https://www.python.org/dev/peps/pep-0622/#alternatives-for-constant-value-pa... is nominally about this problem, but it doesn't even *mention* the single biggest benefit of putting a common prefix on value constraints: it leaves the door open to unifying the lvalue syntax again in the future by keeping the proposed match case syntax a strict superset of the existing assignment target syntax, rather than partially conflicting with it.
That's because the PEP authors disagree with you that this goal is worthy of pursuit, and hence we don't care about this benefit at all.
More incidentally, the latest write-up also leaves out "?" as a suggested constraint value prefix, when that's the single character prefix that best implies the question "Does the runtime value at this position equal the result of this value constraint expression?" without having any other existing semantic implications in Python.
In the discussion pretty much all non-alphanumeric ASCII characters have been proposed by various people as sigils to mark either capture patterns or value patterns. We didn't think it was necessary to enumerate all proposed characters and write up reasons why we reject them, since the reasons for rejection are pretty much always the same -- it looks strange, and there's no need for sigils at all. Honestly, it doesn't help the case for `?` that it's been proposed as a mark for both capture patterns and value patterns (by different people, obviously :-).
Cheers, Nick.
P.S. I feel I should mention that the other reason I like "?" as a potential prefix for value constraints is that if we require it for all value constraint expressions (both literals and name lookups) I believe it could offer a way to unblock the None-aware expressions PEP by reframing that PEP as a shorthand for particular case matches.
None coalescence ("a ?? b") for example:
```
match a:
    case ?None:
        _expr_result = b
    case _match:
        _expr_result = _match
```
Or a None-severing attribute lookup ("a?.b"):
```
_match_expr = a
match _match_expr:
    case ?None:
        _expr_result = _match_expr
    case _match:
        _expr_result = _match.b
```
Since these operations would be defined in terms of *equality* (as per PEP 622), rather than identity, it would also allow other sentinels to benefit from the None-aware shorthand by defining themselves as being equal to None.
This sounds like a huge stretch. Trying to forge a connection between two separate uses of the same character sounds like arguing that the `*` in `a * b` and the `*` in `*args` are really the same operator. I am actually rather in favor of PEP 505, but that doesn't make any difference for how I see marking value patterns in PEP 622. --Guido
On Fri., 31 Jul. 2020, 3:14 am Guido van Rossum, <guido@python.org> wrote:
On Wed, Jul 29, 2020 at 4:34 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm still only intermittently keeping up on python-dev, but my main concern with the first iteration remains in this version, which is that it doesn't even *mention* that the proposed name binding syntax inherently conflicts with the existing assignment statement lvalue syntax in two areas:
I don't see why the PEP would be required to mention this. You make it sound like it's a bad thing (a "conflict"), whereas the PEP authors' position is that it is irrelevant.
* dotted names (binds an attribute in assignment, looks up a constraint value in a match case) * underscore targets (binds in assignment, wildcard match without binding in a match case)
The latter could potentially be made internally consistent in the future by redefining "_" and "__" as soft keywords that don't get bound via normal assignment statements either (requiring that they be set via namespace dict modification instead for use cases like i18n). https://www.python.org/dev/peps/pep-0622/#use-some-other-token-as-wildcard presents a reasonable rationale for the usage, so its only flaw is failing to mention the inconsistency.
That solution is outside the scope of the PEP -- it would be a big backward incompatibility with little payoff. Your repeated mention of consistency makes me want to quote PEP 8 (quoting Emerson, though I didn't even know who that was when I put it in my style guide :-): "A foolish consistency is the hobgoblin of little minds."
I don't really like that future possibility either - I think it would be much better for PEP 622 to let "_" be a binding throwaway variable as normal, and allow a bare "?" as the "match any expression without binding it" marker. But unlike the reuse of attribute assignment syntax for a different purpose, it's a conflict that I don't think matters very much (as it's incredibly rare to care whether binding "_" actually creates a reference or not, so having it bind sometimes and not others isn't likely to present any major barriers to learning).
The former syntactic conflict presents a bigger problem, though, as it
means that we'd be irrevocably committed to having two different lvalue syntaxes for the rest of Python's future as a language.
Things become much less "conflict-y" if you stop seeing patterns as lvalues. They just aren't, and any argument based on the idea that they are is inherently flawed.
That's conceding my point, though: aside from iterable unpacking, the PEP isn't currently even trying to keep pattern matching syntax consistent with assignment target syntax, as the PEP authors haven't even considered the idea of pursuing a higher level of consistency as a design goal.

A section titled "Match patterns are not assignment targets" that explains that even though match patterns bind names and do iterable unpacking the same way assignment targets do, it is nevertheless incorrect for a reader to think of them as assignment targets would address my concern (it wouldn't convince me that it is a good idea to view the design problem that way, but I would accept that the argument had been heard and addressed in a context where the future PEP delegate will necessarily see it).

(Also note that the concept of lvalue isn't even defined in Python. There are a variety of assignment targets with different syntactic constraints depending on context, and several other syntactic constructs that bind names.)
Right, I just use "lvalue" as a shorthand for "syntax that can bind a name". All the others are strict subsets of the full assignment target syntax, though, mostly falling into either "simple names only" or "simple names and iterable unpacking, but no attributes or subscripts". I'll use "name binding context" for the full set of those below.

This PEP is the first time it has been proposed to accept valid assignment target syntax in a name binding context, but have it mean something different. The fact that the PEP doesn't even acknowledge that this is a potential problem is the biggest part of the problem. If the problem was acknowledged, and addressed, then readers could evaluate the merits of the PEP authors' arguments against it.

As it is, unless the reader identifies the conflict on their own, they may not realise what is bugging them about it, and make the same mistake I initially did and believe it's the binding syntax that's inconsistent (when that's actually entirely consistent with for loops, for example), rather than the constraint lookup syntax (which has never previously been allowed in a name binding context).

We know the PEP authors don't see patterns as assignment targets beyond sharing the iterable unpacking syntax, but my concern is for everyone *else* that is either learning pattern matching as an existing Python developer, or learning Python in general after match statements are added, and is trying to figure out why these two snippets do the same thing:

```
x = y

match y:
    case x:
        pass
```

and so do these:

```
a, b = x, y

match (x, y):
    case a, b:
        pass
```

while the second snippet here throws NameError if the attribute doesn't exist yet, or silently does nothing if it does:

```
x.a = y

match y:
    case x.a:
        pass
```

If that consistently threw SyntaxError instead (which is what I am suggesting it should do in the absence of a leading "?" on the constraint expression), then it would be similar to the many other places where we allow binding simple names and iterable unpacking, but not attributes or subscripts.

As it is, it's the tipping point where learners will be forced to realise that there is a semantic inconsistency between pattern matching syntax and assignment target syntax.

If the PEP authors are deliberately championing that inconsistency, then it will be up to the PEP delegate to decide if it is a deal breaker or not. But right now, the PEP isn't even acknowledging that this is a significant design decision that the PEP authors have made.

Cheers, Nick.
Trust me, the PEP authors are well aware. If we hadn't been from the outset, a hundred different proposals to "deal" with this would have made us so. And many of those proposals actually made it into the list of rejected ideas. Moreover, we rewrote a huge portion of the PEP from scratch as a result (everything from the Abstract up to the entire Rationale and Goals section).

Apart from your insistence that we "acknowledge" an "inconsistency", your counter-proposal is not so different from the others. Let's agree to disagree on the best syntax for patterns.

On Fri, Jul 31, 2020 at 5:21 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
[...]
On Sat., 1 Aug. 2020, 10:55 am Guido van Rossum, <guido@python.org> wrote:
Trust me, the PEP authors are well aware. If we hadn't been from the outset, a hundred different proposals to "deal" with this would have made us so. And many of those proposals actually made it into the list of rejected ideas. Moreover, we rewrote a huge portion of the PEP from scratch as a result (everything from the Abstract up to the entire Rationale and Goals section).
Apart from your insistence that we "acknowledge" an "inconsistency", your counter-proposal is not so different from the others.
Right, there are several ways the PEP could be adjusted so that assignment target syntax and pattern matching syntax had consistent semantics whenever they share syntax, just as other name binding syntaxes are already strict subsets of the full assignment target syntax. I personally like "Use '?' as an explicit constraint expression prefix", but it's far from being the only possibility.

But if we don't even agree that common syntax in a name binding context should either always mean the same thing, or else be a syntax error, then we're not going to agree that there's a problem to be solved in the first place.

Let's agree to disagree on the best syntax for patterns
I think our disagreement is more fundamental than that, as I believe there should be a common metasyntax for imperative name binding (i.e. everything except function parameters) that all actual name binding contexts allow a subset of, while the PEP authors feel it's OK to treat pattern matching as a completely new design entity that only incidentally shares some common syntax with assignment targets.

Prior to PEP 622, the apparent design constraint that I had inferred was implicitly met by the fact that all the imperative name binding operations accept a subset of the full assignment target syntax, so it's never actually come up before whether this is a real design goal for the language, or just a quirk of history.

PEP 622 is forcing that question to be answered explicitly, as accepting it in its current form would mean telling me, and everyone else that had inferred a similar design concept, that we need to adjust our thinking.

I'd obviously prefer it if the PEP chose a different syntax that avoided the semantic conflict with assignment for dotted names, but in the absence of that, I'd settle for the explicit statement that we're wrong and inferred a design principle that never actually existed.

Cheers, Nick.
That's a strawman argument. I am done arguing about this.

On Fri, Jul 31, 2020 at 7:47 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
[...]
On 30/07/2020 00:34, Nick Coghlan wrote:
the proposed name binding syntax inherently conflicts with the existing assignment statement lvalue syntax in two areas:
* dotted names (binds an attribute in assignment, looks up a constraint value in a match case) * underscore targets (binds in assignment, wildcard match without binding in a match case)
The former syntactic conflict presents a bigger problem, though, as it means that we'd be irrevocably committed to having two different lvalue syntaxes for the rest of Python's future as a language.
+1
On 08/07/2020 16:02, Guido van Rossum wrote:

Today I’m happy (and a little trepidatious) to announce the next version of PEP 622, Pattern Matching. [...]

After all the discussion on the issue, I still cannot stop thinking that there needs to be a visual distinction between "capture" and "match" variables. Having rules ("plain names are capture, dotted names are match") is one more thing to be learnt. One more bit of mystery when (a near newbie is) reading code.

Possible compromise: *two* sigils - one for capture, one for match. Both would be optional, only required when the default is not what is wanted, but could be added regardless if the author felt it added clarity.
Adding this feature would be a giant quality of life improvement for me and I really hope it succeeds. So I have been trying to keep up on the debate in this and related threads. While I do want this feature, I agree with a lot of the issues people are raising.

First, I agree that _ should not be the wildcard symbol. Or rather, the hobgoblins in my mind think that if _ is to be the wildcard symbol it would be more consistent with assignment if it was bound to the last value it was matched with (as should other repeated identifiers), e.g.:
```
match pt:
    case (x, _, _):
        assert _ == pt[2]
```
I understand the authors rationalize the decision based on conventions with the gettext module. I find these arguments very unconvincing. It's like saying the identifier np should be forbidden from appearing in cases because it's frequently used as the name of numpy. If there is to be a wildcard symbol (that does not bind and is repeatable) it should not be a valid identifier.

Second, the distinction between a capture and a constant value pattern should be more explicit. I don't have any great insight into the color of the shed beyond the numerous suggestions already made (name=, ?name, etc...), but it seems quite unintuitive to me that I can't capture into a namespace nor match a constant without a namespace. It is also unclear to me why it would be so terrible to add a new token or abuse an existing one.
Honestly, it doesn't help the case for `?` that it's been proposed as a mark for both capture patterns and value patterns (by different people, obviously :-).
I agree that most of the proposed sheds don't necessarily make it intuitively clear what is a capture variable vs what is a constant. However, they do give the programmer the ability to choose. For example, if I want to modify the motivating example from the PEP slightly to copy attributes from one point to another, I can't express it concisely:
```
def update_point_3d(p: Point3d, pt):
    match pt:
        case (x, y):
            p.x, p.y = x, y
        case Point2d(x, y):
            p.x, p.y = x, y
        ...
```
(Okay, I could have just called the original make_point_3d and unpacked the results, but it would require the creation of an unnecessary temporary.)

However, if the capture was explicit and any valid target could be used as a capture variable, then I could express this cleanly:
```
def update_point_3d(p: Point3d, pt):
    match pt:
        case (p.x=, p.y=):
            pass
        case Point2d(p.x=, p.y=):
            pass
        ...
```
- Caleb Donovick
On 7/30/20 4:31 PM, Caleb Donovick wrote:
However if the capture was explicit and any valid target could be used as a capture variable then I could express this cleanly:
```
def update_point_3d(p: Point3d, pt):
    match pt:
        case (p.x=, p.y=):
            pass
        case Point2d(p.x=, p.y=):
            pass
        ...
```
I like this proposal, using = to explicitly specify when capturing. I see it as a big improvement over the current PEP, where dotted names are never assigned to and non-dotted names usually are.

It also leads directly to an alternate proposal for the wildcard pattern: a "=" not prefaced with an lvalue. This has the benefit of having no conflict with i18n, etc.

Explicit is better than implicit,

/arry
Hi Caleb,

I will only answer the second part, as the wildcard issue has been brought up and discussed time and again, and the `np` analogue is quite a stretch and far-fetched, really. One thing that stood out a bit to me, as I feel I have seen it a couple of times, is the question of intuition, so I will add a few more general thoughts on that...
[...] but it seems quite unintuitive to me [...]
[...] don't necessarily make it intuitively clear [...]
Intuition (or lack thereof) has already been brought forward as an argument a couple of times. I would just like to briefly point out that there is no such thing as universal intuition in the field of programming. We all have different training, skills, preferences and experiences, which make up what we call 'intuition'. But what is intuitive is usually something completely different to a C-programmer than to a Haskell- or Lisp-programmer, say. And since pattern matching is really a new feature to be introduced to Python, a feature that can be seen in different lights, there is no 'Python-programmer intuition' that would apply in this case.

As for beginners, virtually every part of programming is unintuitive at first. Even something innocuous-looking like assignment is often reason for confusion, because `3 + 4 = x` would probably be more 'intuitive'. But there is good reason with regards to the bigger picture to stick to `x = 3 + 4`. A Python programmer (at any level) not familiar with pattern matching will most likely not understand all the subtlety of the syntax---but this is also true of features like `async` or the `/` in parameters, say. I would argue, though, that the clear choice of keywords allows anyone to quickly look pattern matching up and get informed on what it does.

So, we do not need to come up with something that is entirely 'intuitive' and 'self-evident'. But by sticking to common conventions like `_` as wildcard, we can help quickly build the correct intuition. In your examples, for instance, it is perfectly obvious to me that you cannot directly assign to attributes, and it would in fact look very weird to my eyes if you could.
Your use case is quite similar to initialisers, and you are arguing that you would like to be able to write:
```
class Point:
    def __init__(self, self.x, self.y):
        pass
```
rather than the more verbose:
```
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
```
I do not think that this would be a good idea for either parameters or patterns. After all, pattern matching is *not* assignment, even though it is related to it, of course. Kind regards, Tobias Quoting Caleb Donovick <donovick@cs.stanford.edu>:
Adding this feature would be a giant quality-of-life improvement for me and I really hope it succeeds, so I have been trying to keep up with the debate in this and related threads.
While I do want this feature, I agree with a lot of the issues people are raising.
First, I agree that `_` should not be the wildcard symbol. Or rather, the hobgoblins in my mind think that if `_` is to be the wildcard symbol, it would be more consistent with assignment for it to be bound to the last value it was matched with (as should other repeated identifiers), e.g.
```
match pt:
    case (x, _, _):
        assert _ == pt[2]
```
I understand the authors rationalize the decision based on conventions with the gettext module. I find these arguments very unconvincing. It's like saying the identifier `np` should be forbidden from appearing in cases because it's frequently used as the name of numpy. If there is to be a wildcard symbol (one that does not bind and is repeatable), it should not be a valid identifier.
Second, the distinction between a capture pattern and a constant value pattern should be more explicit. I don't have any great insight into the color of the shed beyond the numerous suggestions already made (`name=`, `?name`, etc.), but it seems quite unintuitive to me that I can't capture into a namespace nor match a constant without a namespace. It is also unclear to me why it would be so terrible to add a new token or abuse an existing one.
> Honestly, it doesn't help the case for `?` that it's been proposed as a mark for both capture patterns and value patterns (by different people, obviously :-).
I agree that most of the proposed sheds don't necessarily make it intuitively clear what is a capture variable vs what is a constant. However they do give the programmer the ability to choose.
For example, if I want to modify the motivating example from the PEP slightly to copy attributes from one point to another, I can't express it concisely:
```
def update_point_3d(p: Point3d, pt):
    match pt:
        case (x, y):
            p.x, p.y = x, y
        case Point2d(x, y):
            p.x, p.y = x, y
        ...
```
(Okay, I could have just called the original `make_point_3d` and unpacked the results, but that would require the creation of an unnecessary temporary.)
However, if the capture were explicit and any valid assignment target could be used as a capture variable, then I could express this cleanly:
```
def update_point_3d(p: Point3d, pt):
    match pt:
        case (p.x=, p.y=):
            pass
        case Point2d(p.x=, p.y=):
            pass
        ...
```
- Caleb Donovick
On 7/31/20 12:36 AM, Tobias Kohn wrote:
And since pattern matching is really a new feature to be introduced to Python, a feature that can be seen in different lights, there is no 'Python-Programmer intuition' that would apply in this case.
It's not fair to say "intuition doesn't apply because it's new syntax". There are plenty of examples of intuition serving a Python programmer well when encountering new syntax. A Python programmer's intuition is informed by existing syntax and conventions in the language. When they see a new construct, its similarity to existing constructs can make understanding the new syntax quite intuitive indeed.

Take for example list comprehensions. Python 1 programmers hadn't seen

a = [x for x in y]

But they knew what square brackets meant in that context: it meant "creates a new list". And they knew what "for x in y" meant: that meant iteration. Understanding those two separate concepts, a Python 1 programmer would be well on their way to guessing what the new syntax meant--and they'd likely be right. And once they understood list comprehensions, the first time they saw generator expressions and set and dict comprehensions they'd surely intuit what those did immediately.

The non-intuitiveness of PEP 622, as I see it, is that it repurposes what looks like existing Python syntax but frequently has wholly different semantics. For example, a "class pattern" looks like it's calling a function--perhaps instantiating an object?--but the actual semantics and behavior is very different. Similarly, a "mapping pattern" looks like it's instantiating a dict, but it does something very different, and has unfamiliar and seemingly arbitrary rules about what is permitted, e.g. you can't use full expressions or undotted identifiers when defining a key. Add the "capture pattern" to both of these, and a Python programmer's intuition about what this syntax traditionally does will be of little help when encountering a PEP 622 match statement for the first time.

Cheers,

/arry
+10. See https://stackoverflow.com/questions/36825925/expressions-with-true-and-is-tr... for concrete evidence where another semantically inconsistent operator overloading caused trouble and what Stroustrup has to say on the matter. On 31.07.2020 13:42, Larry Hastings wrote:
[snip]
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Q5KULD7E... Code of Conduct: http://python.org/psf/codeofconduct/ -- Regards, Ivan
1. Semantic operator overloading in generic contexts is very different from this use case. It's surrounded by a clear context.
2. Python programmer intuition varies across Python programmers, and I would find it hella unintuitive if I had to explicitly capture every variable. I just want to write down what the thing looks like and have the interpreter figure out the correct bindings. Extra binding syntax will get in the way rather than be helpful.

Python Dev <python-dev@python.org> wrote:
[snip]
On 31/07/2020 17:24, Rik de Kort via Python-Dev wrote:
1. Semantic operator overloading in generic contexts is very different from this use case. It's surrounded by a clear context. 2. Python programmer intuition varies across python programmers, and I would find it hella unintuitive if I had to explicitly capture every variable. I just want to write down what the thing looks like and have the interpreter figure out the correct bindings. Extra binding syntax will get in the way rather than be helpful.
Until you want to do something slightly different, and the interpreter's choice is not what you want.
Hi Larry,

You are right that just dismissing intuition is wrong. I should have been more careful with my wording or explained it better, and I would like to apologise if my response came across as too strong in this regard.

The actual problem that I see is that we have different cultures/intuitions fundamentally clashing here. In particular, so many programmers welcome pattern matching as an "extended switch statement" and therefore find it strange that names are binding and not expressions for comparison. Others argue that it is at odds with current assignment statements, say, and question why dotted names are _not_ binding. What all groups seem to have in common, though, is that they refer to _their_ understanding and interpretation of the new match statement as 'consistent' or 'intuitive'---naturally pointing out where we as PEP authors went wrong with our design.

But here is the catch: at least in the Python world, pattern matching as proposed by this PEP is an unprecedented and new way of approaching a common problem. It is not simply an extension of something already there. Even worse: while designing the PEP we found that no matter from which angle you approach it, you will run into seeming 'inconsistencies' (which is to say that pattern matching cannot be reduced to a 'linear' extension of existing features in a meaningful way): there is always something that goes fundamentally beyond what is already there in Python. That's why I argue that arguments based on what is 'intuitive' or 'consistent' just do not make sense _in this case_. I think the discussion on this mailing list, with its often contradictory views, proposals, and counter-proposals, more than makes my point.

As for your argument that it looks like calling a function or creating an object: I tried to explain a little while ago that you'd be well advised to rather approach it as something similar to a function _definition_.
After all, the part after `def` in `def foo(a, b):` also looks like a function call! But nobody seems to mind this similarity in syntax there. And the target in `(a, b) = c` looks like a tuple constructor, although it is actually the exact opposite.

Finally, I completely agree that intuition is informed by experience and serves us very well. The first part of this, however, is also to say that intuition is a malleable thing! And experience from other programming languages who took the leap to having pattern matching shows that it quickly becomes a quite intuitive and easy to use feature.

Cheers,
Tobias

P.S. Please excuse my late reply; I am currently on vacation.

Quoting Larry Hastings <larry@hastings.org>:
[snip]
On Tue, Aug 4, 2020 at 1:37 PM Tobias Kohn <kohnt@tobiaskohn.ch> wrote:
And experience from other programming languages who took the leap to having pattern matching shows that it quickly becomes a quite intuitive and easy to use feature.
The languages I know about that have pattern matching had it from the start as a core feature. I am curious to learn about languages that adopted pattern matching later in their evolution. Cheers, Luciano
-- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Technical Principal at ThoughtWorks | Twitter: @ramalhoorg
Off the top of my head, a recent and fairly mainstream example: C# added it in 8.0. https://docs.microsoft.com/en-us/archive/msdn-magazine/2019/may/csharp-8-0-p... On Wed, Aug 5, 2020 at 3:33 PM Luciano Ramalho <luciano@ramalho.org> wrote:
[snip]
It's interesting to consider how C# did it. For example, at the same time they added pattern matching, they also added "discards", which are (undeclared-only?) variables whose name starts with '_' and whose value is never retained. I'm not sure, but I believe the language previously permitted (and still permits) conventional variables that start with '_'. My guess is that that's now discouraged, and new code is encouraged to only use identifiers starting with '_' as discards.

And, a minor correction: C# added pattern matching (and discards) in version 7, though they did extend the syntax in version 8.

Cheers,

/arry

On 8/5/20 2:04 PM, Robert White wrote:
[snip]
On Wed, Aug 5, 2020 at 8:14 PM Larry Hastings <larry@hastings.org> wrote:
It's interesting to consider how C# did it. For example, at the same time they added pattern matching, they also added "discards", which are (undeclared-only?) variables whose name starts with '_' and whose value is never retained. I'm not sure, but I believe the language previously permitted (and still permits) conventional variables that started with '_'. My guess is that that's now discouraged, and new code is encouraged to only use identifiers starting with '_' as discards.
And, a minor correction: C# added pattern matching (and discards) in version 7, though they did extend the syntax in version 8.
Yes, that was my goal when I asked about pattern matching added to a language after its initial design: maybe we could learn something about how to adopt this feature gradually instead of all at once.
Cheers,
/arry
On 8/5/20 2:04 PM, Robert White wrote:
Off the top of my head for recently happened and fairly mainstream language: C# added it in 8.0 https://docs.microsoft.com/en-us/archive/msdn-magazine/2019/may/csharp-8-0-p...
On Wed, Aug 5, 2020 at 3:33 PM Luciano Ramalho <luciano@ramalho.org> wrote:
On Tue, Aug 4, 2020 at 1:37 PM Tobias Kohn <kohnt@tobiaskohn.ch> wrote:
And experience from other programming languages that took the leap to having pattern matching shows that it quickly becomes a quite intuitive and easy-to-use feature.
The languages I know about that have pattern matching had it from the start as a core feature.
I am curious to learn about languages that adopted pattern matching later in their evolution.
Cheers,
Luciano
Cheers, Tobias
P.S. Please excuse my late reply; I am currently on vacation.
Quoting Larry Hastings <larry@hastings.org>:
On 7/31/20 12:36 AM, Tobias Kohn wrote:
And since pattern matching is really a new feature to be introduced to Python, a feature that can be seen in different lights, there is no 'Python-Programmer intuition' that would apply in this case.
It's not fair to say "intuition doesn't apply because it's new syntax". There are plenty of examples of intuition serving a Python programmer well when encountering new syntax. A Python programmer's intuition is informed by existing syntax and conventions in the language. When they see a new construct, its similarity to existing constructs can make understanding the new syntax quite intuitive indeed.
Take for example list comprehensions. Python 1 programmers hadn't seen
a = [x for x in y]
But they knew what square brackets meant in that context: "creates a new list". And they knew what "for x in y" meant: iteration. Understanding those two separate concepts, a Python 1 programmer would be well on their way to guessing what the new syntax meant--and they'd likely be right. And once they understood list comprehensions, the first time they saw generator expressions and set and dict comprehensions they'd surely intuit what those did immediately.
The non-intuitiveness of PEP 622, as I see it, is that it repurposes what looks like existing Python syntax but frequently has wholly different semantics. For example, a "class pattern" looks like it's calling a function--perhaps instantiating an object?--but the actual semantics and behavior are very different. Similarly, a "mapping pattern" looks like it's instantiating a dict, but it does something very different, and has unfamiliar and seemingly arbitrary rules about what is permitted, e.g. you can't use full expressions or undotted identifiers when defining a key. Add the "capture pattern" to both of these, and a Python programmer's intuition about what this syntax traditionally does will be of little help when encountering a PEP 622 match statement for the first time.
Cheers,
/arry
-- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Technical Principal at ThoughtWorks | Twitter: @ramalhoorg
JavaScript doesn't have it yet, but there is an active proposal for it in the standardization committee: https://github.com/tc39/proposal-pattern-matching
`np` analogue is quite a stretch and far-fetched, really.
I don't disagree. But `_` is a valid identifier so it shouldn't be special. The solution is incredibly simple: allow repeated identifiers just like in assignment so there is no need for a special wildcard symbol.
...<INTUITION ARGUMENT>...
I disagree but this is philosophical discussion which I would rather not go down.
...<POINT ABOUT FUNCTION PARAMETERS>...
That's reasonable, although I would argue that a match is more like a for loop (which allows arbitrary assignment) than a function definition. I do understand your point though.

However, I still think the inability to match against a constant not in a namespace is very annoying and could be overcome with some explicit syntax. It's reasonable for this syntax to only allow NAME to be bound (many places in the grammar do this: def, class, as, ...) but I haven't seen a satisfactory reason why there *shouldn't* be support for NAME constants; just reasons for why it's tricky.

You (or one of the authors) argue against `.constant` as a matching syntax by saying: "..., it introduces strange-looking new syntax without making the pattern syntax any more expressive." but this is obviously false. It clearly does make the syntax more expressive: it allows one to express something naturally without needing to create an auxiliary structure (or else the match syntax doesn't make the language more expressive either). Yes, namespaces are a great idea, but as a consenting adult I should be free to have constants that are not in a namespace. My constant may come in any number of forms that are not conducive to simply "wrapping it in a namespace"; for example, it may be used in other modules (so wrapping it would require external changes) or it might be a closure variable and so would require some explicit wrapping step.

Further, you argue elsewhere that we shouldn't worry about a syntax being strange because it's new, so of course it's going to be different. Yet you invoke this reasoning as a way to reject solutions to the NAME constant issue. Pick one. (Preferably the version that includes match syntax and some strange new syntax for explicit constants, because I super want the match syntax.)

Caleb Donovick

On Fri, Jul 31, 2020 at 12:40 AM Tobias Kohn <kohnt@tobiaskohn.ch> wrote:
Hi Caleb,
I will only answer to the second part, as the wildcard issue has been brought up and discussed time and again, and the `np` analogue is quite a stretch and far-fetched, really.
One thing that stood out a bit to me as I feel to have seen it a couple of times is the question of intuition, so I will add a few more general thoughts to that...
[...] but it seems quite unintuitive to me [...]
[...] don't necessarily make it intuitively clear [...]
Intuition (or lack thereof) has already been brought forward as an argument a couple of times. I would just like to briefly point out that there is no such thing as universal intuition in the field of programming. We all have different training, skills, preferences and experiences, which make up what we call 'intuition'. But what is intuitive is usually something completely different to C-programmer than to a Haskell- or Lisp-Programmer, say. And since pattern matching is really a new feature to be introduced to Python, a feature that can be seen in different lights, there is no 'Python-Programmer intuition' that would apply in this case.
As for beginners, virtually every part of programming is unintuitive at first. Even something innocuous-looking like assignment is often a reason for confusion, because `3 + 4 = x` would probably be more 'intuitive'. But there is good reason, with regard to the bigger picture, to stick to `x = 3 + 4`.
A Python programmer (at any level) not familiar with pattern matching will most likely not understand all the subtlety of the syntax---but this is also true of features like `async` or the `/` in parameters, say. I would argue, though, that the clear choice of keywords allows anyone to quickly look pattern matching up and get informed on what it does. So, we do not need to come up with something that is entirely 'intuitive' and 'self-evident'. But by sticking to common conventions like `_` as wildcard, we can help quickly build the correct intuition.
In your examples, for instance, it is perfectly obvious to me that you cannot directly assign to attributes and it would in fact look very weird to my eyes if you could. Your use case is quite similar to initialisers and you are arguing that you would like being able to write:
```
class Point:
    def __init__(self, self.x, self.y):
        pass
```
rather than the more verbose:
```
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
```
I do not think that this would be a good idea for either parameters or patterns. After all, pattern matching is **not** assignment, even though it is related to it, of course.
Kind regards, Tobias
Quoting Caleb Donovick <donovick@cs.stanford.edu>:
Adding this feature would be a giant quality of life improvement for me and I really hope it succeeds. So I have been trying to keep up on the debate in this and related thread.
While I do want this feature, I agree with a lot of the issues people are raising.
First I agree that _ should not be the wildcard symbol. Or rather the hobgoblins in my mind think that if _ is to be the wildcard symbol it would be more consistent with assignment if it was bound to the last value it was matched with (as should other repeated identifiers) e.g.,
```
match pt:
    case (x, _, _):
        assert _ == pt[2]
```
I understand the authors rationalize the decision based on conventions with the gettext module. I find these arguments very unconvincing. It's like saying the identifier np should be forbidden from appearing in cases because it's frequently used as the name of numpy. If there is to be a wildcard symbol (that does not bind and is repeatable) it should not be a valid identifier.
Second, the distinction between a capture and constant value pattern should be more explicit. I don't have any great insight into the color of the shed beyond the numerous suggestions already made (name=, ?name, etc...), but it seems quite unintuitive to me that I can't capture into a namespace nor match a constant without a namespace. It is also unclear to me why it would be so terrible to add a new token or abuse an existing one.
Honestly, it doesn't help the case for `?` that it's been proposed as a mark for both capture patterns and value patterns (by different people, obviously :-).
I agree that most of the proposed sheds don't necessarily make it intuitively clear what is a capture variable vs what is a constant. However they do give the programmer the ability to choose.
For example if I want to modify the motivating example from the PEP slightly to copy attributes from one point to another I can't express it concisely:
```
def update_point_3d(p: Point3d, pt):
    match pt:
        case (x, y):
            p.x, p.y = x, y
        case Point2d(x, y):
            p.x, p.y = x, y
        ...
```
(Okay I could have just called the original make_point_3d and unpacked the results but it would require the creation of an unnecessary temporary.)
However if the capture was explicit and any valid target could be used as a capture variable then I could express this cleanly:
```
def update_point_3d(p: Point3d, pt):
    match pt:
        case (p.x=, p.y=):
            pass
        case Point2d(p.x=, p.y=):
            pass
        ...
```
- Caleb Donovick
Your point about wanting a way to use an unqualified name as a value pattern is not unreasonable, and as you may recall we had an elegant solution in version 1 of the PEP: a leading dot. However that was booed away by the critics, and there has been no consensus (not even close) on what to do instead.

Any solution that involves special markup (like bringing back the leading dot, or backticks, or a question mark, or any other sigil) can easily be added in a future version of Python.

There is one solution that I personally find acceptable but which found little support from the other PEP authors. It is a rule also adopted by Scala. This is to make it so that any identifier starting with a capital letter (possibly preceded by one or more underscores) is a value pattern. I note that in Scala, too, this is different in patterns than elsewhere in the language: Scala, like Python, allows identifiers starting with a capital letter to be assigned in other contexts -- just not in patterns. It also uses roughly the same *conventions* for naming things as PEP 8 (classes Capitalized, constants UPPERCASE, variables and methods lowercase). I also note that Scala allows backticks as another way to force interpretation as a value pattern (though apparently it's not used much for this purpose).

Finally I note that some human languages don't distinguish between lowercase and uppercase (IIUC the CJK languages fall in this category). I don't know what conventions users writing Python using identifiers in their native language use to distinguish between constants and variables, but I do know that they still use the Latin alphabet for keywords, builtins, standard library names, and many 3rd party library names. This is why I gave my proposed rule as "starting with a capital letter" and not as "not starting with a lowercase letter", so that `case こんにちは:` will bind the name こんにちは instead of looking up that name; these seem the more useful semantics.
If a Japanese user wanted to look up that name they could write `HELLO = こんにちは` followed by `case HELLO:`. (Of course, Latin-using users can do the same thing if they have a name starting with a lowercase letter that they want to use as a value pattern.)

Unfortunately we cannot leave the Capitalized rule to a future version of Python, since the PEP as written interprets `case HELLO:` as a capture pattern. (A compromise would be to disallow Capitalized identifiers altogether, leaving the door open for a decision either way in the future. But in that case I'd rather press for just instituting the rule now.)

About `_` enough has been written already.
-- Guido van Rossum (python.org/~guido) *Pronouns: he/him*
On 2020-08-03 05:21, Guido van Rossum wrote:
[snip] A thought occurred to me. By default, the current rules of the PEP could apply, but why not allow prefixing with "as" for a capture and "is" for a value? Yes, I know, comparison of the values is not by identity, but "is" is a short keyword that already exists and matches up with "as". (After looking back through the thread it looks like Rob Cliffe has already had the same idea.)
On 03/08/2020 17:37, MRAB wrote:
What about 'match'? Not as short, but fairly intuitive:
```
case (x, y, match Z):
    print(f'A point whose z-coordinate equals {Z}')
```