Hi everyone,
Below, I sketch a design that aims to meet needs resembling those that PEP 622 deals with, although with some differences, especially in emphasis.
I'm not writing a full introduction here; the intended audience of this email is people somewhat familiar with PEP 622 and its discussions. This email doesn't have much structure or anything, but I hope it's sufficiently clear.
Things that this design aims to address:
* Check whether an object's structure matches a given pattern
* Optionally extract desired values from within that structure
* Learning: patterns are understandable in a logical and consistent manner
* Matching can be easily tinkered with interactively
* Names used in patterns behave in a clear and intuitive way for Python programmers
* ...
The most difficult thing is perhaps to understand what names mean: what is being bound to, and what is a value defined elsewhere and so on. For example, if a (somewhat unrealistic) pattern looks like
Point3D(x=3.14, y=6, z=_)
, there are four names that refer to something: `Point3D`, `x`, `y` and `z`. To understand this, it is useful to think of the pattern as corresponding to an expression, although it is not treated as quite the same in the end. (So here, `x`, `y` and `z` refer to internals/arguments of `Point3D`.)
The situation becomes more difficult when the values to compare with are in variables, and/or if one wishes to extract a value from the structure.
Point3D(x=pi, y=SIX, z=value)
Now there is no way to tell, from this, which names refer to existing objects and which should be bound to by the operation, except by guessing. Here, `value` is supposed to be a binding target.
Python already has destructuring assignment (`a, b, *rest = values`), which is similar to what happens in function calls: `def func(a, b, *rest): ...`.
However, it is more useful to see patterns as working backwards compared to this. For example, the semantics of the names in
lambda value: Point3D(x=pi, y=SIX, z=value)
are exactly as desired. That is, as long as pattern matching is interpreted as asking "does this object look like what this function would produce, and if so, what would the arguments of the function be in that case?". Matching the pattern would bind to `value`.
So, with this, a previous understanding of function definitions already gives you a mental model for how names work in patterns *and* for what the pattern is supposed to do.
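The "backwards function" reading can be imitated in today's Python. This is only a minimal sketch: the `Point3D` dataclass, the constant `SIX`, and the helpers `build` and `match_backwards` are assumptions made for illustration and are not part of the proposal.

```python
from dataclasses import dataclass
from math import pi
from typing import Optional

SIX = 6  # stand-in for a value defined elsewhere


@dataclass
class Point3D:
    x: float
    y: float
    z: float


def build(value):
    # The "forward" direction: what the lambda in the text would produce.
    return Point3D(x=pi, y=SIX, z=value)


def match_backwards(obj) -> Optional[dict]:
    """Ask: could build() have produced obj, and with which argument?"""
    if not isinstance(obj, Point3D):
        return None
    if obj.x != pi or obj.y != SIX:
        return None
    bindings = {"value": obj.z}
    # Sanity check: running the function forwards reproduces the object.
    assert build(**bindings) == obj
    return bindings


hit = match_backwards(Point3D(pi, 6, 42))    # x and y agree, z is extracted
miss = match_backwards(Point3D(0.0, 6, 42))  # x differs, so no match
```

Here `hit` holds the would-be argument under the name `value`, and `miss` is `None`, which mirrors how the names `pi` and `SIX` are read while `value` is bound.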
In theory, the lambda expression could BE the syntax for a pattern. However, many have wished for a different syntax even for lambdas. A slightly nicer form would be to omit the keyword `lambda`, but then one would still have to repeat the names to be bound. To avoid that, we need a way to explicitly mark "not-yet-bound names":
Point3D(x=pi, y=SIX, z=value?)
Before going any further, how would one invoke the matching machinery? It could be
<expression> matches <pattern>
and that would evaluate to a boolean-like value.
With this syntax, the `is_tuple` example from PEP 622 would look something like this:
def is_tuple(node: Node) -> bool:
    if node matches Node(children=[LParen(), RParen()]):
        return True
    elif node matches Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
        return True
    return False
Or something like:
def is_tuple(node: Node) -> bool:
    if node matches tuple_pattern:
        return True
    return False
Here, `tuple_pattern` would be a pattern pre-defined with a function-like syntax. Also, if there is a `tuple_pattern`, then `is_tuple` is probably not needed at all.
Note that in PEP 622, is_tuple uses a match statement with three cases. So, in effect, the full tuple_pattern had been split into subpatterns. This was only possible because it was an OR pattern. In general, splitting a longer pattern into cases like that is not possible. Longer function-like patterns, on the other hand, can be expressed using more pattern functions – just like regular functions can use helper functions.
Here, tuple_pattern isn't passed any arguments. It also doesn't have any parameters. However, if it did, those would be considered wildcards, as I believe we'd want both the programmer and the compiler/optimizers to explicitly see, from the match expression, which names will/would be bound. If something other than wildcard behavior is desired, that should be explicitly specified in a "pattern function call".
Let's take another example pattern:
pdef main_diagonal_point(pos):
    return Point3D(pos, pos, pos)
Now, `main_diagonal_point(0)` would refer to the origin, and `point matches main_diagonal_point` would be true for every point with `x == y == z`. Similarly, `point matches main_diagonal_point(pos?)` should match only if `x == y == z`, and then bind that value to `pos`. However, one might expect to be able to write the same thing inline as
point matches Point3D(pos?, pos?, pos?)
, so based on that, multiple occurrences of the same binding target should ensure equality.
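The equality rule for repeated binding targets can be emulated with ordinary objects. In this sketch, `Bind` stands in for a `pos?` marker, `match_fields` is a hypothetical helper, and `main_diagonal_point` is a plain function playing the role of the `pdef`; none of these names come from the proposal itself.

```python
from dataclasses import dataclass, fields


@dataclass
class Point3D:
    x: float
    y: float
    z: float


class Bind:
    """Stand-in for a not-yet-bound name such as `pos?`."""

    def __init__(self, name):
        self.name = name


def match_fields(obj, cls, *subpatterns):
    """Match obj against cls(*subpatterns); return bindings or None."""
    if not isinstance(obj, cls):
        return None
    values = [getattr(obj, f.name) for f in fields(cls)]
    if len(values) != len(subpatterns):
        return None
    bindings = {}
    for value, sub in zip(values, subpatterns):
        if isinstance(sub, Bind):
            # A repeated target must agree with its earlier binding.
            if sub.name in bindings and bindings[sub.name] != value:
                return None
            bindings[sub.name] = value
        elif sub != value:
            return None
    return bindings


def main_diagonal_point(pos):
    # Plays the role of the `pdef` in the text: one argument, used three times.
    return (Point3D, pos, pos, pos)


pos = Bind("pos")
on_diagonal = match_fields(Point3D(2, 2, 2), *main_diagonal_point(pos))
off_diagonal = match_fields(Point3D(2, 2, 3), *main_diagonal_point(pos))
inline = match_fields(Point3D(5, 5, 5), Point3D, Bind("pos"), Bind("pos"), Bind("pos"))
```

Because bindings are keyed by name, the inline form and the pattern-function form behave the same way, which is the consistency argued for above.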
What about wildcards? If ? were the wildcard, that would mean that
point matches main_diagonal_point(?)
should NOT mean the same as
point matches Point3D(?,?,?)
So each occurrence of `?` would have to be treated as a "new" wildcard, but when one is passed as an argument to a sub-pattern, it would effectively behave like a numbered wildcard, something like `0?` (and `1?` and so on), so that the three occurrences (in the former case) would have to be the same. This is not the only worry about using `?` for wildcards. I don't find `_` as a wildcard really that problematic, although it is a bit more problematic here than in PEP 622.
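The difference between the two wildcard spellings can be made concrete with a small emulation. This is a sketch under simplifying assumptions: points are plain 3-tuples, each written `?` is modeled as a separate `Wildcard` instance, and `match_triple` is a hypothetical helper.

```python
class Wildcard:
    """Stand-in for one written occurrence of `?`; each instance is distinct."""


def match_triple(values, subpatterns):
    """Match a 3-tuple of values against three sub-patterns."""
    seen = {}  # id(wildcard) -> value it has already stood for
    for value, sub in zip(values, subpatterns):
        if isinstance(sub, Wildcard):
            # The same wildcard object, reused, must stand for equal values.
            if seen.setdefault(id(sub), value) != value:
                return False
        elif sub != value:
            return False
    return True


def main_diagonal_point(pos):
    # One argument used three times, as in the `pdef` from the text.
    return (pos, pos, pos)


point = (1, 2, 3)
via_function = match_triple(point, main_diagonal_point(Wildcard()))
inline = match_triple(point, (Wildcard(), Wildcard(), Wildcard()))
```

With a point that is off the diagonal, the single wildcard passed through `main_diagonal_point` fails to match, while three independent wildcards written inline succeed.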
Another option would be to use `Any` as wildcard. However, that would sound like "any *type* of object is fine", although (hopefully) it is most often clear what the *type* should be, and that it is the *value* that can be anything. (And if the type is clear, no isinstance is necessary.)
Then the pattern for `isinstance(obj, Class)`. Quite similarly to using a "pattern function" without arguments above, this could be `obj matches Class`. This means that matching with an instance of `type` (inside a pattern) would default to an instance check, while other `object` instances might perhaps default to `==` if nothing else is specified.
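The default rule described here fits in a few lines. The function name `default_match` is an assumption for illustration; the proposal leaves the actual machinery (dunder methods and so on) unspecified.

```python
def default_match(obj, pattern):
    """Assumed default: classes mean isinstance, other objects mean ==."""
    if isinstance(pattern, type):
        return isinstance(obj, pattern)
    return obj == pattern


checks = [
    default_match(3, int),     # class pattern: instance check
    default_match("3", int),   # a str is not an int
    default_match(3, 3),       # plain object: equality
    default_match(3, 4),
    default_match(int, type),  # a class is itself an instance of `type`
]
```

One consequence of this default is that a class can never be compared by equality unless that is requested explicitly, which is why the text says "if nothing else is specified".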
Walrus patterns? While I think people probably understood PEP 622 walrus patterns quite well, I think they harm the understanding of patterns in general, because they break the mental model of going "backwards" in some sense. The walrus pattern in PEP 622 would be perfectly described by an AND pattern instead. This is because a binding target in PEP 622 is considered a "capture pattern". The Django example in PEP 622 would then be
match value:
    case [*v, label & (Promise() | str())] if v:
        value = tuple(v)
    case _:
        label = key.replace('_', ' ').title()
However, I prefer not to think of a binding target as a "pattern", just like I don't think patterns are assignment targets. Instead, here, one might "annotate" a "not-yet-bound name" with a (sub-)pattern:
*v?, label?(Promise | str)
This would bind to `v` and `label` as well as check that the thing to be bound to `label` matches Promise | str.
The django example could be written like this:
if value matches *v?, label?(Promise | str) and v:
    value = tuple(v)
else:
    label = key.replace('_', ' ').title()
However, it would be quite possible to not add this possibility now, and instead use:
if value matches *v?, label? and v and label matches Promise | str:
    value = tuple(v)
else:
    label = key.replace('_', ' ').title()
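For comparison, the same logic can be spelled out in current Python. This is a rough sketch: `Promise` is a hypothetical stand-in for the Django class, `split_label` is a name made up here, and restricting the sequence match to lists and tuples is an assumption about what `*v?, label?` would accept.

```python
class Promise:
    """Hypothetical stand-in for Django's lazy-string base class."""


def split_label(key, value):
    # `value matches *v?, label?`: a sequence with at least one item.
    if isinstance(value, (list, tuple)) and len(value) >= 1:
        *v, label = value
        # `and v and label matches Promise | str`
        if v and isinstance(label, (Promise, str)):
            return tuple(v), label
    # No match: keep value and derive the label from the key.
    return value, key.replace("_", " ").title()


with_label = split_label("first_name", (1, 2, "Two"))
too_short = split_label("first_name", ("only",))
not_a_sequence = split_label("first_name", 5)
```

The two-step version binds `v` and `label` first and checks the label afterwards, which is what the `and`-chained form of the match expression would do.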
I think I'll stop here for now – the internal workings and dunder methods are a whole different story.
I hope this was somewhat understandable. I know I didn't explicitly explain all the semantics – I tried to hit the main points and avoid distractions. If something was unclear, I'll be happy to answer any questions or concerns :).
—Koos