I read the PEP and have a few thoughts:
I think one of the examples is some lib2to3 code? I think the matcher syntax is really great for that case (parse trees). The matcher syntax is definitely an improvement over the litany of helper functions and conditionals otherwise needed.
That said, I have a hard time seeing a particular use of this complicated pattern matching outside "heterogeneous trees" (for lack of a better term) of objects. I've only really dealt with that problem with parse trees, but perhaps that's just an artifact of the domains I've ended up working in.
In any case, it might be useful to include some/more examples or use cases that aren't as parser-centric.
Question: How are True, False, None, ..., etc handled? What does this do?
case True: ...
case False: ...
case None: ...
I would expect they would be treated as literals the same as e.g. numbers/strings, yes? Sorry if I missed this in the PEP.
I, too, had trouble understanding the __match__ protocol from the PEP text. Brett's comments largely capture my thoughts about this.
The need to use "." to indicate "look up name" to avoid "match anything" seems like a big foot gun. Simple examples such as:
FOO = 1
match choice:
    case FOO:
        print("you chose one")
clearly illustrate this, but the problem is present in any case expression: a missing dot changes the meaning from "match this specific value" to almost the opposite: "match any value". And all you really need to do is miss a single leading dot anywhere in the case expression to trigger this. I agree with Barry (I think he said this) that it seems like an easy cause of mysterious bugs.
I think the foot-gun aspect derives directly from the change in how a symbol is interpreted. Everywhere else in the language (predominantly; everything I can think of atm), when you see "foo", you know some sort of lookup of the name "foo" is occurring. The exception is fairly simple: when there is some "assignment cue", e.g. "as", :=, =, import, etc., and those cues are always very close by (pretty much always the leading/following token?). Anyways, my point is that assignment has a cue close by.
The proposed syntax flips that and mixes it, so it's very confusing. Sometimes a symbol is a lookup, sometimes it's an assignment.
The PEP talks a bit about this in the "alternatives for constant value pattern" section. I don't find the rationale in that section particularly convincing. It basically says using "$FOO" to act as "look up value named FOO" is rejected because "it is new syntax for a narrow use case" and "name patterns are common in typical code ... so special syntax for the common case would be weird".
I don't find that convincing because it seems more weird to change the (otherwise consistent) lookup/assignment behavior of the language for a specific sub-syntax.
Anyways, when I rewrite the examples and use a token to indicate "matcher", I personally find them easier to read. I feel this is because it makes the matcher syntax feel more like templates or string interpolation (or things of that nature) that have some "placeholder" that gets "bound" to a value after being given some "input".
It also sort of honors the "assignment only happens with a localized cue" behavior that already exists.
ORIGIN = 0
case Point(ORIGIN, $end):
I will admit this gives me PHP flashbacks, but it's also very clear where assignments are happening, and I can just use the usual name-lookup rules. I just used $ since the PEP did.
As a bonus, I also think this largely mitigates the foot-gun problem because there's now a cue that a binding is happening, so it's easy to trigger the "is that name already taken, is it safe to assign?" check we mentally perform.
In any case, this seems like a pretty fundamental either/or design decision someone will have to make:
1. Names mean assignment, and the rules of what is a lookup vs. assignment are different, with some special-case support (i.e. leading dot).
2. Use some character to indicate assignment, and the lookup rules stay the same.
Related to the above: I also raise this because, in my usage, I doubt I'll be using it as much more than a switch statement. I rarely have to match complicated patterns, but very often have a set of values that I need to test against. The combination of Literal and exhaustive-case checking is very appealing.
So I'm very often going to want to type, e.g.
ValidModes = Union[Literal[A], Literal[B], etc etc]
def foo(mode: ValidModes):
    match mode:
        case A: ...
        case B: ...
        case etc etc
And eventually I'm going to foot-gun myself with a missing dot.
Related to the above, I don't find that e.g. "case Point(...)" not initializing a Point particularly confusing. This feels like it might be inconsistent with my whole thing above, but :shrug:. FWIW, I suspect it's just that the leading "case" cue makes it easy to entirely turn off the "parentheses means code gets called" logic in my mind-parser.
Related to the above, perhaps an unadorned name shouldn't be allowed? e.g. this should be invalid:
I raise this idea because of the foot-gun issue, but also because it creates more ways of doing the same thing: binding the name to a value. Using := doesn't seem like a particularly burdensome solution:
match shape := get_shape():
case: # or *, or _, or whatever
And then either only dotted names or patterns are allowed in cases, not plain names.
Making underscore a special match-anything-but-don't-bind struck me as a bit odd. Aside from the language grammar rules, there aren't really any "this is an OK name, this isn't" type of rules.
I think someone else mentioned using "*" instead of "_"? I had the exact same thought. If it's not going to be bound to a name, why use an otherwise valid name to not bind it to? I get the ergonomics of it, but it seems like another special case of how things get processed inside the case expression.
Why | instead of "or" ? "or" is used in other conditionals. This strikes me as another special case of the syntax that differs from elsewhere in the language.
I agree with not having flat indentation. I think having "case" indented from "match" makes it more readable overall.
Anyways, thanks for reading. HTH.