Advantages of pattern matching - a simple comparative analysis

Take as an example a function designed to process a tree of nodes similar to that which might be output by a JSON parser. There are 4 types of node:
- A node representing JSON strings - A node representing JSON numbers - A node representing JSON arrays - A node representing JSON dictionaries
The function transforms a tree of nodes, beginning at the root node, and proceeding recursively through each child node in turn. The result is a Python object, with the following transformation applied to each node type:
- A JSON string `->` Python `str` - A JSON number `->` Python `float` - A JSON array `->` Python `list` - A JSON dictionary `->` Python `dict`
I have implemented this function using 3 different approaches:
- The visitor pattern - `isinstance` checks against the node type - Pattern matching
Here is the implementation using the visitor pattern:
``` from typing import List, Tuple
class NodeVisitor: def visit_string_node(self, node: StringNode): pass
def visit_number_node(self, node: NumberNode): pass
def visit_list_node(self, node: ListNode): pass
def visit_dict_node(self, node: DictNode): pass
class Node: def visit(visitor: NodeVisitor): raise NotImplementedError()
class StringNode(Node): value: str
def visit(self, visitor: NodeVisitor): visitor.visit_string_node(self)
class NumberNode(Node): value: str
def visit(self, visitor: NodeVisitor): visitor.visit_number_node(self)
class ListNode(Node): children: List[Node]
def visit(self, visitor: NodeVisitor): visitor.visit_list_node(self)
class DictNode(Node): children: List[Tuple[str, Node]]
def visit(self, visitor: NodeVisitor): visitor.visit_dict_node(self)
class Processor(NodeVisitor): def process(root_node: Node): return root_node.visit(self)
def visit_string_node(self, node: StringNode): return node.value
def visit_number_node(self, node: NumberNode): return float(node.value)
def visit_list_node(self, node: ListNode): return [child_node.visit(self) for child_node in node.children]
def visit_dict_node(self, node: DictNode): return {key: child_node.visit(self) for key, child_node in node.children}
def process(root_node: Node): processor = Processor() return processor.process(root_node) ```
Here is the implementation using `isinstance` checks against the node type:
``` from typing import List, Tuple
class Node: pass
class StringNode(Node): value: str
class NumberNode(Node): value: str
class ListNode(Node): children: List[Node]
class DictNode(Node): children: List[Tuple[str, Node]]
def process(root_node: Node): def process_node(node: Node): if isinstance(node, StringNode): return node.value elif isinstance(node, NumberNode): return float(node.value) elif isinstance(node, ListNode): return [process_node(child_node) for child_node in node.children] elif isinstance(node, DictNode): return {key: process_node(child_node) for key, child_node in node.children} else: raise Exception('Unexpected node')
return process_node(root_node) ```
Finally here is the implementation using pattern matching:
``` from typing import List, Tuple
class Node: pass
class StringNode(Node): value: str
class NumberNode(Node): value: str
class ListNode(Node): children: List[Node]
class DictNode(Node): children: List[Tuple[str, Node]]
def process(root_node: Node): def process_node(node: Node): match node: case StringNode(value=str_value): return str_value case NumberNode(value=number_value): return float(number_value) case ListNode(children=child_nodes): return [process_node(child_node) for child_node in child_nodes] case DictNode(children=child_nodes): return {key: process_node(child_node) for key, child_node in child_nodes} case _: raise Exception('Unexpected node')
return process_node(root_node) ```
Here are the lengths of the different implementations:
- Pattern matching `->` 37 lines - `isinstance` checks `->` 36 lines - The visitor pattern `->` 69 lines
The visitor pattern implementation is by far the most verbose solution, weighing in at almost twice the length of the alternative implementations due to the large amount of boilerplate that is necessary to achieve double dispatch. The pattern matching and `isinstance` check implementations are very similar in length for this trivial example.
In each implementation, there are 2 operations performed on each node.
- Determine the type of the node - Destructure the node to extract the desired data
The visitor pattern and `isinstance` check implementations separate these 2 operations, whereas the pattern matching approach combines the operations together. I believe that it is the declarative nature of pattern matching, where the operations of determining the type of the node and destructuring the node are combined into a single clause, which allows pattern matching to express a concise solution to the problem. In this trivial example, the advantage of pattern matching over the alternative of using a sequence of `if`-`elif`-`else` statements is not as obvious as it would be when compared to a more complex example, where a sub-tree of nodes might be matched based on their type and be destructured in a single clause.
I have seen elsewhere an argument that pattern matching should not be accepted into Python as it introduces a pseudo-DSL that is separate from the rest of the language. I agree that pattern matching might be viewed as a pseudo-DSL, but I believe that it is a good thing, if it allows the solution to certain classes of problems to be expressed in a concise manner. People often raise similar objections to operator overloading in other languages, whereas the presence of operator overloading in Python allows mathematical expressions involving custom numeric types such as vectors to be expressed in a natural way. Furthermore, Python has a regular expression module which implements it's own DSL for the purpose of matching string patterns. Regular expressions, in a similar way to pattern matching, allow string patterns to be expressed in a concise and declarative manner.
I really hope that the Steering Council accepts pattern matching into Python. I think that it allows for processing of heterogeneous graphs of objects using recursion in a concise, declarative manner. I would like to thank the authors of the Structural Pattern Matching PEP for their hard work in designing this feature and developing an implementation of it. I believe that it will be a wonderful addition to the language that I am very much looking forward to using.

On 11/23/20 8:15 AM, Brian Coleman wrote:
def process(root_node: Node): def process_node(node: Node): if isinstance(node, StringNode): return node.value elif isinstance(node, NumberNode): return float(node.value) elif isinstance(node, ListNode): return [process_node(child_node) for child_node in node.children] elif isinstance(node, DictNode): return {key: process_node(child_node) for key, child_node in node.children} else: raise Exception('Unexpected node')
You don't need the "else". And you can change all your "elif"s into "if"s too. Now your "isinstance" version is 35 lines. Shorter than the pattern matching version, roughly the same speed, works in current Python, eminently readable to anybody conversant in current Python. A very reasonable solution to the problem.
There should be one--and preferably only one--obvious way to do it,
//arry/

On 11/23/20 11:06 AM, Larry Hastings wrote:
On 11/23/20 8:15 AM, Brian Coleman wrote:
def process(root_node: Node): def process_node(node: Node): if isinstance(node, StringNode): return node.value elif isinstance(node, NumberNode): return float(node.value) elif isinstance(node, ListNode): return [process_node(child_node) for child_node in node.children] elif isinstance(node, DictNode): return {key: process_node(child_node) for key, child_node in node.children} else: raise Exception('Unexpected node')
You don't need the "else".
Without the "else" errors will pass silently.
And you can change all your "elif"s into "if"s too.
On average, doubling your comparisons and therefore becoming slower.
Now your "isinstance" version is 35 lines. Shorter than the pattern matching version
But without error checking.
roughly the same speed,
See above comment about becoming slower.
works in current Python, eminently readable to anybody conversant in current Python. A very reasonable solution to
the problem.
Agreed. But (subjectively, of course) not as elegant as the match version, and deteriorates rapidly as complexity increases.
There should be one--and preferably only one--obvious way to do it,
But that is not a hard and fast rule or we wouldn't have decorators (zero lines saved), the "with" statement (two lines saved), and four different ways spanning three different protocols to write an iterator [1].
-- ~Ethan~
[1] https://stackoverflow.com/questions/19151/build-a-basic-python-iterator/7542...

On Mon, Nov 23, 2020, 2:58 PM Ethan Furman
if isinstance(node, StringNode): return node.value elif isinstance(node, NumberNode): return float(node.value) elif isinstance(node, ListNode): return [process_node(child_node) for child_node in
node.children]
elif isinstance(node, DictNode): return {key: process_node(child_node) for key, child_node
in node.children}
else: raise Exception('Unexpected node')
You don't need the "else".
Without the "else" errors will pass silently.
Not in the particular example. It can just be a bunch of ifs, each of which returns something. The last line of the function can be raise.
Obviously, epic and else are very important in general. I think I wouldn't use any for that particular code though.

On Tue, Nov 24, 2020 at 7:00 AM Ethan Furman ethan@stoneleaf.us wrote:
On 11/23/20 11:06 AM, Larry Hastings wrote:
On 11/23/20 8:15 AM, Brian Coleman wrote:
def process(root_node: Node): def process_node(node: Node): if isinstance(node, StringNode): return node.value elif isinstance(node, NumberNode): return float(node.value) elif isinstance(node, ListNode): return [process_node(child_node) for child_node in node.children] elif isinstance(node, DictNode): return {key: process_node(child_node) for key, child_node in node.children} else: raise Exception('Unexpected node')
You don't need the "else".
Without the "else" errors will pass silently.
And you can change all your "elif"s into "if"s too.
On average, doubling your comparisons and therefore becoming slower.
I think the implication is that, since all the successful if statements go into returns, you can leave off the "else" and just put the "raise" immediately. It's a special case that applies ONLY to situations where the pattern matching is the entire purpose of the function.
ChrisA

On 11/23/20 12:05 PM, Chris Angelico wrote:
On Tue, Nov 24, 2020 at 7:00 AM Ethan Furman ethan@stoneleaf.us wrote:
On 11/23/20 11:06 AM, Larry Hastings wrote:
On 11/23/20 8:15 AM, Brian Coleman wrote:
def process(root_node: Node): def process_node(node: Node): if isinstance(node, StringNode): return node.value elif isinstance(node, NumberNode): return float(node.value) elif isinstance(node, ListNode): return [process_node(child_node) for child_node in node.children] elif isinstance(node, DictNode): return {key: process_node(child_node) for key, child_node in node.children} else: raise Exception('Unexpected node')
You don't need the "else".
Without the "else" errors will pass silently.
And you can change all your "elif"s into "if"s too.
On average, doubling your comparisons and therefore becoming slower.
I think the implication is that, since all the successful if statements go into returns, you can leave off the "else" and just put the "raise" immediately. It's a special case that applies ONLY to situations where the pattern matching is the entire purpose of the function.
Ah, of course. Thank you for correction.
-- ~Ethan~

Larry Hastings wrote:
On 11/23/20 8:15 AM, Brian Coleman wrote:
def process(root_node: Node): def process_node(node: Node): if isinstance(node, StringNode): return node.value elif isinstance(node, NumberNode): return float(node.value) elif isinstance(node, ListNode): return [process_node(child_node) for child_node in node.children] elif isinstance(node, DictNode): return {key: process_node(child_node) for key, child_node in node.children} else: raise Exception('Unexpected node') You don't need the "else". And you can change all your "elif"s into
"if"s too. Now your "isinstance" version is 35 lines. Shorter than the pattern matching version, roughly the same speed, works in current Python, eminently readable to anybody conversant in current Python. A very reasonable solution to the problem. There should be one--and preferably only one--obvious way to do it, //arry/
It was not my intention to suggest that the solution implemented in the shortest number of lines was superior.
I think that one of the advantages of the pattern matching implementation over the sequence of `isinstance` checks is that there is no need to worry about whether using `if`, `elif`or `else` is the best approach. There is a single elegant and natural way to express the solution with pattern matching.

I have a little bit of skepticism about the pattern matching syntax, for similar reasons to those Larry expresses, and that Steve Dower mentioned on Discourse.
Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this:
m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val) elif m.case("[a, 5, 6, b]"): process_two_free_vars(*m.values) elif m.case("PairNone(a, b)"): a, b = m.values process_pair(a, b) elif m.case("DictNode"): foo = {key, process_node(child_node) for key, child_node in m.values.items()}
I don't disagree that the pattern mini-language looks nice as syntax. But there's nothing about that mini-language that couldn't be put in a library (with the caveat that patterns would need to be quoted in some way).

David Mertz wrote:
I have a little bit of skepticism about the pattern matching syntax, for similar reasons to those Larry expresses, and that Steve Dower mentioned on Discourse. Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this: m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val) elif m.case("[a, 5, 6, b]"): process_two_free_vars(*m.values) elif m.case("PairNone(a, b)"): a, b = m.values process_pair(a, b) elif m.case("DictNode"): foo = {key, process_node(child_node) for key, child_node in m.values.items()} I don't disagree that the pattern mini-language looks nice as syntax. But there's nothing about that mini-language that couldn't be put in a library (with the caveat that patterns would need to be quoted in some way).
What you are proposing here is very similar in effect to executing pattern matching statements using `eval`. What is the advantage of implementing the pattern matching functionality in a library if the user interface for that functionality has the same pitfalls as `eval`?

On Mon, Nov 23, 2020 at 9:02 PM Brian Coleman brianfcoleman@gmail.com wrote:
Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not
require
a language change. For example, I could create a library to allow this:
m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val.s) elif m.case("[a, 5, 6, b]"): process_two_free_vars(m.val.a, m.val.b)
What you are proposing here is very similar in effect to executing pattern matching statements using `eval`. What is the advantage of implementing the pattern matching functionality in a library if the user interface for that functionality has the same pitfalls as `eval`?
I don't understand the similarity with `eval` that you are suggesting.
My hypothetical library would store the value passed as initializer to `Matcher`, and provide a method `.case()`. That method would need to do some moderately complicated parsing of the pattern passed into it, but using parsing techniques, not any eval() call. Btw. I modified my above strawman just slightly to what might be a bit better API.
If there are any free variables in the pattern, they would be provided by the Matcher object. For example, they could be attributes of the property `m.val`. Or I suppose we could make them attributes of the Matcher object itself, e.g. `m.a` and `m.b`, but I think name conflicts with e.g. `m.case`. Anyway, it's a strawman not an API design.
If the pattern looked kinda like a class constructor (i.e. exactly the same as PEP 634 rules), the `.case()` method would do an `isinstance()` check before doing its other stuff. If the pattern looks like a list, it would scan the freevars inside it and match the constant values. And so on.
The actual return value from `.m.case(...)` would be True or False (or at least something truthy or falsy). However, it MIGHT create some new attributes (or keys, or something else) on the Matcher object if the pattern actually matched.
I agree the above is slightly less readable than PEP 635 syntax, but it seems to have the entire power of the proposal without changing Python syntax.

David Mertz wrote:
On Mon, Nov 23, 2020 at 9:02 PM Brian Coleman brianfcoleman@gmail.com wrote:
Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this: m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val.s) elif m.case("[a, 5, 6, b]"): process_two_free_vars(m.val.a, m.val.b) What you are proposing here is very similar in effect to executing pattern matching statements using eval. What is the advantage of implementing the pattern matching functionality in a library if the user interface for that functionality has the same pitfalls as eval? I don't understand the similarity with eval that you are
suggesting. My hypothetical library would store the value passed as initializer to Matcher, and provide a method .case(). That method would need to do some moderately complicated parsing of the pattern passed into it, but using parsing techniques, not any eval() call. Btw. I modified my above strawman just slightly to what might be a bit better API. If there are any free variables in the pattern, they would be provided by the Matcher object. For example, they could be attributes of the property m.val. Or I suppose we could make them attributes of the Matcher object itself, e.g. m.a and m.b, but I think name conflicts with e.g. m.case. Anyway, it's a strawman not an API design. If the pattern looked kinda like a class constructor (i.e. exactly the same as PEP 634 rules), the .case() method would do an isinstance() check before doing its other stuff. If the pattern looks like a list, it would scan the freevars inside it and match the constant values. And so on. The actual return value from .m.case(...) would be True or False (or at least something truthy or falsy). However, it MIGHT create some new attributes (or keys, or something else) on the Matcher object if the pattern actually matched. I agree the above is slightly less readable than PEP 635 syntax, but it seems to have the entire power of the proposal without changing Python syntax.
To be more precise, the similarity that I see to `eval` is that syntax errors in the patterns that are passed to the `case` method cannot be detected at compile time and instead will be detected at runtime. Building the syntax into the language gives the advantage of producing a syntax error at compile time. It also makes it easier for linters and type checkers to validate the pattern matching clauses if the syntax is built into the language.

David Mertz wrote:
On Mon, Nov 23, 2020 at 9:02 PM Brian Coleman brianfcoleman@gmail.com wrote:
Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this: m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val.s) elif m.case("[a, 5, 6, b]"): process_two_free_vars(m.val.a, m.val.b) What you are proposing here is very similar in effect to executing pattern matching statements using eval. What is the advantage of implementing the pattern matching functionality in a library if the user interface for that functionality has the same pitfalls as eval? I don't understand the similarity with eval that you are
suggesting. My hypothetical library would store the value passed as initializer to Matcher, and provide a method .case(). That method would need to do some moderately complicated parsing of the pattern passed into it, but using parsing techniques, not any eval() call. Btw. I modified my above strawman just slightly to what might be a bit better API. If there are any free variables in the pattern, they would be provided by the Matcher object. For example, they could be attributes of the property m.val. Or I suppose we could make them attributes of the Matcher object itself, e.g. m.a and m.b, but I think name conflicts with e.g. m.case. Anyway, it's a strawman not an API design. If the pattern looked kinda like a class constructor (i.e. exactly the same as PEP 634 rules), the .case() method would do an isinstance() check before doing its other stuff. If the pattern looks like a list, it would scan the freevars inside it and match the constant values. And so on. The actual return value from .m.case(...) would be True or False (or at least something truthy or falsy). However, it MIGHT create some new attributes (or keys, or something else) on the Matcher object if the pattern actually matched. I agree the above is slightly less readable than PEP 635 syntax, but it seems to have the entire power of the proposal without changing Python syntax.
Because syntax errors in the patterns passed to the `case` method are detected at runtime, rather than at compile time, it is necessary to ensure that you have code coverage of every call to a `case` method to be confident that there are no syntax errors in the patterns.
By comparison, if the pattern matching syntax is built into the language, the compiler will detect syntax errors in any and all patterns in `case` clauses. I think that this is a major advantage as compared to implementing pattern matching via a library.

On 11/23/2020 3:44 PM, David Mertz wrote:
I have a little bit of skepticism about the pattern matching syntax, for similar reasons to those Larry expresses, and that Steve Dower mentioned on Discourse.
Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this:
m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val) elif m.case("[a, 5, 6, b]"): process_two_free_vars(*m.values) elif m.case("PairNone(a, b)"): a, b = m.values process_pair(a, b) elif m.case("DictNode"): foo = {key, process_node(child_node) for key, child_node in m.values.items()}
I don't disagree that the pattern mini-language looks nice as syntax. But there's nothing about that mini-language that couldn't be put in a library (with the caveat that patterns would need to be quoted in some way).
I just commented on Steve's post over on Discourse. The problem with this is that the called function (m.case, here) needs to have access to the caller's namespace in order to resolve the expressions, such as StringNode and PairNone. This is one of the reasons f-strings weren't implemented as a function, and is also the source of many headaches with string type annotations.
My conclusion is that if you want something that operates on DSLs (especially ones that can't be evaluated as expressions), the compiler is going to need to know about it somehow so it can help you with it. I wish there were a general-purpose mechanism for this. Maybe it's PEP 638, although I haven't really investigated it much, and pattern matching might be a bad fit for it.
Eric

On Mon, Nov 23, 2020, 4:32 PM Eric V. Smith
I just commented on Steve's post over on Discourse. The problem with this is that the called function (m.case, here) needs to have access to the caller's namespace in order to resolve the expressions, such as StringNode and PairNone. This is one of the reasons f-strings weren't implemented as a function, and is also the source of many headaches with string type annotations.
Is this really true though? Yes, it would require magic of the sort zero-argument `super()` does. But can't the Matcher constructor capture the locals one step up the stack on instance creation? That's largely why my strawman version is slightly different from Steve's strawman.
I'd put the question this way: assuming Matcher can be written (with a bit of stack magic), and assuming that the strings inside m.case() calls are exactly the same mini-languague PEP 634 proposes, would you want a syntax change?
That's not rhetorical, I'm of mixed feeling myself.

On 11/23/20 1:49 PM, David Mertz wrote:
On Mon, Nov 23, 2020, 4:32 PM Eric V. Smith
I just commented on Steve's post over on Discourse. The problem with this is that the called function (m.case, here) needs to have access to the caller's namespace in order to resolve the expressions, such as StringNode and PairNone. This is one of the reasons f-strings weren't implemented as a function, and is also the source of many headaches with string type annotations.
Is this really true though? Yes, it would require magic of the sort zero-argument `super()` does. But can't the Matcher constructor capture the locals one step up the stack on instance creation? That's largely why my strawman version is slightly different from Steve's strawman.
I'd put the question this way: assuming Matcher can be written (with a bit of stack magic), and assuming that the strings inside m.case() calls are exactly the same mini-languague PEP 634 proposes, would you want a syntax change?
That's not rhetorical, I'm of mixed feeling myself.
I would, yes. Writing Python code in strings to be processed by another function/library is a pain:
- no syntax highlighting because the code is in a string - no syntax checking because the code is in a string - lots of quotes because the code is in a string
All in all, it feels very inelegant to me.
-- ~Ethan~

On 11/23/2020 4:49 PM, David Mertz wrote:
On Mon, Nov 23, 2020, 4:32 PM Eric V. Smith
I just commented on Steve's post over on Discourse. The problem with this is that the called function (m.case, here) needs to have access to the caller's namespace in order to resolve the expressions, such as StringNode and PairNone. This is one of the reasons f-strings weren't implemented as a function, and is also the source of many headaches with string type annotations.
Is this really true though? Yes, it would require magic of the sort zero-argument `super()` does. But can't the Matcher constructor capture the locals one step up the stack on instance creation? That's largely why my strawman version is slightly different from Steve's strawman.
Beyond the issue of stack access being frowned on, there are some practical problems. One that's given in PEP 498 is closures accessing variables that aren't otherwise referenced:
def outer(x):
... def inner(): ... return 'x={x}'.format_map(locals()) ... return inner ...
outer(42)()
Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 3, in inner KeyError: 'x'
Whereas an f-string in that scenario does work:
def outer(x):
... def inner(): ... return f'x={x}' ... return inner ...
outer(42)()
'x=42'
I'd put the question this way: assuming Matcher can be written (with a bit of stack magic), and assuming that the strings inside m.case() calls are exactly the same mini-languague PEP 634 proposes, would you want a syntax change?
No, I wouldn't!
Something that gets brought up every now and again is: what if there were a way to pass an AST to a function, such that it received the AST instead of the evaluated value for a parameter. Let's say that backticks (`) meant "compute the AST of the enclosed expression", and that was passed it to the function. (I always choose backticks for this example because we all know it isn't going to happen.)
Then you write your original example using backticks instead of quotes, and m.case would get an AST it could inspect. It would probably still need some help in evaluating the AST nodes in the caller's context, but at least it would be theoretically possible to do so with the compiler's assistance.
Another option would be the function itself saying "give me an AST instead of evaluating a particular parameter". But that's all but impossible, since the compiler couldn't look at the called function to know it wants to do that.
I think we're in python-ideas land now.
Eric
That's not rhetorical, I'm of mixed feeling myself.
Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZQBAYYR5... Code of Conduct: http://python.org/psf/codeofconduct/

I'd put the question this way: assuming Matcher can be written (with a bit of stack magic), and assuming that the strings inside m.case() calls are exactly the same mini-languague PEP 634 proposes, would you want a syntax change?
No, I wouldn't!
Is one of us mixing up negations? Are you saying you do NOT support PEP 634 (i.e. syntax change)?
A reasonable enough opinion that several core developers hold. Just trying to understand.

On 11/23/2020 5:44 PM, David Mertz wrote:
I'd put the question this way: assuming Matcher can be written (with a bit of stack magic), and assuming that the strings inside m.case() calls are exactly the same mini-languague PEP 634 proposes, would you want a syntax change?
No, I wouldn't!
Is one of us mixing up negations? Are you saying you do NOT support PEP 634 (i.e. syntax change)?
A reasonable enough opinion that several core developers hold. Just trying to understand.
Sorry I wasn't clear. I wouldn't want a syntax change specific to matching if it could be done with a library. I just don't think it can be done with a library without other language changes. But I think those other language changes could be used in ways outside of just matching.
Eric

Hi Eric,
On 23/11/2020 9:32 pm, Eric V. Smith wrote:
On 11/23/2020 3:44 PM, David Mertz wrote:
I have a little bit of skepticism about the pattern matching syntax, for similar reasons to those Larry expresses, and that Steve Dower mentioned on Discourse.
Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this:
m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val) elif m.case("[a, 5, 6, b]"): process_two_free_vars(*m.values) elif m.case("PairNone(a, b)"): a, b = m.values process_pair(a, b) elif m.case("DictNode"): foo = {key, process_node(child_node) for key, child_node in m.values.items()}
I don't disagree that the pattern mini-language looks nice as syntax. But there's nothing about that mini-language that couldn't be put in a library (with the caveat that patterns would need to be quoted in some way).
I just commented on Steve's post over on Discourse. The problem with this is that the called function (m.case, here) needs to have access to the caller's namespace in order to resolve the expressions, such as StringNode and PairNone. This is one of the reasons f-strings weren't implemented as a function, and is also the source of many headaches with string type annotations.
My conclusion is that if you want something that operates on DSLs (especially ones that can't be evaluated as expressions), the compiler is going to need to know about it somehow so it can help you with it. I wish there were a general-purpose mechanism for this. Maybe it's PEP 638, although I haven't really investigated it much, and pattern matching might be a bad fit for it.
Hygienic macros (PEP 638) solve two problems with a string based library (in my obviously biased opinion).
1. The pattern is parsed by the normal parser, so must have correct syntax, and the contents are visible to IDEs and editors.
if m.case("StringNode(s)"): the pattern is just a string.
case!(StringNode(s)): the pattern is validated Python syntax.
2. The transformation is done at compile time, so the generated code will execute in the correct context. Basically, the macro generates the correct series of if/elifs for you.
Cheers, Mark.
Eric
Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CA6FJTEC... Code of Conduct: http://python.org/psf/codeofconduct/

On Mon, Nov 23, 2020 at 08:44:21PM +0000, David Mertz wrote:
Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this:
m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val) elif m.case("[a, 5, 6, b]"): process_two_free_vars(*m.values) elif m.case("PairNone(a, b)"): a, b = m.values process_pair(a, b) elif m.case("DictNode"): foo = {key, process_node(child_node) for key, child_node in m.values.items()}
Look at all those strings. So much for syntax highlighting and compile-time syntax checking. Who needs them anyway?
# Perfectly legal syntax. if m.case("StringNode[s)"): ... elif m.case("[a, 5 6, b"): ...
It's a good thing we already have comprehensions, because if we didn't have them, people would argue that they aren't necessary, we can just write a `comprehension` function that takes the loop expression as a string:
# [(a + int(''.join(b)))*c) for a, b, c in items] comprehension("(a + int(''.join(b)))*c)", items]
Imagine how silly we would be to dedicate actual syntax to comprehensions, when we can write a library to do it. It's all this syntactic sugar (comprehensions, decorators, f-strings, async, with statements, etc) that is killing Python.
*wink*
I don't disagree that the pattern mini-language looks nice as syntax. But there's nothing about that mini-language that couldn't be put in a library (with the caveat that patterns would need to be quoted in some way).
One thing that we get with pattern matching syntax is the absense of certain *misfeatures*.
# Oops, I forgot to use `elif`, now I have fall-through semantics # which is widely agreed to be a bad idea. m = Matcher([4, 5, 6, 7]) if m.case("[a, 5, 6, b]"): print("first case") if m.case("[4, a, b, 7]"): print("second case")
As the library author, I hope you are prepared for many, many bug reports about why people's code behaves differently if they use `if` compared to `elif`.
Another misfeature: the ability to scatter your pattern matching cases all over the place.
m = Matcher(expression) do_this() do_that() if m.case(something): process("this case") do_another_thing()
class Spam: # snip five pages of code
if m.case(something_else): print("Did you forget we were inside a match pseudo-block?")
Objection: "But coders won't do that!"
No, *sensible* coders won't do that. With pattern matching syntax, even the other sort of coder can't do that.
Objection: "But they could put five pages of code inside a case block too."
True, but only if it is *conditional* to that specific case. You can't mix unconditional code and cases. And the extra indentation will hint that something odd is going on.
Another misfeature: the ability to modify the value being matched in the middle of the pattern matching pseudo-block.
m = Matcher(something) if m.case(spam): process("case 1") m = Matcher(another_thing) if m.case(eggs): process("case 2")
People can write obfuscated, confusing, poor-quality code with anything, but syntax can limit their opportunities to do so.

On 24/11/20 9:44 am, David Mertz wrote:
m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val) elif m.case("[a, 5, 6, b]"): process_two_free_vars(*m.values) elif m.case("PairNone(a, b)"): a, b = m.values process_pair(a, b) elif m.case("DictNode"): foo = {key, process_node(child_node) for key, child_node in m.values.items()}
How does the Matcher object know the meanings of "StringNode", "PairNode", etc.? Presumably they're bound to classes in the namespace of the caller, but the Matcher doesn't know anything about that namespace.

On Mon, 23 Nov 2020 16:15:12 -0000 "Brian Coleman" brianfcoleman@gmail.com wrote:
Furthermore, Python has a regular expression module which implements it's own DSL for the purpose of matching string patterns. Regular expressions, in a similar way to pattern matching, allow string patterns to be expressed in a concise and declarative manner.
Uh, without regular expressions, a lot of functions would be massively more complicated and annoying to write.
However, your example shows that pattern matching barely makes common code shorter (admittedly, on this _one_ example, but that's the one you chose ;-)).
While I agree that regular expressions are far less Pythonic than the proposed variant of pattern matching, they also have a tremendously better cost/benefit ratio, IMHO.
Regards
Antoine.

Antoine Pitrou wrote:
On Mon, 23 Nov 2020 16:15:12 -0000 "Brian Coleman" brianfcoleman@gmail.com wrote:
Furthermore, Python has a regular expression module which implements it's own DSL for the purpose of matching string patterns. Regular expressions, in a similar way to pattern matching, allow string patterns to be expressed in a concise and declarative manner. Uh, without regular expressions, a lot of functions would be massively
more complicated and annoying to write. However, your example shows that pattern matching barely makes common code shorter (admittedly, on this _one_ example, but that's the one you chose ;-)). While I agree that regular expressions are far less Pythonic than the proposed variant of pattern matching, they also have a tremendously better cost/benefit ratio, IMHO. Regards Antoine.
In my opinion, the object oriented solution to this problem is to use the visitor pattern. The solution using the visitor pattern is almost twice the length of the other solutions. Pattern matching is certainly no worse than the solution using a chain of `isinstance` checks in this case.
However as the patterns to be checked against a candidate object become more complex, I believe that the pattern matching solution will retain the same level of elegance and obviousness that it possesses in this example, whereas the solution involving a chain of comparisons will quickly degrade in terms of obviousness.

On 24/11/20 9:31 am, Brian Coleman wrote:
In my opinion, the object oriented solution to this problem is to use the visitor pattern.
Which is a good thing only if you believe that OO solutions are always better than non-OO ones.
IMO, the visitor pattern is a workaround for when your language doesn't permit introspection on types. It's unnecessary in Python and better avoided.

On Mon, Nov 23, 2020 at 8:20 AM Brian Coleman brianfcoleman@gmail.com wrote:
Take as an example a function designed to process a tree of nodes similar to that which might be output by a JSON parser. There are 4 types of node:
- A node representing JSON strings
- A node representing JSON numbers
- A node representing JSON arrays
- A node representing JSON dictionaries
The function transforms a tree of nodes, beginning at the root node, and proceeding recursively through each child node in turn. The result is a Python object, with the following transformation applied to each node type:
- A JSON string `->` Python `str`
- A JSON number `->` Python `float`
- A JSON array `->` Python `list`
- A JSON dictionary `->` Python `dict`
I have implemented this function using 3 different approaches:
- The visitor pattern
- `isinstance` checks against the node type
- Pattern matching
I've always thought that the alternative to a "switch case" construct in Python (and I suppose most OO languages) is subclassing and method overriding. I guess that's what the "visitor pattern" is here, but it seems to be adding a bunch of unnecessary bolierplate.
So: given that you have a special "Node" object anyway, the thing to do is to have those Node object know how to unpack themselves. Then the top "traverse the tree" function becomes a single method or attribute access:
tree = node_tree.value
here's what the Nodes look like in this example:
class Node: def __init__(self, val): self._value = val
@property def value(self): return self._value
class StringNode(Node): pass
class NumberNode(Node): pass
class ListNode(Node): @property def value(self): return [item.value for item in self._value]
class DictNode(Node): @property def value(self): return {k: item.value for k, item in self._value.items()}
Of course, this requires that you have control of the Node objects, rather than getting them from some other library -- but that seems to be what all the examples here are anyway.
If you do need to parse out a tree of object that are not "special" already, then you need to do some type of pattern matching / isinstance checking. In this case, I wrote a function that builds up a tree of Nodes from arbitrary Python objects:
def make_nodes_from_obj(obj): if isinstance(obj, str): return StringNode(obj) if isinstance(obj, Real): return NumberNode(obj) if isinstance(obj, Sequence): return ListNode([make_nodes_from_obj(item) for item in obj]) if isinstance(obj, Mapping): return DictNode({k: make_nodes_from_obj(item) for k, item in obj.items()})
And that could benefit from pattern matching, I suppose, though it's not very compelling to me.
And in "real world" code, I've done just this -- building a system for saving / restoring dataclasses to/from JSON. In that case, each of the dataclasses knows how to save itself and build itself from JSON-compatible python objects (numbers, dicts, strings, lists) -- so again, no need for pattern matching there either. And what I really like about the approach of putting all the logic in the "nodes" is that I can make new types of nodes without having to touch the code at the "top" that visits those nodes.
In short -- I'm still looking for a more compelling example :-)
-CHB

On Mon, 18 Jan 2021 12:41:45 -0800 Chris Barker via Python-Dev python-dev@python.org wrote:
And in "real world" code, I've done just this -- building a system for saving / restoring dataclasses to/from JSON. In that case, each of the dataclasses knows how to save itself and build itself from JSON-compatible python objects (numbers, dicts, strings, lists) -- so again, no need for pattern matching there either. And what I really like about the approach of putting all the logic in the "nodes" is that I can make new types of nodes without having to touch the code at the "top" that visits those nodes.
Note that another approach is the little-used `functools.singledispatch`.
Regards
Antoine.
participants (12)
-
Antoine Pitrou
-
Antoine Pitrou
-
Brian Coleman
-
Chris Angelico
-
Chris Barker
-
David Mertz
-
Eric V. Smith
-
Ethan Furman
-
Greg Ewing
-
Larry Hastings
-
Mark Shannon
-
Steven D'Aprano