Advantages of pattern matching - a simple comparative analysis
Take as an example a function designed to process a tree of nodes similar to that which might be output by a JSON parser. There are 4 types of node:

- A node representing JSON strings
- A node representing JSON numbers
- A node representing JSON arrays
- A node representing JSON dictionaries

The function transforms a tree of nodes, beginning at the root node and proceeding recursively through each child node in turn. The result is a Python object, with the following transformation applied to each node type:

- A JSON string `->` Python `str`
- A JSON number `->` Python `float`
- A JSON array `->` Python `list`
- A JSON dictionary `->` Python `dict`

I have implemented this function using 3 different approaches:

- The visitor pattern
- `isinstance` checks against the node type
- Pattern matching

Here is the implementation using the visitor pattern:

```
from typing import List, Tuple


class NodeVisitor:
    def visit_string_node(self, node: "StringNode"):
        pass

    def visit_number_node(self, node: "NumberNode"):
        pass

    def visit_list_node(self, node: "ListNode"):
        pass

    def visit_dict_node(self, node: "DictNode"):
        pass


class Node:
    def visit(self, visitor: NodeVisitor):
        raise NotImplementedError()


class StringNode(Node):
    value: str

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_string_node(self)


class NumberNode(Node):
    value: str

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_number_node(self)


class ListNode(Node):
    children: List[Node]

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_list_node(self)


class DictNode(Node):
    children: List[Tuple[str, Node]]

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_dict_node(self)


class Processor(NodeVisitor):
    def process(self, root_node: Node):
        return root_node.visit(self)

    def visit_string_node(self, node: StringNode):
        return node.value

    def visit_number_node(self, node: NumberNode):
        return float(node.value)

    def visit_list_node(self, node: ListNode):
        return [child_node.visit(self) for child_node in node.children]

    def visit_dict_node(self, node: DictNode):
        return {key: child_node.visit(self) for key, child_node in node.children}


def process(root_node: Node):
    processor = Processor()
    return processor.process(root_node)
```

Here is the implementation using `isinstance` checks against the node type:

```
from typing import List, Tuple


class Node:
    pass


class StringNode(Node):
    value: str


class NumberNode(Node):
    value: str


class ListNode(Node):
    children: List[Node]


class DictNode(Node):
    children: List[Tuple[str, Node]]


def process(root_node: Node):
    def process_node(node: Node):
        if isinstance(node, StringNode):
            return node.value
        elif isinstance(node, NumberNode):
            return float(node.value)
        elif isinstance(node, ListNode):
            return [process_node(child_node) for child_node in node.children]
        elif isinstance(node, DictNode):
            return {key: process_node(child_node) for key, child_node in node.children}
        else:
            raise Exception('Unexpected node')

    return process_node(root_node)
```

Finally here is the implementation using pattern matching:

```
from typing import List, Tuple


class Node:
    pass


class StringNode(Node):
    value: str


class NumberNode(Node):
    value: str


class ListNode(Node):
    children: List[Node]


class DictNode(Node):
    children: List[Tuple[str, Node]]


def process(root_node: Node):
    def process_node(node: Node):
        match node:
            case StringNode(value=str_value):
                return str_value
            case NumberNode(value=number_value):
                return float(number_value)
            case ListNode(children=child_nodes):
                return [process_node(child_node) for child_node in child_nodes]
            case DictNode(children=child_nodes):
                return {key: process_node(child_node) for key, child_node in child_nodes}
            case _:
                raise Exception('Unexpected node')

    return process_node(root_node)
```

Here are the lengths of the different implementations:

- Pattern matching `->` 37 lines
- `isinstance` checks `->` 36 lines
- The visitor pattern `->` 69 lines

The visitor pattern implementation is by far the most verbose solution, weighing in at almost twice the length of the alternative implementations due to the large amount of boilerplate that is necessary to achieve double dispatch. The pattern matching and `isinstance` check implementations are very similar in length for this trivial example.

In each implementation, there are 2 operations performed on each node:

- Determine the type of the node
- Destructure the node to extract the desired data

The visitor pattern and `isinstance` check implementations separate these 2 operations, whereas the pattern matching approach combines them. I believe that it is the declarative nature of pattern matching, where determining the type of the node and destructuring the node are combined into a single clause, which allows pattern matching to express a concise solution to the problem.

In this trivial example, the advantage of pattern matching over a sequence of `if`-`elif`-`else` statements is not as obvious as it would be in a more complex example, where a sub-tree of nodes might be matched based on their type and be destructured in a single clause.

I have seen elsewhere an argument that pattern matching should not be accepted into Python as it introduces a pseudo-DSL that is separate from the rest of the language. I agree that pattern matching might be viewed as a pseudo-DSL, but I believe that it is a good thing if it allows the solution to certain classes of problems to be expressed in a concise manner. People often raise similar objections to operator overloading in other languages, whereas the presence of operator overloading in Python allows mathematical expressions involving custom numeric types such as vectors to be expressed in a natural way. Furthermore, Python has a regular expression module which implements its own DSL for the purpose of matching string patterns. Regular expressions, in a similar way to pattern matching, allow string patterns to be expressed in a concise and declarative manner.

I really hope that the Steering Council accepts pattern matching into Python. I think that it allows for processing of heterogeneous graphs of objects using recursion in a concise, declarative manner. I would like to thank the authors of the Structural Pattern Matching PEP for their hard work in designing this feature and developing an implementation of it. I believe that it will be a wonderful addition to the language that I am very much looking forward to using.
On 11/23/20 8:15 AM, Brian Coleman wrote:
You don't need the "else". And you can change all your "elif"s into "if"s too. Now your "isinstance" version is 35 lines. Shorter than the pattern matching version, roughly the same speed, works in current Python, eminently readable to anybody conversant in current Python. A very reasonable solution to the problem. There should be one--and preferably only one--obvious way to do it, //arry/
On 11/23/20 11:06 AM, Larry Hastings wrote:
> You don't need the "else".

Without the "else" errors will pass silently.

> And you can change all your "elif"s into "if"s too.

On average, doubling your comparisons and therefore becoming slower.

> Now your "isinstance" version is 35 lines. Shorter than the pattern matching version

But without error checking.

> roughly the same speed,

See above comment about becoming slower.

> works in current Python, eminently readable to anybody conversant in current Python. A very reasonable solution to the problem.

Agreed. But (subjectively, of course) not as elegant as the match version, and deteriorates rapidly as complexity increases.

> There should be one--and preferably only one--obvious way to do it,

But that is not a hard and fast rule or we wouldn't have decorators (zero lines saved), the "with" statement (two lines saved), and four different ways spanning three different protocols to write an iterator [1].

-- ~Ethan~

[1] https://stackoverflow.com/questions/19151/build-a-basic-python-iterator/7542...
On Mon, Nov 23, 2020, 2:58 PM Ethan Furman
Not in the particular example. It can just be a bunch of ifs, each of which returns something. The last line of the function can be a raise. Obviously, elif and else are very important in general. I think I wouldn't use either for that particular code though.
On Tue, Nov 24, 2020 at 7:00 AM Ethan Furman <ethan@stoneleaf.us> wrote:
I think the implication is that, since all the successful if statements go into returns, you can leave off the "else" and just put the "raise" immediately. It's a special case that applies ONLY to situations where the pattern matching is the entire purpose of the function. ChrisA
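A minimal sketch of that special case, assuming simple node classes with a constructor (they are stand-ins, not the classes from the original post) and using `TypeError` rather than the original bare `Exception`:

```python
class Node:
    def __init__(self, value):
        self.value = value


class StringNode(Node):
    pass


class NumberNode(Node):
    pass


class ListNode(Node):
    pass  # here value holds a list of child nodes


def process_node(node):
    # Every successful test returns, so plain `if` behaves like `elif` here.
    if isinstance(node, StringNode):
        return node.value
    if isinstance(node, NumberNode):
        return float(node.value)
    if isinstance(node, ListNode):
        return [process_node(child) for child in node.value]
    # Reached only when no branch returned, so errors do not pass silently.
    raise TypeError(f"Unexpected node: {node!r}")
```

The trailing `raise` plays the role of the `else` branch, which only works because nothing else happens after the chain of tests.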
Larry Hastings wrote:
It was not my intention to suggest that the solution implemented in the fewest lines was superior. I think that one of the advantages of the pattern matching implementation over the sequence of `isinstance` checks is that there is no need to worry about whether using `if`, `elif` or `else` is the best approach. There is a single elegant and natural way to express the solution with pattern matching.
I have a little bit of skepticism about the pattern matching syntax, for similar reasons to those Larry expresses, and that Steve Dower mentioned on Discourse.

Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this:

    m = Matcher(arbitrary_expression)
    if m.case("StringNode(s)"):
        process_string(m.val)
    elif m.case("[a, 5, 6, b]"):
        process_two_free_vars(*m.values)
    elif m.case("PairNone(a, b)"):
        a, b = m.values
        process_pair(a, b)
    elif m.case("DictNode"):
        foo = {key: process_node(child_node) for key, child_node in m.values.items()}

I don't disagree that the pattern mini-language looks nice as syntax. But there's nothing about that mini-language that couldn't be put in a library (with the caveat that patterns would need to be quoted in some way).

--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
David Mertz wrote:
What you are proposing here is very similar in effect to executing pattern matching statements using `eval`. What is the advantage of implementing the pattern matching functionality in a library if the user interface for that functionality has the same pitfalls as `eval`?
On Mon, Nov 23, 2020 at 9:02 PM Brian Coleman <brianfcoleman@gmail.com> wrote:
I don't understand the similarity with `eval` that you are suggesting.

My hypothetical library would store the value passed as initializer to `Matcher`, and provide a method `.case()`. That method would need to do some moderately complicated parsing of the pattern passed into it, but using parsing techniques, not any eval() call. Btw. I modified my above strawman just slightly to what might be a bit better API.

If there are any free variables in the pattern, they would be provided by the Matcher object. For example, they could be attributes of the property `m.val`. Or I suppose we could make them attributes of the Matcher object itself, e.g. `m.a` and `m.b`, but I think that risks name conflicts with e.g. `m.case`. Anyway, it's a strawman not an API design.

If the pattern looked kinda like a class constructor (i.e. exactly the same as PEP 634 rules), the `.case()` method would do an `isinstance()` check before doing its other stuff. If the pattern looks like a list, it would scan the freevars inside it and match the constant values. And so on.

The actual return value from `m.case(...)` would be True or False (or at least something truthy or falsy). However, it MIGHT create some new attributes (or keys, or something else) on the Matcher object if the pattern actually matched.

I agree the above is slightly less readable than PEP 635 syntax, but it seems to have the entire power of the proposal without changing Python syntax.
David Mertz wrote:
To be more precise, the similarity that I see to `eval` is that syntax errors in the patterns that are passed to the `case` method cannot be detected at compile time and instead will be detected at runtime. Building the syntax into the language gives the advantage of producing a syntax error at compile time. It also makes it easier for linters and type checkers to validate the pattern matching clauses if the syntax is built into the language.
David Mertz wrote:
Because syntax errors in the patterns passed to the `case` method are detected at runtime, rather than at compile time, it is necessary to ensure that you have code coverage of every call to a `case` method to be confident that there are no syntax errors in the patterns. By comparison, if the pattern matching syntax is built into the language, the compiler will detect syntax errors in any and all patterns in `case` clauses. I think that this is a major advantage as compared to implementing pattern matching via a library.
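The difference can be sketched with the built-in `compile()`, which parses source text without running it. The two source strings below are illustrative, and `m.case` refers to the hypothetical library method discussed in this thread, which is never actually called here.

```python
# A malformed pattern written with the proposed syntax.
MATCH_SOURCE = '''
def describe(node):
    match node:
        case StringNode[s):
            return s
'''

# The same malformed pattern hidden inside a string literal.
STRING_SOURCE = '''
def describe(m):
    if m.case("StringNode[s)"):
        return m.val
'''


def compiles(source: str) -> bool:
    """Return True if the source compiles; the code is never executed."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles(MATCH_SOURCE))   # the compiler rejects the bad pattern
print(compiles(STRING_SOURCE))  # the bad pattern is just a valid string
```

In the string-based version the mistake would surface only when that particular `m.case(...)` call is executed, which is why full coverage of every call would be needed.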
On 11/23/2020 3:44 PM, David Mertz wrote:
I just commented on Steve's post over on Discourse. The problem with this is that the called function (m.case, here) needs to have access to the caller's namespace in order to resolve the expressions, such as StringNode and PairNone. This is one of the reasons f-strings weren't implemented as a function, and is also the source of many headaches with string type annotations. My conclusion is that if you want something that operates on DSLs (especially ones that can't be evaluated as expressions), the compiler is going to need to know about it somehow so it can help you with it. I wish there were a general-purpose mechanism for this. Maybe it's PEP 638, although I haven't really investigated it much, and pattern matching might be a bad fit for it. Eric
On Mon, Nov 23, 2020, 4:32 PM Eric V. Smith
Is this really true though? Yes, it would require magic of the sort zero-argument `super()` does. But can't the Matcher constructor capture the locals one step up the stack on instance creation? That's largely why my strawman version is slightly different from Steve's strawman.

I'd put the question this way: assuming Matcher can be written (with a bit of stack magic), and assuming that the strings inside m.case() calls are exactly the same mini-language PEP 634 proposes, would you want a syntax change? That's not rhetorical, I'm of mixed feelings myself.
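To make the strawman concrete, here is a toy sketch of such a `Matcher`. Everything about it is an assumption for illustration: it handles only `Name(a, b)` and `[a, 5, b]` patterns, it reads positional attribute names from a `__match_args__` tuple on the class, it exposes captures as `m.values`, and it relies on `sys._getframe`, which is a CPython implementation detail. It is not the API of any real library.

```python
import ast
import sys


class Matcher:
    """Toy string-pattern matcher (illustrative only)."""

    def __init__(self, subject):
        self.subject = subject
        self.values = ()
        # "Stack magic": snapshot the caller's namespaces so that class
        # names used in patterns can be resolved later.
        frame = sys._getframe(1)
        self._namespace = dict(frame.f_globals)
        self._namespace.update(frame.f_locals)

    def case(self, pattern: str) -> bool:
        # Parsed with ast rather than eval; a malformed pattern raises
        # SyntaxError here, at run time.
        tree = ast.parse(pattern, mode="eval").body
        if isinstance(tree, ast.Call) and isinstance(tree.func, ast.Name):
            return self._match_class(tree)
        if isinstance(tree, ast.List):
            return self._match_list(tree)
        raise ValueError(f"unsupported pattern: {pattern!r}")

    def _match_class(self, tree) -> bool:
        cls = self._namespace[tree.func.id]
        if not isinstance(self.subject, cls):
            return False
        names = getattr(cls, "__match_args__", ())
        if len(tree.args) > len(names):
            return False
        wanted = names[:len(tree.args)]
        self.values = tuple(getattr(self.subject, name) for name in wanted)
        return True

    def _match_list(self, tree) -> bool:
        subject = self.subject
        if not isinstance(subject, list) or len(subject) != len(tree.elts):
            return False
        captured = []
        for element, item in zip(tree.elts, subject):
            if isinstance(element, ast.Constant):
                if element.value != item:
                    return False
            elif isinstance(element, ast.Name):
                captured.append(item)  # free variable: capture the item
            else:
                raise ValueError("unsupported sub-pattern")
        self.values = tuple(captured)
        return True


class StringNode:
    __match_args__ = ("value",)

    def __init__(self, value):
        self.value = value


m = Matcher([4, 5, 6, 7])
if m.case("[a, 5, 6, b]"):
    first, last = m.values
```

Note that the namespace is captured once, at construction time, so names defined after `Matcher(...)` is created would not be visible to later `case()` calls.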
On 11/23/20 1:49 PM, David Mertz wrote:
On Mon, Nov 23, 2020, 4:32 PM Eric V. Smith
I would, yes. Writing Python code in strings to be processed by another function/library is a pain:

- no syntax highlighting because the code is in a string
- no syntax checking because the code is in a string
- lots of quotes because the code is in a string

All in all, it feels very inelegant to me.

-- ~Ethan~
On 11/23/2020 4:49 PM, David Mertz wrote:
Beyond the issue of stack access being frowned on, there are some practical problems. One that's given in PEP 498 is closures accessing variables that aren't otherwise referenced:
Whereas an f-string in that scenario does work:
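Both cases can be sketched side by side, following the closure example given in PEP 498. The function names are illustrative, and this is a reconstruction of the idea rather than the snippets from the original message.

```python
def outer_format(x):
    def inner():
        # x is never referenced by name inside inner(), so the compiler
        # does not make it a closure variable and locals() is empty here.
        return "x={x}".format_map(locals())
    return inner


def outer_fstring(x):
    def inner():
        # The compiler sees x inside the f-string and creates the closure.
        return f"x={x}"
    return inner


try:
    outer_format(42)()
except KeyError as exc:
    print("format_map failed, missing key:", exc)

print(outer_fstring(42)())
```

A library function that inspects its caller's frame would be in the same position as `format_map(locals())`: it can only see the names the compiler decided to keep.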
No, I wouldn't!

Something that gets brought up every now and again is: what if there were a way to pass an AST to a function, such that it received the AST instead of the evaluated value for a parameter. Let's say that backticks (`) meant "compute the AST of the enclosed expression", and that the AST was passed to the function. (I always choose backticks for this example because we all know it isn't going to happen.)

Then you write your original example using backticks instead of quotes, and m.case would get an AST it could inspect. It would probably still need some help in evaluating the AST nodes in the caller's context, but at least it would be theoretically possible to do so with the compiler's assistance.

Another option would be the function itself saying "give me an AST instead of evaluating a particular parameter". But that's all but impossible, since the compiler couldn't look at the called function to know it wants to do that.

I think we're in python-ideas land now.

Eric
On 11/23/2020 5:44 PM, David Mertz wrote:
Sorry I wasn't clear. I wouldn't want a syntax change specific to matching if it could be done with a library. I just don't think it can be done with a library without other language changes. But I think those other language changes could be used in ways outside of just matching. Eric
Hi Eric, On 23/11/2020 9:32 pm, Eric V. Smith wrote:
Hygienic macros (PEP 638) solve two problems with a string based library (in my obviously biased opinion).

1. The pattern is parsed by the normal parser, so must have correct syntax, and the contents are visible to IDEs and editors.

       if m.case("StringNode(s)"):

   the pattern is just a string.

       case!(StringNode(s)):

   the pattern is validated Python syntax.

2. The transformation is done at compile time, so the generated code will execute in the correct context. Basically, the macro generates the correct series of if/elifs for you.

Cheers,
Mark.
On Mon, Nov 23, 2020 at 08:44:21PM +0000, David Mertz wrote:
Look at all those strings. So much for syntax highlighting and compile-time syntax checking. Who needs them anyway?

    # Perfectly legal syntax.
    if m.case("StringNode[s)"):
        ...
    elif m.case("[a, 5 6, b"):
        ...

It's a good thing we already have comprehensions, because if we didn't have them, people would argue that they aren't necessary, we can just write a `comprehension` function that takes the loop expression as a string:

    # [(a + int(''.join(b)))*c for a, b, c in items]
    comprehension("(a + int(''.join(b)))*c", items)

Imagine how silly we would be to dedicate actual syntax to comprehensions, when we can write a library to do it. It's all this syntactic sugar (comprehensions, decorators, f-strings, async, with statements, etc) that is killing Python. *wink*

One thing that we get with pattern matching syntax is the absence of certain *misfeatures*.

    # Oops, I forgot to use `elif`, now I have fall-through semantics
    # which is widely agreed to be a bad idea.
    m = Matcher([4, 5, 6, 7])
    if m.case("[a, 5, 6, b]"):
        print("first case")
    if m.case("[4, a, b, 7]"):
        print("second case")

As the library author, I hope you are prepared for many, many bug reports about why people's code behaves differently if they use `if` compared to `elif`.

Another misfeature: the ability to scatter your pattern matching cases all over the place.

    m = Matcher(expression)
    do_this()
    do_that()
    if m.case(something):
        process("this case")
    do_another_thing()
    class Spam:
        # snip five pages of code
    if m.case(something_else):
        print("Did you forget we were inside a match pseudo-block?")

Objection: "But coders won't do that!"

No, *sensible* coders won't do that. With pattern matching syntax, even the other sort of coder can't do that.

Objection: "But they could put five pages of code inside a case block too."

True, but only if it is *conditional* to that specific case. You can't mix unconditional code and cases. And the extra indentation will hint that something odd is going on.

Another misfeature: the ability to modify the value being matched in the middle of the pattern matching pseudo-block.

    m = Matcher(something)
    if m.case(spam):
        process("case 1")
    m = Matcher(another_thing)
    if m.case(eggs):
        process("case 2")

People can write obfuscated, confusing, poor-quality code with anything, but syntax can limit their opportunities to do so.

--
Steve
On Mon, 23 Nov 2020 16:15:12 -0000 "Brian Coleman" <brianfcoleman@gmail.com> wrote:
> Furthermore, Python has a regular expression module which implements its own DSL for the purpose of matching string patterns. Regular expressions, in a similar way to pattern matching, allow string patterns to be expressed in a concise and declarative manner.
Uh, without regular expressions, a lot of functions would be massively more complicated and annoying to write. However, your example shows that pattern matching barely makes common code shorter (admittedly, on this _one_ example, but that's the one you chose ;-)). While I agree that regular expressions are far less Pythonic than the proposed variant of pattern matching, they also have a tremendously better cost/benefit ratio, IMHO. Regards Antoine.
Antoine Pitrou wrote:
In my opinion, the object oriented solution to this problem is to use the visitor pattern. The solution using the visitor pattern is almost twice the length of the other solutions. Pattern matching is certainly no worse than the solution using a chain of `isinstance` checks in this case. However as the patterns to be checked against a candidate object become more complex, I believe that the pattern matching solution will retain the same level of elegance and obviousness that it possesses in this example, whereas the solution involving a chain of comparisons will quickly degrade in terms of obviousness.
On 24/11/20 9:31 am, Brian Coleman wrote:
> In my opinion, the object oriented solution to this problem is to use the visitor pattern.
Which is a good thing only if you believe that OO solutions are always better than non-OO ones. IMO, the visitor pattern is a workaround for when your language doesn't permit introspection on types. It's unnecessary in Python and better avoided. -- Greg
On Mon, Nov 23, 2020 at 8:20 AM Brian Coleman <brianfcoleman@gmail.com> wrote:
I've always thought that the alternative to a "switch case" construct in Python (and I suppose most OO languages) is subclassing and method overriding.

I guess that's what the "visitor pattern" is here, but it seems to be adding a bunch of unnecessary boilerplate.

So: given that you have a special "Node" object anyway, the thing to do is to have those Node objects know how to unpack themselves. Then the top "traverse the tree" function becomes a single method or attribute access:

    tree = node_tree.value

Here's what the Nodes look like in this example:

    class Node:
        def __init__(self, val):
            self._value = val

        @property
        def value(self):
            return self._value

    class StringNode(Node):
        pass

    class NumberNode(Node):
        pass

    class ListNode(Node):
        @property
        def value(self):
            return [item.value for item in self._value]

    class DictNode(Node):
        @property
        def value(self):
            return {k: item.value for k, item in self._value.items()}

Of course, this requires that you have control of the Node objects, rather than getting them from some other library -- but that seems to be what all the examples here are anyway.

If you do need to parse out a tree of objects that are not "special" already, then you need to do some type of pattern matching / isinstance checking. In this case, I wrote a function that builds up a tree of Nodes from arbitrary Python objects:

    from collections.abc import Mapping, Sequence
    from numbers import Real

    def make_nodes_from_obj(obj):
        # str must be tested first, because a str is also a Sequence
        if isinstance(obj, str):
            return StringNode(obj)
        if isinstance(obj, Real):
            return NumberNode(obj)
        if isinstance(obj, Sequence):
            return ListNode([make_nodes_from_obj(item) for item in obj])
        if isinstance(obj, Mapping):
            return DictNode({k: make_nodes_from_obj(item) for k, item in obj.items()})

And that could benefit from pattern matching, I suppose, though it's not very compelling to me.

And in "real world" code, I've done just this -- building a system for saving / restoring dataclasses to/from JSON.
In that case, each of the dataclasses knows how to save itself and build itself from JSON-compatible python objects (numbers, dicts, strings, lists) -- so again, no need for pattern matching there either.

And what I really like about the approach of putting all the logic in the "nodes" is that I can make new types of nodes without having to touch the code at the "top" that visits those nodes.

In short -- I'm still looking for a more compelling example :-)

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 11/23/20 8:15 AM, Brian Coleman wrote:
You don't need the "else". And you can change all your "elif"s into "if"s too. Now your "isinstance" version is 35 lines. Shorter than the pattern matching version, roughly the same speed, works in current Python, eminently readable to anybody conversant in current Python. A very reasonable solution to the problem. There should be one--and preferably only one--obvious way to do it, //arry/
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 11/23/20 11:06 AM, Larry Hastings wrote:
Without the "else" errors will pass silently.
And you can change all your "elif"s into "if"s too.
On average, doubling your comparisons and therefore becoming slower.
Now your "isinstance" version is 35 lines. Shorter than the pattern matching version
But without error checking.
roughly the same speed,
works in current Python, eminently readable to anybody conversant in current Python. A very reasonable solution to
See above comment about becoming slower. the problem. Agreed. But (subjectively, of course) not as elegant as the match version, and deteriorates rapidly as complexity increases.
There should be one--and preferably only one--obvious way to do it,
But that is not a hard and fast rule or we wouldn't have decorators (zero lines saved), the "with" statement (two lines saved), and four different ways spanning three different protocols to write an iterator [1]. -- ~Ethan~ [1] https://stackoverflow.com/questions/19151/build-a-basic-python-iterator/7542...
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
On Mon, Nov 23, 2020, 2:58 PM Ethan Furman
Not in the particular example. It can just be a bunch of ifs, each of which returns something. The last line of the function can be raise. Obviously, epic and else are very important in general. I think I wouldn't use any for that particular code though.
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Tue, Nov 24, 2020 at 7:00 AM Ethan Furman <ethan@stoneleaf.us> wrote:
I think the implication is that, since all the successful if statements go into returns, you can leave off the "else" and just put the "raise" immediately. It's a special case that applies ONLY to situations where the pattern matching is the entire purpose of the function. ChrisA
data:image/s3,"s3://crabby-images/db3ee/db3eedc36fda93906eff1dd5f408d31cf0228f1c" alt=""
Larry Hastings wrote:
It was not my intention to suggest that the solution implemented in the shortest number of lines was superior. I think that one of the advantages of the pattern matching implementation over the sequence of `isinstance` checks is that there is no need to worry about whether using `if`, `elif`or `else` is the best approach. There is a single elegant and natural way to express the solution with pattern matching.
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
I have a little bit of skepticism about the pattern matching syntax, for similar reasons to those Larry expresses, and that Steve Dower mentioned on Discourse. Basically, I agree matching/destructuring is a powerful idea. But I also wonder how much genuinely better it is than a library that does not require a language change. For example, I could create a library to allow this: m = Matcher(arbitrary_expression) if m.case("StringNode(s)"): process_string(m.val) elif m.case("[a, 5, 6, b]"): process_two_free_vars(*m.values) elif m.case("PairNone(a, b)"): a, b = m.values process_pair(a, b) elif m.case("DictNode"): foo = {key, process_node(child_node) for key, child_node in m.values.items()} I don't disagree that the pattern mini-language looks nice as syntax. But there's nothing about that mini-language that couldn't be put in a library (with the caveat that patterns would need to be quoted in some way). -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
data:image/s3,"s3://crabby-images/db3ee/db3eedc36fda93906eff1dd5f408d31cf0228f1c" alt=""
David Mertz wrote:
What you are proposing here is very similar in effect to executing pattern matching statements using `eval`. What is the advantage of implementing the pattern matching functionality in a library if the user interface for that functionality has the same pitfalls as `eval`?
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
On Mon, Nov 23, 2020 at 9:02 PM Brian Coleman <brianfcoleman@gmail.com> wrote:
I don't understand the similarity with `eval` that you are suggesting. My hypothetical library would store the value passed as initializer to `Matcher`, and provide a method `.case()`. That method would need to do some moderately complicated parsing of the pattern passed into it, but using parsing techniques, not any eval() call. Btw. I modified my above strawman just slightly to what might be a bit better API. If there are any free variables in the pattern, they would be provided by the Matcher object. For example, they could be attributes of the property `m.val`. Or I suppose we could make them attributes of the Matcher object itself, e.g. `m.a` and `m.b`, but I think name conflicts with e.g. `m.case`. Anyway, it's a strawman not an API design. If the pattern looked kinda like a class constructor (i.e. exactly the same as PEP 634 rules), the `.case()` method would do an `isinstance()` check before doing its other stuff. If the pattern looks like a list, it would scan the freevars inside it and match the constant values. And so on. The actual return value from `.m.case(...)` would be True or False (or at least something truthy or falsy). However, it MIGHT create some new attributes (or keys, or something else) on the Matcher object if the pattern actually matched. I agree the above is slightly less readable than PEP 635 syntax, but it seems to have the entire power of the proposal without changing Python syntax. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
David Mertz wrote:
To be more precise, the similarity that I see to `eval` is that syntax errors in the patterns that are passed to the `case` method cannot be detected at compile time and instead will be detected at runtime. Building the syntax into the language gives the advantage of producing a syntax error at compile time. It also makes it easier for linters and type checkers to validate the pattern matching clauses if the syntax is built into the language.
David Mertz wrote:
Because syntax errors in the patterns passed to the `case` method are detected at runtime, rather than at compile time, it is necessary to ensure that you have code coverage of every call to a `case` method to be confident that there are no syntax errors in the patterns. By comparison, if the pattern matching syntax is built into the language, the compiler will detect syntax errors in any and all patterns in `case` clauses. I think that this is a major advantage as compared to implementing pattern matching via a library.
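The coverage argument can be illustrated with a short sketch. The `parse_pattern` helper below is a hypothetical stand-in for whatever parsing a library `case` method would do; it simply uses `ast.parse`. The function containing the malformed pattern compiles and runs without complaint until the rarely taken branch is actually executed.

```python
import ast


def parse_pattern(pattern: str) -> ast.expr:
    """Hypothetical stand-in for the parsing done inside a library `case` method."""
    return ast.parse(pattern, mode="eval").body


def handle(kind: str) -> ast.expr:
    # Both branches compile: to the compiler the patterns are only string data.
    if kind == "common":
        return parse_pattern("[a, 5, 6, b]")
    return parse_pattern("[a, 5 6, b")  # malformed, but invisible to the compiler


handle("common")  # works; the broken branch was never executed

try:
    handle("rare")
except SyntaxError as exc:
    print("found only when the rare branch runs:", type(exc).__name__)
```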
On 11/23/2020 3:44 PM, David Mertz wrote:
I just commented on Steve's post over on Discourse. The problem with this is that the called function (m.case, here) needs to have access to the caller's namespace in order to resolve the expressions, such as StringNode and PairNone. This is one of the reasons f-strings weren't implemented as a function, and is also the source of many headaches with string type annotations.

My conclusion is that if you want something that operates on DSLs (especially ones that can't be evaluated as expressions), the compiler is going to need to know about it somehow so it can help you with it. I wish there were a general-purpose mechanism for this. Maybe it's PEP 638, although I haven't really investigated it much, and pattern matching might be a bad fit for it.

Eric
On Mon, Nov 23, 2020, 4:32 PM Eric V. Smith
Is this really true though? Yes, it would require magic of the sort zero-argument `super()` does. But can't the Matcher constructor capture the locals one step up the stack on instance creation? That's largely why my strawman version is slightly different from Steve's strawman.

I'd put the question this way: assuming Matcher can be written (with a bit of stack magic), and assuming that the strings inside m.case() calls are exactly the same mini-language PEP 634 proposes, would you want a syntax change? That's not rhetorical, I'm of mixed feelings myself.
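The "stack magic" in question might look something like the sketch below. It assumes CPython, since `sys._getframe` is an implementation detail that other interpreters need not provide, and the `Matcher.resolve` method is a hypothetical name used only for illustration. The constructor takes a snapshot of the names visible in the calling frame so that a later `case` call could look up class names such as `StringNode`.

```python
import sys


class Matcher:
    """Hypothetical matcher that snapshots the caller's namespace (CPython only)."""

    def __init__(self, subject):
        self.subject = subject
        frame = sys._getframe(1)  # the frame that called Matcher(...)
        # Later updates win: locals shadow globals, which shadow builtins.
        namespace = dict(frame.f_builtins)
        namespace.update(frame.f_globals)
        namespace.update(frame.f_locals)
        self.namespace = namespace

    def resolve(self, name: str):
        # Look up a name that appeared in a pattern string.
        return self.namespace[name]


def demo() -> bool:
    class StringNode:  # local to demo(), so only visible via the frame
        pass

    m = Matcher(StringNode())
    return isinstance(m.subject, m.resolve("StringNode"))


print(demo())
```

The snapshot is taken once, at construction time, so names bound in the caller afterwards are not visible to the matcher in this sketch.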
On 11/23/20 1:49 PM, David Mertz wrote:
On Mon, Nov 23, 2020, 4:32 PM Eric V. Smith
I would, yes. Writing Python code in strings to be processed by another function/library is a pain:

- no syntax highlighting because the code is in a string
- no syntax checking because the code is in a string
- lots of quotes because the code is in a string

All in all, it feels very inelegant to me.

--
~Ethan~
On 11/23/2020 4:49 PM, David Mertz wrote:
Beyond the issue of stack access being frowned on, there are some practical problems. One that's given in PEP 498 is closures accessing variables that aren't otherwise referenced:
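A minimal sketch of that situation, modeled on the example given in PEP 498 (the function names are illustrative). Because `inner` never refers to `x` as a name, the compiler creates no closure cell for it, and `locals()` inside `inner` does not contain `x`:

```python
def outer(x):
    def inner():
        # `x` appears only inside a string, so it is not captured by the closure.
        return "x={x}".format_map(locals())
    return inner


try:
    outer(42)()
except KeyError as exc:
    print("KeyError:", exc)
```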
Whereas an f-string in that scenario does work:
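Again as a sketch modeled on PEP 498: with an f-string the compiler sees `x` as a real name reference inside `inner`, so the closure captures it.

```python
def outer(x):
    def inner():
        # The compiler parses the f-string, sees `x`, and creates the closure cell.
        return f"x={x}"
    return inner


print(outer(42)())  # x=42
```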
No, I wouldn't!

Something that gets brought up every now and again is: what if there were a way to pass an AST to a function, such that it received the AST instead of the evaluated value for a parameter. Let's say that backticks (`) meant "compute the AST of the enclosed expression", and that was passed to the function. (I always choose backticks for this example because we all know it isn't going to happen.) Then you write your original example using backticks instead of quotes, and m.case would get an AST it could inspect. It would probably still need some help in evaluating the AST nodes in the caller's context, but at least it would be theoretically possible to do so with the compiler's assistance.

Another option would be the function itself saying "give me an AST instead of evaluating a particular parameter". But that's all but impossible, since the compiler couldn't look at the called function to know it wants to do that.

I think we're in python-ideas land now.

Eric
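Since the backtick syntax does not exist, the sketch below simulates it by calling `ast.parse` at the call site. It shows what a function receiving the AST of a pattern could inspect. The `describe_pattern` function is a hypothetical name, and resolving `StringNode` to an actual class would still need the caller's namespace.

```python
import ast


def describe_pattern(tree: ast.expr) -> str:
    """Inspect a pattern AST without evaluating any of it."""
    if isinstance(tree, ast.Call) and isinstance(tree.func, ast.Name):
        captures = [arg.id for arg in tree.args if isinstance(arg, ast.Name)]
        return f"class pattern {tree.func.id} capturing {captures}"
    if isinstance(tree, ast.List):
        return f"sequence pattern with {len(tree.elts)} elements"
    return "unsupported pattern"


# Stand-in for the imagined `StringNode(s)` backtick expression.
pattern_ast = ast.parse("StringNode(s)", mode="eval").body
print(describe_pattern(pattern_ast))
```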
On 11/23/2020 5:44 PM, David Mertz wrote:
Sorry I wasn't clear. I wouldn't want a syntax change specific to matching if it could be done with a library. I just don't think it can be done with a library without other language changes. But I think those other language changes could be used in ways outside of just matching. Eric
Hi Eric, On 23/11/2020 9:32 pm, Eric V. Smith wrote:
Hygienic macros (PEP 638) solve two problems with a string based library (in my obviously biased opinion).

1. The pattern is parsed by the normal parser, so must have correct syntax, and the contents are visible to IDEs and editors.

    if m.case("StringNode(s)"):

   the pattern is just a string.

    case!(StringNode(s)):

   the pattern is validated Python syntax.

2. The transformation is done at compile time, so the generated code will execute in the correct context. Basically, the macro generates the correct series of if/elifs for you.

Cheers,
Mark.
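As a rough illustration of the second point, the series of if/elifs that such a macro might generate could resemble the hand-written code below. This is only a guess at the shape of the expansion, not something specified in this thread; the node classes are simplified stand-ins, and the assumption that the capture name binds to a `value` attribute is mine.

```python
class StringNode:
    def __init__(self, value):
        self.value = value


class NumberNode:
    def __init__(self, value):
        self.value = value


def process(node):
    # Roughly what `case!(StringNode(s))` could expand to: a type test
    # followed by binding the capture name in the current scope.
    if isinstance(node, StringNode):
        s = node.value
        return s
    # ...and the next case becomes an `elif`, so there is no fall-through.
    elif isinstance(node, NumberNode):
        n = node.value
        return float(n)
    raise TypeError(f"no case matched {type(node).__name__}")


print(process(StringNode("spam")), process(NumberNode("1.5")))
```

Because the generated code is ordinary Python in the caller's scope, names such as `StringNode` resolve exactly as they would in hand-written code.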
On Mon, Nov 23, 2020 at 08:44:21PM +0000, David Mertz wrote:
Look at all those strings. So much for syntax highlighting and compile-time syntax checking. Who needs them anyway?

    # Perfectly legal syntax.
    if m.case("StringNode[s)"):
        ...
    elif m.case("[a, 5 6, b"):
        ...

It's a good thing we already have comprehensions, because if we didn't have them, people would argue that they aren't necessary, we can just write a `comprehension` function that takes the loop expression as a string:

    # [(a + int(''.join(b)))*c for a, b, c in items]
    comprehension("(a + int(''.join(b)))*c", items)

Imagine how silly we would be to dedicate actual syntax to comprehensions, when we can write a library to do it. It's all this syntactic sugar (comprehensions, decorators, f-strings, async, with statements, etc) that is killing Python. *wink*
One thing that we get with pattern matching syntax is the absence of certain *misfeatures*.

    # Oops, I forgot to use `elif`, now I have fall-through semantics
    # which is widely agreed to be a bad idea.
    m = Matcher([4, 5, 6, 7])
    if m.case("[a, 5, 6, b]"):
        print("first case")
    if m.case("[4, a, b, 7]"):
        print("second case")

As the library author, I hope you are prepared for many, many bug reports about why people's code behaves differently if they use `if` compared to `elif`.

Another misfeature: the ability to scatter your pattern matching cases all over the place.

    m = Matcher(expression)
    do_this()
    do_that()
    if m.case(something):
        process("this case")
    do_another_thing()
    class Spam:
        # snip five pages of code
    if m.case(something_else):
        print("Did you forget we were inside a match pseudo-block?")

Objection: "But coders won't do that!"

No, *sensible* coders won't do that. With pattern matching syntax, even the other sort of coder can't do that.

Objection: "But they could put five pages of code inside a case block too."

True, but only if it is *conditional* to that specific case. You can't mix unconditional code and cases. And the extra indentation will hint that something odd is going on.

Another misfeature: the ability to modify the value being matched in the middle of the pattern matching pseudo-block.

    m = Matcher(something)
    if m.case(spam):
        process("case 1")
    m = Matcher(another_thing)
    if m.case(eggs):
        process("case 2")

People can write obfuscated, confusing, poor-quality code with anything, but syntax can limit their opportunities to do so.

--
Steve
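The fall-through hazard described above is easy to reproduce. The sketch below replaces the string-based `Matcher` with a tiny hypothetical `case` function and an `ANY` wildcard, which is enough to show that the subject `[4, 5, 6, 7]` matches both patterns, so the `if`/`if` version runs two branches where the `if`/`elif` version runs one.

```python
ANY = object()  # hypothetical wildcard standing in for a free variable


def case(subject, pattern) -> bool:
    """Match a list against a pattern of constants and ANY wildcards."""
    return len(subject) == len(pattern) and all(
        p is ANY or p == s for p, s in zip(pattern, subject)
    )


def with_if(subject):
    hits = []
    if case(subject, [ANY, 5, 6, ANY]):
        hits.append("first case")
    if case(subject, [4, ANY, ANY, 7]):  # forgot `elif`: falls through
        hits.append("second case")
    return hits


def with_elif(subject):
    hits = []
    if case(subject, [ANY, 5, 6, ANY]):
        hits.append("first case")
    elif case(subject, [4, ANY, ANY, 7]):
        hits.append("second case")
    return hits


print(with_if([4, 5, 6, 7]))    # ['first case', 'second case']
print(with_elif([4, 5, 6, 7]))  # ['first case']
```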
On Mon, 23 Nov 2020 16:15:12 -0000 "Brian Coleman" <brianfcoleman@gmail.com> wrote:
Furthermore, Python has a regular expression module which implements its own DSL for the purpose of matching string patterns. Regular expressions, in a similar way to pattern matching, allow string patterns to be expressed in a concise and declarative manner.
Uh, without regular expressions, a lot of functions would be massively more complicated and annoying to write. However, your example shows that pattern matching barely makes common code shorter (admittedly, on this _one_ example, but that's the one you chose ;-)). While I agree that regular expressions are far less Pythonic than the proposed variant of pattern matching, they also have a tremendously better cost/benefit ratio, IMHO. Regards Antoine.
Antoine Pitrou wrote:
In my opinion, the object oriented solution to this problem is to use the visitor pattern. The solution using the visitor pattern is almost twice the length of the other solutions. Pattern matching is certainly no worse than the solution using a chain of `isinstance` checks in this case. However, as the patterns to be checked against a candidate object become more complex, I believe that the pattern matching solution will retain the same level of elegance and obviousness that it possesses in this example, whereas the solution involving a chain of comparisons will quickly degrade in terms of obviousness.
On 24/11/20 9:31 am, Brian Coleman wrote:
In my opinion, the object oriented solution to this problem is to use the visitor pattern.
Which is a good thing only if you believe that OO solutions are always better than non-OO ones. IMO, the visitor pattern is a workaround for when your language doesn't permit introspection on types. It's unnecessary in Python and better avoided. -- Greg
On Mon, Nov 23, 2020 at 8:20 AM Brian Coleman <brianfcoleman@gmail.com> wrote:
I've always thought that the alternative to a "switch case" construct in Python (and I suppose most OO languages) is subclassing and method overriding. I guess that's what the "visitor pattern" is here, but it seems to be adding a bunch of unnecessary boilerplate.

So: given that you have a special "Node" object anyway, the thing to do is to have those Node objects know how to unpack themselves. Then the top "traverse the tree" function becomes a single method or attribute access:

    tree = node_tree.value

Here's what the Nodes look like in this example:

    class Node:
        def __init__(self, val):
            self._value = val

        @property
        def value(self):
            return self._value

    class StringNode(Node):
        pass

    class NumberNode(Node):
        pass

    class ListNode(Node):
        @property
        def value(self):
            return [item.value for item in self._value]

    class DictNode(Node):
        @property
        def value(self):
            return {k: item.value for k, item in self._value.items()}

Of course, this requires that you have control of the Node objects, rather than getting them from some other library -- but that seems to be what all the examples here are anyway.

If you do need to parse out a tree of objects that are not "special" already, then you need to do some type of pattern matching / isinstance checking. In this case, I wrote a function that builds up a tree of Nodes from arbitrary Python objects:

    # imports needed by the isinstance checks below
    from collections.abc import Mapping, Sequence
    from numbers import Real

    def make_nodes_from_obj(obj):
        # str must be tested first, because a str is also a Sequence
        if isinstance(obj, str):
            return StringNode(obj)
        if isinstance(obj, Real):
            return NumberNode(obj)
        if isinstance(obj, Sequence):
            return ListNode([make_nodes_from_obj(item) for item in obj])
        if isinstance(obj, Mapping):
            return DictNode({k: make_nodes_from_obj(item) for k, item in obj.items()})

And that could benefit from pattern matching, I suppose, though it's not very compelling to me.

And in "real world" code, I've done just this -- building a system for saving / restoring dataclasses to/from JSON.
In that case, each of the dataclasses knows how to save itself and build itself from JSON-compatible python objects (numbers, dicts, strings, lists) -- so again, no need for pattern matching there either.

And what I really like about the approach of putting all the logic in the "nodes" is that I can make new types of nodes without having to touch the code at the "top" that visits those nodes.

In short -- I'm still looking for a more compelling example :-)

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
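The extensibility point made in the message above can be sketched as follows. The `Node` and `ListNode` classes follow the shape described in the message; `DateNode` is a hypothetical new node type added purely for illustration. The code at the "top" is still a single `.value` access and needs no changes when the new type appears.

```python
from datetime import date


class Node:
    def __init__(self, val):
        self._value = val

    @property
    def value(self):
        return self._value


class ListNode(Node):
    @property
    def value(self):
        # Each child knows how to unpack itself.
        return [item.value for item in self._value]


class DateNode(Node):
    """Hypothetical new node type: stores an ISO date string."""

    @property
    def value(self):
        return date.fromisoformat(self._value)


# The traversal code is unchanged even though DateNode is new.
tree = ListNode([Node("spam"), DateNode("2020-11-23")])
print(tree.value)
```

The trade-off noted in the message still applies: this only works when you control the node classes.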
participants (12)
- Antoine Pitrou
- Antoine Pitrou
- Brian Coleman
- Chris Angelico
- Chris Barker
- David Mertz
- Eric V. Smith
- Ethan Furman
- Greg Ewing
- Larry Hastings
- Mark Shannon
- Steven D'Aprano