Proposal to introduce pattern matching syntax

Hello,

I have been working on an idea that would introduce pattern matching syntax to Python. I now have this syntax implemented in CPython, and feel this is the right time to gather further input. The repository and branch can be found at https://github.com/natelust/cpython/tree/match_syntax. The new syntax would improve readability, ease things like visitor-pattern-style programming, localize matching behavior to a class, and support better signaling, among other things. This is the tl;dr; I will get into a longer discussion below, but first I want to introduce the syntax and how it works with the following simple example.

    result = some_function_call()

    try match result:  # try match some_function_call(): is also supported
        as Dog:
            print("is a dog")
        as Cat(lives):
            print(f"is a cat with {lives} lives")
        as tuple(result1, result2):
            print(f"got two results {result1} and {result2}")
        else:
            print("unknown result")

The statement begins with a new compound keyword "try match". This is treated as one logical block; the word match is not being turned into a keyword. There are no backwards compatibility issues, as previously no symbols were allowed between try and ":". The try match compound keyword was chosen to make it clearer to users that this is distinct from a try block, to provide a hint about what the block is doing, and to follow the Python tradition of being sensible when spoken out loud to an English speaker. This keyword is followed by an expression that is to be matched, called the match target.

What follows is one or more match blocks. A match block is started with the keyword ‘as’ followed by a type and, optionally, parameters.

Matching begins by calling a __match__ (class)method on the type, with the match target as a parameter. The match method must return an object that can be evaluated as a bool. If the return value is True, the code block in this match branch is executed, and execution is passed to whatever comes after the match syntax.
If __match__ returns False, execution is passed to the next match branch for testing.

If a match branch contains a group of parameters, they are used in the matching process as well. If __match__ returns True, then the match target is tested for the presence of an __unpack__ method. If there is no such method, the match target is tried as a sequence. If both of these fail, execution moves on to the next branch. If there is an __unpack__ method, it is called and is expected to return a sequence. The length of the sequence (either the result of __unpack__, or the match target itself) is compared to the number of supplied parameters. If they are equal, the sequence is unpacked into the variables named by the parameters, the match is considered a success, and the body is executed. If the length of the sequence does not equal the number of parameters, the match branch fails and execution continues with the next branch. This is useful for differentiating tuples of different lengths, or objects that unpack differently depending on state.

If all the match blocks are tested and fail, the try match statement checks for the presence of an else clause. If one is present, its body is executed. This serves as a default execution block.

What is the __match__ method and how does it determine if a match target is a match? This change introduces __match__ as a new default method on ‘object’. The default implementation first checks the match target with the ‘is’ operator against the object containing the __match__ method. If that is false, it then checks the match target using isinstance. Objects are free to implement whatever __match__ method they want, provided it matches the interface.

This proposal also introduces __unpack__ as a new interface, but does not define any default implementation. This method should return a sequence. There is no specific form beyond this definition; a class author is free to implement whatever representation they would like to use in a match statement.
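The matching procedure described above can be emulated in current Python. The following is a minimal sketch under stated assumptions, not the implementation on the branch: the helper names default_match, try_branch and describe are hypothetical, and the example Dog and Cat classes are invented for illustration. It shows the order of checks (the __match__ call, then __unpack__ or the sequence fallback, then the length comparison).

```python
from collections.abc import Sequence


def default_match(pattern, target):
    """Assumed default __match__: identity first, then isinstance."""
    if target is pattern:
        return True
    return isinstance(pattern, type) and isinstance(target, pattern)


def try_branch(pattern, target, n_params=None):
    """Return a tuple of bound values if the branch matches, else None.

    n_params is the number of names written after the type in the branch,
    or None when the branch has no parameter group.
    """
    matcher = getattr(pattern, "__match__", None)
    matched = matcher(target) if matcher is not None else default_match(pattern, target)
    if not matched:
        return None
    if n_params is None:
        return ()
    unpack = getattr(target, "__unpack__", None)
    if unpack is not None:
        values = unpack()          # expected to return a sequence
    elif isinstance(target, Sequence):
        values = target            # fall back to the target itself
    else:
        return None                # neither unpackable nor a sequence
    if len(values) != n_params:
        return None                # length mismatch: the branch fails
    return tuple(values)


class Dog:
    pass


class Cat:
    def __init__(self, lives):
        self.lives = lives

    def __unpack__(self):
        return (self.lives,)


def describe(result):
    """Hand-written equivalent of the try match example."""
    if try_branch(Dog, result) is not None:
        return "is a dog"
    bound = try_branch(Cat, result, 1)
    if bound is not None:
        (lives,) = bound
        return f"is a cat with {lives} lives"
    bound = try_branch(tuple, result, 2)
    if bound is not None:
        result1, result2 = bound
        return f"got two results {result1} and {result2}"
    return "unknown result"
```

With these definitions, a three-element tuple passes the isinstance check for tuple but fails the length comparison, so it falls through to the default branch, which is the behavior the description above calls useful for telling tuples of different lengths apart.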
One use case for __unpack__ could be a class that stores the parameters passed to __init__ (or some other parameters) so someone could construct a new object such as Animal(*animal_instance.__unpack__()) [possibly with a builtin for calling unpack]. Another use, in keeping with the match syntax, is something like Stateful Enums.

The behavior covered by try match can be emulated with some combination of compound and/or nested if statements alongside type checking and parameter unpacking, so why introduce the new syntax? The benefits I see are:

* It is much easier to read and follow compared to complicated if branching logic.
* The matching logic is now defined alongside the class in the __match__ method. This is in contrast to if statements, where the logic is duplicated in each place. If it is factored out into a function, the function may be unknown or unused in a cross-package setting. Refactoring distributed logic to live next to an object is similar to the introduction of the format method on strings.
* It introduces a well-defined interface for programmers to depend on, in contrast to new interfaces being built on a package-by-package basis. For instance, __unpack__ and __match__ behavior may be implemented on objects today under any variety of names, making discoverability difficult and making if blocks more difficult for a user to parse.
* A pattern often found in Python is to use Exceptions for signaling conditions inside of execution, which muddles the distinction between handling exceptional behavior and normal code execution. The try match syntax, along with object unpacking, provides a standardized way to signal information back to callers and a standardized syntax for handling those signals.

The try match syntax fits in well with the ethos of including more ways to use typing information within Python. For instance, something like typing.Union could have a corresponding __match__ method such that all of those types would be covariant under a single match block.
In the opposite sense, the try match syntax is also useful for taking a parameter defined as a union and dispatching to individual functions that transform the variable into a standardized type.

Points I am unsure of:

I am not sure about using “(parameters)” as part of the match branch. In some ways it is very familiar to look at, but it also makes it look like these must be the parameters used to construct the object, not parameters returned from __unpack__ (though they may be the same, they may not be). I have toyed with the idea of something like “-> (a,b)” or using braces “{a,b}”, and these serve the purpose of being different, but then that also makes them look different, so I don't really have a strong opinion formed on this part of the syntax.

The implementation on my branch does work, but I am by no means an expert on all of Python, and there may be much better ways to do what I did. In particular, I implemented some of the logic in the compiler, but it may be better served as an opcode. The exact boundary of where best to put some of the logic is unclear.

This implementation currently stops at the first matching block it comes to, not the best match out of all blocks. This is meant to make it easier to understand the “flow” of the statement, but it might be preferable to execute the block associated with the best match, though this would complicate the implementation a good deal.

I am sure there is more that I have not considered with this proposal, and I appreciate any feedback you choose to provide. Thank you for your time in reading this.

--
Nate Lust, PhD.
Astrophysics Dept.
Princeton University

Concerning parameters in the as section: I think using `with` would make it easier to understand.

    try match result:
        as Dog:
            print("Dog")
        as Cat with lives:
            print(f"Cat with {lives} lives")
        as tuple with (arg1, arg2):
            print(f"Tuple with args {arg1}, {arg2}")
        else:
            print("Nothing matched")

________________________________
From: nate lust <natelust@linux.com>
Sent: Monday, June 22, 2020 9:17:01 AM
To: python-ideas <python-ideas@python.org>
Subject: [Python-ideas] Proposal to introduce pattern matching syntax

On Mon, 22 Jun 2020 at 13:08, Michael Christensen <amc135@outlook.com> wrote:
I for one could not even parse the example in the first e-mail. This version using "with" makes sense to me. To start, it should be a given that between `as` and `:` one should be able to write an arbitrary Python expression, rather than inventing yet another new language inside Python for this syntax (as has happened with annotations).

On Mon, Jun 22, 2020 at 8:20 AM nate lust <natelust@linux.com> wrote:
Great! This is a time of synchronicity -- with a few core devs and some others I have been quietly working on a similar proposal. We've got an implementation and a draft PEP -- we were planning to publish the PEP last week but got distracted and there were a bunch of last-minute edits we wanted to apply (and we thought there was no hurry). We should join forces somehow.
Our syntax is similar but uses plain "match" instead of "try match". We considered "as", but in the end settled on "case". We ended up rejecting a separate "else" clause, using "case _" (where "_" matches everything) instead. There are other small syntactic differences but the general feel is very similar!
Note that the new PEG parser makes it possible to recognize plain "match <expr>:" without fear of backward incompatibility.
We have a __match__ class method too, though its semantics are somewhat different.
Our __match__ handles the __unpack__ functionality.
I agree with these advantages. I hadn't thought of matching as a replacement for exceptions but I think it's a good idea.
Thanks for taking the time to show and explain your proposal! I'm sorry I cannot yet show our full proposal. Hopefully it'll be ready later this week. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

22.06.20 18:17, nate lust wrote:
I am afraid this is a very poor kind of pattern matching, and most users will need something more powerful.

For tuples:

* If its length is 2, and its first item is GET, and its second item's type is int, then do A.
* If its first item is GET, and its second item's type is str, then save the second item and do B (the rest of the items are ignored).
* If its first item is PUT, then ignore the other items and do C.

For dicts:

* If its "type" key is INT, and it has the key "value", then save the value for "value", save all other key-values, and do A.
* If its "type" key is STRING, and it has the keys "data" and "encoding", then save the values for "data" and "encoding" and do B.
* If it is an empty dict, then do C.
* If it does not have the "type" key, then do D.

For arbitrary objects:

* If it is an instance of Dog, and its "name" attribute has type str and starts with "F", and its "color" attribute is contained in required_colors, and its "legs" attribute > 3, and calling its "has_tail()" method returns false (not necessarily the literal False, but an arbitrary false value), then do A.

Oh, and it should work for nested patterns.

participants (5)
- Guido van Rossum
- Joao S. O. Bueno
- Michael Christensen
- nate lust
- Serhiy Storchaka