PEP 622 idea: "match case object" to represent match patterns

Recent discussion on the store vs. load syntax issues of PEP 622 prompted a (still unripe) idea that could hopefully spur some progress. What if cases required some sort of MatchCase (name pending) object, which could be preconstructed by the user if needed but also inferred on the fly if absent? I think an example would clarify what I mean. Consider the following block (with & marking "load" just for demonstration purposes):

```
match some_point:
    case Point(x, y, _) if x > y:
        print("Tridimensional point with x > y")
    case Point(_, _, _):
        print("Some other 3D point.")
    case Point(_, &my_y) as pt:
        print(f"{pt} has y=={my_y}.")
    case Point(10, _):
        print("Some bidimensional point with x==10.")
    case Point(_, y):
        print(f"Bidimensional point with {y=}.")
    case _:
        print("Something else.")
```

This is admittedly simple enough that it wouldn't require what I'm about to show, but it does require marking my_y in some way to prevent it from being interpreted as a "store" identifier, or namespacing it for the same reason. Now consider the following refactor:

```
3d_point = MatchCase("Point(x, y, z)")
other_3d_point = MatchCase("Point(_, _, _)")
point_with_my_y = MatchCase("Point(_, {})", my_y)  # syntax with {} is debatable.
other_2d_point = MatchCase("Point(_, y)")

match some_point:
    case 3d_point if 3d_point[0] > 3d_point[1]:
        # 3d_point exposes x, y, and z with the names given
        # in the declaration, as well as numerically,
        # through __getitem__ or perhaps also through attributes
        # like .args and .kwargs.
        ...
    case other_3d_point:
        # other_3d_point[0] through [2] are available, while
        # other_3d_point["_"] == other_3d_point[2] because it's the last
        # one that was given. Presumably the programmer wouldn't care about it.
        ...
    case point_with_my_y as pt:
        # pt still exposes positional arguments like pt[0], pt[1] or pt.args
        # as well as pt["_"] (== pt[0] in this case), but pt[1] has no
        # equivalent in pt.kwargs because no name was given.
        ...
    case Point(10, _):
        # Python would construct its own MatchCase here and behave as expected.
        ...
    case other_2d_point:
        print(f"Bidimensional point with y={other_2d_point['y']}.")
    case _:
        ...
```

Every case above could be written the same way as in the example without MatchCase, except for point_with_my_y, which contains a lookup (every identifier inside the match is interpreted as "store" unless it is namespaced); the others are included for demonstration purposes. As seen in case Point(10, _), patterns that contain no "load" identifiers can be written without using MatchCase instances. In effect, MatchCase would be a sort of "plug-in" for when complex or verbose functionality is needed, allowing one to load identifiers, or to encode patterns for readability and/or reuse throughout the match block.

The values of the identifiers to load would be read at MatchCase instantiation and internally converted to constants.

Guard clauses could optionally be included in the MatchCase instantiation. If both the MatchCase and the case clause provide guards, they are ANDed together.

If deemed appropriate, a special meaning could be given to MatchCase() (with no arguments, with "", or even with any falsy value as argument) to represent the default case, to appease those who don't like _.

Note that the above examples assume that at least _ (and possibly any identifier) can be repeated. My proposed MatchCase idea is compatible with disallowing repeated identifiers, while still allowing them in MatchCase()'s first argument with the behaviour seen in the other_3d_point case.

MatchCase could be provided either as a builtin or in a stdlib module that must be imported. I think the latter would be more reasonable.

Any specific syntax or name presented above (MatchCase, the {}s in its first argument, the names of the MatchCase.args and MatchCase.kwargs attributes) is debatable.

I hope I haven't missed any important point that would make the above demonstration invalid. Any thoughts?

On Thu, Jul 16, 2020 at 9:21 AM Federico Salerno <salernof11@gmail.com> wrote:
[...]
Now consider the following refactor:

```
3d_point = MatchCase("Point(x, y, z)")
other_3d_point = MatchCase("Point(_, _, _)")
point_with_my_y = MatchCase("Point(_, {})", my_y)  # syntax with {} is debatable.
other_2d_point = MatchCase("Point(_, y)")

match some_point:
    case 3d_point if 3d_point[0] > 3d_point[1]:
        # 3d_point exposes x, y, and z with the names given
        # in the declaration, as well as numerically,
        # through __getitem__ or perhaps also through attributes
        # like .args and .kwargs.
        ...
    case other_3d_point:
        # other_3d_point[0] through [2] are available, while
        # other_3d_point["_"] == other_3d_point[2] because it's the last
        # one that was given. Presumably the programmer wouldn't care about it.
        ...
    case point_with_my_y as pt:
        # pt still exposes positional arguments like pt[0], pt[1] or pt.args
        # as well as pt["_"] (== pt[0] in this case), but pt[1] has no
        # equivalent in pt.kwargs because no name was given.
        ...
    case Point(10, _):
        # Python would construct its own MatchCase here and behave as expected.
        ...
    case other_2d_point:
        print(f"Bidimensional point with y={other_2d_point['y']}.")
    case _:
        ...
```
During our internal discussions (before we published the first draft) I briefly proposed a similar idea. But it was quickly pointed out by my co-authors that this doesn't fly, because when the parser sees `case other_3d_point:` it doesn't know whether you meant this as a capture pattern (binding the variable `other_3d_point`) or as a pattern object. Also with your proposal the compiler would not be able to tell that x, y and z are local variables, because they are only mentioned inside string literals.

Another way to look at it: if you have some common assignment target, say:

```
x, y, z = point1
x, y, z = point2
x, y, z = point3
```

you can't define an object that serves as a shortcut for the `x, y, z` assignment target -- this (obviously!) does not work:

```
xyz = AssignmentTarget("x, y, z")
xyz = point1
xyz = point2
xyz = point3
```

(You could construct similarly absurd examples for function signatures.)

Patterns are more like assignment targets or function signatures than like expressions, so we shouldn't be surprised that we can't define objects to serve as shortcuts here. It could probably be done with a suitable macro preprocessing feature, but that's a much larger discussion (that I'm definitely not going to tackle).

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 18/07/2020 02:10, Guido van Rossum wrote:
[...] it was quickly pointed out by my co-authors that this doesn't fly, because when the parser sees `case other_3d_point:` it doesn't know whether you meant this as a capture pattern (binding the variable `other_3d_point`) or as a pattern object.
This could be solved. One option would be to make pattern objects callable, such that when you write:

```
my_pattern = MatchCase("something")
match stuff:
    case my_pattern():
        ...
```

my_pattern() is used as a pattern object, and a single identifier like my_pattern is interpreted, as expected by the current version of the PEP, as a capture. I believe this is not subtle enough to be confusing to anyone who learns the feature as a whole. We don't, after all, confuse function calls for references to a function, or vice-versa.
Also with your proposal the compiler would not be able to tell that x, y and z are local variables, because they are only mentioned inside string literals.

I was hoping the implementation of MatchCase could take care of this. I suppose it would require MatchCase to have access to locals() everywhere it's called; would this be a problem?
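Guido's point concerns compile time rather than run time. Python decides whether a name is local when it compiles the function body, so a name that only appears inside a string is compiled as a global lookup, and runtime access to locals() cannot change that. A minimal sketch, assuming a hypothetical helper `bind_in_caller` that uses the CPython-specific `sys._getframe` to write into its caller's frame:

```python
import sys


def bind_in_caller(name: str, value: object) -> None:
    # Hypothetical helper: write `name` into the calling frame's f_locals.
    # Whether such writes persist differs across CPython versions, but it
    # does not matter here: the caller never reads `name` as a local.
    sys._getframe(1).f_locals[name] = value


def uses_string_name() -> object:
    bind_in_caller("x", 42)
    # `x` is never assigned in this function, so the compiler emitted a
    # global lookup for it; the frame write above is never consulted.
    return x  # noqa: F821
```

The code object confirms this: `"x"` appears in `uses_string_name.__code__.co_names` (names looked up globally) rather than in `co_varnames`, and calling `uses_string_name()` raises NameError as long as no global `x` exists.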

On Sat, Jul 18, 2020 at 10:58:17AM +0200, Federico Salerno wrote:
We don't, after all, confuse function calls for references to a function, or vice-versa.
Beginners do. Frequently. Sometimes it is quite a hurdle for them to learn to write `function()` instead of `function`.

And even experienced developers sometimes forget to put parentheses after file.close, or at least we used to before context managers. (I know I did. I don't think I'm alone.)

--
Steven
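A small illustration of the slip Steven describes, using `io.StringIO` as a stand-in for a real file. The two helper functions are hypothetical:

```python
import io


def close_without_parens() -> bool:
    f = io.StringIO("some text")  # stand-in for an open file
    f.close  # evaluates the bound method and discards it; nothing is closed
    return f.closed


def close_with_parens() -> bool:
    f = io.StringIO("some text")
    f.close()  # actually calls the method
    return f.closed
```

A `with` statement sidesteps the problem, since the context manager calls `close()` on exit.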

On 18/07/2020 11:09, Steven D'Aprano wrote:
On Sat, Jul 18, 2020 at 10:58:17AM +0200, Federico Salerno wrote:
We don't, after all, confuse function calls for references to a function, or vice-versa.
Beginners do. Frequently. Sometimes it is quite a hurdle for them to learn to write `function()` instead of `function`.
And even experienced developers sometimes forget to put parentheses after file.close, or at least we used to before context managers.
(I know I did. I don't think I'm alone.)
I consider myself far from being an experienced developer, but I cannot in all honesty say I'm any likelier to forget the parentheses after file.close than after any other function call. Perhaps it comes down to habits from some other language I have never worked with.

I can see why beginners would find the distinction between func and func() challenging at first, but that difficulty subsides relatively early, most likely well before one needs to be aware of the intricacies of a complex feature such as pattern matching. After all, Python, unlike e.g. Rust or functional languages, is not built around that concept and has survived without it thus far. Pattern matching would be useful and nice, but it is not so essential that a beginner would have to learn it before getting a solid grasp of how functions are called or referenced.
Participants (3):
- Federico Salerno
- Guido van Rossum
- Steven D'Aprano