
Hello!
I was trying to google whether there have been any attempts to add a pattern matching feature (as seen in e.g. Haskell) to Python. However, I couldn't find anything. Were there any proposals for a feature like this?
Greetings zefciu

On Tue, Apr 7, 2015 at 4:54 PM, Szymon Pyżalski szymon@pythonista.net wrote:
Hello!
I was trying to google whether there have been any attempts to add a pattern matching feature (as seen in e.g. Haskell) to Python. However, I couldn't find anything. Were there any proposals for a feature like this?
Not everyone is familiar with Haskell. Can you provide an example of how this would work (in pseudocode, perhaps)?

On 7 April 2015 at 15:54, Szymon Pyżalski szymon@pythonista.net wrote:
I was trying to google whether there have been any attempts to add a pattern matching feature (as seen in e.g. Haskell) to Python. However, I couldn't find anything. Were there any proposals for a feature like this?
From a quick look:
https://pypi.python.org/pypi/patterns https://pypi.python.org/pypi/py-pattern-matching
I'm not sure if either of these is the sort of thing you're looking for.
Paul

Quick idea: this might integrate well with PEP 484, or with the (still speculative) multiple dispatch that Lukas Lang wants to write. On Apr 7, 2015 8:46 AM, "Paul Moore" p.f.moore@gmail.com wrote:
On 7 April 2015 at 15:54, Szymon Pyżalski szymon@pythonista.net wrote:
I was trying to google whether there have been any attempts to add a pattern matching feature (as seen in e.g. Haskell) to Python. However, I couldn't find anything. Were there any proposals for a feature like this?
From a quick look:
https://pypi.python.org/pypi/patterns https://pypi.python.org/pypi/py-pattern-matching
I'm not sure if either of these is the sort of thing you're looking for.
Paul

On 07.04.2015 17:52, Guido van Rossum wrote:
On Apr 7, 2015 8:46 AM, "Paul Moore" <p.f.moore@gmail.com mailto:p.f.moore@gmail.com> wrote:
On 7 April 2015 at 15:54, Szymon Pyżalski <szymon@pythonista.net mailto:szymon@pythonista.net> wrote:
I was trying to google whether there have been any attempts to add a pattern matching feature (as seen in e.g. Haskell) to Python. However, I couldn't find anything. Were there any proposals for a feature like this?
From a quick look:
https://pypi.python.org/pypi/patterns https://pypi.python.org/pypi/py-pattern-matching
Thanks, these are nice, but I think they try to mimic languages based on algebraic data types too much. I was thinking about something like this:
Magic method for patterns
-------------------------
Any object can be a pattern. If an object has a special method (called ``__pattern__`` or ``__match__``) defined, then this method governs how the object behaves as a pattern. If not, pattern matching works the same as ``__eq__``.
The ``__pattern__`` method can return ``False`` for no match. For a match it can return ``True`` (simplest patterns), a sequence, or a mapping.
Syntax for pattern matching
---------------------------
The syntax could look something like this::
    for object:
        pattern1:
            statement1
        pattern2:
            statement2
            statement3
        pattern3 as var1, var2:
            statement4
            statement5
For the simplest cases this works simply by executing the appropriate statement block. If the ``__pattern__`` method returns a mapping, then the values from it will be merged into the local namespace. If it returns a sequence and there is an ``as`` clause, the values will be assigned to the variables specified.
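A sketch of these semantics as a plain helper function (all names here are illustrative, not proposed API; the proposal leaves open whether the method is called ``__pattern__`` or ``__match__``):

```python
def try_match(pattern, value):
    """Return False for no match; True / a sequence / a mapping for a match.
    Per the proposal: use the pattern's __match__ if defined, else fall
    back to equality."""
    matcher = getattr(type(pattern), '__match__', None)
    if matcher is not None:
        return matcher(pattern, value)
    return pattern == value

class Pair:
    """A hypothetical pattern object matching any 2-tuple."""
    def __match__(self, value):
        if isinstance(value, tuple) and len(value) == 2:
            # a mapping result: names the proposal would merge into locals
            return {'first': value[0], 'second': value[1]}
        return False

assert try_match(3, 3) is True                          # __eq__ fallback
assert try_match(Pair(), (1, 2)) == {'first': 1, 'second': 2}
assert try_match(Pair(), 'nope') is False
```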
Quick idea: this might integrate well with PEP 484, or with the (still speculative) multiple dispatch that Lukas Lang wants to write.
Function dispatch
-----------------
Besides the above syntax, the patterns could also be used to register functions for dispatching. I don't know how precisely it would integrate with PEP 484, but I agree that any valid type hint should also be a valid pattern.
Existing objects as patterns
----------------------------
* Types should match their instances
* Tuples should match sequences that have the same length and whose elements match against the elements of the tuple (subpatterns)
* Dictionaries should match mappings that contain all the keys in the pattern and whose values match the subpatterns under these keys
* Regular expressions should match strings. For named groups they should populate the local namespace.
* The ``Ellipsis`` object can be used as a match-all pattern.
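The rules above can be sketched as one recursive function (illustrative only; ``matches`` is not proposed API, and ``re.Pattern`` as a public name assumes Python 3.8+):

```python
import re
from numbers import Number

def matches(pattern, value):
    if pattern is Ellipsis:                       # match-all
        return True
    if isinstance(pattern, type):                 # types match their instances
        return isinstance(value, pattern)
    if isinstance(pattern, tuple):                # tuples: same length, recurse
        return (isinstance(value, (tuple, list))
                and len(value) == len(pattern)
                and all(matches(p, v) for p, v in zip(pattern, value)))
    if isinstance(pattern, dict):                 # dicts: keys present, recurse
        return (isinstance(value, dict)
                and all(k in value and matches(p, value[k])
                        for k, p in pattern.items()))
    if isinstance(pattern, re.Pattern):           # regexes match strings
        return isinstance(value, str) and pattern.match(value) is not None
    return pattern == value                       # __eq__ fallback

assert matches((Number, Number), (1, 2.5))
assert matches({'x': Number}, {'x': 3, 'y': 4})   # extra keys are fine
assert not matches(re.compile(r'[0-9]+'), 'abc')
assert matches(..., object())
```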
Example
-------
The code below would work if the ``point`` variable is specified as:
* a tuple (x, y)
* a mapping {'x': x, 'y': y}
* a string "{x}-{y}".format(x=x, y=y)
::
    def print_point(x, y):
        print('x = {}'.format(x))
        print('y = {}'.format(y))
    for point:
        (Number, Number) as x, y:
            print_point(x, y)
        {'x': Number, 'y': Number}:
            print_point(x, y)
        re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
            # Or re.compile('([0-9]+)-([0-9])+') as x, y:
            print_point(int(x), int(y))
        ...:
            raise TypeError('Expected something else')
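For comparison, the same dispatch written as a plain function in current Python (a sketch only; note the regex here is written with the quantifier inside the group, ``(?P<y>[0-9]+)``, which is presumably what was intended):

```python
import re
from numbers import Number

def parse_point(point):
    """Return (x, y) from any of the supported spellings, else TypeError."""
    if (isinstance(point, tuple) and len(point) == 2
            and all(isinstance(c, Number) for c in point)):
        return point
    if (isinstance(point, dict)
            and isinstance(point.get('x'), Number)
            and isinstance(point.get('y'), Number)):
        return (point['x'], point['y'])
    if isinstance(point, str):
        m = re.match(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)$', point)
        if m:
            return (int(m.group('x')), int(m.group('y')))
    raise TypeError('Expected something else')

assert parse_point((1, 2)) == (1, 2)
assert parse_point({'x': 1, 'y': 2}) == (1, 2)
assert parse_point('3-4') == (3, 4)
```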
Greetings zefciu

On Wed, Apr 8, 2015 at 5:25 AM, Szymon Pyżalski szymon@pythonista.net wrote:
Syntax for pattern matching
The syntax could look something like this::
for object:
    pattern1:
        statement1
    pattern2:
        statement2
        statement3
    pattern3 as var1, var2:
        statement4
        statement5
For the simplest cases this works simply by executing the appropriate statement block. If the ``__pattern__`` method returns a mapping, then the values from it will be merged into the local namespace. If it returns a sequence and there is an ``as`` clause, the values will be assigned to the variables specified.
So, tell me if I'm understanding you correctly: The advantage over a basic if/elif/else tree is that it can assign stuff as well as giving a true/false result? Because if all you want is the true/false, there's not a lot of advantage over what currently exists.
The sequence and "as" clause syntax is reasonable, but I do not like the idea that a mapping's keys would quietly become local name bindings. It'd be like "from ... import *" inside a function - suddenly _any_ name could become local, without any way for the compiler to know. Also - although this is really just bikeshedding - not a fan of the use of 'for' here. I'm imagining code something like this:
for value in gen():    # this one iterates
    for value:         # this one doesn't
        int:
            yield ("(%d)" if value < 0 else "%d") % value
        str:
            yield repr(value)
        datetime.datetime:
            yield value.strftime(timefmt)
        ...:
            yield str(value)
Come to think of it, not a fan of ellipsis here either; "else" seems better. But that's minor bikeshedding too.
The key is to show how this is materially better than an if/elif tree *and* better than a dictionary lookup. (Obviously isinstance checks are superior to dict lookups based on type(value), but I don't know how often people actually write code like the above.)
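The dictionary-lookup alternative Chris mentions, sketched for concreteness (illustrative only): dispatch keyed on ``type(value)`` is exact-type, which is precisely why it differs from isinstance checks.

```python
handlers = {
    int: lambda v: ("(%d)" if v < 0 else "%d") % v,
    str: repr,
}

def render(value):
    # type(value) lookup is exact: bool is NOT found here even though it
    # subclasses int, unlike an isinstance()-based if/elif chain.
    return handlers.get(type(value), str)(value)

assert render(-3) == "(-3)"
assert render('hi') == "'hi'"
assert render(True) == "True"   # falls through to str, not the int handler
```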
Besides the above syntax, the patterns could also be used to register functions for dispatching. I don't know how precisely it would integrate with PEP 484, but I agree that any valid type hint should also be a valid pattern.
Existing objects as patterns
- Types should match their instances
- Tuples should match sequences that have the same length and whose elements match against the elements of a tuple (subpatterns)
- Dictionaries should match mappings that contain all the keys in the pattern and the values match the subpatterns under these keys
- Regular expressions should match strings. For named groups they should populate the local namespace.
- The ``Ellipsis`` object can be used as a match-all pattern.
This is where the type hints could save you a lot of trouble. They already define a sloppy form of isinstance checking, so you don't need to do all the work of defining tuples, dicts, etc - all you have to say is "typing.py type hint objects match any object which they would accept", and let the details be handled elsewhere. That'd get you Union types and such, as well as what you're talking about above.
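There is no official runtime matcher in ``typing``, but "a hint matches any object it would accept" can be approximated for simple hints with ``typing.get_origin`` / ``get_args`` (Python 3.8+); a hedged sketch:

```python
from typing import Union, get_args, get_origin

def matches_hint(value, hint):
    """Very partial sketch: handle Union and plain classes only."""
    if get_origin(hint) is Union:
        return any(matches_hint(value, arg) for arg in get_args(hint))
    return isinstance(value, hint)

assert matches_hint(3, Union[int, str])
assert matches_hint('a', Union[int, str])
assert not matches_hint(2.5, Union[int, str])
```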
for point:
    (Number, Number) as x, y:
        print_point(x, y)
    {'x': Number, 'y': Number}:
        print_point(x, y)
    re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
        # Or re.compile('([0-9]+)-([0-9])+') as x, y:
        print_point(int(x), int(y))
    ...:
        raise TypeError('Expected something else')
I'm not sure how to rescue the "regex with named groups" concept, but this code really doesn't look good. Using "as" wouldn't work reliably, since dicts are iterable (without a useful order). Something along the lines of:
x, y from re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
or maybe switch the arguments (to be more like "from ... import x, y")? But whatever it is, I'd prefer to be explicit: This has to return a dictionary containing keys 'x' and 'y' (what if there are others? Ignore them?), and then they will be bound to identical names.
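What Chris describes is essentially what ``re`` already provides via ``groupdict()``: the named groups come back as an explicit dictionary, and the caller decides which keys to bind.

```python
import re

m = re.match(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)', '12-34')
assert m.groupdict() == {'x': '12', 'y': '34'}

# Explicit binding of exactly the names you asked for:
x, y = m.group('x'), m.group('y')
assert (x, y) == ('12', '34')
```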
All in all, an interesting idea, but one that's going to require a good bit of getting-the-head-around, so it wants some strong use cases. Got some real-world example code that would benefit from this?
ChrisA

On Apr 7, 2015, at 12:25, Szymon Pyżalski szymon@pythonista.net wrote:
On 07.04.2015 17:52, Guido van Rossum wrote:
On Apr 7, 2015 8:46 AM, "Paul Moore" <p.f.moore@gmail.com mailto:p.f.moore@gmail.com> wrote:
On 7 April 2015 at 15:54, Szymon Pyżalski <szymon@pythonista.net mailto:szymon@pythonista.net> wrote:
I was trying to google whether there have been any attempts to add a pattern matching feature (as seen in e.g. Haskell) to Python. However, I couldn't find anything. Were there any proposals for a feature like this?
From a quick look:
https://pypi.python.org/pypi/patterns https://pypi.python.org/pypi/py-pattern-matching
Thanks, these are nice, but I think they try to mimic languages based on algebraic data types too much. I was thinking about something like this:
Magic method for patterns
Any object can be a pattern. If an object has a special method (called ``__pattern__`` or ``__match__``) defined, then this method governs how the object behaves as a pattern. If not, pattern matching works the same as ``__eq__``.
"Works the same as __eq__" either doesn't allow you to assign parts while decomposing, or turns out to have a lot of problems. In my proposal, I restricted that to only work on values that are expressions that can't be read as assignment targets, and handled assignment lists recursively, to get around those problems. Yours just punts on decomposition here and offers something different (the mapping in the next paragraph), but I don't think that covers many common cases.
The ``__pattern__`` method can return ``False`` for no match. For a match it can return ``True`` (simplest patterns), a sequence, or a mapping.
The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.
And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.
Syntax for pattern matching
The syntax could look something like this::
for object:
    pattern1:
        statement1
    pattern2:
        statement2
        statement3
    pattern3 as var1, var2:
        statement4
        statement5
I don't like "for" here. Avoiding new keywords is good, but reusing a keyword to mean something unrelated is often confusing. With "for object in iterable:", the "object" names the thing that will be assigned each time through the loop. Here, it looks like you're omitting the "in iterable", but really you're omitting the "object in" part, which is half of one clause and half of another. Also, in cases where the matchable object/iterable is a long and complex expression (as it often is in both pattern matching and loops), it'll be very confusing to the eye if you misinterpret which kind of for you have.
On the other hand, I like the keyword-less match clauses better than my version with "of". The syntax may be harder to describe to the grammar, but it's easier to a read for a human.
The only potential advantage I can see to the "of" clause is that it opens the door to let a single-match case be written without requiring a double indent, or even in a single line:
case obj: of pattern:
    statement
    statement
case obj: of pattern: statement
But I left that out of my proposal because it breaks the compound statement rule that (informally) you can't put two colons on the same line.
For the simplest cases this works simply by executing the appropriate statement block. If the ``__pattern__`` method returns a mapping, then the values from it will be merged into the local namespace. If it returns a sequence and there is an ``as`` clause, the values will be assigned to the variables specified.
Quick idea: this might integrate well with PEP 484, or with the (still speculative) multiple dispatch that Lukas Lang wants to write.
Function dispatch
Besides the above syntax, the patterns could also be used to register functions for dispatching. I don't know how precisely it would integrate with PEP 484, but I agree that any valid type hint should also be a valid pattern.
Consider the syntax from Matthew Rocklin's multipledispatch library on PyPI:
@dispatch(int, int)
def add(x, y):
    return x + y
@dispatch(object, object)
def add(x, y):
    return "%s + %s" % (x, y)
Lukas pointed out that you can put the types in the annotations rather than the decorator, and use MyPy syntax for them instead of having to invent a similar but not identical syntax as Matthew does.
Anyway, it should be easy to see how this maps to a single function with pattern matching and vice-versa. Especially once you look at how Haskell implements overloading--it's just syntactic sugar for a single definition with a case expression inside.
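The closest stdlib analogue of this sugar is ``functools.singledispatch``, which dispatches on the first argument only (unlike the multipledispatch example above); a sketch:

```python
from functools import singledispatch

@singledispatch
def add(x, y):
    # generic fallback, analogous to @dispatch(object, object)
    return "%s + %s" % (x, y)

@add.register(int)
def _(x, y):
    return x + y

assert add(1, 2) == 3
assert add('spam', 'eggs') == 'spam + eggs'
```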
Existing objects as patterns
- Types should match their instances
This is an interesting idea. This allows you to do something halfway between taking any value and checking a specific value--and something that would often be useful in Python--which my proposal didn't have.
If we had a syntax for annotating values or variables (rather than just function params) with types, you could do the same thing by specifying "_: Number", which means you could also match "x: Number" to check the type and bind the value. That's something that was missing from my proposal (see the JSON schema example in the linked post on ADTs).
Without that, the only way I can think of to check the type but nothing else in my proposal is something like Breakfast(*) or something equally ugly, so that's a win for your syntax.
- Tuples should match sequences that have the same length and whose
elements match against the elements of a tuple (subpatterns)
My proposal just used the same rules as assignment to a target list (except that it recurses on pattern matching each element instead of assigning each element, of course). The upside is more flexibility, and the details have already been worked out and proven useful in Python assignments (and extended a couple of times over the past few decades in ways that work just as well here as they did there). The downside is that if you pattern match an iterator, each tuple decomposition case will eat elements from the iterator. But presumably anyone who understands iterators already knows that.
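The target-list rules Andrew refers to, including the starred extension from PEP 3132 and the iterator-consumption caveat, for reference:

```python
# Nested and starred targets, as in ordinary assignment:
(a, b), *rest = (1, 2), 3, 4
assert (a, b, rest) == (1, 2, [3, 4])

# The downside he mentions: decomposing an iterator consumes elements.
it = iter(range(5))
x, y = next(it), next(it)
assert (x, y) == (0, 1)
assert list(it) == [2, 3, 4]   # those two elements are gone
```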
- Dictionaries should match mappings that contain all the keys in the
pattern and the values match the subpatterns under these keys
This definitely solves the problem of how to deal with objects that are usually constructed with keyword arguments. But it doesn't help for objects usually constructed with a combination of positional and keyword arguments. (Of course you can make __pattern__ return an OrderedDict; the problem is on the other end, how to write a pattern that matches arg #0, arg #1, and arg "spam" all in one pattern.)
Anyway, I still don't like the idea of neither getting nor matching the type here, or the idea of dumping all the names into locals instead of binding names that you ask for specifically, as I mentioned earlier. Also, consider how you'd map this relatively common ML/Haskell use:
case p1 of Point(x1, y1):
    case p2 of Point(x2, y2):
        if x1 <= x < x2 and y1 <= y < y2:
With a syntax like mine, where you specify the names to be bound, this is easy; with yours, where the pattern just dumps its names into the namespace, you have to write something like:
x0, y0 = x, y
case p1 of {'x': Any, 'y': Any}:
    x1, y1 = x, y
    case p2 of {'x': Any, 'y': Any}:
        x2, y2 = x, y
        if x1 <= x0 < x2 and y1 <= y0 < y2
- Regular expressions should match strings. For named groups they
should populate the local namespace.
Good idea. Anyone coming from another modern scripting language would expect that. Then again, they expect regexp support in a whole lot of places Python doesn't have them, and that may not be a bad thing. Code that's inherently about regexps always looks simpler in other languages than in Python--but code that really isn't about regexps often gets written with regexps anyway in those languages, but rethought into something more readable in Python...
And here, my alternative of specifying values to check and names to bind instead of just dumping everything into the local namespace doesn't seem to fit very well. (Without regexp literals as part of the language syntax instead of as strings, you can't cram an arbitrary target or expression inside the <P> block.)
- The ``Ellipsis`` object can be used as a match-all pattern.
I used the else keyword, but if you're going with keywordless match clauses, this makes sense.
Example
The code below would work if the ``point`` variable is specified as:
- a tuple (x, y)
- a mapping {'x': x, 'y': y}
- a string "{x}-{y}".format(x=x, y=y)
But what if it's specified as an instance of a Point type, which is pretty much the paradigm case for pattern matching, and the only one that causes a problem?
Also, this raises an obvious problem with the regexp alternative: the first two cases match any Number, but the last case only matches nonnegative integers. And think about what it would mean to match any Number instance with a regexp. It's not just impossible, it doesn't even make sense, because values and their reprs aren't the same thing. If I import quaternion, its instances will match for the first two, but their reprs definitely won't match your regexp. This is one of those "now they have two problems" cases: a regexp looks like it will give you a simple solution to what you want, but it's just misleading you into thinking there's a simple solution that won't even pass your second test case.
::
    def print_point(x, y):
        print('x = {}'.format(x))
        print('y = {}'.format(y))
    for point:
        (Number, Number) as x, y:
            print_point(x, y)
        {'x': Number, 'y': Number}:
            print_point(x, y)
        re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
            # Or re.compile('([0-9]+)-([0-9])+') as x, y:
            print_point(int(x), int(y))
        ...:
            raise TypeError('Expected something else')
Greetings zefciu

On Tue, Apr 7, 2015 at 3:59 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.
And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.
Wouldn't it be better to be explicit about the type matching? For example, using pypatt, I've done this:
In [19]: import pypatt
In [20]: Point = namedtuple('Point', 'x y')
In [21]: @pypatt.transform
    ...: def origin_distance(point):
    ...:     with match((type(point), point)):
    ...:         with (Point, (0, quote(y))):
    ...:             return y
    ...:         with (Point, (quote(x), 0)):
    ...:             return x
    ...:         with (Point, (quote(x), quote(y))):
    ...:             return (x ** 2 + y ** 2) ** 0.5
In [22]: origin_distance(Point(0, 5))
Out[22]: 5

In [23]: origin_distance(Point(10, 0))
Out[23]: 10

In [24]: origin_distance(Point(3, 4))
Out[24]: 5.0

In [25]: origin_distance(Point(0, 'far'))
Out[25]: 'far'

In [26]: origin_distance(Point('near', 0))
Out[26]: 'near'
https://pypi.python.org/pypi/pypatt/
Grant

On Apr 7, 2015, at 16:35, Grant Jenks grant.jenks@gmail.com wrote:
On Tue, Apr 7, 2015 at 3:59 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.
And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.
Wouldn't it be better to be explicit about the type matching?
You mean matching the type of the object? If so, that's my point. A Point and a Vector aren't the same thing just because they have the same two members, which is why in ML or Haskell or my proposal you have to match Point(x, y), not just (x, y) or {'x': x, 'y': y}.
(If you mean the types of the matched elements, the trick is to be able to specify the name and the type in one place; the obvious solution is to extend annotations to assignment/match targets, so you can match Point(x: int, y: int). But anyway, I don't think that's what you meant.)
With a library that doesn't extend Python syntax, that would have to look something like (Point, (q.x, q.y)), which you can probably make a little nicer with some hacks that inspect a compiled function object or a stack frame (to remove the need for quotes around variable names), but then presumably you knew that because you wrote pypatt, which I'm guessing uses one of those hacks. :)
For example, using pypatt, I've done this:
In [19]: import pypatt
In [20]: Point = namedtuple('Point', 'x y')
In [21]: @pypatt.transform
    ...: def origin_distance(point):
    ...:     with match((type(point), point)):
    ...:         with (Point, (0, quote(y))):
    ...:             return y
    ...:         with (Point, (quote(x), 0)):
    ...:             return x
    ...:         with (Point, (quote(x), quote(y))):
    ...:             return (x ** 2 + y ** 2) ** 0.5
In my proposed syntax, you'd write that as:
case point:
    of Point(0, y):
        return y
    of Point(x, 0):
        return x
    of Point(x, y):
        return (x ** 2 + y ** 2) ** 0.5
In his syntax, I don't think you can write it at all. You either get a type match or a decomposition, not both, and there's no way (implicitly or explicitly) to match targets to bind and values to check; the best you can do is something like:
for point:
    Point:
        if x == 0:
            return y
        elif y == 0:
            etc.
... using the fact that Point.__pattern__ returns a dict which is magically dumped into the enclosing scope. So it's simultaneously less explicit and more verbose.
In [22]: origin_distance(Point(0, 5))
Out[22]: 5

In [23]: origin_distance(Point(10, 0))
Out[23]: 10

In [24]: origin_distance(Point(3, 4))
Out[24]: 5.0

In [25]: origin_distance(Point(0, 'far'))
Out[25]: 'far'

In [26]: origin_distance(Point('near', 0))
Out[26]: 'near'
PyPatt looks pretty neat, but I suspect it has a _different_ fundamental problem. As far as I can tell from a quick glance, it doesn't seem to have any way for types to opt in to the matching protocol like Szymon's proposal or mine, which presumably means that it matches by constructing a value and checking it for equality, which means that any type that does anything non-trivial on construction is unmatchable (or maybe even dangerous--consider FileIO(path, 'w') as a pattern). I'm sure Szymon didn't want to add his __pattern__ protocol any more than I wanted to add my __match__ protocol, but once you start trying it with real types I don't think there's any real alternative. (See my linked blog post for more on that.)

On 08.04.2015 00:59, Andrew Barnert wrote:
"Works the same as __eq__" either doesn't allow you to assign parts while decomposing, or turns out to have a lot of problems.
The "works the same as ``__eq__``" refers to simple objects like numbers, strings or booleans. You could for example expect a tuple containing an element that is some metadata with information on how to handle the data. You could then use patterns like (True, Number), (False, Number).
The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.
The mapping is only a way to populate the local namespace. If the pattern matches or not is a different notion.
And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.
I believe that objects that we want to be matchable in this way should be subclasses of a namedtuple.
- Types should match their instances
This is an interesting idea. This allows you to do something halfway between taking any value and checking a specific value--and something that would often be useful in Python--which my proposal didn't have.
There is a problem with this idea. The ``isinstance`` function can accept a tuple, and there the tuple means "OR". In pattern matching, however, a tuple would match a tuple. This is an ugly inconsistency.
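A quick demonstration of the two readings:

```python
# For isinstance, a tuple of types means "any of these types":
assert isinstance(5, (int, str))
assert isinstance('a', (int, str))
assert not isinstance(2.5, (int, str))
# ...whereas the proposal wants the pattern (Number, Number) to match the
# *value* (3, 4), i.e. a 2-tuple of numbers, not "a Number or a Number".
```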
Greetings zefciu

On 8 April 2015 at 05:25, Szymon Pyżalski szymon@pythonista.net wrote:
Any object can be a pattern. If an object has a special method (called ``__pattern__`` or ``__match__``) defined then this method will govern how this object behaves as pattern. If not - pattern matching works the same as ``__eq__``
The ``__pattern__`` method can return ``False`` for no match. For a match it can return ``True`` (simplest patterns) a sequence or a mapping
Always returning a sequence or None would seem simpler to deal with, wouldn't it?
I'm imaginging the default implementations would be something like:
class object:
    def __match__(self, item):
        if item == self:
            return [item]
        else:
            return None
class type:
    def __match__(cls, item):
        if isinstance(item, cls):
            return [item]
        else:
            return None
class tuple:
    def __match__(self, item):
        if isinstance(item, tuple):
            if all(si.__match__(ii) for si, ii in zip(self, item)):
                return item
        return None
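Since the builtins can't actually grow a ``__match__`` method, here is a runnable approximation of the sketch above, with the dispatch done in one standalone function (names illustrative):

```python
def match(pattern, item):
    """Return a sequence on match, None on no match, per Anthony's sketch."""
    if isinstance(pattern, type):                 # type patterns
        return [item] if isinstance(item, pattern) else None
    if isinstance(pattern, tuple):                # tuple patterns, recursive
        if (isinstance(item, tuple) and len(item) == len(pattern)
                and all(match(p, i) is not None
                        for p, i in zip(pattern, item))):
            return item
        return None
    return [item] if item == pattern else None    # equality fallback

assert match(int, 3) == [3]
assert match((int, str), (1, 'a')) == (1, 'a')
assert match((int, str), (1, 2)) is None
assert match(42, 42) == [42]
```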
Syntax for pattern matching
The syntax could look something like this::
for object:
    pattern1:
        statement1
Is the two level indent actually a win? It saves re-evaluating the expression, but you could just as easly have:
object = f(x, y+w, g(z/3 for z in x))
match object ~~ pattern1 as a,b,c:
    statements...
ormatch object ~~ pattern2 as a,d:
    statements...
but in that case you're coming pretty close to an 'if' block that's able to set locals:
object = ...
if object ~~ pattern1 as a,b,c:
    ...
elif object ~~ pattern2 as a,d:
    ...
(Using ~~ as a placeholder for the syntax for "matches against")
for point:
    (Number, Number) as x, y:
        print_point(x, y)
    {'x': Number, 'y': Number}:
        print_point(x, y)
    re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
        # Or re.compile('([0-9]+)-([0-9])+') as x, y:
        print_point(int(x), int(y))
    ...:
        raise TypeError('Expected something else')
My understanding of the way Haskell does name assignment in case matching is that it lets you put placeholders in a constructor. So where you might say
x = MyX(a=1, b=2)
to create an object 'x', you could then say something along the lines of:
x ~~ MyX(a=foo, b=bar)
to pull the values (1, 2) back out (as foo and bar). Doing exactly that doesn't make sense for python because reversing a constructor doesn't make much sense. If you could make that work, you'd get the python version of Haskell pattern matching looking like:
re_pt = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')
case point of:
    (Number(x), Number(y)): ...
    ('x': Number(x), 'y': Number(y)): ...
    re_pt(x=x, y=y): ...
where you're at least mentioning the 'x' and 'y' variables that are getting filled in.
Okay, probably horrible idea:
Matching syntax looks like:
case point is:
    (x:=Number, y:=Number): ...
a:=x is the same as calling assigning_match(locals(), x, "a") run in the local scope. a,b:=x is the same as calling assigning_match(locals(), x, ("a", "b")).
'case obj is: \n pattern[0]: stmt[0] \n pattern[1]: stmt[1]' does:
for i in range(2):
    try:
        pattern[i].__match__(obj)
    except NoMatch:
        continue
    stmt[i]
    break
__match__() raises NoMatch on failure, or returns the match:
class object:
    def __match__(self, item):
        if self == item:
            return item
        raise NoMatch
class type:
    def __match__(self, item):
        if isinstance(item, self):
            return item
        raise NoMatch
class tuple:
    def __match__(self, item):
        if isinstance(item, tuple):
            if all(s_i.__match__(i_i) for s_i, i_i in zip(self, item)):
                return item
        raise NoMatch
class dict:
    def __match__(self, item):
        if isinstance(item, dict):
            if item.keys() == self.keys():
                if all(s_v.__match__(item[s_k]) for s_k, s_v in self.items()):
                    return item
        raise NoMatch
class SRE_Pattern:
    def __match__(self, item):
        m = self.match(item)
        if m is None:
            raise NoMatch
        return m.groups()
assigning_match works something like:
class assigning_match(object):
    def __init__(self, l, matchobj, varname):
        assert isinstance(varname, str) or isinstance(varname, tuple)
        self.l = l
        self.m = matchobj
        self.n = varname
    def __match__(self, item):
        r = self.m.__match__(item)
        if isinstance(self.n, str):
            self.l[self.n] = r
        else:
            for n, r_i in zip(self.n, r):
                self.l[n] = r_i
        return r
Then the example looks like:
case point is:
    (x:=Number, y:=Number): ...
    {"x": x:=Number, "y": y:=Number}: ...
    x,y:=re_pt: ...
Being able to pull out and name something that's a deep part of an object seems like it could be useful; otherwise it's not adding much over if/elif.
Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:
Foo(a,b,key=c) = foo
is equivalent to something like:
x = Foo.__deconstruct__(foo)
a = x[0]
b = x[1]
c = x["key"]
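A runnable sketch of this "deconstructor" idea, under the assumption that ``__deconstruct__`` returns something indexable by both position and keyword (all names here are hypothetical, not an existing protocol):

```python
class Foo:
    def __init__(self, a, b, key=None):
        self.a, self.b, self.key = a, b, key

    @classmethod
    def __deconstruct__(cls, obj):
        # Reverse of the constructor: expose args by position and keyword.
        if not isinstance(obj, cls):
            raise TypeError('not a %s' % cls.__name__)
        return {0: obj.a, 1: obj.b, "key": obj.key}

foo = Foo(1, 2, key="spam")
x = Foo.__deconstruct__(foo)
a, b, c = x[0], x[1], x["key"]
assert (a, b, c) == (1, 2, "spam")
```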
I think it might even extend sanely to unpacking subobjects:
Foo(Bar(a), b) = foo
becomes:
x = Foo.__deconstruct__(foo)
Bar(a) = x[0]
b = x[1]
becomes:
x = Foo.__deconstruct__(foo)
x2 = Bar.__deconstruct__(x[0])
a = x2[0]
b = x[1]
Using "_" to accept and ignore any value would work fine there. If you allowed literals instead of an identifier, you could treat:
Foo(0, 99, key="x") = foo
as:
x = Foo.__deconstruct__(foo)
assert 0 == x[0]
assert 99 == x[1]
assert "x" == x["key"]
That'd be enough to use it for matching I think:
case foo is:
    Foo(a,b): ...
would become:
try:
    Foo(a,b) = foo
except:
    continue
...
break
But it might be confusing figuring out whether:
b = 0
case foo is:
    Foo(a,b): print(a)
should result in value checking:
if foo ~~ Foo(a,0): print(a)
or value assignment:
del b
if foo ~~ Foo(a,b): print(a)
Eh, I don't know. Maybe something there is interesting though.
Cheers, aj

On Apr 7, 2015, at 22:12, Anthony Towns aj@erisian.com.au wrote:
On 8 April 2015 at 05:25, Szymon Pyżalski szymon@pythonista.net wrote: Any object can be a pattern. If an object has a special method (called ``__pattern__`` or ``__match__``) defined then this method will govern how this object behaves as pattern. If not - pattern matching works the same as ``__eq__``
The ``__pattern__`` method can return ``False`` for no match. For a match it can return ``True`` (simplest patterns), a sequence, or a mapping.
Always returning a sequence or None would seem simpler to deal with, wouldn't it?
I'm imaginging the default implementations would be something like:
class object:
    def __match__(self, item):
        if item == self:
            return [item]
        else:
            return None
class type:
    def __match__(cls, item):
        if isinstance(item, cls):
            return [item]
        else:
            return None
class tuple:
    def __match__(self, item):
        if isinstance(item, tuple):
            if all( si.__match__(ii) for si,ii in zip(self, item) ):
                return item
        return None
Syntax for pattern matching
The syntax could look something like this::
for object:
    pattern1:
        statement1
Is the two-level indent actually a win? It saves re-evaluating the expression, but you could just as easily have:
object = f(x, y+w, g(z/3 for z in x))

match object ~~ pattern1 as a,b,c:
    statements...
ormatch object ~~ pattern2 as a,d:
    statements...
but in that case you're coming pretty close to an 'if' block that's able to set locals:
object = ...

if object ~~ pattern1 as a,b,c:
    ...
elif object ~~ pattern2 as a,d:
    ...
(Using ~~ as a placeholder for the syntax for "matches against")
for point:
    (Number, Number) as x, y:
        print_point(x, y)
    {'x': Number, 'y': Number}:
        print_point(x, y)
    re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
        # Or re.compile('([0-9]+)-([0-9])+') as x, y:
        print_point(int(x), int(y))
    ...:
        raise TypeError('Expected something else')
My understanding of the way Haskell does name assignment in case matching is that it lets you put placeholders in a constructor. So where you might say
x = MyX(a=1, b=2)
to create an object 'x', you could then say something along the lines of:
x ~~ MyX(a=foo, b=bar)
to pull the values (1, 2) back out (as foo and bar). Doing exactly that doesn't make sense for python because reversing a constructor doesn't make much sense.
This is the big thing that Szymon is trying to add, by adding a __match__ protocol that gives types a way to, in effect, reverse their constructor. That's the same thing I was trying to do in my blog post that I linked earlier. If you don't have that, then what you're building is a strictly weaker version of tuple unpacking (which we already have), together with a less clumsy way to write "try each of these blocks until one of them doesn't raise" (which you could do more generally), not pattern matching.
If you could make that work, you'd get the python version of Haskell pattern matching looking like:
re_pt = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')

case point of:
    (Number(x), Number(y)): ...
    ('x': Number(x), 'y': Number(y)): ...
    re_pt(x=x, y=y): ...
where you're at least mentioning the 'x' and 'y' variables that are getting filled in.
Okay, probably horrible idea:
Matching syntax looks like:
case point is:
    (x:=Number, y:=Number): ...
a:=x is the same as calling assigning_match(locals(), x, "a") run in the local scope.
Except, of course, that modifying locals isn't guaranteed to actually affect the locals, and in current CPython it generally doesn't...
a,b:=x is the same as calling assigning_match(locals(), x, ("a", "b")).
'case obj is: \n pattern[0]: stmt[0] \n pattern[1]: stmt[1]' does:
for i in range(2):
    try:
        pattern[i].__match__(obj)
    except NoMatch:
        continue
    stmt[i]
    break
__match__() raises NoMatch on failure, or returns the match:
class object:
    def __match__(self, item):
        if self == item:
            return item
        raise NoMatch
class type:
    def __match__(self, item):
        if isinstance(item, self):
            return item
        raise NoMatch
class tuple:
    def __match__(self, item):
        if isinstance(item, tuple):
            if all( s_i.__match__(i_i) for s_i, i_i in zip(self, item) ):
                return item
        raise NoMatch
class dict:
    def __match__(self, item):
        if isinstance(item, dict):
            if item.keys() == self.keys():
                if all(s_v.__match__(item[s_k]) for s_k,s_v in self.items()):
                    return item
        raise NoMatch
class SRE_Pattern:
    def __match__(self, item):
        m = self.match(item)
        if m is None:
            raise NoMatch
        return m.groups()
assigning_match works something like:
class assigning_match(object):
    def __init__(self, l, matchobj, varname):
        assert isinstance(varname, str) or isinstance(varname, tuple)
        self.l = l
        self.m = matchobj
        self.n = varname

    def __match__(self, item):
        r = self.m.__match__(item)
        if isinstance(self.n, str):
            self.l[self.n] = r
        else:
            for n, r_i in zip(self.n, r):
                self.l[n] = r_i
        return r
Then the example looks like:
case point is:
    (x:=Number, y:=Number): ...
    {"x": x:=Number, "y": y:=Number}: ...
    x,y:=re_pt: ...
This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:
case point:
    x: Number, y: Number: ...
    {"x": x: Number, "y": y: Number}: ...
    re_pt(x, y): ...
(My blog post has an "of" for each clause, but I don't think it's necessary. It also doesn't have type annotations or dict matching; since it builds on normal assignment targets instead of reproducing most of the work done there, the obvious way to add those is to add them to assignments, which has other benefits--no more "a = 3 # type: int" in place of "a: int = 3". And meanwhile, I don't think dict matching is all that useful.)
Being able to pull out and name something that's a deep part of an object seems like it could be useful; otherwise it's not adding much over if/elif.
Exactly.
Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:
Foo(a,b,key=c) = foo
is equivalent to something like:
x = Foo.__deconstruct__(foo)
a = x[0]
b = x[1]
c = x["key"]
That is almost exactly my version of __match__, except for two things.
I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.
I didn't try to replace everything tuple unpacking does, but to add onto it. We already have the concept of assignment targets and target lists; specifying what happens for each of those, and what happens when something which isn't a target is in a target list, gets you the behavior you want for matching tuples--including matching other iterables, matching first, *rest, and so on--without having to build a complicated tuple.__match__ and then duplicate it in every other sequence type.
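For comparison, the target-list machinery this builds on already handles the sequence cases today, including nested targets and a starred "rest":

```python
# Plain tuple unpacking with a starred target:
first, *rest = [1, 2, 3, 4]
print(first, rest)  # 1 [2, 3, 4]

# Nested targets also already work:
(a, b), c = (1, 2), 3
print(a, b, c)      # 1 2 3
```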
I think it might even extend sanely to unpacking subobjects:
Foo(Bar(a), b) = foo
becomes:
x = Foo.__deconstruct__(foo)
Bar(a) = x[0]
b = x[1]
becomes:
x = Foo.__deconstruct__(foo)
x2 = Bar.__deconstruct__(x[0])
a = x2[0]
b = x[1]
Using "_" to accept and ignore any value would work fine there. If you allowed literals instead of an identifier, you could treat:
Foo(0, 99, key="x") = foo
as:
x = Foo.__deconstruct__(foo)
assert 0 == x[0]
assert 99 == x[1]
assert "x" == x["key"]
That'd be enough to use it for matching I think:
case foo is: Foo(a,b): ...
would become:
try:
    Foo(a,b) = foo
except:
    continue
...
break
Sure. Haskell actually defines pattern-matching let in terms of a translation to case--and pattern-matching function overload definitions too. Once you define either assignment or case, the other two come easy.
But it might be confusing figuring out whether:
b = 0
case foo is:
    Foo(a,b): print(a)
should result in value checking:
if foo ~~ Foo(a,0): print(a)
or value assignment:
del b
if foo ~~ Foo(a,b): print(a)
This last part is really the only problem.
I suggested that it always means "bind to b", not "match the current value or b", with special syntax for when you want the latter. That's effectively what ML-family languages do, and in my experience you usually go a long time before you run into the need to match b's value or read some code that does so, at which point you finally learn the @b or whatever syntax.
Eh, I don't know. Maybe something there is interesting though.
Cheers, aj
-- Anthony Towns aj@erisian.com.au

On 8 April 2015 at 17:25, Andrew Barnert abarnert@yahoo.com wrote:
On Apr 7, 2015, at 22:12, Anthony Towns aj@erisian.com.au wrote:
case point is:
    (x:=Number, y:=Number): ...
    {"x": x:=Number, "y": y:=Number}: ...
    x,y:=re_pt: ...
This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:
Hmm, I thought that was reasonably straightforward. You'd say things like:
case rect_coords is:
    (pos := (left := Number, top := Number),
     size := (width := Number, height := Number)):
        ...
to pull out rect_coords == (pos, size), pos == (left, top), size == (width, height).
Though maybe it'd be interesting to have a "deconstructor" protocol for
that instead, ie:
Foo(a,b,key=c) = foo
is equivalent to something like:
x = Foo.__deconstruct__(foo)
a = x[0]
b = x[1]
c = x["key"]
That is almost exactly my version of __match__, except for two things.
I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.
I think you could deal with that okay:
class type:
    def __deconstruct__(cls, obj):
        if not isinstance(obj, cls):
            raise NotDeconstructable

        result = obj.__dict__.copy()
        if callable(cls.__init__):
            argspec = inspect.getargspec(cls.__init__)
            for i, argname in enumerate(argspec.args[1:]):
                result[i] = result.get(argname, None)

        return result
So if you have def __init__(a, b, key) and you just store them as attributes, you can access them by name or by the same position as per __init__. I guess you could deconstruct *args and **kwargs too, by pulling out the unused values from result, though you'd probably need to use an ordereddict to keep track of things then. Though that's how recursive unpacking of lists works anyway, isn't it? Match against [head, *tail]? The only other difference is I was just ignoring unspecified bits; maybe it would make more sense to require you to unpack everything in the same way list/tuple unpacking does.
Foo(x, y, *args, kw=z, **kwargs) = foo
becomes something like:
__unpack = Foo.__deconstruct__(foo)
__seen = set()
__it = iter(__unpack)

x = __unpack.pop( next(__it) )
y = __unpack.pop( next(__it) )
z = __unpack.pop( "kw" )

args = [ __unpack.pop(__i) for __i in __unpack ]
kwargs = { k: __unpack.pop(k) for k in __unpack.keys() }

if __unpack:
    raise ValueError("too many values to unpack")
You'd need something fancier than an ordereddict for *args to leave anything for **kwargs to pick up though.
I think that adds up to letting you say:
re_point = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')
case '1-3' is:
    re_point(x=xcoord, y=ycoord):
        print("X coordinate: %s, Y coordinate: %s" % (xcoord, ycoord))
by defining
class SRE_Pattern:
    def __deconstruct__(self, obj):
        m = self.match(obj)
        if m is None:
            raise NotDeconstructable

        result = m.groupdict()
        return result
which seems plausible. That seems like a pretty friendly syntax for dealing with regexps actually...
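Since methods can't be added to compiled pattern objects from pure Python, a small wrapper class can stand in for SRE_Pattern.__deconstruct__ in a runnable sketch (the wrapper name is illustrative):

```python
import re

class NotDeconstructable(ValueError):
    pass

class PatternDeconstructor:
    # Stand-in for giving SRE_Pattern a __deconstruct__ method, which
    # isn't possible from pure Python; the name is illustrative.
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def __deconstruct__(self, obj):
        m = self.pattern.match(obj)
        if m is None:
            raise NotDeconstructable(obj)
        return m.groupdict()

re_point = PatternDeconstructor(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)')

# Hand-desugared version of: case '1-3' is: re_point(x=xcoord, y=ycoord): ...
groups = re_point.__deconstruct__('1-3')
xcoord, ycoord = groups['x'], groups['y']
print(xcoord, ycoord)  # 1 3
```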
(I think that's an advantage of having the protocol be Foo.__deconstruct__(obj) rather than obj.__match__() -- I don't think you could handle regexps without having both params)
Hmm, as far as literal-checking versus binding goes, if you're using := or "as" to bind subexpressions, as in:
case order is:
    customer, Breakfast(spam, eggs, beans) as breakfast:
        ...
or
case order is:
    customer, breakfast:=Breakfast(spam, eggs, beans)
then I /think/ you could use that to distinguish between binding and checking. So:
customer = Customer("alice")
name = "bob"
case customer is:
    Customer(name):
        # succeeds, binds name as "alice"
case customer is:
    Customer("bob"):
        # compares "alice" to "bob", fails
        # if it had succeeded, wouldn't have bound any variables
name = "bob"
case customer is:
    Customer(name as x):
        # compares "alice" to name ("bob") and fails
        # but would have bound x as "alice" if it had succeeded
That seems like a rule that's reasonably understandable to me?
("expr as name" has definitely grown on me over "name := expr")
I guess I'm thinking "literal" matching is just a special case of deconstructing in general, and done by a method like:
class object:
    def __deconstruct__(self, obj):
        if self == obj:
            return ()
        else:
            raise NotDeconstructable
That would have the potential side-effect of allowing things like:
0 = n
as a valid statement (it'd raise NotDeconstructable if 0 != n). Not sure that's much worse than:
[] = n
though, which is already allowed. You could have a syntax error if there wasn't anything to bind though.
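That parallel is easy to check interactively; "[] = n" is already a real (if odd-looking) statement:

```python
# Unpacking into an empty target list: succeeds only for empty iterables.
[] = []          # fine: zero targets, zero values

try:
    [] = [1]
except ValueError as e:
    print(e)     # too many values to unpack (expected 0)
```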
Cheers, aj

On Apr 8, 2015, at 05:34, Anthony Towns aj@erisian.com.au wrote:
On 8 April 2015 at 17:25, Andrew Barnert abarnert@yahoo.com wrote:
On Apr 7, 2015, at 22:12, Anthony Towns aj@erisian.com.au wrote:

case point is:
    (x:=Number, y:=Number): ...
    {"x": x:=Number, "y": y:=Number}: ...
    x,y:=re_pt: ...
This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:
Hmm, I thought that was reasonably straightforward. You'd say things like:
case rect_coords is:
    (pos := (left := Number, top := Number),
     size := (width := Number, height := Number)):
        ...
to pull out rect_coords == (pos, size), pos == (left, top), size == (width, height).
Ah, I see. I was thrown by := looking like assignment in Pascal or some of the mutable ML variants, where it obviously can't be part of a subexpression, but once you get past that, yeah, it's obvious. Sorry.
Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:
Foo(a,b,key=c) = foo
is equivalent to something like:
x = Foo.__deconstruct__(foo)
a = x[0]
b = x[1]
c = x["key"]
That is almost exactly my version of __match__, except for two things.
I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.
I think you could deal with that okay:
class type:
    def __deconstruct__(cls, obj):
        if not isinstance(obj, cls):
            raise NotDeconstructable

        result = obj.__dict__.copy()
        if callable(cls.__init__):
            argspec = inspect.getargspec(cls.__init__)
            for i, argname in enumerate(argspec.args[1:]):
                result[i] = result.get(argname, None)
        return result
So if you have def __init__(a, b, key) and you just store them as attributes, you can access them by name or by the same position as per __init__. I guess you could deconstruct *args and **kwargs too, by pulling out the unused values from result, though you'd probably need to use an ordereddict to keep track of things then.
But argspec is more powerful and simpler than that--well, not argspec, but fullargspec and especially Signature. Just use inspect.signature and its bind method. That gives you a BoundArguments, and then you can just do boundargs.args[0] to match by position or boundargs.arguments['spam'] to match by keyword; for an argument that could have been passed either way, the value will show up both ways. (And for args that were passed into *args, however they got there, that works too--for example, if you def __init__(self, x, *args) and then do signature(cls.__init__).bind(obj, 1, 2, 3), you can see the 3 as args[3] or as arguments['args'][1].)
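A runnable check of that behaviour, using None as a stand-in for self when binding against __init__:

```python
import inspect

class Foo:
    def __init__(self, x, *args):
        self.x, self.args = x, args

sig = inspect.signature(Foo.__init__)
bound = sig.bind(None, 1, 2, 3)   # None stands in for self

# The same value is visible positionally and under the *args name:
print(bound.args)                  # (None, 1, 2, 3)
print(bound.arguments['args'])     # (2, 3)
print(bound.args[3], bound.arguments['args'][1])  # 3 3
```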
Meanwhile, I'm a bit wary about type implicitly assuming that attributes and __init__ parameters match one-to-one like this. Sure, that's often right, and it will often raise an AttributeError or a TypeError (from Signature.bind) when it's not right--but those cases where it does the wrong thing will be pretty surprising, and hard for a novice to work around. Making classes explicitly opt in to matching seems safer.
On the other hand, allowing them to opt in with a dead-simple decorator like @simplematch (or a base class or metaclass or register function or whatever) that just adds a __deconstruct__ method like the one you defined does seem a lot easier than any of the ideas I originally suggested.
One last thing here: I like your NotDeconstructable; if we make that a subclass of ValueError, we can change normal tuple unpacking to raise that as well, without breaking any code.
Though that's how recursive unpacking of lists work anyway, isn't it? Match against [head, *tail]?
Sure, but that's a pretty bad way to process Python lists (you end up making N copies of average length N/2, plus N stack frames out of a finite limit, and the code ends up more verbose and less readable), so anything that makes that Scheme/ML style look more inviting than iteration is probably not an advantage...
The only other difference is I was just ignoring unspecified bits, maybe it would make more sense to require you to unpack everything in the same way list/tuple unpacking does.
Foo(x, y, *args, kw=z, **kwargs) = foo
becomes something like:
__unpack = Foo.__deconstruct__(foo)
__seen = set()
__it = iter(__unpack)

x = __unpack.pop( next(__it) )
y = __unpack.pop( next(__it) )
z = __unpack.pop( "kw" )

args = [ __unpack.pop(__i) for __i in __unpack ]
kwargs = { k: __unpack.pop(k) for k in __unpack.keys() }

if __unpack:
    raise ValueError("too many values to unpack")
You'd need something fancier than an ordereddict for *args to leave anything for **kwargs to pick up though.
I think that adds up to letting you say:
re_point = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')

case '1-3' is:
    re_point(x=xcoord, y=ycoord):
        print("X coordinate: %s, Y coordinate: %s" % (xcoord, ycoord))
by defining
class SRE_Pattern:
    def __deconstruct__(self, obj):
        m = self.match(obj)
        if m is None:
            raise NotDeconstructable
        result = m.groupdict()
        return result
which seems plausible. That seems like a pretty friendly syntax for dealing with regexps actually...
You're starting to turn me around on my dislike of using pattern matching for regexps; that's more readable than existing re code...
(I think that's an advantage of having the protocol be Foo.__deconstruct__(obj) rather than obj.__match__() -- I don't think you could handle regexps without having both params)
Yes.
I originally thought the extra indirection of letting something pose as the match value's type while actually not being that type was unnecessary, but this case, an re object's type posing as the string's type, seems like a great counterexample.
And it doesn't add much complexity, so... I think it's worth it.
Hmm, as far as literal-checking versus binding goes, if you're using := or "as" to bind subexpressions, as in:
case order is:
    customer, Breakfast(spam, eggs, beans) as breakfast:
        ...
or
case order is:
    customer, breakfast:=Breakfast(spam, eggs, beans)
then I /think/ you could use that to distinguish between binding and checking. So:
customer = Customer("alice")

name = "bob"
case customer is:
    Customer(name):
        # succeeds, binds name as "alice"

case customer is:
    Customer("bob"):
        # compares "alice" to "bob", fails
        # if it had succeeded, wouldn't have bound any variables

name = "bob"
case customer is:
    Customer(name as x):
        # compares "alice" to name ("bob") and fails
        # but would have bound x as "alice" if it had succeeded
That seems like a rule that's reasonably understandable to me?
I don't know about that. It wouldn't be hard to remember the rule once you learn it, but it may not be that easy to learn, because at first glance it doesn't make sense. It's weird to have "foo" mean "assign to foo" but "foo as bar" in the exact same context mean "use the value of foo, then assign to bar". It's unprecedented in Python (with doesn't assign to the context manager name instead of using it as an expression if you leave off the "as") and I can't think of any other language that does something equivalent. (And I'm pretty sure it would inspire rants about "warts" by a certain person on this list. :)
And again, unless use cases for pattern matching in Python turn out to be very different than in ML and friends (which is certainly _possible_, but not something I'd want to just assume--if we don't expect somewhat similar uses, why are we even looking to them for inspiration?), I think the need for matching a variable's value won't be that common, so we shouldn't waste the nicest syntactic form on it... (Although I'm definitely not sold on my throwaway suggestion of "@foo", which has misleading parallels both in Python and in some of the languages the syntax is inspired by...)
("expr as name" has definitely grown on me over "name := expr")
I guess I'm thinking "literal" matching is just a special case of deconstructing in general, and done by a method like:
class object:
    def __deconstruct__(self, obj):
        if self == obj:
            return ()
        else:
            raise NotDeconstructable
That would have the potential side-effect of allowing things like:
0 = n
as a valid statement (it'd raise NotDeconstructable if 0 != n). Not sure that's much worse than:
[] = n
though, which is already allowed. You could have a syntax error if there wasn't anything to bind though.
Yeah, I tried to come up with a rule that disallowed literal values in an assignment (but not case) target list unless there's some kind of "real pattern" somewhere within (possibly recursively) the target list, but all of the possibilities seem horrible. So I decided the best thing to do was just not mention that "0 = n" is no longer a syntax error, but a legal runtime assertion that hopefully people don't use.
Also, while I don't think people should explicitly write "0 = n", I can imagine a code generator or MacroPy macro or whatever that takes generates a pattern based on its arguments that might want to generate something like "0 = n" for the no-args case, and why not allow it?
Anyway, I think your additions and changes make a much better proposal than what I originally had, and we're now probably on track to the best Pythonic design for ML-style pattern matching--but I still don't think it's worth adding to Python, or even writing as a PEP for Guido to reject. Have you ever written code where you really missed the feature (except in the first hour back on Python after a week of Haskell hacking)? Or seen any real code that would be substantially improved? If not, this still just seems like a feature for its own sake, or more than one way to do it for no reason.
Cheers, aj
-- Anthony Towns aj@erisian.com.au

On Wed, Apr 8, 2015 at 10:07 AM, Andrew Barnert abarnert@yahoo.com.dmarc.invalid wrote:
On Apr 8, 2015, at 05:34, Anthony Towns aj@erisian.com.au wrote: Anyway, I think your additions and changes make a much better proposal than what I originally had, and we're now probably on track to the best Pythonic design for ML-style pattern matching--but I still don't think it's worth adding to Python, or even writing as a PEP for Guido to reject. Have you ever written code where you really missed the feature (except in the first hour back on Python after a week of Haskell hacking)? Or seen any real code that would be substantially improved? If not, this still just seems like a feature for its own sake, or more than one way to do it for no reason.
All compiling / tree handling code is really annoying without pattern matching.
Here is a fragment of code from the ast module:
elif isinstance(node, BinOp) and \
     isinstance(node.op, (Add, Sub)) and \
     isinstance(node.right, Num) and \
     isinstance(node.right.n, complex) and \
     isinstance(node.left, Num) and \
     isinstance(node.left.n, (int, long, float)):
Here is what it would look like if we were inside a match statement thingy:
# I picked the arg order out of the docs, but it's a bit too "clever"
case BinOp(Num(left), (Add | Sub), Num(right)):
    if isinstance(left, (int, long, float)) and isinstance(right, complex):
        ...
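The shape of that isinstance chain is easy to reproduce; a runnable version against today's ast module (where Num has since been folded into ast.Constant, and Python 2's long is gone) looks like:

```python
import ast

# Sketch of the complex-literal check from the ast module, updated for
# modern Python where number nodes are ast.Constant.
def is_complex_binop(node):
    return (isinstance(node, ast.BinOp)
            and isinstance(node.op, (ast.Add, ast.Sub))
            and isinstance(node.right, ast.Constant)
            and isinstance(node.right.value, complex)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.left.value, (int, float)))

node = ast.parse('1+2j', mode='eval').body
print(is_complex_binop(node))  # True
```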
NodeVisitor etc. are hacked up pattern matching-like thingies that mitigates this when they apply.
It's harder to demonstrate, but sre_compile is even worse -- it takes in a tree of opcodes and argument vectors, where the argument vector values are totally undocumented etc. (and, IIRC, sometimes they are lists, sometimes they aren't, depending on the opcode). If it used some disjoint union type that unpacked nicely in a match statement, the compiling code would be much more readable, and the internal data structure would be easier to understand.
-- Devin

On Apr 7, 2015, at 07:54, Szymon Pyżalski szymon@pythonista.net wrote:
Hello!
I was trying to google if there were some attempts to add pattern matching feature (as seen in e.g. haskell) to Python. I couldn't however find anything. Were there any proposals for a feature like this?
http://stupidpythonideas.blogspot.com/2014/08/a-pattern-matching-case-statem...
I wrote this up after getting frustrated with people suggesting Python should have "a case statement like every other language" where by "every other language" they meant C and Pascal and their descendants with the relatively useless case that's just an elif chain rather than ML and its descendants with the more useful pattern matching case statement. (Nick Coghlan pointed out that go has an interesting middle ground--it looks like a JavaScript switch, but with an as clause to optionally bind the result of each expression to a local variable.)
Once you've defined the case statement, you can add pattern-matching assignments and function definitions (especially if you know Haskell, where the equivalents are just syntactic sugar for using case).
The big problem that you have in Python that you don't have in ML, Haskell, etc. is that most values don't have a unique, evaluable representation that can be pattern-matched. I think the best solution to this is a __match__ protocol, but I went over some other ideas.
The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses, and comprehensions are expressions that get compiled into calls to hidden functions in part to deal with this, but I'm not sure either of those is something you'd want to extend here.
Anyway, I think that my straw man proposal is a start toward the best design you'd be able to come up with (although it's not in itself the best design, probably), and it still doesn't fit into Python. But read it and tell me if you disagree with either half of that.
Greetings zefciu

On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert abarnert@yahoo.com.dmarc.invalid wrote:
The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
FWIW it's not hackery around scope at all - it's simply that the name gets unbound:
>>> e = 2.71828
>>> try:
...     1/0
... except ZeroDivisionError as e:
...     print("1/0!")
...
1/0!
>>> e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'e' is not defined
ChrisA

On Apr 7, 2015, at 19:04, Chris Angelico rosuav@gmail.com wrote:
On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert abarnert@yahoo.com.dmarc.invalid wrote:
The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
FWIW it's not hackery around scope at all - it's simply that the name gets unbound:
Sorry, you're right, "hackery" is misleading here; I should have just said "(yes, I know about comprehensions and except clauses, but they're not relevant here)" or something similar. I don't consider the 3.x rules for except "hacky" or bad in any way, and didn't mean to imply otherwise.
Anyway, an ML/Haskell case expression binds the matched variables inside the case branch; the most obvious interpretation of this proposal (or mine) would bind them in the entire function. Of course it's possible to solve that (if you think it needs solving) either the way comprehensions do (compile the case statement, or each branch, as a function definition and call) or the way except does (unbind names that appear in the "as" clause after the statement), or probably other ways, but if you don't do anything, they work like assignments. (Which is actually a perfectly good parallel to ML, where they work like let expressions, it's just that let expressions don't work like assignments.)

On Wed, Apr 8, 2015 at 12:45 PM, Andrew Barnert abarnert@yahoo.com wrote:
On Apr 7, 2015, at 19:04, Chris Angelico rosuav@gmail.com wrote:
On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert abarnert@yahoo.com.dmarc.invalid wrote:
The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
FWIW it's not hackery around scope at all - it's simply that the name gets unbound:
Sorry, you're right, "hackery" is misleading here; I should have just said "(yes, I know about comprehensions and except clauses, but they're not relevant here)" or something similar. I don't consider the 3.x rules for except "hacky" or bad in any way, and didn't mean to imply otherwise.
Gotcha.
Anyway, an ML/Haskell case expression binds the matched variables inside the case branch; the most obvious interpretation of this proposal (or mine) would bind them in the entire function. Of course it's possible to solve that (if you think it needs solving) either the way comprehensions do (compile the case statement, or each branch, as a function definition and call) or the way except does (unbind names that appear in the "as" clause after the statement), or probably other ways, but if you don't do anything, they work like assignments. (Which is actually a perfectly good parallel to ML, where they work like let expressions, it's just that let expressions don't work like assignments.)
I'd treat these case branches like 'with' statements. The name binding from "as" lasts past the end of the block, it's no different from any other assignment. The only reason exceptions unbind is to avoid the refloop of an exception having a reference to locals, and the local name referencing the exception. Everything else can be just like assignment.
So basically, what we have here is a cross between a 'with' statement and an 'if'. I started writing up a demo that mainly used 'with', and then realized that I had no idea how to have the __enter__ method stipulate that the body should be skipped... oh, hello PEP 377, that's where the problem is. This proposal would need some new syntax, but conceptually, it'd be something like an __enter__ method returning "hey, this body should be skipped", based on whatever criteria it likes (isinstance, regex matching, etc).
ChrisA

On Apr 7, 2015, at 21:56, Chris Angelico rosuav@gmail.com wrote:
On Wed, Apr 8, 2015 at 12:45 PM, Andrew Barnert abarnert@yahoo.com wrote:
On Apr 7, 2015, at 19:04, Chris Angelico rosuav@gmail.com wrote:
On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert abarnert@yahoo.com.dmarc.invalid wrote:
The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
FWIW it's not hackery around scope at all - it's simply that the name gets unbound:
Sorry, you're right, "hackery" is misleading here; I should have just said "(yes, I know about comprehensions and except clauses, but they're not relevant here)" or something similar. I don't consider the 3.x rules for except "hacky" or bad in any way, and didn't mean to imply otherwise.
Gotcha.
Anyway, an ML/Haskell case expression binds the matched variables inside the case branch; the most obvious interpretation of this proposal (or mine) would bind them in the entire function. Of course it's possible to solve that (if you think it needs solving) either the way comprehensions do (compile the case statement, or each branch, as a function definition and call) or the way except does (unbind names that appear in the "as" clause after the statement), or probably other ways, but if you don't do anything, they work like assignments. (Which is actually a perfectly good parallel to ML, where they work like let expressions, it's just that let expressions don't work like assignments.)
I'd treat these case branches like 'with' statements. The name binding from "as" lasts past the end of the block, it's no different from any other assignment. The only reason exceptions unbind is to avoid the refloop of an exception having a reference to locals, and the local name referencing the exception. Everything else can be just like assignment.
Exactly my thought. Binding in Python case statements is different from Haskell case expressions in exactly the same way that assignment statements are different from let expressions, which is pretty much what you should expect.
So basically, what we have here is a cross between a 'with' statement and an 'if'.
I understand why people keep reaching for "with" here, but I really don't like it.
Part of the attraction is the "as" clause, which looks like the "as" clause in an ML case expression. But that's missing the point of patterns: what you mostly want to do is decompose the pattern, binding variables to (some of) the components; you don't want to bind a variable to the whole thing. Except that _occasionally_ you _also_ want to bind the whole thing (or bind a subexpression and also some of its own subexpressions), which is what "as" can add.
Szymon's insight seems to be that pattern matching can be done, not directly in the way Haskell and ML do it, but instead by having types that know how to transform themselves into simpler structures that Python already knows how to decompose (tuples for tuple unpacking when possible--where the "as" clause is useful--and dicts for updating locals for the less-trivial cases where it's not). That's clever, and it makes good use of an "as" clause, but (especially with the locals-updating dict) it seems so far from what a with statement is intended to do that it seems like an abuse to try to fit his solution into that syntax.
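A rough sketch of that idea, using a hypothetical __decompose__ name for the transformation hook:

```python
class Point:
    """A type that knows how to turn itself into a simpler
    structure Python can already decompose (here, a tuple)."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __decompose__(self):
        return (self.x, self.y)

p = Point(3, 4)
# Today the call has to be explicit; under the proposal the
# case statement would invoke the hook and unpack for you.
x, y = p.__decompose__()
assert (x, y) == (3, 4)
```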
Also, "with" just sounds wrong when you read it out loud. Although I'm not sure how much that matters; the same is arguably true of using "with" to simulate the related but different "if let" statement (as seen in Swift, and as Grant sort of proposed in the last email I responded to), but I think I like it there...
I started writing up a demo that mainly used 'with', and then realized that I had no idea how to have the __enter__ method stipulate that the body should be skipped... oh, hello PEP 377, that's where the problem is.
There are some hacks for getting around that in CPython 2.6-7 or 3.3+, which you can use for experimenting with it. Or you can always use MacroPy to transform the ASTs before it even gets that far.
This proposal would need some new syntax, but conceptually, it'd be something like an __enter__ method returning "hey, this body should be skipped", based on whatever criteria it likes (isinstance, regex matching, etc).
ChrisA

On Wed, Apr 8, 2015 at 4:39 PM, Andrew Barnert abarnert@yahoo.com wrote:
So basically, what we have here is a cross between a 'with' statement and an 'if'.
I understand why people keep reaching for "with" here, but I really don't like it.
Part of the attraction is the "as" clause, which looks like the "as" clause in an ML case expression. But that's missing the point of patterns: what you mostly want to do is decompose the pattern, binding variables to (some of) the components; you don't want to bind a variable to the whole thing. Except that _occasionally_ you _also_ want to bind the whole thing (or bind a subexpression and also some of its own subexpressions), which is what "as" can add.
The other similar syntax that I looked at was the from-import. Positing a new keyword "patmatch":
    regex = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)')
    case line:
        from regex patmatch x, y:
            print(x, y)
(or possibly combine the "case" part into a single statement, to remove the two-level indentation and some of the magic)
What this means is:
1) Evaluate line and regex (they'd allow arbitrary expressions)
2) Call regex.__patmatch__(line) - I'll call that "ret"
3a) If it returns None, skip to the next patmatch statement.
3b) If it returns a tuple, assign x=ret[0]; y=ret[1]
3c) If it returns a dict, assign x=ret["x"]; y=ret["y"]
3d) If it returns any other object, assign x=ret.x; y=ret.y
4) Discard "ret" itself (ie it never actually gets bound to a name)
5) Execute the block.
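Steps 2-3 can be sketched today as a plain function; the __patmatch__ protocol and the RegexPattern adapter are of course hypothetical:

```python
import re

def patmatch_bind(pattern, value, names):
    """Return a dict of bindings for `names`, or None if no match."""
    ret = pattern.__patmatch__(value)
    if ret is None:
        return None                                 # 3a: try next branch
    if isinstance(ret, tuple):
        return dict(zip(names, ret))                # 3b: positional
    if isinstance(ret, dict):
        return {n: ret[n] for n in names}           # 3c: by key
    return {n: getattr(ret, n) for n in names}      # 3d: by attribute

class RegexPattern:
    """Hypothetical adapter giving compiled regexes a __patmatch__."""
    def __init__(self, pattern):
        self._re = re.compile(pattern)
    def __patmatch__(self, value):
        m = self._re.match(value)
        return m.groupdict() if m else None

pat = RegexPattern(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)')
print(patmatch_bind(pat, '12-34', ['x', 'y']))  # {'x': '12', 'y': '34'}
print(patmatch_bind(pat, 'nope', ['x', 'y']))   # None
```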
This would cover all the obvious use-cases. Specifically for cases 3c and 3d, you could also use 'as' syntax to bind to other names:
    case line:
        from re.compile('(?P<x>[0-9]+)') patmatch x as y:
            print(y)
This would also deal with ambiguities like namedtuple; you can either use "patmatch x, y" to treat it as a tuple, or "patmatch x as x, y as y" to force attribute access.
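The namedtuple ambiguity in a nutshell - both decompositions are valid today, so the syntax has to pick one:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# "patmatch x, y" would take the tuple route (case 3b)...
a, b = p
assert (a, b) == (1, 2)

# ...while "patmatch x as x, y as y" would force attribute
# access (case 3d); for a namedtuple both give the same values.
assert (p.x, p.y) == (1, 2)
```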
ChrisA
participants (9)

- Andrew Barnert
- Anthony Towns
- Chris Angelico
- Devin Jeanpierre
- Grant Jenks
- Guido van Rossum
- Paul Moore
- Szymon Pyżalski
- Todd