[Python-ideas] Pattern matching

Andrew Barnert abarnert at yahoo.com
Wed Apr 8 09:25:15 CEST 2015


On Apr 7, 2015, at 22:12, Anthony Towns <aj at erisian.com.au> wrote:
> 
>> On 8 April 2015 at 05:25, Szymon Pyżalski <szymon at pythonista.net> wrote:
>> Any object can be a pattern. If an object has a special method (called
>> ``__pattern__`` or ``__match__``) defined then this method will govern
>> how this object behaves as pattern. If not - pattern matching works
>> the same as ``__eq__``
>  
>> The ``__pattern__`` method can return ``False`` for no match. For a
>> match it can return ``True`` (simplest patterns) a sequence or a mapping
> 
> ​Always returning a sequence or None would seem simpler to deal with, wouldn't it?

> I'm imaginging the default implementations would be something like:
> 
>   class object:
>     def __match__(self, item):
>       if item == self​:
>         return [item]
>       else:
>         return None
> 
>   class type:
>     def __match__(cls, item):
>       if isinstance(item, cls):
>         return [item]
>       else:
>         return None
> 
>   class tuple:
>     def __match__(self, item):
>       if isinstance(item, tuple):
>         if all( si.__match__(ii) for si,ii in zip(self, item) ):
>           return item
>       
>       return None
> 
>> Syntax for pattern matching
>> - -------------------------------
>> The syntax could look something like this::
>> 
>> for object:
>>     pattern1: statement1
> 
> Is the two level indent actually a win? It saves re-evaluating the expression, but you could just as easly have:
> 
>   object = f(x, y+w, g(z/3 for z in x))
>   match object ~~ pattern1 as a,b,c:
>     statements...
>   ormatch object ~~ pattern2 as a,d:
>     statements...
> 
> ​but in that case you're coming pretty close to an 'if' block that's able to set locals:
> 
>   object = ...
>   if object ~~ pattern1 as a,b,c:
>     ...
>   elif object ~~ pattern2 as a,d:
>     ...
> 
> (Using ~~ as a placeholder for the syntax for "matches against")
> 
>>     for point:
>>         (Number, Number) as x, y:
>>             print_point(x, y)
>>         {'x': Number, 'y': Number}:
>>             print_point(x, y)
>>         re.compile(​​'(?P<x>[0-9]+)-(?P<y>[0-9])+'):
>>         # Or re.compile('([0-9]+)-([0-9])+') as x, y:
>>             print_point(int(x), int(y))
>>         ...:
>>             raise TypeError('Expected something else')
> 
> My understanding of the way Haskell does name assignment in case matching is that it lets you put placeholders in a constructor. So where you might say
> 
>   x = MyX(a=1, b=2)
> 
> to create an object 'x', you could then say something along the lines of:
> 
>   x ~~ MyX(a=foo, b=bar)
> 
> to pull the values (1, 2) back out (as foo and bar). Doing exactly that doesn't make sense for python because reversing a constructor doesn't make much sense.

This is the big thing that Szymon is trying to add, by adding a __match__ protocol that gives types a way to, in effect, reverse their constructor. That's the same thing I was trying to do in my blog post that I linked earlier. If you don't have that, then what you're building is a strictly weaker version of tuple unpacking (which we already have), together with a less clumsy way to write "try each of these blocks until one of them doesn't raise" (which you could do more generally), not pattern matching.

> If you could make that work, you'd get the python version of Haskell pattern matching looking like:
> 
>  re_pt = re.compile(​'(?P<x>[0-9]+)-(?P<y>[0-9])+')
>  case point of:
>    (Number(x), Number(y)): ...
>    ('x': Number(x), 'y': Number(y)): ...
>    re_pt(x=x, y=y): ..
> 
> ​where you're at least mentioning the 'x' and 'y' variables that are getting filled in.
> 
> Okay, probably horrible idea:
> 
> Matching syntax looks like:
> 
>   case point is:
>     (x:=Number, y:=Number): ...
> 
> a:=x is the same as calling assigning_match(locals(), x, "a") run in the local scope. 

Except, of course, that modifying locals isn't guaranteed to actually affect the locals, and in current CPython it generally doesn't...

> a,b:=x is the same as calling assigning_match(locals(), x, ("a", "b")).
> 
> 'case obj is: \n pattern[0]: stmt[0] \n pattern[1]: stmt[1]' does:
> 
>    for i in range(2):
>      try:
>        pattern[i].__match__(obj)
>      except NoMatch:
>        continue
>      stmt[i]
>      break
> 
> 
> __match__() raises NoMatch on failure, or returns the match:
> 
>   class object:
>     def __match__(self, item):
>       if self == item:
>         return item
>       raise NoMatch
> 
>   class type:
>     def __match__(self, item):
>       if isinstance(item, self):
>         return item
>       raise NoMatch
> 
>   class tuple:
>     def __match__(self, item):
>       if isinstance(item, tuple):
>         if all( s_i.__match__(i_i) for s_i, i_i in zip(self, item) ):
>             return item
>       raise NoMatch
> 
>   class dict:
>     def __match__(self, item):
>       if isinstance(item, dict):
>         if item.keys() == self.keys():          
>           if all(s_v.__match__(item[s_k]) for s_k,s_v in self.items()):
>             return item
>       raise NoMatch
> 
> ​  class SRE_Pattern:
>     def __match__(self, item):
>        m = self.match(it​em)
>        if m is None:
>            raise NoMatch
>        return m.groups()
> 
> ​assigning_match works something like:​
> 
>   class assigning_match(object):
>     def __init__(self, l, matchobj, varname):
>       assert isinstance(varname, str) or isinstance(varname, tuple)
> 
>       self.l = l
>       self.m = matchobj
>       self.n = varname
> 
>     def __match__(self, item):
>       r = self.m.__match__(item)
>       if isinstance(self.n, str):
>           self.l[self.n] = other
>       else:
>           for n, r_i in zip(self.n, r):
>               self.l[n] = r_i
>       return r
> 
> Then the example looks like:
> 
>   case point is:
>     (x:=Number, y:=Number):
>         ...
>     {"x": x:=Number, "y": y:=Number}:
>         ...
>     x,y:=re_pt:
>         ...

This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:

    case point:
        x: Number, y: Number:
            ...
        {"x": x: Number, "y": y: Number}:
            ...
        re_pt(x, y):
            ...

(My blog post has an "of" for each clause, but I don't think it's necessary. It also doesn't have type annotations or dict matching; since it builds on normal assignment targets instead of reproducing most of the work done there, the obvious way to add those is to add them to assignments, which has other benefits--no more "a = 3 # type: int" in place of "a: int = 3". And meanwhile, I don't think dict matching is all that useful.)

> Being able to pull out and name something that's a deep part of an object seems like it could be useful; otherwise it's not adding much over if/elif.

Exactly.

> Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:
> 
>     Foo(a,b,key=c) = foo
> 
> is equivalent to something like:
> 
>     x = Foo.__deconstruct__(foo)
>     a = x[0]
>     b = x[1]
>     c = x["key"]

That is almost exactly my version of __match__, except for two things.

I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.

I didn't try to replace everything tuple unpacking does, but to add onto it. We already have the concept of assignment targets and target lists; specifying what happens for each of those, and what happens when something which isn't a target is in a target list, gets you the behavior you want for matching tuples--including matching other iterables, matching first, *rest, and so on--without having to build a complicated tuple.__match__ and then duplicate it in every other sequence type.

> I think it might even extend sanely to unpacking subobjects:
> 
>     Foo(Bar(a), b) = foo
> 
> becomes:
> 
>     x = Foo.__deconstruct__(foo)
>     Bar(a) = x[0]
>     b = x[1]
> 
> becomes:
> 
>     x = Foo.__deconstruct__(foo)
>     x2 = Bar.__deconstruct__(x[0])
>     a = x2[0]
>     b = x[1]
> 
> Using "_" to accept and ignore any value would work fine there. If you allowed literals instead of an identifier, you could treat:
> 
>     Foo(0, 99, key="x") = foo
> 
> as:
> 
>     x = Foo.__deconstruct__(foo)
>     assert 0 == x[0]
>     assert 99 == x[1]
>     assert "x" = x["key"]
> 
> That'd be enough to use it for matching I think:
> 
>     case foo is:
>        Foo(a,b): ...
> 
> would become:
> 
>     try:
>          Foo(a,b) = foo
>     except:
>          continue
>      ...
>      break

Sure. Haskell actually defines pattern-matching let in terms of a translation to case--and pattern-matching function overload definitions too. Once you define either assignment or case, the other two come easy.

> But it might be confusing figuring out whether:
> 
>     b = 0
>     case foo is:
>         Foo(a,b): print(a)
> 
> should result in value checking:
> 
>     if foo ~~ Foo(a,0): print(a)
> 
> or value assignment:
> 
>     del b
>     if foo ~~ Foo(a,b): print(a)

This last part is really the only problem.

I suggested that it always means "bind to b", not "match the current value or b", with special syntax for when you want the latter. That's effectively what ML-family languages do, and in my experience you usually go a long time before you run into the need to match b's value or read some code that does so, at which point you finally learn the @b or whatever syntax.

> Eh, I don't know. Maybe something there is interesting though.
> 
> Cheers,
> aj
> 
> -- 
> Anthony Towns <aj at erisian.com.au>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150408/9f3eb9ac/attachment-0001.html>


More information about the Python-ideas mailing list