[Python-ideas] Pattern matching

Andrew Barnert abarnert at yahoo.com
Wed Apr 8 16:07:39 CEST 2015


On Apr 8, 2015, at 05:34, Anthony Towns <aj at erisian.com.au> wrote:
> 
>>> On 8 April 2015 at 17:25, Andrew Barnert <abarnert at yahoo.com> wrote:
> 
>>> On Apr 7, 2015, at 22:12, Anthony Towns <aj at erisian.com.au> wrote: 
>>>   case point is:
>>>     (x:=Number, y:=Number):
>>>         ...
>>>     {"x": x:=Number, "y": y:=Number}:
>>>         ...
>>>     x,y:=re_pt:
>>>         ...
>> 
>> This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:
> 
> ​Hmm, I thought that was reasonably straightforward. You'd say things like:
> 
>     case rect_coords is:
>       (pos := (left := Number, top := Number), size := (width := Number, height := Number)):
>           ...
> 
> to pull out rect_coords == (pos, size), pos == (left, top), size == (width, height).

Ah, I see. I was thrown by := looking like assignment in Pascal or some of the mutable ML variants, where it obviously can't be part of a subexpression, but once you get past that, yeah, it's obvious. Sorry.

>>> Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:
>>> 
>>>     Foo(a,b,key=c) = foo
>>> 
>>> is equivalent to something like:
>>> 
>>>     x = Foo.__deconstruct__(foo)
>>>     a = x[0]
>>>     b = x[1]
>>>     c = x["key"]
>> 
>> That is almost exactly my version of __match__, except for two things. 
>> 
>> I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.
> 
> ​I think you could deal with that okay:
> 
> class type:
>     def __deconstruct__(cls, obj):
>         if not isinstance(obj, cls):
>             raise NotDeconstructable
> 
>         result = obj.__dict__.copy()
>         if callable(cls.__init__):
>             argspec = inspect.getargspec(cls.__init__)
>             for i, argname in enumeraate(argspec[1:]):
>                 result[i] = result.get(argname, None)
> 
>         return result
> 
> So if your have def __init__(a, b, key) and you just store them as attributes, you can access them by name or by the same position as per __init__. I guess you could deconstruct *args and **kwargs too, by pulling out the unused values from result, though you'd probably need to use an ordereddict to keep track of things then.

But argspec is more powerful and simpler than that--well, not argspec, but fullargspec and especially Signature. Just use inspect.signature and its bind method. That gives you a BoundArguments, and then you can just do boundargs.args[0] to match by position or boundargs.arguments['spam'] to match by keyword; for an argument that could have been passed either way, the value will show up both ways. (And for args that were passed into *args, however they got there, that works too--for example, if you def __init__(self, x, *args) and then do signature(cls.__init__).bind(1, 2, 3), you can see the 3 as args[3] or as arguments['args'][1].)

Meanwhile, I'm a bit wary about type implicitly assuming that attributes and __init__ parameters match one-to-one like this. Sure, that's often right, and it will often raise an AttributeError or a TypeError (from Signature.bind) when it's not right--but those cases where it does the wrong thing will be pretty surprising, and hard for a novice to work around. Making classes explicitly opt in to matching seems safer.

On the other hand, allowing them to opt in with a dead-simple decorator like @simplematch (or a base class or metaclass or register function or whatever) that just adds a __deconstruct__ method like the one you defined does seem a lot easier than any of the ideas I originally suggested.

One last thing here: I like your NotDeconstructable; if we make that a subclass of ValueError, we can change normal tuple unpacking to raise that as well, without breaking any code.

> Though that's how recursive unpacking of lists work anyway, isn't it? Match against [head, *tail]?

Sure, but that's a pretty bad way to process Python lists (you end up making N copies of average length N/2, plus N stack frames out of a finite limit, and the code ends up more verbose and less readable), so anything that makes that Scheme/ML style look more inviting than iteration is probably not an advantage...

> The only other difference is I was just ignoring unspecified bits, maybe it would make more sense to require you to unpack everything in the same way list/tuple unpacking does.
> 
>     Foo(x, y, *args, kw=z, **kwargs) = foo
> 
> becomes something like:
> 
>      __unpack = Foo.__deconstruct__(foo)
>      __seen = set()
>      __it = iter(__unpack)
>      
>      x = __unpack.pop( next(__it) )
>      y = __unpack.pop( next(__it) )
>      z = __unpack.pop( "kw" )
> 
>      args = [ __unpack.pop(__i) for __i in __unpack ]
>      kwargs = { k: __unpack.pop(k) for k in __unpack.keys() }
> 
>      if __unpack:
>          raise ValueError("too many values to unpack")
> 
> ​You'd need something fancier than an ordereddict for *args to leave anything for **kwargs to pick up though.​
> 
> I think that adds up to letting you say:
> 
>     re_point = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')
> 
>     case '1-3' is:
>         re_point(x=xcoord, y=ycoord):
>            print("X coordinate: %s, Y coordinate: %s" % (xcoord, ycoord))
> 
> by defining
> 
>     class SRE_Pattern:
>         def __deconstruct__(self, obj):
>             m = self.match(obj)
>             if m is None:
>                 raise NotDeconstructable
> 
>             result = m.groupdict()
>             return result
> 
> which seems plausible. That seems like a prtty friendly syntax for dealing with regexps actually...

You're starting to turn me around on my dislike of using pattern matching for regexps; that's more readable than existing re code...

> (I think that's an advantage of having the protocol be Foo.__deconstruct__(obj) rather than obj.__match__() -- I don't think you could handle regexps without having both params)

Yes.

I originally thought the extra indirection of letting something pose as the match value's type while actually not being that type was unnecessary, but this case, an re object's type posing as the string's type, seems like a great counterexample.

And it doesn't add much complexity, so... I think it's worth it.

> Hmm, as far as literal-checking versus binding goes, if you're using := or "as" to bind subexpressions, as in:
> 
>     case order is:
>         customer, Breakfast(spam, eggs, beans) as breakfast: ...
> 
> or
> 
>     case order is:
>         customer, breakfast:=Breakfast(spam, eggs, beans)
> 
> then I /think/ you could use that to distinguish between binding and checking. So:
> 
>     customer = Customer("alice")
> 
>     name = "bob"
>     case customer is:
>         Customer(name):
>            # succeeds, binds name as "alice"
> 
>     case customer is:
>         Customer("bob"): 
>            # compares "alice" to "bob", fails
>            # if it had succeeded, wouldn't have bound any variables
> 
>     name = "bob"
>     case customer is:
>         Customer(name as x):
>            # compares "alice" to name ("bob") and fails
>            # but would have bound x as "alice" if it had succeeded
> 
> That seems like a rule that's reasonably understandable to me?

I don't know about that. It wouldn't be hard to remember the rule once you learn it, but it may not be that easy to learn, because at first glance it doesn't make sense. It's weird to have "foo" mean "assign to foo" but "foo as bar" in the exact same context mean "use the value of foo, then assign to bar". It's unprecedented in Python (with doesn't assign to the context manager name instead of using it as an expression if you leave off the "as") and I can't think of any other language that does something equivalent. (And I'm pretty sure it would inspire rants about "warts" by a certain person on this list. :)

And again, unless use cases for pattern matching in Python turn out to be very different than in ML and friends (which is certainly _possible_, but not something I'd want to just assume--if we don't expect somewhat similar uses, why are we even looking to them for inspiration?), I think the need for matching a variable's value won't be that common, so we shouldn't waste the nicest syntactic form on it... (Although I'm definitely not sold on my throwaway suggestion of "@foo", which has misleading parallels both in Python and in some of the languages the syntax is inspired by...)

> ("expr as name" has definitely grown on me over "name := expr")
> 
> 
> 
> I guess I'm thinking "literal" matching is just a special case of deconstructing in general, and done by a method like:
> 
>     class object:
>         def __deconstruct__(self, obj):
>             if self == obj:
>                 return ()
>             else:
>                 raise NotDeconstructable
> 
> That would have the potential side-effect of allowing things like:
> 
>    0 = n
> 
> as a valid statement (it'd raise NotDeconstructable if 0 != n). Not sure that's much worse than:
> 
>    [] = n
> 
> though, which is already allowed. You could have a syntax error if there wasn't anything to bind though.

Yeah, I tried to come up with a rule that disallowed literal values in an assignment (but not case) target list unless there's some kind of "real pattern" somewhere within (possibly recursively) the target list, but all of the possibilities seem horrible. So I decided the best thing to do was just not mention that "0 = n" is no longer a syntax error, but a legal runtime assertion that hopefully people don't use.

Also, while I don't think people should explicitly write "0 = n", I can imagine a code generator or MacroPy macro or whatever that takes generates a pattern based on its arguments that might want to generate something like "0 = n" for the no-args case, and why not allow it?

Anyway, I think your additions and changes make a much better proposal than what I originally had, and we're now probably on track to the best Pythonic design for ML-style pattern matching--but I still don't think it's worth adding to Python, or even writing as a PEP for Guido to reject. Have you ever written code where you really missed the feature (except in the first hour back on Python after a week of Haskell hacking)? Or seen any real code that would be substantially improved? If not, this still just seems like a feature for its own sake, or more than one way to do it for no reason.

> 
> Cheers,
> aj
> 
> -- 
> Anthony Towns <aj at erisian.com.au>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150408/a0f39d47/attachment-0001.html>


More information about the Python-ideas mailing list