[Python-ideas] Pattern matching

Wed Apr 8 14:34:59 CEST 2015

On 8 April 2015 at 17:25, Andrew Barnert <abarnert at yahoo.com> wrote:

> On Apr 7, 2015, at 22:12, Anthony Towns <aj at erisian.com.au> wrote:
>
>   case point is:
>     (x:=Number, y:=Number):
>         ...
>     {"x": x:=Number, "y": y:=Number}:
>         ...
>     x,y:=re_pt:
>         ...
>
> This is pretty much my proposal with different syntax. I don't think the
> := buys you anything, and it doesn't make it obvious how to have a pattern
> that recursively matches its parts. In my proposal, this would look like:
>

Hmm, I thought that was reasonably straightforward. You'd say things like:

    case rect_coords is:
      (pos := (left := Number, top := Number),
       size := (width := Number, height := Number)):
          ...

to pull out rect_coords == (pos, size), pos == (left, top), size == (width,
height).

> Though maybe it'd be interesting to have a "deconstructor" protocol for
> that instead, ie:
>
>     Foo(a,b,key=c) = foo
>
> is equivalent to something like:
>
>     x = Foo.__deconstruct__(foo)
>     a = x[0]
>     b = x[1]
>     c = x["key"]
>
>
> That is almost exactly my version of __match__, except for two things.
>

> I didn't think through keyword arguments. You've partly solved the
> problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b,
> key=c) may be identical calls, but the deconstructor can only return one of
> them, and it may not even be the one that was used for construction. I
> think something like inspect's argspec objects may handle this, but I'd
> have to work it through.
>

I think you could deal with that okay:

class type:
    def __deconstruct__(cls, obj):
        if not isinstance(obj, cls):
            raise NotDeconstructable

        result = obj.__dict__.copy()
        if callable(cls.__init__):
            argspec = inspect.getargspec(cls.__init__)
            for i, argname in enumerate(argspec.args[1:]):  # skip self
                result[i] = result.get(argname, None)

        return result

So if you have def __init__(self, a, b, key) and you just store them as
attributes, you can access them by name or by the same position as per
__init__. I guess you could deconstruct *args and **kwargs too, by pulling
out the unused values from result, though you'd probably need to use an
ordereddict to keep track of things then. Though that's how recursive
unpacking of lists works anyway, isn't it? Match against [head, *tail]? The
only other difference is I was just ignoring unspecified bits, maybe it
would make more sense to require you to unpack everything in the same way
list/tuple unpacking does.
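
A minimal runnable sketch of that default deconstructor, written as a plain
function rather than a method on type. It assumes __init__ stores every
argument as an attribute of the same name. NotDeconstructable, deconstruct and
Foo are hypothetical names used only for illustration, and
inspect.getfullargspec stands in for getargspec.

```python
import inspect


class NotDeconstructable(Exception):
    """Hypothetical exception: obj cannot be taken apart as a cls."""


def deconstruct(cls, obj):
    """Return obj's attributes keyed by name and by __init__ position."""
    if not isinstance(obj, cls):
        raise NotDeconstructable(obj)

    result = dict(vars(obj))
    # Drop the first parameter (self) and number the rest from 0.
    argnames = inspect.getfullargspec(cls.__init__).args[1:]
    for i, argname in enumerate(argnames):
        result[i] = result.get(argname)
    return result


class Foo:
    def __init__(self, a, b, key):
        self.a, self.b, self.key = a, b, key


# Hand-expanded form of: Foo(a, b, key=c) = foo
x = deconstruct(Foo, Foo(1, 2, key=3))
a, b, c = x[0], x[1], x["key"]
```

Because each value is reachable both by name and by position, Foo(a, b, key=c)
and Foo(a, sub=b, key=c) style patterns could both be served from the same
result.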

    Foo(x, y, *args, kw=z, **kwargs) = foo

becomes something like:

     __unpack = Foo.__deconstruct__(foo)
     __it = iter(list(__unpack))   # snapshot the keys, since we pop as we go

     x = __unpack.pop( next(__it) )
     y = __unpack.pop( next(__it) )
     z = __unpack.pop( "kw" )

     args = [ __unpack.pop(__i) for __i in list(__unpack) ]
     kwargs = { k: __unpack.pop(k) for k in list(__unpack) }

     if __unpack:
         raise ValueError("too many values to unpack")

You'd need something fancier than an ordereddict for *args to leave
anything for **kwargs to pick up though.
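
One possible shape for that "something fancier" is for the deconstructor to
hand back positional and keyword values separately. The sketch below assumes
exactly that; unpack and its parameters are hypothetical and not part of any
proposal in this thread.

```python
def unpack(positional, keyword, n_required, names):
    """Split values the way Foo(x, y, *args, kw=z, **kwargs) = foo would.

    positional: values in __init__ order (assumed shape)
    keyword:    dict of keyword values (assumed shape)
    """
    if len(positional) < n_required:
        raise ValueError("not enough values to unpack")
    required = list(positional[:n_required])
    args = list(positional[n_required:])      # what *args picks up

    remaining = dict(keyword)
    try:
        named = [remaining.pop(name) for name in names]
    except KeyError as exc:
        raise ValueError("missing keyword %s" % exc) from None
    return required, args, named, remaining   # remaining feeds **kwargs


(x, y), args, (z,), kwargs = unpack((1, 2, 3, 4), {"kw": 5, "extra": 6}, 2, ["kw"])
```

Keeping the two kinds of value apart means *args cannot swallow what **kwargs
is meant to collect.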

I think that adds up to letting you say:

    re_point = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)')

    case '1-3' is:
        re_point(x=xcoord, y=ycoord):
           print("X coordinate: %s, Y coordinate: %s" % (xcoord, ycoord))

by defining

    class SRE_Pattern:
        def __deconstruct__(self, obj):
            m = self.match(obj)
            if m is None:
                raise NotDeconstructable

            result = m.groupdict()
            return result

which seems plausible. That seems like a pretty friendly syntax for dealing
with regexps actually...
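
The regexp case can be tried out today with the match expanded by hand. In
this sketch PatternDeconstructor is a hypothetical wrapper class (compiled
patterns cannot be given new methods), and NotDeconstructable is again an
assumed exception name.

```python
import re


class NotDeconstructable(Exception):
    """Hypothetical exception: the string does not match the pattern."""


class PatternDeconstructor:
    """Hypothetical wrapper around a compiled pattern."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __deconstruct__(self, obj):
        m = self.regex.match(obj)
        if m is None:
            raise NotDeconstructable(obj)
        return m.groupdict()  # named groups act as keyword fields


re_point = PatternDeconstructor(r"(?P<x>[0-9]+)-(?P<y>[0-9]+)")

# Hand-expanded form of: re_point(x=xcoord, y=ycoord) = '1-3'
fields = re_point.__deconstruct__("1-3")
xcoord, ycoord = fields["x"], fields["y"]
```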

(I think that's an advantage of having the protocol be
Foo.__deconstruct__(obj) rather than obj.__match__() -- I don't think you
could handle regexps without having both params)

Hmm, as far as literal-checking versus binding goes, if you're using := or
"as" to bind subexpressions, as in:

    case order is:
        customer, Breakfast(spam, eggs, beans) as breakfast: ...

or

    case order is:
        customer, breakfast:=Breakfast(spam, eggs, beans)

then I /think/ you could use that to distinguish between binding and
checking. So:

    customer = Customer("alice")

    name = "bob"
    case customer is:
        Customer(name):
           # succeeds, binds name as "alice"

    case customer is:
        Customer("bob"):
           # compares "alice" to "bob", fails
           # if it had succeeded, wouldn't have bound any variables

    name = "bob"
    case customer is:
        Customer(name as x):
           # compares "alice" to name ("bob") and fails
           # but would have bound x as "alice" if it had succeeded

That seems like a rule that's reasonably understandable to me?

("expr as name" has definitely grown on me over "name := expr")

I guess I'm thinking "literal" matching is just a special case of
deconstructing in general, and done by a method like:

    class object:
        def __deconstruct__(self, obj):
            if self == obj:
                return ()
            else:
                raise NotDeconstructable

That would have the potential side-effect of allowing things like:

   0 = n

as a valid statement (it'd raise NotDeconstructable if 0 != n). Not sure
that's much worse than:

   [] = n

though, which is already allowed. You could have a syntax error if there
wasn't anything to bind though.
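
A small sketch of literal matching as a degenerate deconstructor, using a
free function because object itself cannot be patched. deconstruct_literal,
matches and NotDeconstructable are hypothetical names.

```python
class NotDeconstructable(Exception):
    """Hypothetical exception: the value does not equal the literal."""


def deconstruct_literal(literal, obj):
    """Hypothetical stand-in for a default object.__deconstruct__."""
    if literal == obj:
        return ()  # a match, with nothing to bind
    raise NotDeconstructable(obj)


def matches(literal, obj):
    """Report whether obj deconstructs against literal."""
    try:
        deconstruct_literal(literal, obj)
    except NotDeconstructable:
        return False
    return True


n = 0
bound = deconstruct_literal(0, n)  # hand-expanded form of: 0 = n
[] = []  # already legal: unpacks an empty iterable into zero targets
```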

Cheers,
aj

-- 
Anthony Towns <aj at erisian.com.au>