[Python-ideas] Pattern matching

Andrew Barnert abarnert at yahoo.com
Wed Apr 8 00:59:44 CEST 2015


On Apr 7, 2015, at 12:25, Szymon Pyżalski <szymon at pythonista.net> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
>> On 07.04.2015 17:52, Guido van Rossum wrote:
>> 
>> On Apr 7, 2015 8:46 AM, "Paul Moore" <p.f.moore at gmail.com 
>> <mailto:p.f.moore at gmail.com>> wrote:
>> 
>> On 7 April 2015 at 15:54, Szymon Pyżalski <szymon at pythonista.net 
>> <mailto:szymon at pythonista.net>> wrote:
>>> I was trying to google if there were some attempts to add
>>> pattern matching feature (as seen in e.g. haskell) to Python. I
>>> couldn't
>> however
>>> find anything. Were there any proposals for a feature like this?
>> 
>> From a quick look:
>> 
>> https://pypi.python.org/pypi/patterns 
>> https://pypi.python.org/pypi/py-pattern-matching
> Thanks these are nice but I think they try to mimic languages based on
> algebraic data types too much. I was thinking about something like this:
> 
> Magic method for patterns
> - -----------------------------
> 
> Any object can be a pattern. If an object has a special method (called
> ``__pattern__`` or ``__match__``) defined then this method will govern
> how this object behaves as pattern. If not - pattern matching works
> the same as ``__eq__``

"Works the same as __eq__" either doesn't allow you to assign parts while decomposing, or turns out to have a lot of problems. In my proposal, I restricted that to only work on values that are expressions that can't be read as assignment targets, and handled assignment lists recursively, to get around those problems. Yours just punts on decomposition here and offers something different (the mapping in the next paragraph), but I don't think that covers many common cases.

> The ``__pattern__`` method can return ``False`` for no match. For a
> match it can return ``True`` (simplest patterns) a sequence or a mapping
> .

The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.

And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.

> Syntax for pattern matching
> - -------------------------------
> 
> The syntax could look something like this::
> 
> for object:
>    pattern1: statement1
>    pattern2:
>        statement2
>        statement3
>    pattern3 as var1, var2:
>        statement4
>        statement5

I don't like "for" here. Avoiding new keywords is good, but reusing a keyword to mean something unrelated is often confusing. With "for object in iterable:", the "object" names the thing that will be assigned each time through the loop. Here, it looks like you're omitting the "in iterable", but really you're omitting the "object in" part, which is half of one clause and half of another. Also, in cases where the matchable object/iterable is a long and complex expression (as it often is in both pattern matching and loops), it'll be very confusing to the eye if you misinterpret which kind of for you have.

On the other hand, I like the keyword-less match clauses better than my version with "of". The syntax may be harder to describe to the grammar, but it's easier to a read for a human.

The only potential advantage I can see to the "of" clause is that it opens the door to let a single-match case be written without requiring a double indent, or even in a single line:

    case obj: of pattern:
        statement
        statement

    case obj: of pattern: statement

But I left that out of my proposal because it breaks the compound statement rule that (informally) you can't put two colons on the same line.

> For the simplest cases this works simply by calling appropriate
> statement block. If the ``__pattern__`` method returns a mapping, then
> the values from it will be merged into the local namespace. If it
> returns a sequence and there is the ``as`` clause, the values will be
> assigned to variables specified.
> 
>> Quick idea: this might integrate well with PEP 484, or with the
>> (still speculative) multiple dispatch that Lukas Lang wants to
>> write.
> 
> Function dispatch
> - -----------------------[
> 
> Besides the above syntax the patterns could be also used to register
> functions for dispatching. I dont' know how precisely it would
> integrate with PEP 484, but I agree that any valid type hint should
> also be a valid pattern.

Consider the syntax from Matthew Rocklin's multipledispatch library on PyPI:

    @dispatch(int, int)
    def add(x, y): return x+y

    @dispatch(object, object):
    def add(x, y): return "%s + %s" % (x, y)

Lukas pointed out that you can out the types in the annotations rather than the decorator, and use MyPy syntax for them instead of having to invent a similar but not identical syntax as Matthew does.

Anyway, it should be easy to see how this maps to a single function with pattern matching and vice-versa. Especially once you look at how Haskell implements overloading--it's just syntactic sugar for a single definition with a case expression inside.

> Existing objects as patterns
> - ----------------------------------
> 
> * Types should match their instances

This is an interesting idea. This allows you to do something halfway between taking any value and checking a specific value--and something that would often be useful in Python--which my proposal didn't have.

If we had a syntax for annotating values or variables (rather than just function params) with types, you could do the same thing by specifying "_: Number", which means you could also match "x: Number" to check the type and bind the value. That's something that was missing from my proposal (see the JSON schema example in the linked post on ADTs).

Without that, the only way I can think of to check the type but nothing else in my proposal is something like Breakfast(*) or something equally ugly, so that's a win for your syntax.

> * Tuples should match sequences that have the same length and whose
>  elements match against the elements of a tuple (subpatterns)

My proposal just used the same rules as assignment to a target list (except that it recurses on pattern matching each element instead of assigning each element, of course). The upside is more flexibility, and the details have already been worked out and proven useful in Python assignments (and extended a couple of times over the past few decades in ways that work just as well here as they did there). The downside is that if you pattern match an iterator, each tuple decomposition case will eat elements from the iterator. But presumably anyone who understands iterators already knows that.

> * Dictionaries should match mappings that contain all the keys in the
>  pattern and the values match the subpatterns under these keys

This definitely solves the problem of how to deal with objects that are usually constructed with keyword arguments. But it doesn't help for objects usually constructed with a combination of positional and keyword arguments. (Of course you can make __pattern__ return an OrderedDict; the problem is on the other end, how to write a pattern that matches arg #0, arg #1, and arg "spam" all in one pattern.)

Anyway, I still don't like the idea of neither getting nor matching the type here, or the idea of dumping all the names into locals instead of binding names that you ask for specifically, as I mentioned earlier. Also, consider how you'd map this relatively common ML/Haskell use:

    case p1 of Point(x1, y1):
        case p2 of Point(x2, y2):
            if x1 <= x < x2 and y1 <= y < y2:

With a syntax like mine, where you specify the names to be bound, this is easy; with yours, where the pattern just dumps its names into the namespace, you have to write something like:

    x0, y0 = x, y
    case p1 of {'x': Any, 'y': Any}:
        x1, y1 = x, y
        case p2 of {'x': Any, 'y': Any}:
            x2, y2 = x, y
            if x1 <= x0 < x2 and y1 <= y0 < y2

> * Regular expressions should match strings. For named groups they
>  should populate the local namespace.

Good idea. Anyone coming from another modern scripting language would expect that. Then again, they expect regexp support in a whole lot of places Python doesn't have them, and that may not be a bad thing. Code that's inherently about regexps always looks simpler in other languages than in Python--but code that really isn't about regexps often gets written with regexps anyway in those languages, but rethought into something more readable in Python...

And here, my alternative of specifying values to check and names to bind instead of just dumping everything into the local namespace doesn't seem to fit very well. (Without regexp literals as part of the language syntax instead of as strings, you can't cram an arbitrary target or expression inside the <P> block.)

> * The ``Ellipsis`` object can be used as a match-all pattern.

I used the else keyword, but if you're going with keywordless match clauses, this makes sense.

> Example
> - -----------
> 
> The code below would work if the ``point`` variable is specified as:
> 
> * a tuple (x, y)
> * a mapping {'x': x, 'y': y}
> * a string "{x}-{y}".format(x, y)

But what if it's specified as an instance of a Point type, which is pretty much the paradigm case for pattern matching, and the only one that causes a problem?

Also, this raises an obvious problem with the regexp alternative: the first two cases match any Number, but the last case only matches nonnegative integers. And think about what it would mean to match any Number instance with a regexp. It's not just impossible, it doesn't even make sense, because values and their reprs aren't the same thing. If I import quaternion, its instances will match for the first two, but their reprs definitely won't match your regexp. This is one of those "now they have two problems" cases: a regexp looks like it will give you a simple solution to what you want, but it's just misleading you into thinking there's a simple solution that won't even pass your second test case.

> ::
> 
>    def print_point(x, y):
>        print('x = {}'.format(x))
>        print('y = {}'.format(y))
> 
>    for point:
>        (Number, Number) as x, y:
>            print_point(x, y)
>        {'x': Number, 'y': Number}:
>            print_point(x, y)
>        re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
>        # Or re.compile('([0-9]+)-([0-9])+') as x, y:
>            print_point(int(x), int(y))
>        ...:
>            raise TypeError('Expected something else')
> 
> 
> Greetings
> zefciu
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> 
> iQEcBAEBAgAGBQJVJC8YAAoJEBoIz6vZoBBhm04H/i5dJAMTloRxqSXXyY7QU8kx
> 0gBPjCoCsw6XZSu1nlKP0/yY+tO5uXjQEK5s1vid8gjuEBnrZCtwjfCYiiqSs96P
> 7iT9kSnyx/0jr5FvAC1ZItb3KN7G+MBxMfCHENM8yMz7Yxdw2F5ziT7zFE5aOELW
> y3Mz37+GytmMUT/DXA4wIHBFJgALspWgoU2P/ilzg4oCtbQREmX5pcMnFIsMavPI
> oIu7KJ0ZW2hJ4K2a0h8HWCpZ9aYwcCTF51HMkTAXo6leLSm3XG36jKiCEFJiZpxe
> sLH8e/2rF93tBUVunJfyEQG/bOnlkoTX2rONavDZCqSqt8t3ehT170TtHJL10Os=
> =iO75
> -----END PGP SIGNATURE-----
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/


More information about the Python-ideas mailing list