[Python-ideas] Pattern matching

Cara ceridwen.mailing.lists at gmail.com
Thu Apr 9 15:56:54 CEST 2015


Fair warning: this is long.

> I was trying to google if there were some attempts to add pattern
> matching feature (as seen in e.g. haskell) to Python. I couldn't however
> find anything. Were there any proposals for a feature like this?
>
> Greetings
> zefciu

Many.  In no particular order:

http://pypi.python.org/pypi/pyfpm
https://github.com/martinblech/pyfpm

http://www.aclevername.com/projects/splarnektity/
https://pypi.python.org/pypi/splarnektity/0.3

http://blog.chadselph.com/adding-functional-style-pattern-matching-to-python.html
https://gist.github.com/chadselph/1320421

http://svn.colorstudy.com/home/ianb/recipes/patmatch.py

http://monkey.org/~marius/pattern-matching-in-python.html
https://github.com/mariusae/match/tree/master

https://pypi.python.org/pypi/PEAK-Rules
http://peak.telecommunity.com/DevCenter/PEAK-Rules

http://home.in.tum.de/~bayerj/patternmatch.py

https://github.com/admk/patmat

https://github.com/lihaoyi/macropy#pattern-matching

These are in addition to the ones Paul gave.  There are some more
attempts at related areas (unification and other forms of logic
programming, multiple dispatch) and discussions without implementations
that I didn't list.

Before I say anything else, I should disclose my bias: I think pattern
matching would be a great addition to Python.  Python tries hard to be
readable, and pattern matching would make certain kinds of code much
more legible than Python currently allows.  I'll give two examples, one
canonical and one from something I've been working on.

The first is red-black tree balancing.  Not by accident,
rosettacode.org's pattern matching comparison page
( http://rosettacode.org/wiki/Pattern_matching ) uses that as its
example for pattern matching in different languages.  Most of the
implementations fit onto a single page.  I won't argue they're marvels
of readability, but that's mostly because of too-succinct variable
naming.  Here's an implementation of the same algorithm in Python:
http://scienceblogs.com/goodmath/2008/05/28/the-basic-balanced-search-tree
.  I'd paste the code, but it's too long to be instructive because the
algorithm gets scattered across against five nontrivial functions,
each of which has nested conditionals, not counting the implicit
nesting from the functions calling one another.

The second is performing algebra on the struct module's format strings.
I've written some automatic optimization of them with concatenation and
scalar multiplication, so that for instance 'b' + 'b' = '2b' rather than
'bb' and '10s' + '10s' = '10s10s'.  The value of this is in dealing with
format strings that can be combined in various ways to describe the
layouts of complicated binary data formats.  With some context stripped
for brevity,

    def _check_add(self, other):
        if not isinstance(other, _StructBase):
            return NotImplemented
        if not self.same_endianness(other):
            return NotImplemented
        if self.endianness == other.endianness or len(self.endianness) >
len(other.endianness):
            return type(self)
        else:
            if isinstance(other, _StructSequence):
                return type(other)
            else:
                return other._STRUCT_SEQUENCES[other.endianness]

    def __add__(self, other):
        struct_sequence = self._check_add(other)
        if isinstance(other, _StructSequence):
            if other._elements == []:
                return struct_sequence(self._elements.copy())
            else:
                begin = other._elements[0]
                rest = other._elements[1:]
        else:
            begin = other
            rest = []
        if self._elements == []:
            rest.insert(0, begin)
            return struct_sequence(rest)
        if self._elements[-1].character == begin.character:
            return struct_sequence(self._elements[:-1] +
[self._elements[-1] + begin] + rest)
        else:
            return struct_sequence(self._elements + [begin] + rest)

The exact details aren't important, my point is this also leads to
complicated nested conditionals because to know what actions I need to
perform on a given data type, I need checks to make sure the endianness
of the two strings matches, multiple type checks, checks for whether one
or the other object is empty, and so on, then after verifying what I'm
dealing with, decompositions to break a list apart into its first
element and remainder and handling the empty-list case.  This
implementation isn't even complete because it doesn't, for instance,
deal with all the possible endianness permutations.  There are some
simplifications possible, for instance moving all information described
by finite sets like endianness into types (so there's a StructSequence,
a BigEndianStructSequence, a LittleEndianStructSequence, ...) but this
can't handle all the necessary conditionals and doesn't scale well.

Those are two small-scale examples.  Some broad areas where pattern
matching is useful are:

1) Parsing: It's useful both for performing manipulations on grammars
like left factoring and operating on the trees that parsers produce.

2) Mathematics: manipulations of abstract mathematical objects like
algebraic expressions (not numerical analysis), for instance algebraic
simplification rules.

3) Recursive data structures: Python doesn't have algebraic data types
or embed recursion into everything the way languages like Haskell do,
but there's plenty of recursively-structured data out in the wild like
HTML and JSON.

4) Compilers/interpreters: this use is closely related to parsing and
recursive data structures because compilers and interpreters are often
acting on abstract syntax trees.  Often transformations like flattening
an AST into bytecode and constant folding can be implemented with
pattern matching.

I'll stop there because those are the applications I know best, but I'm
sure there are others.  Pattern matching also subsumes features
including multiple dispatch on types that have been proposed, discussed,
and implemented many times for Python.  I'd argue the existence of more
than a dozen versions of pattern matching and their cumulative downloads
from PyPi says the desire for the feature isn't zero, but I'm not sure
how indicative they are.

Whether I've convinced you or not that pattern matching is a worthwhile
addition to the language, I'd like it, and if I could, I'd write a
library to get it.  Past discussions on pattern matching have started
and ended with debates about the syntax for adding it as a core language
feature.  Because the whole point of pattern matching is readability,
since you can't do anything with it you couldn't with a
sufficiently-complicated nested if-elif-else, arguing about syntax isn't
putting the cart before the horse like it is with some features, but
there's more to implementing pattern matching than just syntax, and I'd
like to expand the discussion to some of those issues.  More than that,
though, I think pattern matching with imperfect syntax would still be
useful, and I'd rather see a mypy-like library now than a language
feature that might make it into Python in some distant future.  In that
spirit, I'll offer some syntax that doesn't involve changing the
grammar.  To emulate a case expression like Haskell, the only way I can
see to do it is abusing 'with' statements because they're the only
block-like statement aside from exceptions that allow assignment:

with match(expr) as case:
    with case(pattern) as matches:
        if matches:
            target_list = matches
            suite
    with case(pattern) as matches:
        if matches:
            target_list = matches
            suite
    ...

Pattern matching fits naturally into for statements and comprehensions:

for target_list in pattern(expr):
    suite

(expr for target_list in pattern(expr) if expr)

A syntax similar to functools.singledispatch's works for pattern
dispatch:

@match(pattern)
def f(parameter_list):
    suite

@f.case(pattern)
def g(parameter_list):
    suite

@f.case(pattern)
def h(parameter_list):
    suite

For defining list and dictionary patterns themselves, my best thought is
abusing __getitem__, ellipsis, and the slice syntax:

Listpattern[1, ..., 'a': int]

[1, 'a', 'b', 'c', 2] # matches, name a is bound to 2
[3, 4, 5] # doesn't match because the first element is 1, no binding
[1, 2, None] # also doesn't match because None is not an int

Dictpattern['a': 1, 'b': 2, 'c': 'a': str]

['a': 1, 'b': 2, 'c': 'foo'] # matches, name a is bound to 'foo'
['a': 1, 'b': 2, 'c': 'bar', 'd': 3] # doesn't match because no ellipsis

These are far from the best possible syntax available by changing
Python's grammar or syntax, and some of them would benefit from using
Macropy to rewrite ASTs.  I'm sure they could be improved even within
those constraints, and if anyone has ideas, I'm all ears.

A lot of work has been done on pattern matching in object-oriented
languages with and without algebraic data types.  I think Scala's
implementation is most mature and well worth looking at for ideas.
(Macropy's experimental pattern matching is patterned on Scala's.)
"Patterns as Objects in
Grace" ( http://homepages.ecs.vuw.ac.nz/~djp/files/DLS2012.pdf ) is a
paper on pattern matching in an object-oriented teaching language with
structural typing.  "Robust Scripting Via
Patterns" ( http://hirzels.com/martin/papers/dls12-thorn-patterns.pdf )
is about pattern matching in Thorn, a dynamically-typed language
implemented on the JVM.  Thorn's pattern matching is integrated into the
language at a deep level, and it has some good ideas about using
patterns in control structures, some of which I shamelessly stole in
writing the above suggestions for pattern matching syntax in Python.
I'm just scratching the surface here, much more can be and has been
said.

I haven't talked at all about the nitty-gritty of implementation, though
I can.  I think the most productive way to have that discussion on this
list would be to look at what smaller features would be most helpful for
implementing pattern matching as a library, and likewise for syntax,
shades of the recent discussion about macros.

Cara


More information about the Python-ideas mailing list