[Python-ideas] Pattern matching

Fri Apr 10 01:52:49 CEST 2015

On Thursday, April 9, 2015 6:57 AM, Cara <ceridwen.mailing.lists at gmail.com> wrote:

> > Fair warning: this is long.
> 
>>  I was trying to google if there were some attempts to add pattern
>>  matching feature (as seen in e.g. haskell) to Python. I couldn't 
> however
>>  find anything. Were there any proposals for a feature like this?
>> 
>>  Greetings
>>  zefciu
> 
> Many.  In no particular order:
> 
> http://pypi.python.org/pypi/pyfpm
> https://github.com/martinblech/pyfpm
> 
> http://www.aclevername.com/projects/splarnektity/
> https://pypi.python.org/pypi/splarnektity/0.3
> 
> http://blog.chadselph.com/adding-functional-style-pattern-matching-to-python.html
> https://gist.github.com/chadselph/1320421
> 
> http://svn.colorstudy.com/home/ianb/recipes/patmatch.py
> 
> http://monkey.org/~marius/pattern-matching-in-python.html
> https://github.com/mariusae/match/tree/master
> 
> https://pypi.python.org/pypi/PEAK-Rules
> http://peak.telecommunity.com/DevCenter/PEAK-Rules
> 
> http://home.in.tum.de/~bayerj/patternmatch.py
> 
> https://github.com/admk/patmat
> 
> https://github.com/lihaoyi/macropy#pattern-matching
> 
> These are in addition to the ones Paul gave.  There are some more
> attempts at related areas (unification and other forms of logic
> programming, multiple dispatch) and discussions without implementations
> that I didn't list.
> 
> Before I say anything else, I should disclose my bias: I think pattern
> matching would be a great addition to Python.  Python tries hard to be
> readable, and pattern matching would make certain kinds of code much
> more legible than Python currently allows.  I'll give two examples, one
> canonical and one from something I've been working on.
> 
> The first is red-black tree balancing.  Not by accident,
> rosettacode.org's pattern matching comparison page
> ( http://rosettacode.org/wiki/Pattern_matching ) uses that as its
> example for pattern matching in different languages.  Most of the
> implementations fit onto a single page.  I won't argue they're marvels
> of readability, but that's mostly because of too-succinct variable
> naming.  Here's an implementation of the same algorithm in Python:
> http://scienceblogs.com/goodmath/2008/05/28/the-basic-balanced-search-tree
> .  I'd paste the code, but it's too long to be instructive because the
> algorithm gets scattered across against five nontrivial functions,
> each of which has nested conditionals, not counting the implicit
> nesting from the functions calling one another.
> 
> The second is performing algebra on the struct module's format strings.
> I've written some automatic optimization of them with concatenation and
> scalar multiplication, so that for instance 'b' + 'b' = 
> '2b' rather than
> 'bb' and '10s' + '10s' = '10s10s'.  The value of 
> this is in dealing with
> format strings that can be combined in various ways to describe the
> layouts of complicated binary data formats.  With some context stripped
> for brevity,
> 
>     def _check_add(self, other):
>         if not isinstance(other, _StructBase):
>             return NotImplemented
>         if not self.same_endianness(other):
>             return NotImplemented
>         if self.endianness == other.endianness or len(self.endianness) >
> len(other.endianness):
>             return type(self)
>         else:
>             if isinstance(other, _StructSequence):
>                 return type(other)
>             else:
>                 return other._STRUCT_SEQUENCES[other.endianness]
> 
>     def __add__(self, other):
>         struct_sequence = self._check_add(other)
>         if isinstance(other, _StructSequence):
>             if other._elements == []:
>                 return struct_sequence(self._elements.copy())
>             else:
>                 begin = other._elements[0]
>                 rest = other._elements[1:]
>         else:
>             begin = other
>             rest = []
>         if self._elements == []:
>             rest.insert(0, begin)
>             return struct_sequence(rest)
>         if self._elements[-1].character == begin.character:
>             return struct_sequence(self._elements[:-1] +
> [self._elements[-1] + begin] + rest)
>         else:
>             return struct_sequence(self._elements + [begin] + rest)

This seems to be overcomplicating things by trying to write ML code in Python instead of just writing Python code. For example, the only reason you decompose other._elements into begin and rest is so you can recompose them (a copy of) into the same list you started with. Why bother? (Also, if you're avoiding the limited pattern-matching already available in Python by not writing "begin, *rest = other._elements", that doesn't make a great case for adding more.)

    def __add__(self, other):
        struct_sequence = self._check_add(other)
        if isinstance(other, _StructSequence):
            elements = other._elements
        else:
            elements = [other]
        if (self._elements and elements 
            and self.elements[-1].character == elements[0].character):
            return struct_sequence(self._elements[:-1] +
                                   [self._elements[-1] + elements[0]] + elements[1:])
        return struct_sequence(self._elements + elements)

If this is all you're doing, your argument is equivalent to showing some inherently iterative code written instead in tail-recursive form to argue that Python should do tail call elimination. You're asking to make it easier to provide another, less Pythonic way to do something that can already be easily done Pythonically.

> The exact details aren't important, my point is this also leads to

> complicated nested conditionals because to know what actions I need to
> perform on a given data type, I need checks to make sure the endianness
> of the two strings matches, multiple type checks, checks for whether one
> or the other object is empty, and so on, then after verifying what I'm
> dealing with, decompositions to break a list apart into its first
> element and remainder and handling the empty-list case.

But everything from "decompositions" on is only necessary because you're thinking in terms of decomposition instead of list operations.

> This
> implementation isn't even complete because it doesn't, for instance,
> deal with all the possible endianness permutations.  There are some
> simplifications possible, for instance moving all information described
> by finite sets like endianness into types (so there's a StructSequence,
> a BigEndianStructSequence, a LittleEndianStructSequence, ...) but this
> can't handle all the necessary conditionals and doesn't scale well.
> 
> Those are two small-scale examples.  Some broad areas where pattern
> matching is useful are:
> 
> 1) Parsing: It's useful both for performing manipulations on grammars
> like left factoring and operating on the trees that parsers produce.

Parsing, and related things (compiling, schema validation, etc.) looks like a good case, but I'm not sure it is.

For simple languages, an expression-tree library like PyParsing or Boost.Spirit seems to be more than sufficient, and often be even nicer than explicit pattern matching.

For more complex languages (and sometimes even for simple ones), you can use or write a framework that lets you write a fully-declarative syntax and then executes it or generates executable code from it. For example, if you want to validate and parse an XML language, doing it by hand in ML would be nicer than in Python, but writing it in XML Schema and using a schema-validating parser is much nicer than either. Parsing the Python grammar by hand would be nicer in ML than in C or Python, but writing it as a declarative GRAMMAR file is nicer than either. And so on. You could argue that the work to write that framework is wasted if it's only going to be used once, but how many complex languages really need to be parsed only once?Even CPython's parser, which isn't used for anything but parsing Python's GRAMMAR, is effectively reused every time Python's syntax changes, and every time someone wants to hack on it to play with new syntax.

The one really good case for pattern matching seems to be when you pretty much _have_ to write a custom parser--maybe because the language is full of special-case rules and tangled context, like C++. At least the GCC and LLVM/Clang groups gave up trying to write C++ as a declarative specification and instead wrote custom recursive-descent parsers in C, which I suspect could be a lot nicer with pattern matching. But has anyone written an equivalent C++ parser in Haskell, ML, etc. to compare, or is that just a suspicion?

> 2) Mathematics: manipulations of abstract mathematical objects like
> algebraic expressions (not numerical analysis), for instance algebraic
> simplification rules.

Are you talking about parsing the expressions into a tree? If so, we're still on parsing.

> 3) Recursive data structures: Python doesn't have algebraic data types
> or embed recursion into everything the way languages like Haskell do,
> but there's plenty of recursively-structured data out in the wild like
> HTML and JSON.

The parsing of this data is covered above. But what do you do with it once it's parsed? If you're transforming it, that again seems to be a case where writing the transformer declaratively in, say, XSSL wins. And if you're just extracting data from it, usually you're better off with the simple data['machines'][0]['name'] syntax (in an except KeyError block), and when that's not good enough, you're usually better off with a declarative search language like XPath. So this seems to be the same as with parsing.

> 4) Compilers/interpreters: this use is closely related to parsing and

> recursive data structures because compilers and interpreters are often
> acting on abstract syntax trees.  Often transformations like flattening
> an AST into bytecode and constant folding can be implemented with
> pattern matching.
> 
> I'll stop there because those are the applications I know best, but I'm
> sure there are others.  Pattern matching also subsumes features
> including multiple dispatch on types that have been proposed, discussed,
> and implemented many times for Python.  I'd argue the existence of more
> than a dozen versions of pattern matching and their cumulative downloads
> from PyPi says the desire for the feature isn't zero, but I'm not sure
> how indicative they are.
> 
> Whether I've convinced you or not that pattern matching is a worthwhile
> addition to the language, I'd like it, and if I could, I'd write a
> library to get it.

That raises the question: what's wrong with the more than a dozen libraries already written?

> Past discussions on pattern matching have started
> and ended with debates about the syntax for adding it as a core language
> feature.  Because the whole point of pattern matching is readability,
> since you can't do anything with it you couldn't with a
> sufficiently-complicated nested if-elif-else, arguing about syntax isn't
> putting the cart before the horse like it is with some features, but
> there's more to implementing pattern matching than just syntax, and I'd
> like to expand the discussion to some of those issues.  More than that,
> though, I think pattern matching with imperfect syntax would still be
> useful, and I'd rather see a mypy-like library now than a language
> feature that might make it into Python in some distant future.  In that
> spirit, I'll offer some syntax that doesn't involve changing the
> grammar.  To emulate a case expression like Haskell, the only way I can
> see to do it is abusing 'with' statements because they're the only
> block-like statement aside from exceptions that allow assignment:
> 
> with match(expr) as case:
>     with case(pattern) as matches:
>         if matches:
>             target_list = matches
>             suite
>     with case(pattern) as matches:
>         if matches:
>             target_list = matches
>             suite
>     ...

First, this seems like it will execute _every_ branch that matches, instead of just the first one. Surely you don't want that, do you? You can wrap the whole thing in a "while True:" and have each suite (and the "else" case) end with "break", or wrap it in a try/except and have each suite raise a custom exception, but you need some way to get out.

Also, why not just put the target_list in the as clause (inside parens, to fit it in a target, which is what an as clause actually takes)? Of course that means you get an exception rather than an else when it doesn't match, but that's fine. (It would be nice if it were a subclass of ValueError, in case the suite raises some ValueError of its own, however...)

> Pattern matching fits naturally into for statements and comprehensions:

> 
> for target_list in pattern(expr):
>     suite

What is this going to do? From your above example, if looks like match(expr)(pattern) returns either an iterable that can be assigned to a target_list or None. What does pattern(expr) return? Return an iterable of one target_list or an empty iterable?

Also, the very idea of having an object which can be called on an expression or can have a (processed) expression called on it with similar but not identical results seems very confusing.

> (expr for target_list in pattern(expr) if expr)
> 
> A syntax similar to functools.singledispatch's works for pattern
> dispatch:
> 
> @match(pattern)
> def f(parameter_list):
>     suite
> 
> @f.case(pattern)
> def g(parameter_list):
>     suite
> 
> @f.case(pattern)
> def h(parameter_list):
>     suite
> 
> For defining list and dictionary patterns themselves, my best thought is
> abusing __getitem__, ellipsis, and the slice syntax:
> 
> Listpattern[1, ..., 'a': int]
> 
> [1, 'a', 'b', 'c', 2] # matches, name a is bound to 2
> [3, 4, 5] # doesn't match because the first element is 1, no binding
> [1, 2, None] # also doesn't match because None is not an int
> 
> Dictpattern['a': 1, 'b': 2, 'c': 'a': str]
> 
> ['a': 1, 'b': 2, 'c': 'foo'] # matches, name a 
> is bound to 'foo'
> ['a': 1, 'b': 2, 'c': 'bar', 'd': 3] # 
> doesn't match because no ellipsis

The useful part of pattern matching--and the hard part, too--is decomposing structures. A proposal that ignores that and only decomposes lists and dicts misses the point.

Yes, it's true that Python doesn't have ADTs. So that means one of two things: (1) types that want to be matched will have to do a bit of work to expose their structure and/or to decompose themselves, or (2) pattern matching isn't useful in Python. The examples you give from other languages show that the first is at least plausible. And both Szymon's proposal and mine and various other ideas suggested on this thread show attempts at a Pythonic design. But just ignoring structure decomposition, as you're doing, is not a way toward anything useful.

> These are far from the best possible syntax available by changing
> Python's grammar or syntax, and some of them would benefit from using
> Macropy to rewrite ASTs.  I'm sure they could be improved even within
> those constraints, and if anyone has ideas, I'm all ears.

If you're willing to use MacroPy, that brings up the question: what's wrong with the pattern matching that it comes with as an example, and why don't you want to build on it instead of starting over?

> A lot of work has been done on pattern matching in object-oriented
> languages with and without algebraic data types.  I think Scala's
> implementation is most mature and well worth looking at for ideas.
> (Macropy's experimental pattern matching is patterned on Scala's.)
> "Patterns as Objects in
> Grace" ( http://homepages.ecs.vuw.ac.nz/~djp/files/DLS2012.pdf ) is a
> paper on pattern matching in an object-oriented teaching language with
> structural typing.  "Robust Scripting Via
> Patterns" ( http://hirzels.com/martin/papers/dls12-thorn-patterns.pdf )
> is about pattern matching in Thorn, a dynamically-typed language
> implemented on the JVM.  Thorn's pattern matching is integrated into the
> language at a deep level, and it has some good ideas about using
> patterns in control structures, some of which I shamelessly stole in
> writing the above suggestions for pattern matching syntax in Python.
> I'm just scratching the surface here, much more can be and has been
> said.
> 
> I haven't talked at all about the nitty-gritty of implementation, though
> I can.  I think the most productive way to have that discussion on this
> list would be to look at what smaller features would be most helpful for
> implementing pattern matching as a library, and likewise for syntax,
> shades of the recent discussion about macros.

This last part, I think, is very useful: After building the best pattern-matching library you can with Python today, figure out what small changes to the language would allow you to either make it better or simplify its implementation, then propose those small changes instead of proposing pattern matching itself. For example, if you have to use nonportable hacks (or a MacroPy macro) so your context manager's __enter__ method can skip the statement body, that would be a good argument for reopening PEP 377 (because you have a new use case that wasn't known at the time). But trying to guess in advance what features might be helpful seems harder and less likely to pay off.

That being said, I suspect it's not going to pay off, because I think the with statement is a false lead in the first place. The semantics look promising, but they're not actually what you want to build on (unless you want to completely transform with statements into something that isn't actually a with statement, using MacroPy AST transformations or otherwise). I'd love to be proven wrong, of course.

Meanwhile transforming a pattern-match into something that can be assigned to a target_list instead of just assigning in the match does work in simple cases, but I think Szymon's proposal shows why it's not sufficient, and I think any real examples will show that it's redundant. Even without the language support in my proposal, explicit quoting of names like in Grant Jenks' PyPatt library shows that you can get most of the way there. (The "as" clause is useful for the case when you want to bind the whole structure at the same time you decompose it and potentially bind its parts, but that's pretty rare--which is another reason I think "with" is misleading.)

> 
> Cara
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>