Re: [Python-ideas] Pattern matching
On Thursday, April 9, 2015 6:57 AM, Cara wrote:
Fair warning: this is long.
I was trying to google if there were some attempts to add pattern matching feature (as seen in e.g. haskell) to Python. I couldn't however find anything. Were there any proposals for a feature like this?
Greetings zefciu
Many. In no particular order:
http://pypi.python.org/pypi/pyfpm https://github.com/martinblech/pyfpm
http://www.aclevername.com/projects/splarnektity/ https://pypi.python.org/pypi/splarnektity/0.3
http://blog.chadselph.com/adding-functional-style-pattern-matching-to-python... https://gist.github.com/chadselph/1320421
http://svn.colorstudy.com/home/ianb/recipes/patmatch.py
http://monkey.org/~marius/pattern-matching-in-python.html https://github.com/mariusae/match/tree/master
https://pypi.python.org/pypi/PEAK-Rules http://peak.telecommunity.com/DevCenter/PEAK-Rules
http://home.in.tum.de/~bayerj/patternmatch.py
https://github.com/admk/patmat
https://github.com/lihaoyi/macropy#pattern-matching
These are in addition to the ones Paul gave. There are some more attempts at related areas (unification and other forms of logic programming, multiple dispatch) and discussions without implementations that I didn't list.
Before I say anything else, I should disclose my bias: I think pattern matching would be a great addition to Python. Python tries hard to be readable, and pattern matching would make certain kinds of code much more legible than Python currently allows. I'll give two examples, one canonical and one from something I've been working on.
The first is red-black tree balancing. Not by accident, rosettacode.org's pattern matching comparison page ( http://rosettacode.org/wiki/Pattern_matching ) uses that as its example for pattern matching in different languages. Most of the implementations fit onto a single page. I won't argue they're marvels of readability, but that's mostly because of too-succinct variable naming. Here's an implementation of the same algorithm in Python: http://scienceblogs.com/goodmath/2008/05/28/the-basic-balanced-search-tree . I'd paste the code, but it's too long to be instructive because the algorithm gets scattered across five nontrivial functions, each of which has nested conditionals, not counting the implicit nesting from the functions calling one another.
The second is performing algebra on the struct module's format strings. I've written some automatic optimization of them with concatenation and scalar multiplication, so that for instance 'b' + 'b' = '2b' rather than 'bb' and '10s' + '10s' = '10s10s'. The value of this is in dealing with format strings that can be combined in various ways to describe the layouts of complicated binary data formats. With some context stripped for brevity,
def _check_add(self, other):
    if not isinstance(other, _StructBase):
        return NotImplemented
    if not self.same_endianness(other):
        return NotImplemented
    if (self.endianness == other.endianness or
            len(self.endianness) > len(other.endianness)):
        return type(self)
    else:
        if isinstance(other, _StructSequence):
            return type(other)
        else:
            return other._STRUCT_SEQUENCES[other.endianness]
def __add__(self, other):
    struct_sequence = self._check_add(other)
    if isinstance(other, _StructSequence):
        if other._elements == []:
            return struct_sequence(self._elements.copy())
        else:
            begin = other._elements[0]
            rest = other._elements[1:]
    else:
        begin = other
        rest = []
    if self._elements == []:
        rest.insert(0, begin)
        return struct_sequence(rest)
    if self._elements[-1].character == begin.character:
        return struct_sequence(self._elements[:-1] +
                               [self._elements[-1] + begin] + rest)
    else:
        return struct_sequence(self._elements + [begin] + rest)
This seems to be overcomplicating things by trying to write ML code in Python instead of just writing Python code. For example, the only reason you decompose other._elements into begin and rest is so you can recompose them (a copy of) into the same list you started with. Why bother? (Also, if you're avoiding the limited pattern matching already available in Python by not writing "begin, *rest = other._elements", that doesn't make a great case for adding more.)

def __add__(self, other):
    struct_sequence = self._check_add(other)
    if isinstance(other, _StructSequence):
        elements = other._elements
    else:
        elements = [other]
    if (self._elements and elements and
            self._elements[-1].character == elements[0].character):
        return struct_sequence(self._elements[:-1] +
                               [self._elements[-1] + elements[0]] +
                               elements[1:])
    return struct_sequence(self._elements + elements)

If this is all you're doing, your argument is equivalent to showing some inherently iterative code written instead in tail-recursive form to argue that Python should do tail call elimination. You're asking to make it easier to provide another, less Pythonic way to do something that can already be easily done Pythonically.
The exact details aren't important, my point is this also leads to
complicated nested conditionals because to know what actions I need to perform on a given data type, I need checks to make sure the endianness of the two strings matches, multiple type checks, checks for whether one or the other object is empty, and so on, then after verifying what I'm dealing with, decompositions to break a list apart into its first element and remainder and handling the empty-list case.
But everything from "decompositions" on is only necessary because you're thinking in terms of decomposition instead of list operations.
This implementation isn't even complete because it doesn't, for instance, deal with all the possible endianness permutations. There are some simplifications possible, for instance moving all information described by finite sets like endianness into types (so there's a StructSequence, a BigEndianStructSequence, a LittleEndianStructSequence, ...) but this can't handle all the necessary conditionals and doesn't scale well.
Those are two small-scale examples. Some broad areas where pattern matching is useful are:
1) Parsing: It's useful both for performing manipulations on grammars like left factoring and operating on the trees that parsers produce.
Parsing, and related things (compiling, schema validation, etc.), looks like a good case, but I'm not sure it is.

For simple languages, an expression-tree library like PyParsing or Boost.Spirit seems to be more than sufficient, and often even nicer than explicit pattern matching.

For more complex languages (and sometimes even for simple ones), you can use or write a framework that lets you write a fully-declarative syntax and then executes it or generates executable code from it. For example, if you want to validate and parse an XML language, doing it by hand in ML would be nicer than in Python, but writing it in XML Schema and using a schema-validating parser is much nicer than either. Parsing the Python grammar by hand would be nicer in ML than in C or Python, but writing it as a declarative GRAMMAR file is nicer than either. And so on. You could argue that the work to write that framework is wasted if it's only going to be used once, but how many complex languages really need to be parsed only once? Even CPython's parser, which isn't used for anything but parsing Python's GRAMMAR, is effectively reused every time Python's syntax changes, and every time someone wants to hack on it to play with new syntax.

The one really good case for pattern matching seems to be when you pretty much _have_ to write a custom parser--maybe because the language is full of special-case rules and tangled context, like C++. At least the GCC and LLVM/Clang groups gave up trying to write C++ as a declarative specification and instead wrote custom recursive-descent parsers in C, which I suspect could be a lot nicer with pattern matching. But has anyone written an equivalent C++ parser in Haskell, ML, etc. to compare, or is that just a suspicion?
2) Mathematics: manipulations of abstract mathematical objects like algebraic expressions (not numerical analysis), for instance algebraic simplification rules.
Are you talking about parsing the expressions into a tree? If so, we're still on parsing.
3) Recursive data structures: Python doesn't have algebraic data types or embed recursion into everything the way languages like Haskell do, but there's plenty of recursively-structured data out in the wild like HTML and JSON.
The parsing of this data is covered above. But what do you do with it once it's parsed? If you're transforming it, that again seems to be a case where writing the transformer declaratively in, say, XSLT wins. And if you're just extracting data from it, usually you're better off with the simple data['machines'][0]['name'] syntax (in an except KeyError block), and when that's not good enough, you're usually better off with a declarative search language like XPath. So this seems to be the same as with parsing.
4) Compilers/interpreters: this use is closely related to parsing and
recursive data structures because compilers and interpreters are often acting on abstract syntax trees. Often transformations like flattening an AST into bytecode and constant folding can be implemented with pattern matching.
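To make the constant-folding case concrete, here is a minimal sketch of the test-and-decompose style such a transformation takes today without pattern matching, using the standard ast module (the function name is mine; ast.Constant is the modern node type, not what the 2015-era modules used):

```python
import ast

def fold_add(node):
    """Fold BinOp(Constant, Add, Constant) into a single Constant.

    The stacked isinstance checks are exactly the nested type tests
    that a pattern-matching construct would express declaratively.
    """
    if (isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)):
        return ast.Constant(node.left.value + node.right.value)
    return node
```

Every additional folded operator or nesting level multiplies these checks, which is the readability cost being argued about.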
I'll stop there because those are the applications I know best, but I'm sure there are others. Pattern matching also subsumes features, including multiple dispatch on types, that have been proposed, discussed, and implemented many times for Python. I'd argue the existence of more than a dozen versions of pattern matching and their cumulative downloads from PyPI say the desire for the feature isn't zero, but I'm not sure how indicative they are.
Whether I've convinced you or not that pattern matching is a worthwhile addition to the language, I'd like it, and if I could, I'd write a library to get it.
That raises the question: what's wrong with the more than a dozen libraries already written?
Past discussions on pattern matching have started and ended with debates about the syntax for adding it as a core language feature. Because the whole point of pattern matching is readability (you can't do anything with it that you couldn't do with a sufficiently complicated nested if-elif-else), arguing about syntax isn't putting the cart before the horse the way it is with some features. But there's more to implementing pattern matching than just syntax, and I'd like to expand the discussion to some of those issues. More than that, though, I think pattern matching with imperfect syntax would still be useful, and I'd rather see a mypy-like library now than a language feature that might make it into Python in some distant future. In that spirit, I'll offer some syntax that doesn't involve changing the grammar. To emulate a case expression like Haskell's, the only way I can see to do it is abusing 'with' statements, because they're the only block-like statement aside from exceptions that allows assignment:
with match(expr) as case:
    with case(pattern) as matches:
        if matches:
            target_list = matches
            suite
    with case(pattern) as matches:
        if matches:
            target_list = matches
            suite
    ...
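For what it's worth, this 'with' abuse can be made to run today. Here is a minimal sketch, assuming a pattern is any callable that returns a tuple of bindings on success and None on failure (match, _Case, and pair_starting_with_zero are all invented names, not an existing API):

```python
class _Case:
    """One pattern test against the matched subject."""
    def __init__(self, subject, pattern):
        self.result = pattern(subject)  # tuple of bindings, or None

    def __enter__(self):
        return self.result

    def __exit__(self, *exc):
        return False

class match:
    """Binds the subject; each 'with case(...)' block gets a tester."""
    def __init__(self, subject):
        self.subject = subject

    def __enter__(self):
        # The 'as case' target becomes a factory for per-pattern testers.
        return lambda pattern: _Case(self.subject, pattern)

    def __exit__(self, *exc):
        return False

# Hypothetical pattern: a 2-tuple whose first element is 0.
def pair_starting_with_zero(value):
    if isinstance(value, tuple) and len(value) == 2 and value[0] == 0:
        return (value[1],)  # the bindings
    return None

with match((0, 'payload')) as case:
    with case(pair_starting_with_zero) as matches:
        if matches:
            (rest,) = matches  # rest == 'payload'
```

Note that nothing here stops more than one case block's body from running; a real implementation would need some way to skip the remaining cases after the first match.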
First, this seems like it will execute _every_ branch that matches, instead of just the first one. Surely you don't want that, do you? You can wrap the whole thing in a "while True:" and have each suite (and the "else" case) end with "break", or wrap it in a try/except and have each suite raise a custom exception, but you need some way to get out. Also, why not just put the target_list in the as clause (inside parens, to fit it in a target, which is what an as clause actually takes)? Of course that means you get an exception rather than an else when it doesn't match, but that's fine. (It would be nice if it were a subclass of ValueError, in case the suite raises some ValueError of its own, however...)
Pattern matching fits naturally into for statements and comprehensions:
for target_list in pattern(expr):
    suite
What is this going to do? From your above example, it looks like match(expr)(pattern) returns either an iterable that can be assigned to a target_list or None. What does pattern(expr) return? An iterable of one target_list, or an empty iterable? Also, the very idea of having an object which can be called on an expression, or can have a (processed) expression called on it with similar but not identical results, seems very confusing.
(expr for target_list in pattern(expr) if expr)
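One way to pin down the semantics of these two forms, assuming a pattern applied to a value yields either one binding tuple or nothing, so the loop body runs at most once (first_int is a hypothetical pattern, not from any library):

```python
def first_int(seq):
    """Hypothetical pattern: matches a sequence whose head is an int,
    yielding a single (head, tail) binding tuple; yields nothing
    otherwise, so a for loop over it runs zero or one times."""
    if seq and isinstance(seq[0], int):
        yield seq[0], seq[1:]

# Statement form: the body runs once, with head == 1, tail == ['a', 'b'].
for head, tail in first_int([1, 'a', 'b']):
    pass

# Comprehension form: no match yields an empty result list.
matched = [head for head, tail in first_int(['x', 'y'])]
```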
A syntax similar to functools.singledispatch's works for pattern dispatch:
@match(pattern)
def f(parameter_list):
    suite

@f.case(pattern)
def g(parameter_list):
    suite

@f.case(pattern)
def h(parameter_list):
    suite
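A rough, runnable version of this dispatch decorator, with guard-style predicates standing in for real patterns (match_dispatch and everything below are invented names, not functools.singledispatch or any existing API):

```python
def match_dispatch(pattern):
    """Decorator factory: builds a dispatcher that tries each
    registered (pattern, function) pair in registration order."""
    def decorator(func):
        cases = [(pattern, func)]

        def dispatcher(*args):
            for pat, fn in cases:
                if pat(*args):          # a pattern here is just a predicate
                    return fn(*args)
            raise ValueError('no pattern matched')

        def case(pat):
            def register(fn):
                cases.append((pat, fn))
                return dispatcher       # keep the dispatcher as the public name
            return register

        dispatcher.case = case
        return dispatcher
    return decorator

@match_dispatch(lambda x: isinstance(x, int))
def describe(x):
    return 'int'

@describe.case(lambda x: isinstance(x, str))
def _(x):
    return 'str'
```

Real patterns would also decompose the argument and pass bindings to the case body; this sketch only shows the dispatch half.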
For defining list and dictionary patterns themselves, my best thought is abusing __getitem__, ellipsis, and the slice syntax:
Listpattern[1, ..., 'a': int]
[1, 'a', 'b', 'c', 2]  # matches; name a is bound to 2
[3, 4, 5]              # doesn't match because the first element isn't 1; no binding
[1, 2, None]           # also doesn't match because None is not an int
Dictpattern['a': 1, 'b': 2, 'c': 'a': str]
['a': 1, 'b': 2, 'c': 'foo']          # matches; name a is bound to 'foo'
['a': 1, 'b': 2, 'c': 'bar', 'd': 3]  # doesn't match because no ellipsis
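The Listpattern idea can be prototyped today by abusing subscription, since a subscript like 1, ..., 'a': int arrives as a tuple containing an Ellipsis and a slice('a', int). This sketch (class and method names are mine, not an existing library's) supports one ellipsis matching any run of elements and treats 'a': int as "bind name a if the element is an int":

```python
class ListPattern:
    """Sketch of Listpattern[...]: literals match by equality, one
    Ellipsis matches any run of elements, and a slice 'name': type
    binds the element to name if it passes an isinstance check."""

    def __class_getitem__(cls, spec):
        return cls(spec if isinstance(spec, tuple) else (spec,))

    def __init__(self, spec):
        self.spec = spec

    def match(self, value):
        """Return a dict of bindings on success, None on failure."""
        spec = list(self.spec)
        if Ellipsis in spec:
            i = spec.index(Ellipsis)
            prefix, suffix = spec[:i], spec[i + 1:]
            if len(value) < len(prefix) + len(suffix):
                return None
            front = value[:len(prefix)]
            back = value[len(value) - len(suffix):] if suffix else []
        else:
            if len(value) != len(spec):
                return None
            prefix, suffix = spec, []
            front, back = value, []
        bindings = {}
        for item, elem in list(zip(prefix, front)) + list(zip(suffix, back)):
            if isinstance(item, slice):          # 'name': type
                if not isinstance(elem, item.stop):
                    return None
                bindings[item.start] = elem
            elif item != elem:                   # literal
                return None
        return bindings
```

For example, ListPattern[1, ..., 'a': int].match([1, 'a', 'b', 'c', 2]) gives {'a': 2}, mirroring the first example above.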
The useful part of pattern matching--and the hard part, too--is decomposing structures. A proposal that ignores that and only decomposes lists and dicts misses the point. Yes, it's true that Python doesn't have ADTs. So that means one of two things: (1) types that want to be matched will have to do a bit of work to expose their structure and/or to decompose themselves, or (2) pattern matching isn't useful in Python. The examples you give from other languages show that the first is at least plausible. And both Szymon's proposal and mine and various other ideas suggested on this thread show attempts at a Pythonic design. But just ignoring structure decomposition, as you're doing, is not a way toward anything useful.
These are far from the best possible syntax available by changing Python's grammar or syntax, and some of them would benefit from using Macropy to rewrite ASTs. I'm sure they could be improved even within those constraints, and if anyone has ideas, I'm all ears.
If you're willing to use MacroPy, that brings up the question: what's wrong with the pattern matching that it comes with as an example, and why don't you want to build on it instead of starting over?
A lot of work has been done on pattern matching in object-oriented languages with and without algebraic data types. I think Scala's implementation is most mature and well worth looking at for ideas. (Macropy's experimental pattern matching is patterned on Scala's.) "Patterns as Objects in Grace" ( http://homepages.ecs.vuw.ac.nz/~djp/files/DLS2012.pdf ) is a paper on pattern matching in an object-oriented teaching language with structural typing. "Robust Scripting Via Patterns" ( http://hirzels.com/martin/papers/dls12-thorn-patterns.pdf ) is about pattern matching in Thorn, a dynamically-typed language implemented on the JVM. Thorn's pattern matching is integrated into the language at a deep level, and it has some good ideas about using patterns in control structures, some of which I shamelessly stole in writing the above suggestions for pattern matching syntax in Python. I'm just scratching the surface here, much more can be and has been said.
I haven't talked at all about the nitty-gritty of implementation, though I can. I think the most productive way to have that discussion on this list would be to look at what smaller features would be most helpful for implementing pattern matching as a library, and likewise for syntax, shades of the recent discussion about macros.
This last part, I think, is very useful: After building the best pattern-matching library you can with Python today, figure out what small changes to the language would allow you to either make it better or simplify its implementation, then propose those small changes instead of proposing pattern matching itself. For example, if you have to use nonportable hacks (or a MacroPy macro) so your context manager's __enter__ method can skip the statement body, that would be a good argument for reopening PEP 377 (because you have a new use case that wasn't known at the time). But trying to guess in advance what features might be helpful seems harder and less likely to pay off.

That being said, I suspect it's not going to pay off, because I think the with statement is a false lead in the first place. The semantics look promising, but they're not actually what you want to build on (unless you want to completely transform with statements into something that isn't actually a with statement, using MacroPy AST transformations or otherwise). I'd love to be proven wrong, of course.

Meanwhile, transforming a pattern-match into something that can be assigned to a target_list, instead of just assigning in the match, does work in simple cases, but I think Szymon's proposal shows why it's not sufficient, and I think any real examples will show that it's redundant. Even without the language support in my proposal, explicit quoting of names, as in Grant Jenks' PyPatt library, shows that you can get most of the way there. (The "as" clause is useful for the case when you want to bind the whole structure at the same time you decompose it and potentially bind its parts, but that's pretty rare--which is another reason I think "with" is misleading.)
Cara
In my previous post I aimed for brevity and skimmed over some of the details. In some places, I was a bit too brief, and so this post will be much more detailed.
The second is performing algebra on the struct module's format strings. I've written some automatic optimization of them with concatenation and scalar multiplication, so that for instance 'b' + 'b' = '2b' rather than 'bb' and '10s' + '10s' = '10s10s'. The value of this is in dealing with format strings that can be combined in various ways to describe the layouts of complicated binary data formats. With some context stripped for brevity,
def _check_add(self, other):
    if not isinstance(other, _StructBase):
        return NotImplemented
    if not self.same_endianness(other):
        return NotImplemented
    if (self.endianness == other.endianness or
            len(self.endianness) > len(other.endianness)):
        return type(self)
    else:
        if isinstance(other, _StructSequence):
            return type(other)
        else:
            return other._STRUCT_SEQUENCES[other.endianness]
def __add__(self, other):
    struct_sequence = self._check_add(other)
    if isinstance(other, _StructSequence):
        if other._elements == []:
            return struct_sequence(self._elements.copy())
        else:
            begin = other._elements[0]
            rest = other._elements[1:]
    else:
        begin = other
        rest = []
    if self._elements == []:
        rest.insert(0, begin)
        return struct_sequence(rest)
    if self._elements[-1].character == begin.character:
        return struct_sequence(self._elements[:-1] +
                               [self._elements[-1] + begin] + rest)
    else:
        return struct_sequence(self._elements + [begin] + rest)
This seems to be overcomplicating things by trying to write ML code in Python instead of just writing Python code. For example, the only reason you decompose other._elements into begin and rest is so you can recompose them (a copy of) into the same list you started with. Why bother? (Also, if you're avoiding the limited pattern-matching already available in Python by not writing "begin, *rest = other._elements", that doesn't make a great case for adding more.)
def __add__(self, other):
    struct_sequence = self._check_add(other)
    if isinstance(other, _StructSequence):
        elements = other._elements
    else:
        elements = [other]
    if (self._elements and elements and
            self._elements[-1].character == elements[0].character):
        return struct_sequence(self._elements[:-1] +
                               [self._elements[-1] + elements[0]] +
                               elements[1:])
    return struct_sequence(self._elements + elements)
If this is all you're doing, your argument is equivalent to showing some inherently iterative code written instead in tail-recursive form to argue that Python should do tail call elimination. You're asking to make it easier to provide another, less Pythonic way to do something that can already be easily done Pythonically.
I didn't use "begin, *rest = other._elements" because this code has to be Python-2 compatible. (This is another argument for a library solution.) I meant more to give the flavor of the complications of needing to test and decompose objects at the same time in Python than to provide the best possible solution. Your version of __add__ is better and simpler than mine, but I still find it hard to read and understand, and you're using _check_add too. The following is another description of this algorithm, simplified by ignoring the case where you're adding a StructAtom to a StructSequence. (Arguably, that should be another method, or StructAtoms shouldn't be part of the public API, just as Python has no char built-in type.) A StructSequence is a sequence of StructAtoms. A StructAtom is composed of an endianness, a legal format character, and a positive integer. To concatenate two StructSequences in __add__ or __radd__, here are the steps:

1) Check that `other` is a StructSequence. If it's not, return NotImplemented.

2) Check that `self` and `other` have compatible endiannesses. Probably the best way to do this is to use a dictionary with two-element frozensets of endianness characters mapping to one endianness character each. (I used conditionals above in _check_add, and that's messier.) A missing key means they aren't compatible, in which case __radd__ has to raise some kind of exception indicating invalid endiannesses for concatenation, while __add__ needs to return NotImplemented. A successful lookup gives the endianness character for the result StructSequence.

3) Check that neither `self` nor `other` is empty. If either is empty, return a copy of the other. (If they're both empty, you'll end up returning a copy of an empty StructSequence, which is fine.)

4) Compare the first StructAtom of one StructSequence with the final StructAtom of the other. (Which is `self` and which is `other` depends on whether we're in __add__ or __radd__.)
If the format characters are the same and not 's' or 'p', create a new StructSequence composed of all but the last StructAtom of one, all but the first StructAtom of the other, and a StructAtom that has the endianness of the result StructSequence, the same character as the two original StructAtoms, and an integer that's the sum of the two integers in the original StructAtoms. If the format characters aren't the same, or are both 's' or 'p', return a new StructSequence that's the concatenation of the two original sequences.

Replacing _check_add with the aforesaid dictionary won't improve its clarity much, if any. You still end up with at least four conditionals of some form and need to do object decomposition in some of the branches. Moreover, struct format string manipulation is simple for problems of these kinds: consider doing simplification on regular expressions. Devin's examples from the ast and sre_compile modules are better illustrations than this. If you don't find the code in these examples hard to read or comprehend, I don't expect you to agree that pattern matching is useful. Do you or don't you? That's an honest question.
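Step 2's frozenset dictionary might look like the following. This is a hedged sketch: it covers only '<', '>', and '=', not struct's full set of byte-order characters, and the names are mine:

```python
# Maps a frozenset of the two operands' endianness characters to the
# endianness character of the result; a missing key means incompatible.
_ENDIANNESS_COMPAT = {
    frozenset('<'): '<',
    frozenset('>'): '>',
    frozenset('='): '=',
    frozenset('<='): '<',  # frozenset('<=') == {'<', '='}: native + little
    frozenset('>='): '>',  # native + big
}

def combined_endianness(a, b):
    """Return the result's endianness character, or None if incompatible."""
    return _ENDIANNESS_COMPAT.get(frozenset(a + b))
```

The single dictionary lookup replaces the chain of equality and length comparisons in _check_add, and the caller only has to translate a None result into NotImplemented or an exception as step 2 describes.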
Those are two small-scale examples. Some broad areas where pattern matching is useful are:
1) Parsing: It's useful both for performing manipulations on grammars like left factoring and operating on the trees that parsers produce.
Parsing, and related things (compiling, schema validation, etc.) looks like a good case, but I'm not sure it is.
For simple languages, an expression-tree library like PyParsing or Boost.Spirit seems to be more than sufficient, and is often even nicer than explicit pattern matching.
For more complex languages (and sometimes even for simple ones), you can use or write a framework that lets you write a fully-declarative syntax and then executes it or generates executable code from it. For example, if you want to validate and parse an XML language, doing it by hand in ML would be nicer than in Python, but writing it in XML Schema and using a schema-validating parser is much nicer than either. Parsing the Python grammar by hand would be nicer in ML than in C or Python, but writing it as a declarative GRAMMAR file is nicer than either. And so on. You could argue that the work to write that framework is wasted if it's only going to be used once, but how many complex languages really need to be parsed only once? Even CPython's parser, which isn't used for anything but parsing Python's GRAMMAR, is effectively reused every time Python's syntax changes, and every time someone wants to hack on it to play with new syntax.
The one really good case for pattern matching seems to be when you pretty much _have_ to write a custom parser--maybe because the language is full of special-case rules and tangled context, like C++. At least the GCC and LLVM/Clang groups gave up trying to write C++ as a declarative specification and instead wrote custom recursive-descent parsers in C, which I suspect could be a lot nicer with pattern matching. But has anyone written an equivalent C++ parser in Haskell, ML, etc. to compare, or is that just a suspicion?
I became interested in pattern matching when I started writing a parsing library and realized that both the manipulations I needed to perform in the parser itself and the manipulations I would need to do on the AST after parsing were better expressed using pattern matching than the if-elif-else trees I was writing. As part of this, I've been doing some informal research on how people do parsing in the real world. Unfortunately, I don't have any formal data and don't know how I'd collect it, not to mention not having the money or time to do a formal survey even if I knew how to do it. With that caveat, some patterns I've noticed:

1) Hand-written recursive-descent parsers are ubiquitous in real code. If you take a random program that needs to do parsing somewhere, I'd give better than even odds that it's done with a hand-written recursive-descent parser, not a library. I have some guesses as to why this is, but I won't speculate here. I think there's a solid case for putting a parser package in the standard library---Haskell and Scala, for instance, have already done this---which helps with one part of this problem, but then you have to do something with the tree you've parsed, and pattern matching is also useful for that.

2) Many programmers don't know (E)BNF or formal grammars. I prefer to use some declarative DSL too, but I know the languages and the concepts already. Those who don't might benefit from features that make parsers easier to write without that background.

3) The world is filled with obscure data formats that none of us have ever heard of; this is especially true of binary data formats. Each of these requires some new parser, whether hand-written, built with a library, or generated.

4) Most languages and formats can't be parsed with anything less than a context-sensitive grammar, and sometimes require more than that. To handle this problem, successful parser libraries make it easy to add custom code to the parsing process.
Thus, there's a lot of parsing code in the world, whether there should be or not, and even if you know formal grammars and how to use parser generators and similar tools, you might still end up needing to write parsing code because you have an obscure format to parse, a format with non-standard properties, or both. Aside: if you do end up needing to write a framework for a language or format, wouldn't pattern matching make it easier to write that framework? As far as I know, there's no C++ parser in Haskell or ML, mainly because C++ is so hard to parse. There exist parsers for other languages (C, for instance) that do make extensive use of pattern matching.
2) Mathematics: manipulations of abstract mathematical objects like algebraic expressions (not numerical analysis), for instance algebraic simplification rules.
Are you talking about parsing the expressions into a tree? If so, we're still on parsing.
I'm talking about doing manipulations on mathematical objects, like say doing substitutions with trigonometric identities.
3) Recursive data structures: Python doesn't have algebraic data types or embed recursion into everything the way languages like Haskell do, but there's plenty of recursively-structured data out in the wild like HTML and JSON.
The parsing of this data is covered above. But what do you do with it once it's parsed? If you're transforming it, that again seems to be a case where writing the transformer declaratively in, say, XSSL wins. And if you're just extracting data from it, usually you're better off with the simple data['machines'][0]['name'] syntax (in an except KeyError block), and when that's not good enough, you're usually better off with a declarative search language like XPath. So this seems to be the same as with parsing.
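Concretely, the simple-access case looks like this (the document shape here is made up for illustration):

```python
import json

# A made-up document with the 'machines' shape mentioned above.
doc = json.loads('{"machines": [{"name": "web-1", "port": 8080}]}')

# Plain subscripting in a try block, rather than any pattern machinery.
try:
    name = doc['machines'][0]['name']
except (KeyError, IndexError):
    name = None
```

For one known path into the data, this is hard to beat for clarity, which is the argument being made.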
What is XSSL? All my comments above on parsing apply here, too. One problem I've encountered is that since most languages and formats don't have nice solutions like XPath already available, you end up implementing your own.
Whether I've convinced you or not that pattern matching is a worthwhile addition to the language, I'd like it, and if I could, I'd write a library to get it.
That raises the question: what's wrong with the more than a dozen libraries already written?
The short answer is some combination of poor performance, lack of compatibility (with 2.7, with 3.4, with PyPy, with other alternate interpreters), bad syntax, and missing functionality, depending on the library. If you have a particular library you're curious about, I can be more specific.
with match(expr) as case:
    with case(pattern) as matches:
        if matches:
            target_list = matches
            suite
    with case(pattern) as matches:
        if matches:
            target_list = matches
            suite
    ...
First, this seems like it will execute _every_ branch that matches, instead of just the first one. Surely you don't want that, do you? You can wrap the whole thing in a "while True:" and have each suite (and the "else" case) end with "break", or wrap it in a try/except and have each suite raise a custom exception, but you need some way to get out.
Yes, I have `while True:` in my notes; it was a clumsy copy-paste, and omitting the break lines was a simple mistake. There are some reasons that try/except might be better.
Also, why not just put the target_list in the as clause (inside parens, to fit it in a target, which is what an as clause actually takes)? Of course that means you get an exception rather than an else when it doesn't match, but that's fine. (It would be nice if it were a subclass of ValueError, in case the suite raises some ValueError of its own, however...)
I didn't know you could do that :). Yes, that's preferable.
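To make the control flow concrete, here is a toy, runnable rendering of the `while True:`/`break` version of the with-based sketch. All the names here (`match`, `case`, `pair_of_ints`, `demo`) are hypothetical, and a "pattern" is just a callable returning the deconstructed parts or None; it keeps the `if matches:` check because a context manager can't portably skip its suite.

```python
from contextlib import contextmanager

class match(object):
    """Hypothetical helper: `with match(expr) as case:` binds the matcher."""
    def __init__(self, value):
        self.value = value

    def __enter__(self):
        return self  # `as case` binds the match object itself

    def __exit__(self, *exc_info):
        return False

    @contextmanager
    def __call__(self, pattern):
        # `as matches` binds the pattern's result: parts, or None on failure.
        yield pattern(self.value)

def pair_of_ints(obj):
    # A toy pattern: deconstruct a pair of ints, or fail with None.
    if (isinstance(obj, tuple) and len(obj) == 2
            and all(isinstance(x, int) for x in obj)):
        return obj
    return None

def demo(expr):
    result = None
    while True:
        with match(expr) as case:
            with case(pair_of_ints) as matches:
                if matches:
                    a, b = matches
                    result = ('pair', a + b)
                    break
        result = ('no match', expr)
        break
    return result

# demo((1, 2)) -> ('pair', 3); demo('x') -> ('no match', 'x')
```

The `break` statements are what keep later cases from running once one matches, which is the control-flow wart being discussed.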
for target_list in pattern(expr):
    suite
What is this going to do? From your above example, it looks like match(expr)(pattern) returns either an iterable that can be assigned to a target_list or None. What does pattern(expr) return? An iterable of one target_list, or an empty iterable?
Also, the very idea of having an object which can be called on an expression or can have a (processed) expression called on it with similar but not identical results seems very confusing.
Here, what should happen is that `expr` should be an iterator and calling `pattern()` on it returns another iterator containing the deconstructed parts of each object in the original iterator *if* they match. I agree the lack of parallelism is confusing.
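That iterator-of-matches semantics can be sketched with a plain generator; `pair_pattern` is a made-up example pattern, not from any proposal:

```python
def pair_pattern(objs):
    # Hypothetical pattern over an iterator: for each object that matches
    # (here, any 2-tuple), yield its deconstructed parts; skip the rest.
    for obj in objs:
        if isinstance(obj, tuple) and len(obj) == 2:
            yield obj

results = []
for a, b in pair_pattern(iter([(1, 2), 'skipped', (3, 4)])):
    results.append(a + b)
# results == [3, 7]
```

Non-matching objects simply don't appear in the output, so the for body only ever sees successfully deconstructed parts.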
The useful part of pattern matching--and the hard part, too--is decomposing structures. A proposal that ignores that and only decomposes lists and dicts misses the point.
Yes, it's true that Python doesn't have ADTs. So that means one of two things: (1) types that want to be matched will have to do a bit of work to expose their structure and/or to decompose themselves, or (2) pattern matching isn't useful in Python. The examples you give from other languages show that the first is at least plausible. And both Szymon's proposal and mine and various other ideas suggested on this thread show attempts at a Pythonic design. But just ignoring structure decomposition, as you're doing, is not a way toward anything useful.
I assumed the answer to this was obvious because you've already given it :). Objects have to opt in to pattern matching by having some method that inverts their constructor, or at least provides some plausible facsimile thereof. For any object-oriented language, if you want to preserve the ability to define classes with any attributes and still have pattern matching, you have to do it this way or with structural typing, and Python doesn't really have the latter. Python classes don't have private variables like Java's, except by convention, but neither do they declare their variables, except by convention. __init__ is a black box that could be doing any Turing-complete computation on its arguments, and with properties, any attribute access could be the result of a computation rather than a member of the class. Scala calls its pattern-matching method unapply, by analogy with its apply method. In Python, I'd probably pick a different name, but the concept is the same.
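A sketch of what that opt-in could look like in Python. The method name `__deconstruct__` is made up here (Scala's is unapply); the point is only that the class itself, not its __init__ or attributes, promises an inverse of the constructor.

```python
class Point(object):
    """A class opting in to pattern matching via a hypothetical method."""

    def __init__(self, x, y):
        # __init__ may do arbitrary computation on its arguments...
        self._x = x
        self._y = y

    @classmethod
    def __deconstruct__(cls, obj):
        # ...so the class itself says how to recover constructor arguments.
        # Return them on a match, or None on a non-match.
        if isinstance(obj, cls):
            return (obj._x, obj._y)
        return None

# A matcher would call Point.__deconstruct__(value) and bind the result:
# Point.__deconstruct__(Point(3, 4)) -> (3, 4)
```

Anything for which no such inverse exists simply can't participate in structural matching, which is the opt-in being described.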
If you're willing to use MacroPy, that brings up the question: what's wrong with the pattern matching that it comes with as an example, and why don't you want to build on it instead of starting over?
I'm tempted to answer, "Nothing!" because I am thinking about building on it, but the short answer is that I'm not sure the syntactic and performance advantages of macros outweigh the usability problems introduced by macros (unfamiliarity of macros to most Python developers, bootstrapping, complication of the implementation). This is my major concern: if you have a case to make one way or the other, I'd like to hear it. Other secondary concerns are that, despite doing AST rewriting, MacroPy's pattern matching doesn't seem to do any optimization, it isn't feature-complete (it's still listed as experimental for good reasons), and MacroPy itself requires updating to 3.4 (and soon 3.5). None of these are insurmountable problems if I were convinced that macros were the right approach. I would prefer to build off one of the existing solutions rather than make a new one. Unfortunately, for most of them it would be more work to comprehend the existing code than it would be to reimplement their functionality, especially for the ones with nonexistent documentation. In that respect, MacroPy's is definitely one of the more promising options.
I haven't talked at all about the nitty-gritty of implementation, though I can. I think the most productive way to have that discussion on this list would be to look at what smaller features would be most helpful for implementing pattern matching as a library, and likewise for syntax, shades of the recent discussion about macros.
This last part, I think, is very useful: After building the best pattern-matching library you can with Python today, figure out what small changes to the language would allow you to either make it better or simplify its implementation, then propose those small changes instead of proposing pattern matching itself. For example, if you have to use nonportable hacks (or a MacroPy macro) so your context manager's __enter__ method can skip the statement body, that would be a good argument for reopening PEP 377 (because you have a new use case that wasn't known at the time). But trying to guess in advance what features might be helpful seems harder and less likely to pay off.
That being said, I suspect it's not going to pay off, because I think the with statement is a false lead in the first place. The semantics look promising, but they're not actually what you want to build on (unless you want to completely transform with statements into something that isn't actually a with statement, using MacroPy AST transformations or otherwise). I'd love to be proven wrong, of course.
After some thinking about it, I've concluded you're right, because both of the other options (try and for) combine binding with control flow, while with requires an additional control flow statement without something like PEP 377.

Let's leave aside MacroPy and other ways of changing Python's semantics, like the decorators in Grant's PyPatt and https://github.com/Suor/patterns , for the moment. Pattern matching has two basic forms, the case/match block and function definition, but before I can talk about them, I have to discuss patterns themselves. Without metaprogramming, patterns have to be defined either as ordinary functions or as classes. You have some set of elementary patterns like class patterns, dict patterns, list patterns, variable patterns, and so on. You construct or initialize patterns with either objects to match against or other patterns, mimicking the structure of the data you want to match. With the same tentative syntax I suggested last time:

    ClassPattern(Foo(1, VariablePattern('a', str)))
    DictPattern['a': 1, 'b': 2, 'c': 'a': int]
    ListPattern[1, ..., 'a': Bar]

Alternately, if you don't want to use the slicing syntax like I am there, you can make everything very explicit:

    ClassPattern(Foo(1, VariablePattern('a', str)))
    DictPattern(('a', 1), ('b', 2), ('c', VariablePattern('a', int)))
    ListPattern(1, ..., VariablePattern('a', Bar))

Even the slicing syntax is verbose. Both forms of syntax could be made less verbose. Would that make them more or less clear? I'm not sure. This syntax is tightly circumscribed because the whole point of pattern matching is that the patterns should look like the objects they're being matched against. The closer the resemblance, the happier I'd be.
After defining a pattern, you call it or one of its methods on the object you want to match against, and it returns either some indication of failure (maybe None, maybe some object with information about the non-match) or an object containing the deconstructed objects in a form suitable for tuple assignment.

The case/match block should start with the object that the patterns are to be matched against (avoiding this duplication is a syntactic advantage of pattern matching over if/elif/else blocks), contain a list of patterns to match against, have access to variables in the local scope, and bind variables in the same scope. The only statements that bind variables in the same scope beyond basic assignment statements in Python are, as far as I know, try-except, with, and for. (Well, import, but it's obviously unsuitable.) Here are some sketches of syntax using try-except and for.

try:
    pattern0(expression, pattern1, pattern2, ...)
except pattern0.exception as identifier:
    target_list = identifier
    suite
except pattern1.exception as identifier:
    target_list = identifier
    suite
except pattern2.exception as identifier:
    target_list = identifier
    suite
...
else:
    suite

Here, each pattern has an associated exception. If a pattern matches, it throws its exception; if it doesn't match, it calls the first pattern in its varargs list on the value of the expression it was passed, and so on. The last pattern has no pattern to call, so it returns None if it fails to match. The exceptions have iterator methods for tuple-unpacking their arguments after they've been caught. Another possible choice for setting up the patterns uses operator overloading and a dummy function to look vaguely like ML's syntax:

match(expression, (pattern0 | pattern1 | pattern2 ...))

Using for statements can avoid the extra assignment statements at the cost of needing to reiterate the expression and some raises.
try:
    for target_list in pattern0(expression):
        suite
        raise MatchFound
    for target_list in pattern1(expression):
        suite
        raise MatchFound
    ...
except MatchFound:
    pass
else:
    suite

A pattern either returns a one-element iterator containing an iterator to be bound to target_list, or raises StopIteration, in which case the loop body gets skipped and the next pattern is tried. Once any loop body is executed, it raises MatchFound, which skips all the other patterns.

In Python, function definition is easier, both for syntax and implementation. All the pattern matching implementations for Python that dispatch to functions ( http://blog.chadselph.com/adding-functional-style-pattern-matching-to-python... , https://github.com/martinblech/pyfpm , http://svn.colorstudy.com/home/ianb/recipes/patmatch.py , https://github.com/admk/patmat , http://peak.telecommunity.com/DevCenter/PEAK-Rules ) use decorators, and all use them in similar ways. There are two major variations. First, some decorators take patterns as arguments, while others use introspection on keyword arguments to embed the patterns in the function definition. You could use annotations instead of keyword arguments, though I don't know how easily that would integrate with type-hinting---it's probably possible, because pattern matching and typing are closely related. I assume the reason no one's done so is that annotations are Python-3 only. Second, some implementations have one uniform decorator across all definitions of a function, with frame-hacking or similar hacks to introspect the namespace and link up the decorated functions, while others use two decorators like functools.singledispatch, one to create a pattern-matched function and another to link other functions to the first one. I prefer the latter approach because I dislike frame-hacking on principle, since it introduces compatibility issues, and I think two decorators are more explicit about what's going on.
Beyond those differences, they're all similar: there's an outer function that tries patterns against its arguments until it finds one that matches, then takes the names bound by the pattern and passes them as arguments to the inner function. Quoting my suggested function definition syntax from my last message:

@match(pattern)
def f(parameter_list):
    suite

@f.case(pattern)
def g(parameter_list):
    suite

@f.case(pattern)
def h(parameter_list):
    suite

f(expression)

In all of these potential syntax choices, the names of bound variables are assigned manually, in assignment statements or function arguments. These approaches all have some boilerplate, but I'm more worried about overhead. Without any desugaring, most of them will require at least one function call per pattern on top of the conditionals and copying necessary. With metaprogramming, it's possible to eliminate some of the function calls for function definitions and the pseudo-ML match-case statement by constructing a single dispatch function or single pattern before calling any patterns. Unfortunately, in the match-case statement, moving overhead to pattern definition time doesn't help, because the patterns have to be redefined every time the statement is executed unless they're explicitly defined outside it, which makes the code less readable; this is another way in which pattern matching for function definition is easier. Any metaprogramming could remove the boilerplate, but MacroPy could also move the overhead to module import time. This is, I think, the best reason to use MacroPy over both non-metaprogramming approaches and other metaprogramming approaches.

Unfortunately, Python's current implementation imposes some limitations on desugaring. In ML and Haskell, desugaring pattern matching involves creating a finite automaton. In Python, the VM doesn't provide any way to jump to an offset determined by runtime information short of Philip J.
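Under the two-decorator approach, a minimal dispatcher might look like the following sketch. The `match`/`case` names echo the quoted syntax, but nothing here is any existing library's API; patterns are assumed to be callables returning a tuple of bound values or None.

```python
def match(pattern):
    """Create a dispatching function from its first pattern/case pair."""
    def decorator(func):
        cases = [(pattern, func)]

        def dispatcher(*args):
            # Try each pattern in registration order; the first pattern
            # that returns bound values wins, a la functools.singledispatch.
            for pat, fn in cases:
                bound = pat(*args)
                if bound is not None:
                    return fn(*bound)
            raise TypeError('no pattern matched')

        def case(pat):
            # The second decorator links another pattern/case pair in.
            def register(fn):
                cases.append((pat, fn))
                return dispatcher  # every decorated name dispatches
            return register

        dispatcher.case = case
        return dispatcher
    return decorator

# Two toy patterns: a single int, or a single pair.
def int_pattern(*args):
    return args if len(args) == 1 and isinstance(args[0], int) else None

def pair_pattern(*args):
    if len(args) == 1 and isinstance(args[0], tuple) and len(args[0]) == 2:
        return args[0]
    return None

@match(int_pattern)
def f(n):
    return n * 2

@f.case(pair_pattern)
def g(a, b):
    return a + b

# f(5) -> 10; f((3, 4)) -> 7
```

Note the per-pattern function call in the dispatch loop: that is exactly the overhead discussed above that desugaring would try to eliminate.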
Eby's computed goto hack ( https://mail.python.org/pipermail/python-dev/2006-June/066115.html ), which isn't portable. This means that any state with more than two outgoing transitions has to be represented as an if-else tree, which turns what could be a constant-time operation (look up the jump location in a hash table, then jump) into a linear-time one (check up to one fewer conditionals than there are transitions). This is, of course, the switch statement, PEPs 275 and 3103, in another guise. Another, less serious problem is that there's a constant-time performance penalty from needing to use exceptions to implement backtracking or nodes with more than one incoming transition.

Supposing I were to build on MacroPy's pattern matching, is MacroPy's current syntax for the match-case statement good enough? If not, how could it be improved? As I understand it, by using import hooks like MacroPy does and a new parser, the sky's the limit as far as changing syntax goes, but I'd much rather stay within MacroPy's limitations, using Python's current parsers, than go outside them.
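For comparison, the usual portable approximation of that hash-table jump is a dict of callables, which restores the constant-time lookup but pays a Python function call per transition, so it doesn't escape the overhead problem either. All names below are illustrative, not from any real implementation.

```python
# Toy lexer state with many outgoing transitions.

def on_digit(ch):
    return 'number'

def on_quote(ch):
    return 'string'

def on_other(ch):
    return 'symbol'

# One hash-table lookup instead of a linear if-elif tree over the
# transitions, at the cost of a function call per step.
TRANSITIONS = dict([(d, on_digit) for d in '0123456789'] + [('"', on_quote)])

def step(ch):
    return TRANSITIONS.get(ch, on_other)(ch)

# step('7') -> 'number'; step('"') -> 'string'; step('x') -> 'symbol'
```

Desugared inline code can't use this trick without introducing exactly the per-pattern calls described above, which is why the if-else tree keeps reappearing.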
Meanwhile, transforming a pattern-match into something that can be assigned to a target_list, instead of just assigning in the match, does work in simple cases, but I think Szymon's proposal shows why it's not sufficient, and I think any real examples will show that it's redundant. Even without the language support in my proposal, explicit quoting of names like in Grant Jenks' PyPatt library shows that you can get most of the way there. (The "as" clause is useful for the case when you want to bind the whole structure at the same time you decompose it and potentially bind its parts, but that's pretty rare--which is another reason I think "with" is misleading.)
I don't understand your comment about how Szymon's proposal shows assigning to a target_list isn't sufficient.

Cara