[Python-ideas] Pattern Matching Syntax
Daniel Moisset
dmoisset at machinalis.com
Fri May 4 10:45:26 EDT 2018
This email from Steve has some good questions, let me try to help organize
ideas:
On 4 May 2018 at 13:11, Steven D'Aprano <steve at pearwood.info> wrote:
> I'll make a start, and you can correct me if I get any of it wrong.
>
> (1) Pattern matching takes a value, and compares it to a series of
> *patterns* until the first match, at which point it returns a specified
> value, skipping the rest of the patterns.
>
In a general sense based in most other languages, patterns are a syntactic
construct that can be "matched" with a value in runtime. The matching
process has two effects at once:
1) check that the value has some specific form dictated by the pattern
(which can have a yes/no result)
2) bind some assignable targets referenced in the pattern to components of
the value matched. The binding is usually done only if there is a match
according to (1)
Python actually has some form of patterns (called "target_list" in the
formal syntax) that are used in assignments, for loops, and other places.
As it is mostly restricted to assign single values, or decompose iterables,
we normally say "tuple unpacking" instead of "pattern matching". And
there's a second type of pattern which is included in the try/except
statement, which matches by subtype (and also can bind a name)
As a language designer, once you have your notion on matching defined, you
can choose which kind of constructs use patterns (I just mentioned left of
assignemnts, between "for" and "in", etc in python). Usual constructs are
multi branch statement/expression that match a single value between several
patterns, and run a branch of code depending on what pattern matched (After
performing the corresponding bindings). That's not the only option, you
could also implement patterns in other places, like regular assuments, or
the conditions of loops and conditionals [resulting in an effect similar to
some of the ones being discussed in the PEP572 thread]; although this last
sentence is a beyond what the OP was suggesting and a generalization of the
idea.
(2) Patterns typically are single values, and the match is by equality,
> although other kinds of patterns are available as well.
>
Typical patterns in other languages include:
a) single values (matched by equality)
b) assignables (names, stuff like mylist[0] or self.color) which match
anything and bind the value to assignables
c) type patterns (a value matches if the type of the value has a certain
supertype)
d) structure patterns (a value matches if it has certain structure. For
example, being a dict with certain keys, or an iterable of certain amount
of elements). These usually are recursive, and components of the structure
can be also patterns
e) arbitrary boolean conditions (that can use the names bound by other
parts of the pattern)
Python has support for (b) and (c) in both assignment and for loops. Python
supports (b) and (c) in try statements. The proposal for the OP offers
expanding to most of these patterns, and implement some sort of pattern
matching expression. I argued in another email that a pattern matching
statement feels more adequate to Python (I'm not arguing at this point if
it's a good idea, just that IF any is a good idea, it's the statement)
As an example, you could have a pattern (invented syntax) like "(1, 'foo',
bar, z: int)" which would positively match 4-element tuples that have 1 in
its first position, foo in its second, and an int instance in the last;
when matching it would bind the names "bar" and "z" to the last 2 elements
in the tuple.
> (3) Unlike a case/switch statement, there's no implication that the
> compiler could optimise the order of look-ups; it is purely top to
> bottom.
>
[we are talking about a multi-branch pattern matching statement now, not
just "apttern matching"] In most practical cases, a compiler can do
relatively simple static analysis (even in python) that could result in
performance improvements. One obvious improvement is that the matched
expression can be evaluated once (you can achieve the same effect always
with a dummy variable assignment right before the if/elif statement). But
for multiple literal string patterns (a common scenario), you can compile a
string matcher that is faster than a sequence of equality comparisons
(either through hashing+comparison fallback or creating some decision tree
that goes through the input string once). For integers you can make lookup
tables. Even an ifinstance check choosing between several branches (a not
so uncommon occurrence) could be implemented by a faster operation if
somewhat considered that relevant.
> (4) Unlike if...elif, each branch is limited to a single expression, not
> a block. That's a feature: a match expression takes an input, and
> returns a value, and typically we don't have to worry about it having
> side-effects.
>
> So it is intentionally less general than a chain of if...elif blocks.
>
>
That's the OP proposal, yes (as I mentioned, I argued with some simple data
that a feature like that is of a more limited use. Of course, I'd love to
see deeper analysis with data that proves me wrong, or arguing that what I
looked that is irrelevant ;-) )
> (5) We could think of it as somewhat analogous to a case/switch
> statement, a dict lookup, or if...elif, only better.
>
> (Why is it better?)
>
>
I'll leave the OP to argue his side here. I've mentioned some opportunities
for efficiency (which are IMO secondary) and I understand that there's an
argument for readability, especially when using the binding feature.
> Here is a list of patterns I would hope to support, off the top of my
> head:
>
> * match by equality;
>
> * match by arbitrary predicates such as "greater than X" or
> "between X and Y";
>
> * match by string prefix, suffix, or substring;
>
> * match by type (isinstance).
>
> I think that before we start talking about syntax, we need to know what
> features we need syntax for.
>
>
> There's probably more to it, because so far it doesn't look like
> anything but a restricted switch statement. Over to someone else with a
> better idea of why pattern matching has become ubiquitous in functional
> programming.
>
a multi branch statement tends to be present but it's not necessarily
ubiquitous in FP. "pattern matching" as an idea is one of those
pseudo-unviersal generalizations of computing that FP language designers
love. Essentially it covers with a single thing what we do in python with
several different features (assignment, argument passing, conditionals,
exception catching, unpacking of data structures, instance checking). It
works very well when you use algebraic data types (which are like unions of
namedtuples)as your primary data structure, because there are very natural
patterns to decompose those. In Python, there's less value to this because
well... it already has all these features so adding a unifying concept
after the fact doesn't make it simpler, but more complicated. So the main
argument to talk about here is if the expressivity added can be of value
(if we talk about pattern matching in many places of the language, it
*might*)
Best,
--
Daniel F. Moisset - UK Country Manager - Machinalis Limited
www.machinalis.co.uk <http://www.machinalis.com>
Skype: @dmoisset T: + 44 7398 827139
1 Fore St, London, EC2Y 9DT
Machinalis Limited is a company registered in England and Wales. Registered
number: 10574987.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20180504/1572ae71/attachment.html>
More information about the Python-ideas
mailing list