[Python-ideas] Pattern Matching Syntax

Chris Angelico rosuav at gmail.com
Fri May 4 11:43:23 EDT 2018

On Sat, May 5, 2018 at 12:45 AM, Daniel Moisset <dmoisset at machinalis.com> wrote:
>> (3) Unlike a case/switch statement, there's no implication that the
>> compiler could optimise the order of look-ups; it is purely top to
>> bottom.
> [we are talking about a multi-branch pattern matching statement now, not
> just "apttern matching"] In most practical cases, a compiler can do
> relatively simple static analysis (even in python) that could result in
> performance improvements. One obvious improvement is that the matched
> expression can be evaluated once (you can achieve the same effect always
> with a dummy variable assignment right before the if/elif statement).

That one isn't an optimization, but part of the specification; it is
an advantage of the fact that you're writing the match expression just
once. But all the rest of your optimizations aren't trustworthy.

> But
> for multiple literal string patterns (a common scenario), you can compile a
> string matcher that is faster than a sequence of equality comparisons
> (either through hashing+comparison fallback or creating some decision tree
> that goes through the input string once).

Only if you're doing equality checks (no substrings or anything else
where it might match more than one of them). And if you're doing
"pattern matching" that's nothing more than string equality
comparisons, a dict is a better way to spell it.

> For integers you can make lookup tables.

If they're just equality checks, again, a dict is better. If they're
ranges, you would have to ensure that they don't overlap (no problem
if they're all literals), and then you could potentially optimize it.

> Even an ifinstance check choosing between several branches (a not so
> uncommon occurrence) could be implemented by a faster operation if somewhat
> considered that relevant.

Only if you can guarantee that no single object can be an instance of
more than one of the types. Otherwise, you still have to check in some
particular order. In CPython, you can guarantee that isinstance(x,
int) and isinstance(x, str) can't both be true, but that's basically a
CPython implementation detail, due to the way C-implemented classes
work. You can't use this to dispatch based on exception types, for
instance. Let's say you try to separately dispatch ValueError,
ZeroDivisionError, and OSError; and then you get this:

>>> class DivisionByOSError(ZeroDivisionError, OSError, ValueError): pass
>>> raise DivisionByOSError()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

That thing really truly is all three of those types, and you have to
decide how to dispatch that. So there needs to be an order to the
checks, with no optimization.


More information about the Python-ideas mailing list