[Python-ideas] Pattern Matching Syntax

Daniel Moisset dmoisset at machinalis.com
Fri May 4 09:37:05 EDT 2018


Note that most languages that you mentioned as references are functional
(so they don't have a statement/expression distinction like Python has),
and those that are not have matching statements. The only exception is
JavaScript, but in JavaScript the distinction is not that hard to work
around, given that it has the idiom (function() {stmt; stmt; stmt})() to
turn any statement block into an expression. And again, as I mentioned,
it's an outlier. Other imperative languages, like C and Java, of course
have switch statements, which are similar.

Doing a quick search for real code that could benefit from this, I mostly
found situations where a matching *statement* would be required instead of
a matching *expression*. Here are the examples I found in the stdlib for
Python 3.6 (I grepped for "elif" and looked for "similar" branches manually,
covering the first ~20%):

fnmatch.translate (match c: ... string options)
telnetlib.Telnet.process_rawq (match len(self.iacseq): ... integer options)
mimetypes[module __main__ body] (match opt: ... multiple str options per
match)
typing._remove_dups_flatten (match p: ... isinstance checks + custom
condition) [this *could* be an expression with some creativity]
typing.GenericMeta.__getitem__ (match self: ... single and multiple type
options by identity)
turtle.Shape.__init__ (match type_:... str options)
turtle.TurtleScreen._color (match len(cstr): ... int options)
turtle.TurtleScreen.colormode (match cmode: ... mixed type options)
turtle.TNavigator.distance (match x: ... isinstance checks) [could be an
expression]
turtle.TNavigator.towards (match x: ... isinstance checks) [could be an
expression]
turtle.TPen.color (match l: ... integer options. l is set to len(args) the
line before)
turtle._TurtleImage._setshape (match self._type: ... str options) [could be
an expression]
turtle.RawTurtle.__init__ (match canvas: ... isinstance checks)
turtle.RawTurtle.clone (match ttype: ... str options) [could be an
expression]
turtle.RawTurtle._getshapepoly (match self._resizemode: ... str options,
one with a custom condition or'ed)
turtle.RawTurtle._drawturtle (match ttype: ... str options)
turtle.RawTurtle.stamp (match ttype: ... str options)
turtle.RawTurtle._undo (match action: ... str options)
ntpath.expandvars (match c: ... str options)
sre_parse.Subpattern.getwidth (match op: ... nonliteral int constants,
actually a NamedIntConstant which subclasses int)
sre_parse._class_escape (match c: ... string options with custom
conditions, and inclusion+equality mixed)
sre_parse._escape (match c: ... string options with custom conditions, and
inclusion+equality mixed)
sre_parse._parse (match this: ... string options with in, not in, and
equality)
sre_parse._parse ( match char: ... string options with in, and equality)
sre_parse.parse_template (match c: ... string options with in)
netrc.netrc._parse (match tt: ... string options with custom conditions)
netrc.netrc._parse (match tt: ... string options with custom conditions)
[not a duplicate, there are two possible statements here]
argparse.HelpFormatter._format_args (match action.nargs: ... str/None
options) [this *could* be an expression with some
creativity/transformations]
argparse.ArgumentParser._get_nargs_pattern (match nargs: ... str/None
options) [could be an expression]
argparse.ArgumentParser._get_values (match action.nargs: ... str/None
options with extra conditions)
_strptime._strptime (match group_key: ... str options)
datetime._wrap_strftime (match ch: ... str options)
pickletools.optimize (match opcode,name: ... str options with reverse
inclusion and equality)
json/encoder._make_iterencode (match value: ... mixed options and isinstance
checks)
json/encoder._make_iterencode._iterencode dict (match key: ... mixed
options and isinstance checks)
json/encoder._make_iterencode._iterencode dict (match value: ... mixed
options and isinstance checks)
json/encoder._make_iterencode._iterencode (match o: ... mixed options and
isinstance checks)
json/scanner.py_make_scanner._scan_once (match nextchar: ... str options)
[could be turned into expression with some transformation]
unittest.mock._Call.__new__ (match _len: ... int options)
unittest.mock._Call.__eq__ (match len_other: ... int options)

(I'm not saying that all of these should be match statements, only that
they could be.) Cases where an expression would solve the issue are
somewhat uncommon (there are many state machines, including many string or
argument parsers that set state depending on the option, or object builders
that grow data structures). A common situation is that some of the branches
need to raise exceptions (and raise in Python is a statement, not an
expression). This could be worked around by making the default raise a
ValueError that can be caught and re-raised as something else, but that
would end up making the code more deeply nested and, IMO, more complicated.
Also, many of the situations where an expression could be used are string
matches where a dictionary lookup would work well anyway.
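To make the raise problem concrete, here is a minimal sketch of a hypothetical unit parser (the function names and the set of units are assumptions for illustration, not stdlib code), written once as an if/elif statement and once as a dictionary lookup. The dict version cannot put raise inside the literal, so the no-match case has to be handled by catching the lookup failure and re-raising it as ValueError:

```python
from datetime import timedelta


# Statement form: any branch, including the fallback, can simply raise.
def unit_to_timedelta_stmt(unit, amount):
    if unit in ("days", "hours", "weeks"):
        return timedelta(**{unit: amount})
    elif unit == "months":
        return timedelta(days=30 * amount)
    elif unit == "years":
        return timedelta(days=365 * amount)
    else:
        raise ValueError("unknown unit: %r" % (unit,))


# Expression-style form: a dict lookup. `raise` cannot appear inside the
# literal, so the "no match" case is a KeyError caught and re-raised.
_BUILDERS = {
    "days": lambda n: timedelta(days=n),
    "hours": lambda n: timedelta(hours=n),
    "weeks": lambda n: timedelta(weeks=n),
    "months": lambda n: timedelta(days=30 * n),
    "years": lambda n: timedelta(days=365 * n),
}


def unit_to_timedelta_dict(unit, amount):
    try:
        build = _BUILDERS[unit]
    except KeyError:
        # Translate the lookup failure into the error callers expect.
        raise ValueError("unknown unit: %r" % (unit,)) from None
    return build(amount)
```

Storing lambdas rather than precomputed values also avoids evaluating every branch eagerly when the dict literal is built, at the cost of one more level of indirection.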

My conclusions from this are:
1. It makes more sense to talk about a statement, not an expression.
2. Good, clear support for strings, ints and isinstance checks is essential
(other fancier things may help in more specific circumstances).
3. The "behaviour when there's no match" should be quite flexible. I saw
many "do nothing" and many "do something" cases (with a large part of the
latter being "raise an exception").
4. There's a pattern of re-evaluating something on each branch of an
if/elif (like len(foo) or self.attr); it's also common to create a dummy
variable just before the if/elif. This can also be fodder for the PEP 572
discussion.
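The two shapes from point 4 look roughly like this. The functions are hypothetical, loosely modeled on dispatching on the number of arguments (as turtle.TPen.color does on len(args)), not copied from the stdlib:

```python
# Shape 1: the subject, len(args), is re-evaluated in every branch.
def describe_args_repeated(*args):
    if len(args) == 0:
        return "no arguments"
    elif len(args) == 1:
        return "one argument: %r" % (args[0],)
    elif len(args) == 2:
        return "two arguments"
    else:
        return "%d arguments" % len(args)


# Shape 2: a throwaway name is bound just before the if/elif so the
# subject is evaluated only once.
def describe_args_bound(*args):
    n = len(args)
    if n == 0:
        return "no arguments"
    elif n == 1:
        return "one argument: %r" % (args[0],)
    elif n == 2:
        return "two arguments"
    else:
        return "%d arguments" % n
```

A match statement on len(args) would presumably evaluate the subject once, making the extra name unnecessary.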

That's what I have for now

On 4 May 2018 at 08:26, Jacco van Dorp <j.van.dorp at deonet.nl> wrote:

> Would this be valid?
>
> # Pattern matching with guards
> x = 'three'
>
> number = match x:
>     1 => "one"
>     y if y is str => f'The string is {y}'
>     z if z is int => f'the int is {z}'
>     _ => "anything"
>
> print(number)  # The string is three
>
> If so, why are y and z both valid here ? Is the match variable rebound
> to any other ? Or even to all names ?
>
> ofc, you could improve the clarity here with:
>
> number = match x as y:
>
> or any variant thereof. This way, you'd explicitly bind the variable
> you use for testing. If you don't, the interpreter would never know
> which ones to treat as rebindings and which to draw from surrounding
> scopes, if any.
>
> I also haven't really seen a function where this would be better than
> existing syntax, and the above is the only one to actually try
> something not possible with dicts. The type checking one could better
> be:
>
> x = 1
> d = {
>   int:"integer",
>   float:"float",
>   str:"str"
> }
> d.get(type(x), None)
>
> The production datetime code could be:
>
> from datetime import date, timedelta
>
> def convert_time_to_timedelta_with_match(unit:str, amount:int, now:date):
>     return {
>         # explicit keywords: every value in the literal is evaluated
>         # eagerly, so timedelta(**{unit: amount}) would fail for "months"
>         "days": timedelta(days=amount),
>         "hours": timedelta(hours=amount),
>         "weeks": timedelta(weeks=amount),
>         # why not something like subtracting two dates here to get an
>         # accurate timedelta for your specific interval ?
>         # days = (365.25 / 12)*amount ? Would be a lot more accurate
>         # for average month length. (30.4375)
>         "months": timedelta(days=30*amount),
>         "years": timedelta(days=365*amount),  # days = 365.25*amount ?
>         # date - date already yields a timedelta
>         "cal_years": now - now.replace(year=now.year - amount),
>     }.get(unit)
>
> I honestly don't see the advantages of new syntax here.
> Unless you hate the eager evaluation in the dict literal getting
> indexed, so if it's performance critical an if/else might be better.
> But I can't see a match statement outperforming if/else. (and if you
> really need faster than if/else, you should perhaps move that bit of
> code to C or something.)
>
> 2018-05-04 0:34 GMT+02:00 Ed Kellett <e+python-ideas at kellett.im>:
> > On 2018-05-03 20:17, Chris Angelico wrote:
> >>> def convert_time_to_timedelta_with_match(unit:str, amount:int, now:date):
> >>>  return match unit:
> >>>      x if x in ('days', 'hours', 'weeks') => timedelta(**{unit: amount})
> >>>      'months' => timedelta(days=30 * amount)
> >>>      'years' => timedelta(days=365 * amount)
> >>>      'cal_years' => now - now.replace(year=now.year - amount)
> >>
> >> And then this comes down to the same as all the other comparisons -
> >> the "x if x" gets duplicated. So maybe it would be best to describe
> >> this thus:
> >>
> >> match <expr> :
> >>     <expr> | (<comp_op> <expr>) => <expr>
> >>
> >> If it's just an expression, it's equivalent to a comp_op of '=='. The
> >> result of evaluating the match expression is then used as the left
> >> operand for ALL the comparisons. So you could write your example as:
> >>
> >> return match unit:
> >>     in ('days', 'hours', 'weeks') => timedelta(**{unit: amount})
> >>     'months' => timedelta(days=30 * amount)
> >>     'years' => timedelta(days=365 * amount)
> >>     'cal_years' => now - now.replace(year=now.year - amount)
> >>
> >> Then there's room to expand that to a comma-separated list of values,
> >> which would pattern-match a tuple.
> >
> > I believe there are some problems with this approach. That case uses no
> > destructuring at all, so the syntax that supports destructuring looks
> > clumsy. In general, if you want to support something like:
> >
> >     match spec:
> >         (None, const) => const
> >         (env, fmt) if env => fmt.format(**env)
> >
> > then I think something like the 'if' syntax is essential for guards.
> >
> > One could also imagine cases where it'd be useful to guard on more
> > involved properties of things:
> >
> >     match number_ish:
> >         x:str if x.lower().startswith('0x') => int(x[2:], 16)
> >         x:str => int(x)
> >         x => x  #yolo
> >
> > (I know base=0 exists, but let's imagine we're implementing base=0, or
> > something).
> >
> > I'm usually against naming things, and deeply resent having to name the
> > x in [x for x in ... if ...] and similar constructs. But in this
> > specific case, where destructuring is kind of the point, I don't think
> > there's much value in compromising that to avoid a name.
> >
> > I'd suggest something like this instead:
> >
> >     return match unit:
> >         _ in {'days', 'hours', 'weeks'} => timedelta(**{unit: amount})
> >         ...
> >
> > So a match entry would be one of:
> > - A pattern. See below
> > - A pattern followed by "if" <expr>, e.g.:
> >   (False, x) if len(x) >= 7
> > - A comparison where the left-hand side is a pattern, e.g.:
> >   _ in {'days', 'hours', 'weeks'}
> >
> > Where a pattern is one of:
> > - A display of patterns, e.g.:
> >   {'key': v, 'ignore': _}
> >   I think *x and **x should be allowed here.
> > - A comma-separated list of patterns, making a tuple
> > - A pattern enclosed in parentheses
> > - A literal (that is not a formatted string literal, for sanity)
> > - A name
> > - A name with a type annotation
> >
> > To give a not-at-all-motivating but hopefully illustrative example:
> >
> >     return match x:
> >         (0, _) => None
> >         (n, x) if n < 32 => ', '.join([x] * n)
> >         x:str if len(x) <= 5 => x
> >         x:str => x[:2] + '...'
> >         n:Integral < 32 => '!' * n
> >
> > Where:
> >     (0, 'blorp')    would match the first case, yielding None
> >     (3, 'hello')    would match the second case, yielding
> >                     "hello, hello, hello"
> >     'frogs'         would match the third case, yielding "frogs"
> >     'frogs!'        would match the fourth case, yielding "fr..."
> >     3               would match the fifth case, yielding '!!!'
> >
> > I think the matching process would mostly be intuitive, but one detail
> > that might raise some questions: (x, x) could be allowed, and it'd make
> > a lot of sense for that to match only (1, 1), (2, 2), ('hi', 'hi'), etc.
> > But that'd make the _ convention less useful unless it became more than
> > a convention.
> >
> > All in all, I like this idea, but I think it might be a bit too heavy to
> > get into Python. It has the feel of requiring quite a lot of new things.
> >
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
> >
>
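For comparison, Ed Kellett's illustrative match expression quoted above can be approximated in current Python with an if/elif chain and isinstance checks, the kind of code point 2 above says needs good support. This is a sketch under assumptions: the tuple patterns are taken to match only 2-tuples, the function name emulated_match is hypothetical, the inner name is s rather than x to avoid shadowing the subject, and raising ValueError when no case matches is an assumed choice, since the proposal leaves that open.

```python
from numbers import Integral


def emulated_match(x):
    """Hypothetical if/elif version of the quoted match example,
    trying each case from top to bottom."""
    # (0, _) => None
    if isinstance(x, tuple) and len(x) == 2 and x[0] == 0:
        return None
    # (n, x) if n < 32 => ', '.join([x] * n)
    if isinstance(x, tuple) and len(x) == 2 and x[0] < 32:
        n, s = x
        return ', '.join([s] * n)
    # x:str if len(x) <= 5 => x
    if isinstance(x, str) and len(x) <= 5:
        return x
    # x:str => x[:2] + '...'
    if isinstance(x, str):
        return x[:2] + '...'
    # n:Integral < 32 => '!' * n
    if isinstance(x, Integral) and x < 32:
        return '!' * x
    # Assumed no-match behaviour; the proposal does not specify one.
    raise ValueError('no case matched: %r' % (x,))
```

Each case costs an explicit type test plus a structural check, which is the boilerplate a dedicated syntax would hide.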



-- 
Daniel F. Moisset - UK Country Manager - Machinalis Limited
www.machinalis.co.uk <http://www.machinalis.com>
Skype: @dmoisset T: + 44 7398 827139

1 Fore St, London, EC2Y 9DT

Machinalis Limited is a company registered in England and Wales. Registered
number: 10574987.