On Wed, Jul 15, 2020 at 4:41 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:

I've taken a look through PEP 622 and I've been thinking about how it
could be used with sympy.

In principle case/match and destructuring should be useful for sympy
because sympy has a class Basic which defines a common structure for
~1000 subclasses. There are a lot of places where it is necessary to
dispatch on the type of some object, including in places that are
performance sensitive, so those would seem like good candidates for
case/match. However, the PEP doesn't quite work as I had hoped because
it only handles positional arguments indirectly and it does not seem
to directly handle types with variadic positional args.

The objects I refer to in sympy represent mathematical expressions e.g.:

    >>> from sympy import *
    >>> x, y = symbols('x, y')
    >>> expr = x**2 + 2*x*y
    >>> expr
    x**2 + 2*x*y

You can see the structure of the object explicitly using sympy's srepr function:

    >>> print(srepr(expr))
    Add(Pow(Symbol('x'), Integer(2)), Mul(Integer(2), Symbol('x'), Symbol('y')))

There are a bunch of classes there (Add, Pow, Symbol, Mul, Integer)
but these are a tiny subset of the possibilities. The key feature of
Basic instances is that they have an .args attribute which can be used
to rebuild the object like:

    >>> expr.args
    (x**2, 2*x*y)
    >>> type(expr)
    <class 'sympy.core.add.Add'>
    >>> type(expr)(*expr.args)
    x**2 + 2*x*y
    >>> type(expr)(*expr.args) == expr
    True

This is known as the func-args invariant in sympy and is used to
destructure and rebuild the expression tree in different ways e.g. for
performing a substitution:

    >>> expr.subs(x, 5)
    10*y + 25
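
The invariant can be illustrated without sympy. The sketch below is a toy stand-in and not sympy's implementation: this Basic only stores its args, the `substitute` helper is hypothetical, and nothing is evaluated or simplified, so substituting 5 for x yields an unevaluated tree rather than 10*y + 25.

```python
class Basic:
    """Toy node type: every instance is fully described by type + args."""

    def __init__(self, *args):
        self.args = args

    def __eq__(self, other):
        return type(self) is type(other) and self.args == other.args

    def __hash__(self):
        return hash((type(self).__name__, self.args))

    def __repr__(self):
        inner = ", ".join(map(repr, self.args))
        return f"{type(self).__name__}({inner})"


class Symbol(Basic): pass
class Integer(Basic): pass
class Add(Basic): pass
class Mul(Basic): pass
class Pow(Basic): pass


def substitute(expr, old, new):
    """Rebuild expr with every occurrence of old replaced by new."""
    if expr == old:
        return new
    if not isinstance(expr, Basic):
        return expr  # leaf payload such as the name 'x' or the number 2
    # func-args invariant: type(expr)(*expr.args) reconstructs expr,
    # so rebuilding from transformed args gives the transformed tree.
    return type(expr)(*(substitute(arg, old, new) for arg in expr.args))


x, y = Symbol("x"), Symbol("y")
expr = Add(Pow(x, Integer(2)), Mul(Integer(2), x, y))
rebuilt = type(expr)(*expr.args)
replaced = substitute(expr, x, Integer(5))
```

Here `rebuilt` compares equal to `expr`, and `replaced` has `Integer(5)` wherever `Symbol('x')` appeared.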

All Basic classes are strictly constructed using positional only
arguments and not keyword arguments. In the PEP it seems that we can
handle positional arguments when their number is fixed by the type.
For example a simplified version of Pow could be:

    class Pow:

        def __init__(self, base, exp):
            self.args = (base, exp)

        __match_args__ = ("base", "exp")

        @property
        def base(self):
            return self.args[0]

        @property
        def exp(self):
            return self.args[1]

Then I could match Pow in case/match with

    obj = Pow(Symbol('x'), Integer(4))

    match obj:
        case Pow(base, exp):
            # do stuff with base, exp

It seems awkward and inefficient though to go through __match_args__
and the base and exp property-methods to match the positional
arguments when they are already available as a tuple in obj.args. Note
that performance is a concern: just dispatching on isinstance() has a
measurable overhead in sympy code which is almost always CPU-bound.

The main problem though is with variadic positional arguments. For
example sympy has a symbolic Tuple class which is much like a regular
python tuple except that it takes multiple positional args rather than
a single iterable arg:

    class Tuple:

        def __init__(self, *args):
            self.args = args

So now how do I match a 2-Tuple of two integers? I can't use
__match_args__ because that's a class attribute and different
instances have different numbers of args. It seems I can do this:

    obj = Tuple(2, 4)

    match obj:
        case Tuple(args=(2, 4)):

That's awkward though because it doesn't match the constructor syntax
which strictly uses positional-only args. It also doesn't scale well
with nesting:

    obj = Tuple(Tuple(1, 2), Tuple(3, 4))

    match obj:
        case Tuple(args=(Tuple(args=(1, 2)), Tuple(args=(3, 4)))):
            # handle ((1, 2), (3, 4)) case

Another option would be to fake a single positional argument for
matching purposes:

    class Tuple:

        __match_args__ = ("args",)

        def __init__(self, *args):
            self.args = args

    match obj:
        case Tuple((Tuple((1, 2)), Tuple((3, 4)))):

This requires an extra level of brackets for each node and also
doesn't match the actual constructor syntax: evaluating that pattern
in sympy turns each Tuple into a 1-Tuple containing another Tuple of
the args:

    >>> t = Tuple((Tuple((1, 2)), Tuple((3, 4))))
    >>> print(srepr(t))
    Tuple(Tuple(Tuple(Tuple(Integer(1), Integer(2))), Tuple(Tuple(Integer(3), Integer(4)))))

I've used Tuple in the examples above but the same applies to all
variadic Basic classes: Add, Mul, And, Or, FiniteSet, Union,
Intersection, ProductSet, ...

From a first glimpse of the proposal I thought I could do matches like this:

    match obj:
        case Add(Mul(x, y), Mul(z, t)) if y == t:
        case Add(*terms):
        case Mul(coeff, *factors):
        case And(Or(A, B), Or(C, D)) if B == D:
        case Union(Interval(x1, y1), Interval(x2, y2)) if y1 == x2:
        case Union(Interval(x, y), FiniteSet(*p)) | Union(FiniteSet(*p), Interval(x, y)):
        case Union(*sets):

Knowing the sympy codebase each of those patterns would look quite
natural because they resemble the constructors for the corresponding
objects (as intended in the PEP). It seems instead that many of these
constructors would need to have args= so it becomes:

    match obj:
        case Add(args=(Mul(args=(x, y)), Mul(args=(z, t)))) if y == t:
        case Add(args=terms):
        case Mul(args=(coeff, *factors)):
        case And(args=(Or(args=(A, B)), Or(args=(C, D)))) if B == D:
        case Union(args=(Interval(x1, y1), Interval(x2, y2))) if y1 == x2:
        case Union(args=(Interval(x, y), FiniteSet(args=p))) | Union(args=(FiniteSet(args=p), Interval(x, y))):
        case Union(args=sets):

Each of these looks less natural as they don't match the constructors
and the syntax gets messier with nesting.

That's a really interesting new use case you're bringing up.

You may have noticed that between v1 and v2 of the PEP we withdrew the `__match__` protocol; we've been brainstorming about different forms a future `__match__` protocol could take, once we have more practical experience. One possible variant we've been looking at would be something that would *only* be used for positional arguments -- `__match__` would just return a tuple of values extracted from the object that can then be matched by the interpreter's match machinery. Your use case could then (almost, see below) be handled by having `__match__` just return `self.args`.

I also think there's a hack that will work today, assuming your users aren't going to write match statements with insanely long parameter lists for class patterns: You can set `__match_args__ = ["__0__", "__1__", "__2__", ..., "__25__"]`, and define 26 properties like this:

```

@property
def __0__(self):
    return self.args[0]

@property
def __1__(self):
    return self.args[1]

# etc.

```

But now for the caveat.

As the PEP currently stands, you don't *have* to specify all parameters in a class pattern. For example, using a Point3d class that takes `(x, y, z)`, you can write `Point3d(x, y)` as a shorthand for `Point3d(x, y, _)`. This is intended to make life easy for classes that have several positional arguments with defaults, where the constructor can also omit some of the arguments. (It also avoids needing to have a special case for a class with *no* arguments, which covers the important use case of *just* wanting to check the type.) But for your use case it seems it would be less than ideal to have `Add(Symbol('x'), Symbol('y'))` be a valid match for `x + y + z`. I can think of a workaround (pass a sentinel to the pattern) but it would be uglier than doubling the parentheses.

Note that this only applies to class patterns -- sequence patterns require an explicit `*_` to ignore excess values. Because of this, writing `Add(args=(...))` or `Add((...))` would circumvent the problem (but it would have the problems you pointed out of course). When we design the `__match__` protocol in the future we can make sure there's a way to specify this. For example, we could pass *in* the number of positional sub-patterns. This has been proposed, but we weren't sure of the use case -- now we have one (unless I'm misunderstanding your intentions).

Thoughts?

--

--Guido van Rossum (python.org/~guido)