Hi Oscar

On Wed, Jul 15, 2020 at 4:41 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:

> I've taken a look through PEP 622 and I've been thinking about how it could be used with sympy.

Thank you very much for taking the time to carefully elaborate an interesting possible use case. I find this very helpful and a great basis for further/future discussions on the design of pattern matching.

A deliberate part of the current design was to address the structure/shape of objects rather than the constructor directly (which could be an arbitrarily complex function, after all). Writing `Add(args=(Mul(...), Mul(...)))`, for instance, is therefore consistent, as it reflects the actual structure of your objects. The `__match_args__` attribute is primarily intended for rather simple object shapes, where it is quite obvious which attributes constitute the object and in which order (similar to the `_fields` attribute in AST nodes).

From this perspective, your use case makes an argument for something I would call 'variadic shape' (for lack of a better word). Structurally, your objects behave like sequences or tuples, adorned with a specific type/class, which, again, is currently expressed as the "class(tuple)" pattern such as `Add((Mul(), Mul()))`.

There are two possible ways to approach this issue. We could introduce patterns that extract "sequential elements" via `__getitem__` rather than attributes. Or we could have a special method `__match__` that might return a representation of the object's data in sequential form.

The `__getitem__` approach turned out to come with quite a few complications. In short: it is very hard to assess an object's possibly sequential structure in a non-destructive way. Because the new pattern matching structure may try several cases against the same subject, we cannot just use an iterator as in unpacking. And `__getitem__` itself is extremely versatile, being used, e.g., for both sequences and mappings. We therefore ended up supporting only built-in structures like tuples, lists, and dicts for now, for which the interpreter can easily determine how to handle `__getitem__`.
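The problem with iterators can be seen without any pattern matching syntax at all. The following sketch uses two hypothetical helpers, `try_pair` and `try_triple`, each standing in for one "case" that tests a shape by plain unpacking:

```python
def try_pair(items):
    """Stand-in for a case that expects exactly two elements."""
    try:
        a, b = items
    except ValueError:
        return None
    return (a, b)


def try_triple(items):
    """Stand-in for a case that expects exactly three elements."""
    try:
        a, b, c = items
    except ValueError:
        return None
    return (a, b, c)


# With a list, a failed first attempt leaves the data untouched.
data = [1, 2, 3]
list_results = (try_pair(data), try_triple(data))

# With an iterator, the failed first attempt consumes the elements,
# so the second attempt fails even though the data had three items.
stream = iter([1, 2, 3])
iter_results = (try_pair(stream), try_triple(stream))

print(list_results)
print(iter_results)
```

The first attempt has to pull elements out of the iterator just to discover that the shape does not fit, and there is no general way to put them back for the next case.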

The `__match__` protocol, on the other hand, is something that we deferred so that we can make sure it really is powerful and well designed enough to handle a wide range of use cases. One of the more interesting use cases I had in mind, e.g., was to destructure data that comes as byte strings (something that Rhodri James [1] has brought up, too). And I think you have just added another very interesting use case to take into consideration. But probably the best course of action is really to gain some experience and collect some additional use cases.

Kind regards,
Tobias

[1] https://mail.python.org/archives/list/python-dev@python.org/message/WD2E3K5TWR4E6PZBM4TKGHTJ7VDERTDG/