[Python-Dev] Re: PEP 622 aspects

19 Jul 2020

      Hi Koos,

  Let me try and address some of the concerns and questions you are  
raising.  I am replying here to two emails of yours so as to keep  
traffic down.

  Quoting Koos Zevenhoven :
...
(1) Class pattern that does isinstance and nothing else.
If I understand the proposed semantics correctly, `Class()` is  
equivalent to checking `isinstance(obj, Class)`, also when  
`__match_args__` is not present. However, if a future match protocol  
is allowed to override this behavior to mean something else, for  
example `Class() == obj`, then the plain isinstance checks won't  
work anymore! I do find `Class() == obj` to be a more intuitive and  
consistent meaning for `Class()` than plain `isinstance` is.
Instead, the plain isinstance check would seem to be well described  
by a pattern like `Class(...)`. This would allow isinstance checks  
for any class, and there is even a workaround if you really want to  
refer to the Ellipsis object. This is also related to the following  
point.
(2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)`
In the proposed semantics, cases like this are equivalent. I can see  
why that is desirable in many cases, although `Class(x=1, ...)` would  
make it more clear. A possible improvement might be to add an  
optional element to  `__match_args__` that separates optional  
arguments from required ones (although "optional" is not the same as  
"don't care").
...
This is related to one of my concerns regarding PEP 622. It may be  
tempting to see pattern matching as a form of assignment. However,
Please let me answer these two questions in reverse order, as I think  
it makes more sense to tackle the second one first.

  **2. ATTRIBUTES**

  There actually is an important difference between `Class(x=1, y=_)`  
and `Class(x=1)` and it won't do to just write `Class(x=1,...)`  
instead.  The form `Class(x=1, y=_)` ensures that the object has an  
attribute `y`.  In a way, this is where the "duck typing" is coming in.

  The class of an object and its actual shape (i.e. the set of  
attributes it has) are rather loosely coupled in Python: there is  
usually nothing in the class itself that specifies what attributes an  
object has (other than the good sense to add these attributes in  
`__init__`).  Conceptually, it therefore makes sense to not only  
support `isinstance` but also `hasattr`/`getattr` as a means to  
specify the shape/structure of an object.

  Let me give a very simple example from Python's `ast` module.  We  
know that compound statements have a field `body` (for the suite) and  
possibly even a field `orelse` (for the `else` part).  But there is no  
common superclass for compound statements.  Hence, although it is  
shared by several objects, you cannot detect this structure through  
`isinstance` alone.  By allowing you to explicitly specify attributes  
in patterns, you can still use pattern matching notwithstanding:
```
import ast

# `node` is assumed to be an AST node, e.g. an element of ast.parse(...).body
match node:
    case ast.stmt(body=suite, orelse=else_suite) if else_suite:
        # a statement with a non-empty else-part
        ...
    case ast.stmt(body=suite):
        # a compound statement without else-part
        ...
    case ast.stmt():
        # a simple statement
        ...
```

  The very basic form of class patterns could be described as  
`C(a_1=P_1, a_2=P_2, ...)`, where `C` is a class to be checked through  
`isinstance`, and the `a_i` are attribute names to be extracted by  
means of `getattr` to then be matched against the subpatterns `P_i`.   
In short: you specify the shape of an object not only by its class,  
but also by the attributes it is required to have.
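
  As a rough sketch of that reading (an illustration only, not the  
implementation proposed in the PEP), a class pattern with keyword  
arguments can be approximated by an `isinstance` check followed by one  
`getattr` per named attribute.  The helper names `match_class` and  
`anything`, and the use of plain predicates as subpatterns, are  
assumptions made for this example:
```python
import ast

_MISSING = object()  # sentinel: distinguishes "no such attribute" from None


def match_class(subject, cls, **subpatterns):
    """Approximate C(a_1=P_1, ...): return the captured attributes or None.

    Each subpattern is a predicate here; real patterns can also bind names.
    """
    if not isinstance(subject, cls):
        return None
    captured = {}
    for name, predicate in subpatterns.items():
        value = getattr(subject, name, _MISSING)
        if value is _MISSING or not predicate(value):
            return None  # attribute absent or subpattern failed
        captured[name] = value
    return captured


def anything(value):
    """Plays the role of the wildcard `_`: accepts any value."""
    return True


tree = ast.parse("while x:\n    pass\ny = 1\n")
loop, assign = tree.body

# ast.stmt(body=_) matches the loop, which has a `body` attribute ...
print(match_class(loop, ast.stmt, body=anything) is not None)    # True
# ... but not the assignment, which is a statement without `body`.
print(match_class(assign, ast.stmt, body=anything) is not None)  # False
# ast.stmt() on its own is just the isinstance check.
print(match_class(assign, ast.stmt) is not None)                 # True
```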

  Particularly for very simple objects, it becomes annoying to specify  
the attribute names each time.  Take, for instance, the  
`Num`-expression from the AST.  It has just a single field `n` to hold  
the actual number.  But the AST objects also contain an attribute  
`_fields = ('n',)` that not only lists the *relevant* attributes, but  
also specifies an order.  It thus makes sense to introduce a  
convention that in `Num(x)` without argument name, the `x` corresponds  
to the first field `n`.  Likewise, you write `UnaryOp('+', item)`  
without the attribute names because `_fields=('op', 'operand')`  
already tells you what attributes are meant.  That is essentially the  
principle we adopted through introduction of `__match_args__`.

**1. MATCH PROTOCOL**

  I am not entirely sure what you mean by `C() == obj`.  In most cases  
you could not actually create an instance of `C` without some  
meaningful arguments for the constructor.

  The idea of the match-protocol is very similar to how you can  
already override the behaviour of `isinstance`.  It is not meant to  
completely change the semantics of what is already there, but to allow  
you to customise it (in some exciting ways ^_^).  Of course, as with  
everything customisable, you could go off and do something funny with  
it, but if it then breaks, that's quite on you.

  With the caveat that this is **NOT PART OF THIS PEP (!)**, let me try  
and explain why we would consider a match protocol in the first  
place.  The standard example to consider is that of complex numbers.  In  
Python, complex numbers are represented in their "rectangular" form,  
i.e. as `c = a + b*j` with a real and an imaginary part.  However,  
this is not the only way to represent a complex number.  You could  
equally write it in its polar form as `c = r * exp(j * phi)`.   
Depending on the context, this second form has some advantages, e.g.,  
when computing the root or power of `c`.

  So, what we would like to do is write a pattern like this:
```
import cmath

class Polar:
    def __init__(self, r, p=0):
        if isinstance(r, complex):
            # rectangular -> polar; cmath.polar returns (radius, phi)
            r, p = cmath.polar(r)
        self.radius = r
        self.phi = p

match some_complex_number:
    case Polar(radius=r, phi=p):
        ...
```
Naively, however, this will always fail because a complex number `c`  
in Python is never an instance of my custom class `Polar`.  Just  
overriding the `isinstance` behaviour of `Polar` will not suffice,  
either, because we are then trying to access attributes that are not  
there (namely `radius` and `phi`).  Our original approach was  
therefore to allow `Polar` to swap the subject of pattern matching for  
further processing inside a given case clause.  An `__instancecheck__` on  
steroids, if you will.  Something along the lines of:
```
class Polar:
    @staticmethod
    def __match__(original_subject):
        # CANNOT_MATCH stands for a sentinel signalling that the pattern fails
        if isinstance(original_subject, Polar):
            return original_subject
        elif isinstance(original_subject, complex):
            return Polar(original_subject)
        else:
            return CANNOT_MATCH
```
There are various valid concerns with this initial idea of the match  
protocol, and we will probably be aiming for a simpler variant that  
addresses actual use cases as best as possible.  But any such future  
extension will be an opt-in extension of current semantics and not a  
replacement that suddenly changes the meaning of class patterns  
altogether.

Quoting Koos Zevenhoven :

that is quite a stretch, both conceptually and as a future direction.  
There is no way these 'match expressions' could be allowed in regular  
assignments – the way names are treated just needs to be different.  
And allowing them in walrus assignments doesn't make much sense either.

We probably agree here on an important aspect: in Python we cannot  
simply extend the idea of patterns (as proposed by PEP 622) to general  
assignments.  But this in itself hardly says anything about the  
validity of the proposal.  Passing arguments to a function is clearly  
a form of assignment that follows slightly different rules to "normal"  
assignment as a stand-alone statement, not to mention special  
assignment structures like `for` loops or `with` blocks.  Up to a  
certain point, it is even debatable what "assignment" means: some  
functional languages would argue that they work without assignments  
altogether, because assignment is well hidden in parameter passing.

  Then there are other statements such as `return` that are only valid  
in the context of a function, although we certainly could attach  
meaning to `return` on the module level as well.  There are good  
reasons not to do that, and yet this restriction does not make  
`return` any less valid.
...
Conceptually, it is strange to call this match operation an  
assignment. Most of the added power comes from checking that the  
object has a certain structure or contents – and in many cases, that  
is the only thing it does! As a (not always) handy side product, it  
is also able to assign things to specified targets. Even then, the  
whole pattern is not assigned to, only parts of it are.
In mathematics, assignment (definition) and re-assignment is often  
denoted with the same sign as equality/identity, because it is  
usually clear from the context, which one is in question. Usually,  
however, it matters which one is in question. Therefore, as we well  
know, we have = for assignment, == for equality, and := to emphasize  
assignment. Matching is closer to ==, or almost :==.

In general, patterns have a "compare and assign" semantics, or  
perhaps "filter and assign".  And, indeed, you can forgo the  
assignment aspect completely, but that's conceptually not what  
patterns are meant for.  Moreover, the syntax is flexible enough to  
mask a lot of what would have been an assignment in the original  
conception of a pattern.

  I would claim this is quite similar to functions.  In Python, all  
functions return a value, even though you might throw away that value  
in many cases, particularly if the only value a function can return is  
`None`.  Still, I would certainly not go as far as saying that the  
concept of returning a value is wrong just because it is only put to  
use in some cases.  In the end, having one unifying concept for  
everything can be quite helpful (although it is an abstraction that  
does not come easily to novices but must be explicitly learned first).

  Just as an aside: mathematics actually has neither assignments nor  
re-assignments.  That's a very "computer sciency" reading of  
mathematical equations, just as `x = x + 1` in programming famously is  
_not_ a mathematical equation but rather an assignment.  As this  
confusion is often an issue for students learning to program, I think  
we should be careful to properly distinguish these concepts.

  Kind regards,
Tobias


Tobias Kohn