[Python-Dev] Re: The semantics of pattern matching for Python

20 Nov 2020

      Hi Daniel and Mark,

Sorry for being slightly late to the party, but please let me add a  
few remarks of my own to the discussion here.

1. MUST HAVE PRECISELY DEFINED SEMANTICS

Yes, there are some aspects that we left open intentionally.  Most  
prominently the question of how often the pattern matching engine will  
check whether the subject is an instance of a particular class.  Take  
the following trivial example::

  match some_data:
      case Pair(12, 34):
          ...
      case Triple(12, 34, z):
          ...
      case Pair(12, y):
          ...
      case Pair(x, y):
          ...

In a perfect world, the compiler discovers that it must check whether  
``some_data`` is an instance of ``Pair`` exactly once and not three  
times.  This, of course, plays right into Mark's second point on  
efficiency and seems obvious enough.  Yet, as soon as we are  
considering nested patterns, it turns much less obvious whether the  
compiler is supposed to cache repeated isinstance-checks.  Can we  
really expect that the compiler must discover that in both case  
clauses the first element is checked against the same class?  Or would  
it make more sense to simply expect the compiler to potentially  
perform this ``Num`` instance check twice::

  match some_data:
      case [ Num(), 12 ]:
          ...
      case [ Num(), y, *z ]:
          ...

It is easy to think of cases where we accidentally end up calling an  
isinstance check more than once because the compiler could not prove  
that they are equal.  Still, whenever possible we want to give the  
compiler the freedom to optimise the pattern matching statement by  
caching.

In a static language, all of this would not be an issue at all, of  
course.  In Python, however, we end up being caught between its  
dynamic features and the desire to make pattern matching reasonably  
efficient.  So, we ended up leaving the question open as how often the  
pattern matching engine is allowed or supposed to check instances.   
Naturally, if you go and write some isinstance-check on a class with  
side-effects, you can break it.

2. USERS SHOULD NOT HAVE TO PAY AN UNNECESSARY PERFORMANCE PENALTY TO  
USE PATTERN MATCHING

To quote Mark [1] here:

/> Users should not have to pay an unnecessary performance penalty to  
use pattern matching./

Alright, what does this even mean?  What is an unnecessary performance  
penalty?  How should that be measured or compared?

Pattern matching is not just fancy syntax for an if-elif-statement,  
but a new way of writing and expressing structure.  There is currently  
nothing in Python that fully compares to pattern matching (which is  
obviously why we propose to add in the first place).  So, do you want  
to compare a pattern matching structure to an if-elif-chain or rather  
an implementation using reflection and/or the visitor pattern?  When  
implementing pattern matching, would we be allowed to trade off a  
little speed handling the first pattern for moving faster to patterns  
further down?

Do our PEPs really read to you like we went out of our ways to make it  
slow or inefficient?  Sure, we said let's start with an implementation  
that is correct and worry about optimising it later.  But I thought  
this is 101 of software engineering, anyway, and am thus rather  
surprised to find this item on the list.

3. FAILED MATCHES SHOULD NOT POLLUTE THE ENCLOSING NAMESPACE

  This is a slightly wider issue that has obviously sparked an entire  
discussion on this mailing list on scopes.

If there is a good solution that only assigns variables once the  
entire pattern matched, I would be very happy with that.  However, I  
think that variables should be assigned in full before evaluating any  
guards---even at the risk of the guard failing and variables being  
assigned that are not used later on.  Anything else would obviously  
introduce a mini-scope and lead to shadowing, which hardly improves  
anything with respect to legibility.

4. OBJECTS SHOULD BE ABLE DETERMINE WHICH PATTERNS THEY MATCH

Short version: no!

Class patterns are an extension of instance checks.  Leaving out the  
meta-classes at this point, it is basically the class that is  
responsible for determining if an object is an instance of it.   
Pattern matching follows the same logic, whereas Mark suggests to put  
that upside-down.  Since you certainly do not want to define the  
machinery in each instance, you end up delegating the entire thing to  
the class, anyway.

I find this suggestion also somewhat strange in light of the history  
of our PEPs.  We started with a more complex protocol that would allow  
for customised patterns, which was then ditched because it was felt as  
being too complicated.  There is still a possibility to add it later  
on, of course.  But here we are with Mark proposing to introduce a  
complex protocol again.  It would obviously also mean that we could  
not rely as much on Python's existing infrastructure, which makes  
efficient pattern matching harder, again.  I completely fail to see  
what should be gained by this.

5. IT SHOULD DISTINGUISH BETWEEN A FAILED MATCH AND AN ERRONEOUS PATTERN

This seems like a reasonable idea.  However, I do not think it is  
compatible to Python's existing culture.  Let's pick up Mark's example  
of an object ``RemoteCount`` with two attributes ``success`` and  
``total``.  You can then execute the following line in Python without  
getting any issues from the interpreter::

  my_remote_count.count = 3

Python does not discover that the attribute should have been ``total``  
rather than ``count`` here.  From a software engineering perspective,  
this is unfortunate and I would be surprised if there was anyone on  
this list who was never bitten by this.  But this is one of the prices  
we have to pay for Python's elegance in other aspects.  Requiring that  
pattern matching suddenly solves this is just not realistic.

6. SYNTAX AND SEMANTICS

There are some rather strange elements in this, such as the idea that  
the OR-pattern should be avoided.  In the matching process, you are  
also talking about matching an expression (under point 2), for  
instance; you might not really be aware of the issues of allowing  
expressions in patterns in the first place.

---

It is most certainly a good idea to start with guiding principles to  
then design and build a new feature like pattern matching.   
Incidentally, this is what we actually did before going into the  
details of our proposal.  As evidenced by extensive (!) documentation  
on our part, there is also a vision behind our proposal for pattern  
matching.

In my view, Mark's proposal completely fails to provide a vision or  
any rationale for the guiding principles, other than reference to some  
mysterious "user" who "should" or "should not" do certain things.   
Furthermore, there are various obvious holes and imprecisions that  
would have to be addressed.

Kind regards,
Tobias

[1]   
https://github.com/markshannon/pattern-matching/blob/master/precise_semantic...

Quoting Daniel Moisset :
...
[sorry for the duplicate, meant to reply-all]

    Thank you for this approach, I find it really helpful to put the  
conversation in these terms (semantics and guiding principles).       

     This is not an answer to the proposal (which I've read and  
helps me contextualize) but to your points below and how they apply  
to PEP-634. I'm also answering personally, with a reasonable guess  
about what the other authors of 634-636 would agree, but they may  
correct me if I'm wrong.
On Mon, 16 Nov 2020 at 14:44, Mark Shannon  wrote:
...
(...)
I believe that a pattern matching implementation must have the following
properties:
* The semantics must be precisely defined.
* It must be implemented efficiently.
* Failed matches must not pollute the enclosing namespace.
* Objects should be able determine which patterns they match.
* It should be able to handle erroneous patterns, beyond just syntax errors.
PEP 634 and PEP 642 don't have *any* of these properties.

     Let me answer this one by one:

     1. "The semantics must be precisely defined":
     If this happens in PEP634 I don't think it was intentional, and  
I'm pretty sure the authors would be happy to complete any  
incompleteness that it has. I would happily have a more accurate  
description (I drafted a non-official one for a much earlier version  
of  
PEP-622, https://github.com/dmoisset/notebook/blob/master/python/pep622/semantic-spec... ). Can you clarify where you see these  
imprecisions?

     2. "It must be implemented efficiently":
     I don't think "efficient implementation" was a priority in  
PEP634, although I saw your proposal defines this as "same  
performance as the equivalent if statement", and I'm quite sure that  
level of performance can be achieved (if it isn't already by  
Brandt's implementation). Finding the best way to optimise wasn't a  
priority, but I think if there was anything in our implementation  
that would make optimisations harder we would consider them as a  
change. Do you think anything like that has been presented?

     3. "Failed matches must not pollute the enclosing namespace": 
     This for me is one of the less-desirable parts of the proposal,  
and was agreed more as a matter of practicality and an engineering  
tradeoff. If you have a reasonable way of solving this (like putting  
matched variables in the stack and popping it later) reasonably I'd  
be much happier putting that in.

     4. "Objects should be able determine which patterns they match."
     This is something that you and I, and most of the authors of  
622 agree on. What we found out when discussing this is that we  
didn't have clear what and how to open that customization. Some  
customization options added a lot of complexity at the cost of  
performance, some others were very simple but it wasn't clear that  
they would be actually useful, or extensible in the future. This has  
a lot to do with this being a somewhat new paradigm in Python, and  
our lack of knowledge on what the user community may do with it  
beyond what we imagined. So the decision was "pattern matching as it  
is presented without extensibility is useful, let's get this in, and  
once we see how it is used in the wild we'll understand better what  
kind of extensibility is valuable". For a crude analogy, imagine  
trying to get the descriptor protocol right when the basic python  
object model was being implemented. These things happened as  
different times as the use of the language evolved, and my opinion  
is that customization of the match protocol must follow a similar  
path.

     5. "It should be able to handle erroneous patterns, beyond just  
syntax errors."
     I'll be answering this based on the example in your document,  
matching RemoteCount(success=True, count=count) where RemoteCount is  
a namedtuple. The code is likely an error, and I'm in general all  
for reporting errors early, but the kind of error detection proposed  
here is the kind of errors that python normally ignore. I find that  
example really similar to the kind of error you could make writing  
"if remcount.count == 3: ..." or going beyond this example "maxelems  
= configfile.read(); if len(elems) == maxelems: ...". There are many  
type errors that python ignores (especially related to equality),  
and Python has already made the decision of allowing mixed type  
equality comparisons everywhere, so why do you think pattern  
matching should be different with respect to this? In my opinion  
trying to "fix" this (or even agreeing if this is a bug or not) is a  
much more general issue unrelated to pattern matching. Given the  
current status-quo I normally trust python type checkers to help me  
with these errors, and I'd expect them to do the same with the  
"erroneous" match statement.
If there are other examples you had in mind when you wrote this I'd  
also be happy to discuss those.

     I'll try to get some time to review your specific  
counterproposal later, thanks for it.

     Best,
         Daniel
...

[Python-Dev] Re: The semantics of pattern matching for Python

Tobias Kohn