[Python-Dev] Re: PEP 622: Structural Pattern Matching

23 Jun 2020

(I'm replying to several messages in one reply. But there is too much to
respond to, so this is merely the first batch.)

On Tue, Jun 23, 2020 at 10:10 AM MRAB wrote:
...
Why are:
case ._:
not OK?
In this:
    case .BLACK:
        ...
    case BLACK:
        ...
the first matches the value against 'BLACK' and the second succeeds and
binds the value to 'BLACK'.
Replacing 'BLACK' with '_':
    case ._:
        ...
    case _:
        ...
I'd expect something similar, except for the binding part.
I think the same could be said for:
case Color.BLACK:
and:
case _.BLACK:
We disallow ._ and _.xxx because when plain _ is used as a pattern it is a
*wildcard*, not an identifier. The pattern compiler must special-case _ as
a target because [x, x] is an invalid pattern (you can't bind the same
variable twice) but [_, _] is valid (it means "any non-string sequence of
two elements"). There are some other edge cases where _ is special-cased
too. But basically we want to prevent users doing a double take when they
see _ used as a wildcard on one line and as a variable on the next.
...
I also wonder whether "or" would be clearer than "|":
case .BLACK or Color.BLACK:
That's just bikeshedding. I personally don't think it would be clearer.
Other languages with pattern matching tend to use "|" (as do regular
expressions :-).

On Tue, Jun 23, 2020 at 10:28 AM Antoine Pitrou wrote:
...
* What you call "Constant Value Patterns" can really refer to any local
  or non-local name, regardless of how complex the referred object is,
  right?  Also, __eq__ is called in that case, not __match__?
Yeah, we're considering renaming these to "Value Patterns". There's nothing
particularly constant about them. And yes, they use __eq__. Basically these
and literals are processed exactly the same once the value is obtained.
...
* If I understand correctly, these:
case b"":
        print("it's an empty bytes object")
and
case bytes():
        print("it's a bytes object")
have entirely different meanings.  Am I right?  This sounds like I have
to switch contexts when reading code, based on whether I am reading
regular code or a match clause, given that semantics are quite
different.
Yes, you're right. The first is a literal pattern, the second a class
pattern.

Switching how you interpret code when reading it based on context is common
-- seeing "x, y" on the LHS of an assignment is different than on the RHS,
and seeing "x=5" in a "def" line is completely different from seeing it in
a call.
...
Instead, it seems like the latter would be more explicitly spelled out
as, e.g.:
case instanceof(bytes):
        print("it's a bytes object")
But that's arbitrary syntax. The beauty of using SomeClass() is that the
pattern (if you squint :-) looks like a constructor for an object.
SomeClass() is just an edge case of the general form SomeClass(subpattern1,
subpattern2, ..., arg5=subp5, arg6=subp6, ...).
...
* The acronym "ADT" is never defined.
Yes it is, you missed this sentence:

A design pattern where a group of record-like classes is
combined into a union is popular in other languages that support pattern
matching and is known under a name of algebraic data types [2]_ or ADTs.
...
* """If there are more positional items than the length of
  __match_args__, an ImpossibleMatchError is raised."""
What if there are fewer positional items than ``len(__match_args__)``?
Can the match succeed rather than raise ImpossibleMatchError?  This
seems potentially error-prone.
Yes, it can succeed. Honestly, we could have gone the other way on this
one, but we figured that there are plenty of functions and class
constructors that can be called with a variable number of positional
arguments, since some arguments have default values. We even invented an
optional attribute __match_args_required__ (an int) that would have given
the minimal number of positional arguments required, but it was deemed too
obscure (and the name is ugly). Also we definitely wanted to be able to
write `case Point():` for the isinstance check.
...
Overall, my main concern with this PEP is that the matching semantics
and pragmatics are different from everything else in the language.
When reading and understanding a match clause, there's a cognitive
overhead because suddenly `Point(x, 0)` means something entirely
different (it doesn't call Point.__new__, it doesn't look up `x` in the
locals or globals...).  Obviously, there are cases where this is
worthwhile, but still.
Quite a few other languages have done this and survived. And Python already
has LVALUES and RVALUES that look the same but have different meanings. (In
fact the pattern syntax for sequences was derived from those.)
...
It may be useful to think about different notations for these new
things, rather than re-use the object construction notation.
For example:
case Point with (x, y):
         print(f"Got a point with x={x}, y={y}")
or:
case Point @ (x, y):
         print(f"Got a point with x={x}, y={y}")
(yes, "@" is the matrix multiplication operator... but it's probably
much less likely to appear in common code and especially with a class
object at the left)
Believe me, we did plenty of bikeshedding in private. But if one of your
proposals gets overwhelming support we can revisit this.

On Tue, Jun 23, 2020 at 11:41 AM Ethan Furman wrote:
...
Testing my understanding -- the following snippet from the PEP
    match group_shapes():
        case [], [point := Point(x, y), *other]:
will succeed if group_shapes() returns two lists, the first one being
empty and the second one starting with a Point() ?
Correct. And it binds four variables: point, x, y, and other.
...
---
Runtime Specifications
The __match__ protocol
---
Suffers from several indentation errors (the nested lists are not).
I don't see this any more. Maybe someone already fixed it?
...
-------------------------------------------------------------------------
My biggest complaint is the use of
case _:
Unless I'm missing something, every other control flow statement in Python
that can have an "else" type branch uses "else" to denote it:
if/else
for/else
while/else
try/except/else
Since "else" perfectly sums up what's happening, why "case _" ?
    match something:
        case 0 | 1 | 2:
            print("Small number")
        case [] | [_]:
            print("A short sequence")
        case str() | bytes():
            print("Something string-like")
        else:
            print("Something else")
Because it's not needed. In all those cases you mention the else clause
provides a feature that could not be expressed otherwise. But "case _:" is
going to work regardless, so we might as well use it. Rust, Scala and F# do
this too.

On Tue, Jun 23, 2020 at 12:07 PM Barry Warsaw wrote:
...
Couldn’t you adopt a flat indentation scheme with the minor change of
moving the expression into a `match:` clause?  E.g.
match:
    expression
case a:
    foo()
case b:
    bar()
else:
    baz()
I didn’t see that in the rejected alternatives.
We discussed it, but ultimately rejected it because the first block would
be a novelty in Pythonic syntax: an indented block whose content is a
single expression rather than a sequence of statements. Do you think we
need to add this to the Rejected Ideas section?
...
I’m with others who think that `else:` would be a better choice than `case
_:`.  Given my background in i18n (see the flufl.i18n library, etc.), my
red flags go up when I see bare underscores being given syntactic meaning.
Ah, but bare _ has meaning throughout patterns -- it's a wildcard. This
follows the convention (common outside i18n work) that _ is a throwaway
target, e.g. for x, _, _ in points: print(x).

...
For example, the restriction against `case _.a:` *could* interact badly
with my library.  There, _ is the most common name binding for an object
that implements the translation semantics.  Therefore, it has attributes.
I can’t think of a concrete example right now, but e.g. what if I wanted to
match against `_.code`?  That wouldn’t be legal if I’m understanding
correctly (`code` being an attribute of the object typically bound to _).
Correct. You'd have to create an alias f = _ before entering the match and
then write `case f.code`.

MRAB brought this up too -- honestly if there's enough support for allowing
._ and _. we could easily allow it, it's currently just forbidden to
prevent user confusion, not for any deep reason.
...
I’m also concerned about the .BLACK vs BLACK example.  I get why the
distinction is there, and I get that the PEP proposes that static analyzers
help the ambiguity, but it strikes me as a potential gotcha and a source of
mysterious errors.
That's definitely a possibility (and some of the PEP's authors, being the
first people playing with a working implementation, have already
experienced this).

In fact this is one of the most debated issues for this PEP, and no
solution is entirely satisfactory. (My personal favorite was checking the
first letter of undotted names -- if it's a lowercase letter it's a
variable to be bound, if it's Uppercase it's a value to be loaded. This
follows PEP 8 recommendations for naming constants.)
...
Why not just bare `*` instead of `*_` in patterns?
Because we're trying to make sequence patterns look like sequence unpacking
assignments, and they support *rest. And in mapping patterns we support
**rest.
...
The PEP is unclear about what kind of method __match__() is.  As I was
reading along, I suspected it must be a static or class method, explicitly
cannot be an instance method because the class in the case statement is
never instantiated, but it’s not until I got to the object.__match__()
discussion that this distinction was made clear.  I think the PEP should
just be explicit upfront about that.
Good point. I'll add a clarification.
...
As a side note, I’m also concerned that `case Point(x, y)` *looks* like it
instantiates `Point` and that it’s jarring to Python developers that they
have to mentally switch models when reading that code.
Other languages using the same convention (e.g. Scala) seem to have no
problem with this. Basically we'll all have to learn that what comes after
"case" is *not* an expression. (Just like you have to learn that inside
f-strings {...} is an interpolation, not a dictionary. :-)
...
I was also unclear that __match__() had to return an object until
much later in the PEP.  My running notes asked:
# Is returning None better than raising an exception? This won’t work:
class C:
    def __init__(self, x=None):
        self.x = x
    @staticmethod
    def __match__(obj):
        # Should just return obj?
        return obj.x
match C():
    case x:
        print(x)
But once I read on, I realized that __match__() should return `obj` in
this case.  Still, the fact that returning None to signal a case arm not
matching feels like there’s a gotcha lurking in there somewhere.
Hm, there's a whole section on the result value of __match__, but perhaps
it came too late. Brett also found the description of __match__ hard to
read -- I will try to add an example and mention ahead of time that
__match__ must return an object or None.
...
Should @sealed be in a separate PEP?  It’s relevant to the discussion in
622, but seems like it would have use outside of the match feature so
should possibly be proposed separately.
Hm, that would be a very short PEP, and it's really not all that useful
without a match statement.
...
The PEP is unclear whether `case` is also a soft keyword.  I’m guessing it
must be.
Yes, will clarify. (Though it is mentioned under Backwards Compatibility.
:-)

-- 
--Guido van Rossum (python.org/~guido)