Request for comments on final version of PEP 653 (Precise Semantics for Pattern Matching)

Hi everyone,

As the 3.10 beta is not so far away, I've cut PEP 653 down to the minimum needed for 3.10. The extensions will have to wait for 3.11. The essence of the PEP is now that:

1. The semantics of pattern matching, although basically unchanged, are more precisely defined.
2. The __match_kind__ special attribute will be used to determine which patterns to match, rather than relying on the collections.abc module.

Everything else has been removed or deferred. The PEP now makes only the slightest changes to semantics, which should be undetectable in normal use. For those corner cases where there is a difference, it is to make pattern matching more robust. E.g. with PEP 653, pattern matching will work in the collections.abc module; with PEP 634 it does not.

As always, all thoughts and comments are welcome.

Cheers,
Mark.

On Sat, 27 Mar 2021 at 13:40, Mark Shannon <mark@hotpy.org> wrote:
Hi Mark, Thanks for putting this together.
It would take me some time to compare exactly how this differs from the current state after PEP 634 but I certainly prefer the object-model based approach. It does seem that there are a lot of permutations of how matching works but I guess that's just trying to tie up all the different cases introduced in PEP 634.
Maybe I misunderstood, but it looks to me as if this (PEP 653) changes the behaviour of a mapping pattern in relation to extra keys. In PEP 634 extra keys in the target are ignored, e.g.:

obj = {'a': 1, 'b': 2}
match obj:
    case {'a': 1}:
        # matches obj because key 'b' is ignored

In PEP 634 the use of **rest is optional if it is desired to capture the other keys, but it does not affect matching. Here in PEP 653 there is the pseudocode:

# A pattern not including a double-star pattern:
if $kind & MATCH_MAPPING == 0:
    FAIL
if $value.keys() != $KEYWORD_PATTERNS.keys():
    FAIL

My reading of that is that all keys would need to match unless **rest is used to absorb the others. Is that an intended difference? Personally I prefer extra keys not to be ignored by default, so to me that seems an improvement. If intentional, though, it should be listed as another semantic difference.
E.g. With PEP 653, pattern matching will work in the collections.abc module. With PEP 634 it does not.
As I understood it, this proposes that match obj: should use the class attribute type(obj).__match_kind__ to indicate whether the object being matched should be considered a sequence or a mapping or something else, rather than using isinstance(obj, Sequence) and isinstance(obj, Mapping). Is there a corner case here where an object can be both a Sequence and a Mapping? (How does PEP 634 handle that?)

Not using the Sequence and Mapping ABCs is good IMO. I'm not aware of other core language features requiring the use of ABCs. In SymPy we have specifically avoided them because they slow down isinstance checking (this is measurable in the time taken to run the whole test suite). Using the ABCs in PEP 634 seems surprising given that the original pattern matching PEP actually listed the performance impact of isinstance checks as part of the opening motivation. Maybe the ABCs can be made faster, but either way using them like this seems not in keeping with the rest of the language.

Oscar

Hi Oscar, Thanks for the feedback. On 27/03/2021 4:19 pm, Oscar Benjamin wrote:
I missed that when updating the PEP, thanks for pointing it out. It should be the same as for the double-star pattern:

if not $value.keys() >= $KEYWORD_PATTERNS.keys():
    FAIL

I'll update the PEP.
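The corrected guard can be exercised with plain dict views; a minimal sketch of the subset test Mark describes:

```python
# Dict views support set comparisons, so `>=` means "is a superset of".
# A mapping pattern without ** fails only if some pattern key is absent.
value = {'a': 1, 'b': 2}
keyword_patterns = {'a': 1}

# The pattern's keys are a subset of the subject's keys, so no FAIL:
assert value.keys() >= keyword_patterns.keys()

# A pattern mentioning a key the subject lacks would FAIL:
assert not ({'a': 1}.keys() >= {'a': 1, 'c': 3}.keys())
```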
I don't have a strong enough opinion either way. I can see advantages to both ways of doing it.
If you define a class as a subclass of both collections.abc.Sequence and collections.abc.Mapping, then PEP 634 will treat it as both sequence and mapping, meaning it has to try every pattern. That prevents the important (IMO) optimization of checking the kind only once. Cheers, Mark.
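For concreteness, a contrived class of the kind Mark describes (purely illustrative; no real library is assumed to do this):

```python
from collections.abc import Mapping, Sequence

class Both(Sequence, Mapping):
    """A deliberately ill-conceived class that is both ABCs at once."""
    def __init__(self, items):
        self._items = list(items)
    def __getitem__(self, i):
        return self._items[i]
    def __len__(self):
        return len(self._items)
    def __iter__(self):
        return iter(self._items)

b = Both([10, 20])
# PEP 634's isinstance-based checks say yes to both kinds of pattern,
# so both sequence and mapping patterns would have to be tried:
assert isinstance(b, Sequence) and isinstance(b, Mapping)
```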

Hi Mark, Reading that spec will take some time. Can you please summarize the differences in English, in a way that is about as precise as PEP 634? I have some comments inline below as well. On Sat, Mar 27, 2021 at 10:16 AM Mark Shannon <mark@hotpy.org> wrote:
It would be simpler if this was simply an informational PEP, without proposing new features -- then we wouldn't have to rush. You could then propose the new __match_kind__ attribute in a separate PEP, written more in the style of PEP 634, without pseudo code.

I find it difficult to wrap my head around the semantics of __match_kind__ because it really represents a few independent flags (with some constraints) but all the text is written using explicit, hard-to-read bitwise and/or operations. Let me give it a try.

Let's call the four flag bits by short names: SEQUENCE, MAPPING, DEFAULT, SELF.

SEQUENCE and MAPPING are for use when an instance of a class appears in the subject position (i.e., for `match x`, we look for these bits in `type(x).__match_kind__`). Neither of these is set by default. At most one of them should be set.

- If SEQUENCE is set, the subject is treated like a sequence (this is set for list, tuple and other sequences, but not for str, bytes and bytearray).
- Similarly, MAPPING means the subject should be treated as a mapping, and is set for dict and other mapping types.

The DEFAULT and SELF flags are for use when a class is used in a class pattern (i.e., for `case cls(...)` we look for these bits in `cls.__match_kind__`). At most one of these should be set. DEFAULT is set on class `object` and anything that doesn't explicitly clear it.

- If DEFAULT is set, the semantics of PEP 634 apply, except for the special behavior enabled by the SELF flag.
- If SELF is set, `case cls(x)` binds the subject to x, and no other forms of `case cls(...)` are allowed.
- If neither DEFAULT nor SELF is set, `case cls(...)` does not take arguments at all.

Please correct any misunderstandings I expressed here! (And please include some kind of summary like this in your PEP.)
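Guido's summary can be made concrete with a toy encoding; the numeric values below are assumptions for illustration, not necessarily the constants PEP 653 specifies:

```python
# Hypothetical flag values; only the bit-independence matters here.
MATCH_SEQUENCE = 1 << 0  # subject: treat like a sequence
MATCH_MAPPING  = 1 << 1  # subject: treat like a mapping
MATCH_DEFAULT  = 1 << 2  # class pattern: PEP 634 default semantics
MATCH_SELF     = 1 << 3  # class pattern: case cls(x) binds the subject

class MyList(list):
    # a sequence whose class patterns behave the default way
    __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT

kind = MyList.__match_kind__
assert kind & MATCH_SEQUENCE       # sequence patterns apply
assert not (kind & MATCH_MAPPING)  # mapping patterns do not
assert not (kind & MATCH_SELF)     # class patterns deconstruct normally
```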
Also, I think that we should probably separate this out in two separate flag sets, one for subjects and one for class patterns -- it is pretty confusing to merge the flag sets into a single value when their applicability (subject or class pattern) is so different.
Let's not change this. We carefully discussed and chose this behavior (ignore extra mapping keys, but don't ignore extra sequence items) for PEP 634 based on usability.
Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.)
I am fine with changing this one aspect of PEP 634. IIRC having separate SEQUENCE and MAPPING flags just for matching didn't occur to us during the design, and we strongly preferred some kind of type-based check over checking the presence of a specific attribute like `key`. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Hi Guido, Thanks for the feedback. On 27/03/2021 10:15 pm, Guido van Rossum wrote:
It is about as close to that as I can get. The change to using __match_kind__ requires some small changes to behaviour.
`case cls():` is always allowed, regardless of flags.
I think you expressed it well. I'll add a more informal overview section to the PEP.
That would require two different special attributes, which adds bulk without adding any value. __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT should be clear to anyone familiar with integer flags.

On Mon, 29 Mar 2021, 7:47 pm Mark Shannon, <mark@hotpy.org> wrote: [Guido wrote]
The combined flags might be clearer if the class matching flags were "MATCH_CLS_DEFAULT" and "MATCH_CLS_SELF" Without that, it isn't obvious that they're modifying the way class matching works. Alternatively, given Guido's suggestion of two attributes, they could be "__match_container__" and "__match_class__". The value of splitting them is that they should compose better under inheritance - the container ABCs could set "__match_container__" appropriately without affecting the way "__match_class__" is set. An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API. Cheers, Nick.

On Mon, Mar 29, 2021 at 7:35 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
+1

An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API.
Since the two flag sets are independent, the bulk is only apparent. Few classes would need to set one of these, let alone two. In the C layer they may be combined as part of tp_flags (assuming there are enough free bits).

Overall, I am still uncomfortable with PEP 653, and would probably not support its acceptance. Although it has thankfully become a much less radical proposal than it was a few weeks ago (thanks, Mark, for your attention to our feedback), I feel that the rules it binds implementations to are *very* premature, and that the new mechanisms it introduces to do so only modestly improve potential performance at great expense to the ease of learning, using, and maintaining code using structural pattern matching. A few notes follow:
Maybe I'm missing something, but I don't understand at all how the provided code snippet relies on the self-matching behavior. Have the maintainers of SymPy (or any large library supposedly benefitting here) come out in support of the PEP? Are they at least aware of it? Have they indicated that the proposed idiom for implementing self-matching behavior using a property is truly too "tricky" for them? Have you identified any stdlib classes that would benefit greatly from this? For me, `__match_class__` feels like a feature without demonstrated need.

Even if there is a great demand for this, I certainly think that there are far better options than the proposed flagging system:

- A `@match_self` class decorator (someone's bound to put one on PyPI, at any rate).
- Allowing `__match_args__ = None` to signal this case (an option we previously considered, and my personal preference).

...both of which can be added later, if needed.

Further, PEP 634 makes it very easy for libraries to support Python versions with *and* without pattern matching (something I consider to be an important requirement). The following class works with both 3.9 and 3.10:

```
class C(collections.abc.Sequence):
    ...
```

While something like this is required for PEP 653:

```
class C:
    if sys.version_info >= (3, 10):
        from somewhere import MATCH_SEQUENCE
        __match_container__ = MATCH_SEQUENCE
    ...
```
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose? And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries.

At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;)

A couple of other, more technical notes:

- PEP 653 requires mappings to have a `keys()` method that returns an object supporting set inequality operations. It is not really that common to find this sort of support in user code (in my experience, it is more likely that user-defined `keys()` methods will return iterables). It's not even clear to me if this is an interface requirement for mappings in general. For example, `weakref.WeakKeyDictionary` and `weakref.WeakValueDictionary` presently do not work with PEP 653's requirements for mapping patterns, since their `keys()` methods return iterators.
- Treating `__getitem__` as pure is problematic for some common classes (such as `defaultdict`). That's why we use two-argument `get()` instead.

As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code).

Brandt
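Brandt's `defaultdict` point is easy to demonstrate; a short sketch:

```python
from collections import defaultdict

# __getitem__ is not pure here: looking up a missing key inserts it.
d = defaultdict(int)
_ = d['missing']
assert 'missing' in d          # the lookup mutated the mapping

# Two-argument get() reports absence without mutating the subject,
# which is why PEP 634's mapping patterns use it.
d2 = defaultdict(int)
sentinel = object()
assert d2.get('absent', sentinel) is sentinel
assert 'absent' not in d2
```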

Hi Brandt, On 30/03/2021 5:25 pm, Brandt Bucher wrote:
No, I'm missing something. That should read

    case Mul(args=[Symbol(a), Symbol(b)]) if a == b: ...

I'll fix that in the PEP, thanks.
The distinction between those classes that have the default behavior and those that match "self" is from PEP 634. I didn't introduce it. I'm just proposing a more principled way to make that distinction.
Or:

class C:
    __match_container__ = 1  # MATCH_SEQUENCE

Which is one reason the PEP states that the values of MATCH_SEQUENCE, etc. will never change.
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose?
No reasonable scenario, but unreasonable scenarios happen all too often.
And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
It's misleading because a match statement doesn't include any explicit imports.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries.
Half a microsecond is thousands of instructions on a modern CPU. That is a long time for a single VM operation.
At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
Why do you say it is a premature optimization? Its primary purpose is reliability and precise semantics. It is more optimizable, I agree, but that is hardly premature. You also say it is confusing, but I think it is simpler than the workarounds to match "self" that you propose. This is very subjective though. Evidently we think differently.
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;)
Indeed, but there is only undefined behavior if a class violates clearly specified rules. The undefined behavior in PEP 634 is much broader. We already tolerate some amount of undefined behavior. For example, dictionary lookup is also undefined for classes which do not hash properly.
Thanks for pointing that out. I'd noted that PEP 634 used `get()`, which is why I inserted the guard on keys beforehand. Clearly that is insufficient. I'll update the semantics to use the two-argument `get()`; it does seem more robust.
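A rough sketch of what the revised semantics might look like (names and structure here are illustrative, not the PEP's actual pseudocode):

```python
from collections import defaultdict

_SENTINEL = object()

def match_mapping_keys(subject, pattern_keys):
    """Return the matched values, or None to signal FAIL."""
    values = []
    for key in pattern_keys:
        v = subject.get(key, _SENTINEL)  # never calls __getitem__
        if v is _SENTINEL:
            return None  # FAIL: key absent
        values.append(v)
    return values

d = defaultdict(int, {'a': 1})
assert match_mapping_keys(d, ['a']) == [1]
assert match_mapping_keys(d, ['b']) is None
assert 'b' not in d  # the failed match did not mutate the defaultdict
```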
As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code).
I fully understand that a lot of work went into PEP 634, which is why I am keeping the syntax and as much of the semantics of PEP 634 as I can. The test suite is proving very useful, thanks.

The problem with waiting for 3.11 is that code will start to rely on some of the implementation details of pattern matching as it is now, and that our ability to optimize it will be delayed by a year.

Cheers,
Mark.

Hi Mark.

I've spoken with Guido, and we are willing to propose the following amendments to PEP 634:

- Require `__match_args__` to be a tuple.
- Add new `__match_seq__` and `__match_map__` special attributes, corresponding to new public `Py_TPFLAGS_MATCH_SEQ` and `Py_TPFLAGS_MATCH_MAP` flags for use in `tp_flags`. When Python classes are defined with one or both of these attributes set to a boolean value, `type.__new__` will update the flags on the type to reflect the change (using a similar mechanism as `__slots__` definitions). They will be inherited otherwise. For convenience, `collections.abc.Sequence` will define `__match_seq__ = True`, and `collections.abc.Mapping` will define `__match_map__ = True`.

Using this in Python would look like:

```
class MySeq:
    __match_seq__ = True
    ...

class MyMap:
    __match_map__ = True
    ...
```

Using this in C would look like:

```
PyTypeObject PyMySeq_Type = {
    ...
    .tp_flags = Py_TPFLAGS_MATCH_SEQ | ...,
    ...
}

PyTypeObject PyMyMap_Type = {
    ...
    .tp_flags = Py_TPFLAGS_MATCH_MAP | ...,
    ...
}
```

We believe that these changes will result in the best possible outcome:

- The new mechanism should be faster than either PEP.
- The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*.

If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce on before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so.

Thanks for pushing us to make this better.

Brandt

Hi Brandt, On 30/03/2021 11:49 pm, Brandt Bucher wrote:
I think we're all in agreement on this one. Let's just do it.
I don't like the way this needs special inheritance rules, where inheriting one attribute mutates the value of another. It seems convoluted. Consider:

class WhatIsIt(MySeq, MyMap):
    pass

With __match_container__ it works as expected, with no special inheritance rules. This was why you convinced me to split __match_kind__; it works better with inheritance.

Another reason for preferring __match_container__ is that it provides a better option for extensibility, IMO. Suppose we wanted to add a "set" pattern in the future: with __match_container__ we just need to add a new constant, whereas with your proposed approach we would need another special attribute.
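The difference can be sketched with plain class attributes (the flag values and the absence of any `type.__new__` magic are simplifications):

```python
MATCH_SEQUENCE, MATCH_MAPPING = 1, 2  # illustrative values

class MySeq:
    __match_seq__ = True
    __match_container__ = MATCH_SEQUENCE

class MyMap:
    __match_map__ = True
    __match_container__ = MATCH_MAPPING

class WhatIsIt(MySeq, MyMap):
    pass

# Two boolean attributes: both are inherited, so WhatIsIt is flagged
# as a sequence *and* a mapping at the same time:
assert WhatIsIt.__match_seq__ and WhatIsIt.__match_map__

# One combined attribute: ordinary MRO lookup picks a single value,
# so exactly one kind applies, with no special inheritance rules:
assert WhatIsIt.__match_container__ == MATCH_SEQUENCE
```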
I'm wary of using up tp_flags, as they are a precious resource, but this does provide a more declarative way to specific the behavior than setting the attribute via the C-API.
We believe that these changes will result in the best possible outcome: - The new mechanism should be faster than either PEP.
The naive implementation of the boolean version might be a tiny bit faster (it would be hard to measure a difference). However, once specialized by type version (as we do for LOAD_ATTR) both forms become a no-op.
- The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*.
The inheritance rules make __match_container__ a better user experience in Python, IMO. As for C, there is no reason why it would make any difference: __match_container__ could be (tp_flags & (Py_TPFLAGS_MATCH_SEQ|Py_TPFLAGS_MATCH_MAP)) shifted to line up the bits.
If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
Thanks for pushing us to make this better.
And thank you for the feedback. Cheers, Mark.

On Wed, Mar 31, 2021 at 2:30 AM Mark Shannon <mark@hotpy.org> wrote:
Wait a minute, do you expect WhatIsIt to be a sequence but not a map? *I* would expect that it is both, and that's exactly what Brandt's proposal does. So I see this as a plus.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
I don't know that PEP 653's semantics are better. Have you analyzed any *differences* besides the proposal above? I've personally found reading your pseudo-code very difficult, so I simply don't know.

Hi Guido, On 31/03/2021 6:21 pm, Guido van Rossum wrote:
Earlier you said:

"Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.)"

PEP 653 requires that:

(__match_container__ & (MATCH_SEQUENCE | MATCH_MAPPING)) != (MATCH_SEQUENCE | MATCH_MAPPING)

Would you require that (__match_seq__ and __match_map__) is always false? If so, then what is the mechanism for handling the `WhatIsIt` class? If not, then you lose the ability to make a single test to determine which patterns can apply.
PEP 653 semantics are more precise. I think that is better :) Apart from that, I think the semantics are so similar once you've added __match_seq__/__match_map__ to PEP 634 that it is hard to claim one is better than the other. My (unfinished) implementation of PEP 653 makes almost no changes to the test suite. The code in the examples is Python, not pseudo-code. That might be easier to follow. Cheers, Mark.

On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org> wrote:
[me, responding to Mark]
[Now back to Mark]
Ah, you caught me there. I do think that classes that combine both characteristics are in troublesome water. I think we can get optimizability either way, so I'll focus on semantics.

Brandt has demonstrated that it's ugly to write the code for a class that in match statements behaves as either a sequence or a mapping (but not both) while at the same time keeping the code compatible with Python 3.9 or before. I also think that using flag attributes that are set to True or False (instead of using a bitmap of flags, which is obscure to many Python users) solves this problem nicely.

Using separate flag attributes happens to lead to different semantics than the flags-bitmap approach in the case of multiple inheritance. Given that one *can* inherit from both Sequence and Mapping, having separate flags seems slightly better than the flags-bitmap approach. It wasn't enough to convince me earlier, but the other advantage does convince me: separate flag attributes are better than using a flags-bitmap.

Now, if it weren't for other issues, having no flags at all here, but just signalling the applicable pattern kinds through inheritance from collections.abc.{Sequence,Mapping}, would be even cleaner. But we do have other issues: (a) the exceptions for str, bytes, and bytearray, and (b) the clumsiness of importing collections.abc (which is Python code) deep in the ceval main loop. So some explicit form of signalling this is fine -- and classes that explicitly inherit from Sequence or Mapping will get it for free that way.
Nope. If so, then what is the mechanism for handling the `WhatIsIt` class?
If not, then you lose the ability to make a single test to determine which patterns can apply.
Translating the flag attributes to bits in tp_flags (or in a new flags variable elsewhere in the type object) would still allow a pretty fast test. And needing to support overlapping subsets of the cases is not unique to this situation, after all a class may well be a sequence *and* have attributes named x, y and z.
I wish I knew of a single instance where PEP 634 and PEP 653 actually differ.
I'd like to see where those differences are -- then we can talk about which is better. :-)
The code in the examples is Python, not pseudo-code. That might be easier to follow.

Hi Guido, On 31/03/2021 9:53 pm, Guido van Rossum wrote:
On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org <mailto:mark@hotpy.org>> wrote:
[snip]
Almost all the changes come from requiring __match_args__ to be a tuple of unique strings. The only other change is that case int(real=0+0j, imag=0-0j): fails to match 0, because `int` is `MATCH_SELF` so won't match attributes. https://github.com/python/cpython/compare/master...markshannon:pep-653-imple... Cheers, Mark.

On Thu, Apr 1, 2021 at 2:18 PM Mark Shannon <mark@hotpy.org> wrote:
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work, and I don't see anything semantically suspect in allowing that.
Oh, but that would be a problem. The intention wasn't that "self" mode prevents keyword/attribute matches. (FWIW, the real and imag attributes should not be complex numbers, so that test case is weird, but it should work.)
https://github.com/python/cpython/compare/master...markshannon:pep-653-imple...

On 4/1/2021 9:38 PM, Guido van Rossum wrote:
The current posted PEP does not say 'unique' and I agree with Guido that it should not.
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work,
The current near-Python code does not have such a check.
and I don't see anything semantically suspect in allowing that.
If I understand the current pseudocode correctly, the effect of 's' appearing twice in C.__match_args__ would be to possibly look up and assign C.s to two different names in a case pattern. I would not be surprised if someone someday tries to do this intentionally. Except for the repeated lookup, it would be similar to a = b = C.s. This might make sense if C.s is mutable. Or the repeated lookups could yield different values.

-- Terry Jan Reedy
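Terry's scenario, sketched without a match statement (the shipped 3.10 implementation rejects the duplicate at runtime, so the attribute lookups are spelled out by hand here):

```python
# 's' appearing twice in __match_args__ would mean two lookups of the
# same attribute, much like a = b = C.s (modulo the repeated getattr).
class C:
    s = [1, 2, 3]
    __match_args__ = ('s', 's')

obj = C()
a = getattr(obj, C.__match_args__[0])
b = getattr(obj, C.__match_args__[1])
assert a is b  # both names would bind the same (mutable) object
```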

On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu> wrote:
(Of course, "the current PEP" is highly ambiguous in this context.)

Well, now I have egg on my face, because the current implementation does reject multiple occurrences of the same identifier in __match_args__. We generate an error like "TypeError: C() got multiple sub-patterns for attribute 'a'". However, I cannot find this uniqueness requirement in PEP 634, so I think it was a mistake to implement it.

Researching this led me to find another issue where PEP 634 and the implementation differ, but this time it's the other way around: PEP 634 says about types which accept a single positional subpattern (int(x), str(x) etc.) "for these types no keyword patterns are accepted." Mark's example `case int(real=0, imag=0):` makes me think this requirement is wrong, and I would like to amend PEP 634 to strike it. Fortunately, this is not what is implemented. E.g. `case int(1, real=1):` is accepted and works, as does `case int(real=0):`. Calling out Brandt to get his opinion.

And thanks to Mark for finding these!
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, and this could even be a valid backwards compatibility measure: if a class used to have two different attributes that would in practice never differ, the two attributes could be merged into one, and someone might have a pattern capturing both, positionally. That should keep working, and having a duplicate in __match_args__ seems a clean enough solution.

On 4/2/2021 12:02 AM, Guido van Rossum wrote:
On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu
The current near-Python code does not have such a check.
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, the part I read was legal Python + $ variables + FAIL. I should have included 'pseudo'.

Guido van Rossum wrote:
The current implementation will reject any attribute being looked up more than once, by position *or* keyword. It's actually a bit tricky to do, which is why the `MATCH_CLASS` op is such a beast... it needs to look up positional and keyword attributes all in one go, keeping track of everything it's seen and checking for duplicates. I believe this behavior is a holdover from PEP 622:
The interpreter will check that two match items are not targeting the same attribute, for example `Point2d(1, 2, y=3)` is an error.
(https://www.python.org/dev/peps/pep-0622/#overlapping-sub-patterns)

PEP 634 explicitly disallows duplicate keywords, but as far as I can tell it says nothing about duplicate `__match_args__` or keywords that also appear in `__match_args__`. It looks like an accidental omission during the 622 -> 634 rewrite. (I guess I figured that if somebody matches `Spam(foo, y=bar)`, where `Spam.__match_args__` is `("y",)`, that's probably a bug in the user's code. Ditto for `Spam(y=foo, y=bar)`, and for `Spam(foo, bar)` where `Spam.__match_args__` is `("y", "y")`. But it's not a hill I'm willing to die on.)

I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.

Hi Brandt, On 02/04/2021 7:19 am, Brandt Bucher wrote:
Repeated keywords do seem likely to be a bug. Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time. So how about explicitly disallowing those, but not checking that the intersection of `__match_args__` and keywords is empty? We would get most of the error checking without the performance impact.
I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.
PEP 634 allows it. PEP 653 currently forbids it, mainly for consistency reasons. The purpose of self-matching is to prevent deconstruction, so it seems inconsistent to allow it for keyword arguments. Are there any use-cases? The test-case `int(real=0+0j, imag=0-0j)` is contrived, but I'm struggling to come up with less contrived examples for any of float, list, dict, tuple, str. Cheers, Mark.

On Fri, Apr 2, 2021 at 3:38 AM Mark Shannon <mark@hotpy.org> wrote:
Agreed. But as I sketched in a previous email I think duplicates ought to be acceptable in __match_args__. At the very least we should align the PEP and the implementation here, by adjusting one or the other. Most checks are cheap though.
Checking for duplicates in `__match_args__` can be done at class creation time,
Hm, what about dynamic updates to __match_args__? I've done that in the REPL.
and checking for duplicates in the pattern can be done at compile time.
I'd prefer not to do that check at all.
+1 on the latter (not checking the intersection).
The purpose of self-matching is user convenience. It should be seen as a shorthand for the code fragment in PEP 634 showing how to do this for any class.
There could be a subclass that adds an attribute. That's still contrived though. But if we start supporting this for *general* classes we should allow combining it with keywords/attributes. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Mark Shannon wrote:
PEP 634 says:
For a number of built-in types (specified below), a single positional subpattern is accepted which will match the entire subject; for these types no keyword patterns are accepted.
(https://www.python.org/dev/peps/pep-0634/#class-patterns)
Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time.
I assume the compile-time check only works for named keyword attributes. The current implementation already does this. -1 on checking `__match_args__` anywhere other than the match block itself. Guido van Rossum wrote:
I could see the case for something like `case defaultdict({"Spam": s}, default_factory=f)`. I certainly don't think it should be forbidden.

On Fri, Apr 2, 2021 at 12:43 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
But that's not what the implementation does. It still supports keyword patterns for these types -- and (as I've said earlier in this thread) I think the implementation is correct.
Agreed.

Hi Guido, On 02/04/2021 10:05 pm, Guido van Rossum wrote:
Why? (I also asked Brandt this) It is far more efficient to check `__match_args__` at class creation (or class attribute assignment) time. The most efficient way to check in the match block is to check at class creation time anyway and store a flag indicating whether `__match_args__` is legal. In the match block we would check this flag, then proceed. It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported to go undetected. Cheers, Mark.

On Sat, Apr 3, 2021 at 4:20 AM Mark Shannon <mark@hotpy.org> wrote:
Okay, now we're talking. If you check it on both class definition and at attribute assignment time I think that's fine (now that it's a tuple). But I don't think the specification (in whatever PEP) needs to specify that it *must* be checked at that time. So I think the current implementation is fine as well (once we change it to accept only tuples).
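Checking at both class definition and attribute assignment time, as discussed here, can be sketched with a metaclass (`CheckedMeta` is purely illustrative; the real checks would live in `type.__new__` and `type.__setattr__`):

```python
class CheckedMeta(type):
    """Validate __match_args__ at class creation and on later assignment."""

    @staticmethod
    def _check(value):
        if not isinstance(value, tuple):
            raise TypeError("__match_args__ must be a tuple")
        if len(set(value)) != len(value):
            raise TypeError("duplicate names in __match_args__")

    def __new__(mcls, name, bases, namespace):
        if "__match_args__" in namespace:
            mcls._check(namespace["__match_args__"])
        return super().__new__(mcls, name, bases, namespace)

    def __setattr__(cls, name, value):
        if name == "__match_args__":
            CheckedMeta._check(value)   # covers dynamic updates, e.g. in the REPL
        super().__setattr__(name, value)


class Point(metaclass=CheckedMeta):
    __match_args__ = ("x", "y")


try:
    Point.__match_args__ = ("x", "x")   # rejected at assignment time
    assignment_rejected = False
except TypeError:
    assignment_rejected = True
```

The `__setattr__` hook is what handles Guido's "dynamic updates in the REPL" case: a bad reassignment fails and leaves the original tuple in place.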
Yeah, nice optimization.
It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported to go undetected.
Well, usually that is The Python Way. There are a lot of things that could be detected statically quite easily (without building something like mypy) but that aren't. Often that's due to historical accidents (in the past we were even less able to do the simplest static checks), so it's fine to do this your way. BTW we previously discussed whether `__match_args__` can contain duplicates. I thought the PEP didn't state either way, but I was wrong: it explicitly disallows it, matching the implementation. PEP 634 says on line 503:

```
- For duplicate keywords, ``TypeError`` is raised.
```

Given that there is no inconsistency here, I am inclined to keep it that way. If we find a better use case to allow duplicates we can always loosen up the implementation; it's not so simple the other way around. FWIW I am also submitting https://github.com/python/peps/pull/1909 to make `__match_args__` a tuple only, which we all seem to agree on.

Hi Brandt, On 02/04/2021 8:41 pm, Brandt Bucher wrote:
I was relying on the "reference" implementation, which is also in the PEP.
I take this as +1 for having more precisely defined semantics for pattern matching :)
I'm curious, why? It is much faster *and* gives better error messages to check `__match_args__` at class creation time.
It is forbidden in the PEP, as written, correct? OOI, have you changed your mind, or was that an oversight in the original? Cheers, Mark.

On Sat, Apr 3, 2021 at 4:15 AM Mark Shannon <mark@hotpy.org> wrote:
But it's not normative. However...
In this case I propose adjusting the PEP text. See https://github.com/python/peps/pull/1908
I take this as +1 for having more precisely defined semantics for pattern matching :)
Certainly I see it as +1 for having the semantics independently verified. [...]
I was surprised to find this phrase in the PEP, so I suspect that it was just a mistake when I wrote that section of the PEP. I can't find a similar restriction in PEP 622 (the original pattern matching PEP).

Mark Shannon said:
I was relying on the "reference" implementation, which is also in the PEP.
Can you please stop putting scare quotes around "reference implementation"? You've done it twice now, and it's been a weekend-ruiner for me each time. I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list. For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you. The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term. Brandt

On Sun, 4 Apr 2021 at 01:37, Brandt Bucher <brandtbucher@gmail.com> wrote:
Agreed - apart from the implication Brandt noted, it's also misleading. The code is in Python 3.10, so the correct term is "the implementation" (or if you want to be picky, "the CPython implementation"). To me, the term "reference implementation" implies "for reference, not yet released". At this point, we're discussing fixes to an implemented Python 3.10 feature, not tidying up a PEP. Paul

On Sun, Apr 4, 2021 at 6:20 PM Paul Moore <p.f.moore@gmail.com> wrote:
Normally, the term "reference implementation" means "the basis implementation that everything else is compared against". For instance, a compression algorithm might be published as a mathematical document, with a reference implementation in some language. It's then possible to create a new implementation in some other language, or more optimized, or whatever else; but to know whether it's giving the correct results, you compare its output to the output of the reference implementation. CPython is the reference implementation for the Python language. It's possible to have a discrepancy between the standard and the implementation, but it's still the reference implementation (just occasionally a buggy one). In this case, I believe that the term "reference implementation" is strictly accurate, and concur with Brandt's request to not discredit it by implying that it's only purporting to be one. ChrisA

Antoine Pitrou writes:
On Sun, 04 Apr 2021 00:34:18 -0000 "Brandt Bucher" <brandtbucher@gmail.com> wrote:
Can you please stop putting scare quotes
"Scare quotes" refers to an idiom English writers use to deprecate something. In what I wrote just above, the quotation marks indicate a focus on the *string* "scare quotes". In this case, they mark the fact that these are the exact words of Brandt, and also that I'm defining those words, not using their meaning. In Mark's phrase '"reference" implementation', neither of those usages apply. It's possible that they are the deprecated "random quote emphasis" usage. Random quote emphasis is implausible here, however. I can see no reason why Mark would emphasize the modifier "reference" in this context. One of the most important remaining usages, and one that I find plausible in context, is scare quotes. These are quotation marks used to focus on the phrase in quotes, and indicate that it is somehow suspicious: inaccurate, imprecise, false, even the opposite of its dictionary meaning. In other words, if you don't have a reason to emphasize focus on the words themselves rather than their meaning, by adding (scare) quotes you are most likely turning a "reasonably polite expression" into an insult.
I'm probably missing something...
Probably so did a lot of native speakers; there are English dialects where scare quotes are rare and random quote emphasis is common. However, I assure you, many native speakers (along with a fair number of non-natives) did not. I neither know nor care what Mark's *intent* is. I'm explaining what (some) idiomatic speakers of English will read into what he writes, because it is a *common* idiom (common enough to have a name, and be mentioned in standard manuals of English style). Regards, Steve

Hi Brandt, On 04/04/2021 1:34 am, Brandt Bucher wrote:
I'm sorry for ruining your weekends. My intention, and I apologize for not making this clearer, was not to denigrate your work, but to question the implications of the term "reference". Calling something a "reference" implementation suggests that it is something that people can refer to, that is near perfectly correct and fills in the gaps in the specification. That is a high standard, and one that is very difficult to attain. It is why I use the term "implementation", and not "reference implementation" in my PEPs.
I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list.
It definitely wasn't my intention.
For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you.
Indeed, I reviewed the implementation. I thought it was good enough to merge. I still think that.
The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term.
Please don't take my criticisms of PEP 634 as criticisms of you or your efforts. I know it can often sound like that, but that really isn't my intent. Pattern matching is a *big* new feature, and to get it right takes a lot of discussion. Having your ideas continually battered is no fun, I know. So, I'd like to apologize again for any hurt caused. Cheers, Mark.

Mark Shannon writes:
Shoe fits, doesn't it? Both Guido and Brandt to my recall have specifically considered the possibility that the implementation is the better design, and therefore that the PEP should be changed.
That is a high standard, and one that is very difficult to attain.
That depends on context, doesn't it? In the case of a public release, yes, it's a very high standard. In the context of a feature in development, it *cannot* be that high, because even when the spec and the implementation are in *perfect* agreement, both may be changed in the light of experience or a change in requirements. Furthermore, in this instance, the implementation achieves *your* standard (Brandt, again):
both authors have agreed that it needs to be fixed in the PEP, not the implementation
You added:
It is why I use the term "implementation", and not "reference implementation" in my PEPs.
A reasonable usage. I think my more flexible, context-dependent definition is more useful. Unmodified, the word "implementation" covers everything from unrunnable pseudo-code to the high standard of a public release that is officially denoted "reference implementation". On the other hand, when Brandt says that a merge request is a "reference implementation", I interpret that to be a claim that, to his knowledge the MR is a perfect implementation of the specification, and an invitation to criticize the specification by referring to that implementation. That's a strong claim, even in my interpretation. However, I think that if the developer dares to make it, it's very useful to reviewers. As it was in this case. Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense. Any deviations from documented spec (the Language Reference) will be presumed to have to be fixed in the implementation (with due consideration for backward compatibility). "Although practicality beats purity," of course, but treating the Language Reference as authoritative is strongly preferred to keeping the implementation and modifying the Reference (at least as I understand it). Regards, Steve

On Sun, 4 Apr 2021 at 13:49, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense.
It *is* merged and publicly released - it's in the latest 3.10 alpha. That's really the point I was trying to make with my comment (I'm steering clear of the "scare quotes" discussion). The fact that the implementation kept getting referred to as the "reference implementation" confused me into thinking it hadn't been released yet, and that simply isn't true. Calling it "the implementation" avoids that confusion, IMO. Paul

Paul Moore writes:
It *is* merged and publicly released - it's in the latest 3.10 alpha.
Merged, yes, but in my terminology alphas, betas, and rcs aren't "public releases", they're merely "accessible to the public". (I'm happy to adopt your terminology when you're in the conversation, I'm just explaining what I meant in my previous post.)
The only thing I understand in that paragraph is "that [it hadn't been released yet] simply isn't true", which is true enough on your definition of "released". But why does "reference implementation" connote "unreleased"? That seems to be quite different from Mark's usage. I don't have an objection to your usage, I'd just like us all to converge on a set of terms so that Brandt has a compact way of saying "as far as I know, for the specification under discussion this implementation is completely accurate and folks are welcome to refer to the PEP, to the code, or to divergences as seems appropriate to them". I'm not sure if that's exactly what Brandt meant by "reference implementation", but that's how I understood it. Steve

On Tue, 6 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
*shrug* It's (in my experience) a continuum - it's not in a release yet, but it is available via an installer, with a (pre-release) version number. But I get what you're saying and don't disagree, to be honest. I see the discrepancy as mostly being because we're trying to use (imprecise) informal language to pin down precise nuances. The main point I was making is that it's merged into the CPython source code at this stage, and available for people to download and experiment with, which is something I was unclear about.
In my experience, people developing PEPs will sometimes provide what gets referred to as a "reference implementation" of the proposal, which is a PR or equivalent that people can apply and try out if they want to see how the proposal works in practice. That "reference implementation" is generally seen as part of the *proposal*, even if it then becomes the final merged code as well. Once it's released, it tends to no longer get called the *reference* implementation, as it's now just the implementation (in CPython) of the feature. PEP 1 uses this terminology, as well - "Standards Track PEPs consist of two parts, a design document and a reference implementation" and "Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and incorporated into the main source code repository, the status will be changed to "Final"". PEP 635 follows this terminology, with a "Reference implementation" section linking to the development branch for the feature. To put this back into the context of this discussion, when Mark was referring to the "reference implementation" it made me think that maybe we were talking about that development branch, and that the code for the pattern matching PEP hadn't yet been merged to the main branch, which is why we were still iterating over implementation details. And that led me to think that they'd better get the discussion resolved soon, as they risk missing the 3.10 deadline if things drag on. Which *isn't* the case, and if I'd been following things more closely I'd have known that, but avoiding the term "reference implementation" for the merged change would also have spared my confusion.
Agreed, a common understanding is the main thing here. And as I'm not an active participant in the discussion, and I now understand the situation, my views shouldn't have too much weight in deciding what the best terminology is. Paul

Hi Mark. Thanks for your reply, I really appreciate it. Mark Shannon said:
Interesting. The reason I typically include a "Reference Implementation" section in my PEPs is because they almost always start out as a copy-paste of the template in PEP 12 (which also appears in PEP 1): https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep https://www.python.org/dev/peps/pep-0012/#suggested-sections Funny enough, PEP 635 has a "Reference Implementation" section, which itself refers to the implementation as simply a "feature-complete CPython implementation": https://www.python.org/dev/peps/pep-0635/#reference-implementation (PEP 634 and PEP 636 don't mention the existence of an implementation at all, as far as I can tell.) It's not a huge deal, but we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here" (which is what I've usually used the section to communicate). Brandt

On 7/04/21 5:22 am, Brandt Bucher wrote:
we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here"
Maybe "prototype implementation" would be better? I think I've used that term in PEPs before. -- Greg

Greg Ewing writes:
That seems to me to correspond well to Brandt's standard as expressed above. To me, "prototype implementation" is somewhere between "proof of concept" and "reference implementation", and I welcome the additional precision. The big question is can such terms be used accurately (i.e., do various people assign similar meanings to them)? I would define them functionally as:

- proof of concept: demonstrates some of the features, especially those that were considered "difficult to implement"
- prototype implementation: implements the whole spec, so can be used by developers to prototype applications
- reference implementation: intended to be a complete and accurate implementation of the specification

By "complete and accurate" I mean that it can be used experimentally to understand what the spec means without much worry that the proponent will brush off questions with "oh, that's just not implemented yet, read the spec if you want to know how it will work when we're done." Furthermore, any divergence between spec and implementation is a bug that is actually a broken promise. (The promise implied by "reference".) Finally, as development continues there is a promise that the spec and implementation will be kept in sync (of course changes might be provisional, but even then the sync should be maintained). I don't think the Platonic ideal interpretation of "reference implementation" is very useful. Software evolves. It evolves very quickly during initial development, but it's useful to "ask the implementation" about the spec even then. That's implied by methodologies like test-driven development. There are other workflows where that's not true. My claim is that "reference implementation" can be useful to distinguish development processes where you expect the implementation to reliably reflect the spec, even in corner cases, from those where you shouldn't. And even as the software evolves.
Note that if we use this definition, then the "Reference Implementation" requirement of the PEP process becomes quite a high bar. I think we all agree on that. So I advocate, as Brandt suggested, that we revise the PEP template. In particular I think it should use Greg's term "prototype implementation". Optionally, we could make "reference implementation" available to proponents who wish to make that claim about their implementation. Steve

On Wed, 7 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I'm OK with these terms (although I don't actually think you *will* get sufficient consensus on them to make them unambiguous) but with one proviso - once the implementation is merged into the CPython source, I think it should simply be referred to as "the implementation" and qualifiers should be unnecessary (and should be considered misleading). Paul

Hi Guido, On 02/04/2021 2:38 am, Guido van Rossum wrote:
Checking for uniqueness is almost free because __match_args__ is a tuple, and therefore immutable, so the check can be done at class creation time.
I thought matching `int(real=0+0j, imag=0-0j)` was a bit weird too. The change required to make it work is trivial, but the code seems more consistent if `int(real=0+0j, imag=0-0j)` is disallowed, which is why I went for that.

Let me clarify: these two attributes do not interact with one another; each attribute only interacts with its own flag on the type. It is perfectly possible to do:

```
class WhatIsIt:
    __match_map__ = True
    __match_seq__ = True
```

This will set both flags, and this `WhatIsIt` will match as a mapping *and* a sequence. This is allowed and works in PEP 634, but like Guido I'm not entirely opposed to making the matching behavior of such a class undefined against sequence or mapping patterns.
What *is* the expected behavior of this? Based on the current behavior of PEP 634, I would expect the `__match_container__` of each base to be or'ed, and something like this to match as both a mapping and a sequence (which PEP 653 says leads to undefined behavior). The actual behavior seems more like it will just be a sequence and not a mapping, since `__match_container__` would be inherited from `MySeq` and `MyMap` would be ignored. In the interest of precision, here is an implementation of *exactly* what I am thinking: `typeobject.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... `ceval.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... (One change from my last email: it doesn't allow `__match_map__` / `__match_seq__` to be set to `False`... only `True`. This prevents some otherwise tricky multiple-inheritance edge-cases present in both of our flagging systems that I discovered during testing. I don't think there are actual use-cases for unsetting the flags in subclasses, but we can revisit that later if needed.)
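The or'ing behavior Brandt says he would expect can be prototyped with a metaclass. All names here (`MATCH_SEQUENCE`, `MATCH_MAPPING`, `__match_container__`, `MatchMeta`) are hypothetical, taken from the PEP drafts and this discussion, not from released Python; this sketches the or'ed semantics, not what plain MRO inheritance would give:

```python
MATCH_SEQUENCE, MATCH_MAPPING = 1, 2  # hypothetical flag values

class MatchMeta(type):
    """Sketch: a class that does not set __match_container__ itself
    gets the bitwise-or of its bases' flags."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if "__match_container__" not in namespace:
            flags = 0
            for base in bases:
                flags |= getattr(base, "__match_container__", 0)
            cls.__match_container__ = flags
        return cls


class MySeq(metaclass=MatchMeta):
    __match_container__ = MATCH_SEQUENCE


class MyMap(metaclass=MatchMeta):
    __match_container__ = MATCH_MAPPING


class Child(MySeq, MyMap):
    pass  # or'ed: would match as both a sequence and a mapping
```

Plain MRO attribute lookup, by contrast, would give `Child` only `MySeq`'s value and silently ignore `MyMap`'s flag.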

On Wed, Mar 31, 2021 at 2:14 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
That's surprising to me. Just like we can have a class that inherits from int but isn't hashable, and make that explicit by setting `__hash__ = None`, why couldn't I have a class that inherits from something else that happens to inherit from Sequence, and say "but I don't want it to match like a sequence" by adding `__match_sequence__ = False`? AFAIK all Mark's versions would support this by setting `__match_kind__ = 0`. Maybe you can show an example edge case where this would be undesirable?

Guido van Rossum wrote:
The issue isn't when *I* set `__match_seq__ = False` or `__match_container__ = 0`. It's when *one of my parents* does it that things become difficult.
Maybe you can show an example edge case where this would be undesirable?
Good idea. I've probably been staring at this stuff for too long to figure it out myself. :) As far as I can tell, these surprising cases arise because a bit flag can only be either 0 or 1. For us, "not specified" is equivalent to 0, which can lead to ambiguity. Consider this case:

```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    pass

class Child(Parent, Seq):
    pass
```

Okay, cool. `Child` will match as a sequence, which seems correct. But what about this similar case?

```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    __match_seq__ = False  # or __match_container__ = 0

class Child(Parent, Seq):
    pass
```

Here, `Child` will *not* match as a sequence, even though it probably should. The only workarounds I've found (like allowing `None` to mean "this is unset, don't inherit me if another parent sets this flag", ditching tp_flags entirely, or not inheriting these attributes) feel a bit extreme just to allow some users to do the moral equivalent of un-subclassing `collections.abc.Sequence`. So, my current solution (seen on the branch linked in my earlier email) is:

- Set the flag if the corresponding magic attribute is set to True in the class definition
- Raise at class definition time if it's set to anything other than True
- Otherwise, set the flag if any of the parents have the flag set

As far as I can tell, this leads to the expected (and current, as of 3.10.0a6) behavior in all cases. Plus, it doesn't break my mental model of how inheritance works.
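Brandt's second case can be checked with plain attribute lookup. `__match_seq__` is the dunder proposed in this discussion, but here it behaves like any ordinary class attribute, which is exactly the ambiguity being described:

```python
class Seq:
    __match_seq__ = True

class Parent:
    __match_seq__ = False

class Child(Parent, Seq):
    pass

# Ordinary MRO lookup (Child -> Parent -> Seq -> object) finds
# Parent's explicit False before it ever reaches Seq's True.
flag = Child.__match_seq__  # False
```

So under attribute-based semantics, `Child` would not match as a sequence unless it (or the lookup rules) explicitly re-enabled the flag.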

Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
- Raise at class definition time if it's set to anything other than True
I feel like this is a consenting adults thing. Yeah you probably won't need to set a flag to `False` but I don't see why it should be forbidden. On Wed, Mar 31, 2021 at 3:35 PM Brandt Bucher <brandtbucher@gmail.com> wrote:

On Thu, Apr 1, 2021 at 11:54 AM Caleb Donovick <donovick@cs.stanford.edu> wrote:
Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
How much difference is there between:

```
class Grandparent:
    """Not a sequence"""

class Parent(Grandparent):
    """Also not a sequence"""

class Child(Parent):
    """No sequences here"""
```

and this:

```
class Grandparent(list):
    """Is a sequence"""

class Parent(Grandparent):
    """Explicitly not a sequence"""
    __match_seq__ = False

class Child(Parent):
    """Shouldn't be a sequence"""
```

? Either way, Parent should function as a non-sequence. But if Child inherits from both Parent and tuple, it is most definitely a tuple, and therefore should be a sequence. With your proposed semantics, setting __match_seq__ to False is not simply saying "this isn't a sequence", but it's saying "prevent this from being a sequence". It's a stronger statement than simply undoing the declaration that it's a sequence. There would be no way to reset to the default state. Brandt's proposed semantics sound complicated, but as far as I can tell, they give sane results in all cases. ChrisA

How is this different from anything else that is inherited? The setting of a flag to `False` is not some irreversible process which permanently blocks child classes from setting that flag to `True`. If I want to give priority to `Seq` over `Parent` in Brandt's original example I need only switch the order of inheritance so that `Seq` is earlier in `Child` MRO or explicitly set the flag to `True` (or `Seq.__match_seq__`). In contrast Brandt's scheme does irreversibly set flags, there is no way to undo the setting of `__match_seq__` in a parent class. This really doesn't seem like an issue to me. I can't personally think of a use case for explicitly setting a flag to `False` but I also don't see why it should be forbidden. We get "- Otherwise, set the flag if any of the parents set have the flag set" for free through normal MRO rules except in the case where there is an explicit `False` (which I assume will be exceedingly rare and if it isn't there is clearly some use case). Why make it more complicated? On Wed, Mar 31, 2021 at 6:05 PM Chris Angelico <rosuav@gmail.com> wrote:

On 31/03/2021 11:31 pm, Brandt Bucher wrote:
This is just a weird case, so I don't think we should worry about it too much.
Inheritance in Python is based on the MRO (using the C3 linearization algorithm), so my mental model is that Child.__match_container__ == 0. Welcome to the wonderful world of multiple inheritance :) If Parent.__match_container__ == 0 (rather than just inheriting it) then it is explicitly stating that it is *not* a container. Seq explicitly states that it *is* a sequence. So Child is just broken. That it is broken for pattern matching is consistent with it being broken in general. Cheers, Mark.

On Tue, 30 Mar 2021 at 17:32, Brandt Bucher <brandtbucher@gmail.com> wrote: Hi Brandt,
Speaking as a maintainer of SymPy I do support the PEP but not for SymPy specifically. I just used SymPy as an example of something that seems like it should be a good fit for pattern matching but also shows examples that don't seem to work with PEP 634 in the way intended. I'm sure SymPy will use case/match when support for Python 3.9 is dropped but I don't see it as something that would be a major feature for SymPy users or for internal code. I expect that case/match would make some code tidier and potentially it could make some things a little faster (although that depends on it being well optimised - half a microsecond might seem small until you add up millions of them). There is a recently opened SymPy issue discussing the possible use of this: https://github.com/sympy/sympy/issues/21193 Pattern matching and destructuring more generally are significant features for symbolic libraries such as SymPy which has much code for doing this and can also be used with other dedicated libraries such as matchpy. Much more is needed than case/match for that though: rewriting, substitution, associative/commutative matching etc. It's not clear to me that core Python could ever provide anything new that would lead to a groundbreaking improvement for SymPy in this respect. The surrounding discussion of the various pattern matching PEPs has led me to think of the idea of destructuring as more of a general language feature that might not in future be limited to case/match though. I'm not sure where that could go for Python but I'm interested to see if anything more comes of it. I like a lot of the features in PEP 634 and the way I see it this PEP (653) underpins those. The reason I support PEP 653 is because it seems like a more principled approach to the mechanism for how pattern-matching should work that places both user-defined types and builtin types on an even footing. 
The precise mechanisms (match_class, match_self etc.) and their meanings do seem strange but that's because they are trying to codify the different cases that PEP 634 has introduced. It's possible that the design of that mechanism can be improved and there have been suggestions for that in this thread. I do think though that it is important to have a general extensible mechanism rather than a specification based on special cases. I also think that the use of the Sequence and Mapping ABCs is a bad idea on practical grounds (performance, circularity in the implementation) and is not in keeping with the rest of the language. ABCs have always been optional in the past: Python uses protocols rather than ABCs (duck typing etc.). Finally, speaking as someone who also teaches introductory programming with Python, with *that* hat on I would have preferred it if none of the pattern-matching PEPs had been accepted. The advantage of Python in having a simple and easily understood core erodes with each new addition to core syntax. For novice users case/match only really offers increased complexity compared to if/elif but it will still be something else that needs to be learned before being able to read existing code. Oscar

Hi Mark, I also wanted to give some feedback on this. While most of the discussion so far has been about the matching of the pattern itself I think it should also be considered what happens in the block below. Consider this code: ``` m = ... match m: case [a, b, c] as l: # what can we safely do with l? ``` or in terms of the type system: What is the most specific type that we can know l to be? With PEP 634 you can be sure that l is a sequence and that its length is 3. With PEP 653 this is currently not explicitly defined. Judging from the pseudo code we can only assume that l is an iterable (because we use it in an unpacking assignment) and that its length is 3, which greatly reduces the operations that can be safely done on l. For mapping matches with PEP 634 we can assume that l is a mapping. With PEP 653 all we can assume is that it has a .get method that takes two parameters, which is even more restrictive, as we can't even be sure if we can use len(), .keys, ... or iterate over it. This also makes it a lot harder for static type checkers to check match statements, because instead of checking against an existing type they now have to hard-code all the guarantees made by the match statement or not narrow the type at all. Additionally consider this typed example: ``` m: Mapping[str, int] = ... match m: case {'version': v}: pass ``` With PEP 634 we can statically check that v is an int. With PEP 653 there is no such guarantee. Therefore I would strongly be in favor of having sequence and mapping patterns only match certain types instead of relying on dunder attributes. If implementing all of sequence is really too much work just to be matched by a sequence pattern, as PEP 653 claims, then maybe a more general type could be chosen instead. I don't have any objections against the other parts of the PEP. Adrian Freund On 3/27/21 2:37 PM, Mark Shannon wrote:

On Sat, 27 Mar 2021 at 13:40, Mark Shannon <mark@hotpy.org> wrote:
Hi Mark, Thanks for putting this together.
It would take me some time to compare exactly how this differs from the current state after PEP 634 but I certainly prefer the object-model based approach. It does seem that there are a lot of permutations of how matching works but I guess that's just trying to tie up all the different cases introduced in PEP 634.
Maybe I misunderstood but it looks to me as if this (PEP 653) changes the behaviour of a mapping pattern in relation to extra keys. In PEP 634 extra keys in the target are ignored e.g.: obj = {'a': 1, 'b': 2} match(obj): case {'a': 1}: # matches obj because key 'b' is ignored In PEP 634 the use of **rest is optional if it is desired to catch the other keys but does not affect matching. Here in PEP 653 there is the pseudocode: # A pattern not including a double-star pattern: if $kind & MATCH_MAPPING == 0: FAIL if $value.keys() != $KEYWORD_PATTERNS.keys(): FAIL My reading of that is that all keys would need to match unless **rest is used to absorb the others. Is that an intended difference? Personally I prefer extra keys not to be ignored by default so to me that seems an improvement. If intentional then it should be listed as another semantic difference though.
E.g. With PEP 653, pattern matching will work in the collections.abc module. With PEP 634 it does not.
As I understood it this proposes that match obj: should use the class attribute type(obj).__match_kind__ to indicate whether the object being matched should be considered a sequence or a mapping or something else rather than using isinstance(obj, Sequence) and isinstance(obj, Mapping). Is there a corner case here where an object can be both a Sequence and a Mapping? (How does PEP 634 handle that?) Not using the Sequence and Mapping ABCs is good IMO. I'm not aware of other core language features requiring the use of ABCs. In SymPy we have specifically avoided them because they slow down isinstance checking (this is measurable in the time taken to run the whole test suite). Using the ABCs in PEP 634 seems surprising given that the original pattern matching PEP actually listed the performance impact of isinstance checks as part of the opening motivation. Maybe the ABCs can be made faster but either way using them like this seems not in keeping with the rest of the language. Oscar

Hi Oscar, Thanks for the feedback. On 27/03/2021 4:19 pm, Oscar Benjamin wrote:
I missed that when updating the PEP, thanks for pointing it out. It should be the same as for double-star pattern: if not $value.keys() >= $KEYWORD_PATTERNS.keys(): FAIL I'll update the PEP.
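For readers following along, the corrected check has the same effect as an ordinary superset comparison on keys views, so extra keys in the subject are ignored, matching PEP 634's behavior. A minimal sketch (the function name is illustrative, not from either PEP):

```python
def mapping_pattern_matches(subject, pattern_keys):
    """Sketch: a mapping pattern matches if the subject has at least
    the keys the pattern mentions; extra keys are ignored."""
    # dict keys views support set comparisons: >= means "superset of".
    return subject.keys() >= set(pattern_keys)

obj = {'a': 1, 'b': 2}
print(mapping_pattern_matches(obj, {'a'}))       # extra key 'b' is ignored
print(mapping_pattern_matches(obj, {'a', 'c'}))  # 'c' missing, so no match
```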
I don't have a strong enough opinion either way. I can see advantages to both ways of doing it.
If you define a class as a subclass of both collections.abc.Sequence and collections.abc.Mapping, then PEP 634 will treat it as both sequence and mapping, meaning it has to try every pattern. That prevents the important (IMO) optimization of checking the kind only once. Cheers, Mark.
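The situation Mark describes is easy to reproduce with the ABCs. A sketch of such a (deliberately ill-conceived) class, which PEP 634's isinstance-based dispatch would classify as both kinds, so no single kind check can decide which patterns apply:

```python
from collections.abc import Sequence, Mapping

class Both(Sequence, Mapping):
    """Deliberately ill-conceived: registers as sequence *and* mapping."""
    def __init__(self, data):
        self._data = data
    def __getitem__(self, key):
        return self._data[key]
    def __len__(self):
        return len(self._data)
    def __iter__(self):
        return iter(self._data)

b = Both({'a': 1})
# PEP 634's dispatch says yes to both checks, so it must try both
# kinds of pattern against `b`.
print(isinstance(b, Sequence), isinstance(b, Mapping))
```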

Hi Mark, Reading that spec will take some time. Can you please summarize the differences in English, in a way that is about as precise as PEP 634? I have some comments inline below as well. On Sat, Mar 27, 2021 at 10:16 AM Mark Shannon <mark@hotpy.org> wrote:
It would be simpler if this was simply an informational PEP without proposing new features -- then we wouldn't have to rush. You could then propose the new __match_kind__ attribute in a separate PEP, written more in the style of PEP 634, without pseudo code. I find it difficult to wrap my head around the semantics of __match_kind__ because it really represents a few independent flags (with some constraints) but all the text is written using explicit, hard-to-read bitwise and/or operations. Let me give it a try. - Let's call the four flag bits by short names: SEQUENCE, MAPPING, DEFAULT, SELF. SEQUENCE and MAPPING are for use when an instance of a class appears in the subject position (i.e., for `match x`, we look for these bits in `type(x).__match_kind__`). Neither of these is set by default. At most one of them should be set. - If SEQUENCE is set, the subject is treated like a sequence (this is set for list, tuple and other sequences, but not for str, bytes and bytearray). - Similarly, MAPPING means the subject should be treated as a mapping, and is set for dict and other mapping types. The DEFAULT and SELF flags are for use when a class is used in a class pattern (i.e., for `case cls(...)` we look for these bits in `cls.__match_kind__`). At most one of these should be set. DEFAULT is set on class `object` and anything that doesn't explicitly clear it. - If DEFAULT is set, semantics of PEP 634 apply except for the special behavior enabled by the SELF flag. - If SELF is set, `case cls(x)` binds the subject to x, and no other forms of `case cls(...)` are allowed. - If neither DEFAULT nor SELF is set, `case cls(...)` does not take arguments at all. Please correct any misunderstandings I expressed here! (And please include some kind of summary like this in your PEP.) 
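To make the bitwise formulation concrete, here is a sketch using illustrative flag values (the real constant names and numeric values are defined by PEP 653 and may differ):

```python
# Illustrative flag values; the actual constants are defined by PEP 653.
MATCH_SEQUENCE = 1  # subject matches sequence patterns
MATCH_MAPPING = 2   # subject matches mapping patterns
MATCH_DEFAULT = 4   # class patterns get PEP 634 default semantics
MATCH_SELF = 8      # `case cls(x)` binds the subject itself to x

class MySeq:
    # A sequence-like class using default class-pattern semantics.
    __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT

kind = type(MySeq()).__match_kind__
is_sequence = bool(kind & MATCH_SEQUENCE)
is_self_match = bool(kind & MATCH_SELF)
print(is_sequence, is_self_match)
```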
Also, I think that we should probably separate this out in two separate flag sets, one for subjects and one for class patterns -- it is pretty confusing to merge the flag sets into a single value when their applicability (subject or class pattern) is so different.
Let's not change this. We carefully discussed and chose this behavior (ignore extra mapping keys, but don't ignore extra sequence items) for PEP 634 based on usability.
Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.)
I am fine with changing this one aspect of PEP 634. IIRC having separate SEQUENCE and MAPPING flags just for matching didn't occur to us during the design, and we strongly preferred some kind of type-based check over checking the presence of a specific attribute like `key`. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Hi Guido, Thanks for the feedback. On 27/03/2021 10:15 pm, Guido van Rossum wrote:
It is as close to that as I can get. The change to using __match_kind__ requires some small changes to behaviour.
`case cls():` is always allowed, regardless of flags.
I think you expressed it well. I'll add a more informal overview section to the PEP.
That would require two different special attributes, which adds bulk without adding any value. __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT should be clear to anyone familiar with integer flags.

On Mon, 29 Mar 2021, 7:47 pm Mark Shannon, <mark@hotpy.org> wrote: [Guido wrote]
The combined flags might be clearer if the class matching flags were "MATCH_CLS_DEFAULT" and "MATCH_CLS_SELF" Without that, it isn't obvious that they're modifying the way class matching works. Alternatively, given Guido's suggestion of two attributes, they could be "__match_container__" and "__match_class__". The value of splitting them is that they should compose better under inheritance - the container ABCs could set "__match_container__" appropriately without affecting the way "__match_class__" is set. An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API. Cheers, Nick.

On Mon, Mar 29, 2021 at 7:35 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
+1

An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API.
Since the two flag sets are independent the bulk is only apparent. Few classes would need to set one of these, let alone two. In the C layer they may be combined as part of tp_flags (assuming there are enough free bits). --Guido van Rossum (python.org/~guido)

Overall, I am still uncomfortable with PEP 653, and would probably not support its acceptance. Although it has thankfully become a much less radical proposal than it was a few weeks ago (thanks, Mark, for your attention to our feedback), I feel that the rules it binds implementations to are *very* premature, and that the new mechanisms it introduces to do so only modestly improve potential performance at great expense to the ease of learning, using, and maintaining code using structural pattern matching. A few notes follow:
Maybe I'm missing something, but I don't understand at all how the provided code snippet relies on the self-matching behavior. Have the maintainers of SymPy (or any large library supposedly benefitting here) come out in support of the PEP? Are they at least aware of it? Have they indicated that the proposed idiom for implementing self-matching behavior using a property is truly too "tricky" for them? Have you identified any stdlib classes that would benefit greatly from this? For me, `__match_class__` feels like a feature without demonstrated need. Even if there is a great demand for this, I certainly think that there are far better options than the proposed flagging system: - A `@match_self` class decorator (someone's bound to put one on PyPI, at any rate). - Allowing `__match_args__ = None` to signal this case (an option we previously considered, and my personal preference). ...both of which can be added later, if needed. Further, PEP 634 makes it very easy for libraries to support Python versions with *and* without pattern matching (something I consider to be an important requirement). The following class works with both 3.9 and 3.10: ``` class C(collections.abc.Sequence): ... ``` While something like this is required for PEP 653: ``` class C: if sys.version_info >= (3, 10): from somewhere import MATCH_SEQUENCE __match_container__ = MATCH_SEQUENCE ... ```
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose? And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries. At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
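For anyone wanting to reproduce the comparison being argued over here, a rough sketch of such a micro-benchmark (absolute numbers are machine-dependent, and this ignores the per-subject caching both PEPs allow; the `__match_kind__` attribute is a stand-in for PEP 653's mechanism):

```python
import timeit
from collections.abc import Sequence

class MySeq(Sequence):
    def __getitem__(self, i):
        return i
    def __len__(self):
        return 0

obj = MySeq()
MySeq.__match_kind__ = 1  # stand-in for the PEP 653 special attribute

# PEP 634 style: subclass check against an ABC.
t_abc = timeit.timeit(lambda: isinstance(obj, Sequence), number=100_000)
# PEP 653 style: read a special attribute from the type.
t_attr = timeit.timeit(lambda: type(obj).__match_kind__, number=100_000)

print(f"isinstance: {t_abc:.4f}s  attribute: {t_attr:.4f}s")
```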
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;) A couple of other, more technical notes: - PEP 653 requires mappings to have a `keys()` method that returns an object supporting set inequality operations. It is not really that common to find this sort of support in user code (in my experience, it is more likely that user-defined `keys()` methods will return iterables). It's not even clear to me if this is an interface requirement for mappings in general. For example, `weakref.WeakKeyDictionary` and `weakref.WeakValueDictionary` presently do not work with PEP 653's requirements for mapping patterns, since their `keys()` methods return iterators. - Treating `__getitem__` as pure is problematic for some common classes (such as `defaultdict`). That's why we use two-argument `get()` instead. As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code). Brandt
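The `defaultdict` point is easy to demonstrate: subscripting is not a pure lookup there, while two-argument `get()` is side-effect free:

```python
from collections import defaultdict

d = defaultdict(int)
_ = d['missing']          # __getitem__ inserts a default: the subject mutates
mutated = 'missing' in d  # now True

d2 = defaultdict(int)
sentinel = object()
value = d2.get('missing', sentinel)  # two-argument get() has no side effect
untouched = 'missing' not in d2      # still True
print(mutated, untouched, value is sentinel)
```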

Hi Brandt, On 30/03/2021 5:25 pm, Brandt Bucher wrote:
No, I'm the one missing something. That should read: case Mul(args=[Symbol(a), Symbol(b)]) if a == b: ... I'll fix that in the PEP, thanks.
The distinction between those classes that have the default behavior and those that match "self" is from PEP 634. I didn't introduce it. I'm just proposing a more principled way to make that distinction.
Or class C: __match_container__ = 1 # MATCH_SEQUENCE Which is one reason the PEP states that the values of MATCH_SEQUENCE, etc. will never change.
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose?
No reasonable scenario, but unreasonable scenarios happen all too often.
And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
It's misleading because a match statement doesn't include any explicit imports.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries.
Half a microsecond is thousands of instructions on a modern CPU. That is a long time for a single VM operation.
At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
Why do you say it is a premature optimization? Its primary purpose is reliability and precise semantics. It is more optimizable, I agree, but that is hardly premature. You also say it is confusing, but I think it is simpler than the workarounds to match "self" that you propose. This is very subjective though. Evidently we think differently.
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;)
Indeed, but there is only undefined behavior if a class violates clearly specified rules. The undefined behavior in PEP 634 is much broader. We already tolerate some amount of undefined behavior. For example, dictionary lookup is also undefined for classes which do not hash properly.
Thanks for pointing that out. I'd noted that PEP 634 used `get()`, which is why I inserted the guard on keys beforehand. Clearly that is insufficient. I'll update the semantics to use the two-argument `get()`. It does seem more robust.
As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code).
I fully understand that a lot of work went into PEP 634. Which is why I am keeping the syntax and as much of the semantics of PEP 634 as I can. The test suite is proving very useful, thanks. The problem with waiting for 3.11 is that code will start to rely on some of the implementation details of pattern matching as it is now, and that our ability to optimize it will be delayed by a year. Cheers, Mark.

Hi Mark. I've spoken with Guido, and we are willing to propose the following amendments to PEP 634: - Require `__match_args__` to be a tuple. - Add new `__match_seq__` and `__match_map__` special attributes, corresponding to new public `Py_TPFLAGS_MATCH_SEQ` and `Py_TPFLAGS_MATCH_MAP` flags for use in `tp_flags`. When Python classes are defined with one or both of these attributes set to a boolean value, `type.__new__` will update the flags on the type to reflect the change (using a similar mechanism as `__slots__` definitions). They will be inherited otherwise. For convenience, `collections.abc.Sequence` will define `__match_seq__ = True`, and `collections.abc.Mapping` will define `__match_map__ = True`. Using this in Python would look like: ``` class MySeq: __match_seq__ = True ... class MyMap: __match_map__ = True ... ``` Using this in C would look like: ``` PyTypeObject PyMySeq_Type = { ... .tp_flags = Py_TPFLAGS_MATCH_SEQ | ..., ... } PyTypeObject PyMyMap_Type = { ... .tp_flags = Py_TPFLAGS_MATCH_MAP | ..., ... } ``` We believe that these changes will result in the best possible outcome: - The new mechanism should be faster than either PEP. - The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*. If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so. Thanks for pushing us to make this better. Brandt

Hi Brandt, On 30/03/2021 11:49 pm, Brandt Bucher wrote:
I think we're all in agreement on this one. Let's just do it.
I don't like the way this needs special inheritance rules, where inheriting one attribute mutates the value of another. It seems convoluted. Consider: class WhatIsIt(MySeq, MyMap): pass With __match_container__ it works as expected with no special inheritance rules. This was why you convinced me to split __match_kind__; it works better with inheritance. Another reason for preferring __match_container__ is that it provides a better option for extensibility, IMO. Suppose we wanted to add a "set" pattern in the future: with __match_container__ we just need to add a new constant. With your proposed approach, we would need another special attribute.
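Mark's point can be seen with plain attribute lookup: under ordinary MRO rules the combined class simply gets the first base's value, with no special inheritance machinery (the constants here are illustrative; the real values are fixed by PEP 653):

```python
# Illustrative constants; the real values are defined by PEP 653.
MATCH_SEQUENCE = 1
MATCH_MAPPING = 2

class MySeq:
    __match_container__ = MATCH_SEQUENCE

class MyMap:
    __match_container__ = MATCH_MAPPING

class WhatIsIt(MySeq, MyMap):
    pass

# Ordinary MRO lookup finds MySeq's value first, so WhatIsIt is
# unambiguously a sequence for matching purposes -- no special rules.
print(WhatIsIt.__match_container__ == MATCH_SEQUENCE)
```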
I'm wary of using up tp_flags, as they are a precious resource, but this does provide a more declarative way to specify the behavior than setting the attribute via the C-API.
We believe that these changes will result in the best possible outcome: - The new mechanism should be faster than either PEP.
The naive implementation of the boolean version might be a tiny bit faster (it would be hard to measure a difference). However, once specialized by type version (as we do for LOAD_ATTR) both forms become a no-op.
- The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*.
The inheritance rules make __match_container__ a better user experience in Python, IMO. As for C, there is no reason why it would make any difference; __match_container__ could be (tp_flags & (Py_TPFLAGS_MATCH_SEQ|Py_TPFLAGS_MATCH_MAP)) shifted to line up the bits.
If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
Thanks for pushing us to make this better.
And thank you for the feedback. Cheers, Mark.

On Wed, Mar 31, 2021 at 2:30 AM Mark Shannon <mark@hotpy.org> wrote:
Wait a minute, do you expect WhatIsIt to be a sequence but not a map? *I* would expect that it is both, and that's exactly what Brandt's proposal does. So I see this as a plus.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
I don't know that PEP 653's semantics are better. Have you analyzed any *differences* besides the proposal above? I've personally found reading your pseudo-code very difficult, so I simply don't know. --Guido van Rossum (python.org/~guido)

Hi Guido, On 31/03/2021 6:21 pm, Guido van Rossum wrote:
Earlier you said: Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.) PEP 653 requires that: (__match_container__ & (MATCH_SEQUENCE | MATCH_MAPPING)) != (MATCH_SEQUENCE | MATCH_MAPPING) Would you require that (__match_seq__ and __match_map__) is always false? If so, then what is the mechanism for handling the `WhatIsIt` class? If not, then you lose the ability to make a single test to determine which patterns can apply.
PEP 653 semantics are more precise. I think that is better :) Apart from that, I think the semantics are so similar once you've added __match_seq__/__match_map__ to PEP 634 that it is hard to claim one is better than the other. My (unfinished) implementation of PEP 653 makes almost no changes to the test suite. The code in the examples is Python, not pseudo-code. That might be easier to follow. Cheers, Mark.

On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org> wrote:
[me, responding to Mark]
[Now back to Mark]
Ah, you caught me there. I do think that classes that combine both characteristics are in troublesome waters. I think we can get optimizability either way, so I'll focus on semantics. Brandt has demonstrated that it's ugly to write the code for a class that in match statements behaves as either a sequence or a mapping (but not both) while at the same time keeping the code compatible with Python 3.9 or before. I also think that using flag attributes that are set to True or False (instead of using a bitmap of flags, which is obscure to many Python users) solves this problem nicely. Using separate flag attributes happens to lead to different semantics than the flags-bitmap approach in the case of multiple inheritance. Given that one *can* inherit from both Sequence and Mapping, having separate flags seems slightly better than the flags-bitmap approach. It wasn't enough to convince me earlier, but the other advantage does convince me: separate flag attributes are better than using a flags-bitmap. Now, if it weren't for other issues, having no flags at all here but just signalling the applicable pattern kinds through inheritance from collections.abc.{Sequence,Mapping} would be even cleaner. But we do have other issues: (a) the exceptions for str, bytes, bytearray, and (b) the clumsiness of importing collections.abc (which is Python code) deep in the ceval main loop. So some explicit form of signalling this is fine -- and classes that explicitly inherit from Sequence or Mapping will get it for free that way.
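The composition behavior Guido describes falls out of ordinary, independent attribute lookups. A sketch using the boolean attribute names from Brandt's proposal earlier in the thread:

```python
class MySeq:
    __match_seq__ = True

class MyMap:
    __match_map__ = True

class WhatIsIt(MySeq, MyMap):
    pass

# Each flag is inherited independently along the MRO, so the combined
# class ends up flagged as both a sequence and a mapping.
print(WhatIsIt.__match_seq__, WhatIsIt.__match_map__)
```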
Nope. If so, then what is the mechanism for handling the `WhatIsIt` class?
If not, then you lose the ability to make a single test to determine which patterns can apply.
Translating the flag attributes to bits in tp_flags (or in a new flags variable elsewhere in the type object) would still allow a pretty fast test. And needing to support overlapping subsets of the cases is not unique to this situation, after all a class may well be a sequence *and* have attributes named x, y and z.
I wish I knew of a single instance where PEP 634 and PEP 653 actually differ.
I'd like to see where those differences are -- then we can talk about which is better. :-)
The code in the examples is Python, not pseudo-code. That might be easier to follow.

Hi Guido, On 31/03/2021 9:53 pm, Guido van Rossum wrote:
On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org <mailto:mark@hotpy.org>> wrote:
[snip]
Almost all the changes come from requiring __match_args__ to be a tuple of unique strings. The only other change is that case int(real=0+0j, imag=0-0j): fails to match 0, because `int` is `MATCH_SELF` so won't match attributes. https://github.com/python/cpython/compare/master...markshannon:pep-653-imple... Cheers, Mark.

On Thu, Apr 1, 2021 at 2:18 PM Mark Shannon <mark@hotpy.org> wrote:
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work, and I don't see anything semantically suspect in allowing that.
Oh, but that would be a problem. The intention wasn't that "self" mode prevents keyword/attribute matches. (FWIW the real and imag attributes should not be complex numbers, so that test case is weird, but it should work.)
https://github.com/python/cpython/compare/master...markshannon:pep-653-imple...

On 4/1/2021 9:38 PM, Guido van Rossum wrote:
The current posted PEP does not say 'unique' and I agree with Guido that it should not.
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work,
The current near-Python code does not have such a check.
and I don't see anything semantically suspect in allowing that.
If I understand the current pseudocode correctly, the effect of 's' appearing twice in 'C.__match_args__' would be to possibly look up and assign C.s to two different names in a case pattern. I would not be surprised if someone someday tries to do this intentionally. Except for the repeated lookup, it would be similar to a = b = C.s. This might make sense if C.s is mutable. Or the repeated lookups could yield different values. -- Terry Jan Reedy
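Terry's scenario can be simulated directly. A sketch of what positional capture against a duplicated `__match_args__` entry would roughly do (the class is hypothetical, built so the two lookups observe different values):

```python
class C:
    __match_args__ = ('s', 's')  # the same name listed twice

    def __init__(self):
        self._n = 0

    @property
    def s(self):
        # A property whose repeated lookups yield different values.
        self._n += 1
        return self._n

# Roughly what positional capture for `case C(a, b)` would do:
obj = C()
bindings = [getattr(obj, name) for name in type(obj).__match_args__]
print(bindings)  # the two lookups observed different values
```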

On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu> wrote:
(Of course, "the current PEP" is highly ambiguous in this context.) Well, now I have egg on my face, because the current implementation does reject multiple occurrences of the same identifier in __match_args__. We generate an error like "TypeError: C() got multiple sub-patterns for attribute 'a'". However, I cannot find this uniqueness requirement in PEP 634, so I think it was a mistake to implement it. Researching this led me to find another issue where PEP 634 and the implementation differ, but this time it's the other way around: PEP 634 says about types which accept a single positional subpattern (int(x), str(x) etc.) "for these types no keyword patterns are accepted." Mark's example `case int(real=0, imag=0):` makes me think this requirement is wrong and I would like to amend PEP 634 to strike this requirement. Fortunately, this is not what is implemented. E.g. `case int(1, real=1):` is accepted and works, as does `case int(real=0):`. Calling out Brandt to get his opinion. And thanks to Mark for finding these!
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, and this could even be a valid backwards compatibility measure, if a class used to have two different attributes that would in practice never differ, the two attributes could be merged into one, and someone might have a pattern capturing both, positionally. That should keep working, and having a duplicate in __match_args__ seems a clean enough solution. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 4/2/2021 12:02 AM, Guido van Rossum wrote:
On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu
The current near-Python code does not have such a check.
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, the part I read was legal Python + $ variables + FAIL. I should have included 'pseudo'.

Guido van Rossum wrote:
The current implementation will reject any attribute being looked up more than once, by position *or* keyword. It's actually a bit tricky to do, which is why the `MATCH_CLASS` op is such a beast... it needs to look up positional and keyword attributes all in one go, keeping track of everything it's seen and checking for duplicates. I believe this behavior is a holdover from PEP 622:
The interpreter will check that two match items are not targeting the same attribute, for example `Point2d(1, 2, y=3)` is an error.
(https://www.python.org/dev/peps/pep-0622/#overlapping-sub-patterns) PEP 634 explicitly disallows duplicate keywords, but as far as I can tell it says nothing about duplicate `__match_args__` or keywords that also appear in `__match_args__`. It looks like an accidental omission during the 622 -> 634 rewrite. (I guess I figured that if somebody matches `Spam(foo, y=bar)`, where `Spam.__match_args__` is `("y",)`, that's probably a bug in the user's code. Ditto for `Spam(y=foo, y=bar)` and `Spam(foo, bar)` where `Spam.__match_args__` is `("y", "y")`. But it's not a hill I'm willing to die on.) I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.

Hi Brandt, On 02/04/2021 7:19 am, Brandt Bucher wrote:
Repeated keywords do seem likely to be a bug. Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time. So how about explicitly disallowing those, but not checking that the intersection of `__match_args__` and keywords is empty? We would get most of the error checking without the performance impact.
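Class-creation-time validation of the kind Mark describes can be sketched in pure Python with a metaclass (this is illustrative only; the real check would live in the interpreter):

```python
class MatchArgsChecked(type):
    """Validate __match_args__ once, when the class is created."""
    def __new__(mcls, name, bases, ns):
        args = ns.get("__match_args__", ())
        if not isinstance(args, tuple):
            raise TypeError(f"{name}.__match_args__ must be a tuple")
        if len(set(args)) != len(args):
            raise TypeError(f"{name}.__match_args__ contains duplicate names")
        return super().__new__(mcls, name, bases, ns)

class Point(metaclass=MatchArgsChecked):
    __match_args__ = ("x", "y")  # accepted

rejected_at_creation = False
try:
    class Bad(metaclass=MatchArgsChecked):
        __match_args__ = ("x", "x")  # duplicate: rejected up front
except TypeError:
    rejected_at_creation = True
```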
I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.
PEP 634 allows it. PEP 653 currently forbids it, mainly for consistency reasons. The purpose of self-matching is to prevent deconstruction, so it seems inconsistent to allow it for keyword arguments. Are there any use-cases? The test-case `int(real=0+0j, imag=0-0j)` is contrived, but I'm struggling to come up with less contrived examples for any of float, list, dict, tuple, str. Cheers, Mark.

On Fri, Apr 2, 2021 at 3:38 AM Mark Shannon <mark@hotpy.org> wrote:
Agreed. But as I sketched in a previous email I think duplicates ought to be acceptable in __match_args__. At the very least we should align the PEP and the implementation here, by adjusting one or the other. Most checks are cheap though.
Checking for duplicates in `__match_args__` can be done at class creation time,
Hm, what about dynamic updates to __match_args__? I've done that in the REPL.
and checking for duplicates in the pattern can be done at compile time.
I'd prefer not to do that check at all.
+1 on the latter (not checking the intersection).
The purpose of self-matching is user convenience. It should be seen as a shorthand for the code fragment in PEP 634 showing how to do this for any class.
There could be a subclass that adds an attribute. That's still contrived though. But if we start supporting this for *general* classes we should allow combining it with keywords/attributes. -- --Guido van Rossum (python.org/~guido)

Mark Shannon wrote:
PEP 634 says:
For a number of built-in types (specified below), a single positional subpattern is accepted which will match the entire subject; for these types no keyword patterns are accepted.
(https://www.python.org/dev/peps/pep-0634/#class-patterns)
Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time.
I assume the compile-time check only works for named keyword attributes. The current implementation already does this. -1 on checking `__match_args__` anywhere other than the match block itself. Guido van Rossum wrote:
I could see the case for something like `case defaultdict({"Spam": s}, default_factory=f)`. I certainly don't think it should be forbidden.

On Fri, Apr 2, 2021 at 12:43 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
But that's not what the implementation does. It still supports keyword patterns for these types -- and (as I've said earlier in this thread) I think the implementation is correct.
Agreed. -- --Guido van Rossum (python.org/~guido)

Hi Guido, On 02/04/2021 10:05 pm, Guido van Rossum wrote:
Why? (I also asked Brandt this) It is far more efficient to check `__match_args__` at class creation (or class attribute assignment) time. The most efficient way to check in the match block is to check at class creation time anyway and store a flag indicating whether `__match_args__` is legal. In the match block we would check this flag, then proceed. It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported. Cheers, Mark.

On Sat, Apr 3, 2021 at 4:20 AM Mark Shannon <mark@hotpy.org> wrote:
Okay, now we're talking. If you check it on both class definition and at attribute assignment time I think that's fine (now that it's a tuple). But I don't think the specification (in whatever PEP) needs to specify that it *must* be checked at that time. So I think the current implementation is fine as well (once we change it to accept only tuples).
Yeah, nice optimization.
It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported.
Well, usually that is The Python Way. There are a lot of things that could be detected statically quite easily (without building something like mypy) but that aren't. Often that's due to historical accidents (in the past we were even less able to do the simplest static checks), so it's fine to do this your way. BTW we previously discussed whether `__match_args__` can contain duplicates. I thought the PEP didn't state either way, but I was wrong: it explicitly disallows it, matching the implementation. PEP 634 says on line 503:
```
- For duplicate keywords, ``TypeError`` is raised.
```
Given that there is no inconsistency here, I am inclined to keep it that way. If we find a better use case to allow duplicates we can always loosen up the implementation; it's not so simple the other way around. FWIW I am also submitting https://github.com/python/peps/pull/1909 to make `__match_args__` a tuple only, which we all seem to agree on. -- --Guido van Rossum (python.org/~guido)

Hi Brandt, On 02/04/2021 8:41 pm, Brandt Bucher wrote:
I was relying on the "reference" implementation, which is also in the PEP.
I take this as +1 for having more precisely defined semantics for pattern matching :)
I'm curious, why? It is much faster *and* gives better error messages to check `__match_args__` at class creation time.
It is forbidden in the PEP, as written, correct? OOI, have you changed your mind, or was that an oversight in the original? Cheers, Mark.

On Sat, Apr 3, 2021 at 4:15 AM Mark Shannon <mark@hotpy.org> wrote:
But it's not normative. However...
In this case I propose adjusting the PEP text. See https://github.com/python/peps/pull/1908
I take this as +1 for having more precisely defined semantics for pattern matching :)
Certainly I see it as +1 for having the semantics independently verified. [...]
I was surprised to find this phrase in the PEP, so I suspect that it was just a mistake when I wrote that section of the PEP. I can't find a similar restriction in PEP 622 (the original pattern matching PEP). -- --Guido van Rossum (python.org/~guido)

Mark Shannon said:
I was relying on the "reference" implementation, which is also in the PEP.
Can you please stop putting scare quotes around "reference implementation"? You've done it twice now, and it's been a weekend-ruiner for me each time. I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list. For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you. The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term. Brandt

On Sun, 4 Apr 2021 at 01:37, Brandt Bucher <brandtbucher@gmail.com> wrote:
Agreed - apart from the implication Brandt noted, it's also misleading. The code is in Python 3.10, so the correct term is "the implementation" (or if you want to be picky, "the CPython implementation"). To me, the term "reference implementation" implies "for reference, not yet released". At this point, we're discussing fixes to an implemented Python 3.10 feature, not tidying up a PEP. Paul

On Sun, Apr 4, 2021 at 6:20 PM Paul Moore <p.f.moore@gmail.com> wrote:
Normally, the term "reference implementation" means "the basis implementation that everything else is compared against". For instance, a compression algorithm might be published as a mathematical document, with a reference implementation in some language. It's then possible to create a new implementation in some other language, or more optimized, or whatever else; but to know whether it's giving the correct results, you compare its output to the output of the reference implementation. CPython is the reference implementation for the Python language. It's possible to have a discrepancy between the standard and the implementation, but it's still the reference implementation (just occasionally a buggy one). In this case, I believe that the term "reference implementation" is strictly accurate, and concur with Brandt's request to not discredit it by implying that it's only purporting to be one. ChrisA

Antoine Pitrou writes:
On Sun, 04 Apr 2021 00:34:18 -0000 "Brandt Bucher" <brandtbucher@gmail.com> wrote:
Can you please stop putting scare quotes
"Scare quotes" refers to an idiom English writers use to deprecate something. In what I wrote just above, the quotation marks indicate a focus on the *string* "scare quotes"; they signal both that these are Brandt's exact words and that I'm defining those words, not using their meaning. In Mark's phrase '"reference" implementation', neither of those usages applies. It's possible that they are the deprecated "random quote emphasis" usage. Random quote emphasis is implausible here, however. I can see no reason why Mark would emphasize the modifier "reference" in this context. One of the most important remaining usages, and one that I find plausible in context, is scare quotes. These are quotation marks used to focus on the phrase in quotes, and indicate that it is somehow suspicious: inaccurate, imprecise, false, even the opposite of its dictionary meaning. In other words, if you don't have a reason to emphasize focus on the words themselves rather than their meaning, by adding (scare) quotes most likely you are turning a "reasonably polite expression" into an insult.
I'm probably missing something...
Probably so did a lot of native speakers; there are English dialects where scare quotes are rare and random quote emphasis is common. However, I assure you, many native speakers (along with a fair number of non-natives) did not. I neither know nor care what Mark's *intent* is. I'm explaining what (some) idiomatic speakers of English will read into what he writes, because it is a *common* idiom (common enough to have a name, and be mentioned in standard manuals of English style). Regards, Steve

Hi Brandt, On 04/04/2021 1:34 am, Brandt Bucher wrote:
I'm sorry for ruining your weekends. My intention, and I apologize for not making this clearer, was not to denigrate your work, but to question the implications of the term "reference". Calling something a "reference" implementation suggests that it is something that people can refer to, that is near perfectly correct and fills in the gaps in the specification. That is a high standard, and one that is very difficult to attain. It is why I use the term "implementation", and not "reference implementation" in my PEPs.
I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list.
It definitely wasn't my intention.
For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you.
Indeed, I reviewed the implementation. I thought it was good enough to merge. I still think that.
The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term.
Please don't take my criticisms of PEP 634 as criticisms of you or your efforts. I know it can often sound like that, but that really isn't my intent. Pattern matching is a *big* new feature, and to get it right takes a lot of discussion. Having your ideas continually battered is no fun, I know. So, I'd like to apologize again for any hurt caused. Cheers, Mark.

Mark Shannon writes:
Shoe fits, doesn't it? Both Guido and Brandt to my recall have specifically considered the possibility that the implementation is the better design, and therefore that the PEP should be changed.
That is a high standard, and one that is very difficult to attain.
That depends on context, doesn't it? In the case of a public release, yes, it's a very high standard. In the context of a feature in development, it *cannot* be that high, because even when the spec and the implementation are in *perfect* agreement, both may be changed in the light of experience or a change in requirements. Furthermore, in this instance, the implementation achieves *your* standard (Brandt, again):
both authors have agreed that it needs to be fixed in the PEP, not the implementation
You added:
It is why I use the term "implementation", and not "reference implementation" in my PEPs.
A reasonable usage. I think my more flexible, context-dependent definition is more useful. Unmodified, the word "implementation" covers everything from unrunnable pseudo-code to the high standard of a public release that is officially denoted "reference implementation". On the other hand, when Brandt says that a merge request is a "reference implementation", I interpret that to be a claim that, to his knowledge the MR is a perfect implementation of the specification, and an invitation to criticize the specification by referring to that implementation. That's a strong claim, even in my interpretation. However, I think that if the developer dares to make it, it's very useful to reviewers. As it was in this case. Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense. Any deviations from documented spec (the Language Reference) will be presumed to have to be fixed in the implementation (with due consideration for backward compatibility). "Although practicality beats purity," of course, but treating the Language Reference as authoritative is strongly preferred to keeping the implementation and modifying the Reference (at least as I understand it). Regards, Steve

On Sun, 4 Apr 2021 at 13:49, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense.
It *is* merged and publicly released - it's in the latest 3.10 alpha. That's really the point I was trying to make with my comment (I'm steering clear of the "scare quotes" discussion). The fact that the implementation kept getting referred to as the "reference implementation" confused me into thinking it hadn't been released yet, and that simply isn't true. Calling it "the implementation" avoids that confusion, IMO. Paul

Paul Moore writes:
It *is* merged and publicly released - it's in the latest 3.10 alpha.
Merged, yes, but in my terminology alphas, betas, and rcs aren't "public releases", they're merely "accessible to the public". (I'm happy to adopt your terminology when you're in the conversation, I'm just explaining what I meant in my previous post.)
The only thing I understand in that paragraph is "that [it hadn't been released yet] simply isn't true", which is true enough on your definition of "released". But why does "reference implementation" connote "unreleased"? That seems to be quite different from Mark's usage. I don't have an objection to your usage, I'd just like us all to converge on a set of terms so that Brandt has a compact way of saying "as far as I know, for the specification under discussion this implementation is completely accurate and folks are welcome to refer to the PEP, to the code, or to divergences as seems appropriate to them". I'm not sure if that's exactly what Brandt meant by "reference implementation", but that's how I understood it. Steve

On Tue, 6 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
*shrug* It's (in my experience) a continuum - it's not in a release yet, but it is available via an installer, with a (pre-release) version number. But I get what you're saying and don't disagree, to be honest. I see the discrepancy as mostly being because we're trying to use (imprecise) informal language to pin down precise nuances. The main point I was making is that it's merged into the CPython source code at this stage, and available for people to download and experiment with, which is something I was unclear about.
In my experience, people developing PEPs will sometimes provide what gets referred to as a "reference implementation" of the proposal, which is a PR or equivalent that people can apply and try out if they want to see how the proposal works in practice. That "reference implementation" is generally seen as part of the *proposal*, even if it then becomes the final merged code as well. Once it's released, it tends to no longer get called the *reference* implementation, as it's now just the implementation (in CPython) of the feature. PEP 1 uses this terminology, as well - "Standards Track PEPs consist of two parts, a design document and a reference implementation" and "Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and incorporated into the main source code repository, the status will be changed to "Final"". PEP 635 follows this terminology, with a "Reference implementation" section linking to the development branch for the feature. To put this back into the context of this discussion, when Mark was referring to the "reference implementation" it made me think that maybe we were talking about that development branch, and that the code for the pattern matching PEP hadn't yet been merged to the main branch, which is why we were still iterating over implementation details. And that led me to think that they'd better get the discussion resolved soon, as they risk missing the 3.10 deadline if things drag on. Which *isn't* the case, and if I'd been following things more closely I'd have known that, but avoiding the term "reference implementation" for the merged change would also have spared my confusion.
Agreed, a common understanding is the main thing here. And as I'm not an active participant in the discussion, and I now understand the situation, my views shouldn't have too much weight in deciding what the best terminology is. Paul

Hi Mark. Thanks for your reply, I really appreciate it. Mark Shannon said:
Interesting. The reason I typically include a "Reference Implementation" section in my PEPs is because they almost always start out as a copy-paste of the template in PEP 12 (which also appears in PEP 1): https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep https://www.python.org/dev/peps/pep-0012/#suggested-sections Funny enough, PEP 635 has a "Reference Implementation" section, which itself refers to the implementation as simply a "feature-complete CPython implementation": https://www.python.org/dev/peps/pep-0635/#reference-implementation (PEP 634 and PEP 636 don't mention the existence of an implementation at all, as far as I can tell.) It's not a huge deal, but we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here" (which is what I've usually used the section to communicate). Brandt

On 7/04/21 5:22 am, Brandt Bucher wrote:
we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here"
Maybe "prototype implementation" would be better? I think I've used that term in PEPs before. -- Greg

Greg Ewing writes:
That seems to me to correspond well to Brandt's standard as expressed above. To me, "prototype implementation" is somewhere between "proof of concept" and "reference implementation", and I welcome the additional precision. The big question is can such terms be used accurately (i.e., do various people assign similar meanings to them)? I would define them functionally as:

- proof of concept: demonstrates some of the features, especially those that were considered "difficult to implement"
- prototype implementation: implements the whole spec, so can be used by developers to prototype applications
- reference implementation: intended to be a complete and accurate implementation of the specification

By "complete and accurate" I mean that it can be used experimentally to understand what the spec means without much worry that the proponent will brush off questions with "oh, that's just not implemented yet, read the spec if you want to know how it will work when we're done." Furthermore, any divergence between spec and implementation is a bug that is actually a broken promise. (The promise implied by "reference".) Finally, as development continues there is a promise that the spec and implementation will be kept in sync (of course changes might be provisional, but even then the sync should be maintained). I don't think the Platonic ideal interpretation of "reference implementation" is very useful. Software evolves. It evolves very quickly during initial development, but it's useful to "ask the implementation" about the spec even then. That's implied by methodologies like test-driven development. There are other workflows where that's not true. My claim is that "reference implementation" can be useful to distinguish development processes where you expect the implementation to reliably reflect the spec, even in corner cases, from those where you shouldn't. And even as the software evolves.
Note that if we use this definition, then the "Reference Implementation" requirement of the PEP process becomes quite a high bar. I think we all agree on that. So I advocate, as Brandt suggested, that we revise the PEP template. In particular I think it should use Greg's term "prototype implementation". Optionally, we could make "reference implementation" available to proponents who wish to make that claim about their implementation. Steve

On Wed, 7 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I'm OK with these terms (although I don't actually think you *will* get sufficient consensus on them to make them unambiguous) but with one proviso - once the implementation is merged into the CPython source, I think it should simply be referred to as "the implementation" and qualifiers should be unnecessary (and should be considered misleading). Paul

Hi Guido, On 02/04/2021 2:38 am, Guido van Rossum wrote:
Checking for uniqueness is almost free because __match_args__ is a tuple, and therefore immutable, so the check can be done at class creation time.
I thought matching `int(real=0+0j, imag=0-0j)` was a bit weird too. The change required to make it work is trivial, but the code seems more consistent if `int(real=0+0j, imag=0-0j)` is disallowed, which is why I went for that.

Let me clarify: these two attributes do not interact with one another; each attribute only interacts with its own flag on the type. It is perfectly possible to do:
```
class WhatIsIt:
    __match_map__ = True
    __match_seq__ = True
```
This will set both flags, and this `WhatIsIt` will match as a mapping *and* a sequence. This is allowed and works in PEP 634, but like Guido I'm not entirely opposed to making the matching behavior of such a class undefined against sequence or mapping patterns.
What *is* the expected behavior of this? Based on the current behavior of PEP 634, I would expect the `__match_container__` of each base to be or'ed, and something like this to match as both a mapping and a sequence (which PEP 653 says leads to undefined behavior). The actual behavior seems more like it will just be a sequence and not a mapping, since `__match_container__` would be inherited from `MySeq` and `MyMap` would be ignored. In the interest of precision, here is an implementation of *exactly* what I am thinking: `typeobject.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... `ceval.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... (One change from my last email: it doesn't allow `__match_map__` / `__match_seq__` to be set to `False`... only `True`. This prevents some otherwise tricky multiple-inheritance edge-cases present in both of our flagging systems that I discovered during testing. I don't think there are actual use-cases for unsetting the flags in subclasses, but we can revisit that later if needed.)

On Wed, Mar 31, 2021 at 2:14 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
That's surprising to me. Just like we can have a class that inherits from int but isn't hashable, and make that explicit by setting `__hash__ = None`, why couldn't I have a class that inherits from something else that happens to inherit from Sequence, and say "but I don't want it to match like a sequence" by adding `__match_sequence__ = False`? AFAIK all Mark's versions would support this by setting `__match_kind__ = 0`. Maybe you can show an example edge case where this would be undesirable? -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
The issue isn't when *I* set `__match_seq__ = False` or `__match_container__ = 0`. It's when *one of my parents* does it that things become difficult.
Maybe you can show an example edge case where this would be undesirable?
Good idea. I've probably been staring at this stuff for too long to figure it out myself. :) As far as I can tell, these surprising cases arise because a bit flag can only be either 0 or 1. For us, "not specified" is equivalent to 0, which can lead to ambiguity. Consider this case:
```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    pass

class Child(Parent, Seq):
    pass
```
Okay, cool. `Child` will match as a sequence, which seems correct. But what about this similar case?
```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    __match_seq__ = False  # or __match_container__ = 0

class Child(Parent, Seq):
    pass
```
Here, `Child` will *not* match as a sequence, even though it probably should. The only workarounds I've found (like allowing `None` to mean "this is unset, don't inherit me if another parent sets this flag", ditching tp_flags entirely, or not inheriting these attributes) feel a bit extreme just to allow some users to do the moral equivalent of un-subclassing `collections.abc.Sequence`. So, my current solution (seen on the branch linked in my earlier email) is:

- Set the flag if the corresponding magic attribute is set to True in the class definition
- Raise at class definition time if it's set to anything other than True
- Otherwise, set the flag if any of the parents have the flag set

As far as I can tell, this leads to the expected (and current, as of 3.10.0a6) behavior in all cases. Plus, it doesn't break my mental model of how inheritance works.
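Brandt's three-step rule can be modeled in a few lines of ordinary Python (`resolve_flag` and its arguments are invented for illustration; they are not part of either PEP):

```python
def resolve_flag(own, base_flags):
    """Model of the proposed rule for one flag (e.g. __match_seq__):
    honor an explicit True in the class body, reject any other explicit
    value, and otherwise inherit True if any base has the flag set."""
    if own is True:
        return True
    if own is not None:
        raise TypeError("flag may only be set to True")
    return any(base_flags)

seq_flag = resolve_flag(True, [])                          # class Seq
parent_flag = resolve_flag(None, [])                       # class Parent
child_flag = resolve_flag(None, [parent_flag, seq_flag])   # class Child(Parent, Seq)
```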

Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
- Raise at class definition time if it's set to anything other than True
I feel like this is a consenting adults thing. Yeah you probably won't need to set a flag to `False` but I don't see why it should be forbidden. On Wed, Mar 31, 2021 at 3:35 PM Brandt Bucher <brandtbucher@gmail.com> wrote:

On Thu, Apr 1, 2021 at 11:54 AM Caleb Donovick <donovick@cs.stanford.edu> wrote:
Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
How much difference is there between:
```
class Grandparent:
    """Not a sequence"""

class Parent(Grandparent):
    """Also not a sequence"""

class Child(Parent):
    """No sequences here"""
```
and this:
```
class Grandparent(list):
    """Is a sequence"""

class Parent(Grandparent):
    """Explicitly not a sequence"""
    __match_seq__ = False

class Child(Parent):
    """Shouldn't be a sequence"""
```
? Either way, Parent should function as a non-sequence. But if Child inherits from both Parent and tuple, it is most definitely a tuple, and therefore should be a sequence. With your proposed semantics, setting __match_seq__ to False is not simply saying "this isn't a sequence", but it's saying "prevent this from being a sequence". It's a stronger statement than simply undoing the declaration that it's a sequence. There would be no way to reset to the default state. Brandt's proposed semantics sound complicated, but as far as I can tell, they give sane results in all cases. ChrisA

How is this different from anything else that is inherited? The setting of a flag to `False` is not some irreversible process which permanently blocks child classes from setting that flag to `True`. If I want to give priority to `Seq` over `Parent` in Brandt's original example I need only switch the order of inheritance so that `Seq` is earlier in `Child`'s MRO, or explicitly set the flag to `True` (or `Seq.__match_seq__`). In contrast, Brandt's scheme does irreversibly set flags; there is no way to undo the setting of `__match_seq__` in a parent class. This really doesn't seem like an issue to me. I can't personally think of a use case for explicitly setting a flag to `False` but I also don't see why it should be forbidden. We get "- Otherwise, set the flag if any of the parents have the flag set" for free through normal MRO rules except in the case where there is an explicit `False` (which I assume will be exceedingly rare and if it isn't there is clearly some use case). Why make it more complicated? On Wed, Mar 31, 2021 at 6:05 PM Chris Angelico <rosuav@gmail.com> wrote:

On 31/03/2021 11:31 pm, Brandt Bucher wrote:
This is just a weird case, so I don't think we should worry about it too much.
Inheritance in Python is based on the MRO (using the C3 linearization algorithm), so my mental model is that Child.__match_container__ == 0. Welcome to the wonderful world of multiple inheritance :)

If Parent.__match_container__ == 0 (rather than just inheriting it), then it is explicitly stating that it is *not* a container. Seq explicitly states that it *is* a sequence. So Child is just broken. That it is broken for pattern matching is consistent with it being broken in general.

Cheers,
Mark.
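The C3-based mental model is easy to check directly; the `__match_container__` values below are hand-set stand-ins for illustration (0 for "not a container", 1 standing in for a MATCH_SEQUENCE-style flag):

```python
class Parent:
    __match_container__ = 0   # explicitly "not a container"

class Seq:
    __match_container__ = 1   # illustrative stand-in for a sequence flag

class Child(Parent, Seq):
    pass

# Parent precedes Seq in Child's C3 linearization, so its 0 wins:
print([c.__name__ for c in Child.__mro__])  # ['Child', 'Parent', 'Seq', 'object']
print(Child.__match_container__)            # 0
```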

On Tue, 30 Mar 2021 at 17:32, Brandt Bucher <brandtbucher@gmail.com> wrote: Hi Brandt,
Speaking as a maintainer of SymPy, I do support the PEP, but not for SymPy specifically. I just used SymPy as an example of something that seems like it should be a good fit for pattern matching but also shows examples that don't seem to work with PEP 634 in the way intended. I'm sure SymPy will use case/match when support for Python 3.9 is dropped, but I don't see it as something that would be a major feature for SymPy users or for internal code. I expect that case/match would make some code tidier, and potentially it could make some things a little faster (although that depends on it being well optimised - half a microsecond might seem small until you add up millions of them). There is a recently opened SymPy issue discussing the possible use of this: https://github.com/sympy/sympy/issues/21193

Pattern matching, and destructuring more generally, are significant features for symbolic libraries such as SymPy, which has much code for doing this and can also be used with other dedicated libraries such as matchpy. Much more is needed than case/match for that though: rewriting, substitution, associative/commutative matching etc. It's not clear to me that core Python could ever provide anything new that would lead to a groundbreaking improvement for SymPy in this respect. The surrounding discussion of the various pattern matching PEPs has led me to think of destructuring as more of a general language feature that might not in future be limited to case/match. I'm not sure where that could go for Python, but I'm interested to see if anything more comes of it.

I like a lot of the features in PEP 634, and the way I see it, this PEP (653) underpins those. The reason I support PEP 653 is because it seems like a more principled approach to the mechanism for how pattern matching should work, one that places both user-defined types and builtin types on an even footing.
The precise mechanisms (match_class, match_self etc) and their meanings do seem strange, but that's because they are trying to codify the different cases that PEP 634 has introduced. It's possible that the design of that mechanism can be improved, and there have been suggestions for that in this thread. I do think, though, that it is important to have a general extensible mechanism rather than a specification based on special cases.

I also think that the use of the Sequence and Mapping ABCs is a bad idea on practical grounds (performance, circularity in the implementation) and is not in keeping with the rest of the language. ABCs have always been optional in the past: Python uses protocols rather than ABCs (duck typing etc).

Finally, speaking as someone who also teaches introductory programming with Python, then with *that* hat on I would have preferred it if none of the pattern-matching PEPs had been accepted. The advantage of Python in having a simple and easily understood core erodes with each new addition to core syntax. For novice users, case/match only really offers increased complexity compared to if/elif, but it will still be something else that needs to be learned before being able to read existing code.

Oscar

Hi Mark,

I also wanted to give some feedback on this. While most of the discussion so far has been about the matching of the pattern itself, I think it should also be considered what happens in the block below. Consider this code:

```
m = ...
match m:
    case [a, b, c] as l:
        # what can we safely do with l?
```

or, in terms of the type system: what is the most specific type that we can know l to be? With PEP 634 you can be sure that l is a sequence and that its length is 3. With PEP 653 this is currently not explicitly defined. Judging from the pseudocode, we can only assume that l is an iterable (because we use it in an unpacking assignment) and that its length is 3, which greatly reduces the operations that can safely be done on l.

For mapping matches, with PEP 634 we can assume that l is a mapping. With PEP 653 all we can assume is that it has a .get method that takes two parameters, which is even more restrictive, as we can't even be sure if we can use len(), .keys, ... or iterate over it.

This also makes it a lot harder for static type checkers to check match statements, because instead of checking against an existing type they now have to hard-code all the guarantees made by the match statement, or not narrow the type at all. Additionally, consider this typed example:

```
m: Mapping[str, int] = ...
match m:
    case {'version': v}:
        pass
```

With PEP 634 we can statically check that v is an int. With PEP 653 there is no such guarantee. Therefore I would strongly be in favor of having sequence and mapping patterns only match certain types instead of relying on dunder attributes. If implementing all of Sequence is really too much work just to be matched by a sequence pattern, as PEP 653 claims, then maybe a more general type could be chosen instead.

I don't have any objections against the other parts of the PEP.

Adrian Freund

On 3/27/21 2:37 PM, Mark Shannon wrote:
participants (14)

- Adrian Freund
- Antoine Pitrou
- Brandt Bucher
- Caleb Donovick
- Chris Angelico
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Mark Shannon
- Nick Coghlan
- Oscar Benjamin
- Paul Moore
- Stephen J. Turnbull
- Terry Reedy