Request for comments on final version of PEP 653 (Precise Semantics for Pattern Matching)

Hi everyone,

As the 3.10 beta is not so far away, I've cut PEP 653 down to the minimum needed for 3.10. The extensions will have to wait for 3.11. The essence of the PEP is now that:

1. The semantics of pattern matching, although basically unchanged, are more precisely defined.
2. The __match_kind__ special attribute will be used to determine which patterns to match, rather than relying on the collections.abc module.

Everything else has been removed or deferred. The PEP now makes only the slightest changes to semantics, which should be undetectable in normal use. For those corner cases where there is a difference, it is to make pattern matching more robust. E.g. with PEP 653, pattern matching will work in the collections.abc module; with PEP 634 it does not.

As always, all thoughts and comments are welcome.

Cheers,
Mark.

On Sat, 27 Mar 2021 at 13:40, Mark Shannon <mark@hotpy.org> wrote:
Hi Mark, Thanks for putting this together.
It would take me some time to compare exactly how this differs from the current state after PEP 634 but I certainly prefer the object-model based approach. It does seem that there are a lot of permutations of how matching works but I guess that's just trying to tie up all the different cases introduced in PEP 634.
Maybe I misunderstood, but it looks to me as if this (PEP 653) changes the behaviour of a mapping pattern in relation to extra keys. In PEP 634 extra keys in the target are ignored, e.g.:

obj = {'a': 1, 'b': 2}
match obj:
    case {'a': 1}:
        # matches obj because key 'b' is ignored

In PEP 634 the use of **rest is optional if it is desired to capture the other keys, but it does not affect matching. Here in PEP 653 there is the pseudocode:

# A pattern not including a double-star pattern:
if $kind & MATCH_MAPPING == 0:
    FAIL
if $value.keys() != $KEYWORD_PATTERNS.keys():
    FAIL

My reading of that is that all keys would need to match unless **rest is used to absorb the others. Is that an intended difference? Personally I prefer extra keys not to be ignored by default, so to me that seems an improvement. If intentional, though, it should be listed as another semantic difference.
E.g. With PEP 653, pattern matching will work in the collections.abc module. With PEP 634 it does not.
As I understood it, this proposes that match obj: should use the class attribute type(obj).__match_kind__ to indicate whether the object being matched should be considered a sequence or a mapping or something else, rather than using isinstance(obj, Sequence) and isinstance(obj, Mapping). Is there a corner case here where an object can be both a Sequence and a Mapping? (How does PEP 634 handle that?)

Not using the Sequence and Mapping ABCs is good IMO. I'm not aware of other core language features requiring the use of ABCs. In SymPy we have specifically avoided them because they slow down isinstance checking (this is measurable in the time taken to run the whole test suite). Using the ABCs in PEP 634 seems surprising given that the original pattern matching PEP actually listed the performance impact of isinstance checks as part of the opening motivation. Maybe the ABCs can be made faster, but either way using them like this seems not in keeping with the rest of the language.

Oscar

Hi Oscar, Thanks for the feedback. On 27/03/2021 4:19 pm, Oscar Benjamin wrote:
I missed that when updating the PEP, thanks for pointing it out. It should be the same as for the double-star pattern:

if not $value.keys() >= $KEYWORD_PATTERNS.keys():
    FAIL

I'll update the PEP.
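The corrected guard can be exercised with plain dict views; a minimal sketch of the subset test Mark describes:

```python
# Dict views support set comparisons, so `>=` means "is a superset of".
# A mapping pattern without ** fails only if some pattern key is absent.
value = {'a': 1, 'b': 2}
keyword_patterns = {'a': 1}

# The pattern's keys are a subset of the subject's keys, so no FAIL:
assert value.keys() >= keyword_patterns.keys()

# A pattern mentioning a key the subject lacks would FAIL:
assert not ({'a': 1}.keys() >= {'a': 1, 'c': 3}.keys())
```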
I don't have a strong enough opinion either way. I can see advantages to both ways of doing it.
If you define a class as a subclass of both collections.abc.Sequence and collections.abc.Mapping, then PEP 634 will treat it as both sequence and mapping, meaning it has to try every pattern. That prevents the important (IMO) optimization of checking the kind only once. Cheers, Mark.
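For concreteness, a contrived class of the kind Mark describes (purely illustrative; no real library is assumed to do this):

```python
from collections.abc import Mapping, Sequence

class Both(Sequence, Mapping):
    """A deliberately ill-conceived class that is both ABCs at once."""
    def __init__(self, items):
        self._items = list(items)
    def __getitem__(self, i):
        return self._items[i]
    def __len__(self):
        return len(self._items)
    def __iter__(self):
        return iter(self._items)

b = Both([10, 20])
# PEP 634's isinstance-based checks say yes to both kinds of pattern,
# so both sequence and mapping patterns would have to be tried:
assert isinstance(b, Sequence) and isinstance(b, Mapping)
```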

Hi Mark, Reading that spec will take some time. Can you please summarize the differences in English, in a way that is about as precise as PEP 634? I have some comments inline below as well. On Sat, Mar 27, 2021 at 10:16 AM Mark Shannon <mark@hotpy.org> wrote:
It would be simpler if this was simply an informational PEP, without proposing new features -- then we wouldn't have to rush. You could then propose the new __match_kind__ attribute in a separate PEP, written more in the style of PEP 634, without pseudo code.

I find it difficult to wrap my head around the semantics of __match_kind__ because it really represents a few independent flags (with some constraints) but all the text is written using explicit, hard-to-read bitwise and/or operations. Let me give it a try.

Let's call the four flag bits by short names: SEQUENCE, MAPPING, DEFAULT, SELF.

SEQUENCE and MAPPING are for use when an instance of a class appears in the subject position (i.e., for `match x`, we look for these bits in `type(x).__match_kind__`). Neither of these is set by default. At most one of them should be set.

- If SEQUENCE is set, the subject is treated like a sequence (this is set for list, tuple and other sequences, but not for str, bytes and bytearray).
- Similarly, MAPPING means the subject should be treated as a mapping, and is set for dict and other mapping types.

The DEFAULT and SELF flags are for use when a class is used in a class pattern (i.e., for `case cls(...)` we look for these bits in `cls.__match_kind__`). At most one of these should be set. DEFAULT is set on class `object` and anything that doesn't explicitly clear it.

- If DEFAULT is set, the semantics of PEP 634 apply, except for the special behavior enabled by the SELF flag.
- If SELF is set, `case cls(x)` binds the subject to x, and no other forms of `case cls(...)` are allowed.
- If neither DEFAULT nor SELF is set, `case cls(...)` does not take arguments at all.

Please correct any misunderstandings I expressed here! (And please include some kind of summary like this in your PEP.)
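Guido's summary can be made concrete with a toy encoding; the numeric values below are assumptions for illustration, not necessarily the constants PEP 653 specifies:

```python
# Hypothetical flag values; only the bit-independence matters here.
MATCH_SEQUENCE = 1 << 0  # subject: treat like a sequence
MATCH_MAPPING  = 1 << 1  # subject: treat like a mapping
MATCH_DEFAULT  = 1 << 2  # class pattern: PEP 634 default semantics
MATCH_SELF     = 1 << 3  # class pattern: case cls(x) binds the subject

class MyList(list):
    # a sequence whose class patterns behave the default way
    __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT

kind = MyList.__match_kind__
assert kind & MATCH_SEQUENCE       # sequence patterns apply
assert not (kind & MATCH_MAPPING)  # mapping patterns do not
assert not (kind & MATCH_SELF)     # class patterns deconstruct normally
```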
Also, I think that we should probably separate this out in two separate flag sets, one for subjects and one for class patterns -- it is pretty confusing to merge the flag sets into a single value when their applicability (subject or class pattern) is so different.
Let's not change this. We carefully discussed and chose this behavior (ignore extra mapping keys, but don't ignore extra sequence items) for PEP 634 based on usability.
Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.)
I am fine with changing this one aspect of PEP 634. IIRC having separate SEQUENCE and MAPPING flags just for matching didn't occur to us during the design, and we strongly preferred some kind of type-based check over checking the presence of a specific attribute like `key`. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Hi Guido, Thanks for the feedback. On 27/03/2021 10:15 pm, Guido van Rossum wrote:
It is about as close to that as I can get. The change to using __match_kind__ requires some small changes to behaviour.
`case cls():` is always allowed, regardless of flags.
I think you expressed it well. I'll add a more informal overview section to the PEP.
That would require two different special attributes, which adds bulk without adding any value. __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT should be clear to anyone familiar with integer flags.

On Mon, 29 Mar 2021, 7:47 pm Mark Shannon, <mark@hotpy.org> wrote: [Guido wrote]
The combined flags might be clearer if the class matching flags were "MATCH_CLS_DEFAULT" and "MATCH_CLS_SELF" Without that, it isn't obvious that they're modifying the way class matching works. Alternatively, given Guido's suggestion of two attributes, they could be "__match_container__" and "__match_class__". The value of splitting them is that they should compose better under inheritance - the container ABCs could set "__match_container__" appropriately without affecting the way "__match_class__" is set. An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API. Cheers, Nick.

On Mon, Mar 29, 2021 at 7:35 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
+1

An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API.
Since the two flag sets are independent, the bulk is only apparent. Few classes would need to set one of these, let alone two. In the C layer they may be combined as part of tp_flags (assuming there are enough free bits).

Overall, I am still uncomfortable with PEP 653, and would probably not support its acceptance. Although it has thankfully become a much less radical proposal than it was a few weeks ago (thanks, Mark, for your attention to our feedback), I feel that the rules it binds implementations to are *very* premature, and that the new mechanisms it introduces to do so only modestly improve potential performance at great expense to the ease of learning, using, and maintaining code using structural pattern matching. A few notes follow:
Maybe I'm missing something, but I don't understand at all how the provided code snippet relies on the self-matching behavior. Have the maintainers of SymPy (or any large library supposedly benefitting here) come out in support of the PEP? Are they at least aware of it? Have they indicated that the proposed idiom for implementing self-matching behavior using a property is truly too "tricky" for them? Have you identified any stdlib classes that would benefit greatly from this? For me, `__match_class__` feels like a feature without demonstrated need.

Even if there is a great demand for this, I certainly think that there are far better options than the proposed flagging system:

- A `@match_self` class decorator (someone's bound to put one on PyPI, at any rate).
- Allowing `__match_args__ = None` to signal this case (an option we previously considered, and my personal preference).

...both of which can be added later, if needed.

Further, PEP 634 makes it very easy for libraries to support Python versions with *and* without pattern matching (something I consider to be an important requirement). The following class works with both 3.9 and 3.10:

```
class C(collections.abc.Sequence):
    ...
```

While something like this is required for PEP 653:

```
class C:
    if sys.version_info >= (3, 10):
        from somewhere import MATCH_SEQUENCE
        __match_container__ = MATCH_SEQUENCE
    ...
```
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose? And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries.

At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;)

A couple of other, more technical notes:

- PEP 653 requires mappings to have a `keys()` method that returns an object supporting set inequality operations. It is not really that common to find this sort of support in user code (in my experience, it is more likely that user-defined `keys()` methods will return iterables). It's not even clear to me if this is an interface requirement for mappings in general. For example, `weakref.WeakKeyDictionary` and `weakref.WeakValueDictionary` presently do not work with PEP 653's requirements for mapping patterns, since their `keys()` methods return iterators.
- Treating `__getitem__` as pure is problematic for some common classes (such as `defaultdict`). That's why we use two-argument `get()` instead.

As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code).

Brandt
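Brandt's `defaultdict` point is easy to demonstrate; a short sketch:

```python
from collections import defaultdict

# __getitem__ is not pure here: looking up a missing key inserts it.
d = defaultdict(int)
_ = d['missing']
assert 'missing' in d          # the lookup mutated the mapping

# Two-argument get() reports absence without mutating the subject,
# which is why PEP 634's mapping patterns use it.
d2 = defaultdict(int)
sentinel = object()
assert d2.get('absent', sentinel) is sentinel
assert 'absent' not in d2
```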

Hi Brandt, On 30/03/2021 5:25 pm, Brandt Bucher wrote:
No, I'm missing something. That should read

    case Mul(args=[Symbol(a), Symbol(b)]) if a == b: ...

I'll fix that in the PEP, thanks.
The distinction between those classes that have the default behavior and those that match "self" is from PEP 634. I didn't introduce it. I'm just proposing a more principled way to make that distinction.
Or:

class C:
    __match_container__ = 1  # MATCH_SEQUENCE

Which is one reason the PEP states that the values of MATCH_SEQUENCE, etc. will never change.
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose?
No reasonable scenario, but unreasonable scenarios happen all too often.
And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
It's misleading because a match statement doesn't include any explicit imports.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries.
Half a microsecond is thousands of instructions on a modern CPU. That is a long time for a single VM operation.
At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
Why do you say it is a premature optimization? Its primary purpose is reliability and precise semantics. It is more optimizable, I agree, but that is hardly premature. You also say it is confusing, but I think it is simpler than the workarounds to match "self" that you propose. This is very subjective though. Evidently we think differently.
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;)
Indeed, but there is only undefined behavior if a class violates clearly specified rules. The undefined behavior in PEP 634 is much broader. We already tolerate some amount of undefined behavior. For example, dictionary lookup is also undefined for classes which do not hash properly.
Thanks for pointing that out. I'd noted that PEP 634 used `get()`, which is why I inserted the guard on keys beforehand. Clearly that is insufficient. I'll update the semantics to use the two-argument `get()`; it does seem more robust.
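A rough sketch of what the revised semantics might look like (names and structure here are illustrative, not the PEP's actual pseudocode):

```python
from collections import defaultdict

_SENTINEL = object()

def match_mapping_keys(subject, pattern_keys):
    """Return the matched values, or None to signal FAIL."""
    values = []
    for key in pattern_keys:
        v = subject.get(key, _SENTINEL)  # never calls __getitem__
        if v is _SENTINEL:
            return None  # FAIL: key absent
        values.append(v)
    return values

d = defaultdict(int, {'a': 1})
assert match_mapping_keys(d, ['a']) == [1]
assert match_mapping_keys(d, ['b']) is None
assert 'b' not in d  # the failed match did not mutate the defaultdict
```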
As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code).
I fully understand that a lot of work went into PEP 634, which is why I am keeping the syntax and as much of the semantics of PEP 634 as I can. The test suite is proving very useful, thanks.

The problem with waiting for 3.11 is that code will start to rely on some of the implementation details of pattern matching as it is now, and that our ability to optimize it will be delayed by a year.

Cheers,
Mark.

Hi Mark.

I've spoken with Guido, and we are willing to propose the following amendments to PEP 634:

- Require `__match_args__` to be a tuple.
- Add new `__match_seq__` and `__match_map__` special attributes, corresponding to new public `Py_TPFLAGS_MATCH_SEQ` and `Py_TPFLAGS_MATCH_MAP` flags for use in `tp_flags`. When Python classes are defined with one or both of these attributes set to a boolean value, `type.__new__` will update the flags on the type to reflect the change (using a similar mechanism as `__slots__` definitions). They will be inherited otherwise. For convenience, `collections.abc.Sequence` will define `__match_seq__ = True`, and `collections.abc.Mapping` will define `__match_map__ = True`.

Using this in Python would look like:

```
class MySeq:
    __match_seq__ = True
    ...

class MyMap:
    __match_map__ = True
    ...
```

Using this in C would look like:

```
PyTypeObject PyMySeq_Type = {
    ...
    .tp_flags = Py_TPFLAGS_MATCH_SEQ | ...,
    ...
}

PyTypeObject PyMyMap_Type = {
    ...
    .tp_flags = Py_TPFLAGS_MATCH_MAP | ...,
    ...
}
```

We believe that these changes will result in the best possible outcome:

- The new mechanism should be faster than either PEP.
- The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*.

If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce on before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so.

Thanks for pushing us to make this better.

Brandt

Hi Brandt, On 30/03/2021 11:49 pm, Brandt Bucher wrote:
I think we're all in agreement on this one. Let's just do it.
I don't like the way this needs special inheritance rules, where inheriting one attribute mutates the value of another. It seems convoluted. Consider:

class WhatIsIt(MySeq, MyMap):
    pass

With __match_container__ it works as expected, with no special inheritance rules. This was why you convinced me to split __match_kind__; it works better with inheritance.

Another reason for preferring __match_container__ is that it provides a better option for extensibility, IMO. Suppose we wanted to add a "set" pattern in the future: with __match_container__ we just need to add a new constant, whereas with your proposed approach we would need another special attribute.
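The difference can be sketched with plain class attributes (the flag values and the absence of any `type.__new__` magic are simplifications):

```python
MATCH_SEQUENCE, MATCH_MAPPING = 1, 2  # illustrative values

class MySeq:
    __match_seq__ = True
    __match_container__ = MATCH_SEQUENCE

class MyMap:
    __match_map__ = True
    __match_container__ = MATCH_MAPPING

class WhatIsIt(MySeq, MyMap):
    pass

# Two boolean attributes: both are inherited, so WhatIsIt is flagged
# as a sequence *and* a mapping at the same time:
assert WhatIsIt.__match_seq__ and WhatIsIt.__match_map__

# One combined attribute: ordinary MRO lookup picks a single value,
# so exactly one kind applies, with no special inheritance rules:
assert WhatIsIt.__match_container__ == MATCH_SEQUENCE
```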
I'm wary of using up tp_flags, as they are a precious resource, but this does provide a more declarative way to specific the behavior than setting the attribute via the C-API.
We believe that these changes will result in the best possible outcome: - The new mechanism should be faster than either PEP.
The naive implementation of the boolean version might be a tiny bit faster (it would be hard to measure a difference). However, once specialized by type version (as we do for LOAD_ATTR) both forms become a no-op.
- The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*.
The inheritance rules make __match_container__ a better user experience in Python, IMO. As for C, there is no reason why it would make any difference: __match_container__ could be (tp_flags & (Py_TPFLAGS_MATCH_SEQ|Py_TPFLAGS_MATCH_MAP)) shifted to line up the bits.
If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
Thanks for pushing us to make this better.
And thank you for the feedback. Cheers, Mark.

On Wed, Mar 31, 2021 at 2:30 AM Mark Shannon <mark@hotpy.org> wrote:
Wait a minute, do you expect WhatIsIt to be a sequence but not a map? *I* would expect that it is both, and that's exactly what Brandt's proposal does. So I see this as a plus.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
I don't know that PEP 653's semantics are better. Have you analyzed any *differences* besides the proposal above? I've personally found reading your pseudo-code very difficult, so I simply don't know.

Hi Guido, On 31/03/2021 6:21 pm, Guido van Rossum wrote:
Earlier you said:

"Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.)"

PEP 653 requires that:

(__match_container__ & (MATCH_SEQUENCE | MATCH_MAPPING)) != (MATCH_SEQUENCE | MATCH_MAPPING)

Would you require that (__match_seq__ and __match_map__) is always false? If so, then what is the mechanism for handling the `WhatIsIt` class? If not, then you lose the ability to make a single test to determine which patterns can apply.
PEP 653 semantics are more precise. I think that is better :) Apart from that, I think the semantics are so similar once you've added __match_seq__/__match_map__ to PEP 634 that it is hard to claim one is better than the other. My (unfinished) implementation of PEP 653 makes almost no changes to the test suite. The code in the examples is Python, not pseudo-code. That might be easier to follow. Cheers, Mark.

On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org> wrote:
[me, responding to Mark]
[Now back to Mark]
Ah, you caught me there. I do think that classes that combine both characteristics are in troublesome water. I think we can get optimizability either way, so I'll focus on semantics.

Brandt has demonstrated that it's ugly to write the code for a class that in match statements behaves as either a sequence or a mapping (but not both) while at the same time keeping the code compatible with Python 3.9 or before. I also think that using flag attributes that are set to True or False (instead of using a bitmap of flags, which is obscure to many Python users) solves this problem nicely.

Using separate flag attributes happens to lead to different semantics than the flags-bitmap approach in the case of multiple inheritance. Given that one *can* inherit from both Sequence and Mapping, having separate flags seems slightly better than the flags-bitmap approach. It wasn't enough to convince me earlier, but the other advantage does convince me: separate flag attributes are better than using a flags-bitmap.

Now, if it weren't for other issues, having no flags at all here, but just signalling the applicable pattern kinds through inheritance from collections.abc.{Sequence,Mapping}, would be even cleaner. But we do have other issues: (a) the exceptions for str, bytes, and bytearray, and (b) the clumsiness of importing collections.abc (which is Python code) deep in the ceval main loop. So some explicit form of signalling this is fine -- and classes that explicitly inherit from Sequence or Mapping will get it for free that way.
Nope. If so, then what is the mechanism for handling the `WhatIsIt` class?
If not, then you lose the ability to make a single test to determine which patterns can apply.
Translating the flag attributes to bits in tp_flags (or in a new flags variable elsewhere in the type object) would still allow a pretty fast test. And needing to support overlapping subsets of the cases is not unique to this situation, after all a class may well be a sequence *and* have attributes named x, y and z.
I wish I knew of a single instance where PEP 634 and PEP 653 actually differ.
I'd like to see where those differences are -- then we can talk about which is better. :-)
The code in the examples is Python, not pseudo-code. That might be easier to follow.

Hi Guido, On 31/03/2021 9:53 pm, Guido van Rossum wrote:
On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org <mailto:mark@hotpy.org>> wrote:
[snip]
Almost all the changes come from requiring __match_args__ to be a tuple of unique strings. The only other change is that case int(real=0+0j, imag=0-0j): fails to match 0, because `int` is `MATCH_SELF` so won't match attributes. https://github.com/python/cpython/compare/master...markshannon:pep-653-imple... Cheers, Mark.

On Thu, Apr 1, 2021 at 2:18 PM Mark Shannon <mark@hotpy.org> wrote:
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work, and I don't see anything semantically suspect in allowing that.
Oh, but that would be a problem. The intention wasn't that "self" mode prevents keyword/attribute matches. (FWIW, the real and imag attributes should not be complex numbers, so that test case is weird, but it should work.)
https://github.com/python/cpython/compare/master...markshannon:pep-653-imple...

On 4/1/2021 9:38 PM, Guido van Rossum wrote:
The current posted PEP does not say 'unique' and I agree with Guido that it should not.
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work,
The current near-Python code does not have such a check.
and I don't see anything semantically suspect in allowing that.
If I understand the current pseudocode correctly, the effect of 's' appearing twice in C.__match_args__ would be to possibly look up and assign C.s to two different names in a case pattern. I would not be surprised if someone someday tries to do this intentionally. Except for the repeated lookup, it would be similar to a = b = C.s. This might make sense if C.s is mutable. Or the repeated lookups could yield different values.

-- Terry Jan Reedy
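Terry's scenario, sketched without a match statement (the shipped 3.10 implementation rejects the duplicate at runtime, so the attribute lookups are spelled out by hand here):

```python
# 's' appearing twice in __match_args__ would mean two lookups of the
# same attribute, much like a = b = C.s (modulo the repeated getattr).
class C:
    s = [1, 2, 3]
    __match_args__ = ('s', 's')

obj = C()
a = getattr(obj, C.__match_args__[0])
b = getattr(obj, C.__match_args__[1])
assert a is b  # both names would bind the same (mutable) object
```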

On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu> wrote:
(Of course, "the current PEP" is highly ambiguous in this context.)

Well, now I have egg on my face, because the current implementation does reject multiple occurrences of the same identifier in __match_args__. We generate an error like "TypeError: C() got multiple sub-patterns for attribute 'a'". However, I cannot find this uniqueness requirement in PEP 634, so I think it was a mistake to implement it.

Researching this led me to find another issue where PEP 634 and the implementation differ, but this time it's the other way around: PEP 634 says about types which accept a single positional subpattern (int(x), str(x) etc.) "for these types no keyword patterns are accepted." Mark's example `case int(real=0, imag=0):` makes me think this requirement is wrong, and I would like to amend PEP 634 to strike it. Fortunately, this is not what is implemented. E.g. `case int(1, real=1):` is accepted and works, as does `case int(real=0):`. Calling out Brandt to get his opinion.

And thanks to Mark for finding these!
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, and this could even be a valid backwards compatibility measure: if a class used to have two different attributes that would in practice never differ, the two attributes could be merged into one, and someone might have a pattern capturing both, positionally. That should keep working, and having a duplicate in __match_args__ seems a clean enough solution.

On 4/2/2021 12:02 AM, Guido van Rossum wrote:
On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu
The current near-Python code does not have such a check.
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, the part I read was legal Python + $ variables + FAIL. I should have included 'pseudo'.

Guido van Rossum wrote:
The current implementation will reject any attribute being looked up more than once, by position *or* keyword. It's actually a bit tricky to do, which is why the `MATCH_CLASS` op is such a beast... it needs to look up positional and keyword attributes all in one go, keeping track of everything it's seen and checking for duplicates. I believe this behavior is a holdover from PEP 622:
The interpreter will check that two match items are not targeting the same attribute, for example `Point2d(1, 2, y=3)` is an error.
(https://www.python.org/dev/peps/pep-0622/#overlapping-sub-patterns)

PEP 634 explicitly disallows duplicate keywords, but as far as I can tell it says nothing about duplicate `__match_args__` or keywords that also appear in `__match_args__`. It looks like an accidental omission during the 622 -> 634 rewrite. (I guess I figured that if somebody matches `Spam(foo, y=bar)`, where `Spam.__match_args__` is `("y",)`, that's probably a bug in the user's code. Ditto for `Spam(y=foo, y=bar)`, and for `Spam(foo, bar)` where `Spam.__match_args__` is `("y", "y")`. But it's not a hill I'm willing to die on.)

I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.

Hi Brandt, On 02/04/2021 7:19 am, Brandt Bucher wrote:
Repeated keywords do seem likely to be a bug. Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time. So how about explicitly disallowing those, but not checking that the intersection of `__match_args__` and keywords is empty? We would get most of the error checking without the performance impact.
I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.
PEP 634 allows it. PEP 653 currently forbids it, mainly for consistency reasons. The purpose of self-matching is to prevent deconstruction, so it seems inconsistent to allow it for keyword arguments. Are there any use-cases? The test-case `int(real=0+0j, imag=0-0j)` is contrived, but I'm struggling to come up with less contrived examples for any of float, list, dict, tuple, str. Cheers, Mark.

On Fri, Apr 2, 2021 at 3:38 AM Mark Shannon <mark@hotpy.org> wrote:
Agreed. But as I sketched in a previous email I think duplicates ought to be acceptable in __match_args__. At the very least we should align the PEP and the implementation here, by adjusting one or the other. Most checks are cheap though.
Checking for duplicates in `__match_args__` can be done at class creation time,
Hm, what about dynamic updates to __match_args__? I've done that in the REPL.
and checking for duplicates in the pattern can be done at compile time.
I'd prefer not to do that check at all.
+1 on the latter (not checking the intersection).
The purpose of self-matching is user convenience. It should be seen as a shorthand for the code fragment in PEP 634 showing how to do this for any class.
There could be a subclass that adds an attribute. That's still contrived though. But if we start supporting this for *general* classes we should allow combining it with keywords/attributes. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Mark Shannon wrote:
PEP 634 says:
For a number of built-in types (specified below), a single positional subpattern is accepted which will match the entire subject; for these types no keyword patterns are accepted.
(https://www.python.org/dev/peps/pep-0634/#class-patterns)
Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time.
I assume the compile-time check only works for named keyword attributes. The current implementation already does this. -1 on checking `__match_args__` anywhere other than the match block itself. Guido van Rossum wrote:
I could see the case for something like `case defaultdict({"Spam": s}, default_factory=f)`. I certainly don't think it should be forbidden.

On Fri, Apr 2, 2021 at 12:43 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
But that's not what the implementation does. It still supports keyword patterns for these types -- and (as I've said earlier in this thread) I think the implementation is correct.
Agreed.

Hi Guido, On 02/04/2021 10:05 pm, Guido van Rossum wrote:
Why? (I also asked Brandt this) It is far more efficient to check `__match_args__` at class creation (or class attribute assignment) time. The most efficient way to check in the match block is to check at class creation time anyway and store a flag indicating whether `__match_args__` is legal. In the match block we would check this flag, then proceed. It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported to go undetected. Cheers, Mark.

On Sat, Apr 3, 2021 at 4:20 AM Mark Shannon <mark@hotpy.org> wrote:
Okay, now we're talking. If you check it on both class definition and at attribute assignment time I think that's fine (now that it's a tuple). But I don't think the specification (in whatever PEP) needs to specify that it *must* be checked at that time. So I think the current implementation is fine as well (once we change it to accept only tuples).
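Checking at both class definition and attribute assignment time, as discussed here, can be sketched with a metaclass (`CheckedMeta` is purely illustrative; the real checks would live in `type.__new__` and `type.__setattr__`):

```python
class CheckedMeta(type):
    """Validate __match_args__ at class creation and on later assignment."""

    @staticmethod
    def _check(value):
        if not isinstance(value, tuple):
            raise TypeError("__match_args__ must be a tuple")
        if len(set(value)) != len(value):
            raise TypeError("duplicate names in __match_args__")

    def __new__(mcls, name, bases, namespace):
        if "__match_args__" in namespace:
            mcls._check(namespace["__match_args__"])
        return super().__new__(mcls, name, bases, namespace)

    def __setattr__(cls, name, value):
        if name == "__match_args__":
            CheckedMeta._check(value)   # covers dynamic updates, e.g. in the REPL
        super().__setattr__(name, value)


class Point(metaclass=CheckedMeta):
    __match_args__ = ("x", "y")


try:
    Point.__match_args__ = ("x", "x")   # rejected at assignment time
    assignment_rejected = False
except TypeError:
    assignment_rejected = True
```

The `__setattr__` hook is what handles Guido's "dynamic updates in the REPL" case: a bad reassignment fails and leaves the original tuple in place.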
Yeah, nice optimization.
It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported to go undetected.
Well, usually that is The Python Way. There are a lot of things that could be detected statically quite easily (without building something like mypy) but that aren't. Often that's due to historical accidents (in the past we were even less able to do the simplest static checks), so it's fine to do this your way. BTW we previously discussed whether `__match_args__` can contain duplicates. I thought the PEP didn't state either way, but I was wrong: it explicitly disallows it, matching the implementation. PEP 634 says on line 503:

```
- For duplicate keywords, ``TypeError`` is raised.
```

Given that there is no inconsistency here, I am inclined to keep it that way. If we find a better use case to allow duplicates we can always loosen up the implementation; it's not so simple the other way around. FWIW I am also submitting https://github.com/python/peps/pull/1909 to make `__match_args__` a tuple only, which we all seem to agree on.

Hi Brandt, On 02/04/2021 8:41 pm, Brandt Bucher wrote:
I was relying on the "reference" implementation, which is also in the PEP.
I take this as +1 for having more precisely defined semantics for pattern matching :)
I'm curious, why? It is much faster *and* gives better error messages to check `__match_args__` at class creation time.
It is forbidden in the PEP, as written, correct? OOI, have you changed your mind, or was that an oversight in the original? Cheers, Mark.

On Sat, Apr 3, 2021 at 4:15 AM Mark Shannon <mark@hotpy.org> wrote:
But it's not normative. However...
In this case I propose adjusting the PEP text. See https://github.com/python/peps/pull/1908
I take this as +1 for having more precisely defined semantics for pattern matching :)
Certainly I see it as +1 for having the semantics independently verified. [...]
I was surprised to find this phrase in the PEP, so I suspect that it was just a mistake when I wrote that section of the PEP. I can't find a similar restriction in PEP 622 (the original pattern matching PEP).

Mark Shannon said:
I was relying on the "reference" implementation, which is also in the PEP.
Can you please stop putting scare quotes around "reference implementation"? You've done it twice now, and it's been a weekend-ruiner for me each time. I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list. For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you. The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term. Brandt

On Sun, 4 Apr 2021 at 01:37, Brandt Bucher <brandtbucher@gmail.com> wrote:
Agreed - apart from the implication Brandt noted, it's also misleading. The code is in Python 3.10, so the correct term is "the implementation" (or if you want to be picky, "the CPython implementation"). To me, the term "reference implementation" implies "for reference, not yet released". At this point, we're discussing fixes to an implemented Python 3.10 feature, not tidying up a PEP. Paul

On Sun, Apr 4, 2021 at 6:20 PM Paul Moore <p.f.moore@gmail.com> wrote:
Normally, the term "reference implementation" means "the basis implementation that everything else is compared against". For instance, a compression algorithm might be published as a mathematical document, with a reference implementation in some language. It's then possible to create a new implementation in some other language, or more optimized, or whatever else; but to know whether it's giving the correct results, you compare its output to the output of the reference implementation. CPython is the reference implementation for the Python language. It's possible to have a discrepancy between the standard and the implementation, but it's still the reference implementation (just occasionally a buggy one). In this case, I believe that the term "reference implementation" is strictly accurate, and concur with Brandt's request to not discredit it by implying that it's only purporting to be one. ChrisA

Antoine Pitrou writes:
On Sun, 04 Apr 2021 00:34:18 -0000 "Brandt Bucher" <brandtbucher@gmail.com> wrote:
Can you please stop putting scare quotes
"Scare quotes" refers to an idiom English writers use to deprecate something. In what I wrote just above, the quotation marks indicate a focus on the *string* "scare quotes". In this case, they mark the fact that these are the exact words of Brandt, and also that I'm defining those words, not using their meaning. In Mark's phrase '"reference" implementation', neither of those usages apply. It's possible that they are the deprecated "random quote emphasis" usage. Random quote emphasis is implausible here, however. I can see no reason why Mark would emphasize the modifier "reference" in this context. One of the most important remaining usages, and one that I find plausible in context, is scare quotes. These are quotation marks used to focus on the phrase in quotes, and indicate that it is somehow suspicious: inaccurate, imprecise, false, even the opposite of its dictionary meaning. In other words, if you don't have a reason to emphasize focus on the words themselves rather than their meaning, by adding (scare) quotes you are most likely turning a "reasonably polite expression" into an insult.
I'm probably missing something...
Probably so did a lot of native speakers; there are English dialects where scare quotes are rare and random quote emphasis is common. However, I assure you, many native speakers (along with a fair number of non-natives) did not. I neither know nor care what Mark's *intent* is. I'm explaining what (some) idiomatic speakers of English will read into what he writes, because it is a *common* idiom (common enough to have a name, and be mentioned in standard manuals of English style). Regards, Steve

Hi Brandt, On 04/04/2021 1:34 am, Brandt Bucher wrote:
I'm sorry for ruining your weekends. My intention, and I apologize for not making this clearer, was not to denigrate your work, but to question the implications of the term "reference". Calling something a "reference" implementation suggests that it is something that people can refer to, that is near perfectly correct and fills in the gaps in the specification. That is a high standard, and one that is very difficult to attain. It is why I use the term "implementation", and not "reference implementation" in my PEPs.
I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list.
It definitely wasn't my intention.
For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you.
Indeed, I reviewed the implementation. I thought it was good enough to merge. I still think that.
The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term.
Please don't take my criticisms of PEP 634 as criticisms of you or your efforts. I know it can often sound like that, but that really isn't my intent. Pattern matching is a *big* new feature, and to get it right takes a lot of discussion. Having your ideas continually battered is no fun, I know. So, I'd like to apologize again for any hurt caused. Cheers, Mark.

Mark Shannon writes:
Shoe fits, doesn't it? Both Guido and Brandt to my recall have specifically considered the possibility that the implementation is the better design, and therefore that the PEP should be changed.
That is a high standard, and one that is very difficult to attain.
That depends on context, doesn't it? In the case of a public release, yes, it's a very high standard. In the context of a feature in development, it *cannot* be that high, because even when the spec and the implementation are in *perfect* agreement, both may be changed in the light of experience or a change in requirements. Furthermore, in this instance, the implementation achieves *your* standard (Brandt, again):
both authors have agreed that it needs to be fixed in the PEP, not the implementation
You added:
It is why I use the term "implementation", and not "reference implementation" in my PEPs.
A reasonable usage. I think my more flexible, context-dependent definition is more useful. Unmodified, the word "implementation" covers everything from unrunnable pseudo-code to the high standard of a public release that is officially denoted "reference implementation". On the other hand, when Brandt says that a merge request is a "reference implementation", I interpret that to be a claim that, to his knowledge the MR is a perfect implementation of the specification, and an invitation to criticize the specification by referring to that implementation. That's a strong claim, even in my interpretation. However, I think that if the developer dares to make it, it's very useful to reviewers. As it was in this case. Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense. Any deviations from documented spec (the Language Reference) will be presumed to have to be fixed in the implementation (with due consideration for backward compatibility). "Although practicality beats purity," of course, but treating the Language Reference as authoritative is strongly preferred to keeping the implementation and modifying the Reference (at least as I understand it). Regards, Steve

On Sun, 4 Apr 2021 at 13:49, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense.
It *is* merged and publicly released - it's in the latest 3.10 alpha. That's really the point I was trying to make with my comment (I'm steering clear of the "scare quotes" discussion). The fact that the implementation kept getting referred to as the "reference implementation" confused me into thinking it hadn't been released yet, and that simply isn't true. Calling it "the implementation" avoids that confusion, IMO. Paul

Paul Moore writes:
It *is* merged and publicly released - it's in the latest 3.10 alpha.
Merged, yes, but in my terminology alphas, betas, and rcs aren't "public releases", they're merely "accessible to the public". (I'm happy to adopt your terminology when you're in the conversation, I'm just explaining what I meant in my previous post.)
The only thing I understand in that paragraph is "that [it hadn't been released yet] simply isn't true", which is true enough on your definition of "released". But why does "reference implementation" connote "unreleased"? That seems to be quite different from Mark's usage. I don't have an objection to your usage, I'd just like us all to converge on a set of terms so that Brandt has a compact way of saying "as far as I know, for the specification under discussion this implementation is completely accurate and folks are welcome to refer to the PEP, to the code, or to divergences as seems appropriate to them". I'm not sure if that's exactly what Brandt meant by "reference implementation", but that's how I understood it. Steve

On Tue, 6 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
*shrug* It's (in my experience) a continuum - it's not in a release yet, but it is available via an installer, with a (pre-release) version number. But I get what you're saying and don't disagree, to be honest. I see the discrepancy as mostly being because we're trying to use (imprecise) informal language to pin down precise nuances. The main point I was making is that it's merged into the CPython source code at this stage, and available for people to download and experiment with, which is something I was unclear about.
In my experience, people developing PEPs will sometimes provide what gets referred to as a "reference implementation" of the proposal, which is a PR or equivalent that people can apply and try out if they want to see how the proposal works in practice. That "reference implementation" is generally seen as part of the *proposal*, even if it then becomes the final merged code as well. Once it's released, it tends to no longer get called the *reference* implementation, as it's now just the implementation (in CPython) of the feature. PEP 1 uses this terminology, as well - "Standards Track PEPs consist of two parts, a design document and a reference implementation" and "Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and incorporated into the main source code repository, the status will be changed to "Final"". PEP 635 follows this terminology, with a "Reference implementation" section linking to the development branch for the feature. To put this back into the context of this discussion, when Mark was referring to the "reference implementation" it made me think that maybe we were talking about that development branch, and that the code for the pattern matching PEP hadn't yet been merged to the main branch, which is why we were still iterating over implementation details. And that led me to think that they'd better get the discussion resolved soon, as they risk missing the 3.10 deadline if things drag on. Which *isn't* the case, and if I'd been following things more closely I'd have known that, but avoiding the term "reference implementation" for the merged change would also have spared my confusion.
Agreed, a common understanding is the main thing here. And as I'm not an active participant in the discussion, and I now understand the situation, my views shouldn't have too much weight in deciding what the best terminology is. Paul

Hi Mark. Thanks for your reply, I really appreciate it. Mark Shannon said:
Interesting. The reason I typically include a "Reference Implementation" section in my PEPs is because they almost always start out as a copy-paste of the template in PEP 12 (which also appears in PEP 1): https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep https://www.python.org/dev/peps/pep-0012/#suggested-sections Funny enough, PEP 635 has a "Reference Implementation" section, which itself refers to the implementation as simply a "feature-complete CPython implementation": https://www.python.org/dev/peps/pep-0635/#reference-implementation (PEP 634 and PEP 636 don't mention the existence of an implementation at all, as far as I can tell.) It's not a huge deal, but we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here" (which is what I've usually used the section to communicate). Brandt

On 7/04/21 5:22 am, Brandt Bucher wrote:
we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here"
Maybe "prototype implementation" would be better? I think I've used that term in PEPs before. -- Greg

Greg Ewing writes:
That seems to me to correspond well to Brandt's standard as expressed above. To me, "prototype implementation" is somewhere between "proof of concept" and "reference implementation", and I welcome the additional precision. The big question is can such terms be used accurately (i.e., do various people assign similar meanings to them)? I would define them functionally as:

- proof of concept: demonstrates some of the features, especially those that were considered "difficult to implement"
- prototype implementation: implements the whole spec, so can be used by developers to prototype applications
- reference implementation: intended to be a complete and accurate implementation of the specification

By "complete and accurate" I mean that it can be used experimentally to understand what the spec means without much worry that the proponent will brush off questions with "oh, that's just not implemented yet, read the spec if you want to know how it will work when we're done." Furthermore, any divergence between spec and implementation is a bug that is actually a broken promise. (The promise implied by "reference".) Finally, as development continues there is a promise that the spec and implementation will be kept in sync (of course changes might be provisional, but even then the sync should be maintained). I don't think the Platonic ideal interpretation of "reference implementation" is very useful. Software evolves. It evolves very quickly during initial development, but it's useful to "ask the implementation" about the spec even then. That's implied by methodologies like test-driven development. There are other workflows where that's not true. My claim is that "reference implementation" can be useful to distinguish development processes where you expect the implementation to reliably reflect the spec, even in corner cases, from those where you shouldn't. And even as the software evolves.
Note that if we use this definition, then the "Reference Implementation" requirement of the PEP process becomes quite a high bar. I think we all agree on that. So I advocate, as Brandt suggested, that we revise the PEP template. In particular I think it should use Greg's term "prototype implementation". Optionally, we could make "reference implementation" available to proponents who wish to make that claim about their implementation. Steve

On Wed, 7 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I'm OK with these terms (although I don't actually think you *will* get sufficient consensus on them to make them unambiguous) but with one proviso - once the implementation is merged into the CPython source, I think it should simply be referred to as "the implementation" and qualifiers should be unnecessary (and should be considered misleading). Paul

Hi Guido, On 02/04/2021 2:38 am, Guido van Rossum wrote:
Checking for uniqueness is almost free because __match_args__ is a tuple, and therefore immutable, so the check can be done at class creation time.
I thought matching `int(real=0+0j, imag=0-0j)` was a bit weird too. The change required to make it work is trivial, but the code seems more consistent if `int(real=0+0j, imag=0-0j)` is disallowed, which is why I went for that.

Let me clarify: these two attributes do not interact with one another; each attribute only interacts with its own flag on the type. It is perfectly possible to do:

```
class WhatIsIt:
    __match_map__ = True
    __match_seq__ = True
```

This will set both flags, and this `WhatIsIt` will match as a mapping *and* a sequence. This is allowed and works in PEP 634, but like Guido I'm not entirely opposed to making the matching behavior of such a class undefined against sequence or mapping patterns.
What *is* the expected behavior of this? Based on the current behavior of PEP 634, I would expect the `__match_container__` of each base to be or'ed, and something like this to match as both a mapping and a sequence (which PEP 653 says leads to undefined behavior). The actual behavior seems more like it will just be a sequence and not a mapping, since `__match_container__` would be inherited from `MySeq` and `MyMap` would be ignored. In the interest of precision, here is an implementation of *exactly* what I am thinking: `typeobject.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... `ceval.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... (One change from my last email: it doesn't allow `__match_map__` / `__match_seq__` to be set to `False`... only `True`. This prevents some otherwise tricky multiple-inheritance edge-cases present in both of our flagging systems that I discovered during testing. I don't think there are actual use-cases for unsetting the flags in subclasses, but we can revisit that later if needed.)
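The or'ing behavior Brandt says he would expect can be prototyped with a metaclass. All names here (`MATCH_SEQUENCE`, `MATCH_MAPPING`, `__match_container__`, `MatchMeta`) are hypothetical, taken from the PEP drafts and this discussion, not from released Python; this sketches the or'ed semantics, not what plain MRO inheritance would give:

```python
MATCH_SEQUENCE, MATCH_MAPPING = 1, 2  # hypothetical flag values

class MatchMeta(type):
    """Sketch: a class that does not set __match_container__ itself
    gets the bitwise-or of its bases' flags."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if "__match_container__" not in namespace:
            flags = 0
            for base in bases:
                flags |= getattr(base, "__match_container__", 0)
            cls.__match_container__ = flags
        return cls


class MySeq(metaclass=MatchMeta):
    __match_container__ = MATCH_SEQUENCE


class MyMap(metaclass=MatchMeta):
    __match_container__ = MATCH_MAPPING


class Child(MySeq, MyMap):
    pass  # or'ed: would match as both a sequence and a mapping
```

Plain MRO attribute lookup, by contrast, would give `Child` only `MySeq`'s value and silently ignore `MyMap`'s flag.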

On Wed, Mar 31, 2021 at 2:14 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
That's surprising to me. Just like we can have a class that inherits from int but isn't hashable, and make that explicit by setting `__hash__ = None`, why couldn't I have a class that inherits from something else that happens to inherit from Sequence, and say "but I don't want it to match like a sequence" by adding `__match_sequence__ = False`? AFAIK all Mark's versions would support this by setting `__match_kind__ = 0`. Maybe you can show an example edge case where this would be undesirable?

Guido van Rossum wrote:
The issue isn't when *I* set `__match_seq__ = False` or `__match_container__ = 0`. It's when *one of my parents* does it that things become difficult.
Maybe you can show an example edge case where this would be undesirable?
Good idea. I've probably been staring at this stuff for too long to figure it out myself. :) As far as I can tell, these surprising cases arise because a bit flag can only be either 0 or 1. For us, "not specified" is equivalent to 0, which can lead to ambiguity. Consider this case:

```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    pass

class Child(Parent, Seq):
    pass
```

Okay, cool. `Child` will match as a sequence, which seems correct. But what about this similar case?

```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    __match_seq__ = False  # or __match_container__ = 0

class Child(Parent, Seq):
    pass
```

Here, `Child` will *not* match as a sequence, even though it probably should. The only workarounds I've found (like allowing `None` to mean "this is unset, don't inherit me if another parent sets this flag", ditching tp_flags entirely, or not inheriting these attributes) feel a bit extreme just to allow some users to do the moral equivalent of un-subclassing `collections.abc.Sequence`. So, my current solution (seen on the branch linked in my earlier email) is:

- Set the flag if the corresponding magic attribute is set to True in the class definition
- Raise at class definition time if it's set to anything other than True
- Otherwise, set the flag if any of the parents have the flag set

As far as I can tell, this leads to the expected (and current, as of 3.10.0a6) behavior in all cases. Plus, it doesn't break my mental model of how inheritance works.
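Brandt's second case can be checked with plain attribute lookup. `__match_seq__` is the dunder proposed in this discussion, but here it behaves like any ordinary class attribute, which is exactly the ambiguity being described:

```python
class Seq:
    __match_seq__ = True

class Parent:
    __match_seq__ = False

class Child(Parent, Seq):
    pass

# Ordinary MRO lookup (Child -> Parent -> Seq -> object) finds
# Parent's explicit False before it ever reaches Seq's True.
flag = Child.__match_seq__  # False
```

So under attribute-based semantics, `Child` would not match as a sequence unless it (or the lookup rules) explicitly re-enabled the flag.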

Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
- Raise at class definition time if it's set to anything other than True
I feel like this is a consenting adults thing. Yeah you probably won't need to set a flag to `False` but I don't see why it should be forbidden. On Wed, Mar 31, 2021 at 3:35 PM Brandt Bucher <brandtbucher@gmail.com> wrote:

On Thu, Apr 1, 2021 at 11:54 AM Caleb Donovick <donovick@cs.stanford.edu> wrote:
Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
How much difference is there between:

```
class Grandparent:
    """Not a sequence"""

class Parent(Grandparent):
    """Also not a sequence"""

class Child(Parent):
    """No sequences here"""
```

and this:

```
class Grandparent(list):
    """Is a sequence"""

class Parent(Grandparent):
    """Explicitly not a sequence"""
    __match_seq__ = False

class Child(Parent):
    """Shouldn't be a sequence"""
```

? Either way, Parent should function as a non-sequence. But if Child inherits from both Parent and tuple, it is most definitely a tuple, and therefore should be a sequence. With your proposed semantics, setting __match_seq__ to False is not simply saying "this isn't a sequence", but it's saying "prevent this from being a sequence". It's a stronger statement than simply undoing the declaration that it's a sequence. There would be no way to reset to the default state. Brandt's proposed semantics sound complicated, but as far as I can tell, they give sane results in all cases. ChrisA

How is this different from anything else that is inherited? The setting of a flag to `False` is not some irreversible process which permanently blocks child classes from setting that flag to `True`. If I want to give priority to `Seq` over `Parent` in Brandt's original example I need only switch the order of inheritance so that `Seq` is earlier in `Child` MRO or explicitly set the flag to `True` (or `Seq.__match_seq__`). In contrast Brandt's scheme does irreversibly set flags, there is no way to undo the setting of `__match_seq__` in a parent class. This really doesn't seem like an issue to me. I can't personally think of a use case for explicitly setting a flag to `False` but I also don't see why it should be forbidden. We get "- Otherwise, set the flag if any of the parents set have the flag set" for free through normal MRO rules except in the case where there is an explicit `False` (which I assume will be exceedingly rare and if it isn't there is clearly some use case). Why make it more complicated? On Wed, Mar 31, 2021 at 6:05 PM Chris Angelico <rosuav@gmail.com> wrote:

On 31/03/2021 11:31 pm, Brandt Bucher wrote:
This is just a weird case, so I don't think we should worry about it too much.
Inheritance in Python is based on the MRO (using the C3 linearization algorithm), so my mental model is that Child.__match_container__ == 0. Welcome to the wonderful world of multiple inheritance :) If Parent.__match_container__ == 0 (rather than just inheriting it) then it is explicitly stating that it is *not* a container. Seq explicitly states that it *is* a sequence. So Child is just broken. That it is broken for pattern matching is consistent with it being broken in general. Cheers, Mark.

On Tue, 30 Mar 2021 at 17:32, Brandt Bucher <brandtbucher@gmail.com> wrote: Hi Brandt,
Speaking as a maintainer of SymPy I do support the PEP but not for SymPy specifically. I just used SymPy as an example of something that seems like it should be a good fit for pattern matching but also shows examples that don't seem to work with PEP 634 in the way intended. I'm sure SymPy will use case/match when support for Python 3.9 is dropped but I don't see it as something that would be a major feature for SymPy users or for internal code. I expect that case/match would make some code tidier and potentially it could make some things a little faster (although that depends on it being well optimised - half a microsecond might seem small until you add up millions of them). There is a recently opened SymPy issue discussing the possible use of this: https://github.com/sympy/sympy/issues/21193 Pattern matching and destructuring more generally are significant features for symbolic libraries such as SymPy which has much code for doing this and can also be used with other dedicated libraries such as matchpy. Much more is needed than case/match for that though: rewriting, substitution, associative/commutative matching etc. It's not clear to me that core Python could ever provide anything new that would lead to a groundbreaking improvement for SymPy in this respect. The surrounding discussion of the various pattern matching PEPs has led me to think of the idea of destructuring as more of a general language feature that might not in future be limited to case/match though. I'm not sure where that could go for Python but I'm interested to see if anything more comes of it. I like a lot of the features in PEP 634 and the way I see it this PEP (653) underpins those. The reason I support PEP 653 is because it seems like a more principled approach to the mechanism for how pattern-matching should work that places both user-defined types and builtin types on an even footing. 
The precise mechanisms (match_class, match_self etc.) and their meanings do seem strange but that's because they are trying to codify the different cases that PEP 634 has introduced. It's possible that the design of that mechanism can be improved and there have been suggestions for that in this thread. I do think though that it is important to have a general extensible mechanism rather than a specification based on special cases. I also think that the use of the Sequence and Mapping ABCs is a bad idea on practical grounds (performance, circularity in the implementation) and is not in keeping with the rest of the language. ABCs have always been optional in the past: Python uses protocols rather than ABCs (duck typing etc.). Finally, speaking as someone who also teaches introductory programming with Python, with *that* hat on I would have preferred it if none of the pattern-matching PEPs had been accepted. The advantage of Python in having a simple and easily understood core erodes with each new addition to core syntax. For novice users case/match only really offers increased complexity compared to if/elif but it will still be something else that needs to be learned before being able to read existing code. Oscar

Hi Mark, I also wanted to give some feedback on this. While most of the discussion so far has been about the matching of the pattern itself I think it should also be considered what happens in the block below. Consider this code: ``` m = ... match m: case [a, b, c] as l: # what can we safely do with l? ``` or in terms of the type system: What is the most specific type that we can know l to be? With PEP 634 you can be sure that l is a sequence and that its length is 3. With PEP 653 this is currently not explicitly defined. Judging from the pseudo code we can only assume that l is an iterable (because we use it in an unpacking assignment) and that its length is 3, which greatly reduces the operations that can be safely done on l. For mapping matches with PEP 634 we can assume that l is a mapping. With PEP 653 all we can assume is that it has a .get method that takes two parameters, which is even more restrictive, as we can't even be sure if we can use len(), .keys, ... or iterate over it. This also makes it a lot harder for static type checkers to check match statements, because instead of checking against an existing type they now have to hard-code all the guarantees made by the match statement or not narrow the type at all. Additionally consider this typed example: ``` m: Mapping[str, int] = ... match m: case {'version': v}: pass ``` With PEP 634 we can statically check that v is an int. With PEP 653 there is no such guarantee. Therefore I would strongly be in favor of having sequence and mapping patterns only match certain types instead of relying on dunder attributes. If implementing all of sequence is really too much work just to be matched by a sequence pattern, as PEP 653 claims, then maybe a more general type could be chosen instead. I don't have any objections against the other parts of the PEP. Adrian Freund On 3/27/21 2:37 PM, Mark Shannon wrote:

On Sat, 27 Mar 2021 at 13:40, Mark Shannon <mark@hotpy.org> wrote:
Hi Mark, Thanks for putting this together.
It would take me some time to compare exactly how this differs from the current state after PEP 634 but I certainly prefer the object-model based approach. It does seem that there are a lot of permutations of how matching works but I guess that's just trying to tie up all the different cases introduced in PEP 634.
Maybe I misunderstood but it looks to me as if this (PEP 653) changes the behaviour of a mapping pattern in relation to extra keys. In PEP 634 extra keys in the target are ignored e.g.: obj = {'a': 1, 'b': 2} match(obj): case {'a': 1}: # matches obj because key 'b' is ignored In PEP 634 the use of **rest is optional if it is desired to catch the other keys but does not affect matching. Here in PEP 653 there is the pseudocode: # A pattern not including a double-star pattern: if $kind & MATCH_MAPPING == 0: FAIL if $value.keys() != $KEYWORD_PATTERNS.keys(): FAIL My reading of that is that all keys would need to match unless **rest is used to absorb the others. Is that an intended difference? Personally I prefer extra keys not to be ignored by default so to me that seems an improvement. If intentional then it should be listed as another semantic difference though.
E.g. With PEP 653, pattern matching will work in the collections.abc module. With PEP 634 it does not.
As I understood it this proposes that match obj: should use the class attribute type(obj).__match_kind__ to indicate whether the object being matched should be considered a sequence or a mapping or something else rather than using isinstance(obj, Sequence) and isinstance(obj, Mapping). Is there a corner case here where an object can be both a Sequence and a Mapping? (How does PEP 634 handle that?) Not using the Sequence and Mapping ABCs is good IMO. I'm not aware of other core language features requiring the use of ABCs. In SymPy we have specifically avoided them because they slow down isinstance checking (this is measurable in the time taken to run the whole test suite). Using the ABCs in PEP 634 seems surprising given that the original pattern matching PEP actually listed the performance impact of isinstance checks as part of the opening motivation. Maybe the ABCs can be made faster but either way using them like this seems not in keeping with the rest of the language. Oscar

Hi Oscar, Thanks for the feedback. On 27/03/2021 4:19 pm, Oscar Benjamin wrote:
I missed that when updating the PEP, thanks for pointing it out. It should be the same as for double-star pattern: if not $value.keys() >= $KEYWORD_PATTERNS.keys(): FAIL I'll update the PEP.
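For readers following along, the corrected check has the same effect as an ordinary superset comparison on keys views, so extra keys in the subject are ignored, matching PEP 634's behavior. A minimal sketch (the function name is illustrative, not from either PEP):

```python
def mapping_pattern_matches(subject, pattern_keys):
    """Sketch: a mapping pattern matches if the subject has at least
    the keys the pattern mentions; extra keys are ignored."""
    # dict keys views support set comparisons: >= means "superset of".
    return subject.keys() >= set(pattern_keys)

obj = {'a': 1, 'b': 2}
print(mapping_pattern_matches(obj, {'a'}))       # extra key 'b' is ignored
print(mapping_pattern_matches(obj, {'a', 'c'}))  # 'c' missing, so no match
```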
I don't have a strong enough opinion either way. I can see advantages to both ways of doing it.
If you define a class as a subclass of both collections.abc.Sequence and collections.abc.Mapping, then PEP 634 will treat it as both sequence and mapping, meaning it has to try every pattern. That prevents the important (IMO) optimization of checking the kind only once. Cheers, Mark.
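The situation Mark describes is easy to reproduce with the ABCs. A sketch of such a (deliberately ill-conceived) class, which PEP 634's isinstance-based dispatch would classify as both kinds, so no single kind check can decide which patterns apply:

```python
from collections.abc import Sequence, Mapping

class Both(Sequence, Mapping):
    """Deliberately ill-conceived: registers as sequence *and* mapping."""
    def __init__(self, data):
        self._data = data
    def __getitem__(self, key):
        return self._data[key]
    def __len__(self):
        return len(self._data)
    def __iter__(self):
        return iter(self._data)

b = Both({'a': 1})
# PEP 634's dispatch says yes to both checks, so it must try both
# kinds of pattern against `b`.
print(isinstance(b, Sequence), isinstance(b, Mapping))
```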

Hi Mark, Reading that spec will take some time. Can you please summarize the differences in English, in a way that is about as precise as PEP 634? I have some comments inline below as well. On Sat, Mar 27, 2021 at 10:16 AM Mark Shannon <mark@hotpy.org> wrote:
It would be simpler if this was simply an informational PEP without proposing new features -- then we wouldn't have to rush. You could then propose the new __match_kind__ attribute in a separate PEP, written more in the style of PEP 634, without pseudo code. I find it difficult to wrap my head around the semantics of __match_kind__ because it really represents a few independent flags (with some constraints) but all the text is written using explicit, hard-to-read bitwise and/or operations. Let me give it a try. - Let's call the four flag bits by short names: SEQUENCE, MAPPING, DEFAULT, SELF. SEQUENCE and MAPPING are for use when an instance of a class appears in the subject position (i.e., for `match x`, we look for these bits in `type(x).__match_kind__`). Neither of these is set by default. At most one of them should be set. - If SEQUENCE is set, the subject is treated like a sequence (this is set for list, tuple and other sequences, but not for str, bytes and bytearray). - Similarly, MAPPING means the subject should be treated as a mapping, and is set for dict and other mapping types. The DEFAULT and SELF flags are for use when a class is used in a class pattern (i.e., for `case cls(...)` we look for these bits in `cls.__match_kind__`). At most one of these should be set. DEFAULT is set on class `object` and anything that doesn't explicitly clear it. - If DEFAULT is set, semantics of PEP 634 apply except for the special behavior enabled by the SELF flag. - If SELF is set, `case cls(x)` binds the subject to x, and no other forms of `case cls(...)` are allowed. - If neither DEFAULT nor SELF is set, `case cls(...)` does not take arguments at all. Please correct any misunderstandings I expressed here! (And please include some kind of summary like this in your PEP.) 
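To make the bitwise formulation concrete, here is a sketch using illustrative flag values (the real constant names and numeric values are defined by PEP 653 and may differ):

```python
# Illustrative flag values; the actual constants are defined by PEP 653.
MATCH_SEQUENCE = 1  # subject matches sequence patterns
MATCH_MAPPING = 2   # subject matches mapping patterns
MATCH_DEFAULT = 4   # class patterns get PEP 634 default semantics
MATCH_SELF = 8      # `case cls(x)` binds the subject itself to x

class MySeq:
    # A sequence-like class using default class-pattern semantics.
    __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT

kind = type(MySeq()).__match_kind__
is_sequence = bool(kind & MATCH_SEQUENCE)
is_self_match = bool(kind & MATCH_SELF)
print(is_sequence, is_self_match)
```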
Also, I think that we should probably separate this out in two separate flag sets, one for subjects and one for class patterns -- it is pretty confusing to merge the flag sets into a single value when their applicability (subject or class pattern) is so different.
Let's not change this. We carefully discussed and chose this behavior (ignore extra mapping keys, but don't ignore extra sequence items) for PEP 634 based on usability.
Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.)
I am fine with changing this one aspect of PEP 634. IIRC having separate SEQUENCE and MAPPING flags just for matching didn't occur to us during the design, and we strongly preferred some kind of type-based check over checking the presence of a specific attribute like `key`. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Hi Guido, Thanks for the feedback. On 27/03/2021 10:15 pm, Guido van Rossum wrote:
It is as close to that as I can get. The change to using __match_kind__ requires some small changes to behaviour.
`case cls():` is always allowed, regardless of flags.
I think you expressed it well. I'll add a more informal overview section to the PEP.
That would require two different special attributes, which adds bulk without adding any value. __match_kind__ = MATCH_SEQUENCE | MATCH_DEFAULT should be clear to anyone familiar with integer flags.

On Mon, 29 Mar 2021, 7:47 pm Mark Shannon, <mark@hotpy.org> wrote: [Guido wrote]
The combined flags might be clearer if the class matching flags were "MATCH_CLS_DEFAULT" and "MATCH_CLS_SELF" Without that, it isn't obvious that they're modifying the way class matching works. Alternatively, given Guido's suggestion of two attributes, they could be "__match_container__" and "__match_class__". The value of splitting them is that they should compose better under inheritance - the container ABCs could set "__match_container__" appropriately without affecting the way "__match_class__" is set. An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API. Cheers, Nick.

On Mon, Mar 29, 2021 at 7:35 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
+1

An implementation might flatten them out at class definition time for optimisation reasons, but it wouldn't need to be part of the public API.
Since the two flag sets are independent the bulk is only apparent. Few classes would need to set one of these, let alone two. In the C layer they may be combined as part of tp_flags (assuming there are enough free bits). --Guido van Rossum (python.org/~guido)

Overall, I am still uncomfortable with PEP 653, and would probably not support its acceptance. Although it has thankfully become a much less radical proposal than it was a few weeks ago (thanks, Mark, for your attention to our feedback), I feel that the rules it binds implementations to are *very* premature, and that the new mechanisms it introduces to do so only modestly improve potential performance at great expense to the ease of learning, using, and maintaining code using structural pattern matching. A few notes follow:
Maybe I'm missing something, but I don't understand at all how the provided code snippet relies on the self-matching behavior. Have the maintainers of SymPy (or any large library supposedly benefitting here) come out in support of the PEP? Are they at least aware of it? Have they indicated that the proposed idiom for implementing self-matching behavior using a property is truly too "tricky" for them? Have you identified any stdlib classes that would benefit greatly from this? For me, `__match_class__` feels like a feature without demonstrated need. Even if there is a great demand for this, I certainly think that there are far better options than the proposed flagging system: - A `@match_self` class decorator (someone's bound to put one on PyPI, at any rate). - Allowing `__match_args__ = None` to signal this case (an option we previously considered, and my personal preference). ...both of which can be added later, if needed. Further, PEP 634 makes it very easy for libraries to support Python versions with *and* without pattern matching (something I consider to be an important requirement). The following class works with both 3.9 and 3.10: ``` class C(collections.abc.Sequence): ... ``` While something like this is required for PEP 653: ``` class C: if sys.version_info >= (3, 10): from somewhere import MATCH_SEQUENCE __match_container__ = MATCH_SEQUENCE ... ```
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose? And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries. At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
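For anyone wanting to reproduce the comparison being argued over here, a rough sketch of such a micro-benchmark (absolute numbers are machine-dependent, and this ignores the per-subject caching both PEPs allow; the `__match_kind__` attribute is a stand-in for PEP 653's mechanism):

```python
import timeit
from collections.abc import Sequence

class MySeq(Sequence):
    def __getitem__(self, i):
        return i
    def __len__(self):
        return 0

obj = MySeq()
MySeq.__match_kind__ = 1  # stand-in for the PEP 653 special attribute

# PEP 634 style: subclass check against an ABC.
t_abc = timeit.timeit(lambda: isinstance(obj, Sequence), number=100_000)
# PEP 653 style: read a special attribute from the type.
t_attr = timeit.timeit(lambda: type(obj).__match_kind__, number=100_000)

print(f"isinstance: {t_abc:.4f}s  attribute: {t_attr:.4f}s")
```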
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;) A couple of other, more technical notes: - PEP 653 requires mappings to have a `keys()` method that returns an object supporting set inequality operations. It is not really that common to find this sort of support in user code (in my experience, it is more likely that user-defined `keys()` methods will return iterables). It's not even clear to me if this is an interface requirement for mappings in general. For example, `weakref.WeakKeyDictionary` and `weakref.WeakValueDictionary` presently do not work with PEP 653's requirements for mapping patterns, since their `keys()` methods return iterators. - Treating `__getitem__` as pure is problematic for some common classes (such as `defaultdict`). That's why we use two-argument `get()` instead. As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code). Brandt
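The `defaultdict` point is easy to demonstrate: subscripting is not a pure lookup there, while two-argument `get()` is side-effect free:

```python
from collections import defaultdict

d = defaultdict(int)
_ = d['missing']          # __getitem__ inserts a default: the subject mutates
mutated = 'missing' in d  # now True

d2 = defaultdict(int)
sentinel = object()
value = d2.get('missing', sentinel)  # two-argument get() has no side effect
untouched = 'missing' not in d2      # still True
print(mutated, untouched, value is sentinel)
```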

Hi Brandt, On 30/03/2021 5:25 pm, Brandt Bucher wrote:
No, I'm the one missing something. That should read: case Mul(args=[Symbol(a), Symbol(b)]) if a == b: ... I'll fix that in the PEP, thanks.
The distinction between those classes that have the default behavior and those that match "self" is from PEP 634. I didn't introduce it. I'm just proposing a more principled way to make that distinction.
Or class C: __match_container__ = 1 # MATCH_SEQUENCE Which is one reason the PEP states that the values of MATCH_SEQUENCE, etc. will never change.
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or not be fit for this purpose?
No reasonable scenario, but unreasonable scenarios happen all too often.
And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
It's misleading because a match statement doesn't include any explicit imports.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. And indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries.
Half a microsecond is thousands of instructions on a modern CPU. That is a long time for a single VM operation.
At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
Why do you say it is a premature optimization? Its primary purpose is reliability and precise semantics. It is more optimizable, I agree, but that is hardly premature. You also say it is confusing, but I think it is simpler than the workarounds to match "self" that you propose. This is very subjective though. Evidently we think differently.
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;)
Indeed, but there is only undefined behavior if a class violates clearly specified rules. The undefined behavior in PEP 634 is much broader. We already tolerate some amount of undefined behavior. For example, dictionary lookup is also undefined for classes which do not hash properly.
Thanks for pointing that out. I'd noted that PEP 634 used `get()`, which is why I inserted the guard on keys beforehand. Clearly that is insufficient. I'll update the semantics to use the two-argument `get()`. It does seem more robust.
As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code).
I fully understand that a lot of work went into PEP 634. Which is why I am keeping the syntax and as much of the semantics of PEP 634 as I can. The test suite is proving very useful, thanks. The problem with waiting for 3.11 is that code will start to rely on some of the implementation details of pattern matching as it is now, and that our ability to optimize it will be delayed by a year. Cheers, Mark.

Hi Mark. I've spoken with Guido, and we are willing to propose the following amendments to PEP 634: - Require `__match_args__` to be a tuple. - Add new `__match_seq__` and `__match_map__` special attributes, corresponding to new public `Py_TPFLAGS_MATCH_SEQ` and `Py_TPFLAGS_MATCH_MAP` flags for use in `tp_flags`. When Python classes are defined with one or both of these attributes set to a boolean value, `type.__new__` will update the flags on the type to reflect the change (using a similar mechanism as `__slots__` definitions). They will be inherited otherwise. For convenience, `collections.abc.Sequence` will define `__match_seq__ = True`, and `collections.abc.Mapping` will define `__match_map__ = True`. Using this in Python would look like: ``` class MySeq: __match_seq__ = True ... class MyMap: __match_map__ = True ... ``` Using this in C would look like: ``` PyTypeObject PyMySeq_Type = { ... .tp_flags = Py_TPFLAGS_MATCH_SEQ | ..., ... } PyTypeObject PyMyMap_Type = { ... .tp_flags = Py_TPFLAGS_MATCH_MAP | ..., ... } ``` We believe that these changes will result in the best possible outcome: - The new mechanism should be faster than either PEP. - The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*. If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so. Thanks for pushing us to make this better. Brandt

Hi Brandt, On 30/03/2021 11:49 pm, Brandt Bucher wrote:
I think we're all in agreement on this one. Let's just do it.
I don't like the way this needs special inheritance rules, where inheriting one attribute mutates the value of another. It seems convoluted. Consider: class WhatIsIt(MySeq, MyMap): pass With __match_container__ it works as expected with no special inheritance rules. This was why you convinced me to split __match_kind__; it works better with inheritance. Another reason for preferring __match_container__ is that it provides a better option for extensibility, IMO. Suppose we wanted to add a "set" pattern in the future: with __match_container__ we just need to add a new constant. With your proposed approach, we would need another special attribute.
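Mark's point can be seen with plain attribute lookup: under ordinary MRO rules the combined class simply gets the first base's value, with no special inheritance machinery (the constants here are illustrative; the real values are fixed by PEP 653):

```python
# Illustrative constants; the real values are defined by PEP 653.
MATCH_SEQUENCE = 1
MATCH_MAPPING = 2

class MySeq:
    __match_container__ = MATCH_SEQUENCE

class MyMap:
    __match_container__ = MATCH_MAPPING

class WhatIsIt(MySeq, MyMap):
    pass

# Ordinary MRO lookup finds MySeq's value first, so WhatIsIt is
# unambiguously a sequence for matching purposes -- no special rules.
print(WhatIsIt.__match_container__ == MATCH_SEQUENCE)
```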
I'm wary of using up tp_flags, as they are a precious resource, but this does provide a more declarative way to specify the behavior than setting the attribute via the C-API.
We believe that these changes will result in the best possible outcome: - The new mechanism should be faster than either PEP.
The naive implementation of the boolean version might be a tiny bit faster (it would be hard to measure a difference). However, once specialized by type version (as we do for LOAD_ATTR) both forms become a no-op.
- The new mechanism should provide a better user experience than either PEP when defining types in either Python *or C*.
The inheritance rules make __match_container__ a better user experience in Python, IMO. As for C, there is no reason why it would make any difference; __match_container__ could be (tp_flags & (Py_TPFLAGS_MATCH_SEQ|Py_TPFLAGS_MATCH_MAP)) shifted to line up the bits.
If these amendments were made, would you be comfortable withdrawing PEP 653? We think that if we're in agreement here, a compromise incorporating these promising changes into the current design would be preferable to submitting yet another large pattern matching PEP for a very busy SC to review and pronounce before the feature freeze. I am also willing, able, and eager to implement these changes promptly (perhaps even before the next alpha) if so.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
Thanks for pushing us to make this better.
And thank you for the feedback. Cheers, Mark.

On Wed, Mar 31, 2021 at 2:30 AM Mark Shannon <mark@hotpy.org> wrote:
Wait a minute, do you expect WhatIsIt to be a sequence but not a map? *I* would expect that it is both, and that's exactly what Brandt's proposal does. So I see this as a plus.
I think we are close to agreement on the mechanism for selecting which pattern to match, but I still want the better defined semantics of PEP 653.
I don't know that PEP 653's semantics are better. Have you analyzed any *differences* besides the proposal above? I've personally found reading your pseudo-code very difficult, so I simply don't know. --Guido van Rossum (python.org/~guido)

Hi Guido, On 31/03/2021 6:21 pm, Guido van Rossum wrote:
Earlier you said: Classes that are both mappings and sequences are ill-conceived. Let's not compromise semantics or optimizability to support these. (IOW I agree with Mark here.) PEP 653 requires that: (__match_container__ & (MATCH_SEQUENCE | MATCH_MAPPING)) != (MATCH_SEQUENCE | MATCH_MAPPING) Would you require that (__match_seq__ and __match_map__) is always false? If so, then what is the mechanism for handling the `WhatIsIt` class? If not, then you lose the ability to make a single test to determine which patterns can apply.
PEP 653 semantics are more precise. I think that is better :) Apart from that, I think the semantics are so similar once you've added __match_seq__/__match_map__ to PEP 634 that it is hard to claim one is better than the other. My (unfinished) implementation of PEP 653 makes almost no changes to the test suite. The code in the examples is Python, not pseudo-code. That might be easier to follow. Cheers, Mark.

On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org> wrote:
[me, responding to Mark]
[Now back to Mark]
Ah, you caught me there. I do think that classes that combine both characteristics are in troublesome waters. I think we can get optimizability either way, so I'll focus on semantics. Brandt has demonstrated that it's ugly to write the code for a class that in match statements behaves as either a sequence or a mapping (but not both) while at the same time keeping the code compatible with Python 3.9 or before. I also think that using flag attributes that are set to True or False (instead of using a bitmap of flags, which is obscure to many Python users) solves this problem nicely. Using separate flag attributes happens to lead to different semantics than the flags-bitmap approach in the case of multiple inheritance. Given that one *can* inherit from both Sequence and Mapping, having separate flags seems slightly better than the flags-bitmap approach. It wasn't enough to convince me earlier, but the other advantage does convince me: separate flag attributes are better than using a flags-bitmap. Now, if it weren't for other issues, having no flags at all here but just signalling the applicable pattern kinds through inheritance from collections.abc.{Sequence,Mapping} would be even cleaner. But we do have other issues: (a) the exceptions for str, bytes, bytearray, and (b) the clumsiness of importing collections.abc (which is Python code) deep in the ceval main loop. So some explicit form of signalling this is fine -- and classes that explicitly inherit from Sequence or Mapping will get it for free that way.
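The composition behavior Guido describes falls out of ordinary, independent attribute lookups. A sketch using the boolean attribute names from Brandt's proposal earlier in the thread:

```python
class MySeq:
    __match_seq__ = True

class MyMap:
    __match_map__ = True

class WhatIsIt(MySeq, MyMap):
    pass

# Each flag is inherited independently along the MRO, so the combined
# class ends up flagged as both a sequence and a mapping.
print(WhatIsIt.__match_seq__, WhatIsIt.__match_map__)
```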
Nope. If so, then what is the mechanism for handling the `WhatIsIt` class?
If not, then you lose the ability to make a single test to determine which patterns can apply.
Translating the flag attributes to bits in tp_flags (or in a new flags variable elsewhere in the type object) would still allow a pretty fast test. And needing to support overlapping subsets of the cases is not unique to this situation, after all a class may well be a sequence *and* have attributes named x, y and z.
I wish I knew of a single instance where PEP 634 and PEP 653 actually differ.
I'd like to see where those differences are -- then we can talk about which is better. :-)
The code in the examples is Python, not pseudo-code. That might be easier to follow.

Hi Guido, On 31/03/2021 9:53 pm, Guido van Rossum wrote:
On Wed, Mar 31, 2021 at 12:08 PM Mark Shannon <mark@hotpy.org <mailto:mark@hotpy.org>> wrote:
[snip]
Almost all the changes come from requiring __match_args__ to be a tuple of unique strings. The only other change is that case int(real=0+0j, imag=0-0j): fails to match 0, because `int` is `MATCH_SELF` so won't match attributes. https://github.com/python/cpython/compare/master...markshannon:pep-653-imple... Cheers, Mark.

On Thu, Apr 1, 2021 at 2:18 PM Mark Shannon <mark@hotpy.org> wrote:
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work, and I don't see anything semantically suspect in allowing that.
Oh, but that would be a problem. The intention wasn't that "self" mode prevents keyword/attribute matches. (FWIW the real and imag attributes should not be complex numbers, so that test case is weird, but it should work.)
https://github.com/python/cpython/compare/master...markshannon:pep-653-imple...

On 4/1/2021 9:38 PM, Guido van Rossum wrote:
The current posted PEP does not say 'unique' and I agree with Guido that it should not.
Ah, *unique* strings. Not sure I care about that. Explicitly checking for that seems extra work,
The current near-Python code does not have such a check.
and I don't see anything semantically suspect in allowing that.
If I understand the current pseudocode correctly, the effect of 's' appearing twice in 'C.__match_args__' would be to possibly look up and assign C.s to two different names in a case pattern. I would not be surprised if someone someday tries to do this intentionally. Except for the repeated lookup, it would be similar to a = b = C.s. This might make sense if C.s is mutable. Or the repeated lookups could yield different values. -- Terry Jan Reedy
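Terry's scenario can be simulated directly. A sketch of what positional capture against a duplicated `__match_args__` entry would roughly do (the class is hypothetical, built so the two lookups observe different values):

```python
class C:
    __match_args__ = ('s', 's')  # the same name listed twice

    def __init__(self):
        self._n = 0

    @property
    def s(self):
        # A property whose repeated lookups yield different values.
        self._n += 1
        return self._n

# Roughly what positional capture for `case C(a, b)` would do:
obj = C()
bindings = [getattr(obj, name) for name in type(obj).__match_args__]
print(bindings)  # the two lookups observed different values
```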

On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu> wrote:
(Of course, "the current PEP" is highly ambiguous in this context.) Well, now I have egg on my face, because the current implementation does reject multiple occurrences of the same identifier in __match_args__. We generate an error like "TypeError: C() got multiple sub-patterns for attribute 'a'". However, I cannot find this uniqueness requirement in PEP 634, so I think it was a mistake to implement it. Researching this led me to find another issue where PEP 634 and the implementation differ, but this time it's the other way around: PEP 634 says about types which accept a single positional subpattern (int(x), str(x) etc.) "for these types no keyword patterns are accepted." Mark's example `case int(real=0, imag=0):` makes me think this requirement is wrong and I would like to amend PEP 634 to strike this requirement. Fortunately, this is not what is implemented. E.g. `case int(1, real=1):` is accepted and works, as does `case int(real=0):`. Calling out Brandt to get his opinion. And thanks to Mark for finding these!
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, and this could even be a valid backwards compatibility measure, if a class used to have two different attributes that would in practice never differ, the two attributes could be merged into one, and someone might have a pattern capturing both, positionally. That should keep working, and having a duplicate in __match_args__ seems a clean enough solution. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 4/2/2021 12:02 AM, Guido van Rossum wrote:
On Thu, Apr 1, 2021 at 8:01 PM Terry Reedy <tjreedy@udel.edu
The current near-Python code does not have such a check.
Again, I'm not sure what "the current near-Python code" refers to. From context it seems you are referring to the pseudo code in Mark's PEP 653.
Yes, the part I read was legal Python + $ variables + FAIL. I should have included 'pseudo'.

Guido van Rossum wrote:
The current implementation will reject any attribute being looked up more than once, by position *or* keyword. It's actually a bit tricky to do, which is why the `MATCH_CLASS` op is such a beast... it needs to look up positional and keyword attributes all in one go, keeping track of everything it's seen and checking for duplicates. I believe this behavior is a holdover from PEP 622:
The interpreter will check that two match items are not targeting the same attribute, for example `Point2d(1, 2, y=3)` is an error.
(https://www.python.org/dev/peps/pep-0622/#overlapping-sub-patterns) PEP 634 explicitly disallows duplicate keywords, but as far as I can tell it says nothing about duplicate `__match_args__` or keywords that also appear in `__match_args__`. It looks like an accidental omission during the 622 -> 634 rewrite. (I guess I figured that if somebody matches `Spam(foo, y=bar)`, where `Spam.__match_args__` is `("y",)`, that's probably a bug in the user's code. Ditto for `Spam(y=foo, y=bar)` and `Spam(foo, bar)` where `Spam.__match_args__` is `("y", "y")`. But it's not a hill I'm willing to die on.) I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.

Hi Brandt, On 02/04/2021 7:19 am, Brandt Bucher wrote:
Repeated keywords do seem likely to be a bug. Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time. So how about explicitly disallowing those, but not checking that the intersection of `__match_args__` and keywords is empty? We would get most of the error checking without the performance impact.
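Class-creation-time validation of the kind Mark describes can be sketched in pure Python with a metaclass (this is illustrative only; the real check would live in the interpreter):

```python
class MatchArgsChecked(type):
    """Validate __match_args__ once, when the class is created."""
    def __new__(mcls, name, bases, ns):
        args = ns.get("__match_args__", ())
        if not isinstance(args, tuple):
            raise TypeError(f"{name}.__match_args__ must be a tuple")
        if len(set(args)) != len(args):
            raise TypeError(f"{name}.__match_args__ contains duplicate names")
        return super().__new__(mcls, name, bases, ns)

class Point(metaclass=MatchArgsChecked):
    __match_args__ = ("x", "y")  # accepted

rejected_at_creation = False
try:
    class Bad(metaclass=MatchArgsChecked):
        __match_args__ = ("x", "x")  # duplicate: rejected up front
except TypeError:
    rejected_at_creation = True
```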
I agree that self-matching classes should absolutely allow keyword matches. I had no idea the PEP forbade it.
PEP 634 allows it. PEP 653 currently forbids it, mainly for consistency reasons. The purpose of self-matching is to prevent deconstruction, so it seems inconsistent to allow it for keyword arguments. Are there any use-cases? The test-case `int(real=0+0j, imag=0-0j)` is contrived, but I'm struggling to come up with less contrived examples for any of float, list, dict, tuple, str. Cheers, Mark.

On Fri, Apr 2, 2021 at 3:38 AM Mark Shannon <mark@hotpy.org> wrote:
Agreed. But as I sketched in a previous email I think duplicates ought to be acceptable in __match_args__. At the very least we should align the PEP and the implementation here, by adjusting one or the other. Most checks are cheap though.
Checking for duplicates in `__match_args__` can be done at class creation time,
Hm, what about dynamic updates to __match_args__? I've done that in the REPL.
and checking for duplicates in the pattern can be done at compile time.
I'd prefer not to do that check at all.
+1 on the latter (not checking the intersection).
The purpose of self-matching is user convenience. It should be seen as a shorthand for the code fragment in PEP 634 showing how to do this for any class.
There could be a subclass that adds an attribute. That's still contrived though. But if we start supporting this for *general* classes we should allow combining it with keywords/attributes. -- --Guido van Rossum (python.org/~guido)

Mark Shannon wrote:
PEP 634 says:
For a number of built-in types (specified below), a single positional subpattern is accepted which will match the entire subject; for these types no keyword patterns are accepted.
(https://www.python.org/dev/peps/pep-0634/#class-patterns)
Most checks are cheap though. Checking for duplicates in `__match_args__` can be done at class creation time, and checking for duplicates in the pattern can be done at compile time.
I assume the compile-time check only works for named keyword attributes. The current implementation already does this. -1 on checking `__match_args__` anywhere other than the match block itself. Guido van Rossum wrote:
I could see the case for something like `case defaultdict({"Spam": s}, default_factory=f)`. I certainly don't think it should be forbidden.

On Fri, Apr 2, 2021 at 12:43 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
But that's not what the implementation does. It still supports keyword patterns for these types -- and (as I've said earlier in this thread) I think the implementation is correct.
Agreed. -- --Guido van Rossum (python.org/~guido)

Hi Guido, On 02/04/2021 10:05 pm, Guido van Rossum wrote:
Why? (I also asked Brandt this) It is far more efficient to check `__match_args__` at class creation (or class attribute assignment) time. The most efficient way to check in the match block is to check at class creation time anyway and store a flag indicating whether `__match_args__` is legal. In the match block we would check this flag, then proceed. It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported. Cheers, Mark.

On Sat, Apr 3, 2021 at 4:20 AM Mark Shannon <mark@hotpy.org> wrote:
Okay, now we're talking. If you check it on both class definition and at attribute assignment time I think that's fine (now that it's a tuple). But I don't think the specification (in whatever PEP) needs to specify that it *must* be checked at that time. So I think the current implementation is fine as well (once we change it to accept only tuples).
Yeah, nice optimization.
It seems silly to know that there will be a runtime error, but not act on that information, allowing latent bugs that could have been reported.
Well, usually that is The Python Way. There are a lot of things that could be detected statically quite easily (without building something like mypy) but that aren't. Often that's due to historical accidents (in the past we were even less able to do the simplest static checks), so it's fine to do this your way. BTW we previously discussed whether `__match_args__` can contain duplicates. I thought the PEP didn't state either way, but I was wrong: it explicitly disallows it, matching the implementation. PEP 634 says on line 503:
```
- For duplicate keywords, ``TypeError`` is raised.
```
Given that there is no inconsistency here, I am inclined to keep it that way. If we find a better use case to allow duplicates we can always loosen up the implementation; it's not so simple the other way around. FWIW I am also submitting https://github.com/python/peps/pull/1909 to make `__match_args__` a tuple only, which we all seem to agree on. -- --Guido van Rossum (python.org/~guido)

Hi Brandt, On 02/04/2021 8:41 pm, Brandt Bucher wrote:
I was relying on the "reference" implementation, which is also in the PEP.
I take this as +1 for having more precisely defined semantics for pattern matching :)
I'm curious, why? It is much faster *and* gives better error messages to check `__match_args__` at class creation time.
It is forbidden in the PEP, as written, correct? OOI, have you changed your mind, or was that an oversight in the original? Cheers, Mark.

On Sat, Apr 3, 2021 at 4:15 AM Mark Shannon <mark@hotpy.org> wrote:
But it's not normative. However...
In this case I propose adjusting the PEP text. See https://github.com/python/peps/pull/1908
I take this as +1 for having more precisely defined semantics for pattern matching :)
Certainly I see it as +1 for having the semantics independently verified. [...]
I was surprised to find this phrase in the PEP, so I suspect that it was just a mistake when I wrote that section of the PEP. I can't find a similar restriction in PEP 622 (the original pattern matching PEP). -- --Guido van Rossum (python.org/~guido)

Mark Shannon said:
I was relying on the "reference" implementation, which is also in the PEP.
Can you please stop putting scare quotes around "reference implementation"? You've done it twice now, and it's been a weekend-ruiner for me each time. I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list. For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you. The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term. Brandt

On Sun, 4 Apr 2021 at 01:37, Brandt Bucher <brandtbucher@gmail.com> wrote:
Agreed - apart from the implication Brandt noted, it's also misleading. The code is in Python 3.10, so the correct term is "the implementation" (or if you want to be picky, "the CPython implementation"). To me, the term "reference implementation" implies "for reference, not yet released". At this point, we're discussing fixes to an implemented Python 3.10 feature, not tidying up a PEP. Paul

On Sun, Apr 4, 2021 at 6:20 PM Paul Moore <p.f.moore@gmail.com> wrote:
Normally, the term "reference implementation" means "the basis implementation that everything else is compared against". For instance, a compression algorithm might be published as a mathematical document, with a reference implementation in some language. It's then possible to create a new implementation in some other language, or more optimized, or whatever else; but to know whether it's giving the correct results, you compare its output to the output of the reference implementation. CPython is the reference implementation for the Python language. It's possible to have a discrepancy between the standard and the implementation, but it's still the reference implementation (just occasionally a buggy one). In this case, I believe that the term "reference implementation" is strictly accurate, and concur with Brandt's request to not discredit it by implying that it's only purporting to be one. ChrisA

Antoine Pitrou writes:
On Sun, 04 Apr 2021 00:34:18 -0000 "Brandt Bucher" <brandtbucher@gmail.com> wrote:
Can you please stop putting scare quotes
"Scare quotes" refers to an idiom English writers use to deprecate something. In what I wrote just above, the quotation marks indicate a focus on the *string* "scare quotes"; they signal both that these are Brandt's exact words and that I'm defining those words, not using their meaning. In Mark's phrase '"reference" implementation', neither of those usages applies. It's possible that they are the deprecated "random quote emphasis" usage. Random quote emphasis is implausible here, however. I can see no reason why Mark would emphasize the modifier "reference" in this context. One of the most important remaining usages, and one that I find plausible in context, is scare quotes. These are quotation marks used to focus on the phrase in quotes, and indicate that it is somehow suspicious: inaccurate, imprecise, false, even the opposite of its dictionary meaning. In other words, if you don't have a reason to emphasize focus on the words themselves rather than their meaning, by adding (scare) quotes most likely you are turning a "reasonably polite expression" into an insult.
I'm probably missing something...
Probably so did a lot of native speakers; there are English dialects where scare quotes are rare and random quote emphasis is common. However, I assure you, many native speakers (along with a fair number of non-natives) did not. I neither know nor care what Mark's *intent* is. I'm explaining what (some) idiomatic speakers of English will read into what he writes, because it is a *common* idiom (common enough to have a name, and be mentioned in standard manuals of English style). Regards, Steve

Hi Brandt, On 04/04/2021 1:34 am, Brandt Bucher wrote:
I'm sorry for ruining your weekends. My intention, and I apologize for not making this clearer, was not to denigrate your work, but to question the implications of the term "reference". Calling something a "reference" implementation suggests that it is something that people can refer to, that is near perfectly correct and fills in the gaps in the specification. That is a high standard, and one that is very difficult to attain. It is why I use the term "implementation", and not "reference implementation" in my PEPs.
I've put months of work into writing and improving CPython's current pattern matching implementation, mostly on nights and weekends. I don't know whether it's intentional or not, but when you say things like that it instantly devalues all of my hard work in front of everyone on the list.
It definitely wasn't my intention.
For such a huge feature, I'm honestly quite amazed that this is the only issue we've found since it was merged over a month ago (and both authors have agreed that it needs to be fixed in the PEP, not the implementation). The PR introducing this behavior was reviewed by at least a half-dozen people, including you.
Indeed, I reviewed the implementation. I thought it was good enough to merge. I still think that.
The last time you said something like this, I just muted the thread. Let's please keep this respectful; we're all obviously committing a lot of our own time and energy to this, and we need to work well together for it to be successful in the long term.
Please don't take my criticisms of PEP 634 as criticisms of you or your efforts. I know it can often sound like that, but that really isn't my intent. Pattern matching is a *big* new feature, and to get it right takes a lot of discussion. Having your ideas continually battered is no fun, I know. So, I'd like to apologize again for any hurt caused. Cheers, Mark.

Mark Shannon writes:
Shoe fits, doesn't it? Both Guido and Brandt to my recall have specifically considered the possibility that the implementation is the better design, and therefore that the PEP should be changed.
That is a high standard, and one that is very difficult to attain.
That depends on context, doesn't it? In the case of a public release, yes, it's a very high standard. In the context of a feature in development, it *cannot* be that high, because even when the spec and the implementation are in *perfect* agreement, both may be changed in the light of experience or a change in requirements. Furthermore, in this instance, the implementation achieves *your* standard (Brandt, again):
both authors have agreed that it needs to be fixed in the PEP, not the implementation
You added:
It is why I use the term "implementation", and not "reference implementation" in my PEPs.
A reasonable usage. I think my more flexible, context-dependent definition is more useful. Unmodified, the word "implementation" covers everything from unrunnable pseudo-code to the high standard of a public release that is officially denoted "reference implementation". On the other hand, when Brandt says that a merge request is a "reference implementation", I interpret that to be a claim that, to his knowledge the MR is a perfect implementation of the specification, and an invitation to criticize the specification by referring to that implementation. That's a strong claim, even in my interpretation. However, I think that if the developer dares to make it, it's very useful to reviewers. As it was in this case. Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense. Any deviations from documented spec (the Language Reference) will be presumed to have to be fixed in the implementation (with due consideration for backward compatibility). "Although practicality beats purity," of course, but treating the Language Reference as authoritative is strongly preferred to keeping the implementation and modifying the Reference (at least as I understand it). Regards, Steve

On Sun, 4 Apr 2021 at 13:49, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Final note: once this is merged and publicly released, it will lose its status as reference implementation in the above, strong sense.
It *is* merged and publicly released - it's in the latest 3.10 alpha. That's really the point I was trying to make with my comment (I'm steering clear of the "scare quotes" discussion). The fact that the implementation kept getting referred to as the "reference implementation" confused me into thinking it hadn't been released yet, and that simply isn't true. Calling it "the implementation" avoids that confusion, IMO. Paul

Paul Moore writes:
It *is* merged and publicly released - it's in the latest 3.10 alpha.
Merged, yes, but in my terminology alphas, betas, and rcs aren't "public releases", they're merely "accessible to the public". (I'm happy to adopt your terminology when you're in the conversation, I'm just explaining what I meant in my previous post.)
The only thing I understand in that paragraph is "that [it hadn't been released yet] simply isn't true", which is true enough on your definition of "released". But why does "reference implementation" connote "unreleased"? That seems to be quite different from Mark's usage. I don't have an objection to your usage, I'd just like us all to converge on a set of terms so that Brandt has a compact way of saying "as far as I know, for the specification under discussion this implementation is completely accurate and folks are welcome to refer to the PEP, to the code, or to divergences as seems appropriate to them". I'm not sure if that's exactly what Brandt meant by "reference implementation", but that's how I understood it. Steve

On Tue, 6 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
*shrug* It's (in my experience) a continuum - it's not in a release yet, but it is available via an installer, with a (pre-release) version number. But I get what you're saying and don't disagree, to be honest. I see the discrepancy as mostly being because we're trying to use (imprecise) informal language to pin down precise nuances. The main point I was making is that it's merged into the CPython source code at this stage, and available for people to download and experiment with, which is something I was unclear about.
In my experience, people developing PEPs will sometimes provide what gets referred to as a "reference implementation" of the proposal, which is a PR or equivalent that people can apply and try out if they want to see how the proposal works in practice. That "reference implementation" is generally seen as part of the *proposal*, even if it then becomes the final merged code as well. Once it's released, it tends to no longer get called the *reference* implementation, as it's now just the implementation (in CPython) of the feature. PEP 1 uses this terminology, as well - "Standards Track PEPs consist of two parts, a design document and a reference implementation" and "Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and incorporated into the main source code repository, the status will be changed to "Final"". PEP 635 follows this terminology, with a "Reference implementation" section linking to the development branch for the feature. To put this back into the context of this discussion, when Mark was referring to the "reference implementation" it made me think that maybe we were talking about that development branch, and that the code for the pattern matching PEP hadn't yet been merged to the main branch, which is why we were still iterating over implementation details. And that led me to think that they'd better get the discussion resolved soon, as they risk missing the 3.10 deadline if things drag on. Which *isn't* the case, and if I'd been following things more closely I'd have known that, but avoiding the term "reference implementation" for the merged change would also have spared my confusion.
Agreed, a common understanding is the main thing here. And as I'm not an active participant in the discussion, and I now understand the situation, my views shouldn't have too much weight in deciding what the best terminology is. Paul

Hi Mark. Thanks for your reply, I really appreciate it. Mark Shannon said:
Interesting. The reason I typically include a "Reference Implementation" section in my PEPs is because they almost always start out as a copy-paste of the template in PEP 12 (which also appears in PEP 1): https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep https://www.python.org/dev/peps/pep-0012/#suggested-sections Funny enough, PEP 635 has a "Reference Implementation" section, which itself refers to the implementation as simply a "feature-complete CPython implementation": https://www.python.org/dev/peps/pep-0635/#reference-implementation (PEP 634 and PEP 636 don't mention the existence of an implementation at all, as far as I can tell.) It's not a huge deal, but we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here" (which is what I've usually used the section to communicate). Brandt

On 7/04/21 5:22 am, Brandt Bucher wrote:
we might consider updating those templates if the term "Reference Implementation" implies a higher standard than "we've put in the work to make this happen, and you can try it out here"
Maybe "prototype implementation" would be better? I think I've used that term in PEPs before. -- Greg

Greg Ewing writes:
That seems to me to correspond well to Brandt's standard as expressed above. To me, "prototype implementation" is somewhere between "proof of concept" and "reference implementation", and I welcome the additional precision. The big question is can such terms be used accurately (i.e., do various people assign similar meanings to them)? I would define them functionally as:

- proof of concept: demonstrates some of the features, especially those that were considered "difficult to implement"
- prototype implementation: implements the whole spec, so can be used by developers to prototype applications
- reference implementation: intended to be a complete and accurate implementation of the specification

By "complete and accurate" I mean that it can be used experimentally to understand what the spec means without much worry that the proponent will brush off questions with "oh, that's just not implemented yet, read the spec if you want to know how it will work when we're done." Furthermore, any divergence between spec and implementation is a bug that is actually a broken promise. (The promise implied by "reference".) Finally, as development continues there is a promise that the spec and implementation will be kept in sync (of course changes might be provisional, but even then the sync should be maintained). I don't think the Platonic ideal interpretation of "reference implementation" is very useful. Software evolves. It evolves very quickly during initial development, but it's useful to "ask the implementation" about the spec even then. That's implied by methodologies like test-driven development. There are other workflows where that's not true. My claim is that "reference implementation" can be useful to distinguish development processes where you expect the implementation to reliably reflect the spec, even in corner cases, from those where you shouldn't. And even as the software evolves.
Note that if we use this definition, then the "Reference Implementation" requirement of the PEP process becomes quite a high bar. I think we all agree on that. So I advocate, as Brandt suggested, that we revise the PEP template. In particular I think it should use Greg's term "prototype implementation". Optionally, we could make "reference implementation" available to proponents who wish to make that claim about their implementation. Steve

On Wed, 7 Apr 2021 at 06:15, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I'm OK with these terms (although I don't actually think you *will* get sufficient consensus on them to make them unambiguous) but with one proviso - once the implementation is merged into the CPython source, I think it should simply be referred to as "the implementation" and qualifiers should be unnecessary (and should be considered misleading). Paul

Hi Guido, On 02/04/2021 2:38 am, Guido van Rossum wrote:
Checking for uniqueness is almost free because __match_args__ is a tuple, and therefore immutable, so the check can be done at class creation time.
I thought matching `int(real=0+0j, imag=0-0j)` was a bit weird too. The change required to make it work is trivial, but the code seems more consistent if `int(real=0+0j, imag=0-0j)` is disallowed, which is why I went for that.

Let me clarify: these two attributes do not interact with one another; each attribute only interacts with its own flag on the type. It is perfectly possible to do:
```
class WhatIsIt:
    __match_map__ = True
    __match_seq__ = True
```
This will set both flags, and this `WhatIsIt` will match as a mapping *and* a sequence. This is allowed and works in PEP 634, but like Guido I'm not entirely opposed to making the matching behavior of such a class undefined against sequence or mapping patterns.
What *is* the expected behavior of this? Based on the current behavior of PEP 634, I would expect the `__match_container__` of each base to be or'ed, and something like this to match as both a mapping and a sequence (which PEP 653 says leads to undefined behavior). The actual behavior seems more like it will just be a sequence and not a mapping, since `__match_container__` would be inherited from `MySeq` and `MyMap` would be ignored. In the interest of precision, here is an implementation of *exactly* what I am thinking: `typeobject.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... `ceval.c`: https://github.com/python/cpython/compare/master...brandtbucher:patma-flags#... (One change from my last email: it doesn't allow `__match_map__` / `__match_seq__` to be set to `False`... only `True`. This prevents some otherwise tricky multiple-inheritance edge-cases present in both of our flagging systems that I discovered during testing. I don't think there are actual use-cases for unsetting the flags in subclasses, but we can revisit that later if needed.)

On Wed, Mar 31, 2021 at 2:14 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
That's surprising to me. Just like we can have a class that inherits from int but isn't hashable, and make that explicit by setting `__hash__ = None`, why couldn't I have a class that inherits from something else that happens to inherit from Sequence, and say "but I don't want it to match like a sequence" by adding `__match_sequence__ = False`? AFAIK all Mark's versions would support this by setting `__match_kind__ = 0`. Maybe you can show an example edge case where this would be undesirable? -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
The issue isn't when *I* set `__match_seq__ = False` or `__match_container__ = 0`. It's when *one of my parents* does it that things become difficult.
Maybe you can show an example edge case where this would be undesirable?
Good idea. I've probably been staring at this stuff for too long to figure it out myself. :) As far as I can tell, these surprising cases arise because a bit flag can only be either 0 or 1. For us, "not specified" is equivalent to 0, which can lead to ambiguity. Consider this case:
```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    pass

class Child(Parent, Seq):
    pass
```
Okay, cool. `Child` will match as a sequence, which seems correct. But what about this similar case?
```
class Seq:
    __match_seq__ = True  # or __match_container__ = MATCH_SEQUENCE

class Parent:
    __match_seq__ = False  # or __match_container__ = 0

class Child(Parent, Seq):
    pass
```
Here, `Child` will *not* match as a sequence, even though it probably should. The only workarounds I've found (like allowing `None` to mean "this is unset, don't inherit me if another parent sets this flag", ditching tp_flags entirely, or not inheriting these attributes) feel a bit extreme just to allow some users to do the moral equivalent of un-subclassing `collections.abc.Sequence`. So, my current solution (seen on the branch linked in my earlier email) is:

- Set the flag if the corresponding magic attribute is set to True in the class definition
- Raise at class definition time if it's set to anything other than True
- Otherwise, set the flag if any of the parents have the flag set

As far as I can tell, this leads to the expected (and current, as of 3.10.0a6) behavior in all cases. Plus, it doesn't break my mental model of how inheritance works.
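Brandt's three-step rule can be modeled in a few lines of ordinary Python (`resolve_flag` and its arguments are invented for illustration; they are not part of either PEP):

```python
def resolve_flag(own, base_flags):
    """Model of the proposed rule for one flag (e.g. __match_seq__):
    honor an explicit True in the class body, reject any other explicit
    value, and otherwise inherit True if any base has the flag set."""
    if own is True:
        return True
    if own is not None:
        raise TypeError("flag may only be set to True")
    return any(base_flags)

seq_flag = resolve_flag(True, [])                          # class Seq
parent_flag = resolve_flag(None, [])                       # class Parent
child_flag = resolve_flag(None, [parent_flag, seq_flag])   # class Child(Parent, Seq)
```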

Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
- Raise at class definition time if it's set to anything other than True
I feel like this is a consenting adults thing. Yeah you probably won't need to set a flag to `False` but I don't see why it should be forbidden. On Wed, Mar 31, 2021 at 3:35 PM Brandt Bucher <brandtbucher@gmail.com> wrote:

On Thu, Apr 1, 2021 at 11:54 AM Caleb Donovick <donovick@cs.stanford.edu> wrote:
Here, `Child` will *not* match as a sequence, even though it probably should,
Strong disagree, if I explicitly set `__match_seq__` to `False` in `Parent` I probably have a good reason for it and would absolutely expect `Child` to not match as a sequence.
How much difference is there between:
```
class Grandparent:
    """Not a sequence"""

class Parent(Grandparent):
    """Also not a sequence"""

class Child(Parent):
    """No sequences here"""
```
and this:
```
class Grandparent(list):
    """Is a sequence"""

class Parent(Grandparent):
    """Explicitly not a sequence"""
    __match_seq__ = False

class Child(Parent):
    """Shouldn't be a sequence"""
```
? Either way, Parent should function as a non-sequence. But if Child inherits from both Parent and tuple, it is most definitely a tuple, and therefore should be a sequence. With your proposed semantics, setting __match_seq__ to False is not simply saying "this isn't a sequence", but it's saying "prevent this from being a sequence". It's a stronger statement than simply undoing the declaration that it's a sequence. There would be no way to reset to the default state. Brandt's proposed semantics sound complicated, but as far as I can tell, they give sane results in all cases. ChrisA

How is this different from anything else that is inherited? The setting of a flag to `False` is not some irreversible process which permanently blocks child classes from setting that flag to `True`. If I want to give priority to `Seq` over `Parent` in Brandt's original example I need only switch the order of inheritance so that `Seq` is earlier in `Child`'s MRO, or explicitly set the flag to `True` (or `Seq.__match_seq__`). In contrast, Brandt's scheme does irreversibly set flags; there is no way to undo the setting of `__match_seq__` in a parent class. This really doesn't seem like an issue to me. I can't personally think of a use case for explicitly setting a flag to `False` but I also don't see why it should be forbidden. We get "- Otherwise, set the flag if any of the parents have the flag set" for free through normal MRO rules except in the case where there is an explicit `False` (which I assume will be exceedingly rare and if it isn't there is clearly some use case). Why make it more complicated? On Wed, Mar 31, 2021 at 6:05 PM Chris Angelico <rosuav@gmail.com> wrote:

On 31/03/2021 11:31 pm, Brandt Bucher wrote:
This is just a weird case, so I don't think we should worry about it too much.
Inheritance in Python is based on the MRO (using the C3 linearization algorithm), so my mental model is that Child.__match_container__ == 0. Welcome to the wonderful world of multiple inheritance :)

If Parent.__match_container__ == 0 (rather than just inheriting it), then it is explicitly stating that it is *not* a container. Seq explicitly states that it *is* a sequence. So Child is just broken. That it is broken for pattern matching is consistent with it being broken in general.

Cheers,
Mark.
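The C3-based mental model is easy to check directly; the `__match_container__` values below are hand-set stand-ins for illustration (0 for "not a container", 1 standing in for a MATCH_SEQUENCE-style flag):

```python
class Parent:
    __match_container__ = 0   # explicitly "not a container"

class Seq:
    __match_container__ = 1   # illustrative stand-in for a sequence flag

class Child(Parent, Seq):
    pass

# Parent precedes Seq in Child's C3 linearization, so its 0 wins:
print([c.__name__ for c in Child.__mro__])  # ['Child', 'Parent', 'Seq', 'object']
print(Child.__match_container__)            # 0
```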

On Tue, 30 Mar 2021 at 17:32, Brandt Bucher <brandtbucher@gmail.com> wrote: Hi Brandt,
Speaking as a maintainer of SymPy, I do support the PEP, but not for SymPy specifically. I just used SymPy as an example of something that seems like it should be a good fit for pattern matching but also shows examples that don't seem to work with PEP 634 in the way intended. I'm sure SymPy will use case/match when support for Python 3.9 is dropped, but I don't see it as something that would be a major feature for SymPy users or for internal code. I expect that case/match would make some code tidier, and potentially it could make some things a little faster (although that depends on it being well optimised - half a microsecond might seem small until you add up millions of them). There is a recently opened SymPy issue discussing the possible use of this: https://github.com/sympy/sympy/issues/21193

Pattern matching, and destructuring more generally, are significant features for symbolic libraries such as SymPy, which has much code for doing this and can also be used with other dedicated libraries such as matchpy. Much more is needed than case/match for that though: rewriting, substitution, associative/commutative matching etc. It's not clear to me that core Python could ever provide anything new that would lead to a groundbreaking improvement for SymPy in this respect. The surrounding discussion of the various pattern matching PEPs has led me to think of destructuring as more of a general language feature that might not in future be limited to case/match. I'm not sure where that could go for Python, but I'm interested to see if anything more comes of it.

I like a lot of the features in PEP 634, and the way I see it, this PEP (653) underpins those. The reason I support PEP 653 is because it seems like a more principled approach to the mechanism for how pattern matching should work, one that places both user-defined types and builtin types on an even footing.
The precise mechanisms (match_class, match_self etc) and their meanings do seem strange, but that's because they are trying to codify the different cases that PEP 634 has introduced. It's possible that the design of that mechanism can be improved, and there have been suggestions for that in this thread. I do think, though, that it is important to have a general extensible mechanism rather than a specification based on special cases.

I also think that the use of the Sequence and Mapping ABCs is a bad idea on practical grounds (performance, circularity in the implementation) and is not in keeping with the rest of the language. ABCs have always been optional in the past: Python uses protocols rather than ABCs (duck typing etc).

Finally, speaking as someone who also teaches introductory programming with Python, then with *that* hat on I would have preferred it if none of the pattern-matching PEPs had been accepted. The advantage of Python in having a simple and easily understood core erodes with each new addition to core syntax. For novice users, case/match only really offers increased complexity compared to if/elif, but it will still be something else that needs to be learned before being able to read existing code.

Oscar

Hi Mark,

I also wanted to give some feedback on this. While most of the discussion so far has been about the matching of the pattern itself, I think it should also be considered what happens in the block below. Consider this code:

```
m = ...
match m:
    case [a, b, c] as l:
        # what can we safely do with l?
```

or, in terms of the type system: what is the most specific type that we can know l to be? With PEP 634 you can be sure that l is a sequence and that its length is 3. With PEP 653 this is currently not explicitly defined. Judging from the pseudocode, we can only assume that l is an iterable (because we use it in an unpacking assignment) and that its length is 3, which greatly reduces the operations that can safely be done on l.

For mapping matches, with PEP 634 we can assume that l is a mapping. With PEP 653 all we can assume is that it has a .get method that takes two parameters, which is even more restrictive, as we can't even be sure if we can use len(), .keys, ... or iterate over it.

This also makes it a lot harder for static type checkers to check match statements, because instead of checking against an existing type they now have to hard-code all the guarantees made by the match statement, or not narrow the type at all. Additionally, consider this typed example:

```
m: Mapping[str, int] = ...
match m:
    case {'version': v}:
        pass
```

With PEP 634 we can statically check that v is an int. With PEP 653 there is no such guarantee. Therefore I would strongly be in favor of having sequence and mapping patterns only match certain types instead of relying on dunder attributes. If implementing all of Sequence is really too much work just to be matched by a sequence pattern, as PEP 653 claims, then maybe a more general type could be chosen instead.

I don't have any objections against the other parts of the PEP.

Adrian Freund

On 3/27/21 2:37 PM, Mark Shannon wrote:
participants (14)

- Adrian Freund
- Antoine Pitrou
- Brandt Bucher
- Caleb Donovick
- Chris Angelico
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Mark Shannon
- Nick Coghlan
- Oscar Benjamin
- Paul Moore
- Stephen J. Turnbull
- Terry Reedy