Mailman 3 Critique of PEP 622 (Structural Pattern Matching) - Python-Dev

Critique of PEP 622 (Structural Pattern Matching)

Mark Shannon

Aug. 14, 2020

2:24 p.m.

Hi all, I've written up a critique of PEP 622. Rather than dump a 2000 line email on you all, I've made a git repo. https://github.com/markshannon/pep622-critique If you have any corrections or additions to suggest, feel free to submit a PR. If you'd rather not submit a PR, for any reason, just send me an email. Cheers, Mark.

Chris Angelico

August 2020

2:36 p.m.

On Sat, Aug 15, 2020 at 12:32 AM Mark Shannon <mark@hotpy.org> wrote:

...

Hi all,

I've written up a critique of PEP 622. Rather than dump a 2000 line email on you all, I've made a git repo.

https://github.com/markshannon/pep622-critique

I started reading it. You're saying the same things that everyone else has said, so I stopped reading. Do you have anything new to add to the discussion, or is this 2000 lines of rehash? ChrisA

Steven D'Aprano

6:36 a.m.

On Sat, Aug 15, 2020 at 12:36:25AM +1000, Chris Angelico wrote:

...

Do you have anything new to add to the discussion, or is this 2000 lines of rehash?

Having a summary of objections/critiques in one place is far better than expecting people to wade through multiple huge threads. I've lost count... is the number of emails in this discussion more or less than a googolplex? *wink* -- Steven

Guido van Rossum

10:06 p.m.

On Fri, Aug 14, 2020 at 11:42 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

On Sat, Aug 15, 2020 at 12:36:25AM +1000, Chris Angelico wrote:

...
Do you have anything new to add to the discussion, or is this 2000 lines of rehash?

Having a summary of objections/critiques in one place is far better than expecting people to wade through multiple huge threads.

But Mark's repo doesn't replace any of the threads -- it just repeats Mark's own arguments, which are exclusively focused on the examples in the PEP (it's as if Mark read nothing *but* the examples). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Steven D'Aprano

11:51 p.m.

On Sat, Aug 15, 2020 at 03:06:46PM -0700, Guido van Rossum wrote:

...

But Mark's repo doesn't replace any of the threads -- it just repeats Mark's own arguments, which are exclusively focused on the examples in the PEP (it's as if Mark read nothing *but* the examples).

Oh, I'm sorry, I based my comment on Chris' comment that Mark was repeating everyone else's arguments. My bad :-( I guess at some point I shall have to read the entire thread if I want to have an opinion on this feature. (Other than "pattern matching sounds great, but I don't understand how it works!") -- Steven

Guido van Rossum

2:19 a.m.

On Sat, Aug 15, 2020 at 5:00 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

I guess at some point I shall have to read the entire thread if I want to have an opinion on this feature.

Or you could try reading the PEP itself. :-) It'll be quicker than reading all the commentary, and (unlike the first version) it actually starts with a pretty compelling introduction. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Stephen J. Turnbull

6:21 a.m.

Steven D'Aprano writes:

...

Oh, I'm sorry, I based my comment on Chris' comment that Mark was repeating everyone else's arguments. My bad :-(

Mark can be tendentious. Some of his arguments in the main gist were also made by others, but mostly they do seem to be reiterations of his own pet peeves.

That said, I think Guido was unfair. Mark's pamphlet does provide a lot of new data for us to consider, not so much in the main "Critique", but rather in the "Analysis of the Standard Library"[1]. To me, the "queries" he came up with seem a fair representation of the kinds of things that this proposal might improve in the stdlib.

I found that Mark's rewritings are generally pretty attractive even though they seem unlikely to provide a substantial decrease in LOC and quite unlikely to be more performant. I like destructuring as a way of "breaking out" attributes to local bindings. I think most of the pattern matching versions are distinct improvements over both the existing code and Mark's suggested alternatives, although pattern matching is a "big change" and his alternatives have much smaller footprints on the future of Python.

I would make a few points based on little expertise in pattern matching ;-), my esthetic sense about destructuring, and Mark's critique and analysis.

1. In the Critique Mark argues use of pattern matching is likely to be infrequent because of the heavy use of duck typing in well-written Python code. But I find pattern matching to be complementary to duck-typing. To be a bit melodramatic, it's an extension of duck-typing from "walks and quacks like a duck" ("dynamic duck-typing") to "has a bill like a duck" ("static duck-typing" or perhaps "platypus mistyping").

2. In the Analysis Mark argues that idioms amenable to pattern matching in existing stdlib code are quite rare (a couple of dozen instances, and as I wrote above I think his query was reasonably representative). While that certainly is a useful analysis, it cannot account for the fact that pattern matching is a *paradigm* that has not been available in Python in the past. *Availability of a pleasant syntax for a new paradigm causes code to be designed differently.* That is something that none of us can claim to be able to quantify accurately, although the experience Guido and others have with "async" may be relevant to guesstimating it.

3. Mark identifies a number of minor deficiencies in the existing proposal, of which the attractive nuisance of "symbolic constants" (the "case HTTP_OK" example) and the inability to use pattern matching directly to populate object attributes in __init__ methods were most impressive to me. (I still like the PEP.)

I wonder if it would be possible to provide hooks for a useful amount of pattern matching syntax while leaving open the possibility of future semantic improvements as was done with function annotations.

...

I guess at some point I shall have to read the entire thread if I want to have an opinion on this feature.

Only if the SC refuses to approve the current draft! ;-) [1] https://github.com/markshannon/pep622-critique/blob/master/stdlib_examples.m...

Mark Shannon

10:24 a.m.

On 15/08/2020 11:06 pm, Guido van Rossum wrote:

...

On Fri, Aug 14, 2020 at 11:42 PM Steven D'Aprano <steve@pearwood.info> wrote:

On Sat, Aug 15, 2020 at 12:36:25AM +1000, Chris Angelico wrote:

Do you have anything new to add to the discussion, or is this 2000 lines of rehash?

Having a summary of objections/critiques in one place is far better than expecting people to wade through multiple huge threads.

But Mark's repo doesn't replace any of the threads -- it just repeats Mark's own arguments, which are exclusively focused on the examples in the PEP (it's as if Mark read nothing *but* the examples).

I've read all the PEP. *All* of it. Several times. I would encourage others to read it all carefully as well. Anyone who just reads the abstract and introduction might well think that PEP 622 is a wonderful thing without any flaws. Reading the whole PEP and thinking about its application reveals its flaws.

By using examples of your choosing, I am making an effort to be as fair as possible. Probably more than fair, in this case.

Throughout the critique, I have attempted to be objective where possible and fair where objectivity is impossible. Please point out anywhere I have failed; I'd like the critique to be as fair as possible.

For examples where PEP 622 works poorly, see the "Irregularities and surprising behavior" section.

I would also bring your attention to my rigorous analysis of the possible application of PEP 622 to the entirety of CPython. If I have made any mistakes there, I'd be happy to correct them.

Cheers, Mark.

Henk-Jaap Wagenaar

12:13 p.m.

On Mon, 17 Aug 2020 at 11:30, Mark Shannon <mark@hotpy.org> wrote:

...

I would also bring your attention to my rigorous analysis of the possible application of PEP 622 to the entirety of CPython. If I have made any mistakes there, I'd be happy to correct them.

You say "I've elided a lot of complex logic int cases, as it is not relevant." in the plistlib._BinaryPlistWriter._write_object example, this seems to be a prime example where guards could be used to simplify/unnest the logic? Even if you disagree, I think it is highly relevant and worth commenting on, one way or another!

Mark Shannon

1:16 p.m.

On 17/08/2020 1:13 pm, Henk-Jaap Wagenaar wrote:

...

On Mon, 17 Aug 2020 at 11:30, Mark Shannon <mark@hotpy.org> wrote:

I would also bring your attention to my rigorous analysis of the possible application of PEP 622 to the entirety of CPython. If I have made any mistakes there, I'd be happy to correct them.

You say "I've elided a lot of complex logic int cases, as it is not relevant." in the plistlib._BinaryPlistWriter._write_object example, this seems to be a prime example where guards could be used to simplify/unnest the logic? Even if you disagree, I think it is highly relevant and worth commenting on, one way or another!

Thanks for the feedback. I've expanded the code in the `int` and `UID` cases, and made it clearer why the remaining code has been elided. Cheers, Mark.

Henk-Jaap Wagenaar

2:08 p.m.

Thanks for having a look! The example now looks like (looking at int case only, same applies to UID):

    case int():
        if value < 0:
            try:
                self._fp.write(struct.pack('>Bq', 0x13, value))
            except struct.error:
                raise OverflowError(value) from None
        elif value < 1 << 8:
            self._fp.write(struct.pack('>BB', 0x10, value))
        ...
        elif value < 1 << 64:
            self._fp.write(b'\x14' + value.to_bytes(16, 'big', signed=True))
        else:
            raise OverflowError(value)

I was more thinking it would read/look something like:

    case int() if value < 0:
        try:
            self._fp.write(struct.pack('>Bq', 0x13, value))
        except struct.error:
            raise OverflowError(value) from None
    case int() if value < 1 << 8:
        self._fp.write(struct.pack('>BB', 0x10, value))
    ...
    case int() if value < 1 << 64:
        self._fp.write(b'\x14' + value.to_bytes(16, 'big', signed=True))
    case int():
        raise OverflowError(value)

Which I think works as expected under the current PEP622?

On Mon, 17 Aug 2020 at 14:16, Mark Shannon <mark@hotpy.org> wrote:

...

On 17/08/2020 1:13 pm, Henk-Jaap Wagenaar wrote:

...
On Mon, 17 Aug 2020 at 11:30, Mark Shannon <mark@hotpy.org> wrote:

I would also bring your attention to my rigorous analysis of the possible application of PEP 622 to the entirety of CPython. If I have made any mistakes there, I'd be happy to correct them.

You say "I've elided a lot of complex logic int cases, as it is not relevant." in the plistlib._BinaryPlistWriter._write_object example, this seems to be a prime example where guards could be used to simplify/unnest the logic? Even if you disagree, I think it is highly relevant and worth commenting on, one way or another!

Thanks for the feedback.

I've expanded the code in the `int` and `UID` cases, and made it clearer why the remaining code has been elided.

Cheers, Mark.

Mark Shannon

2:49 p.m.

On 17/08/2020 3:08 pm, Henk-Jaap Wagenaar wrote:

...

Thanks for having a look! The example now looks like (looking at int case only, same applies to UID):

    case int():
        if value < 0:
            try:
                self._fp.write(struct.pack('>Bq', 0x13, value))
            except struct.error:
                raise OverflowError(value) from None
        elif value < 1 << 8:
            self._fp.write(struct.pack('>BB', 0x10, value))
        ...
        elif value < 1 << 64:
            self._fp.write(b'\x14' + value.to_bytes(16, 'big', signed=True))
        else:
            raise OverflowError(value)

I was more thinking it would read/look something like:

    case int() if value < 0:
        try:
            self._fp.write(struct.pack('>Bq', 0x13, value))
        except struct.error:
            raise OverflowError(value) from None
    case int() if value < 1 << 8:
        self._fp.write(struct.pack('>BB', 0x10, value))
    ...
    case int() if value < 1 << 64:
        self._fp.write(b'\x14' + value.to_bytes(16, 'big', signed=True))
    case int():
        raise OverflowError(value)

Which I think works as expected under the current PEP622?

That would work, but would be slower for the reference implementation due to the repeated `isinstance(value, int)` checks. I think the repeated `int()` cases do not help readability. Which form do you think is more readable?

...

On Mon, 17 Aug 2020 at 14:16, Mark Shannon <mark@hotpy.org> wrote:

On 17/08/2020 1:13 pm, Henk-Jaap Wagenaar wrote:

On Mon, 17 Aug 2020 at 11:30, Mark Shannon <mark@hotpy.org> wrote:

I would also bring your attention to my rigorous analysis of the possible application of PEP 622 to the entirety of CPython. If I have made any mistakes there, I'd be happy to correct them.

You say "I've elided a lot of complex logic int cases, as it is not relevant." in the plistlib._BinaryPlistWriter._write_object example, this seems to be a prime example where guards could be used to simplify/unnest the logic? Even if you disagree, I think it is highly relevant and worth commenting on, one way or another!

Thanks for the feedback.

I've expanded the code in the `int` and `UID` cases, and made it clearer why the remaining code has been elided.

Cheers, Mark.

Henk-Jaap Wagenaar

3:29 p.m.

...

That would work, but would be slower for the reference implementation due to the repeated `isinstance(value, int)` checks.

If you wanted to avoid that you could use match/case inside the "case int()" instead, i.e.:

    case int():
        match value:
            case _ if value < 8:
                # do things
            case _ if value < 1 << 8:
                # do things
            ...
            case _:
                # raise some kind of error

but that might be madness.

...

I think the repeated `int()` cases do not help readability. Which form do you think is more readable?

I think the form I suggested is more readable and also I think it is more PEP622-like and that it is easier to reason about/test/debug/maintain, but that's just my opinion! Not sure how important the speed difference is to this example.

Guido van Rossum

4:56 p.m.

On Mon, Aug 17, 2020 at 8:11 AM Mark Shannon <mark@hotpy.org> wrote:

...

On 17/08/2020 3:08 pm, Henk-Jaap Wagenaar wrote:

...
Thanks for having a look! The example now looks like (looking at int case only, same applies to UID):

    case int():
        if value < 0:
            try:
                self._fp.write(struct.pack('>Bq', 0x13, value))
            except struct.error:
                raise OverflowError(value) from None
        elif value < 1 << 8:
            self._fp.write(struct.pack('>BB', 0x10, value))
        ...
        elif value < 1 << 64:
            self._fp.write(b'\x14' + value.to_bytes(16, 'big', signed=True))
        else:
            raise OverflowError(value)

I was more thinking it would read/look something like:

    case int() if value < 0:
        try:
            self._fp.write(struct.pack('>Bq', 0x13, value))
        except struct.error:
            raise OverflowError(value) from None
    case int() if value < 1 << 8:
        self._fp.write(struct.pack('>BB', 0x10, value))
    ...
    case int() if value < 1 << 64:
        self._fp.write(b'\x14' + value.to_bytes(16, 'big', signed=True))
    case int():
        raise OverflowError(value)

Which I think works as expected under the current PEP622?

That would work, but would be slower for the reference implementation due to the repeated `isinstance(value, int)` checks.

The PEP allows the compiler to generate optimized code that only checks once.

...

I think the repeated `int()` cases do not help readability. Which form do you think is more readable?

I find Henk-Jaap's version better, because the case blocks show the structure of the code better. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Marat Khalili

8:52 a.m.

Hi all,

...

I started reading it. You're saying the same things that everyone else has said, so I stopped reading.

Do you have anything new to add to the discussion, or is this 2000 lines of rehash?

I'm new to the subject, and I find the rehash of everything that has been said on the subject useful. Apart from the OP's gist I could find only one publication on the subject <https://thautwarm.github.io/Site-32/Design/PEP622-1.html> which actually spends more space discussing an alternative approach. If OP's gist is one-sided, I'd prefer to read more summaries from people who defend the PEP. The proposed changes are substantial, and the arguments collected by OP against it are quite compelling (the example with changing HTTP_OK value is downright horrifying).

To make my comment slightly less meta, I'd like to ask if the following aspect was already discussed. In some cases the problem that PEP 622 is intended to solve is solved by the "dict of lambdas" pattern (here <https://medium.com/@julian.harley/custom-formatter-in-python-3-7e056b21d2d7> is an example in the wild). Particularly, the HTTP example is handled easily this way (note that I handled more cases in the same number of lines):

    RESPONSE_STATUS_REACTIONS = {
        HTTP_OK: lambda: do_something(response.data),
        HTTP_MOVED_PERMANENTLY: lambda: retry(response.location),
        # no way to merge with the one above in current syntax
        HTTP_FOUND: lambda: retry(response.location),
        HTTP_UNAUTHORIZED: lambda: retry(auth=get_credentials()),
        HTTP_UPGRADE: lambda: retry(delay=DELAY),  # `retry` is pretty magical here
        # `lambda: raise` does not work in current syntax
        HTTP_INTERNAL_SERVER_ERROR: lambda: raise_(RequestError("we couldn't get the data")),
        # same problems with merging here
        HTTP_BAD_GATEWAY: lambda: raise_(RequestError("we couldn't get the data")),
        HTTP_SERVICE_UNAVAILABLE: lambda: raise_(RequestError("we couldn't get the data")),
    }
    RESPONSE_STATUS_REACTIONS.get(response.status, lambda: raise_(RequestError(f"Unexpected response status {response.status}")))()

But of course dict only matches keys exactly, so the following won't work for subclasses of bool or int:

    key = {
        bool: lambda key: ('false', 'true')[key],
        int: _intstr,  # no idea what _intstr does, just rewording the gist example
    }[type(key)](key)

I wonder if some better mapping class can be added to the standard library that would accept more flexible patterns and match them in correct order without assigning anything? Assignment or whatever side effects would then be performed explicitly and kept separate from matching.

With Best Regards, Marat Khalili
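One way to picture the "more flexible mapping" wished for above is an ordered list of (predicate, handler) rules tried in sequence. The sketch below is purely illustrative: OrderedDispatch, is_a, raise_ and key_to_str are hypothetical names, no such class exists in the standard library, and plain str stands in for the gist's _intstr, whose behavior is not known here.

```python
def raise_(exc):
    """Let an expression (such as a lambda body) raise an exception."""
    raise exc


def is_a(cls):
    """Build a predicate that accepts instances of cls, including subclasses."""
    return lambda value: isinstance(value, cls)


class OrderedDispatch:
    """Try (predicate, handler) rules in order; the first match wins.

    Matching never binds or assigns anything; the handler receives the
    original value and does whatever it needs explicitly.
    """

    def __init__(self, rules, default):
        self._rules = list(rules)
        self._default = default

    def __call__(self, value):
        for predicate, handler in self._rules:
            if predicate(value):
                return handler(value)
        return self._default(value)


key_to_str = OrderedDispatch(
    [
        # bool must come before int because bool is a subclass of int.
        (is_a(bool), lambda key: ('false', 'true')[key]),
        (is_a(int), str),  # stand-in for the gist's _intstr
    ],
    default=lambda key: raise_(TypeError(f"unsupported key: {key!r}")),
)


class MyInt(int):
    """An int subclass that an exact type(key) lookup would miss."""


print(key_to_str(True), key_to_str(7), key_to_str(MyInt(3)))
```

Because the rules are checked with isinstance, subclasses are handled, at the cost of making rule order significant.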

Mark Shannon

12:05 p.m.

Hi Chris, On 14/08/2020 3:36 pm, Chris Angelico wrote:

...

On Sat, Aug 15, 2020 at 12:32 AM Mark Shannon <mark@hotpy.org> wrote:

...
Hi all,

I've written up a critique of PEP 622. Rather than dump a 2000 line email on you all, I've made a git repo.

https://github.com/markshannon/pep622-critique

I started reading it. You're saying the same things that everyone else has said, so I stopped reading.

I've added an abstract and made the analysis of the standard library more prominent. Would you take another look and let me know if it is a more compelling read now?

...

Do you have anything new to add to the discussion, or is this 2000 lines of rehash?

Do let me know whether you think it adds anything new, once you've read a bit more. Cheers, Mark.

Baptiste Carvello

4:50 p.m.

Le 14/08/2020 à 16:24, Mark Shannon a écrit :

...

https://github.com/markshannon/pep622-critique

Hi all,

reading through this made me think of 3 ideas which I think are new [1]. 2 of them are about the Value Pattern question, the last one is a small nit about the Django example.

* the critique points out the limitations on the use of pattern matching in __init__ methods, because Capture Patterns can't assign to dotted names. Thus the currently proposed dot-based rule is a limitation not just for Value Patterns, as already heavily discussed, but also for Capture Patterns. Moreover, this limitation cannot be lifted after the fact if the __init__ method use-case proves important in the future.

* the critique points out that Value Patterns with booleans would need identity-, not equality-based comparison. This leads to the idea that, if a special syntax is eventually used for Value Patterns, using the comparison operator in it might be useful. Thus we could have:

...

...
...
    match int_or_boolean:  # ok, dubious design!
        case is True:  # same as "if int_or_boolean is True:"
            print("boolean true")
        case is False:
            print("boolean false")
        case == ONE:  # same as "if int_or_boolean == ONE:"
            print("integer 1")

* the Django example could be written more idiomatically by relying more on destructuring instead of having a guard:

...

...
...
    match value:
        case [first, *_, label := (Promise() | str())]:
            value = value[:-1]
        case _:
            label = key.replace('_', ' ').title()

Cheers, Baptiste [1] as in: I skimmed through most of the threads and believe they have not already been said (famous last words…)
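The identity-versus-equality point in the second bullet can be shown with ordinary if statements; the `case is True` / `case == ONE` spelling above is only a proposal, so it is not used here. This is a minimal sketch, and the function names and the constant ONE are hypothetical.

```python
ONE = 1


def classify_by_equality(value):
    # 1 == True and 0 == False in Python, so integers are caught by the
    # boolean branches before the integer branch is ever reached.
    if value == True:  # noqa: E712 (deliberate, to show the pitfall)
        return "boolean true"
    if value == False:  # noqa: E712
        return "boolean false"
    if value == ONE:
        return "integer 1"
    return "other"


def classify_by_identity(value):
    # Identity checks only accept the True and False singletons themselves.
    if value is True:
        return "boolean true"
    if value is False:
        return "boolean false"
    if value == ONE:
        return "integer 1"
    return "other"


print(classify_by_equality(1))
print(classify_by_identity(1))
print(classify_by_identity(True))
```

With equality the integer 1 is reported as a boolean, while the identity version separates the two, which is the distinction the proposed per-case comparison operator is meant to make explicit.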

16 comments

8 participants

participants (8)

Baptiste Carvello
Chris Angelico
Guido van Rossum
Henk-Jaap Wagenaar
Marat Khalili
Mark Shannon
Stephen J. Turnbull
Steven D'Aprano
