PEP 642: Constraint Pattern Syntax for Structural Pattern Matching
Hi folks, This is a mailing list repost of the Discourse thread at https://discuss.python.org/t/pep-642-constraint-pattern-syntax-for-structura... The rendered version of the PEP can be found here: https://www.python.org/dev/peps/pep-0642/ The full text is also quoted in the Discourse thread. The remainder of this email is the same introduction that I posted on Discourse. I’m largely a fan of the Structural Pattern Matching proposal in PEP 634, but there’s one specific piece of the syntax proposal that I strongly dislike: the idea of basing the distinction between capture patterns and value patterns purely on whether they use a simple name or a dotted name. Thus PEP 642, which retains most of PEP 634 unchanged, but adjusts value checks to use an explicit prefix syntax (either `?EXPR` for equality constraints, or `?is EXPR` for identity constraints), rather than relying on users learning that literals and attribute lookups in a capture pattern mean a value lookup check, while simple names mean a capture pattern (unlike both normal expressions, where all three mean a value lookup, and assignment targets, where both simple and dotted names bind a new reference). 
The PEP itself has a lot of words explaining why I’ve made the design decisions I have, as well as the immediate and potential future benefits offered by using an explicit prefix syntax for value constraints, but the super short form goes like this:

* if you don’t like match statements at all, or wish we were working on designing a C-style switch statement instead, then PEP 642 isn’t going to appeal to you any more than PEP 634 does
* if, like me, you don’t like the idea of breaking the existing property of Python that caching the result of a value lookup subexpression in a local variable and then using that variable in place of the original subexpression should “just work”, then PEP 642’s explicit constraint prefix syntax may be more to your liking
* however, if the idea of the `?` symbol becoming part of Python’s syntax doesn’t appeal to you, then you may consider any improved clarity of intent that PEP 642 might offer to not be worth that cost

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Thank you for the well-written PEP, although I don't agree with it. My response below is quite long. Here is my opinionated TL;DR: (1) Just get over the use of `_` for the wildcard pattern. another identifier. Now that the parser will support soft keywords, we should expect more cases that something that is an identifier is one context will be a keyword in another. (2) The most common uses of patterns should not require sigils. (3) None is special, and we should insist on `is` comparisons by default. True and False are a little more problematic. (4) Using sigils to over-ride the default is okay. That includes turning what would otherwise be a capture pattern into a comparison. Details below. On Sat, Oct 31, 2020 at 05:16:59PM +1000, Nick Coghlan wrote:
The rendered version of the PEP can be found here: https://www.python.org/dev/peps/pep-0642/
Quoting from the PEP:

"Wildcard patterns change their syntactic marker from _ to ?"

Yuck. Sorry, I find `?` in that role very aesthetically and visually unappealing :-(

I really don't get why so many people are hung up over this minuscule issue of giving `_` special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using `_` as a variable name.

Now that the PEG parser makes it easy to have soft keywords, there will probably be more cases in the future where something that is syntactically an identifier is a regular name in one context and special syntax in another. This has happened before (e.g. "as") and it will happen again.

We have a very strong convention that `_` is used as a write-only "don't care" variable. (The two exceptions are the magic underscore in the REPL, and `_()` in i18n.) In idiomatic Python code, if we bind a value to `_` and then use it later, we are Doing It Wrong.

Is there such a shortage of local variable names that the inability to misuse `_` is a problem in practice? Just use another identifier.

But if we really *must* break that convention and bind to `_`, we can still do it inside a match statement:

    case a:
        _ = a
        print(_)

The fact that you have to use a temporary variable to break the rules is, in my opinion, a good thing -- it reminds you that what you are doing is weird.

Quoting code from the PEP:

```
# Literal patterns
match number:
    case ?0:
        print("Nothing")
    case ?1:
        print("Just one")
```

I think this is an example of what Larry Wall talked about when he discussed the mistakes of Perl's original regex syntax: "Poor Huffman coding"

https://www.perl.com/pub/2002/06/04/apo5.html/

Wall regrets that many common patterns are longer and harder to write than rarer patterns.

Why do we need a `?` sigil to match a literal? `case 1` cannot possibly be interpreted as a capture pattern. It would be wrong to compare it with `is`. What else could it mean other than equality comparison? The question mark is pure noise.

So here's a counter suggestion:

(1) Literals still match by equality, because that is what we want 99% of the time. No sigil required.

You mention this in the "Rejected ideas" section, but I reject your rejection :-)

The PEP rejects this because:

"they have the same syntax sensitivity problem as value patterns do, where attempting to move the literal pattern out to a local variable for naming clarity would turn the value checking literal pattern into a name binding capture pattern"

but that's based on a really simple-minded refactoring. Sure, the naive user who knows little about pattern matching might try to refactor like this:

    # Before.
    match record:
        case (42, x): ...

    # After.
    ANSWER_TO_LIFE = 42
    match record:
        # It's a Trap!
        case (ANSWER_TO_LIFE, x): ...

and I am sympathetic to your desire to avoid that. But this is the sort of error that:

- only applies in comparatively unusual circumstances (naively refactoring a literal in a case statement);
- is easily avoided by automated refactoring tools;
- linters will warn about (assignment to a CONSTANT);
- is easily spotted if you have unit tests;
- is obvious to those with more experience in pattern matching.

So I don't see this as a large problem. I expect few people will be bitten by this more than once, if that. I think that your preventative solution, forcing all literal patterns to require a sigil, is worse than the problem it is solving.

Bottom line: let's not hamstring pattern matching with poor Huffman coding right from day one.

(2) While literals usually compare by equality, the exception is three special keywords, and one symbol, that compare by identity:

    case None | True | False | ... :  # Compares by identity.

I can't think of any other literal where identity tests would be useful and guaranteed by the language (no relying on implementation-specific details, such as small int caching or string interning).

So these keywords (plus the ... symbol) match by identity by default, because that's what we want 99% of the time. (Although, see below for discussion about the two bools.)

Other special values, like NotImplemented and Ellipsis, aren't keywords, they are just names, and don't get special treatment.

(3) Overriding the default comparison with an explicit sigil is allowed:

    case ==True:
        print("True, or 1, or 1.0, or 1+0j, etc")
    case ==None:
        print("None, or something weird that equals None")
    case is 1943.63:
        print("if you see this, the interpreter is caching floats")

I don't think that there will be any ambiguity between the unary "==" pattern modifier and the real `==` operator. But if I am wrong, then we can change the spelling:

    case ?None:
        print("None, or something weird that equals None")
    case ?is 1943.63:
        print("if you see this, the interpreter is caching floats")

(I don't love the question mark here, but I don't hate it either.)

The important thing here is that the cases with no sigil are the common operations; the sigil is only needed for the uncommon case.

(4) Patterns which could conceivably be interpreted as assignment targets default to capture patterns, because that's what is normally wanted in pattern matching:

    case [1, spam, eggs]:  # captures spam and eggs

If you don't want to capture a named value, but just match on it, override it with an explicit `==` or `is`:

    case [1, ==spam, eggs]:  # matches `spam` by equality, captures on eggs

Quoting the PEP:

"nobody litters their if-elif chains with x is True or x is False expressions, they write x and not x, both of which compare by value, not identity."

That's incorrect. `if x` doesn't *compare* at all, not by value and not with equality, it duck-types truthiness:

```
>>> class Demo:
...     def __bool__(self):
...         return True
...     def __eq__(self, other):
...         return False
...
>>> x = Demo()
>>> x == True
False
>>> if x: print("truthy")
...
truthy
```

There's a reasonable argument to make that (unless overridden by an explicit sigil) the `True` and `False` patterns should match by truthiness, not equality or identity, but I'm not going to make that argument.
Quote: "Indeed, PEP 8 explicitly disallows the use if x is True" This is true, but I think you have to understand the intention there. I believe the intent is that APIs should not insist on *exactly* the True or False singletons for boolean flags, but instead accept any truthy or falsey objects. (Duck typing for the win.) But if you need to distinguish *exactly* True from an arbitrary truthy value like "spam and eggs" or 93.78, then identity, not equality, is the correct way to do it.
On Sat, Oct 31, 2020 at 10:22:09PM +1100, Steven D'Aprano wrote:
(1) Just get over the use of `_` for the wildcard pattern. another identifier. Now that the parser will support soft keywords, we should expect more cases that something that is an identifier is one context will be a keyword in another.
Oops, I lost a word. That should say "use another identifier". All other typos and misspellings are intentional :-) -- Steve
On Sat, 31 Oct 2020 at 11:25, Steven D'Aprano <steve@pearwood.info> wrote:
Thank you for the well-written PEP, although I don't agree with it. My response below is quite long. Here is my opinionated TL;DR:
For what it's worth, I find your rebuttal of PEP 642 convincing, and in line with my thoughts on the matter. -1 from me on PEP 642. Paul
Hello, On Sat, 31 Oct 2020 12:16:09 +0000 Paul Moore <p.f.moore@gmail.com> wrote:
On Sat, 31 Oct 2020 at 11:25, Steven D'Aprano <steve@pearwood.info> wrote:
Thank you for the well-written PEP, although I don't agree with it. My response below is quite long. Here is my opinionated TL;DR:
For what it's worth, I find your rebuttal of PEP 642 convincing, and in line with my thoughts on the matter.
-1 from me on PEP 642.
Given that this was a direct reply to Steven's mail, and he explicitly said:
(4) Using sigils to over-ride the default is okay. That includes turning what would otherwise be a capture pattern into a comparison.
And that's also the stated goal of PEP 642, quoting:
This PEP takes the view that not requiring a marker prefix on value lookups in match patterns results in a cure that is worse than the disease: Python's first ever syntax-sensitive value lookup where you can't transparently replace an attribute lookup with a local variable lookup
So, both PEP 642 and Steven agree that the problem exists, and that an explicit marker is a suitable means to address it. Then, deriving "rebuttal" and "-1" to PEP 642 from Steven's mail sounds a bit confusing. -- Best regards, Paul mailto:pmiscml@gmail.com
On Sat., 31 Oct. 2020, 9:29 pm Steven D'Aprano, <steve@pearwood.info> wrote:
(3) Overriding the default comparison with an explicit sigil is allowed:
case ==True: print("True, or 1, or 1.0, or 1+0j, etc")
case ==None: print("None, or something weird that equals None")
case is 1943.63: print("if you see this, the interpreter is caching floats")
Where is this override allowed? It isn't covered under the syntax for value patterns or literal patterns:

* https://www.python.org/dev/peps/pep-0634/#value-patterns
* https://www.python.org/dev/peps/pep-0634/#literal-patterns

and there aren't any other pattern types that make comparisons. It also isn't in the draft reference implementation.

If PEP 634 allowed the exact comparison operator to be specified for patterns (with at least "==" and "is" allowed), and patterns with such explicit operators allowed arbitrary primary expressions as PEP 642 proposes, that would indeed address the bulk of my concerns:

* literal patterns would be an unambiguous shorthand for a comparison pattern (always equality - see discussion below)
* attribute patterns would be an unambiguous shorthand for a comparison pattern (always equality)
* the implementation would have no need to reinvent a subset of expression compilation specifically for literal and attribute patterns, it could just use the parser to control the conversion of the restricted syntactic shorthand to the more general comparison pattern at the AST level
* the deferred ideas in PEP 642 (negated comparisons, containment checks) would all be just as applicable as deferred ideas for an updated PEP 634 that included comparison patterns (with the question mark free spellings "!=", "is not", "in" and "not in")

(To a first approximation, the code needed to implement this feature for PEP 634 is the code I already wrote to implement "?" and "?is" for PEP 642, and the code deletion notes in my branch would also generally apply)
I don't think that there will be any ambiguity between the unary "==" pattern modifier and the real `==` operator. But if I am wrong, then we can change the spelling:
case ?None: print("None, or something weird that equals None")
case ?is 1943.63: print("if you see this, the interpreter is caching floats")
(I don't love the question mark here, but I don't hate it either.)
The important thing here is that the cases with no sigil are the common operations; the sigil is only needed for the uncommon case.
The tokeniser does struggle with "==" appearing after "=" or ":" in class patterns and mapping patterns, so you have to make sure to help it out with whitespace or parentheses. That's why I didn't use it for PEP 642, but the whitespace sensitivity would be more tolerable if the explicit symbol was left out most of the time.
(4) Patterns which could conceivably be interpreted as assignment targets default to capture patterns, because that's what is normally wanted in pattern matching:
case [1, spam, eggs]: # captures spam and eggs
If you don't want to capture a named value, but just match on it, override it with an explicit `==` or `is`:
case [1, ==spam, eggs]: # matches `spam` by equality, captures on eggs
As noted above, the current PEP 634 spec doesn't allow this, but if it did, then I agree it would address most of the concerns that prompted me to write PEP 642. If the 634 PEP authors are amenable, I'd be happy to prepare a PR against the PEP that made this change so you could see what it would look like at the grammar level.
Quoting the PEP:
"nobody litters their if-elif chains with x is True or x is False expressions, they write x and not x, both of which compare by value, not identity."
That's incorrect. `if x` doesn't *compare* at all, not by value and not with equality, it duck-types truthiness:
Aye, I considered going back and rewording that part to be more technically precise, but never actually did it (whether by type coercion or equality comparison, the ultimate effect is being more permissive than the strict identity check suggested for literal patterns).

```
>>> class Demo:
...     def __bool__(self):
...         return True
...     def __eq__(self, other):
...         return False
...
>>> x = Demo()
>>> x == True
False
>>> if x: print("truthy")
...
truthy
```
There's a reasonable argument to make that (unless overridden by an explicit sigil) the `True` and `False` patterns should match by truthiness, not equality or identity, but I'm not going to make that argument.
While I'd consider duck typing True & False less objectionable than comparing them by identity (as it would follow PEP 8), it wouldn't fix the key problem with special casing literals in the compiler: you lose that special casing if the literal value is replaced by a symbolic reference to the literal value. I don't ever want to be having conversations about why "case True:" doesn't behave the same way as "case some.attr.referring.to.true:". If PEP 634 had comparison patterns, then users would get "== True" by default for both literal and attribute patterns, "is True" if they explicitly asked for it, and regular boolean coercion if they combined a capture pattern with a guard expression. I do agree that None & Ellipsis are less of a concern (as almost no one overrides equality to compare equal to those, so comparing by equality vs identity gives the same answer), but that also means the special case would serve little practical purpose.
Cheers, Nick.
On Sat, Oct 31, 2020 at 6:30 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat., 31 Oct. 2020, 9:29 pm Steven D'Aprano, <steve@pearwood.info> wrote:
(3) Overriding the default comparison with an explicit sigil is allowed:
case ==True: print("True, or 1, or 1.0, or 1+0j, etc")
case ==None: print("None, or something weird that equals None")
case is 1943.63: print("if you see this, the interpreter is caching floats")
Where is this override allowed? [...]
You're quoting from Steven's counter-proposal, which he prefaced with:
So here's a counter suggestion:
If PEP 634 allowed the exact comparison operator to be specified for patterns (with at least "==" and "is" allowed), and patterns with such explicit operators allowed arbitrary primary expressions as PEP 642 proposes, that would indeed address the bulk of my concerns:
* literal patterns would be an unambiguous shorthand for a comparison pattern (always equality - see discussion below) * attribute patterns would be an unambiguous shorthand for a comparison pattern (always equality) * the implementation would have no need to reinvent a subset of expression compilation specifically for literal and attribute patterns, it could just use the parser to control the conversion of the restricted syntactic shorthand to the more general comparison pattern at the AST level * the deferred ideas in PEP 642 (negated comparisons, containment checks) would all be just as applicable as deferred ideas for an updated PEP 634 that included comparison patterns (with the question mark free spellings "!=", "is not", "in" and "not in")
(To a first approximation, the code needed to implement this feature for PEP 634 is the code I already wrote to implement "?" and "?is" for PEP 642, and the code deletion notes in my branch would also generally apply)
I think this over-stresses the notion that users might want to override the comparison operator to be used. We only have two operators that make sense in this context, 'is' and '==', and really, for almost everything you want to do, '==' is the appropriate operator. (There is a small trickle of bugs caused by people inappropriately using e.g. `if x is 1` instead of `if x == 1`, suggesting that if anything, there is too much freedom here.)

The big exception is `None`, where you basically always want to use `is`, which is what PEP 634 does. In PEP 622, we didn't do this, and we felt uncomfortable about it, so we changed it in PEP 634. We also changed it for True and False, because we realized that since 1 == 1.0 == True, people writing

```
case True:
```

would expect this to match only Booleans. The main use case here is situations (like JSON) where Booleans are *not* to be considered equivalent to 0 and 1, which using PEP 622 would have to be written as

```
case bool(True):
```

which is hard to discover and not that easy to grasp when reading either. There's not really ever a reason to write

```
case ==True:  # Using Steven's notation
```

since that's just an odd and misleading way to write

```
case 1:
```

I don't ever want to be having conversations about why "case True:" doesn't behave the same way as "case some.attr.referring.to.true:".
And you won't, because why would people define their own names for True and False? For sure people will define constants with Boolean values (e.g. `DEBUG = True`) but these aren't good candidates for use in patterns. I could imagine seeing ``` match DEBUG_NETWORK, DEBUG_LOGIC: case False, False: pass case False, True: print("We're debugging logic only") case True, False: print("Debugging network only") case True, True: print("Debugging network and logging") ``` but I would be surprised by ``` match x: case DEBUG: ... ``` just like I'd be surprised seeing ``` if x == DEBUG: ... ``` PS. Using `...` as a literal pattern is also Steven's invention, this isn't in PEP 634. People would probably think it had some special meaning as a pattern rather than understanding it was meant as the literal value `Ellipsis`. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Sat, Oct 31, 2020 at 9:37 PM Guido van Rossum <guido@python.org> wrote:
I think this over-stresses the notion that users might want to override the comparison operator to be used. We only have two operators that make sense in this context, 'is' and '==', and really, for almost everything you want to do, '==' is the appropriate operator. (There is a small trickle of bugs caused by people inappropriately using e.g. `if x is 1` instead of `if x == 1`, suggesting that if anything, there is too much freedom here.) The big exception is `None`, where you basically always want to use `is`, which is what PEP 634 does.
FWIW, there's an additional exception: sentinel = object() if var is sentinel: I use this idiom from time to time - instead of None.
On Sat, Oct 31, 2020 at 21:48 Dan Stromberg <drsalists@gmail.com> wrote:
On Sat, Oct 31, 2020 at 9:37 PM Guido van Rossum <guido@python.org> wrote:
I think this over-stresses the notion that users might want to override the comparison operator to be used. We only have two operators that make sense in this context, 'is' and '==', and really, for almost everything you want to do, '==' is the appropriate operator. (There is a small trickle of bugs caused by people inappropriately using e.g. `if x is 1` instead of `if x == 1`, suggesting that if anything, there is too much freedom here.) The big exception is `None`, where you basically always want to use `is`, which is what PEP 634 does.
FWIW, there's an additional exception:
sentinel = object()
if var is sentinel:
I use this idiom from time to time - instead of None.
You can just write ‘case sentinel’, since object’s == operator uses identity anyway.
-- --Guido (mobile)
On Sun., 1 Nov. 2020, 3:01 pm Guido van Rossum, <guido@python.org> wrote:
On Sat, Oct 31, 2020 at 21:48 Dan Stromberg <drsalists@gmail.com> wrote:
On Sat, Oct 31, 2020 at 9:37 PM Guido van Rossum <guido@python.org> wrote:
I think this over-stresses the notion that users might want to override the comparison operator to be used. We only have two operators that make sense in this context, 'is' and '==', and really, for almost everything you want to do, '==' is the appropriate operator. (There is a small trickle of bugs caused by people inappropriately using e.g. `if x is 1` instead of `if x == 1`, suggesting that if anything, there is too much freedom here.) The big exception is `None`, where you basically always want to use `is`, which is what PEP 634 does.
FWIW, there's an additional exception:
sentinel = object()
if var is sentinel:
I use this idiom from time to time - instead of None.
You can just write ‘case sentinel’, since object’s == operator uses identity anyway.
No, you can't, as the other operand might decide it wants to compare equal to your sentinel value. Cheers, Nick. --
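A small sketch of why equality is not a safe substitute for identity with sentinels: `AlwaysEqual` is a hypothetical class standing in for "the other operand" that decides to compare equal to everything.

```python
class AlwaysEqual:
    """Hypothetical operand whose __eq__ claims equality with anything."""
    def __eq__(self, other):
        return True

sentinel = object()
impostor = AlwaysEqual()

# Equality asks the operands, and AlwaysEqual answers "yes" in both directions:
# object.__eq__ returns NotImplemented for a different object, so Python falls
# back to AlwaysEqual.__eq__.
equal_left = (impostor == sentinel)
equal_right = (sentinel == impostor)

# Identity cannot be overridden.
same_object = (impostor is sentinel)

print(equal_left, equal_right, same_object)
```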
Nick Coghlan doesn't want to ever be having conversations about why "case True:" doesn't behave the same way as "case some.attr.referring.to.true:". Guido thinks that it is strange enough that you won't see it. I agree that it is odd to define a complicated alias for True, but it isn't so odd to have a config variable that is boolean, or even one that is essentially always defined to the same value. I'm not sure this is worth bending over backwards for, but it does exist. -jJ
On Sun, 1 Nov 2020 at 11:29, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat., 31 Oct. 2020, 9:29 pm Steven D'Aprano, <steve@pearwood.info> wrote:
(4) Patterns which could conceivably be interpreted as assignment targets default to capture patterns, because that's what is normally wanted in pattern matching:
case [1, spam, eggs]: # captures spam and eggs
If you don't want to capture a named value, but just match on it, override it with an explicit `==` or `is`:
case [1, ==spam, eggs]: # matches `spam` by equality, captures on eggs
As noted above, the current PEP 634 spec doesn't allow this, but if it did, then I agree it would address most of the concerns that prompted me to write PEP 642.
If the 634 PEP authors are amenable, I'd be happy to prepare a PR against the PEP that made this change so you could see what it would look like at the grammar level.
Since Guido has indicated he's still dubious about the value of offering an explicit prefix marker syntax at all, I'm instead going to agree with most of Steven's counter proposal and adopt it as the next iteration of PEP 642 (conceding the point on "_", using "==" and "is" as the prefix markers, and keeping the syntactic sugar that lets you omit the "==" prefix for comparison against literals and attributes). For the literal comparisons where equality isn't the right default, I'm still proposing leaving out the special casing, but I'm switching to proposing that we just not consider them valid literals for pattern matching purposes in the initial iteration of the design (so `is None` would be allowed as an identity constraint, but a bare ``None`` would be rejected as ambiguous, at least for now. I'd be more prepared to concede the "But is it *really* ambiguous?" case for `None` and `...` than I would for `True` and `False`, though). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Oct 31, 2020 at 12:25 PM Steven D'Aprano <steve@pearwood.info> wrote:
I really don't get why so many people are hung up over this minuscule issue of giving `_` special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using `_` as a variable name.
Allow me to explain, then: structured pattern matching is (even by admission of PEPs 634-636) an extension of iterable unpacking. The use of '_' as a wildcard pattern is a sharp break in that extension. In the structured pattern matching proposal, '_' is special syntax (and not in any way less so than '?') but *only* in cases in match statements, not in iterable unpacking. It *already* isn't consistent with '_' in other languages, and we can't fix that without breaking uses of _ for gettext, not to mention other situations where existing code uses '_' as something other than an assign-only variable.

Using '_' in structured pattern matching means any use of '_' becomes an extra burden -- you have to know whether it's a name or not based on the surrounding context. It makes all uses of '_' harder to parse, and it makes it easier to mistake one situation for another. Perhaps not terribly easy, but since there is _no_ confusion now, it's by definition *easier*. The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same. It reduces the complexity of the language because it no longer uses the same syntax for disparate things.

--
Thomas Wouters <thomas@python.org>
Hi! I'm an email virus! Think twice before sending your email to help me spread!
On 11/2/2020 9:31 AM, Thomas Wouters wrote:
On Sat, Oct 31, 2020 at 12:25 PM Steven D'Aprano <steve@pearwood.info> wrote:
I really don't get why so many people are hung up over this minuscule issue of giving `_` special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using `_` as a variable name.
Allow me to explain, then: structured pattern matching is (even by admission of PEPs 634-636) an extension of iterable unpacking. The use of '_' as a wildcard pattern is a sharp break in that extension. In the structured pattern matching proposal, '_' is special syntax (and not in any way less so than '?') but *only* in cases in match statements, not in iterable unpacking. It *already* isn't consistent with '_' in other languages, and we can't fix that without breaking uses of _ for gettext, not to mention other situations where existing code uses '_' as something other than an assign-only variable.
Using '_' in structured pattern matching means any use of '_' becomes an extra burden -- you have to know whether it's a name or not based on the surrounding context. It makes all uses of '_' harder to parse, and it makes it easier to mistake one situation for another. Perhaps not terribly easy, but since there is _no_ confusion now, it's by definition *easier*. The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same. It reduces the complexity of the language because it no longer uses the same syntax for disparate things.
All good points.

What I don't understand is why '_' is treated any differently than any named capture pattern. It seems to me that using:

    case x:  # a capture_pattern

is the same as:

    case _:  # the wildcard_pattern

They both always match (I'm ignoring the binding thing here, it's coming up). I realize PEP 635 gives the rationale for separating this so that it can enforce that "case x, x:" can be made invalid, likening it to duplicate function parameters. The PEP focuses on the differences between that and tuple unpacking. But I think that if the semantics were the same as tuple unpacking (allowed duplicates, and binding to the last one) then the whole "_ as wildcard" arguments would just go away, and "_" would be treated just as it is elsewhere in Python. For me, this would address Thomas' point above and reduce the cognitive load of having a special rule.

But I'm probably missing some other nuance to the whole discussion, which will no doubt now be pointed out to me.

Eric
On Mon, Nov 2, 2020 at 1:14 PM Eric V. Smith <eric@trueblade.com> wrote:
On 11/2/2020 9:31 AM, Thomas Wouters wrote:
On Sat, Oct 31, 2020 at 12:25 PM Steven D'Aprano <steve@pearwood.info> wrote:
I really don't get why so many people are hung up over this minuscule issue of giving `_` special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using `_` as a variable name.
Allow me to explain, then: structured pattern matching is (even by admission of PEPs 634-636) an extension of iterable unpacking. The use of '_' as a wildcard pattern is a sharp break in that extension. In the structured pattern matching proposal, '_' is special syntax (and not in any way less so than '?') but *only* in cases in match statements, not in iterable unpacking. It *already* isn't consistent with '_' in other languages, and we can't fix that without breaking uses of _ for gettext, not to mention other situations where existing code uses '_' as something other than an assign-only variable.
Using '_' in structured pattern matching means any use of '_' becomes an extra burden -- you have to know whether it's a name or not based on the surrounding context. It makes all uses of '_' harder to parse, and it makes it easier to mistake one situation for another. Perhaps not terribly easy, but since there is _no_ confusion now, it's by definition *easier*. The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same. It reduces the complexity of the language because it no longer uses the same syntax for disparate things.
All good points.
What I don't understand is why '_' is treated any differently than any named capture pattern. It seems to me that using:
case x: # a capture_pattern
is the same as:
case _: # the wildcard_pattern
They both always match (I'm ignoring the binding thing here, it's coming up). I realize PEP 635 gives the rationale for separating this so that it can enforce that "case x, x:" can be made invalid, likening it to duplicate function parameters. The PEP focuses on the differences between that and tuple unpacking. But I think that if the semantics were the same as tuple unpacking (allowed duplicates, and binding to the last one) then the whole "_ as wildcard" arguments would just go away, and "_" would be treated just as it is elsewhere in Python. For me, this would address Thomas' point above and reduce the cognitive load of having a special rule.
But I'm probably missing some other nuance to the whole discussion, which will no doubt now be pointed out to me.
Eric
That's not an unreasonable characterization. But we feel that `case x, x` can easily be misunderstood as "a tuple of two equal values" and we want to be able to call that out as an error. Hence the need for recognizing the wildcard in the parser, since `case x, _, _` *is* important. Hence the need to standardize it (i.e., not leave it to be *just* a convention). Using _ seems the most commonly used convention for "throwaway" target (although we know some organizations have different conventions), *and* it matches the wildcard notation in most other languages, which looks like a win-win to me. Finally, not assigning a value to _ is kind of important in the context of i18n, where _("string") is the common convention for tagging translatable strings. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On 11/2/20 1:52 PM, Glenn Linderman wrote:
On 11/2/2020 1:42 PM, Guido van Rossum wrote:
But we feel that `case x, x` can easily be misunderstood as "a tuple of two equal values"
So what _is_ the syntax for "a tuple of two equal values" ?
case x, ?x: # comes to mind (not that it is in the PEP :))
Using a guard statement:

    case x, y if x == y

I believe supporting

    case x, x # look ma! no guard!

is a possible future enhancement.

-- ~Ethan~
On 3/11/20 11:01 am, Ethan Furman wrote:
I believe supporting
case x, x # look ma! no guard!
is a possible future enhancement.
In which case there will be a need for *some* kind of true "don't care" placeholder. If it's not "_" then it will have to be something else like "?". And we need to decide about it now, because once people start using "_" as a wildcard in patterns, it will be too late to go back. -- Greg
On Tue, 3 Nov 2020, Greg Ewing wrote:
On 3/11/20 11:01 am, Ethan Furman wrote:
I believe supporting
case x, x # look ma! no guard!
is a possible future enhancement.
In which case there will be a need for *some* kind of true "don't care" placeholder. If it's not "_" then it will have to be something else like "?". And we need to decide about it now, because once people start using "_" as a wildcard in patterns, it will be too late to go back.
But will it, really ? It seems to me, that if we leave the "_" magic out, and leave "case x, x" to the linters, that leaves a clear path forward for whatever can be decided whenever it can be decided. /Paul
On 4/11/20 4:36 am, Paul Svensson wrote:
On Tue, 3 Nov 2020, Greg Ewing wrote:
once people start using "_" as a wildcard in patterns, it will be too late to go back.
But will it, really ? It seems to me, that if we leave the "_" magic out, and leave "case x, x" to the linters, that leaves a clear path forward for whatever can be decided whenever it can be decided.
If "_" is a non-binding wildcard, linters will have to allow "case _, _" otherwise it might as well not be there. And then if it is later changed to be binding, "case _, _" will either become invalid or start forcing the two occurrences to be equal, depending on which change is made, thus breaking existing code. The only way I can see to keep our future options open in this area is not to have a wildcard at all, and make people use a different throwaway name for each don't-care position in a pattern. -- Greg
On Wed, Nov 04, 2020 at 12:15:08PM +1300, Greg Ewing wrote:
If "_" is a non-binding wildcard, linters will have to allow "case _, _" otherwise it might as well not be there. And then if it is later changed to be binding,
Why would we want to do that? Apart from the backward incompatibility of such a change, why would we want to make `_` binding? There is an effectively unlimited number of possible capture patterns available to choose from. Just use another variable. We aren't going to use `_` as a normal capturing pattern regardless of what the language allows: that would go against idiomatic Python convention. If we use `_` other Pythonistas will snigger at our lack of clue, our programs will fail code review, and linters will complain about it. And it will go against the common practice among most current pattern matching languages.
"case _, _" will either become invalid or start forcing the two occurrences to be equal, depending on which change is made, thus breaking existing code.
Right. We will have no good reason to remove the non-binding wildcard pattern, and very good reason to *not* break people's code by removing it. So why are we discussing this?
The only way I can see to keep our future options open in this area is not to have a wildcard at all,
Why would we want to "keep our options open" here? What benefit do we have for going against half a century of pattern matching theory and practice and common usage in other languages?

There is a lot of prior art here, probably a dozen or more languages: Haskell, Rust, Nemerle, Erlang, OCaml, Prolog, F#, Elixir, Mathematica, etc. I haven't done a full survey of the prior art, but I doubt that I have even scratched the surface here. I'm sure there are many others, depending on how widely you want to define pattern matching.

Coconut already uses `_` as the wildcard: https://coconut.readthedocs.io/en/master/DOCS.html#match

Were they wrong to do so? Does the Coconut community -- to say nothing of Haskell, Rust etc -- wish that they had kept their options open?
and make people use a different throwaway name for each don't-care position in a pattern.
That would be:

(1) Annoying and frustrating.

(2) Misleading: using a capture pattern means you care about the value you are capturing. Using a capture pattern to bind a value you don't care about obfuscates the code.

(3) Inefficient: that would mean things you don't care about will be captured as real, potentially long-lived, name bindings. Bindings aren't free.

While it is true that we don't normally care too much about wasting the odd name binding here and there, neither do we go out of our way to *intentionally* be wasteful by unnecessarily capturing values we don't care about:

    who_cares1 = my_list.sort()
    who_cares2 = print(my_list)
    still_dont_care = values.reverse()
    honestly_i_dont_care_what_this_returns = settings.update(config)

especially not choosing a different name each time.

-- Steve
On 6/11/20 4:54 am, Steven D'Aprano wrote:
On Wed, Nov 04, 2020 at 12:15:08PM +1300, Greg Ewing wrote:
If "_" is a non-binding wildcard, linters will have to allow "case _, _" otherwise it might as well not be there. And then if it is later changed to be binding,
Why would we want to do that?
I'm not suggesting we should. I was replying to a post proposing to not treat "_" specially, and pointing out that if we don't make it special now we can't change our mind later. -- Greg
On 3 Nov 2020, at 16:36, Paul Svensson <paul-python@svensson.org> wrote:
On Tue, 3 Nov 2020, Greg Ewing wrote:
On 3/11/20 11:01 am, Ethan Furman wrote:
I believe supporting
case x, x # look ma! no guard!
is a possible future enhancement.
In which case there will be a need for *some* kind of true "don't care" placeholder. If it's not "_" then it will have to be something else like "?". And we need to decide about it now, because once people start using "_" as a wildcard in patterns, it will be too late to go back.
But will it, really ? It seems to me, that if we leave the "_" magic out, and leave "case x, x" to the linters, that leaves a clear path forward for whatever can be decided whenever it can be decided.
Leaving this to linters makes it harder to change the behaviour of "case x, x" later. Also: not everyone uses a linter.

The particular example of "case x, x" also seems to be a bit of a red herring because that scenario is close to regular tuple unpacking. If I read the PEP correctly, binding the same name multiple times is also forbidden in more complex scenarios where multiple binding is not so easily recognised, such as "case Rect(Point(x, y), Size(x, w))".

Ronald
-- Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/
On Tue., 3 Nov. 2020, 8:07 am Ethan Furman, <ethan@stoneleaf.us> wrote:
On 11/2/20 1:52 PM, Glenn Linderman wrote:
On 11/2/2020 1:42 PM, Guido van Rossum wrote:
But we feel that `case x, x` can easily be misunderstood as "a tuple of two equal values"
So what _is_ the syntax for "a tuple of two equal values" ?
case x, ?x: # comes to mind (not that it is in the PEP :))
Using a guard statement:
case x, y if x == y
This example made me realise that I need to add test cases for "case x, ==x:" and "case x, is x:" to PEP 642's reference implementation (and text to the PEP pointing out that explicit constraints can help address the pattern back-reference problem). Cheers, Nick.
On 11/2/20 2:01 PM, Brandt Bucher wrote:
Glenn Linderman wrote:
So what _is_ the syntax for "a tuple of two equal values" ?
If you’re asking about PEP 634:
```
case x, y if x == y:
```
Which is much clearer, in my opinion.
Yeah, I've come 'round to this opinion as well. Let's get basic pattern matching in (by which I mean PEPs 634-636) and we can add bells and whistles later if there is need/demand for it. -- ~Ethan~
On Tue, Nov 3, 2020 at 8:53 AM Glenn Linderman <v+python@g.nevcal.com> wrote:
On 11/2/2020 1:42 PM, Guido van Rossum wrote:
But we feel that `case x, x` can easily be misunderstood as "a tuple of two equal values"
So what _is_ the syntax for "a tuple of two equal values" ?
case x, ?x: # comes to mind (not that it is in the PEP :))
case x, y if x == y:

If it gets a lot of demand, a dedicated syntax can be added in the future without breaking anything.

ChrisA
On Mon, Nov 02, 2020 at 03:31:44PM +0100, Thomas Wouters wrote:
On Sat, Oct 31, 2020 at 12:25 PM Steven D'Aprano <steve@pearwood.info> wrote:
I really don't get why so many people are hung up over this minuscule issue of giving `_` special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using `_` as a variable name.
Allow me to explain, then: structured pattern matching is (even by admission of PEPs 634-636) an extension of iterable unpacking. The use of '_' as a wildcard pattern is a sharp break in that extension. In the structured pattern matching proposal, '_' is special syntax (and not in any way less so than '?') but *only* in cases in match statements, not in iterable unpacking. It *already* isn't consistent with '_' in other languages, and we can't fix that without breaking uses of _ for gettext, not to mention other situations where existing code uses '_' as something other than an assign-only variable.
Right. This is a small inconsistency in the meaning of `_` between match statements and other statements:

1. In a `case` statement (but not the block following the case line?), `_` is a soft keyword with special meaning as a wildcard match.

2. Elsewhere, `_` is an ordinary name but special by convention.

We've had soft keywords before, like `as`, `async` and `await`, and the world didn't end. The intention is to have them again in the future:

https://docs.python.org/3/library/keyword.html#keyword.issoftkeyword

Is it your intention to argue against all soft keywords, or just this one?
Using '_' in structured pattern matching means any use of '_' becomes an extra burden -- you have to know whether it's a name or not based on the surrounding context.
We've had this burden ever since Python introduced strings:

    x = a + _    # It's a name.
    x = a + '_'  # It's a string.

And f-strings have added to that burden:

    x = a + f'_{_}'  # It's both a string and a name!

I don't think this is a heavy burden, and I don't fear this will be a heavy burden either:

    case _:  # It's a wildcard pattern.
    if _:    # It's a name.

If I can cope with strings, with or without an f-prefix, I can cope with underscore being context-dependent.

I agree that your statement is objectively true:
The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same.
but your argument still doesn't convince me. Using `?` would solve that problem, but I don't think that's a problem that needs solving, and furthermore it would introduce other problems in its place:

- `?` as a wildcard token is ugly (that's a personal, subjective judgement);

- it's confusable with its use in regexes (things that are different should not look the same);

- and it clashes with the wildcard used in most(?) other languages with pattern matching.

I have not done a full review, but I believe that `_` is a wildcard pattern in Clojure, Kotlin, Haskell, Scala, OCaml, F# and Rust, among others.

We have no obligation to make Python look like other languages, but by the same token we need not be different just for the sake of being different. There's value in picking the same syntax, or at least similar syntax, as other languages.

I expect to spend a long time learning how to read pattern matches before I am as fluent with them as I am with other Python code, but the wildcard pattern is probably one of the simplest parts to grasp. And the beauty is that I can look at (say) Haskell pattern matching code, and even if I can recognise nothing else, I can recognise the underscore and the `|` used for alternatives, and that gives me a toe-hold to start deciphering what I am reading.

So while I acknowledge the issues you mention, I just don't think they are important. To me, the benefit of using underscore outweighs the negatives.

-- Steve
I think using symbols like ? and == in patterns looks stylistically ugly, and unintuitive, albeit more explicit. I, too, would rather have pattern matching more explicit, but it shouldn't need to be so ugly (yes, I know, ugly is a subjective term, so be it).

I would propose that, opposite to what this PEP 642 proposes, matching as literal terms should be the default, and a special notation should be used for binding to names.

    match number:
        case 0:
            print("Nothing")
        case 1:
            print("Just one")

This would be equivalent:

    zero = 0
    one = 1
    match number:
        case zero:
            print("Nothing")
        case one:
            print("Just one")

And I would propose to use "as" for the notation of binding to a variable, possibly in combination with "_" for the wildcard pattern:

    expected_value = "xxx"
    match some_json:
        case {"foo": expected_value}:  # matches {"foo": "xxx"}
            pass
        case {"foo": _ as bar}:  # matches any {"foo": <anything>}
            print(f"json got foo value {bar}")

Yes, I understand that being forced to use "_ as name" in a lot of patterns is more verbose, but I posit that it is both explicit _and_ intuitive. And perhaps not as ugly as ? and ==.

In my mind, I don't see that this "as" usage causes any confusion with the "as" in context managers. That is a cop-out. I see this "as" as more akin to the exception handling:

    try:
        ...
    except RuntimeError as error:
        ...

See? No context manager protocol involved here. "as" is simply representing a name binding.

On Sat, 31 Oct 2020 at 07:17, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hi folks,
This is a mailing list repost of the Discourse thread at
https://discuss.python.org/t/pep-642-constraint-pattern-syntax-for-structura...
The rendered version of the PEP can be found here: https://www.python.org/dev/peps/pep-0642/
The full text is also quoted in the Discourse thread.
The remainder of this email is the same introduction that I posted on Discourse.
I’m largely a fan of the Structural Pattern Matching proposal in PEP 634, but there’s one specific piece of the syntax proposal that I strongly dislike: the idea of basing the distinction between capture patterns and value patterns purely on whether they use a simple name or a dotted name.
Thus PEP 642, which retains most of PEP 634 unchanged, but adjusts value checks to use an explicit prefix syntax (either `?EXPR` for equality constraints, or `?is EXPR` for identity constraints), rather than relying on users learning that literals and attribute lookups in a capture pattern mean a value lookup check, while simple names mean a capture pattern (unlike both normal expressions, where all three mean a value lookup, and assignment targets, where both simple and dotted names bind a new reference).
The PEP itself has a lot of words explaining why I’ve made the design decisions I have, as well as the immediate and potential future benefits offered by using an explicit prefix syntax for value constraints, but the super short form goes like this:
* if you don’t like match statements at all, or wish we were working on designing a C-style switch statement instead, then PEP 642 isn’t going to appeal to you any more than PEP 634 does * if, like me, you don’t like the idea of breaking the existing property of Python that caching the result of a value lookup subexpression in a local variable and then using that variable in place of the original subexpression should “just work”, then PEP 642’s explicit constraint prefix syntax may be more to your liking * however, if the idea of the `?` symbol becoming part of Python’s syntax doesn’t appeal to you, then you may consider any improved clarity of intent that PEP 642 might offer to not be worth that cost
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WT3ZZ42X... Code of Conduct: http://python.org/psf/codeofconduct/
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert
Is there a bestiary of examples for the current pattern matching proposal(s)? It seems I don't have a good handle on how one matches simple tests like callability, function signatures, possession of specific attribute(s).....etc. Also will matching ever extend into the Typing universe? -- Robin Becker
On Wed, Nov 18, 2020 at 1:25 AM Robin Becker <robin@reportlab.com> wrote:
Is there a bestiary of examples for the current pattern matching proposal(s)?
It seems I don't have a good handle on how one matches simple tests like callability,
Doable using protocols.
function signatures,
I don't think that's directly doable, but there might be some way to bend it to protocols.
possession of specific attribute(s).....etc.
Protocols.
Also will matching ever extend into the Typing universe?
In what way do you have in mind? With protocol support baked into PEP 634 that already ties into type hints. -Brett
-- Robin Becker
(For people who fail to find any mention of protocols in PEP 634, Protocols (PEP 544) can be used (with the @runtime_checkable decorator) to override isinstance(), and class patterns are defined to use isinstance() for the class check.)

On Wed, Nov 18, 2020 at 11:50 AM Brett Cannon <brett@python.org> wrote:
On Wed, Nov 18, 2020 at 1:25 AM Robin Becker <robin@reportlab.com> wrote:
Is there a bestiary of examples for the current pattern matching proposal(s)?
It seems I don't have a good handle on how one matches simple tests like callability,
Doable using protocols.
function signatures,
I don't think that's directly doable, but there might be some way to bend it to protocols.
possession of specific attribute(s).....etc.
Protocols.
Also will matching ever extend into the Typing universe?
In what way do you have in mind? With protocol support baked into PEP 634 that already ties into type hints.
-Brett
-- Robin Becker
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
..........
Also will matching ever extend into the Typing universe?
In what way do you have in mind? With protocol support baked into PEP 634 that already ties into type hints.
-Brett
......... thanks for the answers; the only one missing is whether there is an actual bestiary of examples, but I guess the PEPs will have to do for now. It's unlikely I will need any of this for a while so examples will appear over time. -- Robin Becker
participants (19)
- Brandt Bucher
- Brett Cannon
- Chris Angelico
- Dan Stromberg
- Eric V. Smith
- Ethan Furman
- Glenn Linderman
- Greg Ewing
- Guido van Rossum
- Gustavo Carneiro
- Jim J. Jewett
- Nick Coghlan
- Paul Moore
- Paul Sokolovsky
- Paul Svensson
- Robin Becker
- Ronald Oussoren
- Steven D'Aprano
- Thomas Wouters