Hi folks,
This is a mailing list repost of the Discourse thread at https://discuss.python.org/t/pep-642-constraint-pattern-syntax-for-structura...
The rendered version of the PEP can be found here: https://www.python.org/dev/peps/pep-0642/
The full text is also quoted in the Discourse thread.
The remainder of this email is the same introduction that I posted on Discourse.
I’m largely a fan of the Structural Pattern Matching proposal in PEP 634, but there’s one specific piece of the syntax proposal that I strongly dislike: the idea of basing the distinction between capture patterns and value patterns purely on whether they use a simple name or a dotted name.
Thus PEP 642 retains most of PEP 634 unchanged, but adjusts value checks to use an explicit prefix syntax (either ?EXPR for equality constraints, or ?is EXPR for identity constraints), rather than relying on users learning that literals and attribute lookups in a capture pattern mean a value lookup check, while simple names mean a capture pattern (unlike both normal expressions, where all three mean a value lookup, and assignment targets, where both simple and dotted names bind a new reference).
The PEP itself has a lot of words explaining why I’ve made the design decisions I have, as well as the immediate and potential future benefits offered by using an explicit prefix syntax for value constraints, but the super short form goes like this:
[...]
If the ? symbol becoming part of Python’s syntax doesn’t appeal to you, then you may consider any improved clarity of intent that PEP 642 might offer not to be worth that cost.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Thank you for the well-written PEP, although I don't agree with it. My response below is quite long. Here is my opinionated TL;DR:
(1) Just get over the use of _ for the wildcard pattern. another identifier. Now that the parser will support soft keywords, we should expect more cases where something that is an identifier in one context will be a keyword in another.
(2) The most common uses of patterns should not require sigils.
(3) None is special, and we should insist on is comparisons by default. True and False are a little more problematic.
(4) Using sigils to over-ride the default is okay. That includes turning what would otherwise be a capture pattern into a comparison.
Details below.
On Sat, Oct 31, 2020 at 05:16:59PM +1000, Nick Coghlan wrote:
The rendered version of the PEP can be found here: https://www.python.org/dev/peps/pep-0642/
Quoting from the PEP:
"Wildcard patterns change their syntactic marker from _ to ?"
Yuck. Sorry, I find ? in that role very aesthetically and visually unappealing :-(
I really don't get why so many people are hung up over this minuscule issue of giving _ special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using _ as a variable name.
Now that the PEG parser makes it easy to have soft keywords, there will probably be more cases in the future where something that is syntactically an identifier is a regular name in one context and special syntax in another. This has happened before (e.g. "as") and it will happen again.
We have a very strong convention that _ is used as a write-only "don't care" variable. (The two exceptions are the magic underscore in the REPL, and _() in i18n.) In idiomatic Python code, if we bind a value to _ and then use it later, we are Doing It Wrong.
Is there such a shortage of local variable names that the inability to misuse _ is a problem in practice? Just use another identifier.
But if we really must break that convention and bind to _, we can still do it inside a match statement:
    case a:
        _ = a
        print(_)
The fact that you have to use a temporary variable to break the rules is, in my opinion, a good thing -- it reminds you that what you are doing is weird.
Quoting code from the PEP:
    # Literal patterns
    match number:
        case ?0:
            print("Nothing")
        case ?1:
            print("Just one")
I think this is an example of what Larry Wall talked about when he discussed the mistakes of Perl's original regex syntax:
"Poor Huffman coding"
https://www.perl.com/pub/2002/06/04/apo5.html/
Wall regrets that many common patterns are longer and harder to write than rarer patterns.
Why do we need a ? sigil to match a literal? case 1 cannot possibly be interpreted as a capture pattern. It would be wrong to compare it with is. What else could it mean other than equality comparison? The question mark is pure noise.
So here's a counter suggestion:
(1) Literals still match by equality, because that is what we want 99% of the time. No sigil required.
You mention this in the "Rejected ideas" section, but I reject your rejection :-)
The PEP rejects this because:
"they have the same syntax sensitivity problem as value patterns do, where attempting to move the literal pattern out to a local variable for naming clarity would turn the value checking literal pattern into a name binding capture pattern"
but that's based on a really simple-minded refactoring. Sure, the naive user who knows little about pattern matching might try to refactor like this:
    # Before.
    match record:
        case (42, x): ...

    # After.
    ANSWER_TO_LIFE = 42
    match record:
        # It's a Trap!
        case (ANSWER_TO_LIFE, x): ...
and I am sympathetic to your desire to avoid that.
But this is the sort of error that:
only applies in comparatively unusual circumstances (naively refactoring a literal in a case statement);
is easily avoided by automated refactoring tools;
linters will warn about (assignment to a CONSTANT);
is easily spotted if you have unit tests;
is obvious to those with more experience in pattern matching.
So I don't see this as a large problem. I expect few people will be bitten by this more than once, if that. I think that your preventative solution, forcing all literal patterns to require a sigil, is worse than the problem it is solving.
Bottom line: let's not hamstring pattern matching with poor Huffman coding right from day one.
(2) While literals usually compare by equality, the exception is three special keywords, and one symbol, that compare by identity:
    case None | True | False | ... :
        # Compares by identity.
I can't think of any other literal where identity tests would be useful and guaranteed by the language (no relying on implementation-specific details, such as small int caching or string interning).
So these keywords (plus the ... symbol) match by identity by default, because that's what we want 99% of the time. (Although, see below for discussion about the two bools.)
Other special values, like NotImplemented and Ellipsis, aren't keywords, they are just names, and don't get special treatment.
(3) Overriding the default comparison with an explicit sigil is allowed:
    case ==True:
        print("True, or 1, or 1.0, or 1+0j, etc")
    case ==None:
        print("None, or something weird that equals None")
    case is 1943.63:
        print("if you see this, the interpreter is caching floats")
I don't think that there will be any ambiguity between the unary "==" pattern modifier and the real == operator. But if I am wrong, then we can change the spelling:
    case ?None:
        print("None, or something weird that equals None")
    case ?is 1943.63:
        print("if you see this, the interpreter is caching floats")
(I don't love the question mark here, but I don't hate it either.)
The important thing here is that the cases with no sigil are the common operations; the sigil is only needed for the uncommon case.
(4) Patterns which could conceivably be interpreted as assignment targets default to capture patterns, because that's what is normally wanted in pattern matching:
    case [1, spam, eggs]:
        # captures spam and eggs
If you don't want to capture a named value, but just match on it, override it with an explicit == or is:
    case [1, ==spam, eggs]:
        # matches `spam` by equality, captures on eggs
Quoting the PEP:
"nobody litters their if-elif chains with x is True or x is False expressions, they write x and not x, both of which compare by value, not identity."
That's incorrect. if x doesn't compare at all, not by value and not with equality, it duck-types truthiness:
    >>> class Demo:
    ...     def __bool__(self):
    ...         return True
    ...     def __eq__(self, other):
    ...         return False
    ...
    >>> x = Demo()
    >>> x == True
    False
    >>> if x: print("truthy")
    ...
    truthy
There's a reasonable argument to make that (unless overridden by an explicit sigil) the True and False patterns should match by truthiness, not equality or identity, but I'm not going to make that argument.
Quote:
"Indeed, PEP 8 explicitly disallows the use if x is True"
This is true, but I think you have to understand the intention there. I believe the intent is that APIs should not insist on exactly the True or False singletons for boolean flags, but instead accept any truthy or falsey objects. (Duck typing for the win.)
But if you need to distinguish exactly True from an arbitrary truthy value like "spam and eggs" or 93.78, then identity, not equality, is the correct way to do it.
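The three tests being contrasted in this exchange can be compared directly. This sketch uses only plain expressions (no pattern matching) over the example values mentioned above.

```python
values = [True, 1, 1.0, "spam and eggs", 93.78]

by_identity = [v is True for v in values]   # only the True singleton itself
by_equality = [v == True for v in values]   # True, 1 and 1.0 all compare equal
by_truthiness = [bool(v) for v in values]   # every value here is truthy

print(by_identity)    # [True, False, False, False, False]
print(by_equality)    # [True, True, True, False, False]
print(by_truthiness)  # [True, True, True, True, True]
```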
On Sat, Oct 31, 2020 at 10:22:09PM +1100, Steven D'Aprano wrote:
(1) Just get over the use of _ for the wildcard pattern. another identifier. Now that the parser will support soft keywords, we should expect more cases that something that is an identifier is one context will be a keyword in another.
Oops, I lost a word. That should say "use another identifier".
All other typos and misspellings are intentional :-)
-- Steve
On Sat, 31 Oct 2020 at 11:25, Steven D'Aprano steve@pearwood.info wrote:
Thank you for the well-written PEP, although I don't agree with it. My response below is quite long. Here is my opinionated TL;DR:
For what it's worth, I find your rebuttal of PEP 642 convincing, and in line with my thoughts on the matter.
-1 from me on PEP 642. Paul
Hello,
On Sat, 31 Oct 2020 12:16:09 +0000 Paul Moore p.f.moore@gmail.com wrote:
On Sat, 31 Oct 2020 at 11:25, Steven D'Aprano steve@pearwood.info wrote:
Thank you for the well-written PEP, although I don't agree with it. My response below is quite long. Here is my opinionated TL;DR:
For what it's worth, I find your rebuttal of PEP 642 convincing, and in line with my thoughts on the matter.
-1 from me on PEP 642.
Given that this was a direct reply to Steven's mail, and he explicitly said:
(4) Using sigils to over-ride the default is okay. That includes turning what would otherwise be a capture pattern into a comparison.
And that's also the stated goal of PEP 642, quoting:
This PEP takes the view that not requiring a marker prefix on value lookups in match patterns results in a cure that is worse than the disease: Python's first ever syntax-sensitive value lookup where you can't transparently replace an attribute lookup with a local variable lookup
So, both PEP 642 and Steven agree that the problem exists, and explicit marker is a suitable means to address it.
Then, deriving "rebuttal" and "-1" to PEP 642 from Steven's mail sounds a bit confusing.
-- Best regards, Paul mailto:pmiscml@gmail.com
On Sat., 31 Oct. 2020, 9:29 pm Steven D'Aprano, steve@pearwood.info wrote:
(3) Overriding the default comparison with an explicit sigil is allowed:
    case ==True:
        print("True, or 1, or 1.0, or 1+0j, etc")
    case ==None:
        print("None, or something weird that equals None")
    case is 1943.63:
        print("if you see this, the interpreter is caching floats")
Where is this override allowed? It isn't covered under the syntax for value patterns or literal patterns:
and there aren't any other pattern types that make comparisons.
It also isn't in the draft reference implementation.
If PEP 634 allowed the exact comparison operator to be specified for patterns (with at least "==" and "is" allowed), and patterns with such explicit operators allowed arbitrary primary expressions as PEP 642 proposes, that would indeed address the bulk of my concerns:
(To a first approximation, the code needed to implement this feature for PEP 634 is the code I already wrote to implement "?" and "?is" for PEP 642, and the code deletion notes in my branch would also generally apply)
I don't think that there will be any ambiguity between the unary "==" pattern modifier and the real == operator. But if I am wrong, then we can change the spelling:
    case ?None:
        print("None, or something weird that equals None")
    case ?is 1943.63:
        print("if you see this, the interpreter is caching floats")
(I don't love the question mark here, but I don't hate it either.)
The important thing here is that the cases with no sigil are the common operations; the sigil is only needed for the uncommon case.
The tokeniser does struggle with "==" appearing after "=" or ":" in class patterns and mapping patterns, so you have to make sure to help it out with whitespace or parentheses.
That's why I didn't use it for PEP 642, but the whitespace sensitivity would be more tolerable if the explicit symbol was left out most of the time.
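The tokeniser behaviour described here can be checked with the standard tokenize module. In this sketch (the helper name operators is hypothetical), "===" splits as "==" followed by "=", and ":==" splits as the walrus ":=" followed by "=", which is why the extra whitespace matters.

```python
import io
import tokenize


def operators(source):
    """Return the operator tokens Python's tokenizer produces for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.OP]


print(operators("Point(x===1)"))   # ['(', '==', '=', ')']
print(operators("Point(x= ==1)"))  # ['(', '=', '==', ')']
print(operators('{"k":==1}'))      # ['{', ':=', '=', '}']
print(operators('{"k": ==1}'))     # ['{', ':', '==', '}']
```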
(4) Patterns which could conceivably be interpreted as assignment targets default to capture patterns, because that's what is normally wanted in pattern matching:
    case [1, spam, eggs]:
        # captures spam and eggs
If you don't want to capture a named value, but just match on it, override it with an explicit == or is:
    case [1, ==spam, eggs]:
        # matches `spam` by equality, captures on eggs
As noted above, the current PEP 634 spec doesn't allow this, but if it did, then I agree it would address most of the concerns that prompted me to write PEP 642.
If the 634 PEP authors are amenable, I'd be happy to prepare a PR against the PEP that made this change so you could see what it would look like at the grammar level.
Quoting the PEP:
"nobody litters their if-elif chains with x is True or x is False expressions, they write x and not x, both of which compare by value, not identity."
That's incorrect. if x doesn't compare at all, not by value and not with equality, it duck-types truthiness:
Aye, I considered going back and rewording that part to be more technically precise, but never actually did it (whether by type coercion or equality comparison, the ultimate effect is being more permissive than the strict identity check suggested for literal patterns).
    >>> class Demo:
    ...     def __bool__(self):
    ...         return True
    ...     def __eq__(self, other):
    ...         return False
    ...
    >>> x = Demo()
    >>> x == True
    False
    >>> if x: print("truthy")
    ...
    truthy
There's a reasonable argument to make that (unless overridden by an explicit sigil) the True and False patterns should match by truthiness, not equality or identity, but I'm not going to make that argument.
While I'd consider duck typing True & False less objectionable than comparing them by identity (as it would follow PEP 8), it wouldn't fix the key problem with special casing literals in the compiler: you lose that special casing if the literal value is replaced by a symbolic reference to the literal value.
I don't ever want to be having conversations about why "case True:" doesn't behave the same way as "case some.attr.referring.to.true:".
If PEP 634 had comparison patterns, then users would get "== True" by default for both literal and attribute patterns, "is True" if they explicitly asked for it, and regular boolean coercion if they combined a capture pattern with a guard expression.
I do agree that None & Ellipsis are less of a concern (as almost no one overrides equality to compare equal to those, so comparing by equality vs identity gives the same answer), but that also means the special case would serve little practical purpose.
Cheers, Nick.
On Sat, Oct 31, 2020 at 6:30 PM Nick Coghlan ncoghlan@gmail.com wrote:
On Sat., 31 Oct. 2020, 9:29 pm Steven D'Aprano, steve@pearwood.info wrote:
(3) Overriding the default comparison with an explicit sigil is allowed:
    case ==True:
        print("True, or 1, or 1.0, or 1+0j, etc")
    case ==None:
        print("None, or something weird that equals None")
    case is 1943.63:
        print("if you see this, the interpreter is caching floats")
Where is this override allowed? [...]
You're quoting from Steven's counter-proposal, which he prefaced with:
So here's a counter suggestion:
If PEP 634 allowed the exact comparison operator to be specified for patterns (with at least "==" and "is" allowed), and patterns with such explicit operators allowed arbitrary primary expressions as PEP 642 proposes, that would indeed address the bulk of my concerns:
(To a first approximation, the code needed to implement this feature for PEP 634 is the code I already wrote to implement "?" and "?is" for PEP 642, and the code deletion notes in my branch would also generally apply)
I think this over-stresses the notion that users might want to override the comparison operator to be used. We only have two operators that make sense in this context, 'is' and '==', and really, for almost everything you want to do, '==' is the appropriate operator. (There is a small trickle of bugs caused by people inappropriately using e.g. if x is 1 instead of if x == 1, suggesting that if anything, there is too much freedom here.) The big exception is None, where you basically always want to use is, which is what PEP 634 does.
In PEP 622, we didn't do this, and we felt uncomfortable about it, so we changed it in PEP 634.
We also changed it for True and False, because we realized that since 1 == 1.0 == True, people writing
case True:
would expect this to match only Booleans. The main use case here is situations (like JSON) where Booleans are not to be considered equivalent to 0 and 1, which using PEP 622 would have to be written as
case bool(True):
which is hard to discover and not that easy to grasp when reading either.
There's not really ever a reason to write
case ==True: # Using Steven's notation
since that's just an odd and misleading way to write
case 1:
I don't ever want to be having conversations about why "case True:" doesn't behave the same way as "case some.attr.referring.to.true:".
And you won't, because why would people define their own names for True and False? For sure people will define constants with Boolean values (e.g. DEBUG = True) but these aren't good candidates for use in patterns. I could imagine seeing
    match DEBUG_NETWORK, DEBUG_LOGIC:
        case False, False: pass
        case False, True: print("We're debugging logic only")
        case True, False: print("Debugging network only")
        case True, True: print("Debugging network and logging")
but I would be surprised by
    match x:
        case DEBUG: ...
just like I'd be surprised seeing
if x == DEBUG: ...
PS. Using ... as a literal pattern is also Steven's invention, this isn't in PEP 634. People would probably think it had some special meaning as a pattern rather than understanding it was meant as the literal value Ellipsis.
-- --Guido van Rossum (python.org/~guido) Pronouns: he/him **(why is my pronoun here?) http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...
On Sat, Oct 31, 2020 at 9:37 PM Guido van Rossum guido@python.org wrote:
I think this over-stresses the notion that users might want to override the comparison operator to be used. We only have two operators that make sense in this context, 'is' and '==', and really, for almost everything you want to do, '==' is the appropriate operator. (There is a small trickle of bugs caused by people inappropriately using e.g. if x is 1 instead of if x == 1, suggesting that if anything, there is too much freedom here.) The big exception is None, where you basically always want to use is, which is what PEP 634 does.
FWIW, there's an additional exception:
    sentinel = object()
    if var is sentinel:
I use this idiom from time to time - instead of None.
On Sat, Oct 31, 2020 at 21:48 Dan Stromberg drsalists@gmail.com wrote:
FWIW, there's an additional exception:
    sentinel = object()
    if var is sentinel:
I use this idiom from time to time - instead of None.
You can just write ‘case sentinel’, since object’s == operator uses identity anyway.
-- --Guido (mobile)
On Sun., 1 Nov. 2020, 3:01 pm Guido van Rossum, guido@python.org wrote:
On Sat, Oct 31, 2020 at 21:48 Dan Stromberg drsalists@gmail.com wrote:
FWIW, there's an additional exception:
    sentinel = object()
    if var is sentinel:
I use this idiom from time to time - instead of None.
You can just write ‘case sentinel’, since object’s == operator uses identity anyway.
No, you can't, as the other operand might decide it wants to compare equal to your sentinel value.
Cheers, Nick.
Nick Coghlan doesn't want to ever be having conversations about why "case True:" doesn't behave the same way as "case some.attr.referring.to.true:".
Guido thinks that it is strange enough that you won't see it. I agree that it is odd to define a complicated alias for True, but it isn't so odd to have a config variable that is boolean, or even one that is essentially always defined to the same value. I'm not sure this is worth bending over backwards for, but it does exist.
-jJ
On Sun, 1 Nov 2020 at 11:29, Nick Coghlan ncoghlan@gmail.com wrote:
On Sat., 31 Oct. 2020, 9:29 pm Steven D'Aprano, steve@pearwood.info wrote:
(4) Patterns which could conceivably be interpreted as assignment targets default to capture patterns, because that's what is normally wanted in pattern matching:
    case [1, spam, eggs]:
        # captures spam and eggs
If you don't want to capture a named value, but just match on it, override it with an explicit == or is:
    case [1, ==spam, eggs]:
        # matches `spam` by equality, captures on eggs
As noted above, the current PEP 634 spec doesn't allow this, but if it did, then I agree it would address most of the concerns that prompted me to write PEP 642.
If the 634 PEP authors are amenable, I'd be happy to prepare a PR against the PEP that made this change so you could see what it would look like at the grammar level.
Since Guido has indicated he's still dubious about the value of offering an explicit prefix marker syntax at all, I'm instead going to agree with most of Steven's counter proposal and adopt it as the next iteration of PEP 642 (conceding the point on "_", using "==" and "is" as the prefix markers, and keeping the syntactic sugar that lets you omit the "==" prefix for comparison against literals and attributes).
For the literal comparisons where equality isn't the right default, I'm still proposing leaving out the special casing, but I'm switching to proposing that we just not consider them valid literals for pattern matching purposes in the initial iteration of the design (so is None would be allowed as an identity constraint, but a bare None would be rejected as ambiguous, at least for now. I'd be more prepared to concede the "But is it really ambiguous?" case for None and ... than I would for True and False, though).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Oct 31, 2020 at 12:25 PM Steven D'Aprano steve@pearwood.info wrote:
I really don't get why so many people are hung up over this minuscule issue of giving _ special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using _ as a variable name.
Allow me to explain, then: structured pattern matching is (even by admission of PEPs 634-636) an extension of iterable unpacking. The use of '_' as a wildcard pattern is a sharp break in that extension. In the structured pattern matching proposal, '_' is special syntax (and not in any way less so than '?') but only in cases in match statements, not in iterable unpacking. It already isn't consistent with '_' in other languages, and we can't fix that without breaking uses of _ for gettext, not to mention other situations where existing code uses '_' as something other than an assign-only variable.
Using '_' in structured pattern matching means any use of '_' becomes an extra burden -- you have to know whether it's a name or not based on the surrounding context. It makes all uses of '_' harder to parse, and it makes it easier to mistake one situation for another. Perhaps not terribly easy, but since there is _no_ confusion now, it's by definition easier. The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same. It reduces the complexity of the language because it no longer uses the same syntax for disparate things.
-- Thomas Wouters thomas@python.org
Hi! I'm an email virus! Think twice before sending your email to help me spread!
On 11/2/2020 9:31 AM, Thomas Wouters wrote:
Allow me to explain, then: structured pattern matching is (even by admission of PEPs 634-636) an extension of iterable unpacking. The use of '_' as a wildcard pattern is a sharp break in that extension. In the structured pattern matching proposal, '_' is special syntax (and not in any way less so than '?') but only in cases in match statements, not in iterable unpacking. It already isn't consistent with '_' in other languages, and we can't fix that without breaking uses of _ for gettext, not to mention other situations where existing code uses '_' as something other than an assign-only variable.
Using '_' in structured pattern matching means any use of '_' becomes an extra burden -- you have to know whether it's a name or not based on the surrounding context. It makes all uses of '_' harder to parse, and it makes it easier to mistake one situation for another. Perhaps not terribly easy, but since there is _no_ confusion now, it's by definition easier. The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same. It reduces the complexity of the language because it no longer uses the same syntax for disparate things.
All good points.
What I don't understand is why '_' is treated any differently than any named capture pattern. It seems to me that using:
case x: # a capture_pattern
is the same as:
case _: # the wildcard_pattern
They both always match (I'm ignoring the binding thing here, it's coming up). I realize PEP 635 gives the rationale for separating this so that "case x, x:" can be made invalid, likening it to duplicate function parameters. The PEP focuses on the differences between that and tuple unpacking. But I think that if the semantics were the same as tuple unpacking (allowed duplicates, and binding to the last one) then the whole "_ as wildcard" argument would just go away, and "_" would be treated just as it is elsewhere in Python. For me, this would address Thomas' point above and reduce the cognitive load of having a special rule.
But I'm probably missing some other nuance to the whole discussion, which will no doubt now be pointed out to me.
Eric
On Mon, Nov 2, 2020 at 1:14 PM Eric V. Smith eric@trueblade.com wrote:
All good points.
What I don't understand is why '_' is treated any differently than any named capture pattern. It seems to me that using:
case x: # a capture_pattern
is the same as:
case _: # the wildcard_pattern
They both always match (I'm ignoring the binding thing here, it's coming up). I realize PEP 635 gives the rationale for separating this so that "case x, x:" can be made invalid, likening it to duplicate function parameters. The PEP focuses on the differences between that and tuple unpacking. But I think that if the semantics were the same as tuple unpacking (allowed duplicates, and binding to the last one) then the whole "_ as wildcard" argument would just go away, and "_" would be treated just as it is elsewhere in Python. For me, this would address Thomas' point above and reduce the cognitive load of having a special rule.
But I'm probably missing some other nuance to the whole discussion, which will no doubt now be pointed out to me.
Eric
That's not an unreasonable characterization. But we feel that case x, x can easily be misunderstood as "a tuple of two equal values" and we want to be able to call that out as an error. Hence the need for recognizing the wildcard in the parser, since case x, _, _ is important. Hence the need to standardize it (i.e., not leave it to be just a convention). Using _ seems the most commonly used convention for "throwaway" target (although we know some organizations have different conventions), and it matches the wildcard notation in most other languages, which looks like a win-win to me. Finally, not assigning a value to _ is kind of important in the context of i18n, where _("string") is the common convention for tagging translatable strings.
-- --Guido van Rossum (python.org/~guido) Pronouns: he/him **(why is my pronoun here?) http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...
On 11/2/20 1:52 PM, Glenn Linderman wrote:
On 11/2/2020 1:42 PM, Guido van Rossum wrote:
But we feel that case x, x can easily be misunderstood as "a tuple of two equal values"
So what _is_ the syntax for "a tuple of two equal values" ?
case x, ?x: # comes to mind (not that it is in the PEP :))
Using a guard statement:
case x, y if x == y:
I believe supporting
case x, x # look ma! no guard!
is a possible future enhancement.
-- ~Ethan~
On 3/11/20 11:01 am, Ethan Furman wrote:
I believe supporting
case x, x # look ma! no guard!
is a possible future enhancement.
In which case there will be a need for some kind of true "don't care" placeholder. If it's not "_" then it will have to be something else like "?". And we need to decide about it now, because once people start using "_" as a wildcard in patterns, it will be too late to go back.
-- Greg
On Tue, 3 Nov 2020, Greg Ewing wrote:
On 3/11/20 11:01 am, Ethan Furman wrote:
I believe supporting
case x, x # look ma! no guard!
is a possible future enhancement.
In which case there will be a need for some kind of true "don't care" placeholder. If it's not "_" then it will have to be something else like "?". And we need to decide about it now, because once people start using "_" as a wildcard in patterns, it will be too late to go back.
But will it, really? It seems to me, that if we leave the "_" magic out, and leave "case x, x" to the linters, that leaves a clear path forward for whatever can be decided whenever it can be decided.
/Paul
On 4/11/20 4:36 am, Paul Svensson wrote:
On Tue, 3 Nov 2020, Greg Ewing wrote:
once people start using "_" as a wildcard in patterns, it will be too late to go back.
But will it, really? It seems to me, that if we leave the "_" magic out, and leave "case x, x" to the linters, that leaves a clear path forward for whatever can be decided whenever it can be decided.
If "_" is a non-binding wildcard, linters will have to allow "case _, _" otherwise it might as well not be there. And then if it is later changed to be binding, "case _, _" will either become invalid or start forcing the two occurrences to be equal, depending on which change is made, thus breaking existing code.
The only way I can see to keep our future options open in this area is not to have a wildcard at all, and make people use a different throwaway name for each don't-care position in a pattern.
-- Greg
On Wed, Nov 04, 2020 at 12:15:08PM +1300, Greg Ewing wrote:
If "_" is a non-binding wildcard, linters will have to allow "case _, _" otherwise it might as well not be there. And then if it is later changed to be binding,
Why would we want to do that?
Apart from the backward incompatibility of such a change, why would we want to make _ binding? There is an effectively unlimited number of possible capture patterns available to choose from. Just use another variable.
We aren't going to use _ as a normal capturing pattern regardless of what the language allows: that would go against idiomatic Python convention. If we use _ that way, other Pythonistas will snigger at our lack of clue, our programs will fail code review, and linters will complain about it. And it will go against the common practice among most current pattern matching languages.
"case _, _" will either become invalid or start forcing the two occurrences to be equal, depending on which change is made, thus breaking existing code.
Right. We will have no good reason to remove the non-binding wildcard pattern, and very good reason to not break people's code by removing it. So why are we discussing this?
The only way I can see to keep our future options open in this area is not to have a wildcard at all,
Why would we want to "keep our options open" here? What benefit do we have for going against half a century of pattern matching theory and practice and common usage in other languages?
There is a lot of prior art here, probably a dozen or more languages: Haskell, Rust, Nemerle, Erlang, OCaml, Prolog, F#, Elixir, Mathematica, etc. I haven't done a full survey of the prior art, but I doubt that I have even scratched the surface here. I'm sure there are many others, depending on how widely you want to define pattern matching.
Coconut already uses _ as the wildcard:
https://coconut.readthedocs.io/en/master/DOCS.html#match
Were they wrong to do so? Does the Coconut community -- to say nothing of Haskell, Rust etc -- wish that they had kept their options open?
and make people use a different throwaway name for each don't-care position in a pattern.
That would be:
(1) Annoying and frustrating.
(2) Misleading: using a capture pattern means you care about the value you are capturing. Using a capture pattern to bind a value you don't care about obfuscates the code.
(3) Inefficient: that would mean things you don't care about will be captured as real, potentially long-lived, name bindings. Bindings aren't free.
While it is true that we don't normally care too much about wasting the odd name binding here and there, neither do we go out of our way to intentionally be wasteful by unnecessarily capturing values we don't care about:
who_cares1 = my_list.sort()
who_cares2 = print(my_list)
still_dont_care = values.reverse()
honestly_i_dont_care_what_this_returns = settings.update(config)
especially not choosing a different name each time.
-- Steve
On 6/11/20 4:54 am, Steven D'Aprano wrote:
On Wed, Nov 04, 2020 at 12:15:08PM +1300, Greg Ewing wrote:
If "_" is a non-binding wildcard, linters will have to allow "case _, _" otherwise it might as well not be there. And then if it is later changed to be binding,
Why would we want to do that?
I'm not suggesting we should. I was replying to a post proposing to not treat "_" specially, and pointing out that if we don't make it special now we can't change our mind later.
-- Greg
On 3 Nov 2020, at 16:36, Paul Svensson paul-python@svensson.org wrote:
On Tue, 3 Nov 2020, Greg Ewing wrote:
On 3/11/20 11:01 am, Ethan Furman wrote:
I believe supporting
case x, x # look ma! no guard!
is a possible future enhancement.
In which case there will be a need for some kind of true "don't care" placeholder. If it's not "_" then it will have to be something else like "?". And we need to decide about it now, because once people start using "_" as a wildcard in patterns, it will be too late to go back.
But will it, really? It seems to me, that if we leave the "_" magic out, and leave "case x, x" to the linters, that leaves a clear path forward for whatever can be decided whenever it can be decided.
Leaving this to linters makes it harder to change the behaviour of “case x, x” later. Also: not everyone uses a linter.
The particular example of “case x, x” also seems to be a bit of a red herring, because that scenario is close to regular tuple unpacking. If I read the PEP correctly, binding the same name multiple times is also forbidden in more complex scenarios where the multiple binding is not so easily recognised, such as “case Rect(Point(x, y), Size(x, w))”.
Ronald
—
Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/
On Tue., 3 Nov. 2020, 8:07 am Ethan Furman, ethan@stoneleaf.us wrote:
On 11/2/20 1:52 PM, Glenn Linderman wrote:
On 11/2/2020 1:42 PM, Guido van Rossum wrote:
But we feel that "case x, x" can easily be misunderstood as "a tuple of two equal values"
So what _is_ the syntax for "a tuple of two equal values" ?
case x, ?x: # comes to mind (not that it is in the PEP :))
Using a guard statement:
case x, y if x == y:
This example made me realise that I need to add test cases for "case x, ==x:" and "case x, is x:" to PEP 642's reference implementation (and text to the PEP pointing out that explicit constraints can help address the pattern back-reference problem).
Cheers, Nick.
On 11/2/20 2:01 PM, Brandt Bucher wrote:
Glenn Linderman wrote:
So what _is_ the syntax for "a tuple of two equal values" ?
If you’re asking about PEP 634:
case x, y if x == y:
Which is much clearer, in my opinion.
Yeah, I've come 'round to this opinion as well.
Let's get basic pattern matching in (by which I mean PEPs 634-636) and we can add bells and whistles later if there is need/demand for it.
-- ~Ethan~
On Tue, Nov 3, 2020 at 8:53 AM Glenn Linderman v+python@g.nevcal.com wrote:
On 11/2/2020 1:42 PM, Guido van Rossum wrote:
But we feel that "case x, x" can easily be misunderstood as "a tuple of two equal values"
So what _is_ the syntax for "a tuple of two equal values" ?
case x, ?x: # comes to mind (not that it is in the PEP :))
case x, y if x == y:
If it gets a lot of demand, a dedicated syntax can be added in the future without breaking anything.
ChrisA
On Mon, Nov 02, 2020 at 03:31:44PM +0100, Thomas Wouters wrote:
On Sat, Oct 31, 2020 at 12:25 PM Steven D'Aprano steve@pearwood.info wrote:
I really don't get why so many people are hung up over this minuscule issue of giving _ special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using _ as a variable name.
Allow me to explain, then: structured pattern matching is (even by admission of PEPs 634-636) an extension of iterable unpacking. The use of '_' as a wildcard pattern is a sharp break in that extension. In the structured pattern matching proposal, '_' is special syntax (and not in any way less so than '?') but only in cases in match statements, not in iterable unpacking. It already isn't consistent with '_' in other languages, and we can't fix that without breaking uses of _ for gettext, not to mention other situations where existing code uses '_' as something other than an assign-only variable.
Right. This is a small inconsistency in the meaning of _ between match statements and other statements:
In a case statement (but not the block following the case line?), _ is a soft keyword with special meaning as a wildcard match.
Elsewhere, _ is an ordinary name but special by convention.
We've had soft keywords before, like as, async and await, and the world didn't end. The intention is to have them again in the future:
https://docs.python.org/3/library/keyword.html#keyword.issoftkeyword
Is it your intention to argue against all soft keywords, or just this one?
Using '_' in structured pattern matching means any use of '_' becomes an extra burden -- you have to know whether it's a name or not based on the surrounding context.
We've had this burden ever since Python introduced strings:
x = a + _ # It's a name.
x = a + '_' # It's a string.
And f-strings have added to that burden:
x = a + f'_{_}' # It's both a string and a name!
I don't think this is a heavy burden, and I don't fear this will be a heavy burden either:
case _: # It's a wildcard pattern.
if _: # It's a name.
If I can cope with strings, with or without an f-prefix, I can cope with underscore being context-dependent.
I agree that your statement is objectively true:
The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same.
but your argument still doesn't convince me. Using ? would solve that problem, but I don't think that's a problem that needs solving, and furthermore it would introduce other problems in its place:
? as a wildcard token is ugly (that's a personal, subjective judgement);
it's confusable with its use in regexes (things that are different should not look the same);
and it clashes with the wildcard used in most(?) other languages with pattern matching.
I have not done a full review, but I believe that _ is a wildcard pattern in Clojure, Kotlin, Haskell, Scala, OCaml, F# and Rust, among others.
We have no obligation to make Python look like other languages, but by the same token we need not be different just for the sake of being different. There's value in picking the same syntax, or at least similar syntax, as other languages.
I expect to spend a long time learning how to read pattern matches before I am as fluent with them as I am with other Python code, but the wildcard pattern is probably one of the simplest parts to grasp. And the beauty is that I can look at (say) Haskell pattern matching code, and even if I can recognise nothing else, I can recognise the underscore and the "|" used for alternatives, and that gives me a toe-hold to start deciphering what I am reading.
So while I acknowledge the issues you mention, I just don't think they are important. To me, the benefit of using underscore outweighs the negatives.
-- Steve
I think using symbols like ? and == in patterns looks stylistically ugly, and unintuitive, albeit more explicit.
I, too, would rather have pattern matching more explicit, but it shouldn't need to be so ugly (yes, I know, ugly is a subjective term, so be it).
I would propose that, opposite to what this PEP 642 proposes, matching as literal terms should be the default, and a special notation should be used for binding to names.
match number:
    case 0:
        print("Nothing")
    case 1:
        print("Just one")
This would be equivalent:
zero = 0
one = 1
match number:
    case zero:
        print("Nothing")
    case one:
        print("Just one")
And I would propose to use "as" for the notation of binding to a variable, possibly in combination with "_" for the wildcard pattern:
expected_value = "xxx"
match some_json:
    case {"foo": expected_value}:  # matches {"foo": "xxx"}
        pass
    case {"foo": _ as bar}:  # matches any {"foo": <anything>}
        print(f"json got foo value {bar}")
Yes, I understand that being forced to use "_ as name" in a lot of patterns is more verbose, but I posit that it is both explicit _and_ intuitive. And perhaps not as ugly as ? and ==.
In my mind, I don't see that this "as" usage causes any confusion with the "as" in context managers. That is a cop-out. I see this "as" as more akin to the exception handling:
try:
    ...
except RuntimeError as error:
    ...
See? No context manager protocol involved here. "as" is simply representing a name binding.
Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WT3ZZ42X... Code of Conduct: http://python.org/psf/codeofconduct/
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert
Is there a bestiary of examples for the current pattern matching proposal(s)?
It seems I don't have a good handle on how one matches simple tests like callability, function signatures, possession of specific attribute(s), etc.
Robin Becker
On Wed, Nov 18, 2020 at 1:25 AM Robin Becker robin@reportlab.com wrote:
Is there a bestiary of examples for the current pattern matching proposal(s)?
It seems I don't have a good handle on how one matches simple tests like callability,
Doable using protocols.
function signatures,
I don't think that's directly doable, but there might be some way to bend it to protocols.
possession of specific attribute(s), etc.
Protocols.
Also will matching ever extend into the Typing universe?
What did you have in mind? With protocol support baked into PEP 634, that already ties into type hints.
-Brett
(For people who fail to find any mention of protocols in PEP 634, Protocols (PEP 544) can be used (with the @runtime_checkable decorator) to override isinstance(), and class patterns are defined to use isinstance() for the class check.)
-- --Guido van Rossum (python.org/~guido) Pronouns: he/him **(why is my pronoun here?) http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...
Also will matching ever extend into the Typing universe?
What did you have in mind? With protocol support baked into PEP 634, that already ties into type hints.
-Brett
thanks for the answers; the only one missing is whether there is an actual bestiary of examples, but I guess the PEPs
Robin Becker