Thank you for the well-written PEP, although I don't agree with it. My response below is quite long. Here is my opinionated TL;DR:
(1) Just get over the use of `_` for the wildcard pattern. If you need to capture a value, use another identifier. Now that the parser will support soft keywords, we should expect more cases where something that is an identifier in one context will be a keyword in another.
(2) The most common uses of patterns should not require sigils.
(3) None is special, and we should insist on `is` comparisons by default. True and False are a little more problematic.
(4) Using sigils to override the default is okay. That includes turning what would otherwise be a capture pattern into a comparison.
On Sat, Oct 31, 2020 at 05:16:59PM +1000, Nick Coghlan wrote:
The rendered version of the PEP can be found here: https://www.python.org/dev/peps/pep-0642/
Quoting from the PEP:
"Wildcard patterns change their syntactic marker from _ to ?"
Yuck. Sorry, I find `?` in that role very aesthetically and visually unappealing :-(
I really don't get why so many people are hung up over this minuscule issue of giving `_` special meaning inside match statements. IMO, consistency with other languages' pattern matching is more useful than the ability to capture using `_` as a variable name.
Now that the PEG parser makes it easy to have soft keywords, there will probably be more cases in the future where something that is syntactically an identifier is a regular name in one context and special syntax in another. This has happened before (e.g. "as") and it will happen again.
We have a very strong convention that `_` is used as a write-only "don't care" variable. (The two exceptions are the magic underscore in the REPL, and `_()` in i18n.) In idiomatic Python code, if we bind a value to `_` and then use it later, we are Doing It Wrong.
Is there such a shortage of local variable names that the inability to misuse `_` is a problem in practice? Just use another identifier.
But if we really *must* break that convention and bind to `_`, we can still do it inside a match statement:
```
match obj:
    case a:
        _ = a
        print(_)
```
The fact that you have to use a temporary variable to break the rules is, in my opinion, a good thing -- it reminds you that what you are doing is weird.
Quoting code from the PEP:
```
# Literal patterns
match number:
    case ?0:
        print("Nothing")
    case ?1:
        print("Just one")
```
I think this is an example of what Larry Wall talked about when he discussed the mistakes of Perl's original regex syntax:
"Poor Huffman coding"
Wall regrets that many common patterns are longer and harder to write than rarer patterns.
Why do we need a `?` sigil to match a literal? `case 1` cannot possibly be interpreted as a capture pattern. It would be wrong to compare it with `is`. What else could it mean other than equality comparison? The question mark is pure noise.
So here's a counter suggestion:
(1) Literals still match by equality, because that is what we want 99% of the time. No sigil required.
You mention this in the "Rejected ideas" section, but I reject your rejection :-)
The PEP rejects this because:
"they have the same syntax sensitivity problem as value patterns do, where attempting to move the literal pattern out to a local variable for naming clarity would turn the value checking literal pattern into a name binding capture pattern"
but that's based on a really simple-minded refactoring. Sure, the naive user who knows little about pattern matching might try to refactor like this:
```
# Before.
match record:
    case (42, x):
        ...

# After.
ANSWER_TO_LIFE = 42
match record:
    # It's a Trap!
    case (ANSWER_TO_LIFE, x):
        ...
```
and I am sympathetic to your desire to avoid that.
But this is the sort of error that:
- only applies in comparatively unusual circumstances (naively refactoring a literal in a case statement);
- is easily avoided by automated refactoring tools;
- linters will warn about (assignment to a CONSTANT);
- is easily spotted if you have unit tests;
- is obvious to those with more experience in pattern matching.
So I don't see this as a large problem. I expect few people will be bitten by this more than once, if that. I think that your preventative solution, forcing all literal patterns to require a sigil, is worse than the problem it is solving.
Bottom line: let's not hamstring pattern matching with poor Huffman coding right from day one.
(2) While literals usually compare by equality, the exceptions are three special keywords and one symbol, which compare by identity:
```
case None | True | False | ... :  # Compares by identity.
```
I can't think of any other literal where identity tests would be useful and guaranteed by the language (no relying on implementation-specific details, such as small int caching or string interning).
So these keywords (plus the ... symbol) match by identity by default, because that's what we want 99% of the time. (Although, see below for discussion about the two bools.)
Other special values, like NotImplemented and Ellipsis, aren't keywords, they are just names, and don't get special treatment.
(3) Overriding the default comparison with an explicit sigil is allowed:
```
case ==True:
    print("True, or 1, or 1.0, or 1+0j, etc")
case ==None:
    print("None, or something weird that equals None")
case is 1943.63:
    print("if you see this, the interpreter is caching floats")
```
I don't think that there will be any ambiguity between the unary "==" pattern modifier and the real `==` operator. But if I am wrong, then we can change the spelling:
```
case ?None:
    print("None, or something weird that equals None")
case ?is 1943.63:
    print("if you see this, the interpreter is caching floats")
```
(I don't love the question mark here, but I don't hate it either.)
The important thing here is that the cases with no sigil are the common operations; the sigil is only needed for the uncommon case.
(4) Patterns which could conceivably be interpreted as assignment targets default to capture patterns, because that's what is normally wanted in pattern matching:
```
case [1, spam, eggs]:  # captures spam and eggs
```
If you don't want to capture a named value, but just match on it, override it with an explicit `==` or `is`:
```
case [1, ==spam, eggs]:  # matches `spam` by equality, captures eggs
```
Quoting the PEP:
"nobody litters their if-elif chains with x is True or x is False expressions, they write x and not x, both of which compare by value, not identity."
That's incorrect. `if x` doesn't *compare* at all, neither by value nor by identity; it duck-types truthiness:
```
>>> class Demo:
...     def __bool__(self):
...         return True
...     def __eq__(self, other):
...         return False
...
>>> x = Demo()
>>> x == True
False
>>> if x: print("truthy")
...
truthy
```
There's a reasonable argument to make that (unless overridden by an explicit sigil) the `True` and `False` patterns should match by truthiness, not equality or identity, but I'm not going to make that argument.
"Indeed, PEP 8 explicitly disallows the use of if x is True"
This is true, but I think you have to understand the intention there. I believe the intent is that APIs should not insist on *exactly* the True or False singletons for boolean flags, but instead accept any truthy or falsey objects. (Duck typing for the win.)
But if you need to distinguish *exactly* True from an arbitrary truthy value like "spam and eggs" or 93.78, then identity, not equality, is the correct way to do it.
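The three notions discussed above (identity, equality and truthiness) can give three different answers for the same value. A small illustrative sketch with a hypothetical `flag_kind()` helper; it uses no pattern matching, so any Python 3 will do:

```python
def flag_kind(value):
    if value is True:      # identity: only the True singleton
        return "exactly True"
    if value == True:      # equality: 1, 1.0, 1+0j, ... (deliberate; linters flag this)
        return "equal to True"
    if value:              # truthiness: whatever __bool__ (or __len__) says
        return "truthy"
    return "falsey"


print(flag_kind(True))             # exactly True
print(flag_kind(1))                # equal to True
print(flag_kind("spam and eggs"))  # truthy
print(flag_kind(93.78))            # truthy
print(flag_kind(0))                # falsey
```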