
On 7/8/20 8:02 AM, Guido van Rossum wrote:
> Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.

In that case, I'd like to make a specific pitch for "don't make '_' special". (I'm going to spell it '_' as it seems to be easier to read this way; ignore the quotes.)

IIUC '_' is special in two ways:

1) we permit it to be used more than once in a single pattern, and
2) if it matches, it isn't bound.

If we forgo these two exceptions, '_' can go back to behaving like any other identifier. It becomes an idiom rather than a special case.

Drilling down on what we'd need to change:

To address 1), allow using a name multiple times in a single pattern.

PEP 622 v2 already says:

    For the moment, we decided to make repeated use of names within
    the same pattern an error; we can always relax this restriction
    later without affecting backwards compatibility.

If we relax it now, then we don't need '_' to be special in this way. All in all this part seems surprisingly uncontentious.

To address 2), bind '_' when it's used as a name in a pattern.

This adds an extra reference and an extra store. That by itself seems harmless.

The existing implementation has optimizations here. If that's important, we could achieve the same result with a little dataflow analysis to optimize away the dead store. We could even special-case optimizing away dead stores /only/ to '_' and /only/ in match/case statements and all would be forgiven.

Folks point out that I18N code frequently uses a global function named '_'. The collision of these two uses is unfortunate, but I think it's survivable. I certainly don't think this collision means we should special-case this one identifier in this one context in the /language/ specification.

Consider:

  * There's no installed base of I18N code using pattern matching,
    because it's a new (proposed!) syntax. Therefore, any I18N code
    that wants to use match/case statements will be new code, and so
    can be written with this (admittedly likely!) collision in mind.
    I18N code could address this in several ways, for example:
      o Mandate use of an alternate name for "don't care" match
        patterns in I18N code, perhaps '__' (two underscores). This
        approach seems best.
      o Use a different name for the '_' function in scopes where
        you're using match/case, e.g. 'gettext'.
      o Since most Python code lives inside functions, I18N code
        could use '_' in its match/case statements, then "del _"
        after the match statement. '_' would revert to finding the
        global function. (This wouldn't work for code at module
        scope for obvious reasons. One /could/ simply rebind '_',
        but I doubt people want to consider this approach in the
        first place.)
  * As the PEP mentions, '_' is already a Python idiom for "I don't
    care about this value", e.g.
    "basename, _, extension = filename.partition('.')". I18N has
    already survived contact with this idiom.
  * Similarly, '_' has a special meaning in the Python REPL.
    Admittedly, folks don't do a lot of I18N work in the REPL, so
    this isn't a problem in practice. I'm just re-making the
    previous point: I18N programmers already cope with other
    idiomatic uses of '_'.
  * Static code analyzers could detect if users run afoul of this
    collision. "Warning: match/case using _ in module using _ for
    gettext" etc.

One consideration: if you /do/ use '_' multiple times in a single pattern, and you /do/ refer to its value afterwards, what value should it get? Consider that Python already permits multiple assignments in a single expression:

    (x:="first", x:="middle", x:="last")

After this expression is evaluated, x has been bound to the value "last". I could live with "it keeps the rightmost". I could also live with "the result is implementation-defined". I suspect it doesn't matter much, because the point of the idiom is that people don't care about the value.

In keeping with this change, I additionally propose removing '*_' as a special token. '*_' would behave like any other '*identifier', binding the unpacked sequence to the name.
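
As a rough illustration of the last two points (what a repeated name ends up holding, and what a starred name binds), here is how ordinary assignment already behaves in Python 3.8+. This is only a sketch using existing assignment semantics; it says nothing definitive about how match/case would implement the proposal, and the sample values are made up for the example.

```python
# Sketch only: plain assignment in today's Python, no match/case involved.

# Several walrus bindings of one name in a single expression:
# they are evaluated left to right, so the rightmost one wins.
(x := "first", x := "middle", x := "last")
rightmost_walrus = x

# The existing "don't care" idiom: '_' is bound like any other name.
filename = "archive.tar"  # hypothetical sample value
basename, _, extension = filename.partition(".")
separator = _

# A name repeated within one unpacking target: targets are assigned
# left to right, so the name keeps the rightmost value.
_, _, third = (1, 2, 3)
repeated_underscore = _

# A starred name is bound to a list of the remaining items.
head, *_ = [1, 2, 3, 4]
starred_underscore = _
```

Under "it keeps the rightmost", a pattern that repeats '_' would end up looking like the third case here, and '*_' would look like the fourth.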

Alternatively, we could keep the special token but change it to '*' so it mirrors Python function declaration syntax. I don't have a strong opinion about this second alternative.

Cheers,

//arry/