On 7/8/20 8:02 AM, Guido van Rossum wrote:
> Regarding the syntax for wildcards and OR patterns, the PEP explains
> why `_` and `|` are the best choices here: no other language surveyed
> uses anything but `_` for wildcards, and the vast majority uses `|`
> for OR patterns.  A similar argument applies to class patterns.


In that case, I'd like to make a specific pitch for "don't make '_' special".  (I'm going to spell it '_' as it seems to be easier to read this way; ignore the quotes.)


IIUC '_' is special in two ways:

1) we permit it to be used more than once in a single pattern, and
2) if it matches, it isn't bound.

If we forgo these two exceptions, '_' can go back to behaving like any other identifier.  It becomes an idiom rather than a special case.


Drilling down on what we'd need to change:

To address 1), allow using a name multiple times in a single pattern.

PEP 622 v2 already says:

> For the moment, we decided to make repeated use of names within the same pattern an error; we can always relax this restriction later without affecting backwards compatibility.

If we relax it now, then we don't need '_' to be special in this way.  All in all this part seems surprisingly uncontentious.


To address 2), bind '_' when it's used as a name in a pattern.

This adds an extra reference and an extra store.  That by itself seems harmless.

The existing implementation has optimizations here.  If that's important, we could achieve the same result with a little dataflow analysis to optimize away the dead store.  We could even special-case optimizing away dead stores only to '_' and only in match/case statements and all would be forgiven.

Folks point out that I18N code frequently uses a global function named '_'.  The collision of these two uses is unfortunate, but I think it's survivable.  I certainly don't think this collision means we should special-case this one identifier in this one context in the language specification.
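The same collision already exists with ordinary unpacking, which is part of why I call it survivable.  A minimal sketch, assuming the conventional gettext alias and no translation catalog installed; the three function names are made up for illustration.

```python
import gettext

# Conventional I18N alias at module level.
_ = gettext.gettext


def greet():
    # Looks up the module-level '_' and translates the message.
    return _("hello")


def first_of_pair(pair):
    # Binds a *local* '_'; the module-level alias is untouched.
    first, _ = pair
    return first


def shadowed(pair):
    # Same unpacking, but now the local '_' hides the translation function,
    # so the call below raises TypeError when '_' holds a non-callable.
    first, _ = pair
    return _("hello")
```

Calling greet() still works after first_of_pair((1, 2)) has run; only code like shadowed(), which uses both meanings of '_' in one scope, gets into trouble.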

Consider:


One consideration: if you do use '_' multiple times in a single pattern, and you do refer to its value afterwards, what value should it get?  Consider that Python already permits multiple assignments in a single expression:

(x:="first", x:="middle", x:="last")

After this expression is evaluated, x has been bound to the value "last".  I could live with "it keeps the rightmost".  I could also live with "the result is implementation-defined".  I suspect it doesn't matter much, because the point of the idiom is that people don't care about the value.
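For comparison, a short sketch of how existing Python resolves repeated bindings of one name.  The variable names 'values' and 'keep' are mine.

```python
# The expression from the text: three assignment expressions to one name.
values = (x := "first", x := "middle", x := "last")

# Ordinary unpacking assigns its targets left to right,
# so the rightmost '_' is the one that sticks.
_, keep, _ = ("a", "b", "c")

rightmost_walrus = x   # "last"
rightmost_unpack = _   # "c"
```

Both existing constructs already give "keeps the rightmost", so that rule would at least be unsurprising for patterns.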


In keeping with this change, I additionally propose removing '*_' as a special token.  '*_' would behave like any other '*identifier', binding the name to the remaining items of the sequence.  Alternatively, we could keep the special token but change it to a bare '*', so it mirrors the bare '*' in Python function definitions.  I don't have a strong opinion about this second alternative.
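Both halves of that have precedent in existing Python, sketched here.  connect() and its parameters are hypothetical, purely for illustration.

```python
# In ordinary unpacking, '*_' is just '*identifier':
# '_' is bound to a list of the remaining items.
first, *_ = [1, 2, 3, 4]
rest = _   # [2, 3, 4]


# In a function definition, a bare '*' collects nothing;
# it only marks the parameters after it as keyword-only.
def connect(host, *, timeout):
    return host, timeout


result = connect("example.org", timeout=3)
```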


Cheers,


/arry