PEP 622: Structural Pattern Matching -- followup
Everyone,

If you've commented and you're worried you haven't been heard, please add your issue *concisely* to this new thread. Note that the following issues are already open and will be responded to separately; please don't bother commenting on these until we've done so:

- Alternative spellings for '|'
- Whether to add an 'else' clause (and how to indent it)
- A different token for wildcards instead of '_'
- What to do about the footgun of 'case foo' vs. 'case .foo'

(Note that the last two could be combined, e.g. '?foo' or 'foo?' to mark a variable binding and '?' for a wildcard.)

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Wed, 24 Jun 2020 at 16:40, Guido van Rossum <guido@python.org> wrote:
I'd also like to see some consideration of an alternative spelling that would not resemble a class instantiation, first brought up by Antoine Pitrou:

```
case Point with (x, y):
    print(f"Got a point with x={x}, y={y}")
```

And somewhere on the other thread, someone pointed out the possibility of having all assignments in a case be clearly delimited, even with angle brackets (yes, that addresses the "foot gun" again, but it is a step beyond dot or not dot in instant readability):

```
case Point with (<x>, <y>):
    print(f"Got a point with x={x}, y={y}")
```

(AFAIC, the "dot" thing falls in the category of speckles on Tim's monitor)
I actually like that it looks like instantiation; it seems to be saying "Do we have the sort of object we would get from this instantiation?" Unfortunately, this does aggravate the confusion over whether a variable is being used as a filter, vs binding to something from the matched object.
On Sun, Jun 28, 2020 at 8:44 AM Jim J. Jewett <jimjjewett@gmail.com> wrote:
The constructor-like syntax for class patterns is the part I like least about this proposal. It seems to expect that there is a one-to-one correspondence between constructor arguments and instance attributes. While that might be common, especially for DataClass-like types, it's certainly not always the case. Some attributes might be computed from multiple arguments (or looked up elsewhere), and some arguments may never be saved in their original form. I fear it will be extremely confusing if an attribute being matched by a class pattern doesn't correspond at all to an argument in a valid constructor call.

For example, this class would make things very confusing:

    class Foo:
        def __init__(self, a, b):
            self.c = a + b

You could match an instance of the class with `case Foo(c=x)` and it would work, but that might come as a surprise to anyone familiar with the class constructor's argument names.

Even when attributes and constructor arguments do line up, the class pattern syntax also seems a bit awkward when you are not required to match against all of the non-optional constructor arguments. I imagine `case datetime.datetime(year=2020):` would be a valid (and even useful!) class pattern, but you can't construct a datetime instance in that way since the class has three required arguments.

To sum up, I feel like using constructor and keyword-argument syntax to access attributes is an abuse of notation. I'd much prefer a new syntax for matching classes and their attributes that was not so likely to be confusing due to imperfect parallels with class construction.
On Wed, 24 Jun 2020 12:38:52 -0700 Guido van Rossum <guido@python.org> wrote:
I don't know if you read it, so I'll reiterate what I said :-)

"""
Overall, my main concern with this PEP is that the matching semantics and pragmatics are different from everything else in the language. When reading and understanding a match clause, there's a cognitive overhead because suddenly `Point(x, 0)` means something entirely different (it doesn't call Point.__new__, it doesn't look up `x` in the locals or globals...). Obviously, there are cases where this is worthwhile, but still. It may be useful to think about different notations for these new things, rather than re-use the object construction notation.
"""

Regards

Antoine.
On Thu, Jun 25, 2020 at 6:53 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
AIUI, the case clauses are far more akin to *assignment targets* than they are to expressions. If you see something like this:

    [x, y] = foo()

then you don't expect it to look up x or y in the current scope, nor to construct a list. Is there any way to make the syntax look more like assignment? Or maybe this won't even matter - people will simply get used to it with a bit of experience, same as "for x, y in stuff" has an assignment target in it.

ChrisA
What about `case for Point(x, 0):`? It reads very naturally, the presence of "for" hints against Point() being a call to the class, and "for" is an existing keyword that would make no other sense in that position. Examples with other formats such as `case for [x, 0]:` seem to work just as well.
Without arguing for or against allowing a capture variable, IMO rather than syntax like

    match <expr> into <var>:

it would be far better (and not require a new keyword) to write this as

    with <var> as match <expr>:

Rob Cliffe

PS: Or

    <var> = match <expr>

On 24/06/2020 20:38, Guido van Rossum wrote:
I was talking with a colleague today about the PEP and he raised a couple of questions regarding the match protocol and the proxy result.

One question is this: taking into account that 'case object(x)' is valid for every object, but does (or could do) something different for objects that have a non-None __match_args__, it seems that implementing __match_args__ will break Liskov substitutability, as you could not substitute the child in a context where you expect a parent.

Even if you don't care about Liskov substitutability, it seems that introducing a __match_args__ for a class will almost always be backwards incompatible. For example, let's say that 'datetime.date' doesn't have a custom matching defined, so it inherits the default object.__match__, which does:

    class object:
        @classmethod
        def __match__(cls, instance):
            if isinstance(instance, cls):
                return instance

The PEP notes that:
The above implementation means that by default only match-by-name and a single positional match by value against the proxy will work
Which means that users can do a positional match against the proxy with a name pattern:

    match input:
        case datetime.date(dt):
            print(f"The date {dt.isoformat()}")

Imagine that later, someone notices that it would be reasonable to support structural pattern matching for the fields of a 'datetime.date' so that users could do:

    match birthday:
        case datetime.date(year) if year == 1970:
            print("You were born in 1970")

But if 'datetime.date' were updated to implement a non-default __match_args__, allowing individual fields to be pulled out of it like this, then the first block would be valid, correct code before the change, but would raise an ImpossibleMatch after the change because 'dt' is not a field in __match_args__.

Is this argument misinterpreting something about the PEP, or is it missing some important detail?

On Wed, 24 Jun 2020 at 20:47, Guido van Rossum <guido@python.org> wrote:
Pablo Galindo Salgado wrote:
Well yeah, it's actually a fair bit worse than you describe. Since dt is matched positionally, it wouldn't raise during matching - it would just succeed as before, but instead binding the year attribute (not the whole object) to the name "dt". So it wouldn't fail until later, when your method call raises an AttributeError (an int has no isoformat method).
Ethan Furman wrote:
Ouch. That seems like a pretty serious drawback. Will this issue be resolved?
It's currently being revisited. Realistically, I'd imagine that we either find some straightforward way of opting-in to the current default behavior (allowing one arg to be positionally matched against the proxy), or lose the nice behavior altogether. Obviously the former is preferable, since it's not trivial to reimplement yourself.
On 26/06/20 6:21 am, Pablo Galindo Salgado wrote:
I think that would be an incorrect way for matching on datetime to behave. Since datetime has a constructor that takes positional arguments, it should have a __match__ and/or __match_args__ that agrees.

This suggests that there will be a burden on many existing types to ensure they implement appropriate matching behaviour, as the default behaviour provided by object will be wrong for them.

I'm wondering whether the default "single positional match" behaviour is a bad idea. I.e. the only thing that should work by default is

    case someclass():

and not

    case someclass(instance):

Classes such as int with constructors that can take a single argument should be required to implement the corresponding match behaviour explicitly.

--
Greg
On Wed, Jun 24, 2020 at 12:46 PM Guido van Rossum <guido@python.org> wrote:
I don't think combining assignment and wildcard will help. '_' is fine for that. I could get used to '?' as an assignment or a wildcard, but it would always be a double-take if it was both. I've seen languages that use '>foo' to indicate assignment, but I can't for the life of me remember which now, aside from shell redirection. '=foo' might be more obvious. In the end I think it's only important that there's some assignment operator, and we'll all get used to whatever you choose.
[apologies for the duplicate to Guido, used reply instead of reply to all]

To summarize my previous unanswered post, I posted a +1 to the view that "defaulting to binding vs interpreting NAME as a constant" is a dangerous default. And I submitted a couple of alternate syntactic ways to denote "capture is desired" (the angle brackets, and the capture object). I think both are reasonably readable, and one of them doesn't even add any unusual syntax (not even the "dot prefix").

As an elaboration on that, after reading the discussion and trying to not repeat what has previously been said:

- I think there's a mismatch between an assumption made by the authors vs what many of us are posting here, which is explicitly stated in "https://www.python.org/dev/peps/pep-0622/#alternatives-for-constant-value-pa...". Quoting the PEP: «the name patterns are more common in typical code, so having special syntax for common case would be weird». Even if I think a popular use case would be analysing and deconstructing complex nested structures (like ASTs), I think roughly half of the uses will actually be for "the switch statement we never had", where all branches would be constants. I've been writing a lot of code like that these last couple of weeks, so I may be biased (although the PEP authors may have been writing AST visitors this last week and may be biased the other way ;-) )
  - As a sub point, I can understand if the PEP authors argue "match is not for that, use if/elif/dicts of functions in that case like you did before and ignore this PEP", but if that's the case, that should be explicit in the PEP.
- I am fairly sure (as much as one can be of the future in these things) that with this PEP approved as is, linters will add new rules like "you have more than one top-level name pattern, only the first one will match. Perhaps you wanted constant patterns?" and "your pattern captures shadow an existing name" and "a name you bound in a pattern isn't used inside the pattern". These will definitely help, but for me "how many new linter rules will be needed if this language change is introduced" is a good measure of how inelegant it is.
- Perhaps arguing against myself, I know that, thanks to my regular usage of linters, I personally won't suffer much from this problem (once they get updated). But I also teach Python to people, and I feel that I'd have to add this to the list of "gotchas to avoid" if the PEP passes as is.

Again, I can't write this email without saying that this feature is great, that the effort put into this PEP is really palpable, that I'd love to find the way to get it accepted, and that even if I'm normally conservative about upgrading Python versions and waiting for my environment to support them fully, this will likely be the first time that I upgrade just for a language feature :)

On Wed, 24 Jun 2020 at 20:44, Guido van Rossum <guido@python.org> wrote:
On Fri, 26 Jun 2020 at 11:29, Daniel Moisset <dfmoisset@gmail.com> wrote:
I think roughly half of the uses will actually be for "the switch statement we never had", where all branches would be constants. I've been writing a lot of code like that these last couple of weeks, so I may be biased (although the PEP authors may have been writing AST visitors this last week and may be biased the other way ;-) )
As a sub point, I can understand if the PEP authors argue "match is not for that, use if/elif/dicts of functions in that case like you did before and ignore this PEP", but if that's the case, that should be explicit in the PEP.
For me, this prompts the question (which I appreciate is more about implementation than design) - would there be any (significant) performance difference between

    match var:
        case 1:
            print("Got 1")
        case 2:
            print("Got 2")
        case _:
            print("Got another value")

and

    if var == 1:
        print("Got 1")
    elif var == 2:
        print("Got 2")
    else:
        print("Got another value")

?

In C, the switch statement was explicitly intended to be faster by means of doing a computed branch. In a higher level language, I can see the added features of match meaning that it's *slower* than a series of if tests for simple cases. But I have no intuition about the performance of this proposal. I'd like to believe that the choice between the 2 alternatives above is purely a matter of preferred style, but I don't know. If match is significantly slower, that could make it a bit of an attractive nuisance.

Paul
On Fri, Jun 26, 2020 at 3:47 AM Paul Moore <p.f.moore@gmail.com> wrote:
Each case essentially compiles down to an equivalent if structure, already. There's no penalty that I'm seeing. There's actually much more room for optimizing eventually, since the test is bound to a single element instead of any arbitrary if expression, and a Cython or PyPy could work some magic to detect a small set of primitive types and optimize for that.
This one is new but I think unrelated and unmentioned:

Why is the mapping match semantics non-strict about keys? Besides the "asymmetry" with sequence matches, I think a strict match should be useful sometimes (quickly deconstructing JSON data comes to my mind, where I want to know that I didn't get unexpected keys). I cannot get that behaviour with the current PEP. But if we make key strictness the default, I can always add **_ to my mapping pattern and make it non-strict (that's currently forbidden but the restriction can be lifted). Is there an assumption (or even better, data evidence) that non-strict checks are much much more common?

A similar but weaker argument can be made for class patterns (although I can imagine non-strict matches *are* more common in that case).

Mostly but not completely unrelated to the above, and purely syntactic sugar bikeshedding, but I think having "..." as an alias for "*_" or "**_" (depending on context, and I'd say it's *both* inside a class pattern) could make these patterns slightly more readable.

Best,
D.

On Wed, 24 Jun 2020 at 20:44, Guido van Rossum <guido@python.org> wrote:
On Thu., 25 Jun. 2020, 5:41 am Guido van Rossum, <guido@python.org> wrote:
I'm not sure if it's a separate point or not, but something I would like to see the PEP take into account is whether or not the destructuring syntax (especially for mappings) could become the basis for a future proposed enhancement to assignment statements that effectively allowed an assignment statement to be a shorthand for:

    match RHS:
        case LHS:
            pass  # Just doing a destructuring assignment
        else:
            raise ValueError("Could not match RHS to LHS")

In "y, x = x, y" the fact the names are being used as both lvalues and rvalues is indicated solely by their appearing on both sides of the assignment statement.

This is the strongest existing precedent for all names in case expressions being lvalues by default and having a separate marker for rvalues.

However, I believe it's also a reasonably strong argument *against* using "." as that rvalue marker, as in "obj.x, obj.y = x, y" the dotted references remain lvalues, they don't implicitly turn into rvalues.

Interestingly though, what those points suggest is that to be forward compatible with a possible extension to assignment statements, the PEP is correct that any syntactic marker would need to be on the *rvalues* that are constraining the match, putting any chosen symbol (e.g. "?") squarely in the wildcard role rather than the "lvalue marker" role.

    y,? = returns_2_tuple()
    y,?None = returns_2_tuple()  # Exception if 2nd element is not == None
    y,?sentinel = returns_2_tuple()  # Exception if 2nd element is not == sentinel
    y,*? = returns_iterable()

The main mindset shift this approach would require relative to the PEP as currently written is in explicitly treating the case expression syntax as an evolution of the existing lvalue syntax in assignment statements rather than treating it as the introduction of a third independent kind of expression syntax.

Cheers,
Nick.
Why not use '=' to distinguish binding from equality testing:

    case Point(x, =y): # matches a Point() with 2nd parameter equal to y; if it does, binds to x.

This would allow a future (or present!) extension to other relative operators:

    case Point(x, >y):

(although the syntax doesn't AFAICS naturally extend to specifying a range, i.e. an upper and lower bound, which might be a desirable thing to do. Perhaps someone can think of a way of doing it).

Whether

    case =42:
    case 42:

would both be allowed would be one issue to be decided.

Rob Cliffe
On 7/7/20 10:08 PM, Rob Cliffe via Python-Dev wrote: likely cause a run time error if it hasn't been bound yet, or at the very least probably fails in a 'safer' way. I think forgetting to add a special mark is a much more likely error than adding a mark by mistake (unless the mark is just having a dot in the name). -- Richard Damon
Since this is a very new system, can we have some restrictions that allow more aggressive optimization than regular Python code?

# Class Pattern

Example:

    match val:
        case Point(0, y): ...
        case Point(x, 0): ...
        case Point(x, y): ...

* Can the VM look up "Point" only once per execution of `match`, instead of three times?
* Can the VM cache "Point" at the first execution, and never look it up again? (e.g. in a function executed many times)

# Constant value pattern

Example:

    match val:
        case Sides.SPAM: ...
        case Sides.EGGS: ...

* Can the VM look up "Sides" only once, instead of twice?
* Can the VM cache the values of "Sides.SPAM" and "Sides.EGGS" for the next execution?

Regards,
--
Inada Naoki <songofacandy@gmail.com>
On Wed, Jul 8, 2020 at 6:17 PM Inada Naoki <songofacandy@gmail.com> wrote:
I'd prefer not - that seems very confusing.
Similar, but with the additional consideration that you can create a "pre-baked pattern" by using a dict, so if you're worried about performance, use the slightly uglier notation (assuming that Sides.SPAM and Sides.EGGS are both hashable - and if they're not, the risk of prebaking is way too high).
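A minimal sketch of the "pre-baked pattern" idea, using a plain dict for dispatch. `Sides` follows the example in the thread, but its definition as an Enum, the handler bodies, and the `dispatch` helper are all assumptions made for illustration.

```python
from enum import Enum


class Sides(Enum):
    SPAM = "spam"
    EGGS = "eggs"


# The attribute lookups Sides.SPAM and Sides.EGGS happen once, when this
# dict is built, rather than on every dispatch. Keys must be hashable.
HANDLERS = {
    Sides.SPAM: lambda: "spam handler",
    Sides.EGGS: lambda: "eggs handler",
}


def dispatch(side):
    handler = HANDLERS.get(side)
    if handler is None:
        return "no handler"
    return handler()


print(dispatch(Sides.SPAM))
print(dispatch(Sides.EGGS))
```

Because the keys are frozen at build time, later rebinding of `Sides.SPAM` would not be noticed, which is exactly the trade-off being weighed for cached lookups.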
* Can the VM look up "Point" only once per execution of `match`, instead of three times?
* Can the VM look up "Sides" only once, instead of twice?
These two I would be less averse to, but the trouble is that they make the semantics a bit harder to explain. "Dotted names are looked up if not already looked up, otherwise they use the same object from the previous lookup". If you have (say) "case socket.AddressFamily.AF_INET", does it cache "socket", "socket.AddressFamily", or both? ChrisA
On Wed, Jul 8, 2020 at 8:56 PM Inada Naoki <songofacandy@gmail.com> wrote:
Fair enough. I wouldn't mind that, it seems like a nice optimization that would only harm code that would be extremely confusing to read anyway. But only within one matching - caching beyond that seems more dangerous. ChrisA
Inada Naoki wrote:
Since this is a very new system, can we have some restrictions that allow more aggressive optimization than regular Python code?
The authors were just discussing a related question yesterday (more specifically, can the compiler fold `C(<p0>) | C(<p1>)` -> `C(<p0> | <p1>)`). The answer we arrived at is "yes"; in general patterns may take reasonable shortcuts, and should not be expected to follow all the same rules as expressions. This means that users should never count on `__contains__`/`__getitem__`/`__instancecheck__`/`__len__`/`__match_args__` or other attributes being looked up or called more than once with the same arguments, and that name lookups *may* be "frozen", in a sense. We don't feel a need to cater to code that relies on these side-effecty behaviors (or doing even nastier things like changing local/global name bindings); in the eyes of the authors, code like that is buggy.

However, these rules only apply as long as we are still in "pattern-land", meaning all of our knowledge about the world is invalidated as soon as we hit a guard or stop matching.

In practice, I am currently experimenting with building decision-trees at compile-time. Given a match block of the following form:

```
match <s>:
    case <p0> | <p1>: ...
    case <p2> | <p3> if <g0>: ...
    case <p4> | <p5>: ...
```

It's safe to use the same decision tree for <p0> through <p3>, but it must be rebuilt for <p4> and <p5>, since <g0> could have done literally *anything*.
On 9/07/20 3:26 am, Brandt Bucher wrote:
I think you're being overly cautious here. To my mind, the guards should be regarded as part of the pattern matching process, and so people shouldn't be writing code that depends on them having side effects.

As a nice consequence of adopting that rule, we would be able to say that these are equivalent:

    case C(a.b): ...

    case C(x) if x == a.b: ...

--
Greg
On 24/06/2020 20:38, Guido van Rossum wrote:
(Prefatory remarks: I am sure you get a lot of questions to which the answer is basically "Read the PEP". I myself have been guilty in this regard. But I fear this is inevitable when the PEP is so long and there is so much new stuff to absorb. Apologies if this is yet another one.)

_First question_: Sometimes no action is needed after a case clause. If the Django example had been written

    if (
        isinstance(value, (list, tuple)) and
        len(value) > 1 and
        isinstance(value[-1], (Promise, str))
    ):
        *value, label = value
    else:
        label = key.replace('_', ' ').title()

the replacement code would/could be

    match value:
        case [*value, label := (Promise() | str())] if value:
            pass
        case _:
            label = key.replace('_', ' ').title()

AFAICS the PEP does not *explicitly* state that the 'pass' line is necessary (is it?), i.e. that the block following `case` cannot (or can?) be empty. The term `block` is not defined in the PEP, or in https://docs.python.org/3/reference/grammar.html. But an empty block following a line ending in `:` would AFAIK be unprecedented in Python. I think it is worth clarifying this.

_Second question_: in the above example replacement, if `case _:` does not bind to `_`, does that mean that the following line will not work? Is this one of the "two bugs" that Mark Shannon alluded to? (I have read every message in the threads and I don't remember them being spelt out.) And I'm curious what the other one is (is it binding to a variable `v`?).

Best wishes
Rob Cliffe
Hi Rob,

You are right: the grammar should probably read `suite` rather than `block` (i.e. the `pass` is necessary). Thanks for catching this!

As for the second question, I assume there might be a slight oversight on your part. The last line in the example replaces the string `"_"` rather than the variable `_`. The not-binding of `_` thus has no influence on the last line.

I think I will leave it for Mark himself to name the two bugs rather than start a guessing game. However, in an earlier version we had left out the `if value` for the first case, accidentally translating the `len(value) > 1` as a `len(value) >= 1` instead.

Kind regards,
Tobias

Quoting Rob Cliffe via Python-Dev <python-dev@python.org>:
I think we are storing up trouble unless we

1) Allow arbitrary expressions after `case`, interpreted *as now*
2) Use *different* syntaxes, not legal in expressions, for
   - alternative matching values (i.e. not `|` or `or`) (NB simply stacking with multiple `case` lines is one possibility)
   - templates such as `Point(x, 0)`
   - anything else particular to `match`

I am reminded of the special restrictions for decorator syntax, which were eventually removed.

On 24/06/2020 20:38, Guido van Rossum wrote:
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
On Wed, 24 Jun 2020 at 16:40, Guido van Rossum <guido@python.org> wrote:
I'd like also to see considerations about the issue of an alternative spelling that would not resemble a class instantiation, brought first by Antoine Pitrou: ``` case Point with (x, y): print(f"Got a point with x={x}, y={y}") ``` And somewhere on the other thread, someone pointed the possibility of all assignments in a case be well delimited, even with angle parentheses - (yes, that addresses the "foot gun" again, but it is a step beyond dot or not dot in instant-readability: ``` case Point with (<x>, <y>): print(f"Got a point with x={x}, y={y}") ``` (AFAIC, the "dot" thing falls in the category of speckles on Tim's monitor) --
data:image/s3,"s3://crabby-images/e94e5/e94e50138bdcb6ec7711217f439489133d1c0273" alt=""
I actually like that it looks like instantiation; it seems to be saying "Do we have the sort of object we would get from this instantiation?" Unfortunately, this does aggravate the confusion over whether a variable is being used as a filter, vs binding to something from the matched object.
data:image/s3,"s3://crabby-images/8347a/8347a5cfc282b375dd02fd7cb4705ecaa5720d1d" alt=""
On Sun, Jun 28, 2020 at 8:44 AM Jim J. Jewett <jimjjewett@gmail.com> wrote:
The constructor-like syntax for class patterns is the part I like least about this proposal. It seems to expect that there is a one-to-one correspondence between constructor arguments and instance attributes. While that might be common, especially for DataClass-like types, it's certainly not always the case. Some attributes might be computed from multiple arguments (or looked up elsewhere), and some arguments may never be saved in their original form. I fear it will be extremely confusing if an attribute being matched by a class pattern doesn't correspond at all to an argument in a valid constructor call. For example, this class would make things very confusing: class Foo: def __init__(self, a, b): self.c = a + b You could match an instance of the class with `case Foo(c=x)` and it would work, but that might come as a surprise to anyone familiar with the class constructor's argument names. Even when attributes and constructor arguments do line up, the class pattern syntax also seems a bit awkward when you are not required to match against all of the non-optional constructor arguments. I imagine `case datetime.datetime(year=2020):` would be a valid (and even useful!) class pattern, but you can't construct a datetime instance in that way since the class has three required arguments. To sum up, I feel like using constructor and keyword-argument syntax to access attributes is an abuse of notation. I'd much prefer a new syntax for matching classes and their attributes that was not so likely to be confusing due to imperfect parallels with class construction.
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
On Wed, 24 Jun 2020 12:38:52 -0700 Guido van Rossum <guido@python.org> wrote:
I don't know if you read it, so I'll reiterate what I said :-) """ Overall, my main concern with this PEP is that the matching semantics and pragmatics are different from everything else in the language. When reading and understanding a match clause, there's a cognitive overhead because suddently `Point(x, 0)` means something entirely different (it doesn't call Point.__new__, it doesn't lookup `x` in the locals or globals...). Obviously, there are cases where this is worthwhile, but still. It may be useful to think about different notations for these new things, rather than re-use the object construction notation. """ Regards Antoine.
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Thu, Jun 25, 2020 at 6:53 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
AIUI, the case clauses are far more akin to *assignment targets* than they are to expressions. If you see something like this: [x, y] = foo() then you don't expect it to look up x or y in the current scope, nor to construct a list. Is there any way to make the syntax look more like assignment? Or maybe this won't even matter - people will simply get used to it with a bit of experience, same as "for x, y in stuff" has an assignment target in it. ChrisA
data:image/s3,"s3://crabby-images/70fc8/70fc8fa179a5d021666f98a3904f97f5b1680694" alt=""
What about `case for Point(x, 0):`? It reads very naturally, the presence of "for" hints against Point() being a call to the class, and "for" is an existing keyword that would make no other sense in that position. Examples with other formats such as `case for [x, 0]:` seem to work just as well.
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
Without arguing for or against allowing a capture variable, IMO rather than syntax like match <expr> into <var>: it would be far better (and not require a new keyword) to write this as with <var> as match <expr>: Rob Cliffe PS: Or <var> = match <expr> On 24/06/2020 20:38, Guido van Rossum wrote:
I was talking with a colleague today about the PEP and he raised a couple of questions regarding the match protocol and the proxy result.

One question is that, taking into account that 'case object(x)' is valid for every object, but it does (could do) something different for objects that have a non-None __match_args__, it seems that implementing __match_args__ will break Liskov substitutability, as you could not substitute the child in a context where you expect a parent.

Even if you don't care about Liskov substitutability, it seems that introducing a __match_args__ for a class will almost always be backwards incompatible. For example, let's say that 'datetime.date' doesn't have a custom matching defined, so it inherits the default object.__match__, which does:

    class object:
        @classmethod
        def __match__(cls, instance):
            if isinstance(instance, cls):
                return instance

The PEP notes that:

The above implementation means that by default only match-by-name and a single positional match by value against the proxy will work

Which means that users can do a positional match against the proxy with a name pattern:

    match input:
        case datetime.date(dt):
            print(f"The date {dt.isoformat()}")

Imagine that later, someone notices that it would be reasonable to support structural pattern matching for the fields of a 'datetime.date' so that users could do:

    match birthday:
        case datetime.date(year) if year == 1970:
            print("You were born in 1970")

But, if 'datetime.date' were updated to implement a non-default __match_args__, allowing individual fields to be pulled out of it like this, then the first block would be valid, correct code before the change, but would raise an ImpossibleMatch after the change because 'dt' is not a field in __match_args__.

Is this argument misinterpreting something about the PEP, or is it missing some important detail?

On Wed, 24 Jun 2020 at 20:47, Guido van Rossum <guido@python.org> wrote:
Pablo Galindo Salgado wrote:
Well yeah, it's actually a fair bit worse than you describe. Since dt is matched positionally, it wouldn't raise during matching - it would just succeed as before, but instead binding the year attribute (not the whole object) to the name "dt". So it wouldn't fail until later, when your method call raises a TypeError.
Ethan Furman wrote:
Ouch. That seems like a pretty serious drawback. Will this issue be resolved?
It's currently being revisited. Realistically, I'd imagine that we either find some straightforward way of opting-in to the current default behavior (allowing one arg to be positionally matched against the proxy), or lose the nice behavior altogether. Obviously the former is preferable, since it's not trivial to reimplement yourself.
On 26/06/20 6:21 am, Pablo Galindo Salgado wrote:
I think that would be an incorrect way for matching on datetime to behave. Since datetime has a constructor that takes positional arguments, it should have a __match__ and/or __match_args__ that agrees.

This suggests that there will be a burden on many existing types to ensure they implement appropriate matching behaviour, as the default behaviour provided by object will be wrong for them.

I'm wondering whether the default "single positional match" behaviour is a bad idea. I.e. the only thing that should work by default is

    case someclass():

and not

    case someclass(instance):

Classes such as int with constructors that can take a single argument should be required to implement the corresponding match behaviour explicitly.

--
Greg
On Wed, Jun 24, 2020 at 12:46 PM Guido van Rossum <guido@python.org> wrote:
I don't think combining assignment and wildcard will help. '_' is fine for that. I could get used to '?' as an assignment or a wildcard, but it would always be a double-take if it was both. I've seen languages that use '>foo' to indicate assignment, but I can't for the life of me remember which now, aside from shell redirection. '=foo' might be more obvious. In the end I think it's only important that there's some assignment operator, and we'll all get used to whatever you choose.
[apologies for the duplicate to Guido, used reply instead of reply to all]

To summarize my previous unanswered post, I posted a +1 to the view that "defaulting to binding vs interpreting NAME as a constant" is a dangerous default. And I submitted a couple of alternate syntactic ways to denote "capture is desired" (the angle brackets, and the capture object). I think both are reasonably readable, and one of them doesn't even add any unusual syntax (not even the "dot prefix").

As an elaboration on that, after reading the discussion and trying to not repeat what has previously been said:

- I think there's a mismatch between an assumption made by the authors vs what many of us are posting here, which is explicitly stated in " https://www.python.org/dev/peps/pep-0622/#alternatives-for-constant-value-pa..." : Quoting the PEP: «the name patterns are more common in typical code, so having special syntax for common case would be weird». Even if I think a popular use case would be analysing and deconstructing complex nested structures (like ASTs), I think roughly half of the uses will actually be for "the switch statement we never had", where all branches would be constants. I've been writing a lot of code like that these last couple of weeks, so I may be biased (although the PEP authors may have been writing AST visitors this last week and may be biased the other way ;-) )
  - As a sub point, I can understand if the PEP authors argue "match is not for that, use if/elif/dicts of functions in that case like you did before and ignore this PEP", but if that's the case, that should be explicit in the PEP.
- I am fairly sure (as much as one can be of the future in these things) that with this PEP approved as is, linters will add new rules like "you have more than a top level name pattern, only the first one will match. Perhaps you wanted constant patterns?" and "your pattern captures shadow an existing name" and "a name you bound in a pattern isn't used inside the pattern". These will definitely help, but for me "how many new linter rules will be needed if this language change is introduced" is a good measure of how inelegant it is.
- Perhaps arguing against myself, I know that, thanks to my regular usage of linters, I personally won't suffer much from this problem (once they get updated). But I also teach Python to people, and I feel that I'd have to add this to the list of "gotchas to avoid" if the PEP passes as is.

Again, I can't write this email without saying that this feature is great, that the effort put in this PEP is really palpable, that I'd love to find the way to get it accepted, and that even if I'm normally conservative upgrading python versions and waiting for my environment to support it fully, this will likely be the first time that I upgrade just for a language feature :)

On Wed, 24 Jun 2020 at 20:44, Guido van Rossum <guido@python.org> wrote:
On Fri, 26 Jun 2020 at 11:29, Daniel Moisset <dfmoisset@gmail.com> wrote:
I think roughly half of the uses will actually be for "the switch statement we never had", where all branches would be constants. I've been writing a lot of code like that these last couple of weeks, so I may be biased (although the PEP authors may have been writing AST visitors this last week and may be biased the other way ;-) )
As a sub point, I can understand if the PEP authors argue "match is not for that, use if/elif/dicts of functions in that case like you did before and ignore this PEP", but if that's the case, that should be explicit in the PEP.
For me, this prompts the question (which I appreciate is more about implementation than design) - would there be any (significant) performance difference between

    match var:
        case 1:
            print("Got 1")
        case 2:
            print("Got 2")
        case _:
            print("Got another value")

and

    if var == 1:
        print("Got 1")
    elif var == 2:
        print("Got 2")
    else:
        print("Got another value")

?

In C, the switch statement was explicitly intended to be faster by means of doing a computed branch. In a higher level language, I can see the added features of match meaning that it's *slower* than a series of if tests for simple cases. But I have no intuition about the performance of this proposal. I'd like to believe that the choice between the 2 alternatives above is purely a matter of preferred style, but I don't know. If match is significantly slower, that could make it a bit of an attractive nuisance.

Paul
On Fri, Jun 26, 2020 at 3:47 AM Paul Moore <p.f.moore@gmail.com> wrote:
Each case essentially compiles down to an equivalent if structure, already. There's no penalty that I'm seeing. There's actually much more room for optimizing eventually, since the test is bound to a single element instead of any arbitrary if expression, and a Cython or PyPy could work some magic to detect a small set of primitive types and optimize for that.
This one is new but I think unrelated and unmentioned: Why is the mapping match semantics non-strict about keys? Besides the "asymmetry" with sequence matches, I think a strict match should be useful sometimes (quickly deconstructing JSON data comes to my mind, where I want to know that I didn't get unexpected keys). I cannot get that behaviour with the current pep. But if we make key strictness the default, I can always add **_ to my mapping pattern and make it non strict (that's currently forbidden but the restriction can be lifted). Is there an assumption (or even better, data evidence) that non-strict checks are much much more common? A similar but weaker argument can be made for class patterns (although I can imagine non-strict matches *are* more common in that case). Mostly but not completely unrelated to the above, and purely syntactic sugar bikeshedding, but I think having "..." as an alias for "*_" or "**_" (depending on context, and I'd say it's *both* inside a class pattern) could make these patterns slightly more readable. Best, D. On Wed, 24 Jun 2020 at 20:44, Guido van Rossum <guido@python.org> wrote:
On Thu., 25 Jun. 2020, 5:41 am Guido van Rossum, <guido@python.org> wrote:
I'm not sure if it's a separate point or not, but something I would like to see the PEP take into account is whether or not the destructuring syntax (especially for mappings) could become the basis for a future proposed enhancement to assignment statements that effectively allowed an assignment statement to be a shorthand for:

    match RHS:
        case LHS:
            pass # Just doing a destructuring assignment
        else:
            raise ValueError("Could not match RHS to LHS")

In "y, x = x, y" the fact the names are being used as both lvalues and rvalues is indicated solely by their appearing on both sides of the assignment statement.

This is the strongest existing precedent for all names in case expressions being lvalues by default and having a separate marker for rvalues.

However, I believe it's also a reasonably strong argument *against* using "." as that rvalue marker, as in "obj.x, obj.y = x, y" the dotted references remain lvalues, they don't implicitly turn into rvalues.

Interestingly though, what those points suggest is that to be forward compatible with a possible extension to assignment statements, the PEP is correct that any syntactic marker would need to be on the *rvalues* that are constraining the match, putting any chosen symbol (e.g. "?") squarely in the wildcard role rather than the "lvalue marker" role.

    y,? = returns_2_tuple()
    y,?None = returns_2_tuple() # Exception if 2nd element is not == None
    y,?sentinel = returns_2_tuple() # Exception if 2nd element is not == sentinel
    y,*? = returns_iterable()

The main mindset shift this approach would require relative to the PEP as currently written is in explicitly treating the case expression syntax as an evolution of the existing lvalue syntax in assignment statements rather than treating it as the introduction of a third independent kind of expression syntax.

Cheers,
Nick.
Why not use '=' to distinguish binding from equality testing:

    case Point(x, =y): # matches a Point() with 2nd parameter equal to y; if it does, binds to x.

This would allow a future (or present!) extension to other relative operators:

    case Point(x, >y):

(although the syntax doesn't AFAICS naturally extend to specifying a range, i.e. an upper and lower bound, which might be a desirable thing to do. Perhaps someone can think of a way of doing it).

Whether

    case =42:
    case 42:

would both be allowed would be one issue to be decided.

Rob Cliffe
On 7/7/20 10:08 PM, Rob Cliffe via Python-Dev wrote: likely cause a run time error if it hasn't been bound yet, or at the very least probably fails in a 'safer' way. I think forgetting to add a special mark is a much more likely error than adding a mark by mistake (unless the mark is just having a dot in the name). -- Richard Damon
Since this is very new system, can we have some restriction to allow aggressive optimization than regular Python code?

# Class Pattern

Example:

    match val:
        case Point(0, y): ...
        case Point(x, 0): ...
        case Point(x, y): ...

* Can VM lookup "Point" only once per executing `match`, instead three times?
* Can VM cache the "Point" at first execution, and never lookup in next time? (e.g. function executed many times)

# Constant value pattern

Example:

    match val:
        case Sides.SPAM: ...
        case Sides.EGGS: ...

* Can VM lookup "Sides" only once, instead of two?
* Can VM cache the value of "Sides.SPAM" and "Sides.EGGS" for next execution?

Regards,
--
Inada Naoki <songofacandy@gmail.com>
On Wed, Jul 8, 2020 at 6:17 PM Inada Naoki <songofacandy@gmail.com> wrote:
I'd prefer not - that seems very confusing.
Similar, but with the additional consideration that you can create a "pre-baked pattern" by using a dict, so if you're worried about performance, use the slightly uglier notation (assuming that Sides.SPAM and Sides.EGGS are both hashable - and if they're not, the risk of prebaking is way too high).
* Can VM lookup "Point" only once per executing `match`, instead three times? * Can VM lookup "Sides" only once, instead of two?
These two I would be less averse to, but the trouble is that they make the semantics a bit harder to explain. "Dotted names are looked up if not already looked up, otherwise they use the same object from the previous lookup". If you have (say) "case socket.AddressFamily.AF_INET", does it cache "socket", "socket.AddressFamily", or both? ChrisA
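The "pre-baked pattern" via a dict mentioned in this reply can be sketched as follows. The `Sides` enum comes from the example in the thread, but its member values, the handler functions, and `dispatch` are assumptions made for illustration.

```python
from enum import Enum


class Sides(Enum):
    SPAM = "spam"
    EGGS = "eggs"


def handle_spam():
    return "Got spam"


def handle_eggs():
    return "Got eggs"


# Sides.SPAM and Sides.EGGS are looked up once, when this dict is built,
# rather than on every dispatch. This requires the keys to be hashable.
HANDLERS = {
    Sides.SPAM: handle_spam,
    Sides.EGGS: handle_eggs,
}


def dispatch(val):
    handler = HANDLERS.get(val)
    if handler is None:
        return "Got something else"
    return handler()
```

The trade-off is the one described above: the lookups are frozen at dict-construction time, so later rebinding of `Sides.SPAM` would not be noticed.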
On Wed, Jul 8, 2020 at 8:56 PM Inada Naoki <songofacandy@gmail.com> wrote:
Fair enough. I wouldn't mind that, it seems like a nice optimization that would only harm code that would be extremely confusing to read anyway. But only within one matching - caching beyond that seems more dangerous. ChrisA
Inada Naoki wrote:
Since this is very new system, can we have some restriction to allow aggressive optimization than regular Python code?
The authors were just discussing a related question yesterday (more specifically, can the compiler fold `C(<p0>) | C(<p1>)` -> `C(<p0> | <p1>)`). The answer we arrived at is "yes"; in general patterns may take reasonable shortcuts, and should not be expected to follow all the same rules as expressions. This means that users should never count on `__contains__`/`__getitem__`/`__instancecheck__`/`__len__`/`__match_args__` or other attributes being looked up or called more than once with the same arguments, and that name lookups *may* be "frozen", in a sense. We don't feel a need to cater to code that relies on these side-effecty behaviors (or doing even nastier things like changing local/global name bindings); in the eyes of the authors, code like that is buggy.

However, these rules only apply as long as we are still in "pattern-land", meaning all of our knowledge about the world is invalidated as soon as we hit a guard or stop matching.

In practice, I am currently experimenting with building decision-trees at compile-time. Given a match block of the following form:

```
match <s>:
    case <p0> | <p1>: ...
    case <p2> | <p3> if <g0>: ...
    case <p4> | <p5>: ...
```

It's safe to use the same decision tree for <p0> through <p3>, but it must be rebuilt for <p4> and <p5>, since <g0> could have done literally *anything*.
On 9/07/20 3:26 am, Brandt Bucher wrote:
I think you're being overly cautious here. To my mind, the guards should be regarded as part of the pattern matching process, and so people shouldn't be writing code that depends on them having side effects.

As a nice consequence of adopting that rule, we would be able to say that these are equivalent:

    case C(a.b): ...

    case C(x) if x == a.b: ...

--
Greg
On 24/06/2020 20:38, Guido van Rossum wrote:
(Prefatory remarks: I am sure you get a lot of questions to which the answer is basically "Read the PEP". I myself have been guilty in this regard. But I fear this is inevitable when the PEP is so long and there is so much new stuff to absorb. Apologies if this is yet another one.)

_First question_: Sometimes no action is needed after a case clause. If the Django example had been written

    if (
        isinstance(value, (list, tuple)) and
        len(value) > 1 and
        isinstance(value[-1], (Promise, str))
    ):
        *value, label = value
    else:
        label = key.replace('_', ' ').title()

the replacement code would/could be

    match value:
        case [*value, label := (Promise() | str())] if value:
            pass
        case _:
            label = key.replace('_', ' ').title()

AFAICS the PEP does not *explicitly* state that the 'pass' line is necessary (is it?), i.e. that the block following `case` cannot (or can?) be empty. The term `block` is not defined in the PEP, or in https://docs.python.org/3/reference/grammar.html. But an empty block following a line ending in `:` would AFAIK be unprecedented in Python. I think it is worth clarifying this.

_Second question_: in the above example replacement, if `case _:` does not bind to `_`, does that mean that the following line will not work? Is this one of the "two bugs" that Mark Shannon alluded to? (I have read every message in the threads and I don't remember them being spelt out.) And I'm curious what the other one is (is it binding to a variable `v`?).

Best wishes
Rob Cliffe
Hi Rob, You are right: the grammar should probably read `suite` rather than `block` (i.e. the `pass` is necessary). Thanks for catching this! As for the second question, I assume there might be a slight oversight on your part. The last line in the example replaces the string `"_"` rather than the variable `_`. The not-binding of `_` thus has no influence on the last line. I think I will leave it for Mark himself to name the two bugs rather than start a guessing game. However, in an earlier version we had left out the `if value` for the first case, accidentally translating the `len(value) > 1` as a `len(value) >= 1` instead. Kind regards, Tobias Quoting Rob Cliffe via Python-Dev <python-dev@python.org>:
I think we are storing up trouble unless we

1) Allow arbitrary expressions after `case`, interpreted *as now*
2) Use *different* syntaxes, not legal in expressions, for
   - alternative matching values (i.e. not `|` or `or`) (NB simply stacking with multiple `case` lines is one possibility)
   - templates such as `Point(x, 0)`
   - anything else particular to `match`

I am reminded of the special restrictions for decorator syntax, which were eventually removed.

On 24/06/2020 20:38, Guido van Rossum wrote:
participants (22)
- Antoine Pitrou
- Brandt Bucher
- Chris Angelico
- Daniel Moisset
- David Mertz
- Emily Bowman
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Inada Naoki
- Jim J. Jewett
- Joao S. O. Bueno
- MRAB
- Nick Coghlan
- Pablo Galindo Salgado
- Paul Moore
- Rhodri James
- Richard Damon
- Rob Cliffe
- salernof11@gmail.com
- Steven Barker
- Tobias Kohn