Hi Brandt, On 30/03/2021 5:25 pm, Brandt Bucher wrote:
Overall, I am still uncomfortable with PEP 653, and would probably not support its acceptance.
Although it has thankfully become a much less radical proposal than it was a few weeks ago (thanks, Mark, for your attention to our feedback), I feel that the rules it binds implementations to are *very* premature, and that the new mechanisms it introduces to do so only modestly improve potential performance at great expense to the ease of learning, using, and maintaining code using structural pattern matching.
A few notes follow:
For example, using `sympy`, we might want to write:
``` # a*a == a**2 case Mul(args=[a, b]) if a == b: return Pow(a, 2) ```
Which requires the sympy class `Symbol` to "self" match. For `sympy` to support this pattern with PEP 634 is possible, but a bit tricky. With this PEP it can be implemented very easily.
Maybe I'm missing something, but I don't understand at all how the provided code snippet relies on the self-matching behavior.
No I'm missing something. That should read case Mul(args=[Symbol(a), Symbol(b)) if a == b: ... I'll fix that in the PEP thanks.
Have the maintainers of SymPy (or any large library supposedly benefitting here) come out in support of the PEP? Are they at least aware of it? Have they indicated that the proposed idiom for implementing self-matching behavior using a property is truly too "tricky" for them?
Have you identified any stdlib classes that would benefit greatly from this?
For me, `__match_class__` feels like a feature without demonstrated need. Even if there is a great demand for this, I certainly think that there are far better options than the proposed flagging system:
The distinction between those classes that have the default behavior and those that match "self" is from PEP 634. I didn't introduce it. I'm just proposing a more principled way to make that distinction.
- A `@match_self` class decorator (someone's bound to put one on PyPI, at any rate). - Allowing `__match_args__ = None` to signal this case (an option we previously considered, and my personal preference).
...both of which can be added later, if needed.
Further, PEP 634 makes it very easy for libraries to support Python versions with *and* without pattern matching (something I consider to be an important requirement). The following class works with both 3.9 and 3.10:
``` class C(collections.abc.Sequence): ... ```
While something like this is required for PEP 653:
``` class C: if sys.version_info >= (3, 10): from somewhere import MATCH_SEQUENCE __match_container__ = MATCH_SEQUENCE ... ```
Or class C: __match_container__ = 1 # MATCH_SEQUENCE Which is one reason the PEP states that the values of MATCH_SEQUENCE, etc. will never change.
PEP 634 relies on the `collections.abc` module when determining which patterns a value can match, implicitly importing it if necessary. This PEP will eliminate surprising import errors and misleading audit events from those imports.
I think that a broken `_collections_abc` module *should* be surprising. Is there any reasonable scenario where it's expected to not exist, or be not be fit for this purpose?
No reasonable scenario, but unreasonable scenarios happen all too often.
And I'm not sure how an audit event for an import that is happening could be considered "misleading"... I certainly wouldn't want it suppressed.
It's misleading because a match statement doesn't include any explicit imports.
Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
How much faster? A quick benchmark on my machine suggests less than half a microsecond. PEP 634 (like PEP 653) already allows us to cache this information for the subject of a match statement, so I doubt that this is actually a real issue in practice. An indeed, with the current implementation, this test isn't even performed on the most common types, such as lists, tuples, and dictionaries.
Half a microsecond is thousands of instructions on a modern CPU. That is a long time for a single VM operation.
At the very least, PEP 653's confusing new flag system seems to be a *very* premature optimization, seriously hurting usability for a modest performance increase. (Using them wrongly also seems to introduce a fair amount of undefined behavior, which seems to go against the PEP's own motivation.)
Why do you say it is a premature optimization? It's primary purpose is reliability and precise semantics. It is more optimizable, I agree, but that is hardly premature. You also say it is confusing, but I think it is simpler than the workarounds to match "self" that you propose. This is very subjective though. Evidently we think differently.
If the value of `__match_args__` is not as specified, then the implementation may raise any exception, or match the wrong pattern.
I think there's a name for this sort of behavior... ;)
Indeed, but there is only undefined behavior if a class violates clearly specified rules. The undefined behavior in PEP 634 is much broader. We already tolerate some amount of undefined behavior. For example, dictionary lookup is also undefined for classes which do not hash properly.
A couple of other, more technical notes:
- PEP 653 requires mappings to have a `keys()` method that returns an object supporting set inequality operations. It is not really that common to find this sort of support in user code (in my experience, it is more likely that user-defined `keys()` methods will return iterables). It's not even clear to me if this is an interface requirement for mappings in general. For example, `weakref.WeakKeyDictionary` and `weakref.WeakValueDictionary` presently do not work with PEP 653's requirements for mapping patterns, since their `keys()` methods return iterators.
- Treating `__getitem__` as pure is problematic for some common classes (such as `defaultdict`). That's why we use two-argument `get()` instead.
Thanks for pointing that out. I'd noted that PEP 634 used `get()`, which is why I inserted the guard on keys beforehand. Clearly that is insufficient. I'll update the semantics to use the two-argument `get()`. It does seem more robust.
As well-fleshed out as the pseudocode for the matching operations in this PEP may be, examples like this suggest that perhaps we should wait until 3.11 or later to figure out what actually works in practice and what doesn't. PEP 634 took a full year of work, and the ideas it proposed changed substantially during that time (in no small part because we had many people experimenting with how an actual implementation interacted with real code).
I fully understand that a lot of work went into PEP 634. Which is why I am using the syntax and much of the semantics of PEP 634 as I can. The test suite is proving very useful, thanks. The problem with waiting for 3.11 is that code will start to rely on some of the implementation details of pattern matching as it is now, and that our ability to optimize it will be delayed by a year. Cheers, Mark.