PEP 634-636: Mapping patterns and extra keys

From PEP 636 (Structural Pattern Matching):
It surprises me that ignoring extra keys would be the *default* behavior. This seems unsafe. I would think extra keys would be best treated as suspicious by default.

* Ignoring extra keys loses data silently. In the current proposal:

    point = {'x': 1, 'y': 2, 'z': 3}
    match point:
        case {'x': x, 'y': y}:  # MATCHES, losing z  O_O
            pass
        case {'x': x, 'y': y, 'z': z}:  # will never match  O_O
            pass

* Ignoring extra keys is inconsistent with the handling of sequences: we don't allow extra items when using a destructuring assignment to a sequence:

    p = [1, 2]
    [x, y] = p
    [x, y, z] = p  # ERROR: ValueError: not enough values to unpack (expected 3, got 2)  :)

* Ignoring extra keys in mapping patterns is inconsistent with the current proposal for how sequence patterns match data:

    point = [1, 2, 3]
    match point:
        case [x, y]:  # notices extra value and does NOT match  :)
            pass
        case [x, y, z]:  # matches  :)
            pass

* Ignoring extra keys is inconsistent with TypedDict's default "total" matching behavior:

    from typing import TypedDict

    class Point2D(TypedDict):
        x: int
        y: int

    p1: Point2D = {'x': 1, 'y': 2}
    p2: Point2D = {'x': 1, 'y': 2, 'z': 3}  # ERROR: Extra key 'z' for TypedDict "Point2D"  :)

* It is *possible* to force an exact key match with a pattern guard, but it's clumsy to do so. It should not be clumsy to parse strictly.

    point = {'x': 1, 'y': 2, 'z': 3}
    match point:
        # notices extra value and does NOT match, but requires ugly guard  :/
        case {'x': x, 'y': y, **rest} if rest == {}:
            pass
        case {'x': x, 'y': y, 'z': z, **rest} if rest == {}:
            pass

To avoid the above problems, **I'd advocate for disallowing extra keys in mapping patterns by default**. For cases where extra keys should be explicitly allowed and ignored, I propose allowing a **_ wildcard.

Some examples that illustrate behavior when *disallowing* extra keys in mapping patterns:

1. Strict parsing

    from typing import TypedDict, Union

    Point2D = TypedDict('Point2D', {'x': int, 'y': int})
    Point3D = TypedDict('Point3D', {'x': int, 'y': int, 'z': int})

    def parse_point(point_json: dict) -> Union[Point2D, Point3D]:
        match point_json:
            case {'x': int(x), 'y': int(y)}:
                return Point2D({'x': x, 'y': y})
            case {'x': int(x), 'y': int(y), 'z': int(z)}:
                return Point3D({'x': x, 'y': y, 'z': z})
            case _:
                raise ValueError(f'not a valid point: {point_json!r}')

2. Loose parsing, discarding unknown data. Common when reading JSON-like data when it's not necessary to output it again later.

    from typing import Dict, TypedDict

    TodoItem_ReadOnly = TypedDict('TodoItem_ReadOnly', {'title': str, 'completed': bool})

    def parse_todo_item(todo_item_json: Dict) -> TodoItem_ReadOnly:
        match todo_item_json:
            case {'title': str(title), 'completed': bool(completed), **_}:
                return TodoItem_ReadOnly({'title': title, 'completed': completed})
            case _:
                raise ValueError()

    input = {'title': 'Buy groceries', 'completed': True, 'assigned_to': ['me']}
    print(parse_todo_item(input))  # prints: {'title': 'Buy groceries', 'completed': True}

3. Loose parsing, preserving unknown data. Common when parsing JSON-like data when it needs to be round-tripped and output again later.

    from typing import Any, Dict, TypedDict

    TodoItem_ReadWrite = TypedDict('TodoItem_ReadWrite', {'title': str, 'completed': bool, 'extra': Dict[str, Any]})

    def parse_todo_item(todo_item_json: Dict) -> TodoItem_ReadWrite:
        match todo_item_json:
            case {'title': str(title), 'completed': bool(completed), **extra}:
                return TodoItem_ReadWrite({'title': title, 'completed': completed, 'extra': extra})
            case _:
                raise ValueError()

    def format_todo_item(item: TodoItem_ReadWrite) -> Dict:
        return {'title': item['title'], 'completed': item['completed'], **item['extra']}

    input = {'title': 'Buy groceries', 'completed': True, 'assigned_to': ['me']}
    output = format_todo_item(parse_todo_item(input))
    print(output)  # prints: {'title': 'Buy groceries', 'completed': True, 'assigned_to': ['me']}

Comments?

--
David Foster | Seattle, WA, USA
Contributor to TypedDict support for mypy

On 11/14/20 10:17 PM, Guido van Rossum wrote:
It’s a usability issue; mappings are used quite differently than sequences. Compare to class patterns rather than sequence patterns.
I just found the following explanation from the superseded PEP 622 as to why extra keys are ignored:
I suppose this makes sense when using "match" to work with a dictionary used as a lightweight object, which I expect would be relatively common. The examples I originally presented assume use of "match" for parsing, and parsing tends to default to stricter matching. :) -- David Foster | Seattle, WA, USA Contributor to TypedDict support for mypy

I’m surprised PEP 635 doesn’t explain this at least as well as 622? Also in your example 1, the narrower pattern (three keys) should precede the more general pattern (two). Again, same as class patterns. Or except clauses (catch RuntimeError *before* Exception). On Sat, Nov 14, 2020 at 22:49 David Foster <davidfstr@gmail.com> wrote:
-- --Guido (mobile)

On Sat, Nov 14, 2020 at 10:17:34PM -0800, Guido van Rossum wrote:
It’s a usability issue; mappings are used quite differently than sequences. Compare to class patterns rather than sequence patterns.
I'm keeping an open mind on this question, but I think David is right to raise it. I think that most people are going to see this dict matching as "ignoring errors by default" and going against the Zen of Python, and I expect that we'll be answering questions about it for years to come. "Why did my match statement match the wrong case?"

Naively, I too would expect that dicts should only match if the keys match with no leftovers, and I would like to see the choice to ignore leftovers justified in the PEP.

It would be good if the PEP gave a survey of the practical experience of other languages with pattern matching:

- are there languages which require an exact match, with no leftover keys? what issues, if any, do users have with that choice?
- which languages ignore extra keys? do users of those languages consider this feature a bug, a wart, or a feature?

--
Steve

On 2020-11-15 at 19:11:15 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
In Erlang, "mappings" tend to be generalized collections of (key, value) pairs in which the keys are not nailed down at design time, or the keys evolve over time (think about adding a new field to an existing message). Pattern matching ignores extra keys, so that old code can continue to handle the messages it knows how to handle and simply ignore data it doesn't know about (yes, you have to think carefully about extending messages in this way, but it has worked well over decades). This is definitely a feature.

Also in Erlang, "records" are very similar to Python's named tuples. Pattern matching on records also ignores extra keys, so that I can match records that meet certain criteria and not have to list every attribute in every pattern.

IMO, ignoring extra keys allows for extensibility when you don't always have control over which versions of which code are actually running (which is the case in the typical distributed system). Not ignoring extra keys may work better inside a monolithic application where all the data comes from within or is already parsed/decoded.

IMO, this is going to come down to your use case (I'm *shocked*). If I receive HTML/XML/JSON/TCP/whatever messages, and I want to use pattern matching to decode or dispatch on the message type (e.g., login, logout, attack, connect), then *not* having to write **rest on every pattern reduces clutter. But if I have to handle 2D points separately from 3D points, then a more strict matching (i.e., not ignoring extra keys) relieves me of having to think about which case is more or less specific, and may be easier for beginners to use. (IMO, making things easier for beginners is only a means to an end. If making things easier for beginners makes things harder for experts, then don't do it. But I'm not in charge around here.)

As an analogy, when you write a command line utility, do you accept or reject extraneous command line arguments? Is "spam --version ham eggs" the same as "spam --version"? I'm going to guess that it depends on your personality and your background and not anything else inside the utility (note that your choice of command line parser also depends on your personality and your background...).

IOW, the answer to the question of ignoring extra keys is going to aggravate half the users and half the use cases no matter what.

I've completed my survey of how other languages use pattern matching to match Mapping-like and dict-like types, especially focusing on whether they ignore (𝔸) or disallow (𝔹) extra keys by default.

In general:

1. Dynamically-typed languages that support pattern matching have mostly experimental (†) pattern-matching implementations that all ignore extra keys by default (𝔸) rather than disallow them (𝔹). Some popular dynamically-typed languages don't support pattern-matching at all (✖️).
   - Erlang/Elixir are the only languages in this category with a non-experimental implementation.
   - Ruby and JavaScript both have experimental implementations only.

2. Statically-typed languages frequently provide pattern matching capabilities but they can't be used for Mapping-like or dict-like types (🤷♀️). Frequently the pattern matching provided *can* however be used on association lists (ordered lists of key-value tuples) or dataclass-like instances.

So it certainly seems at least the popular vote points toward ignoring extra keys by default rather than disallowing them by default, which is consistent with the current wording of PEP 634-636.

Below is the full survey summary:

Key:
* Pattern Match Approach
    𝔸 = Pattern-matches mappings/dicts, ignoring extra keys by default
    𝔹 = Pattern-matches mappings/dicts, disallowing extra keys by default
    🤷♀️ = Has pattern matching, but it doesn't apply to Mapping-like or dict-like types
    ✖️ = Has no pattern matching for algebraic data types (but might for strings)
* Maturity
    † = Pattern-matching syntax is advertised as "experimental"

Dynamically-typed languages:
* Erlang 𝔸
* Elixir (derived from Erlang) 𝔸
* Ruby (2.7) 𝔸†
* JavaScript (TC39 draft) 𝔸†
* PHP ✖️
* Lua ✖️
* Python (current draft of PEP 634-636) 𝔸†

Statically-typed languages:
* OCaml 🤷♀️
* Scala 🤷♀️
* Swift 🤷♀️
* Rust 🤷♀️
* Haskell 🤷♀️
* C++ 🤷♀️

--
David Foster | Seattle, WA, USA
Contributor to TypedDict support for mypy

On 11/19/20 10:08 PM, David Foster wrote:
To close the loop on this thread:

* Based on (1) the explanation from PEP 622 and Guido RE that "mappings [...] have natural structural sub-typing behavior, i.e., passing a dictionary with extra keys somewhere will likely just work" and (2) the survey results, I'm now personally fine (+0) with extra keys being ignored by default when matching against mappings.

* I do think it might be illustrative to copy the following explanatory sentences from the "Mapping Patterns" section of the older PEP 622 to the same section of PEP 635 (Structural Pattern Matching: Motivation and Rationale):
Specifically the above might replace the following sentence in PEP 635, which doesn't really give a rationale:
Moreover, the mapping pattern does not check for the presence of additional keys.
* I still have an interest in strictly matching dictionaries that are destined to become TypedDicts, but I can take that conversation to a different thread. -- David Foster | Seattle, WA, USA Contributor to TypedDict support for mypy

Thanks very much for the survey (which actually surprised me somewhat). Regarding the suggested update for the PEP, I'll make a PR to change that -- I agree it's worth saying it. On Sat, Nov 21, 2020 at 5:00 PM David Foster <davidfstr@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

I'm in favor of keeping the PEP as it currently is. Mappings are naturally structural subtypes of one another, therefore mapping patterns should be consistent with class patterns.

    car = Car(...)
    match car:
        case Vehicle():
            pass
        case Car():  # will never match
            pass

This example is analogous to the first one in the discussion. If Car is a subclass of Vehicle, then Vehicle() is a more general pattern than Car() and will always match despite the instance not being exactly of type Vehicle. With mapping patterns it's exactly the same thing. You need to match the more specific patterns first. Matching x and y is more general than matching x, y and z.

One more thing. If we compare the proposed behavior to other languages, the most relevant example would be object destructuring in JavaScript:

    const { 'x': x, 'y': y } = { 'x': 1, 'y': 2, 'z': 3 };
    const { x, y } = { x: 1, y: 2, z: 3 };  // more common short form

Object destructuring only matches the specified fields. You can also match the remaining fields but it's always explicit:

    const { x, y, ...rest } = { x: 1, y: 2, z: 3 };
    console.log(rest);  // { z: 3 }

The pattern is widely adopted and the behavior generally lines up with people's expectations.

participants (5)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- David Foster
- Guido van Rossum
- Steven D'Aprano
- Valentin Berlier