Mailman 3 PEP 634-636: Mapping patterns and extra keys - Python-ideas

15 Nov 2020

From PEP 636 (Structural Pattern Matching):

> ...
> Mapping patterns: {"bandwidth": b, "latency": l} captures the
> "bandwidth" and "latency" values from a dict. Unlike sequence patterns,
> extra keys are ignored.

It surprises me that ignoring extra keys would be the *default* 
behavior. This seems unsafe. I would think extra keys are best 
treated as suspicious by default.

* Ignoring extra keys loses data silently. In the current proposal:

     point = {'x': 1, 'y': 2, 'z': 3}
     match point:
         case {'x': x, 'y': y}:  # MATCHES, losing z  O_O
             pass
         case {'x': x, 'y': y, 'z': z}:  # will never match  O_O
             pass

* Ignoring extra keys is inconsistent with the handling of sequences: a 
destructuring assignment from a sequence must match its length exactly, 
so extra items are not allowed:

     p = [1, 2]
     [x, y] = p
     [x] = p  # ERROR: ValueError: too many values to unpack  :)
     # ERROR: ValueError: not enough values to unpack (expected 3, got 2)  :)
     [x, y, z] = p

* Ignoring extra keys in mapping patterns is inconsistent with the 
current proposal for how sequence patterns match data:

     point = [1, 2, 3]
     match point:
         case [x, y]:  # notices extra value and does NOT match  :)
             pass
         case [x, y, z]:  # matches :)
             pass

* Ignoring extra keys is inconsistent with TypedDict's default "total" 
matching behavior:

     from typing import TypedDict

     class Point2D(TypedDict):
         x: int
         y: int

     p1: Point2D = {'x': 1, 'y': 2}
     # ERROR (from the type checker): Extra key 'z' for TypedDict "Point2D"  :)
     p2: Point2D = {'x': 1, 'y': 2, 'z': 3}

* It is *possible* to force an exact key match with a pattern guard but 
it's clumsy to do so.
   It should not be clumsy to parse strictly.

     point = {'x': 1, 'y': 2, 'z': 3}
     match point:
         # notices extra value and does NOT match, but requires ugly guard :/
         case {'x': x, 'y': y, **rest} if rest == {}:
             pass
         case {'x': x, 'y': y, 'z': z, **rest} if rest == {}:
             pass

To avoid the above problems, **I'd advocate for disallowing extra keys 
in mapping patterns by default**. For cases where extra keys should be 
explicitly allowed and ignored, I propose allowing a **_ wildcard.
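
The intended semantics can be sketched in plain Python without any new syntax. `match_mapping` below is a hypothetical helper written just for this message, not part of any PEP or library; its `allow_extra=True` flag plays the role of the proposed **_ wildcard.

```python
from collections.abc import Mapping
from typing import Any, Optional, Tuple

_MISSING = object()  # sentinel so that a stored None still counts as present


def match_mapping(subject: Any, keys: Tuple[str, ...],
                  allow_extra: bool = False) -> Optional[dict]:
    """Return the captured values, or None if the subject does not match.

    Emulates the proposed rule: every listed key must be present, and any
    unlisted key makes the match fail unless allow_extra is True.
    """
    if not isinstance(subject, Mapping):
        return None
    captured = {}
    for key in keys:
        value = subject.get(key, _MISSING)
        if value is _MISSING:
            return None  # a required key is absent
        captured[key] = value
    if not allow_extra and set(subject) - set(keys):
        return None  # strict by default: unexpected keys reject the match
    return captured


point = {'x': 1, 'y': 2, 'z': 3}
print(match_mapping(point, ('x', 'y')))                    # prints: None
print(match_mapping(point, ('x', 'y'), allow_extra=True))  # prints: {'x': 1, 'y': 2}
print(match_mapping(point, ('x', 'y', 'z')))               # prints: {'x': 1, 'y': 2, 'z': 3}
```

With strictness as the default, the 2D check rejects the 3D point on its own, and the caller has to opt in to discarding data.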

Some examples that illustrate behavior when *disallowing* extra keys in 
mapping patterns:

1. Strict parsing

     from typing import TypedDict, Union

     Point2D = TypedDict('Point2D', {'x': int, 'y': int})
     Point3D = TypedDict('Point3D', {'x': int, 'y': int, 'z': int})

     def parse_point(point_json: dict) -> Union[Point2D, Point3D]:
         match point_json:
             case {'x': int(x), 'y': int(y)}:
                 return Point2D({'x': x, 'y': y})
             case {'x': int(x), 'y': int(y), 'z': int(z)}:
                 return Point3D({'x': x, 'y': y, 'z': z})
             case _:
                 raise ValueError(f'not a valid point: {point_json!r}')

2. Loose parsing, discarding unknown data.
    Common when reading JSON-like data that does not need to be output 
    again later.

     from typing import Dict, TypedDict

     TodoItem_ReadOnly = TypedDict(
         'TodoItem_ReadOnly', {'title': str, 'completed': bool})

     def parse_todo_item(todo_item_json: Dict) -> TodoItem_ReadOnly:
         match todo_item_json:
             case {'title': str(title), 'completed': bool(completed), **_}:
                 return TodoItem_ReadOnly(
                     {'title': title, 'completed': completed})
             case _:
                 raise ValueError()

     input = {'title': 'Buy groceries', 'completed': True,
              'assigned_to': ['me']}
     # prints: {'title': 'Buy groceries', 'completed': True}
     print(parse_todo_item(input))

3. Loose parsing, preserving unknown data.
    Common when parsing JSON-like data that needs to be round-tripped 
    and output again later.

     from typing import Any, Dict, TypedDict

     TodoItem_ReadWrite = TypedDict(
         'TodoItem_ReadWrite',
         {'title': str, 'completed': bool, 'extra': Dict[str, Any]})

     def parse_todo_item(todo_item_json: Dict) -> TodoItem_ReadWrite:
         match todo_item_json:
             case {'title': str(title), 'completed': bool(completed),
                   **extra}:
                 return TodoItem_ReadWrite(
                     {'title': title, 'completed': completed,
                      'extra': extra})
             case _:
                 raise ValueError()

     def format_todo_item(item: TodoItem_ReadWrite) -> Dict:
         return {'title': item['title'], 'completed': item['completed'],
                 **item['extra']}

     input = {'title': 'Buy groceries', 'completed': True,
              'assigned_to': ['me']}
     output = format_todo_item(parse_todo_item(input))
     # prints: {'title': 'Buy groceries', 'completed': True,
     #          'assigned_to': ['me']}
     print(output)

Comments?

-- 
David Foster | Seattle, WA, USA
Contributor to TypedDict support for mypy
