**Matt del Valle** wrote:
I couldn't find any information on whether this was considered and rejected during the pattern matching PEP, so apologies if this is already settled. I've just encountered this use-case repeatedly, so I figured it was time to make a thread about it.

Basically, I'd like to be able to designate a pattern like this:

```python
match val:
    case [*str()]:
        ...
```

Where 'val' will match an iterable containing zero or more strings.

This should also work with more complex sub-patterns, like:

```python
match humans:
    case [
        *{
            'name': str(),
            'age': int(),
        }
    ]:
        ...
```

Where 'humans' should match an iterable of zero or more mappings, where each mapping contains the keys 'name' and 'age' (and where the value for 'name' is a string and the value for 'age' is an int).

Sure, you can use a guard clause to do it, but imagine you've got a case statement where you're using 10 different variadic sub-patterns. You'd have an unbearably long string of:

```python
case {PATTERN_GOES_HERE} if (
    all(isinstance(val1, some_type1) for val1 in sub_pattern1)
    and all(isinstance(val2, some_type2) for val2 in sub_pattern2)
    and ...
):
```

And that's assuming simple variadics like my first example above, not my second example! This incidentally destroys the gain in legibility of using a pattern in the first place.

Is anyone aware of a reason variadic patterns weren't included in the PEP? Was it an oversight, or was it deemed too niche to include in the original spec? Or were there implementation-related reasons? Something else? It just seems like it would make match statements way more powerful for data shape validation, which is one of the primary use-cases that the tutorial PEP showed off.

Cheers
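As a point of reference, here is a minimal sketch of the guard-clause workaround described above, written in today's syntax; the handler functions are just placeholders for illustration:

```python
def handle_strings(items):
    print("all strings:", items)

def handle_other(val):
    print("something else:", val)

def dispatch(val):
    match val:
        # Today the variadic type check has to live in a guard clause; under
        # the proposal it could be written directly as `case [*str()]:`.
        case [*items] if all(isinstance(item, str) for item in items):
            handle_strings(items)
        case _:
            handle_other(val)

dispatch(["a", "b"])  # all strings: ['a', 'b']
dispatch(["a", 3])    # something else: ['a', 3]
```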
**Valentin Berlier** wrote:
I really like this. One problem though is that it's not immediately obvious what happens with binding patterns and "as" clauses. In the code inside the case statement, should these identifiers refer to the last value matched, or should they accumulate all the matches in a list?
**Matt del Valle** wrote:
Great point, Valentin. I do think it's worthwhile to allow capturing variadic sub-patterns while destructuring. For symmetry with variadic function arguments (`*args`) I would actually suggest accumulating all the values in a tuple and binding that tuple to the name, but I'm not precious about a tuple vs. a list. Bikesheddable, imo.

Something I forgot to mention in the original proposal: unlike variadic function arguments, where you can only have variadic positionals once in the signature, in a pattern the following should be valid:

```python
match heterogeneous:
    case [*str() as strings, 3, *bool()]:
        ...
```

Where 'heterogeneous' should match an iterable starting with zero or more strings and bind them to (subject to bikeshedding) a tuple with the name 'strings', followed by the integer 3, followed by zero or more booleans.
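To pin down the intended semantics, here is a rough sketch (an editorial illustration, not part of the proposal) of what that pattern would have to do, written as a plain function in today's Python; greedy left-to-right matching is assumed, which happens to be unambiguous for this particular pattern:

```python
def match_heterogeneous(seq):
    """Approximate the proposed `[*str() as strings, 3, *bool()]` pattern.

    Returns the tuple that would be bound to `strings` on a match,
    or None if the sequence does not match.
    """
    items = list(seq)
    i = 0
    # *str(): consume a (possibly empty) run of leading strings.
    while i < len(items) and isinstance(items[i], str):
        i += 1
    strings = tuple(items[:i])
    # The literal 3 must come next...
    if i >= len(items) or items[i] != 3:
        return None
    # ...followed by zero or more booleans.
    if not all(isinstance(x, bool) for x in items[i + 1:]):
        return None
    return strings

print(match_heterogeneous(["a", "b", 3, True, False]))  # ('a', 'b')
print(match_heterogeneous([3]))                         # ()
print(match_heterogeneous(["a", 4]))                    # None
```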
**Christopher Barker** wrote:
On Tue, Sep 13, 2022 at 8:44 AM Matt del Valle <matthewgdv@gmail.com> wrote:

> ```python
> match val:
>     case [*str()]:
>         ...
> ```
>
> Where 'val' will match an iterable containing zero or more strings.
>
> This should also work with more complex sub-patterns, like:
>
> ```python
> match humans:
>     case [
>         *{
>             'name': str(),
>             'age': int(),
>         }
>     ]:
>         ...
> ```
>
> Where 'humans' should match an iterable of zero or more mappings where each mapping contains the keys 'name' and 'age' (and where the value for name is a string and the value for age is an int).

I haven't yet used pattern matching, but even without that context, this strikes me as matching on type, rather than value, which doesn't, in my mind, belong in code that's going to run at runtime. I wouldn't write:

```python
if obj and all(isinstance(o, str) for o in obj):
    do_something()
```

so why would I use it in pattern matching?

Granted, the distinction between type and value gets a bit messy in Python, but I'd think:

> each mapping contains the keys 'name' and 'age'

would be value, but:

> (and where the value for name is a string and the value for age is an int)

would be type.

I do see that kind of checking being useful for, e.g., unpacking JSON data, but I'm not sure this is where it belongs -- that kind of checking, to me, is validation of the JSON, and defining the structure belongs in a different place, in a different way.

I would be interested in hearing about the use cases you have in mind, and where that fits into the whole type vs. value checking continuum.

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
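A tiny illustration of the distinction being drawn here, for a single record (the example data is made up):

```python
record = {"name": "Ada", "age": 36}

# "Value"/structure check: are the expected keys present?
has_shape = "name" in record and "age" in record

# Type check: do the values have the expected Python types?
has_types = isinstance(record.get("name"), str) and isinstance(record.get("age"), int)
```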
**Matt del Valle** wrote:
Heya Chris.

Not sure where you're getting the idea that pattern matching is meant to be only for attributes or values and not types. Matching on type is one of the main examples the tutorial PEP showed off (see the example where an event is matched against several different types such as Click, KeyPress, or Quit). Saying it is categorically bad to match on type seems to me an overly dogmatic statement, which might certainly be true in the kind of code you write in your area of work, but in general there are more use-cases for isinstance-like behaviour than I could possibly name.

Just to mention the first example that comes to mind, the pandas DataFrame constructor is capable of accepting many different orientations of data to construct a DataFrame from (row-wise as a list of tuples with 'columns=' passed as an argument, record-wise as a list of dicts, etc.). It also needs to be able to infer the type of each Series from the data passed. While for pandas in particular a pure-Python implementation would never be deemed sufficiently performant, you can imagine similar but domain-specific lightweight tabular data classes needing to do the same thing and opting to use the type-matching capabilities of the match statement for it (if it could handle variadic patterns).

My use-cases have usually been in data engineering: for example, writing integration tests that validate that a given API returns payloads with the correct schema (including types), or validating that the front-end designers making internal API calls within a webapp are actually passing valid JSON payloads (both structure and type) in a single compact statement. I realize there are more heavyweight alternatives like, for instance, Pydantic models, but there are environments where people don't have permission to install third-party libraries, or where compliance requirements prevent it, or where the scope of the work is so small that it might feel like overkill to reach for something with a potentially steep learning curve.

Regardless, given that the match statement already supports matching on type as a first-class operation, I think this particular point is somewhat off-topic and would have belonged when structural pattern matching was first being discussed. My suggestion is to allow variable-length patterns (and sub-patterns). The fact that Python patterns have the ability to match on type isn't really relevant, because that is existing functionality.

Cheers,
Matt
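For illustration, here is a rough sketch of the kind of payload-shape check described above, as it has to be written today; the payload shape and function name are invented for the example:

```python
def is_valid_payload(payload):
    """Check that `payload` is a list of records shaped like {'name': str, 'age': int}.

    Under the proposal this could be a single
    `case [*{'name': str(), 'age': int()}]:` pattern; today the per-element
    typing has to live in a guard clause.
    """
    match payload:
        case [*records] if all(
            isinstance(r, dict)
            and isinstance(r.get("name"), str)
            and isinstance(r.get("age"), int)
            for r in records
        ):
            return True
        case _:
            return False

print(is_valid_payload([{"name": "Ada", "age": 36}]))    # True
print(is_valid_payload([{"name": "Ada", "age": "36"}]))  # False
```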
**Christopher Barker** wrote:
On Thu, Sep 15, 2022 at 4:57 AM Matt del Valle <matthewgdv@gmail.com> wrote:

> Heya Chris.
>
> Not sure where you're getting the idea that pattern matching is meant to be only for attributes or values and not types.

I don't -- I have the idea that Python itself is not really about Types. Honestly, it makes me nervous when supposedly "optional" type hints start making their way into built-in parts of the language and standard idioms.

> Matching on type is one of the main examples the tutorial PEP showed off (see the example where an event is matched against several different types such as Click, KeyPress, or Quit).

I guess my feeling is that there are better ways to solve those kinds of problems in Python.

> Saying it is categorically bad to match on type seems to me an overly dogmatic statement, which might certainly be true in the kind of code you write in your area of work, but in general there are more use-cases for isinstance-like behaviour than I could possibly name. Just to mention the first example that comes to mind, the pandas DataFrame constructor is capable of accepting many different orientations of data to construct a DataFrame from (row-wise as a list of tuples with 'columns=' passed as an argument, record-wise as a list of dicts, etc.). It also needs to be able to infer the type of each Series from the data passed.

I'm not so sure that matching on Python type is the way to do that -- for example, you may have a collection of floats and ints, and you have to look at values to do that right. And more often than not, the source is something like a CSV file, where it's all a string type. Even unpacking JSON -- JSON has a much less rich type system than Python -- so properly determining what a JSON object is requires more than type checking.

> While for pandas in particular a pure-Python implementation would never be deemed sufficiently performant, you can imagine similar but domain-specific lightweight tabular data classes needing to do the same thing and opting to use the type-matching capabilities of the match statement for it (if it could handle variadic patterns).
>
> My use-cases have usually been in data engineering: for example, writing integration tests that validate that a given API returns payloads with the correct schema (including types)

See above -- simple type checking is likely to be inadequate -- and I'm not sure validation is the right use case for pattern matching either.

> Regardless, given that the match statement already supports matching on type as a first-class operation, I think this particular point is somewhat off-topic and would have belonged when structural pattern matching was first being discussed. My suggestion is to allow variable-length patterns (and sub-patterns). The fact that Python patterns have the ability to match on type isn't really relevant, because that is existing functionality.

This is the key point -- and you are quite right. I'll let others comment on whether this extension to pattern matching makes sense -- I haven't really dug into it enough to have an opinion.

-CHB
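A small sketch of the CSV point above (the function name is made up): every cell arrives as a string, so deciding that a column is "really" ints or floats is a question about values rather than about isinstance:

```python
def infer_column_type(cells):
    """Guess a column's type from CSV cells, which all start out as str."""
    try:
        for cell in cells:
            int(cell)
        return int
    except ValueError:
        pass
    try:
        for cell in cells:
            float(cell)
        return float
    except ValueError:
        return str

print(infer_column_type(["1", "2", "3"]))  # <class 'int'>
print(infer_column_type(["1.5", "2"]))     # <class 'float'>
print(infer_column_type(["a", "2"]))       # <class 'str'>
```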
**Stephen J. Turnbull** wrote:
Christopher Barker writes:

> I don't -- I have the idea that Python itself is not really about Types.

Of course it is, and always has been. Types are even first-class objects in Python. What Python has never been about is universal type-checking.

> Honestly, it makes me nervous when supposedly "optional" type hints start making their way into built-in parts of the language and standard idioms.

Type hints are easily recognizable from the syntax, and match statements aren't that. This isn't creeping typing-ism at all. Python has *always* (at least since 1996 or so, when I first encountered it) made "look before you leap" and "dispatch on type" styles possible. The match statement is clearly about enabling exactly the latter kind of programming:

Matt del Valle writes:

> > Matching on type is one of the main examples the tutorial PEP showed off (see the example where an event is matched against several different types such as Click, KeyPress, or Quit).
>
> I guess my feeling is that there are better ways to solve those kinds of problems in Python.

Like what? Dispatching on type is fundamental to event-oriented programming. Of course you can write

```python
if isinstance(event, Click):
    x = event.position.x
    y = event.position.y
    button = event.button
    handle_click(x, y, button)
```

or

```python
if isinstance(event, Click):
    handle_click(event.position.x, event.position.y, event.button)
```

instead of

```python
match event:
    case Click(position=(x, y), button=button):
        handle_click(x, y, button)
```

and I can see an argument that the former is "just as good" as the latter and we don't need match, but now that we do have the match statement, I can't see any argument that either if form is *better*.

> This is the key point -- and you are quite right. I'll let others comment on whether this extension to pattern matching makes sense -- I haven't really dug into it enough to have an opinion.
About Matt's variadic container matching: my guess is that it was considered, but it's not easy to find pleasant applications, because you can only do

```python
(x, y, *rest) = whatever
```

destructuring on the whole sequence, which is a one-liner in a case that catches sequences. The typecheck for simple types such as str, or even a user-defined class, is also a one-liner (`all(isinstance(o, type) for o in container)`). I don't think turning those three lines into one is irresistibly attractive, which means it's most interesting for a case where you have a "type" defined inline (typically as a dict, I guess). I'm not sure how common that is. And in a case like:

```python
match mailbox:
    case [*{"From": author, "To": recipient, "_payload": body}]:
        do_something(mailbox)
```

isn't do_something very likely to be of the form

```python
for message in mailbox:
    do_something(message)
```

and you have to destructure message explicitly anyway? So in such cases I would expect that

```python
for message in mailbox:
    match message:
        case {"From": author, "To": recipient, "_payload": body}:
            do_something(author, recipient, body)
```

is a simple and obvious transform.

On the other hand, typecheck failures on objects in a container can be expensive if the container is large. You can only recover from them by checking for a container of untyped objects, and then handling that elementwise anyway.

So on general principles, I don't think this idea is bad on the face of it, but it's not obviously great, either. The usual policy in such cases is to wait for a killer app before adding the feature (cf. the @ operator, which waited for literally decades despite having an obvious killer app, at least for a couple of communities that were pretty large then and are huge now). If you really could use match(data) to parse data and create a DataFrame in pure Python, that would be a strong PoC, though not yet a killer app, since I doubt Pandas would use it. Obviously something sufficiently performant to be widely used would be nice.
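A brief sketch, in today's syntax, of the recovery path described above (checking for an untyped container and then handling elements one by one); the handler callables are placeholders:

```python
def process(mailbox, handle_typed, handle_one):
    match mailbox:
        # "Typed container" case: every message has the expected keys.
        # (Under the proposal, this guard would move into the pattern itself.)
        case [*messages] if all(
            isinstance(m, dict) and {"From", "To", "_payload"} <= m.keys()
            for m in messages
        ):
            handle_typed(messages)
        # Untyped fallback: we end up handling the elements one by one anyway.
        case [*messages]:
            for message in messages:
                handle_one(message)
```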
participants (4)

- Christopher Barker
- Matt del Valle
- Stephen J. Turnbull
- Valentin Berlier