PEP Idea: native f-string support as a match pattern
Since Python has built-in syntax for interpolated strings, I believe it's a good idea to extend it to pattern matching, like so:

    def unquote(string: str) -> str:
        match string:
            case f'"{value}"':
                return value
            case f"'{value}'":
                return value
            case _:
                return string

Doing this with current match syntax is not as easy.

I have other reasons to consider this idea as well, but before I try to pursue this, I'd like to know if something like this was already discussed when the proposal was being made?
Tushar Sadhwani writes:

> Since Python has built-in syntax for interpolated strings, I believe it's a good idea to extend it to pattern matching, like so:
>
>     def unquote(string: str) -> str:
>         match string:
>             case f'"{value}"':
>                 return value
>             case f"'{value}'":
>                 return value
>             case _:
>                 return string

I think this is a pretty unconvincing example. While people seem to love to hate on regular expressions, it's hard to see how that beats

    def unquote(string: str) -> str:
        m = re.match(r"^(?:\"(.*)\"|'(.*)'|(?Pvalue3))$", string)

> Doing this with current match syntax is not as easy.

> I have other reasons to consider this idea as well, but before I try to pursue this, I'd like to know if something like this was already discussed when the proposal was being made?

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4D2CRU...
Code of Conduct: http://python.org/psf/codeofconduct/
Stephen J. Turnbull wrote:
> I think this is a pretty unconvincing example. While people seem to love to hate on regular expressions, it's hard to see how that beats
>
>     def unquote(string: str) -> str:
>         m = re.match(r"^(?:\"(.*)\"|'(.*)'|(?Pvalue3))$", string)

RegEx feels like overkill for this. It certainly takes longer to read, understand and test.

Here's a more convincing example. Let's build a parser for an imaginary data format:

```python
data = '''\
scores:
- Matthew: 100
- David: 90
groceries:
- spam: 3
- eggs: 12
'''

parsed = {}
for line in data.splitlines():
    # Proposed syntax: f-string patterns are not valid in Python today.
    match line:
        case f"{heading}:":
            parsed[heading] = {}
        case f"- {name}: {count}":
            parsed[heading][name] = int(count)

print(parsed)
# Gives {'scores': {'Matthew': 100, 'David': 90}, 'groceries': {'spam': 3, 'eggs': 12}}
```
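For reference, the imaginary format above can be parsed today with `re.fullmatch`. The following is only a sketch: the two regular expressions, the `parse` function name, and the decision to exclude colons from headings and names are assumptions made for this example, not part of the proposal.

```python
import re

data = """\
scores:
- Matthew: 100
- David: 90
groceries:
- spam: 3
- eggs: 12
"""

# Assumed grammar: headings and names contain no colon, counts are digits.
HEADING = re.compile(r"(?P<heading>[^:]+):")
ITEM = re.compile(r"- (?P<name>[^:]+): (?P<count>\d+)")


def parse(text: str) -> dict:
    parsed = {}
    heading = None
    for line in text.splitlines():
        if m := ITEM.fullmatch(line):
            if heading is None:
                raise ValueError(f"item before any heading: {line!r}")
            parsed[heading][m["name"]] = int(m["count"])
        elif m := HEADING.fullmatch(line):
            heading = m["heading"]
            parsed[heading] = {}
    return parsed


print(parse(data))
```

Note that the regex version has to state explicitly what a field may contain (`[^:]+`, `\d+`), which is exactly the information an f-string pattern would leave implicit.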
On Sat, Aug 13, 2022 at 10:23 AM Tushar Sadhwani <tushar.sadhwani000@gmail.com> wrote:

> Stephen J. Turnbull wrote:
> > I think this is a pretty unconvincing example. While people seem to love to hate on regular expressions, it's hard to see how that beats
> >
> >     def unquote(string: str) -> str:
> >         m = re.match(r"^(?:\"(.*)\"|'(.*)'|(?Pvalue3))$", string)
>
> RegEx feels overkill for this. Certainly takes longer to read, understand and test.

f-strings are inherently ambiguous. You've chosen cases with unambiguous and mutually exclusive prefixes, but that won't generally be the case. You'd need implicit rules about greediness, which could easily lead to some buggy code looking correct.

For what it's worth, Raymond Hettinger has some code examples that make using regexes in match statements much easier: see
https://www.dropbox.com/s/w1bs8ckekki9ype/PyITPatternMatchingTalk.pdf?dl=0
and
https://twitter.com/i/web/status/1533369943764488192

Best wishes,
Lucas Wiman
Sorry, I accidentally sent before I was done.

Tushar Sadhwani writes:

> Since Python has built-in syntax for interpolated strings, I believe it's a good idea to extend it to pattern matching, like so:
>
>     def unquote(string: str) -> str:
>         match string:
>             case f'"{value}"':
>                 return value
>             case f"'{value}'":
>                 return value
>             case _:
>                 return string

I think this is a pretty unconvincing example. While people seem to love to hate on regular expressions, it's hard to see how that beats

    import re

    def unquote(string: str) -> str:
        m = re.match(r"""^(?:
                "(.*)"     # "-delimiters
                |'(.*)'    # '-delimiters
                |(.*))     # no delimiters
                $""",
            string,
            re.VERBOSE)
        return m.group(1) or m.group(2) or m.group(3)

and this is absolutely clearer than either pattern-matching approach:

    def unquote(string: str) -> str:
        # Gilding the lily, but it's obvious how to extend to other
        # symmetric delimiters, and straightforward for asymmetric
        # delimiters.
        for quotechar in ("'", '"'):
            if string.startswith(quotechar) and string.endswith(quotechar):
                return string[1:-1]
        else:
            return string
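A small variation on the `.startswith`/`.endswith` loop, offered only as a sketch: it assumes that a string consisting of a single quote character should be returned unchanged rather than reduced to the empty string. That requirement is not stated anywhere in the thread.

```python
def unquote(string: str) -> str:
    for quotechar in ("'", '"'):
        if (
            len(string) >= 2  # a lone quote character is not a quoted string
            and string.startswith(quotechar)
            and string.endswith(quotechar)
        ):
            return string[1:-1]
    return string


print(repr(unquote('"spam"')))
print(repr(unquote('"')))
print(repr(unquote('""')))
```

The length check is the only change; extending the tuple of delimiters works the same way as in the version above.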
> Doing this with current match syntax is not as easy.
Sure, but that's why we have .startswith and .endswith, and for more complex cases, why we have regular expressions. Chris Angelico has proposed adding (something like) C's scanf to the stdlib, as well.
> I have other reasons to consider this idea as well,
Given the above, I really think you need to bring those up if you want to pursue this idea.
> but before I try to pursue this, I'd like to know if something like this was already discussed when the proposal was being made?
Looking at PEPs 634-6, and especially 635, I doubt it (but I didn't participate in those discussions). On the one hand, the match statement is for matching (a subset of) Python expression structure, not string structure. On the other, although an f-string *is* an expression, it doesn't really feel like one, to me, anyway.

Also, I think it would be miserable to code (do you really want to support the full replacement field syntax with widths, justification, precision, and more?) and the bikeshedding will be horrific (supposing you support a restricted field syntax with no width and precision, is

    case f"{value:03.2f}":

an error, or do you ignore the padding, width, and precision?).

It's a very interesting suggestion, but I suspect the bikeshedding is going to be more fun than the implementation.

Steve
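To make the format-spec question concrete, here is a short sketch showing that formatting is lossy, so there is no single obvious inverse for a matcher to apply. The helper name `render` is made up for this example.

```python
def render(value: float) -> str:
    # The same replacement field as in the question above.
    return f"{value:03.2f}"


# Two different inputs produce the same text, so a pattern that
# "un-formats" this text cannot tell which value was meant.
print(render(3.14159))
print(render(3.14))

# Zero padding is lost on a round trip through int().
padded = f"{7:03d}"
print(padded, int(padded), f"{int(padded)}")
```

Whether a matcher should reject `"3.14159"` for this field, accept it, or ignore the spec entirely is the design decision the paragraph above expects people to argue about.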
The proposal can be trickier than one may think.

Leave aside the precision and width specifications that f'{foo:02.f}' can get. Leave aside even the pattern match syntax.

Suppose the following pattern (to differentiate it from f-strings I'm going to use p-strings):

    pattern = p"Your id is {id}"

Somehow you then use the pattern to parse an input and get the id variable:

    id = None
    pattern.match(some_input)
    print(id)   => 42

where some_input would be "Your id is 42". So far so good.

But now, let's play with the possibilities of some_input. The following will fail to match, or match the wrong thing, due to minor differences:

    "your id is 42"   => fails to match
    "Your id is 42."  => matches "42.", probably not correct
    "Your id  is 42"  => fails to match

With larger inputs/patterns, it can be really hard to spot where the differences are.

What about this pattern?

    pattern = p"Say hello: {hello}{world}"
    pattern.match("Say hello: foobar")
    print(hello)  => ???
    print(world)  => ???

I implemented something like the above myself as part of the engine of byexample. There, users write the expected text and can optionally match fragments of it in a way very similar to what is presented in this thread. This is known as "capturing" fragments in the byexample docs.

Trust me, those "minor differences" appear all the time and there are a few gotchas here and there.

A too-exact pattern matching will probably be useful to only a narrow set of use cases.

And if a pattern fails, spotting where the differences are can be non-trivial. Computing just a naive diff between the pattern and the input will not work (well, it will work, but the diff will be much longer):

    diff(expected=p"Your id is {id}", obtained="your id is 42") =>
    - Your id is {id}
    ? ^          ^^^^
    + your id is 42
    ? ^          ^^

The naive diff may point out the real difference ("Y" and "y"), but it will also flag non-real differences like "{id}" and "42".

Larger patterns / inputs make this much worse very quickly.
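The strictness and ambiguity described above can be reproduced with a small sketch that translates a p-string-like template into a regular expression. Everything here is hypothetical: `compile_pattern`, its `greedy` flag, and the rule that every field captures one or more arbitrary characters are assumptions for illustration, not byexample's engine and not part of the proposal.

```python
import re


def compile_pattern(template: str, greedy: bool = False) -> re.Pattern:
    """Turn "text {name} text" into a regex with one named group per field."""
    quantifier = ".+" if greedy else ".+?"
    # With one capturing group, re.split alternates literal text (even
    # indexes) and field names (odd indexes).
    parts = re.split(r"\{(\w+)\}", template)
    regex = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>{quantifier})"
        for index, part in enumerate(parts)
    )
    return re.compile(regex)


id_pattern = compile_pattern("Your id is {id}")
for text in ("Your id is 42", "your id is 42", "Your id is 42.", "Your id  is 42"):
    m = id_pattern.fullmatch(text)
    print(repr(text), "=>", m["id"] if m else "fails to match")

# The split between adjacent fields depends entirely on the greediness rule.
for greedy in (False, True):
    m = compile_pattern("Say hello: {hello}{world}", greedy).fullmatch("Say hello: foobar")
    print("greedy" if greedy else "lazy", m["hello"], m["world"])
```

Under these assumptions the lazy rule yields `hello='f'` and `world='oobar'`, while the greedy rule yields `hello='fooba'` and `world='r'`; neither is obviously what the author of the pattern wanted.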
In byexample I spent a lot of time trying to compute a meaningful diff for the users, and I improved things to some extent, but there are patterns whose diff cannot be simplified, and debugging those match failures is more than challenging.

If there is interest, I could refactor byexample and extract the pattern match engine out of it as a lib, so we can use a concrete, functional lib to test its usefulness in the real world. I could perhaps extract the diff engine too.

For reference:

Pattern matching (in byexample they are called capture tags):
https://byexamples.github.io/byexample/basic/capture-and-paste

How greedy/lazy capturing can affect the outcome:
https://byexamples.github.io/byexample/advanced/greedy-lazy-tags

When the input has some unwanted/unknown amount of whitespace, it would be desirable to make the pattern more relaxed:
https://byexamples.github.io/byexample/basic/normalize-whitespace

How the diff is computed when capture tags are present in the pattern:
https://byexamples.github.io/byexample/overview/differences#guessing-the-tag...

Sorry for the self-promotion :)

Thanks,
Martin.

On Sun, Aug 14, 2022 at 02:45:57AM +0900, Stephen J. Turnbull wrote:

> Sorry, I accidentally sent before I was done.
> [...]
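The problem with a naive diff between a pattern such as "Your id is {id}" and an input such as "your id is 42" can be illustrated with the standard library's `difflib.SequenceMatcher`. This is a sketch only: the helper names are invented, and the filter that drops differences whose expected side is a lone `{name}` tag is a simplistic assumption, not how byexample computes its diffs.

```python
import difflib
import re


def fragment_diffs(expected: str, obtained: str) -> list[tuple[str, str]]:
    """Return (expected_fragment, obtained_fragment) for each differing span."""
    matcher = difflib.SequenceMatcher(None, expected, obtained)
    return [
        (expected[i1:i2], obtained[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def ignoring_capture_tags(diffs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Assumption: a difference is "not real" when the expected side is
    # exactly one capture tag such as {id}.
    return [(exp, got) for exp, got in diffs if not re.fullmatch(r"\{\w+\}", exp)]


naive = fragment_diffs("Your id is {id}", "your id is 42")
print(naive)                         # [('Y', 'y'), ('{id}', '42')]
print(ignoring_capture_tags(naive))  # [('Y', 'y')]
```

The filter only works here because the tag happens to line up with a whole differing span; with longer inputs the matcher may align text differently, which is presumably where the hard cases mentioned above come from.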
participants (4)

- Lucas Wiman
- Martin Di Paola
- Stephen J. Turnbull
- Tushar Sadhwani