The proposal can be trickier than one may think. Leave aside the precision and width specifications that f'{foo:02.2f}' can take. Leave aside even the pattern match syntax. Suppose the following pattern (to differentiate it from an f-string I'm going to use "p-strings"):

    pattern = p"Your id is {id}"

Somehow you then use the pattern to parse an input and get the id variable:

    id = None
    pattern.match(some_input)
    print(id)
    => 42

where some_input would be "Your id is 42". So far so good. But now, let's play with the possibilities of some_input. The following will fail to match (or match incorrectly) due to slight differences:

    "your id is 42"   => fails to match
    "Your id is 42."  => matches "42.", probably not correct
    "Your id is  42"  => fails to match

With larger inputs/patterns, it can be really hard to spot where the differences are.

What about this pattern?

    pattern = p"Say hello: {hello}{world}"
    pattern.match("Say hello: foobar")
    print(hello)
    => ???
    print(world)
    => ???

I implemented something like the above myself as part of the engine of byexample. There, users write the expected texts and can optionally match fragments of the text in a way very similar to what is presented in this thread ("capturing" fragments, as it is known in the byexample docs).

Trust me, those "minor differences" appear all the time and there are a few gotchas here and there. A too-exact pattern matching will probably be useful only to a narrow set of use cases. And if a pattern fails, spotting where the differences are can be non-trivial.

Computing just a naive diff between the pattern and the input will not work (well, it will work, but the diff will be much longer than needed):

    diff(expected=p"Your id is {id}", obtained="your id is 42")
    =>
    - Your id is {id}
    ? ^          ^^^^
    + your id is 42
    ? ^          ^^

The naive diff may point out the real difference ("Y" vs "y") but it will also flag non-real differences like "{id}" vs "42". Larger patterns/inputs make this much worse very quickly.
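To make the failure modes above concrete, here is a minimal sketch of how such a p-string could be compiled down to a plain regex. compile_pattern is a hypothetical helper, not part of any proposal; a real engine would also have to handle escaping of braces, format specs, whitespace normalization and greediness:

```python
import re

def compile_pattern(pattern: str) -> re.Pattern:
    """Compile a p-string-like pattern into a regex.

    Each {name} becomes a greedy named group (.*); everything
    else is matched literally.
    """
    out, pos = [], 0
    for m in re.finditer(r'\{(\w+)\}', pattern):
        out.append(re.escape(pattern[pos:m.start()]))
        out.append(f'(?P<{m.group(1)}>.*)')
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return re.compile(''.join(out) + '$')

# The happy path works:
m = compile_pattern("Your id is {id}").match("Your id is 42")
print(m.group("id"))            # -> '42'

# Exactness bites: a trailing period leaks into the capture.
m = compile_pattern("Your id is {id}").match("Your id is 42.")
print(m.group("id"))            # -> '42.'

# Case sensitivity bites: this simply fails to match.
print(compile_pattern("Your id is {id}").match("your id is 42"))  # -> None

# Ambiguity bites: with greedy groups, {hello} eats everything.
m = compile_pattern("Say hello: {hello}{world}").match("Say hello: foobar")
print(m.group("hello"), repr(m.group("world")))  # -> foobar ''
```

Switching the groups to lazy (.+?) just moves the ambiguity around instead of removing it: some split of "foobar" has to be chosen, and no choice is obviously "the" right one.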
In byexample I spent a lot of time trying to compute a meaningful diff for the users. I improved things to some extent, but there are patterns whose diff cannot be simplified, and debugging those match failures is more than challenging.

If there is interest, I could refactor byexample and extract the pattern match engine out of it as a library, so we would have a concrete, functional lib to test its usefulness in the real world. I could perhaps extract the diff engine too.

For reference:

Pattern matching (in byexample they are called capture tags):
https://byexamples.github.io/byexample/basic/capture-and-paste

How greedy/lazy capturing can affect the outcome:
https://byexamples.github.io/byexample/advanced/greedy-lazy-tags

When the input has some unwanted/unknown amount of whitespace, it may be desirable to make the pattern more relaxed:
https://byexamples.github.io/byexample/basic/normalize-whitespace

How the diff is computed when capture tags are present in the pattern:
https://byexamples.github.io/byexample/overview/differences#guessing-the-tag...

Sorry for the self-promotion :)

Thanks,
Martin.

On Sun, Aug 14, 2022 at 02:45:57AM +0900, Stephen J. Turnbull wrote:
Sorry, I accidentally sent before I was done.
Tushar Sadhwani writes:
Since Python has built-in syntax for interpolated strings, I believe it's a good idea to extend it to pattern matching, like so:
    def unquote(string: str) -> str:
        match string:
            case f'"{value}"':
                return value
            case f"'{value}'":
                return value
            case _:
                return string
I think this is a pretty unconvincing example. While people seem to love to hate on regular expressions, it's hard to see how that beats
    def unquote(string: str) -> str:
        m = re.match(r"""^(?:
             "(.*)"   # "-delimiters
            |'(.*)'   # '-delimiters
            |(.*))    # no delimiters
            $""", string, re.VERBOSE)
        return m.group(1) or m.group(2) or m.group(3)
and this is absolutely clearer than either pattern-matching approach:
    def unquote(string: str) -> str:
        # Gilding the lily, but it's obvious how to extend to other
        # symmetric delimiters, and straightforward for asymmetric
        # delimiters.
        for quotechar in ("'", '"'):
            if string.startswith(quotechar) and string.endswith(quotechar):
                return string[1:-1]
        else:
            return string
Doing this with current match syntax is not as easy.
Sure, but that's why we have .startswith and .endswith, and for more complex cases, why we have regular expressions. Chris Angelico has proposed adding (something like) C's scanf to the stdlib, as well.
I have other reasons to consider this idea as well,
Given the above, I really think you need to bring those up if you want to pursue this idea.
but before I try to pursue this, I'd like to know if something like this was already discussed when the proposal was being made?
Looking at PEPs 634-6, and especially 635, I doubt it (but I didn't participate in those discussions). On the one hand, the match statement is for matching (a subset of) Python expression structure, not string structure. On the other, although an f-string *is* an expression, it doesn't really feel like one, to me, anyway. Also, I think it would be miserable to code (do you really want to support the full replacement field syntax with widths, justification, precision, and more) and the bikeshedding will be horrific (supposing you support a restricted field syntax with no width and precision, is
    case f"{value:03.2f}":
an error or do you ignore the padding, width, and precision?)
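To make that question concrete, here is one purely hypothetical reading, sketched with a hand-written regex: treat the format spec as a constraint on the text that str.format could have produced with that spec (the regex below is an assumption about what ":03.2f" would mean in reverse, not anything specified anywhere):

```python
import re

# Hypothetical reverse reading of the spec "03.2f": an optionally
# signed number with exactly two decimal places (the zero-padded
# minimum width of 3 is already implied by "d.dd").
FLOAT_03_2F = re.compile(r'-?\d+\.\d{2}$')

print(bool(FLOAT_03_2F.match("3.14")))  # -> True  (f"{3.14:03.2f}" == "3.14")
print(bool(FLOAT_03_2F.match("3.1")))   # -> False (wrong precision)
```

Every field in the format mini-language (fill, align, sign, grouping, ...) would need a decision like this, which is exactly where the bikeshedding starts.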
It's a very interesting suggestion, but I suspect the bikeshedding is going to be more fun than the implementation.
Steve