On Sat, Sep 19, 2020 at 12:10 PM Wes Turner <wes.turner@gmail.com> wrote:
Regex uses the ? symbol to indicate that something is a "non-greedy" match (to default to "shortest match")
exactly -- Regex was designed to be a parsing language, format specifiers were not. I'm quite surprised by how little the parse package has had to adapt the format language to a parsing language, but it has indeed adapted it.

I'm honestly not sure how confusing it would be to have a built-in parsing language that looks like the format one but behaves differently. I suspect it's particularly an issue if we did assigning to f-strings, and less so if it were a string method or stand-alone function.

Trying parse with my earlier example in this thread:

    In [1]: x, y, z = 23, 45, 67
    In [2]: a_string = f"{x}{y}{z}"
    In [3]: a_string
    Out[3]: '234567'
    In [4]: from parse import parse
    In [5]: parse("{x}{y}{z}", a_string)
    Out[5]: <Result () {'x': '2', 'y': '3', 'z': '4567'}>
    In [6]: parse("{x:d}{y:d}{z:d}", a_string)
    Out[6]: <Result () {'x': 2345, 'y': 6, 'z': 7}>

So that's interesting -- different level of "greediness" for strings than integers.

    In [7]: parse("{x:2d}{y:2d}{z:2d}", a_string)
    Out[7]: <Result () {'x': 23, 'y': 45, 'z': 67}>

And now we get back what we started with -- not bad.

I'm liking this -- I think it would be good to have parse, or something like it, in the stdlib, maybe as a string method. Then maybe consider some auto-assigning behavior -- though I'm pretty sceptical of that, and Wes' point about debugging is a good one. It would create a real debugging / testing nightmare to have stuff auto-assigned into locals.

-CHB
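For comparison, the three parse results above can be reproduced with plain regular expressions from the stdlib. This is only an analogy and makes no claim about how parse works internally. The patterns are my own, and they assume that untyped fields act like lazy groups, that `:d` fields act like greedy digit runs, and that `:2d` means exactly two digits.

```python
import re

a_string = "234567"

# Untyped fields modelled as lazy (shortest-match) groups.
lazy = re.fullmatch(r"(?P<x>.+?)(?P<y>.+?)(?P<z>.+)", a_string)
print(lazy.groupdict())  # {'x': '2', 'y': '3', 'z': '4567'}

# Integer fields modelled as greedy digit runs: the first group grabs
# as much as it can while still leaving one digit for each later group.
greedy = re.fullmatch(r"(?P<x>\d+)(?P<y>\d+)(?P<z>\d+)", a_string)
greedy_ints = {name: int(value) for name, value in greedy.groupdict().items()}
print(greedy_ints)  # {'x': 2345, 'y': 6, 'z': 7}

# A width of 2 modelled as exactly two digits (an assumption of this sketch).
fixed = re.fullmatch(r"(?P<x>\d{2})(?P<y>\d{2})(?P<z>\d{2})", a_string)
fixed_ints = {name: int(value) for name, value in fixed.groupdict().items()}
print(fixed_ints)  # {'x': 23, 'y': 45, 'z': 67}
```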
    import re
    str_ = "a:b:c"
    assert re.match(r'(.*):(.*)', str_).groups() == ("a:b", "c")
    assert re.match(r'(.*?):(.*)', str_).groups() == ("a", "b:c")
Typically, debugging parsing issues involves testing the output of a function (not changes to locals()).
Parse defaults to (case-insensitive) non-greedy/shortest-match:
parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:
    >>> pattern = '{dir1}/{dir2}'
    >>> data = 'root/parent/subdir'
    >>> sorted(parse(pattern, data).named.items())
    [('dir1', 'root'), ('dir2', 'parent/subdir')]
So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit the pattern, the actual match represents the shortest successful match for dir1.
https://github.com/r1chardj0n3s/parse#potential-gotchas
https://github.com/r1chardj0n3s/parse#format-specification :
Note: attempting to match too many datetime fields in a single parse() will currently result in a resource allocation issue. A TooManyFields exception will be raised in this instance. The current limit is about 15. It is hoped that this limit will be removed one day.
On Sat, Sep 19, 2020, 1:00 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
Parsing can be ambiguous:

    f"{x}:{y}" = "a:b:c"

Does this set

    x = "a"
    y = "b:c"

or

    x = "a:b"
    y = "c"

Rob Cliffe
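Both readings satisfy the target, which is the heart of the ambiguity. A minimal sketch that lists every way a two-field target with a literal separator can match; the helper name `all_splits` is hypothetical and not part of any proposal in this thread:

```python
def all_splits(text, sep=":"):
    """Return every (x, y) pair such that x + sep + y == text."""
    return [
        (text[:i], text[i + len(sep):])
        for i in range(len(text))
        if text.startswith(sep, i)  # sep occurs at position i
    ]


candidates = all_splits("a:b:c")
print(candidates)  # [('a', 'b:c'), ('a:b', 'c')]
```

Any real implementation has to pick one of these by rule, for example shortest-first or longest-first for the leading field.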
On 17/09/2020 05:52, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
    >>> s = "She turned me into a newt."
    >>> f"She turned me into a {animal}." = s
    >>> animal
    'newt'

    >>> f"A {animal}?" = s
    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        f"A {animal}?" = s
    ValueError: f-string assignment target does not match 'She turned me into a newt.'

    >>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59"
    >>> hh, mm, ss
    (11, 59, 59)
=== Rationale ===
Part of the reason I like f-strings so much is that they reduce the cognitive overhead of reading code: they allow you to see *what* is being inserted into a string in a way that also effortlessly shows *where* in the string the value is being inserted. There is no need to "paint-by-numbers" and remember which variable is {0} and which is {1} in an unnecessary extra layer of indirection. F-strings allow string formatting that is not only intelligible, but *locally* intelligible.
What I propose is the inverse feature, where you can assign a string to an f-string, and the interpreter will maintain an invariant kept in many other cases:
    >>> a[n] = 17
    >>> a[n] == 17
    True

    >>> obj.x = "foo"
    >>> obj.x == "foo"
    True

    # Proposed:
    >>> f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM"
    >>> f"It is {hh}:{mm} {am_or_pm}" == "It is 11:45 PM"
    True
    >>> hh
    '11'
This could be thought of as analogous to the C language's scanf function, something I've always felt was just slightly lacking in Python. I think such a feature would more clearly allow readers of Python code to answer the question "What kinds of strings are allowed here?". It would add certainty to programs that accept strings, confirming early that the data you have is the data you want. The code reads like a specification that beginners can understand in a blink.
=== Existing way of achieving this ===
As of now, you could achieve the behavior with regular expressions:
    >>> import re
    >>> pattern = re.compile(r'It is (.+):(.+) (.+)')
    >>> match = pattern.fullmatch("It is 11:45 PM")
    >>> hh, mm, am_or_pm = match.groups()
    >>> hh
    '11'
But this suffers from the same paint-by-numbers, extra-indirection issue that old-style string formatting runs into, an issue that f-strings improve upon.
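For reference, the re module's named groups put each name next to the part of the pattern it captures, at the cost of noisier syntax. A small sketch reusing the pattern above; the variable name `fields` is mine:

```python
import re

# Same pattern as above, with each group named in place.
pattern = re.compile(r"It is (?P<hh>.+):(?P<mm>.+) (?P<am_or_pm>.+)")
match = pattern.fullmatch("It is 11:45 PM")
fields = match.groupdict()
print(fields)  # {'hh': '11', 'mm': '45', 'am_or_pm': 'PM'}
```

This removes the positional bookkeeping but still leaves the reader to translate regex syntax back into the shape of the string.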
You could also do a strange mishmash of built-in str operations, like
    >>> s = "It is 11:45 PM"
    >>> empty, rest = s.split("It is ")
    >>> assert empty == ""
    >>> hh, rest = rest.split(":")
    >>> mm, am_or_pm = rest.split(" ")
    >>> hh
    '11'
But this is 5 different lines to express one simple idea. How many different times have you written a micro-parser like this?
=== Specification (open to bikeshedding) ===
In general, the goal would be to pursue the assignment-becomes-equal invariant above. By default, assignment targets within f-strings would be matched as strings. However, adding in a format specifier would allow the matches to be evaluated as different data types, e.g. f'{foo:d}' = "1" would make foo become the integer 1. If a more complex format specifier was added that did not match anything that the f-string could produce as an expression, then we'd still raise a ValueError:
    >>> f"{x:.02f}" = "0.12345"
    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        f"{x:.02f}" = "0.12345"
    ValueError: f-string assignment target does not match '0.12345'
If we're feeling adventurous, one could turn the !r repr flag in a match into an eval() of the matched string.
The f-string would match with the same eager semantics as regular expressions, backtracking when a match is not made on the first attempt.
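The matching rules described in this section can be approximated today with a plain function that returns a dict instead of assigning to locals. What follows is a rough sketch under stated assumptions, not the proposed implementation: the name `scan` and the table `_FIELD_TYPES` are hypothetical, only named fields with an empty or `d` format spec are handled, and fields match greedily with backtracking via the re module.

```python
import re
from string import Formatter

# Hypothetical table: format spec -> (regex fragment, converter).
# Only the empty spec and 'd' are supported in this sketch.
_FIELD_TYPES = {
    "": (r".+", str),
    "d": (r"[-+]?\d+", int),
}


def scan(template, text):
    """Match text against a format-style template; return a dict of fields.

    Raises ValueError when the text does not match the template.
    Assumes every replacement field has a name usable as a regex group name.
    """
    regex_parts = []
    converters = {}
    for literal, name, spec, _conversion in Formatter().parse(template):
        regex_parts.append(re.escape(literal))
        if name is None:
            continue  # trailing literal text with no field after it
        if spec not in _FIELD_TYPES:
            raise ValueError(f"unsupported format spec {spec!r}")
        field_regex, converters[name] = _FIELD_TYPES[spec]
        regex_parts.append(f"(?P<{name}>{field_regex})")
    match = re.fullmatch("".join(regex_parts), text)
    if match is None:
        raise ValueError(f"template does not match {text!r}")
    return {name: converters[name](value)
            for name, value in match.groupdict().items()}


print(scan("It is {hh}:{mm} {am_or_pm}", "It is 11:45 PM"))
print(scan("{hh:d}:{mm:d}:{ss:d}", "11:59:59"))
```

Returning a dict keeps the result testable as ordinary function output, which also sidesteps the debugging concern raised elsewhere in this thread about values appearing in locals.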
Let me know what you think!

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JEGSKO...
Code of Conduct: http://python.org/psf/codeofconduct/
--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython