f-strings as assignment targets
TL;DR: I propose the following behavior:

>>> s = "She turned me into a newt."
>>> f"She turned me into a {animal}." = s
>>> animal
'newt'

>>> f"A {animal}?" = s
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    f"A {animal}?" = s
ValueError: f-string assignment target does not match 'She turned me into a newt.'

>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59"
>>> hh, mm, ss
(11, 59, 59)

=== Rationale ===

Part of the reason I like f-strings so much is that they reduce the cognitive overhead of reading code: they allow you to see *what* is being inserted into a string in a way that also effortlessly shows *where* in the string the value is being inserted. There is no need to "paint-by-numbers" and remember which variable is {0} and which is {1} in an unnecessary extra layer of indirection. F-strings allow string formatting that is not only intelligible, but *locally* intelligible.

What I propose is the inverse feature, where you can assign a string to an f-string, and the interpreter will maintain an invariant kept in many other cases:

>>> a[n] = 17
>>> a[n] == 17
True

>>> obj.x = "foo"
>>> obj.x == "foo"
True

# Proposed:
>>> f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM"
>>> f"It is {hh}:{mm} {am_or_pm}" == "It is 11:45 PM"
True
>>> hh
'11'

This could be thought of as analogous to the C language's scanf function, something I've always felt was just slightly lacking in Python. I think such a feature would more clearly allow readers of Python code to answer the question "What kinds of strings are allowed here?". It would add certainty to programs that accept strings, confirming early that the data you have is the data you want. The code reads like a specification that beginners can understand in a blink.

=== Existing way of achieving this ===

As of now, you could achieve the behavior with regular expressions:

>>> import re
>>> pattern = re.compile(r'It is (.+):(.+) (.+)')
>>> match = pattern.fullmatch("It is 11:45 PM")
>>> hh, mm, am_or_pm = match.groups()
>>> hh
'11'

But this suffers from the same paint-by-numbers, extra-indirection issue that old-style string formatting runs into, an issue that f-strings improve upon.

You could also do a strange mishmash of built-in str operations, like

>>> s = "It is 11:45 PM"
>>> empty, rest = s.split("It is ")
>>> assert empty == ""
>>> hh, rest = rest.split(":")
>>> mm, am_or_pm = rest.split(" ")
>>> hh
'11'

But this is five lines to express one simple idea. How many times have you written a micro-parser like this?

=== Specification (open to bikeshedding) ===

In general, the goal would be to pursue the assignment-becomes-equal invariant above. By default, assignment targets within f-strings would be matched as strings. However, adding a format specifier would allow the matches to be evaluated as different data types, e.g. f'{foo:d}' = "1" would make foo become the integer 1. If a more complex format specifier was added that did not match anything that the f-string could produce as an expression, then we'd still raise a ValueError:

>>> f"{x:.02f}" = "0.12345"
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    f"{x:.02f}" = "0.12345"
ValueError: f-string assignment target does not match '0.12345'

If we're feeling adventurous, one could turn the !r repr flag in a match into an eval() of the matched string.

The f-string would match with the same eager semantics as regular expressions, backtracking when a match is not made on the first attempt.

Let me know what you think!
On Thu, 17 Sep 2020 at 13:38, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
Something very similar to this already exists on PyPI: https://pypi.org/project/parse/

I don't have a strong opinion on whether it would be useful in the stdlib, other than to say that I've never personally used it, so it's not that significant a need for me.

Paul
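For anyone who hasn't seen it, here is a minimal sketch of what parse does with the examples from the proposal (this assumes the third-party package is installed from PyPI, e.g. via "pip install parse"):

# Requires the third-party "parse" package from PyPI.
from parse import parse

# Named fields come back on a Result object, indexable by name.
result = parse("It is {hh}:{mm} {am_or_pm}", "It is 11:45 PM")
print(result["hh"], result["mm"], result["am_or_pm"])  # 11 45 PM

# Format-spec-like markers convert types, e.g. :d gives ints.
result = parse("{hh:d}:{mm:d}:{ss:d}", "11:59:59")
print(result["hh"], result["mm"], result["ss"])  # 11 59 59

# A non-matching string returns None instead of raising.
print(parse("A {animal}?", "She turned me into a newt."))  # None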
On Thu, Sep 17, 2020 at 11:02 PM Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 17 Sep 2020 at 13:38, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
Something very similar to this already exists on PyPI: https://pypi.org/project/parse/
I don't have a strong opinion on whether it would be useful in the stdlib, other than to say that I've never personally used it, so it's not that significant a need for me.
I've frequently yearned for an sscanf-like feature in Python. Usually I end up longhanding it with string methods, or else reaching for a regex, but neither of those is quite what I want. I'd prefer scanf notation to format strings, but either is acceptable.

Assigning to an f-string is VERY tempting. It doesn't quite sit right with me (f-strings feel like literals, even though they aren't, and it doesn't make sense to assign to a string literal) - but I do like the concept. If it existed in the language, I would definitely use it. That's not to say it should exist, but I would use it if it did :)

ChrisA
On Thu, 17 Sep 2020 at 14:12, Chris Angelico <rosuav@gmail.com> wrote:
Assigning to an f-string is VERY tempting. It doesn't quite sit right with me (f-strings feel like literals, even though they aren't, and it doesn't make sense to assign to a string literal) - but I do like the concept. If it existed in the language, I would definitely use it. That's not to say it should exist, but I would use it if it did :)
That's a good point. If a feature like this existed, I would probably use it too. But:

1. I don't like the idea of assigning to f-strings - like you say that feels too much like assigning to a literal. A stdlib function would feel *far* more natural to me.

2. Unlike formatting, error handling is much more important (and more app-specific) for parsing. A "one size fits all" ValueError isn't likely to be sufficient for anything but quick scripts, I suspect.

Paul
It seems that the variables come out magically. What about something like:

a, b = "hello world".extract("{} {}")

PS: I do not like "extract"; it's just the first name that came to mind.
On Thu, Sep 17, 2020 at 11:09:35PM +1000, Chris Angelico wrote:
I've frequently yearned for an sscanf-like feature in Python. Usually I end up longhanding it with string methods, or else reaching for a regex, but neither of those is quite what I want. I'd prefer scanf notation to format strings, but either is acceptable.
Why make this a syntactic feature when a scanf function would do?

This is an often-desired feature:

https://duckduckgo.com/?q=python+scanf

and for a while there was even documentation on how to simulate it:

https://docs.python.org/2.5/lib/node49.html

and there are plenty of versions on the web, e.g.:

https://code.activestate.com/recipes/502213-simple-scanf-implementation/

Perhaps someone who cares for this more than I do could do the research to find the best implementation and write a PEP.

-- 
Steve
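For the sake of discussion, here is a rough sketch of what such a function could look like built on re (the name sscanf and the handful of supported conversions are invented for illustration, not an existing API, and real scanf has more behaviors, e.g. around whitespace):

import re

# Map a few scanf-style conversions to regex fragments and Python converters.
_CONVERSIONS = {
    "%d": (r"[-+]?\d+", int),
    "%f": (r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", float),
    "%s": (r"\S+", str),
}

def sscanf(pattern, string):
    """Tiny scanf-alike: return a tuple of converted values, or raise ValueError."""
    regex_parts, converters = [], []
    i = 0
    while i < len(pattern):
        token = pattern[i:i + 2]
        if token in _CONVERSIONS:
            fragment, converter = _CONVERSIONS[token]
            regex_parts.append("(" + fragment + ")")
            converters.append(converter)
            i += 2
        else:
            regex_parts.append(re.escape(pattern[i]))
            i += 1
    match = re.fullmatch("".join(regex_parts), string)
    if match is None:
        raise ValueError(f"input does not match pattern {pattern!r}")
    return tuple(conv(group) for conv, group in zip(converters, match.groups()))

hh, mm, ss = sscanf("%d:%d:%d", "11:59:59")
print(hh, mm, ss)  # 11 59 59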
On Fri, Sep 18, 2020 at 10:51 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Sep 17, 2020 at 11:09:35PM +1000, Chris Angelico wrote:
I've frequently yearned for an sscanf-like feature in Python. Usually I end up longhanding it with string methods, or else reaching for a regex, but neither of those is quite what I want. I'd prefer scanf notation to format strings, but either is acceptable.
Why make this a syntactic feature when a scanf function would do?
Because a scanf function can't assign directly. It's the exact same issue that led to f-strings in the first place: there's no reliable way to embed the names into the format string without a lot of redundancy.

ChrisA
On Fri, Sep 18, 2020 at 10:53:57AM +1000, Chris Angelico wrote:
On Fri, Sep 18, 2020 at 10:51 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Sep 17, 2020 at 11:09:35PM +1000, Chris Angelico wrote:
I've frequently yearned for an sscanf-like feature in Python. Usually I end up longhanding it with string methods, or else reaching for a regex, but neither of those is quite what I want. I'd prefer scanf notation to format strings, but either is acceptable.
Why make this a syntactic feature when a scanf function would do?
Because a scanf function can't assign directly. In fact, the exact same issue that led to f-strings in the first place; there's no reliable way to embed the names into the format string without a lot of redundancy.
But that's a *separate problem*. Regexes can't assign directly either. And we wouldn't want them to! (It's okay for a regex to have its own internal namespace, like named groups, but it shouldn't leak out into the locals or globals.)

Extracting data from a string, like scanf, regexes, sed, awk, SNOBOL etc. sounds like a big win. Assignment should be a separate problem.

And at last I think I have thought of a use of dict unpacking I like. If our scanf(pattern, target) function returns a dict of {name: value} pairs, how do we apply it to locals?

target [, names] = **scanf(pattern, target)

where dict assignment matches assignment targets on the left with keys in the dict. Acceptable target names are simple identifiers, dotted names and subscripts:

spam, eggs.fried, cheese[0] = **{'cheese[0]': 3, 'spam': 1, 'eggs.fried': 2}

would do the obvious assignments. (I could live without the dotted names and subscripts, if people don't like the additional complexity.)

Targets missing a key:value, or keys missing a target, would raise an exception.

The bottom line here is that separation of concerns is a principle we should follow. Text scanning and assignment are two distinct problems and we should keep them distinct. This will allow us to pre-process the pattern we want to match, and post-process the results of the scan, e.g.

spam, eggs, cheese = **(defaults | scanf(pattern, string))

We could have multiple scanners too, anything that returned a dict of target names and values. We wouldn't need to build the scanner into the interpreter, only the assignment syntax. The scanner itself is just a function.

-- 
Steve
On Fri, Sep 18, 2020 at 4:56 AM Steven D'Aprano <steve@pearwood.info> wrote:
But that's a *separate problem*. Regexes can't assign directly either. And we wouldn't want them to! (It's okay for a regex to have it's own internal namespace, like named groups, but it shouldn't leak out into the locals or globals.)
Extracting data from a string, like scanf, regexes, sed, awk, SNOBOL etc sounds like a big win. Assignment should be a separate problem.
And at last I think I have thought of a use of dict unpacking I like. If our scanf(pattern, target) function returns a dict of {name: value} pairs, how do we apply it to locals?
target [, names] = **scanf(pattern, target)
where dict assignment matches assignment targets on the left with keys in the dict. Acceptable target names are simple identifiers, dotted names and subscripts:
spam, eggs.fried, cheese[0] = **{'cheese[0]': 3, 'spam': 1, 'eggs.fried': 2}
would do the obvious assignments. (I could live without the dotted names and subscripts, if people don't like the additional complexity.)
Targets missing a key:value, or keys missing a target, would raise an exception.
The bottom line here is that separation of concerns is a principle we should follow. Text scanning and assignment are two distinct problems and we should keep them distinct. This will allow us to pre-process the pattern we want to match, and post-process the results of the scan, e.g.
spam, eggs, cheese = **(defaults | scanf(pattern, string))
We could have multiple scanners too, anything that returned a dict of target names and values. We wouldn't need to build the scanner into the interpreter, only the assignment syntax. The scanner itself is just a function.
+1. Dict unpacking would be great, I'm not sure why we don't have it already. And I agree with the separation of concerns. If we had dict unpacking we could already do this kind of thing with named capture groups and groupdict.
On Thu, Sep 17, 2020 at 9:01 AM Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 17 Sep 2020 at 13:38, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
Something very similar to this already exists on PyPI: https://pypi.org/project/parse/
I like the idea and am another person that would definitely use it if such a thing existed, but I agree that parse already makes this kind of thing pretty easy to do.

I've at times wondered if something like parse should just become part of the std lib. It seems like a library that could be made to be pretty stable, and it doesn't seem likely to grow many more features than it already has, from what I understand at least. So the usual reason for keeping it outside the std lib, namely letting its features evolve (which is the main argument for keeping, for example, pytest out of the std lib, and which for a long time was the argument for attrs until dataclasses came along), seems less applicable here.

+0.

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On Thu, Sep 17, 2020 at 2:38 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
=== Existing way of achieving this ===
As of now, you could achieve the behavior with regular expressions:
>>> import re
>>> pattern = re.compile(r'It is (.+):(.+) (.+)')
>>> match = pattern.fullmatch("It is 11:45 PM")
>>> hh, mm, am_or_pm = match.groups()
>>> hh
'11'
But this suffers from the same paint-by-numbers, extra-indirection issue that old-style string formatting runs into, an issue that f-strings improve upon.
This seems a bit of an unfair basis for comparison. I would probably write it on one line:

hh, mm, am_or_pm = re.fullmatch(r'It is (.+):(.+) (.+)', "It is 11:45 PM").groups()

Which is not great but not that bad either. More importantly, I would probably validate the input properly in my regex:

It is (\d+):(\d+) (AM|PM)

I would much rather be able to use regex syntax like that for proper matching and validation than format specifiers like .02f which AFAIK already falls down in the given example with AM|PM. So maybe something like:

f"It is {hh:\d+}:{mm:\d+} {am_or_pm:AM|PM}" = "It is 11:45 PM"

And then a syntax to also convert to an int would be nice, e.g. `{hh:\d+:int}`, although it probably needs a lot more thought than that.
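To make the comparison concrete, here is what the validating version looks like with today's re module and named groups (just an illustration of the existing alternative; the f-string spelling above is the hypothetical part):

import re

# Named groups keep the "see where each value lands" property,
# and the pattern validates the input at the same time.
m = re.fullmatch(r"It is (?P<hh>\d+):(?P<mm>\d+) (?P<am_or_pm>AM|PM)", "It is 11:45 PM")
if m is None:
    raise ValueError("input does not look like a time")
hh, mm, am_or_pm = int(m["hh"]), int(m["mm"]), m["am_or_pm"]
print(hh, mm, am_or_pm)  # 11 45 PM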
I really like the symmetry of this approach.

On Thu, Sep 17, 2020 at 8:37 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
>>> f"A {animal}?" = s Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"A {animal}?" = s ValueError: f-string assignment target does not match 'She turned me into a newt.'
>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59" >>> hh, mm, ss (11, 59, 59)
=== Rationale ===
Part of the reason I like f-strings so much is that they reduce the cognitive overhead of reading code: they allow you to see *what* is being inserted into a string in a way that also effortlessly shows *where* in the string the value is being inserted. There is no need to "paint-by-numbers" and remember which variable is {0} and which is {1} in an unnecessary extra layer of indirection. F-strings allow string formatting that is not only intelligible, but *locally* intelligible.
What I propose is the inverse feature, where you can assign a string to an f-string, and the interpreter will maintain an invariant kept in many other cases:
>>> a[n] = 17
>>> a[n] == 17
True

>>> obj.x = "foo"
>>> obj.x == "foo"
True

# Proposed:
>>> f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM"
>>> f"It is {hh}:{mm} {am_or_pm}" == "It is 11:45 PM"
True
>>> hh
'11'
This could be thought of as analogous to the c language's scanf function, something I've always felt was just slightly lacking in Python. I think such a feature would more clearly allow readers of Python code to answer the question "What kinds of strings are allowed here?". It would add certainty to programs that accept strings, confirming early that the data you have is the data you want. The code reads like a specification that beginners can understand in a blink.
=== Existing way of achieving this ===
As of now, you could achieve the behavior with regular expressions:
>>> import re
>>> pattern = re.compile(r'It is (.+):(.+) (.+)')
>>> match = pattern.fullmatch("It is 11:45 PM")
>>> hh, mm, am_or_pm = match.groups()
>>> hh
'11'
But this suffers from the same paint-by-numbers, extra-indirection issue that old-style string formatting runs into, an issue that f-strings improve upon.
You could also do a strange mishmash of built-in str operations, like
>>> s = "It is 11:45 PM" >>> empty, rest = s.split("It is ") >>> assert empty == "" >>> hh, rest = rest.split(":") >>> mm, am_or_pm = s.split(" ") >>> hh '11'
But this is 5 different lines to express one simple idea. How many different times have you written a micro-parser like this?
=== Specification (open to bikeshedding) ===
In general, the goal would be to pursue the assignment-becomes-equal invariant above. By default, assignment targets within f-strings would be matched as strings. However, adding in a format specifier would allow the matches to be evaluated as different data types, e.g. f'{foo:d}' = "1" would make foo become the integer 1. If a more complex format specifier was added that did not match anything that the f-string could produce as an expression, then we'd still raise a ValueError:
>>> f"{x:.02f}" = "0.12345" Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"{x:.02f}" = "0.12345" ValueError: f-string assignment target does not match '0.12345'
If we're feeling adventurous, one could turn the !r repr flag in a match into an eval() of the matched string.
The f-string would match with the same eager semantics as regular expressions, backtracking when a match is not made on the first attempt.
Let me know what you think!
-- 
Calvin Spealman, Senior Quality Engineer, cspealma@redhat.com
In general, there's no way to start with a format specifier string and from it get the type of the object that it should be applied to. For example, any string without %'s is a valid datetime format specifier (of dubious value, but such is life). Perhaps a better example is decimal vs. float. What if I want %.2f to return a decimal? Would that just not be possible?

So I think you'd have to limit this to a small set of built-in types.

In general, I think overloading f-strings as assignment targets would be confusing. But I've been wrong before.

Eric

On 9/17/2020 12:52 AM, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
>>> f"A {animal}?" = s Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"A {animal}?" = s ValueError: f-string assignment target does not match 'She turned me into a newt.'
>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59" >>> hh, mm, ss (11, 59, 59)
=== Rationale ===
Part of the reason I like f-strings so much is that they reduce the cognitive overhead of reading code: they allow you to see *what* is being inserted into a string in a way that also effortlessly shows *where* in the string the value is being inserted. There is no need to "paint-by-numbers" and remember which variable is {0} and which is {1} in an unnecessary extra layer of indirection. F-strings allow string formatting that is not only intelligible, but *locally* intelligible.
What I propose is the inverse feature, where you can assign a string to an f-string, and the interpreter will maintain an invariant kept in many other cases:
>>> a[n] = 17
>>> a[n] == 17
True

>>> obj.x = "foo"
>>> obj.x == "foo"
True

# Proposed:
>>> f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM"
>>> f"It is {hh}:{mm} {am_or_pm}" == "It is 11:45 PM"
True
>>> hh
'11'
This could be thought of as analogous to the c language's scanf function, something I've always felt was just slightly lacking in Python. I think such a feature would more clearly allow readers of Python code to answer the question "What kinds of strings are allowed here?". It would add certainty to programs that accept strings, confirming early that the data you have is the data you want. The code reads like a specification that beginners can understand in a blink.
=== Existing way of achieving this ===
As of now, you could achieve the behavior with regular expressions:
>>> import re
>>> pattern = re.compile(r'It is (.+):(.+) (.+)')
>>> match = pattern.fullmatch("It is 11:45 PM")
>>> hh, mm, am_or_pm = match.groups()
>>> hh
'11'
But this suffers from the same paint-by-numbers, extra-indirection issue that old-style string formatting runs into, an issue that f-strings improve upon.
You could also do a strange mishmash of built-in str operations, like
>>> s = "It is 11:45 PM" >>> empty, rest = s.split("It is ") >>> assert empty == "" >>> hh, rest = rest.split(":") >>> mm, am_or_pm = s.split(" ") >>> hh '11'
But this is 5 different lines to express one simple idea. How many different times have you written a micro-parser like this?
=== Specification (open to bikeshedding) ===
In general, the goal would be to pursue the assignment-becomes-equal invariant above. By default, assignment targets within f-strings would be matched as strings. However, adding in a format specifier would allow the matches to be evaluated as different data types, e.g. f'{foo:d}' = "1" would make foo become the integer 1. If a more complex format specifier was added that did not match anything that the f-string could produce as an expression, then we'd still raise a ValueError:
>>> f"{x:.02f}" = "0.12345" Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"{x:.02f}" = "0.12345" ValueError: f-string assignment target does not match '0.12345'
If we're feeling adventurous, one could turn the !r repr flag in a match into an eval() of the matched string.
The f-string would match with the same eager semantics as regular expressions, backtracking when a match is not made on the first attempt.
Let me know what you think!
On Fri, Sep 18, 2020 at 1:38 AM Eric V. Smith <eric@trueblade.com> wrote:
In general, there's no way to start with a format specifier string and from it get the type of the object that it should be applied to. For example, any string without %'s is a valid datetime format specifier (of dubious value, but such is life). Perhaps a better example is decimal vs. float. What if I want %.2f to return a decimal? Would that just not be possible?
So I think you'd have to limit this to a small set of built-in types.
In general, I think overloading f-strings as assignment targets would be confusing. But I've been wrong before.
It doesn't have to be absolutely identical, just as long as it's mostly parallel. Consider C's printf and scanf formats; any whitespace in scanf matches any whitespace, and the integer handlers are always happy to accept a leading plus sign (which has to be specifically requested in printf). Conversely, scanf can be more restrictive, eg %[a-z] which will match some sequence of lowercase ASCII letters.

IMO printf notation (what Python does with str % ...) is better suited to this than Python's .format() notation (which f-strings also use). But the advantages of f-strings would also apply here. Maybe the best way would be the tagged version of percent formatting?
>>> x, y = 1, 2
>>> "==> (%(x)d, %(y)d)" % globals()
'==> (1, 2)'
Theory: "==> (%(x)d, %(y)d)" = "==> (1, 2)" could assign x=1 and y=2. (It'd want to have some sort of tag to show what's going on, but not f"..." since it's not an f-string.) PEP 622 pattern matching may be relevant here. ChrisA
On 9/17/2020 12:24 PM, Chris Angelico wrote:
On Fri, Sep 18, 2020 at 1:38 AM Eric V. Smith <eric@trueblade.com> wrote:
In general, there's no way to start with a format specifier string and from it get the type of the object that it should be applied to. For example, any string without %'s is a valid datetime format specifier (of dubious value, but such is life). Perhaps a better example is decimal vs. float. What if I want %.2f to return a decimal? Would that just not be possible?
So I think you'd have to limit this to a small set of built-in types.
In general, I think overloading f-strings as assignment targets would be confusing. But I've been wrong before.
It doesn't have to be absolutely identical, just as long as it's mostly parallel. Consider C's printf and scanf formats; any whitespace in scanf matches any whitespace, and the integer handlers are always happy to accept a leading plus sign (which has to be specifically requested in printf). Conversely, scanf can be more restrictive, eg %[a-z] which will match some sequence of lowercase ASCII letters.
Sure. And the less it looks like "real" f-strings, the worse it is, IMO. If someone wanted to play with this in pure Python, string.Formatter's parse() method would do all of the parsing for you.
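For anyone following along, that parser looks like this (a quick illustration of the existing string.Formatter API, not a new proposal):

from string import Formatter

# Formatter().parse() splits a format string into
# (literal_text, field_name, format_spec, conversion) tuples.
for parts in Formatter().parse("It is {hh:d}:{mm:d} {am_or_pm}"):
    print(parts)
# ('It is ', 'hh', 'd', None)
# (':', 'mm', 'd', None)
# (' ', 'am_or_pm', '', None)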
IMO printf notation (what in Python does with str % ...) is better suited to this than Python's .format() notation (which f-strings also use). But the advantages of f-strings would also apply here. Maybe the best way would be the tagged version of percent formatting?
>>> x, y = 1, 2
>>> "==> (%(x)d, %(y)d)" % globals()
'==> (1, 2)'
Theory:
"==> (%(x)d, %(y)d)" = "==> (1, 2)"
could assign x=1 and y=2. (It'd want to have some sort of tag to show what's going on, but not f"..." since it's not an f-string.)
PEP 622 pattern matching may be relevant here.
I was going to mention PEP 622, but deleted that portion of my response. The resulting hundred emails are on you, not me! Eric
On Thu, Sep 17, 2020 at 1:13 PM Eric V. Smith <eric@trueblade.com> wrote:
On 9/17/2020 12:24 PM, Chris Angelico wrote:
On Fri, Sep 18, 2020 at 1:38 AM Eric V. Smith <eric@trueblade.com> wrote:
In general, there's no way to start with a format specifier string and from it get the type of the object that it should be applied to. For example, any string without %'s is a valid datetime format specifier (of dubious value, but such is life). Perhaps a better example is decimal vs. float. What if I want %.2f to return a decimal? Would that just not be possible?
So I think you'd have to limit this to a small set of built-in types.
In general, I think overloading f-strings as assignment targets would be confusing. But I've been wrong before.
It doesn't have to be absolutely identical, just as long as it's mostly parallel. Consider C's printf and scanf formats; any whitespace in scanf matches any whitespace, and the integer handlers are always happy to accept a leading plus sign (which has to be specifically requested in printf). Conversely, scanf can be more restrictive, eg %[a-z] which will match some sequence of lowercase ASCII letters.
Sure. And the less it looks like "real" f-strings, the worse it is, IMO.
If someone wanted to play with this in pure Python, string.Formatter's parse() method would do all of the parsing for you.
Surprisingly it is not *quite* all. The format specification mini language parser is not exposed for use. This has bummed me out more than once in the past. Here is my attempt at recreating it from a couple years ago. Do with it what you will.
>>> r = r'(([\s\S])?([<>=\^]))?([\+\- ])?([#])?([0])?(\d*)([,])?((\.)(\d*))?([sbcdoxXneEfFgGn%])?'
>>> from collections import namedtuple as nt
>>> FormatSpec = nt('FormatSpec', 'fill align sign alt zero_padding width comma decimal precision type')
>>> import re
>>> spec = FormatSpec(*re.fullmatch(r, 'x>5.2f').group(2, 3, 4, 5, 6, 7, 8, 10, 11, 12))  # skip groups not interested in
>>> spec
FormatSpec(fill='x', align='>', sign=None, alt=None, zero_padding=None, width='5', comma=None, decimal='.', precision='2', type='f')
>>> ''.join(s for s in spec if s is not None)  # recreate the input spec
'x>5.2f'
--- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On 9/17/2020 1:28 PM, Ricky Teachey wrote:
On Thu, Sep 17, 2020 at 1:13 PM Eric V. Smith <eric@trueblade.com> wrote:
If someone wanted to play with this in pure Python, string.Formatter's parse() method would do all of the parsing for you.
Surprisingly it is not *quite* all. The format specification mini language parser is not exposed for use. This has bummed me out more than once in the past.
Yeah, I've considered exposing this from the C code, but was never super motivated about it. I go back and forth trying to decide if it's a bad idea or not. If you want to cheat, you can use the version at _pydecimal._parse_format_specifier. The leading underscores should give you an idea of just how unsupported this is! But as far as I know, it works. Eric
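If anyone wants to poke at it, the call looks roughly like this (private and unsupported, as Eric says, so treat the exact location, signature, and returned keys as implementation details that may change):

# Private CPython helper; everything about it is an implementation detail.
import _pydecimal

spec = _pydecimal._parse_format_specifier("x>5.2f")
print(spec)  # a dict describing fill, alignment, width, precision, type, ...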
On Thu, Sep 17, 2020 at 1:39 PM Eric V. Smith <eric@trueblade.com> wrote:
On 9/17/2020 1:28 PM, Ricky Teachey wrote:
On Thu, Sep 17, 2020 at 1:13 PM Eric V. Smith <eric@trueblade.com> wrote:
If someone wanted to play with this in pure Python, string.Formatter's parse() method would do all of the parsing for you.
Surprisingly it is not *quite* all. The format specification mini language parser is not exposed for use. This has bummed me out more than once in the past.
Yeah, I've considered exposing this from the C code, but was never super motivated about it. I go back and forth trying to decide if it's a bad idea or not.
This is a fantastic idea!
... The leading underscores should give you an idea of just how unsupported this is! But as far as I know, it works.
Eric
Probably should add a few more just to be safe. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On 9/17/2020 11:38 AM, Eric V. Smith wrote:
In general, there's no way to start with a format specifier string and from it get the type of the object that it should be applied to. For example, any string without %'s is a valid datetime format specifier (of dubious value, but such is life). Perhaps a better example is decimal vs. float. What if I want %.2f to return a decimal? Would that just not be possible?
So I think you'd have to limit this to a small set of built-in types.
In general, I think overloading f-strings as assignment targets would be confusing. But I've been wrong before.
Also, it only works with literals. I can easily see wanting to build up the scanf-like string programatically. So that puts me at -1. f-strings only being literals is to avoid code injection. There's no such requirement with a scanf-like string that basically describes a regex and some assignment targets. Eric
On 2020-09-16 21:52, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
>>> f"A {animal}?" = s Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"A {animal}?" = s ValueError: f-string assignment target does not match 'She turned me into a newt.'
>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59" >>> hh, mm, ss (11, 59, 59)
I don't like this at all. It looks like assigning to a literal, which is weird. Also, it hides the assignment target within a string of potentially unbounded length and complexity, which makes it difficult to reason about code because it's hard to see when variables are being assigned to. It also introduces a whole host of questions about the details of the parsing (i.e., how does the greediness work if the pattern is something like "{one} {two} {three} {four}" and the string to be parsed is five or ten words).

I think a better approach for something like this would be a .parse() method of some sort on strings, sort of like the inverse of .format(). It would parse a given string according to the format string and return a dict with the mappings (just as format can take a dict in and return the string with the substitutions made). So it would be like:
pattern = "That {animal} is really {attribute}." d = pattern.parse("That snake is really powerful.") d['animal'] 'snake' d['attribute'] 'powerful'
This wouldn't let you assign the results into local variables, but I think that's a good thing. Creating local variables "programmatically" is not a good idea; it's better to use a dict. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
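A rough pure-Python sketch of that idea, written as a free function since str can't be extended in place (the name parse_template and its details, such as treating every field as a non-greedy ".+?", are just for illustration and dodge none of the greediness questions raised above):

import re
from string import Formatter

def parse_template(template, string):
    """Rough inverse of str.format for simple {name} fields: return a dict of matches."""
    regex_parts = []
    for literal_text, field_name, format_spec, conversion in Formatter().parse(template):
        regex_parts.append(re.escape(literal_text))
        if field_name is not None:
            # Format specs and conversions are ignored in this sketch;
            # each field just matches "anything", non-greedily.
            regex_parts.append(f"(?P<{field_name}>.+?)")
    match = re.fullmatch("".join(regex_parts), string)
    if match is None:
        raise ValueError(f"{string!r} does not match {template!r}")
    return match.groupdict()

d = parse_template("That {animal} is really {attribute}.", "That snake is really powerful.")
print(d['animal'], d['attribute'])  # snake powerful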
On Thu, Sep 17, 2020 at 8:57 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-09-16 21:52, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
>>> f"A {animal}?" = s Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"A {animal}?" = s ValueError: f-string assignment target does not match 'She turned me into a newt.'
>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59" >>> hh, mm, ss (11, 59, 59)
I don't like this at all. It looks like assigning to a literal, which is weird.
People keep saying this, but iterable unpacking looks like assigning to a literal (a tuple or list literal) just as much. Also PEP 622 proposes something that looks like assignment to a function call, albeit within a match/case statement. It's natural to have symmetry between assignments and expression. For another example, look at subscripting, i.e. `__getitem__` vs `__setitem__`.
Also, it hides the assignment target within a string of potentially unbounded length and complexity, which makes it difficult to reason about code because it's hard to see when variables are being assigned to.
It's really not. A decent IDE should already be able to automatically show you assignments and usages of a variable - PyCharm does with one Ctrl+click. A syntax highlighter that can handle f-strings will make the assignments obvious at a glance.
It also introduces a whole host of questions about the details of the parsing (i.e., how does the greediness work if the pattern is something like "{one} {two} {three} {four}" and the string to be parsed is five or ten words).
This I agree with, another reason to go for putting regexes in the f-strings like I suggested.
I think a better approach for something like this would be a .parse() method of some sort on strings, sort of like the inverse of .format(). It would parse a given string according to the format string and return a dict with the mappings (just as format can take a dict in and return the string with the substitutions made). So it would be like:
pattern = "That {animal} is really {attribute}." d = pattern.parse("That snake is really powerful.") d['animal'] 'snake' d['attribute'] 'powerful'
I think someone else just made the same proposal. But how does this solve the greediness issue?
This wouldn't let you assign the results into local variables, but I think that's a good thing. Creating local variables "programmatically" is not a good idea; it's better to use a dict.
How is f"{a} {b}" = "1 2" creating local variables any more "programmatically" than a, b = "1 2".split() ? The variables are static and visible to the compiler.
On Thu, Sep 17, 2020 at 09:16:46PM +0200, Alex Hall wrote:
On Thu, Sep 17, 2020 at 8:57 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
I don't like this at all. It looks like assigning to a literal, which is weird.
People keep saying this, but iterable unpacking looks like assigning to a literal (a tuple or list literal) just as much.
Tuples and lists don't have literals, they have *displays*. You won't find them here:

https://docs.python.org/3/reference/expressions.html#literals

nor here:

https://docs.python.org/3/reference/lexical_analysis.html#literals

(although you will find f-strings, which I think is wrong, but I imagine that there will be a *huge* amount of pushback if I suggest we move f-strings to another section).

Tuples are here:

https://docs.python.org/3/reference/expressions.html#parenthesized-forms

and list, set and dict displays immediately afterwards here:

https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets...

If there is anything that might justify the name "tuple (or list) literal" it would be a tuple with each item a literal:

(1, "a", None)

but certainly not one containing expressions or names:

(spam+1, eggs(), aardvark.cheese)

While we can use a tuple (or list) of *names* as an assignment target, we cannot use a tuple of literals as a target:

(1, 2, 3) = (4, 5, 6)  # SyntaxError
Also, it hides the assignment target within a string of potentially unbounded length and complexity, which makes it difficult to reason about code because it's hard to see when variables are being assigned to.
It's really not. A decent IDE should already be able to automatically show you assignments and usages of a variable - PyCharm does with one Ctrl+click. A syntax highlighter that can handle f-strings will make the assignments obvious at a glance.
"A decent IDE" is not mandatory for writing Python code, and I would object strongly to making it so. We should be able to write code in Notepad, if no better alternative presents itself. The beauty of Python is that it is a relatively simple language that encourages code that is neither complex nor complicated. If syntax encourages complex, complicated code that requires an IDE to make sense of, that counts as a point against the syntax. That's not to deny that we can write obfuscated code in any language, but we shouldn't encourage it :-) Having said that, I don't actually agree that Brendan's point will be a major risk. This proposed feature is more or less the same as scanf, and I don't think people are in the habit of writing fifty thousand word scanf templates :-) And if they do, oh well, concenting adults apply. The advantage, as I see it, of special syntax is that it could implicity assign to the named targets. We don't actually need *f* strings for that, we don't even need strings: the {spam} and {eggs} was eaten by the {aarvark} = scanf(string) # automatically assigns to spam, eggs, aardvark could work, although it would be hard to match non-printable characters or special characters that need to be escaped, like newlines. It would also be difficult to do any sort of error handling: what if only a subset of targets match? -- Steve
On Fri, Sep 18, 2020 at 3:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
If there is anything that might justify the name "tuple (or list) literal" it would be a tuple with each item a literal:
(1, "a", None)
but certainly not one containing expressions or names:
(spam+1, eggs(), aardvark.cheese)
That sounds sensible, I hadn't thought about the terminology that carefully. But in case anyone disagrees, the exact terms are not really the point. The point is that this:

(a, b) = c

is 'assigning to a literal' just as much as this:

f"{a} {b}" = c

Both are just things that look like expressions with dynamic values but aren't.
On Fri, Sep 18, 2020, 6:35 AM Alex Hall <alex.mojaki@gmail.com> wrote:
The point is that this:
(a, b) = c
is 'assigning to a literal' just as much as this:
f"{a} {b}" = c
Both are just things that look like expressions with dynamic values but aren't.
I agree with you. It's not that whacky in principle. But it still looks very odd. One reason might be because there's "stuff" inside the f-string that isn't getting assigned to (str characters). Tuple assignment syntax doesn't have that extra stuff.

Why not just grow a parse method on str that returns a dict and do it this way?

q = "{a} {b}"
p = "1 2"
(a, b) = q.parse(p)

Or with the template string literal in place of q:

(a, b) = "{a} {b}".parse(p)

Yes you write the a and b twice, just as with the format method:

p = "{a} {b}".format(a=1, b=2)

My point is that for symmetry of usability, if you're going to have the f-string assignment syntax you should have a template str parse method too. And the parse method should be decided on first.
On Fri, Sep 18, 2020 at 10:26 PM Ricky Teachey <ricky@teachey.org> wrote:
On Fri, Sep 18, 2020, 8:17 AM Ricky Teachey <ricky@teachey.org> wrote:
Why not just grow a parse method on str that returns a dict and do it this way?
q = "{a} {b}" p = "1 2" (a, b) = q.parse(p)
Sorry that should have been:
(a, b) = q.parse(p).values()
You're using a dictionary as if it were a tuple. That's going to cause a LOT of pain when someone does something like: a, b = "{b} {a}".parse(p).values() and they come out in the wrong order. Bad bad bad idea. Don't have names if they're going to be pure lies. ChrisA
On Fri, Sep 18, 2020, 8:34 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 10:26 PM Ricky Teachey <ricky@teachey.org> wrote:
On Fri, Sep 18, 2020, 8:17 AM Ricky Teachey <ricky@teachey.org> wrote:
Why not just grow a parse method on str that returns a dict and do it
this way?
q = "{a} {b}" p = "1 2" (a, b) = q.parse(p)
Sorry that should have been:
(a, b) = q.parse(p).values()
You're using a dictionary as if it were a tuple. That's going to cause a LOT of pain when someone does something like:
a, b = "{b} {a}".parse(p).values()
and they come out in the wrong order. Bad bad bad idea. Don't have names if they're going to be pure lies.
ChrisA
I'm not sure I understand the point you are making here since dictionaries have preserved order since python 3.6...?

The same problem exists here:

a, b = 2, 1
assert a == 1  # whoops, got the order wrong
On Fri, Sep 18, 2020 at 11:04 PM Ricky Teachey <ricky@teachey.org> wrote:
On Fri, Sep 18, 2020, 8:34 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 10:26 PM Ricky Teachey <ricky@teachey.org> wrote:
On Fri, Sep 18, 2020, 8:17 AM Ricky Teachey <ricky@teachey.org> wrote:
Why not just grow a parse method on str that returns a dict and do it this way?
q = "{a} {b}" p = "1 2" (a, b) = q.parse(p)
Sorry that should have been:
(a, b) = q.parse(p).values()
You're using a dictionary as if it were a tuple. That's going to cause a LOT of pain when someone does something like:
a, b = "{b} {a}".parse(p).values()
and they come out in the wrong order. Bad bad bad idea. Don't have names if they're going to be pure lies.
ChrisA
I'm not sure I understand the point you are making here since dictionaries have preserved order since python 3.6...?
The same problem exists here:
a, b = 2, 1
assert a == 1  # whoops, got the order wrong
But you don't have any sort of lie in the RHS about the name mapping. It's just a sequence, so people will expect it to be a sequence. If the parser incorporates names, people will expect it to use those names. Why have it return a dictionary if you're going to assume and mandate that it be a sequence? ChrisA
On Fri, Sep 18, 2020, 9:16 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 11:04 PM Ricky Teachey <ricky@teachey.org> wrote:
On Fri, Sep 18, 2020, 8:34 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 10:26 PM Ricky Teachey <ricky@teachey.org>
wrote:
On Fri, Sep 18, 2020, 8:17 AM Ricky Teachey <ricky@teachey.org>
wrote:
Why not just grow a parse method on str that returns a dict and do
it this way?
q = "{a} {b}" p = "1 2" (a, b) = q.parse(p)
Sorry that should have been:
(a, b) = q.parse(p).values()
You're using a dictionary as if it were a tuple. That's going to cause a LOT of pain when someone does something like:
a, b = "{b} {a}".parse(p).values()
and they come out in the wrong order. Bad bad bad idea. Don't have names if they're going to be pure lies.
ChrisA
I'm not sure I understand the point you are making here since dictionaries have preserved order since python 3.6...?
The same problem exists here:
a, b = 2, 1
assert a == 1  # whoops, got the order wrong
But you don't have any sort of lie in the RHS about the name mapping. It's just a sequence, so people will expect it to be a sequence. If the parser incorporates names, people will expect it to use those names. Why have it return a dictionary if you're going to assume and mandate that it be a sequence?
ChrisA
I see your point now. Missed it before: the lie inherent to using the names not the order.

But that seems to me to be an argument for dictionaries, not against. You can at least consult the dictionary to find out what name was used to grab the value.

And again, similar things can happen in other contexts and it's just people's responsibility to get it right.

d = dict(b=1, a=2)
a1, b1, a2, b2 = {**d, **d}.values()
This functionality MUST be accessible with a function or a method. (?) As already mentioned, building up a pattern string and then using that on the LHS cannot work:
pattern = "{a:d} {b:d}" pattern += " {c:s}" pattern = "12 34 ef"
This is the way Parse works; you call parse() and it returns a Result:
r = parse("The {} who say {}", "The knights who say Ni!") print(r) <Result ('knights', 'Ni!') {}> print(r.fixed) ('knights', 'Ni!')
https://github.com/r1chardj0n3s/parse#result-and-match-objects :

```rst
Result and Match Objects
------------------------

The result of a ``parse()`` and ``search()`` operation is either ``None`` (no match), a ``Result`` instance or a ``Match`` instance if ``evaluate_result`` is False.

The ``Result`` instance has three attributes:

``fixed``
   A tuple of the fixed-position, anonymous fields extracted from the input.
``named``
   A dictionary of the named fields extracted from the input.
``spans``
   A dictionary mapping the names and fixed position indices matched to a 2-tuple slice range of where the match occurred in the input. The span does not include any stripped padding (alignment or width).

The ``Match`` instance has one method:

``evaluate_result()``
   Generates and returns a ``Result`` instance for this ``Match`` object.
```

Similar functionality in the standard library could return e.g. Union[tuple, namedtuple] and expect users to call namedtuple._asdict() when there are template field names specified; but the parse.Result object does support .spans ("A dictionary mapping the names and fixed position indices matched to a 2-tuple slice range of where the match occurred in the input.").

On Fri, Sep 18, 2020 at 9:34 AM Ricky Teachey <ricky@teachey.org> wrote:
On Fri, Sep 18, 2020, 9:16 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 11:04 PM Ricky Teachey <ricky@teachey.org> wrote:
On Fri, Sep 18, 2020, 8:34 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 10:26 PM Ricky Teachey <ricky@teachey.org>
wrote:
On Fri, Sep 18, 2020, 8:17 AM Ricky Teachey <ricky@teachey.org>
wrote:
Why not just grow a parse method on str that returns a dict and do
it this way?
q = "{a} {b}" p = "1 2" (a, b) = q.parse(p)
Sorry that should have been:
(a, b) = q.parse(p).values()
You're using a dictionary as if it were a tuple. That's going to cause a LOT of pain when someone does something like:
a, b = "{b} {a}".parse(p).values()
and they come out in the wrong order. Bad bad bad idea. Don't have names if they're going to be pure lies.
ChrisA
I'm not sure I understand the point you are making here since dictionaries have preserved order since python 3.6...?
The same problem exists here:
a, b = 2, 1
assert a == 1  # whoops, got the order wrong
But you don't have any sort of lie in the RHS about the name mapping. It's just a sequence, so people will expect it to be a sequence. If the parser incorporates names, people will expect it to use those names. Why have it return a dictionary if you're going to assume and mandate that it be a sequence?
ChrisA
I see your point now. Missed it before: the lie inherent to using the names not the order.
But that seems to me to be an argument for dictionaries, not against. You can at least consult the dictionary to find out what name was used to grab the value.
And again, similar things can happen in other contexts and it's just people's responsibility to get it right.
d = dict(b=1, a=2)
a1, b1, a2, b2 = {**d, **d}.values()
On Fri, Sep 18, 2020 at 12:16 PM Wes Turner <wes.turner@gmail.com> wrote:
This functionality MUST be accessible with a function or a method. (?)
As already mentioned, building up a pattern string and then using that on the LHS cannot work:
pattern = "{a:d} {b:d}" pattern += " {c:s}" pattern = "12 34 ef"
This is the way Parse works; you call parse() and it returns a Result:
r = parse("The {} who say {}", "The knights who say Ni!") print(r) <Result ('knights', 'Ni!') {}> print(r.fixed) ('knights', 'Ni!')
https://github.com/r1chardj0n3s/parse#result-and-match-objects :
...
Similar functionality in the standard library could return e.g. Union[tuple, namedtuple] and expect users to call namedtuple.asdict() when there are template field names specified; but the parse.Result object does support .spans ("A dictionary mapping the names and fixed position indices matched to a 2-tuple slice range of where the match occurred in the input.").
And I for one do like how the parse.Result object provides pretty much all the information that gets... created?... when a parsing action happens: the parsed value, the name supplied in the template string associated with the value (if any), and also the position in the template string. If any of those were missing in a std lib version, it would feel sort of hobbled, at least to me. So perhaps neither dictionaries nor tuples are a good parse() method result. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On 9/18/2020 8:19 AM, Ricky Teachey wrote:
On Fri, Sep 18, 2020, 8:17 AM Ricky Teachey <ricky@teachey.org> wrote:
Why not just grow a parse method on str that returns a dict and do it this way?
q = "{a} {b}" p = "1 2" (a, b) = q.parse(p)
Sorry that should have been:
(a, b) = q.parse(p).values()
I don't understand why returning a dict is useful. Unless this becomes as complicated as regexes with grouping, etc. (in which case: use a regex), the values will be returned in the order they appear in the template string. So just have it return a tuple:

a, b = q.parse(p)

Or for something that could be written today:

a, b = parse("{:d} {:d}", "1 2")
assert a == 1 and b == 2

I don't see the need for new syntax or new ways to assign values. You're not even removing any duplication compared to:

f"{a:d} {b:d}" = "1 2"

Although I grant you'd be saving a few characters of typing. But there's no DRY advantage to be gained. And you wouldn't be restricted to using string literals for the template. If it got super popular, make parse() a method on str.

As near as I can tell, all of these proposals would require type checkers to be modified to understand them, but that's the nature of the beast when you have the type of something dependent on the contents of a string. They would have to figure out that a and b are ints in my example.

So, still -1 on any special syntax for this.

Eric
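To make that concrete, here is a small sketch of such a parse() function returning a plain tuple (the name and behavior are illustrative only; it handles just the bare {} and {:d} fields used in the example, treating everything else as a string):

import re
from string import Formatter

def parse(template, string):
    """Return a tuple of values matched by the template's fields, converting {:d} to int."""
    regex_parts, converters = [], []
    for literal_text, field_name, format_spec, conversion in Formatter().parse(template):
        regex_parts.append(re.escape(literal_text))
        if field_name is not None:
            if format_spec == "d":
                regex_parts.append(r"([-+]?\d+)")
                converters.append(int)
            else:
                regex_parts.append(r"(.+?)")
                converters.append(str)
    match = re.fullmatch("".join(regex_parts), string)
    if match is None:
        raise ValueError(f"{string!r} does not match {template!r}")
    return tuple(conv(group) for conv, group in zip(converters, match.groups()))

a, b = parse("{:d} {:d}", "1 2")
assert a == 1 and b == 2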
On Fri, Sep 18, 2020, 8:44 AM Eric V. Smith <eric@trueblade.com> wrote:
On 9/18/2020 8:19 AM, Ricky Teachey wrote:
On Fri, Sep 18, 2020, 8:17 AM Ricky Teachey <ricky@teachey.org> wrote:
Why not just grow a parse method on str that returns a dict and do it this way?
q = "{a} {b}" p = "1 2" (a, b) = q.parse(p)
Sorry that should have been:
(a, b) = q.parse(p).values()
I don't understand why returning a dict is useful. Unless this becomes as complicated as regexes with grouping, etc. (in which case: use a regex), the values will be returned in the order they appear in the template string. So just have it return a tuple:
a, b = q.parse(p)
Or for something that could be written today:
a, b = parse("{:d} {:d}", "1 2") assert a == 1 and b == 2
I don't see the need for new syntax or new ways to assign values. You're not even removing any duplication compared to:
f"{a:d} {b:d}" = "1 2"
The larger point I'm really making is that adding parsing support of this kind to the language is a topic that needs to stand on its own, without special f-string magic. Regardless of how it's spelled, whether it's a parse function or a parse method or something else, that is what needs to be discussed first, not the magical syntax.
On Fri, Sep 18, 2020, 2:22 AM Ricky Teachey
On Fri, Sep 18, 2020, 6:35 AM Alex Hall
The point is that this:
(a, b) = c
is 'assigning to a literal' just as much as this:
f"{a} {b}" = c
Both are just things that look like expressions with dynamic values but aren't.
I agree with you. It's not that whacky in principle. But it still looks very odd.
Alex draws a very unlikely, toy example of an f-string as assignment target. For that, we already have the more clear `c.split()` that does the same thing as the example. I never put parentheses on the LHS as in the example. But even with them added unnecessarily, it's short and immediately obvious what gets assigned to.

This is not remotely true for an f-string target. A better example is:

f"""Buried, somewhere, in this string is a mention of {a},
and somewhere else, a mention of {b}. We leave out (c) from this
example though. If the RHS gets a dot or tittle wrong we won't
wind up assigning, which is {adjective}!""" = c

Looking at the LHS and the RHS calls for a copy editor's laborious attention to figure out whether it is an assignment or not. It is painfully easy to get the RHS subtly wrong, so no assignment happens (and what DOES happen in that case remains unclear in the discussion).
I think it would be easiest to reason about if an Exception is always raised when not everything is assigned to. Just like the behavior of other unpacking assignment, either everything is assigned or there's an error. My apologies if that wasn't clear from the examples.
On Sat, Sep 19, 2020 at 8:59 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I think it would be easiest to reason about if an Exception is always raised when not everything is assigned to. Just like the behavior of other unpacking assignment, either everything is assigned or there's an error. My apologies if that wasn't clear from the examples.
The trouble with that is that it's really hard to use that if you want partial assignment. On the other hand, if the language provides partial assignment and you want full assignment or exception, you can add a marker at the end and validate it that way. But whichever way it's done, the proposal needs to be very clear about it. ChrisA
On 2020-09-17 12:16, Alex Hall wrote:
On Thu, Sep 17, 2020 at 8:57 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-09-16 21:52, Dennis Sweeney wrote:
> TL;DR: I propose the following behavior:
>
> >>> s = "She turned me into a newt."
> >>> f"She turned me into a {animal}." = s
> >>> animal
> 'newt'
>
> >>> f"A {animal}?" = s
> Traceback (most recent call last):
>   File "<pyshell#2>", line 1, in <module>
>     f"A {animal}?" = s
> ValueError: f-string assignment target does not match 'She turned me into a newt.'
>
> >>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59"
> >>> hh, mm, ss
> (11, 59, 59)
I don't like this at all. It looks like assigning to a literal, which is weird.
People keep saying this, but iterable unpacking looks like assigning to a literal (a tuple or list literal) just as much.
Only if you keep using extremely simple template strings that don't contain anything except the variables you're assigning to. But this proposal allows the string to contain ANYTHING. The assignment targets may represent only a small part of it. For tuples and lists of assignment targets, the target list/tuple can't contain any extraneous material apart from the variables you're assigning to. There is no assign-to-a-tuple analogue of

f"My name is {name} and I am {age} years old" = "My name is Bob and I am 100 years old"
Also PEP 622 proposes something that looks like assignment to a function call, albeit within a match/case statement.
That doesn't look like an assignment, because there's no equals sign. In any case, I'm not sure I support PEP 622 as it stands either, for similar reasons to why I don't support this current proposal.
It's natural to have symmetry between assignments and expressions. For another example, look at subscripting, i.e. `__getitem__` vs `__setitem__`.
Also, it hides the assignment target within a string of potentially unbounded length and complexity, which makes it difficult to reason about code because it's hard to see when variables are being assigned to.
It's really not. A decent IDE should already be able to automatically show you assignments and usages of a variable - PyCharm does with one Ctrl+click. A syntax highlighter that can handle f-strings will make the assignments obvious at a glance.
No, it won't, unless the target string is simple. Also, the assigned values won't be easily discernible in the right-hand string, because there are no delimiters there to mark them. In any case, even if it were borderline possible with an IDE, it's still not useful or important enough to me to gum up an operation as basic as assignment. I'm fine with putting it into a function call or something like that, but not overloading the assignment statement in this way.
It also introduces a whole host of questions about the details of the parsing (i.e., how does the greediness work if the pattern is something like "{one} {two} {three} {four}" and the string to be parsed is five or ten words).
This I agree with, another reason to go for putting regexes in the f-strings like I suggested.
I think a better approach for something like this would be a .parse() method of some sort on strings, sort of like the inverse of .format(). It would parse a given string according to the format string and return a dict with the mappings (just as format can take a dict in and return the string with the substitutions made). So it would be like:
>>> pattern = "That {animal} is really {attribute}." >>> d = pattern.parse("That snake is really powerful.") >>> d['animal'] 'snake' >>> d['attribute'] 'powerful'
I think someone else just made the same proposal. But how does this solve the greediness issue?
It doesn't, but that's the least important of my objections. What it does solve is it avoids adding new behavior to the assignment statement, and it avoids creating a way to inject local variables whose names and/or values are unclear in the statement where they're created.
This wouldn't let you assign the results into local variables, but I think that's a good thing. Creating local variables "programmatically" is not a good idea; it's better to use a dict.
How is
f"{a} {b}" = "1 2"
creating local variables any more "programmatically" than
a, b = "1 2".split()
? The variables are static and visible to the compiler.
As I mentioned above, your examples are unrealistically simple. If you know your string is like "1 2" and you want to assign to a, b, you do what you just showed: use split. But in the "Bob is 100 years old" example I gave, it is more programmatic, because there is matching going on which is extracting parts of the RHS string, and you have to carefully read both the LHS and the RHS to see which parts are extracted into which variable. And that is still a pretty simple example, with relatively short and unambiguous strings on both sides of the equals sign. As the string grows longer and more complicated, it only becomes more programmatic.

I think we may be arguing from different premises here, though, because I think the real issue is that for me the equals sign and local variable names are quite sacred objects. I'm not sure I would even support nested tuple and list assignments if they didn't exist and were proposed today. All of these forms of assignment that are being discussed are too complicated to put into the simple assignment statement to be used to create local variables. It's fine to have the parsing side of it going on, but there just isn't any reason, in my view, to have it be part of an assignment statement that creates local variables. Just have it be a function call that creates a dict.

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
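A rough sketch of the kind of dict-returning .parse() helper being floated in this exchange, written here as a standalone function since str itself can't be patched; parse_to_dict is a hypothetical name, it only understands bare {name} fields with no format specs, and it matches non-greedily:

```python
import re

def parse_to_dict(pattern, string):
    """Hypothetical inverse of str.format() for bare '{name}' templates."""
    # Escape the literal text, then turn each {name} into a named, non-greedy group.
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+?)", re.escape(pattern))
    match = re.fullmatch(regex, string)
    if match is None:
        raise ValueError(f"string does not match pattern {pattern!r}")
    return match.groupdict()

d = parse_to_dict("That {animal} is really {attribute}.",
                  "That snake is really powerful.")
assert d == {"animal": "snake", "attribute": "powerful"}
```

Under those assumptions the 'That {animal} is really {attribute}.' example round-trips; types, widths and greediness control are exactly where the open questions in the rest of the thread begin.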
On Thu, Sep 17, 2020 at 2:57 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-09-16 21:52, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
>>> f"A {animal}?" = s Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"A {animal}?" = s ValueError: f-string assignment target does not match 'She turned me into a newt.'
>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59" >>> hh, mm, ss (11, 59, 59)
A difficulty I have with the idea as presented is this. If I can say this:

"{x:d} {y:d} {z:d}" = "1 2 3"

...thus assigning 1, 2, 3 to x, y, z respectively, I might want to also do the same thing this way:

q = "{x:d} {y:d} {z:d}"
q = "1 2 3"

The intent being: save the f-string as a variable, and then use it to assign later. But that can obviously never work because q would just become the string "1 2 3". We can already do the reverse of this operation, of course:
q = "{x:d} {y:d} {z:d}" d = dict(x=1, y=2, z=3) q = "{x:d} {y:d} {z:d}" q.format(**d) '1 2 3'
What would be the operation we are inverting, here? Perhaps a better way would be-- rather than assigning the values to the global x,y,z-- create a string method that returns a dictionary with the names and the values inside:
q = "{x:d} {y:d} {z:d}" p = "1 2 3" q.parse(p) {'x': 1, 'y': 2, 'z': 3}
...but of course this way we can do the same thing with the literal f-string, similar to what others have proposed:
"{x:d} {y:d} {z:d}".parse("1 2 3") {'x': 1, 'y': 2, 'z': 3}
--- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On Thu, Sep 17, 2020 at 9:27 PM Ricky Teachey <ricky@teachey.org> wrote:
On Thu, Sep 17, 2020 at 2:57 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-09-16 21:52, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
>>> f"A {animal}?" = s Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"A {animal}?" = s ValueError: f-string assignment target does not match 'She turned me into a newt.'
>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59" >>> hh, mm, ss (11, 59, 59)
A difficulty I have with the idea as presented is this.
If I can say this:
"{x:d} {y:d} {z:d}" = "1 2 3"
...thus assigning 1, 2, 3 to x, y, z respectively, I might want to also do the same thing this way:
q = "{x:d} {y:d} {z:d}" q = "1 2 3"
The intent being: save the f-string as a variable, and then use it to assign later. But that can obviously never work because q would just become the string "1 2 3" .
The same problem exists for assignments to tuples, subscripts, attributes, even plain variables. I've often wanted to put an assignment target in a variable.
On Thu, Sep 17, 2020 at 3:44 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Thu, Sep 17, 2020 at 9:27 PM Ricky Teachey <ricky@teachey.org> wrote:
On Thu, Sep 17, 2020 at 2:57 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-09-16 21:52, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
A difficulty I have with the idea as presented is this.
If I can say this:
"{x:d} {y:d} {z:d}" = "1 2 3"
...thus assigning 1, 2, 3 to x, y, z respectively, I might want to also do the same thing this way:
q = "{x:d} {y:d} {z:d}" q = "1 2 3"
The intent being: save the f-string as a variable, and then use it to assign later. But that can obviously never work because q would just become the string "1 2 3" .
The same problem exists for assignments to tuples, subscripts, attributes, even plain variables. I've often wanted to put an assignment target in a variable.
Feels to me akin to what Einstein called spooky action at a distance. ;)

# module A
x = f"{a:d}"

# module B
x.parse("1")
assert a == 1

This seems like a joke I would want to play on someone*, not a useful feature.

* well, if i were a bad person... ;)

--- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On Thu, Sep 17, 2020 at 9:50 PM Ricky Teachey <ricky@teachey.org> wrote:
On Thu, Sep 17, 2020 at 3:44 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Thu, Sep 17, 2020 at 9:27 PM Ricky Teachey <ricky@teachey.org> wrote:
A difficulty I have with the idea as presented is this.
If I can say this:
"{x:d} {y:d} {z:d}" = "1 2 3"
...thus assigning 1, 2, 3 to x, y, z respectively, I might want to also do the same thing this way:
q = "{x:d} {y:d} {z:d}" q = "1 2 3"
The intent being: save the f-string as a variable, and then use it to assign later. But that can obviously never work because q would just become the string "1 2 3" .
The same problem exists for assignments to tuples, subscripts, attributes, even plain variables. I've often wanted to put an assignment target in a variable.
Feels to me akin to what Einstein called spooky action at a distance. ;)
# module A x = f"{a:d}"
# module B x.parse("1") assert a == 1
This seems like a joke I would want to play on someone*, not a useful feature.
* well, if i were a bad person... ;)
I'm not actually proposing being able to store assignment targets in variables. I'm saying the hypothetical misconception you bring up already exists for every other kind of thing that can be assigned to, because they all intentionally look like expressions. I don't think I've ever seen learners be confused about this. I don't think it's a good argument against the proposal.
I like the idea of a scanf-like ability, but I'm afraid the existing format language is poorly suited to that. It's simply not designed to be a two-way street:

* ANYTHING can be stringified in Python -- so there is no defined way to turn a string back into a particular type.

OK, so we restrict ourselves to builtins -- how would you reverse this:

In [17]: x, y, z = 23, 45, 67

In [18]: f"{x}{y}{z}"
Out[18]: '234567'

so we require a type specifier, but similar problem:

In [19]: f"{x:d}{y:d}{z:d}"
Out[19]: '234567'

So we require a field size specifier:

In [20]: f"{x:2d}{y:2d}{z:2d}"
Out[20]: '234567'

OK, I guess that is clearly defined. But we've now limited ourselves to a very small subset of the formatting language -- maybe it's not the right language for the job?

-CHB

On Thu, Sep 17, 2020 at 1:08 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Thu, Sep 17, 2020 at 9:50 PM Ricky Teachey <ricky@teachey.org> wrote:
On Thu, Sep 17, 2020 at 3:44 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Thu, Sep 17, 2020 at 9:27 PM Ricky Teachey <ricky@teachey.org> wrote:
A difficulty I have with the idea as presented is this.
If I can say this:
"{x:d} {y:d} {z:d}" = "1 2 3"
...thus assigning 1, 2, 3 to x, y, z respectively, I might want to also do the same thing this way:
q = "{x:d} {y:d} {z:d}" q = "1 2 3"
The intent being: save the f-string as a variable, and then use it to assign later. But that can obviously never work because q would just become the string "1 2 3" .
The same problem exists for assignments to tuples, subscripts, attributes, even plain variables. I've often wanted to put an assignment target in a variable.
Feels to me akin to what Einstein called spooky action at a distance. ;)
# module A x = f"{a:d}"
# module B x.parse("1") assert a == 1
This seems like a joke I would want to play on someone*, not a useful feature.
* well, if i were a bad person... ;)
I'm not actually proposing being able to store assignment targets in variables. I'm saying the hypothetical misconception you bring up already exists for every other kind of thing that can be assigned to, because they all intentionally look like expressions. I don't think I've ever seen learners be confused about this. I don't think it's a good argument against the proposal.
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Fri, Sep 18, 2020 at 6:19 AM Christopher Barker <pythonchb@gmail.com> wrote:
I like the idea of a scanf-like ability, but I'm afraid the existing format language is poorly suited to that.
It's simply not designed to be a two-way street:
* ANYTHING can be stringified in Python -- so there is no defined way to turn a string back into a particular type.
OK, so we restrict ourselves to builtins -- how would you reverse this:
In [17]: x, y, z = 23, 45, 67
In [18]: f"{x}{y}{z}" Out[18]: '234567'
so we require a type specifier, but similar problem:
In [19]: f"{x:d}{y:d}{z:d}" Out[19]: '234567'
So we require a field size specifier:
In [20]: f"{x:2d}{y:2d}{z:2d}" Out[20]: '234567'
OK, I guess that is clearly defined. but we've now limited ourselves to a very small subset of the formatting language -- maybe it's not the right language for the job?
And that's why the directives are NOT just a pure mirroring of format string directives. Look at C's scanf and printf functions - they correspond in many ways, but they differ in order to be useful. The point isn't to reverse format(), the point is to have a useful and practical string parser that assigns directly to variables. Also, PEP 622. ChrisA
On Thu, Sep 17, 2020 at 1:23 PM Chris Angelico <rosuav@gmail.com> wrote:
And that's why the directives are NOT just a pure mirroring of format string directives. Look at C's scanf and printf functions - they correspond in many ways, but they differ in order to be useful.
Exactly -- but the C ones are a lot simpler and more constrained, so it's a pretty good match.
The point isn't to reverse format(), the point is to have a useful and practical string parser that assigns directly to variables.
I don't think there's one point in this thread, and the OP certainly presented it as reversing an f-string format.
Also, PEP 622.
yup -- at a glance, I like that a lot better. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
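To make the field size point concrete: once every field carries an explicit width, the template does correspond to an unambiguous regular expression and the round trip works; without widths there is nothing to fix the group boundaries. A small illustration with plain re, not a proposed feature:

```python
import re

x, y, z = 23, 45, 67
s = f"{x:2d}{y:2d}{z:2d}"    # '234567'

# Width-constrained fields translate to fixed-width groups...
a, b, c = map(int, re.fullmatch(r"(\d{2})(\d{2})(\d{2})", s).groups())
assert (a, b, c) == (23, 45, 67)

# ...whereas f"{x}{y}{z}" has no such translation: (\d+)(\d+)(\d+) can
# split '234567' many different ways, so the inverse is ambiguous.
```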
Maybe it would be best to talk about actual code? What exceptions will be raised? Is there a mismatch between the string format specification and regex? Why would it be suboptimal to specify types for regex match groups within regex strings? Is it any better to specify regexes with f-strings?

locals().update(**kwargs) doesn't work for a reason; but I don't remember what that reason is?

More test cases within the pytest.mark.parametrize here might elucidate the situation:

```python
import re


def test_regex_comprehension():
    rgx = re.compile(r"(\d{2})(\d{2})(\w{2})")
    teststr = "2345fg"
    assert not rgx.match(teststr).groupdict()
    assert rgx.match(teststr).groups() == ("23", "45", "fg")

    rgx = re.compile(r"(?P<a>\d{2})(?P<b>\d{2})(?P<c>\w{2})")
    assert rgx.match(teststr).groups() == ("23", "45", "fg")
    assert rgx.match(teststr).groupdict() == dict(a="23", b="45", c="fg")

    from types import SimpleNamespace

    mdo = matchdictobj = SimpleNamespace(**rgx.match(teststr).groupdict())
    assert mdo.a == "23"
    assert mdo.b == "45"
    assert mdo.c == "fg"


def cast_match_groupdict(matchobj, typemap):
    matchdict = matchobj.groupdict()
    if not typemap:
        return matchdict
    for attr, castfunc in typemap.items():
        try:
            matchdict[attr] = castfunc(matchdict[attr])
        except ValueError as e:
            raise ValueError(("attr", attr), ("rgx", matchobj.re)) from e
    return matchdict


import pytest


def test_cast_match_groupdict():
    rgx = re.compile(r"(?P<a>\d{2})(?P<b>\d{2})(?P<c>\w{2})")
    teststr = "2345fg"
    matchobj = rgx.match(teststr)

    with pytest.raises(ValueError):
        typemap = dict(a=int, b=int, c=int)
        cast_match_groupdict(matchobj, typemap)

    typemap = dict(a=int, b=int, c=str)
    output = cast_match_groupdict(matchobj, typemap)
    assert output == dict(a=23, b=45, c="fg")


from typing import Tuple


def generate_regex_and_typemap_from_fstring(fstr) -> Tuple[str, dict]:
    # Hard-coded stand-in for a real f-string-to-regex translator.
    if fstr == "{a}{b}{c}":
        return (r"".join(rf"(?P<{name}>.*?)" for name in "abc"), None)
    elif fstr == "{a:d}{b:d}{c:d}":
        return (
            r"".join(rf"(?P<{name}>.*?)" for name in "abc"),
            dict(a=int, b=int, c=int),
        )
    elif fstr == "{a:d}{b:d}{c:s}":
        return (
            r"".join(rf"(?P<{name}>.*?)" for name in "abc"),
            dict(a=int, b=int, c=str),
        )
    else:
        raise NotImplementedError(("fstr", fstr))


def do_fstring_regex_magic(fstrpattern, string):
    rgxstr, typemap = generate_regex_and_typemap_from_fstring(fstrpattern)
    rgx = re.compile(rgxstr)
    matchobj = rgx.match(string)
    try:
        output = cast_match_groupdict(matchobj, typemap)
        # update_locals()  # XXX: how to test this?
        return output
    except ValueError as e:
        raise ValueError(locals()) from e


@pytest.mark.parametrize(
    "fstrpattern,string,exceptions,expoutput",
    [
        # With the naive non-greedy (?P<name>.*?) translation above, every
        # group matches the empty string, so these are the outcomes that
        # actually occur -- which is exactly the greediness question:
        ("{a}{b}{c}", "2345fg", None, dict(a="", b="", c="")),
        ("{a:d}{b:d}{c:d}", "2345fg", ValueError, None),
        ("{a:d}{b:d}{c:s}", "2345fg", ValueError, None),
    ],
)
def test_do_fstring_regex_magic(fstrpattern, string, exceptions, expoutput):
    if exceptions:
        with pytest.raises(exceptions):
            do_fstring_regex_magic(fstrpattern, string)
    else:
        output = do_fstring_regex_magic(fstrpattern, string)
        assert output == expoutput


def update_locals(**kwargs):
    raise NotImplementedError  # writing to locals() would not create variables anyway
    locals().update(**kwargs)
```

On Thu, Sep 17, 2020 at 4:23 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 6:19 AM Christopher Barker <pythonchb@gmail.com> wrote:
I like the idea of a scanf-like ability, but I'm afraid the existing
format language is poorly suited to that.
It's simply not designed to be a two-way street:
* ANYTHING can be stringified in Python -- so there is no defined way to
turn a string back into a particular type.
OK, so we restrict ourselves to builtins -- how would you reverse this:
In [17]: x, y, z = 23, 45, 67
In [18]: f"{x}{y}{z}" Out[18]: '234567'
so we require a type specifier, but similar problem:
In [19]: f"{x:d}{y:d}{z:d}" Out[19]: '234567'
So we require a field size specifier:
In [20]: f"{x:2d}{y:2d}{z:2d}" Out[20]: '234567'
OK, I guess that is clearly defined. but we've now limited ourselves to
a very small subset of the formatting language -- maybe it's not the right language for the job?
And that's why the directives are NOT just a pure mirroring of format string directives. Look at C's scanf and printf functions - they correspond in many ways, but they differ in order to be useful. The point isn't to reverse format(), the point is to have a useful and practical string parser that assigns directly to variables.
Also, PEP 622.
ChrisA
On Thu, Sep 17, 2020 at 09:44:09PM +0200, Alex Hall wrote:
The intent being: save the f-string as a variable, and then use it to assign later. But that can obviously never work because q would just become the string "1 2 3" .
The same problem exists for assignments to tuples, subscripts, attributes, even plain variables. I've often wanted to put an assignment target in a variable.
"Often"? I'm curious about why you would want to do this, under what circumstances, because I've never want to do this, let alone often, and I can't think of why I might. In any case, I think you are missing a very important point here. This sort of scanf pattern matching is not just variable assignment targets, but it includes a *pattern* to be matched, and we might need to build up that pattern dynamically. A trivial example: if day_of_week == 'Monday': f'if today is Monday, this must be {country}' = string elif day_of_week == 'Tuesday': f'today is Tuesday, so we must be in {country}' = string elif day_of_week == 'Wednesday': f'if we're in {country} today must be Wednesday' = string etc. Wouldn't it be much nicer to build up the pattern ahead of time? I'd say it is essential to have the option, e.g. for translating strings: target = TARGETS[language] f-target = string but of course that doesn't work. But it could work if the pattern was a regular string, and we applied a scanf function: country = scanf(pattern, string) sort of thing. -- Steve
This is a terrible idea. No one should ever be trying to extract data from strings that are obviously meant for human consumption, like "It is 11:45 PM". I'll grant you that it's sometimes necessary. But it needs to be approached very carefully. You need to use a --porcelain flag if available, you need to check that the actual output format matches what you expect, and you definitely need to be able to specify where substring matching stops in more ways than "int or float or string". The last thing Python needs is a built-in language feature that makes hacking together these sorts of parsers seem easy and fun. Especially when it has no way to control backtracking or to specify any but a few trivial restrictions on what can end up in the output variables.
On Fri, Sep 18, 2020 at 8:54 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
This is a terrible idea.
No one should ever be trying to extract data from strings that are obviously meant for human consumption, like "It is 11:45 PM".
I'll grant you that it's sometimes necessary. But it needs to be approached very carefully. You need to use a --porcelain flag if available, you need to check that the actual output format matches what you expect, and you definitely need to be able to specify where substring matching stops in more ways than "int or float or string".
The last thing Python needs is a built-in language feature that makes hacking together these sorts of parsers seem easy and fun. Especially when it has no way to control backtracking or to specify any but a few trivial restrictions on what can end up in the output variables.
Python is an excellent language for text manipulation, and text manipulation is an incredibly useful real-world operation. I don't see what you're complaining at. ChrisA
On 9/17/2020 6:56 PM, Chris Angelico wrote:
On Fri, Sep 18, 2020 at 8:54 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
This is a terrible idea.
No one should ever be trying to extract data from strings that are obviously meant for human consumption, like "It is 11:45 PM".
I'll grant you that it's sometimes necessary. But it needs to be approached very carefully. You need to use a --porcelain flag if available, you need to check that the actual output format matches what you expect, and you definitely need to be able to specify where substring matching stops in more ways than "int or float or string".
The last thing Python needs is a built-in language feature that makes hacking together these sorts of parsers seem easy and fun. Especially when it has no way to control backtracking or to specify any but a few trivial restrictions on what can end up in the output variables.
Python is an excellent language for text manipulation, and text manipulation is an incredibly useful real-world operation. I don't see what you're complaining at.
ChrisA I don't either. This could be incredibly useful for simple string extraction. Obviously, there are situations where regex is a much better option, but why object to a simple option for simple problems?
--Edwin
I did actually "write the book" on text processing in Python. I think it's painful and awkward, and a terrible idea. On Thu, Sep 17, 2020, 1:00 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 8:54 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
This is a terrible idea.
Python is an excellent language for text manipulation, and text manipulation is an incredibly useful real-world operation. I don't see what you're complaining at.
f"It's not {regex:d{2}}" https://github.com/r1chardj0n3s/parse
Parse strings using a specification based on the Python format() syntax.
On Thu, Sep 17, 2020 at 7:15 PM David Mertz <mertz@gnosis.cx> wrote:
I did actually "write the book" on text processing in Python. I think it's painful and awkward, and a terrible idea.
On Thu, Sep 17, 2020, 1:00 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 8:54 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
This is a terrible idea.
Python is an excellent language for text manipulation, and text manipulation is an incredibly useful real-world operation. I don't see what you're complaining at.
From https://github.com/r1chardj0n3s/parse/blob/master/README.rst :

```rst
Format Specification
--------------------

Most often a straight format-less ``{}`` will suffice where a more complex
format specification might have been used.

Most of `format()`'s `Format Specification Mini-Language`_ is supported:

   [[fill]align][0][width][.precision][type]

The differences between `parse()` and `format()` are:

- The align operators will cause spaces (or specified fill character) to be
  stripped from the parsed value. The width is not enforced; it just indicates
  there may be whitespace or "0"s to strip.
- Numeric parsing will automatically handle a "0b", "0o" or "0x" prefix.
  That is, the "#" format character is handled automatically by d, b, o
  and x formats. For "d" any will be accepted, but for the others the correct
  prefix must be present if at all.
- Numeric sign is handled automatically.
- The thousands separator is handled automatically if the "n" type is used.
- The types supported are a slightly different mix to the format() types. Some
  format() types come directly over: "d", "n", "%", "f", "e", "b", "o" and "x".
  In addition some regular expression character group types "D", "w", "W", "s"
  and "S" are also available.
- The "e" and "g" types are case-insensitive so there is not need for
  the "E" or "G" types. The "e" type handles Fortran formatted numbers (no
  leading 0 before the decimal point).

===== =========================================== ========
Type  Characters Matched                          Output
===== =========================================== ========
l     Letters (ASCII)                             str
w     Letters, numbers and underscore             str
W     Not letters, numbers and underscore         str
s     Whitespace                                  str
S     Non-whitespace                              str
d     Digits (effectively integer numbers)        int
D     Non-digit                                   str
n     Numbers with thousands separators (, or .)  int
%     Percentage (converted to value/100.0)       float
f     Fixed-point numbers                         float
F     Decimal numbers                             Decimal
e     Floating-point numbers with exponent        float
      e.g. 1.1e-10, NAN (all case insensitive)
g     General number format (either d, f or e)    float
b     Binary numbers                              int
o     Octal numbers                               int
x     Hexadecimal numbers (lower and upper case)  int
ti    ISO 8601 format date/time                   datetime
      e.g. 1972-01-20T10:21:36Z ("T" and "Z" optional)
te    RFC2822 e-mail format date/time             datetime
      e.g. Mon, 20 Jan 1972 10:21:36 +1000
tg    Global (day/month) format date/time         datetime
      e.g. 20/1/1972 10:21:36 AM +1:00
ta    US (month/day) format date/time             datetime
      e.g. 1/20/1972 10:21:36 PM +10:30
tc    ctime() format date/time                    datetime
      e.g. Sun Sep 16 01:03:52 1973
th    HTTP log format date/time                   datetime
      e.g. 21/Nov/2011:00:07:11 +0000
ts    Linux system log format date/time           datetime
      e.g. Nov 9 03:37:44
tt    Time                                        time
      e.g. 10:21:36 PM -5:30
===== =========================================== ========

Some examples of typed parsing with ``None`` returned if the typing
does not match:

.. code-block:: pycon

    >>> parse('Our {:d} {:w} are...', 'Our 3 weapons are...')
    <Result (3, 'weapons') {}>
    >>> parse('Our {:d} {:w} are...', 'Our three weapons are...')
    >>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
    <Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>

And messing about with alignment:

.. code-block:: pycon

    >>> parse('with {:>} herring', 'with a herring')
    <Result ('a',) {}>
    >>> parse('spam {:^} spam', 'spam lovely spam')
    <Result ('lovely',) {}>

Note that the "center" alignment does not test to make sure the value is
centered - it just strips leading and trailing whitespace.

Width and precision may be used to restrict the size of matched text
from the input. Width specifies a minimum size and precision specifies
a maximum. For example:

.. code-block:: pycon

    >>> parse('{:.2}{:.2}', 'look')           # specifying precision
    <Result ('lo', 'ok') {}>
    >>> parse('{:4}{:4}', 'look at that')     # specifying width
    <Result ('look', 'at that') {}>
    >>> parse('{:4}{:.4}', 'look at that')    # specifying both
    <Result ('look at ', 'that') {}>
    >>> parse('{:2d}{:2d}', '0440')           # parsing two contiguous numbers
    <Result (4, 40) {}>

Some notes for the date and time types:

- the presence of the time part is optional (including ISO 8601, starting
  at the "T"). A full datetime object will always be returned; the time
  will be set to 00:00:00. You may also specify a time without seconds.
- when a seconds amount is present in the input fractions will be parsed
  to give microseconds.
- except in ISO 8601 the day and month digits may be 0-padded.
- the date separator for the tg and ta formats may be "-" or "/".
- named months (abbreviations or full names) may be used in the ta and tg
  formats in place of numeric months.
- as per RFC 2822 the e-mail format may omit the day (and comma), and the
  seconds but nothing else.
- hours greater than 12 will be happily accepted.
- the AM/PM are optional, and if PM is found then 12 hours will be added
  to the datetime object's hours amount - even if the hour is greater
  than 12 (for consistency.)
- in ISO 8601 the "Z" (UTC) timezone part may be a numeric offset
- timezones are specified as "+HH:MM" or "-HH:MM". The hour may be one or two
  digits (0-padded is OK.) Also, the ":" is optional.
- the timezone is optional in all except the e-mail format (it defaults to
  UTC.)
- named timezones are not handled yet.

Note: attempting to match too many datetime fields in a single parse() will
currently result in a resource allocation issue. A TooManyFields exception
will be raised in this instance. The current limit is about 15. It is hoped
that this limit will be removed one day.

.. _`Format String Syntax`: http://docs.python.org/library/string.html#format-string-syntax
.. _`Format Specification Mini-Language`: http://docs.python.org/library/string.html#format-specification-mini-languag...
```

On Thu, Sep 17, 2020 at 7:24 PM Wes Turner <wes.turner@gmail.com> wrote:
f"It's not {regex:d{2}}"
https://github.com/r1chardj0n3s/parse
Parse strings using a specification based on the Python format() syntax.
On Thu, Sep 17, 2020 at 7:15 PM David Mertz <mertz@gnosis.cx> wrote:
I did actually "write the book" on text processing in Python. I think it's painful and awkward, and a terrible idea.
On Thu, Sep 17, 2020, 1:00 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 8:54 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
This is a terrible idea.
Python is an excellent language for text manipulation, and text manipulation is an incredibly useful real-world operation. I don't see what you're complaining at.
This library I like! It's very similar to the hypothetical 'template()' function I mention in my other email. But this doesn't mess with Python's model of assignment targets, names, etc. I haven't seen this before, but I want to play around with it. On Thu, Sep 17, 2020, 1:24 PM Wes Turner <wes.turner@gmail.com> wrote:
f"It's not {regex:d{2}}"
https://github.com/r1chardj0n3s/parse
Parse strings using a specification based on the Python format() syntax.
On Thu, Sep 17, 2020 at 01:12:02PM -1000, David Mertz wrote:
I did actually "write the book" on text processing in Python. I think it's painful and awkward, and a terrible idea.
*Text processing* is a terrible idea? Or hammering the square peg of scanf into the round hole of f-strings? Terrible idea or not, text processing is probably the second most common use of computing power, after games, and the second invented, after numerical calculations. I think we're stuck with it. -- Steve
On Thu, Sep 17, 2020, 4:30 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Sep 17, 2020 at 01:12:02PM -1000, David Mertz wrote:
I did actually "write the book" on text processing in Python. I think it's painful and awkward, and a terrible idea.
*Text processing* is a terrible idea?
In for a penny, in for a pound. :-)

Or hammering the square peg of scanf into the round hole of f-strings?
But yes, this is the one I meant. Trying to make f-strings into a bad version of scanf, and a worse version of re.match(), is what I don't like.
I think this is a feature in Scala. A friend showed me this example:

```
def splitDate(s: String) = s match {
  case s"$day-$month-$year" => s"day: $day, mon: $month, yr: $year"
  case _ => "not a date"
}
```

Note that the example shows two sides of s-strings: the first one as a pattern, the second one as a format string. (I assume it's irrelevant that the argument is also called `s`.)

I think the way this works is they have generalized string literal prefixes so that for certain values of `foo` you can write `foo"stuff in quotes"`. I assume that's similar to JavaScript tagged template literals, and I believe the $variable references are recognized by the compiler (so `foo` has no say in this -- at least that's how it is in JS). Scala's match/case can be overloaded by classes providing an `unapply()` method. Scala also has "by-name" parameters (sort of like `&arg` in C++, where assignment to `arg` actually assigns to a variable provided by the caller). I hypothesize that all these things are used together to build the "s-string" concept as a standard library feature (the language reference has no specific syntax for this).

The only other thing I heard about this is that the pattern matching has the power of shell globbing, not of regular expressions -- I guess this means it supports * and ? as wildcards but nothing fancier.

I found it hard to find more info about this using a search engine. (Most queries lead to descriptions of regular expressions in Scala, or more general discussion of pattern matching in Scala.) Maybe that's indicative of some smell... But its existence suggests that this idea has actually been implemented quite seriously.

-- --Guido van Rossum (python.org/~guido) Pronouns: he/him (why is my pronoun here?)
On Thu, Sep 17, 2020, 2:36 AM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
>>> f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM" >>> f"It is {hh}:{mm} {am_or_pm}" == "It is 11:45 PM"
I cringe at every part of this proposal. But this is especially perplexing. How can it POSSIBLY work, unless we have "spooky action at a distance"?! The proposal is a terrible abuse of OUTPUT f-strings. It is confusing, unclear, ambiguous, and would make code far harder to read.

Having some sort of library for template matching seems fine. Letting the mini-language be inspired by f-strings seems fine. It should show its merit as a third party tool first, but I can imagine using such. This I wouldn't hate:

hour, minute, daytime = template(
    pattern="It is {hh}:{mm} {am_or_pm}",
    instance="It is 11:45 PM")

There are lots of specifics to work out, like what to do if things don't match up between pattern and instance. I deliberately changed, e.g. 'hh' in the pattern to 'hour' as the binding, because bindings are just names. I suppose returning a dictionary using the names in the pattern might work also. Whatever, those are details of a separate library, not a syntax change.
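The third-party parse package mentioned earlier in the thread already behaves much like this hypothetical template(): unnamed {} fields come back as a positional tuple that can be unpacked into whatever names the caller likes, and a failed match yields None rather than an exception. A quick sketch, assuming the package's documented behaviour:

```python
from parse import parse  # third-party: pip install parse

result = parse("It is {}:{} {}", "It is 11:45 PM")
hour, minute, daytime = result.fixed    # positional fields, in pattern order
assert (hour, minute, daytime) == ("11", "45", "PM")

# A non-matching instance returns None instead of raising:
assert parse("It is {}:{} {}", "It was 11:45 PM") is None
```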
I was definitely not proposing "spooky action at a distance". My proposal is to existing f-strings as the `__setattr__` protocol is to the `__getattr__` protocol. The template would only ever be hard-coded inline during the f-string assignment. This is the same restriction that existing f-strings have. I am suggesting a syntactic construct where f-strings can be assignment targets, not suggesting to keep track of which strings were f-strings or overriding assignment or something silly like that.

Existing f-strings do not aim to replace str.format in situations where the format string is re-used. Likewise, the proposal would not aim to replace regular expressions in situations where the pattern is re-used.

The change would amount to a syntax change for assignments: whenever an f-string is on the LHS of an assignment, the interpreter would do a parsing operation, and there would be no change to code where the f-string is in any other situation. This is again analogous to the behavior of `__getattr__` and `__setattr__`.

For example,

f"""<a href="{url}">{text}</a>""" = html

Would be roughly equivalent to

url, text = re.fullmatch(r"""<a href="(.*)">(.*)</a>""", html).groups()

The LHS is never evaluated, it's only assigned to.
On Thu, Sep 17, 2020 at 8:33 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I was definitely not proposing "spooky action at a distance". My proposal is to existing f-strings as the `__setattr__` protocol is to the `__getattr__` protocol. The template would only ever be hard-coded inline during the f-string assignment. This is the same restriction that existing f-strings have. I am suggesting a syntactic construct where f-strings can be assignment targets, not suggesting to keep track of which strings were f-strings or overriding assignment or something silly like that.
Before f-strings became a thing, str objects grew a format method. The format method was then made nicer to use with the new f-string syntax. I suggest, maybe, the better thing to do here would be to propose that str should grow a parse method, THEN sprinkle some sugar on the syntax to make parsing usable in the less paint-by-numbers way you describe. I think I would definitely use both if they existed (the parse method and the f-string assignment syntax). But it seems to me the first thing to do would be for str to grow a parse method. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
That would require changes to all existing static analysis tools (again). On Thu, Sep 17, 2020 at 8:31 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I was definitely not proposing "spooky action at a distance". My proposal is to existing f-strings as the `__setattr__` protocol is to the `__getattr__` protocol. The template would only ever be hard-coded inline during the f-string assignment. This is the same restriction that existing f-strings have. I am suggesting a syntactic construct where f-strings can be assignment targets, not suggesting to keep track of which strings were f-strings or overriding assignment or something silly like that.
Existing f-strings do not aim to replace str.format in situations where the format string is re-used. Likewise, the proposal would not aim to replace regular expressions in situations where the pattern is re-used.
The change would amount to a syntax change for assignments: whenever an f-string is on the LHS of an assignment, the interpreter would do a parsing operation, and there would be no change to code where the f-string is in any other situation. This is again analogous to the behavior of `__getattr__` and `__setattr__`.
For example,
f"""<a href="{url}">{text}</a>""" = html
Would be roughly equivalent to
url, text = re.fullmatch(r"""<a href="(.*)">(.*)</a>""", html).groups()
Would this ever raise an exception on assignment? The shortest match / greedy/non-greedy aspect of this is something that regex is designed to handle.
(a, b, c) = 1, 2 <<< ValueError: not enough values to unpack (expected 3, got 2)
The LHS is never evaluated, it's only assigned to.
Why isn't either of these syntaxes for setting local variables supported?

locals().update(**kwargs)
locals()['url'] = '...'

Maybe it's just that every existing linter will complain about test assertions that reference local variables that are only defined in an f-string, and I assume that f'{str:d{2}}' (r'(?P<str>\d{2})') doesn't work without extending the f-string grammar, but I also cringe at this syntax. I also prefer the way that re, regex, and parse return a dict and don't overload assignment for parser evaluation or variable allocation and initialization. (Though I do find myself reaching for the more advanced destructuring assignment features of ECMAScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/... )
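On the locals().update() question raised a couple of times in this thread: inside a function, CPython's locals() returns a snapshot dict, and writing to it does not create or change real local variables, because the compiler decides at compile time which names are local. At module scope locals() is globals(), which is why the trick can appear to work in the REPL. A small demonstration:

```python
def attempt():
    locals()["hh"] = "11"      # mutates a snapshot, not the function's namespace
    locals().update(mm="45")   # same story
    try:
        return hh, mm          # compiled as global lookups; neither name exists
    except NameError:
        return "no local variables were created"

assert attempt() == "no local variables were created"
```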
On Thu, Sep 17, 2020 at 2:31 PM Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
I was definitely not proposing "spooky action at a distance". My proposal is to existing f-strings as the `__setattr__` protocol is to the `__getattr__` protocol. The template would only ever be hard-coded inline during the f-string assignment. This is the same restriction that existing f-strings have. I am suggesting a syntactic construct where f-strings can be assignment targets, not suggesting to keep track of which strings were f-strings or overriding assignment or something silly like that.
Just to be clear, I hate the idea. But I get that you are talking about the expansions in an f-string as a weird kind of assignment target. There are so very many ways this can go wrong that would be a nightmare to debug... and at best, it's very difficult to read. Let's go through a few:
f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM"
OK, this is a weird way to call a parser on the RHS. But if it parses, it's like: hh, mm, am_or_pm = "11", "45", "PM" Let's try some others:
my_string = "It might be 11:45 PM" # ... many lines later ... f"It is {hh}:{mm} {am_or_pm}" = my_string
What are hh, mm, and am_or_pm now? Do we raise an exception? Do we run the line having no real idea whether they have been assigned at all? How do we even check whether an assignment took place?
f"It is {hh}:{mm}!" = "It is 11:45 PM"
Does this work left-to-right? Are hh and mm assigned values, but am_or_pm remains unassigned?
f"It is {hh}:{mm} {am_or_pm} right now" = "It is 11:45 PM"
Is this a match? Do we only care about the prefix, or does the whole string need to match the inflexible pattern? Obviously, why not use regexen where we can be explicit about such things?
f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM right now"
Same questions as last, but kinda flipped around? There is no easy intuition here, and every behavior will be surprising to most users.
f"It is {hh()}:{mm()} {am_or_pm()}" = "It is 11:45 PM"
That's a perfectly valid f-string on the LHS. What would you imagine would happen here? If the answer is "raise an exception" then your proposal is actually that some tiny fraction of f-strings could actually be used this way. That doesn't per se mean the syntax is bad (although it is), but it means that the "symmetries" you hope for between `=` and `==` will hold only for very special cases of f-strings.
On Fri, Sep 18, 2020 at 11:51 AM David Mertz <mertz@gnosis.cx> wrote:
Let's try some others:
my_string = "It might be 11:45 PM" # ... many lines later ... f"It is {hh}:{mm} {am_or_pm}" = my_string
What are hh, mm, and am_or_pm now? Do we raise an exception? Do we run the line having no real idea whether they have been assigned at all? How do we even check whether an assignment took place?
You can get this with ANY unpacking form. I don't see how this is any different. In fact, this kind of usage - where the template is in the source code but the data comes from a variable - would be the normal way things would be done, since most text processing involves something from outside the code. This isn't "action at a distance". It's perfectly normal coding.
f"It is {hh}:{mm}!" = "It is 11:45 PM"
Does this work left-to-right? Are hh and mm assigned values, but am_or_pm remains unassigned?
Since am_or_pm wasn't in the template, of course it remains unassigned. Or is this an error in the example? It's debatable whether this should assign those it can or bomb if it can't. I'm inclined towards the "assign those it can" side of things, since you can always validate with an extra directive, same as you would use a dollar sign in a regex.
f"It is {hh}:{mm} {am_or_pm} right now" = "It is 11:45 PM"
Is this a match? Do we only care about the prefix, or does the whole string need to match the inflexible pattern? Obviously, why not use regexen where we can be explicit about such things?
As above. I'm slightly in favour of "yes it's a match", but either works as long as it's a well defined and documented behaviour.
f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM right now"
Same questions as last, but kinda flipped around? There is no easy intuition here, and every behavior will be surprising to most users.
Most? Some perhaps, but that's true of pretty much anything. There are people confused about *every* behaviour.
f"It is {hh()}:{mm()} {am_or_pm()}" = "It is 11:45 PM"
That's a perfectly valid f-string on the LHS. What would you imagine would happen here? If the answer is "raise an exception" then your proposal is actually that some tiny fraction of f-strings could actually be used this way. That doesn't per se mean the syntax is bad (although it is), but it means that the "symmetries" you hope for between `=` and `==` will hold only for very special cases of f-strings.
The syntax for string interpolation is parallel to, but by no means identical to, the syntax for string parsing. It's just like list display and unpacking:

# Perfectly valid
spam = [x, y, z[1], z[2]]
[x, y, z[1], z[2]] = range(4)

# Doesn't make sense to unpack this way
spam = [5, ham + eggs]
[5, ham + eggs] = spam

There's a LOT more symmetry than you might think; even if only a small subset of the syntax is common to both, a huge subset of actual usage will be.

ChrisA
On Thu, Sep 17, 2020, 4:04 PM Chris Angelico
my_string = "It might be 11:45 PM" # ... many lines later ... f"It is {hh}:{mm} {am_or_pm}" = my_string
What are hh, mm, and am_or_pm now? Do we raise an exception? Do we run the line having no real idea whether they have been assigned at all? How do we even check whether an assignment took place?
You can get this with ANY unpacking form. I don't see how this is any different. In fact, this kind of usage - where the template is in the source code but the data comes from a variable - would be the normal
So are you saying this would be a ValueError like unsuccessful tuple unpacking? That's not obvious from the OP.
On Fri, Sep 18, 2020 at 12:15 PM David Mertz <mertz@gnosis.cx> wrote:
On Thu, Sep 17, 2020, 4:04 PM Chris Angelico
my_string = "It might be 11:45 PM" # ... many lines later ... f"It is {hh}:{mm} {am_or_pm}" = my_string
What are hh, mm, and am_or_pm now? Do we raise an exception? Do we run the line having no real idea whether they have been assigned at all? How do we even check whether an assignment took place?
You can get this with ANY unpacking form. I don't see how this is any different. In fact, this kind of usage - where the template is in the source code but the data comes from a variable - would be the normal
So are you saying this would be a ValueError like unsuccessful tuple unpacking? That's not obvious from the OP.
Either a ValueError or a partial assignment. There are arguments on both sides. Whichever is decided on, it will make perfectly good sense a lot of the time, and have to be worked against the rest of the time. It's not an argument against the proposal as a whole. ChrisA
Parsing can be ambiguous:

f"{x}:{y}" = "a:b:c"

Does this set
    x = "a"   y = "b:c"
or
    x = "a:b" y = "c"
?

Rob Cliffe

On 17/09/2020 05:52, Dennis Sweeney wrote:
TL;DR: I propose the following behavior:
>>> s = "She turned me into a newt." >>> f"She turned me into a {animal}." = s >>> animal 'newt'
>>> f"A {animal}?" = s Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> f"A {animal}?" = s ValueError: f-string assignment target does not match 'She turned me into a newt.'
>>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59" >>> hh, mm, ss (11, 59, 59)
Regex uses the ? symbol to indicate that something is a "non-greedy" match (to default to "shortest match"):

import re
str_ = "a:b:c"
assert re.match(r'(.*):(.*)', str_).groups() == ("a:b", "c")
assert re.match(r'(.*?):(.*)', str_).groups() == ("a", "b:c")

Typically, debugging parsing issues involves testing the output of a function (not changes to locals()).

Parse defaults to (case-insensitive) non-greedy/shortest-match:
parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:
pattern = '{dir1}/{dir2}'
data = 'root/parent/subdir'
sorted(parse(pattern, data).named.items())
[('dir1', 'root'), ('dir2', 'parent/subdir')]
So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit the pattern, the actual match represents the shortest successful match for dir1.
https://github.com/r1chardj0n3s/parse#potential-gotchas https://github.com/r1chardj0n3s/parse#format-specification :
Note: attempting to match too many datetime fields in a single parse() will currently result in a resource allocation issue. A TooManyFields exception will be raised in this instance. The current limit is about 15. It is hoped that this limit will be removed one day.
On Sat, Sep 19, 2020, 1:00 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
Parsing can be ambiguous:

f"{x}:{y}" = "a:b:c"

Does this set
    x = "a"   y = "b:c"
or
    x = "a:b" y = "c"
?

Rob Cliffe
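The parse package discussed above resolves Rob's ambiguity by always taking the shortest match from the left, so the first field gets the short piece. A quick check, assuming the behaviour its README describes:

```python
from parse import parse  # third-party: pip install parse

result = parse("{x}:{y}", "a:b:c")
# non-greedy, left to right: the first field takes the shortest text
assert result.named == {"x": "a", "y": "b:c"}
```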
On Sat, Sep 19, 2020 at 12:10 PM Wes Turner <wes.turner@gmail.com> wrote:
Regex uses the ? symbol to indicate that something is a "non-greedy" match (to default to "shortest match")
exactly -- Regex was designed to be a parsing language, format specifiers were not.

I'm quite surprised by how little the parse package has had to adapt the format language to a parsing language, but it has indeed adapted it. I'm honestly not sure how confusing it would be to have a built-in parsing language that looks like the format one but behaves differently. I suspect it's particularly an issue if we did assigning to f-strings, and less so if it were a string method or stand-alone function.

Trying parse with my earlier example in this thread:

In [1]: x, y, z = 23, 45, 67
In [2]: a_string = f"{x}{y}{z}"
In [3]: a_string
Out[3]: '234567'
In [4]: from parse import parse
In [5]: parse("{x}{y}{z}", a_string)
Out[5]: <Result () {'x': '2', 'y': '3', 'z': '4567'}>
In [6]: parse("{x:d}{y:d}{z:d}", a_string)
Out[6]: <Result () {'x': 2345, 'y': 6, 'z': 7}>

So that's interesting -- a different level of "greediness" for strings than for integers.

In [7]: parse("{x:2d}{y:2d}{z:2d}", a_string)
Out[7]: <Result () {'x': 23, 'y': 45, 'z': 67}>

And now we get back what we started with -- not bad.

I'm liking this -- I think it would be good to have parse, or something like it, in the stdlib, maybe as a string method. Then maybe consider some auto-assigning behavior -- though I'm pretty sceptical of that, and Wes' point about debugging is a good one. It would create a real debugging / testing nightmare to have stuff auto-assigned into locals.

-CHB
import re
str_ = "a:b:c"
assert re.match(r'(.*):(.*)', str_).groups() == ("a:b", "c")
assert re.match(r'(.*?):(.*)', str_).groups() == ("a", "b:c")
Typically, debugging parsing issues involves testing the output of a function (not changes to locals()).
Parse defaults to (case-insensitive) non-greedy/shortest-match:
parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:
>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> sorted(parse(pattern, data).named.items())
[('dir1', 'root'), ('dir2', 'parent/subdir')]
So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit the pattern, the actual match represents the shortest successful match for dir1.
https://github.com/r1chardj0n3s/parse#potential-gotchas
https://github.com/r1chardj0n3s/parse#format-specification :
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
+1 on adding something like parse to the language. -0 on the assignment feature... it just doesn't seem to be that beneficial to me. But once parse exists in the language, a rational and limited conversation about the f-string assignment feature becomes much more possible.
On 20/09/20 7:45 am, Christopher Barker wrote:
In [4]: from parse import parse
In [5]: parse("{x}{y}{z}", a_string)
Out[5]: <Result () {'x': '2', 'y': '3', 'z': '4567'}>
In [6]: parse("{x:d}{y:d}{z:d}", a_string)
Out[6]: <Result () {'x': 2345, 'y': 6, 'z': 7}>
So that's interesting -- a different level of "greediness" for strings than for integers.
Hmmm, that seems really unintuitive. I think a better result would be a parse error -- "I was told to expect three things, but I only found one." I'm wondering whether such patterns should be disallowed on the basis that they're inherently ambiguous. -- Greg
On Sat, Sep 19, 2020 at 4:46 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Hmmm, that seems really unintuitive. I think a better result would be a parse error -- "I was told to expect three things, but I only found one."
That's what I mean when I say that the format language isn't well suited to parsing. But it did find three things (or four, or ...). That format specifier doesn't have any whitespace in between the {}s, so it isn't looking for spaces between the numbers.
I'm wondering whether such patterns should be disallowed on the
basis that they're inherently ambiguous.
I don't know that they are ambiguous, as long as the rules are laid out somewhere. Though confusing might be a good word :-) -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Greg Ewing writes:
Hmmm, that seems really unintuitive. I think a better result would be a parse error -- "I was told to expect three things, but I only found one."
Are you sure that shouldn't be "I was told to expect three things, but I found six?" ;-)

And why not parse a_string using the "grammar" "{x}{y}{z}" as {'x': 2345, 'y': 6, 'z': 7}? That's perfectly valid *interpreting the 'grammar' as a format string*, and therefore might very well be expected. Of course there's probably a rule in parse that {x} is an abbreviation for {x:s}.

Regexps are hard for people to interpret, but they're well-defined and one *can* learn them. If we're going to go beyond regexps in the stdlib (and I'm certainly in favor of that!), let's have a parser that uses a grammar notation that is rarely ambiguous in the way that format strings *usually* are, and when there is ambiguity, demands that the programmer explicitly disambiguate rather than "guessing" in some arbitrary way.
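(For reference, named groups already take some of the paint-by-numbers indirection out of the plain-regex approach from the original post, even if the pattern itself stays noisy. A minimal sketch using only the standard re module:

    import re

    m = re.fullmatch(r"It is (?P<hh>\d+):(?P<mm>\d+) (?P<am_or_pm>[AP]M)",
                     "It is 11:45 PM")
    print(m.group('hh'), m.group('mm'))   # 11 45
    print(m.groupdict())                  # {'hh': '11', 'mm': '45', 'am_or_pm': 'PM'}

The names live inside the pattern rather than in a separate unpacking step, which is part of what the f-string proposal is after.)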
Tests for parsers / [regex] pattern matchers in the CPython standard library:
https://github.com/python/cpython/blob/master/Lib/test/test_fstring.py
https://github.com/python/cpython/blob/master/Lib/test/re_tests.py
https://github.com/python/cpython/blob/master/Lib/test/test_re.py
https://github.com/python/cpython/blob/master/Lib/test/test_ast.py
https://github.com/python/cpython/blob/master/Lib/test/test_unparse.py
https://github.com/python/cpython/blob/master/Lib/test/test_grammar.py
https://github.com/python/cpython/blob/master/Lib/test/test_tokenize.py
https://github.com/python/cpython/blob/master/Lib/test/test_shlex.py
https://github.com/python/cpython/blob/master/Lib/test/test_optparse.py
https://github.com/python/cpython/blob/master/Lib/test/test_argparse.py

Tests for other parsers / pattern matchers written in Python:
https://bitbucket.org/mrabarnett/mrab-regex/src/hg/regex_3/test_regex.py
https://github.com/r1chardj0n3s/parse/blob/master/test_parse.py
https://github.com/pyparsing/pyparsing/blob/master/tests/test_simple_unit.py
https://github.com/jszheng/py3antlr4book
https://github.com/dateutil/dateutil/blob/master/dateutil/test/test_parser.p...
https://github.com/arrow-py/arrow/blob/master/tests/test_parser.py
Consolidate your understanding of regular expression patterns and substitutions using nltk.re_show(p, s), which annotates the string s to show every place where pattern p was matched, and nltk.app.nemo(), which provides a graphical interface for exploring regular expressions. For more practice, try some of the exercises on regular expressions at the end of this chapter.

Parsing natural language(s) with and without Context-Free Grammars:
https://en.wikipedia.org/wiki/Parsing
https://en.wikipedia.org/wiki/Natural_language_processing
https://en.wikipedia.org/wiki/Context-free_grammar#Undecidable_problems

### NLTK
https://en.wikipedia.org/wiki/Natural_Language_Toolkit
"3.5 Useful Applications of Regular Expressions" http://www.nltk.org/book/ch03.html
https://github.com/nltk/nltk/blob/develop/nltk/test/tokenize.doctest
https://github.com/nltk/nltk/blob/develop/nltk/test/grammar.doctest
https://github.com/nltk/nltk/blob/develop/nltk/test/unit/test_tokenize.py
https://github.com/nltk/nltk/blob/develop/nltk/test/unit/test_stem.py

## Other tools for natural language
https://www.google.com/search?q=site%3Agithub.com+inurl%3Aawesome+spacy+nltk
On 20/09/20 9:25 pm, Stephen J. Turnbull wrote:
Are you sure that shouldn't be "I was told to expect three things, but I found six?" ;-)
And why not parse a_string using the "grammar" "{x}{y}{z}" as {'x': 2345, 'y': 6, 'z': 7}?
As a human I tend to expect input formats to be somewhat sensible and have delimiters between runs of letters or digits, so unless told otherwise I would assume 234567 to represent a single number or string. If not, there must be some rule for deciding where to divide it up, in which case that rule should be explicit in the pattern, not implicit in some emergent behaviour of the parser.
let's have a parser that uses a grammar notation that is rarely ambiguous in the way that format strings *usually* are, and when there is ambiguity, demands that the programmer explicitly disambiguate rather than "guessing" in some arbitrary way.
I think we're mostly in agreement. I'm not sure we need yet another pattern language, though. If you have an input language where 234567 represents more than one token, then any way of describing how to parse it will have the same need to disambiguate. The important things are to detect the ambiguity and have a way for the user to resolve it, which I think can be accomplished without inventing a new pattern language. -- Greg
On Sun, Sep 20, 2020 at 2:27 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Are you sure that shouldn't be "I was told to expect three things, but I found six?" ;-)
And why not parse a_string using the "grammar" "{x}{y}{z}" as {'x': 2345, 'y': 6, 'z': 7}? That's perfectly valid *interpreting the 'grammar' as a format string*, and therefore might very well be expected. Of course there's probably a rule in parse that {x} is an abbreviation for {x:s}.
I'm sure there is. But if it were me, I would probably require a format specifier.
Regexps are hard for people to interpret, but they're well-defined and one *can* learn them.
I'm sure that's truer, but I know I haven't yet ;-). But it's only been 30 years or so ....
If we're going to go beyond regexps in the stdlib (and I'm certainly in favor of that!), let's have a parser that uses a grammar notation that is rarely ambiguous in the way that format strings *usually* are,
That was my point originally. But in fact, we already DO have regex. So what is the goal of a new syntax? It's certainly not more power or flexibility. So I think I'm actually coming around -- it is pretty nice to be able to use a parsing language that's familiar and simple, even if it doesn't give you full flexibility.

-CHB

-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-1 on the whole proposal.

Currently when I see an f-string such as

    f"My name is {name} and my age is {age}."

I can easily translate it mentally (if I have to - usually I don't) into

    "My name is " + str(name) + " and my age is " + str(age) + "."

Whether such a translation is technically accurate or not is irrelevant - the point is that, modulo technical niceties, f-strings are translated by *simple*, *easily understood* and *unambiguous* rules into an expression with *easily understood* semantics.

Whereas this proposal
- Is a really *obscure* and *non-obvious* way of assigning to variables. I doubt that anybody could readily guess the meaning when seeing it for the first time. (Frankly, I don't care if something similar has been implemented in Scala - I still think it's horrible for Python.)
- Would add to the interpreter the bloat of a whole parsing engine whose working would be a *totally obscure* "black box" (and I suspect would be hard to implement correctly the first time).
- Would require a lot of work to implement and maintain (including maintaining the docs) for a feature which might not be used much. (IMO no convincing use cases have been presented so far. No doubt there are some; just IMO not enough to justify the proposal.) Even providing reasonably helpful error messages could be quite a lot of work (a blanket ValueError, e.g., would move it towards the "unusable" end of the spectrum).
- Is subject to greedy/non-greedy parsing *ambiguities*. Whatever the choice(s) made, the results will be surprising to some of the people some of the time. And trying to resolve them would surely slow down the parsing process.
- Means the behaviour of said "black box" parsing engine *could not be changed in any way* by users if it wasn't exactly what they wanted, which I suspect would happen quite often, and not just because of the greedy/non-greedy issue. This is my gut feeling - I can't articulate it concretely except to mention an issue that Chris Angelico raised: is partial assignment (i.e. assignment to some but not all of the target variables) supported, or not? Whereas users are free to tweak parse.py or write their own custom parser. I visualise a lot of Stack Overflow questions on the lines of "Why doesn't this [f-string assignment] do what I want?" In fact for this reason (the impossibility of finding a spec which will suit everyone) I am opposed to building parsing into the Python _language_, e.g. with a "parse" keyword. A built-in parse() function would not be so bad, though I'd be dubious about it. But having parsing snuck in by, and _only accessible by_, assignment to f-strings, _with no way of users changing it_, strikes me as the worst of all possible worlds.
- As others have mentioned: could inject variables into locals(), making debugging harder.
- There is no precedent in Python for an expression on the LHS of an assignment. (Tuple unpacking, dotted names, slices, dictionary keys etc. might or might not be considered counter-examples, but at least their intent is clear.) Certainly not for something which looks like a literal on the LHS of an assignment.

(As another reference point: contrast PEP 622, Pattern Matching (whatever flavour of it we end up with). It's on record that I'm not its biggest fan, but I posit that (a) a "match" construct is *easy to understand* when seen for the first time, at least in general intent, and (b) a "match" construct is translatable to ordinary code containing equality tests, isinstance tests, etc. in a *simple*, *easily understood* and *unambiguous* - if tedious - way.)

In short, "assigning" to f-strings is not and cannot be a simple reversal of having them in expressions. Rather, it is opening a big can of worms.

Best wishes
Rob Cliffe
On Tue, Oct 20, 2020 at 10:37 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
In short, "assigning" to f-strings is not and cannot be a simple reversal of having them in expressions. Rather, it is opening a big can of worms.
It's not a reversal of them being in expressions any more than assigning to a list display is a reversal of constructing a list. It's a parallel operation, a counterpart. It would be Python's way of offering an sscanf-like operation, which is something that I've frequently wished for. Regular expressions aren't ideal for all situations, and there are plenty of times when a simple set of parsing rules could be very nicely encoded into a compact form like this. C's sscanf and sprintf aren't perfect counterparts, but they're incredibly valuable. Python has percent formatting for sprintf, but no form of sscanf.
There is no precedent in Python for an expression on the LHS of an assignment. (Tuple unpacking, dotted names, slices, dictionary keys etc. might or might not be considered counter-examples but at least their intent is clear.) Certainly not for something which looks like a literal on the LHS of an assignment.
And the intent here would be just as clear. Assigning to a tuple or list has very well defined semantics, and they don't perfectly correspond to the way you would construct such a thing on the RHS. So, they *are* precedent for this. An sscanf format string is simple, easily understood, and unambiguous. An f-string assignment target would be the same, but using f-string style notation instead. It's that simple. ChrisA
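A small illustration of that precedent: list displays and starred names are already accepted as assignment targets, with target semantics of their own rather than "construction in reverse":

>>> [first, *rest] = "a:b:c".split(":")
>>> first
'a'
>>> rest
['b', 'c']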
On Sun, Sep 20, 2020 at 03:59:40AM +0100, Rob Cliffe via Python-ideas wrote:
-1 on the whole proposal.
I agree with the negative on this proposal. I don't think that f-strings are a good fit for this, and I'm not sure that we need syntax for something that could be handled by a function almost as well. But having said that, I'm going to play Devil's Advocate:
Whereas this proposal - Is a really *obscure* and *non-obvious* way of assigning to variables. I doubt that anybody could readily guess the meaning when seeing it for the first time.
For the first time? Maybe not. But the first time I saw Python code, I couldn't make heads or tails of it. All those mysterious square brackets and colons:

    for x in items[1:]:
        obj[x] = something

I had no idea what it was, had never even heard of the term "slicing", let alone what it did. Some things you just have to learn.
- Would add to the interpreter the bloat of a whole parsing engine whose working would be a *totally **obscure* "black box" (and I suspect would be hard to implement correctly the first time).
I suspect it won't be that hard really. It's a simple form of pattern matching, simpler than regular expressions. In fact you can build a scanf-type function from regular expressions, and back in Python 2.5 days that used to be part of the documentation:

https://docs.python.org/2.5/lib/node49.html

With a little bit of work, you could make something that took a target string with scanf-style tokens, turned it into a regex, and parsed a string. That would let you write something like:

    path, errors, warnings = scanf('%s - %d errors, %d warnings', line)

I'd use something like that. The only advantage, I guess, of f-string-like syntax is that the variable names can be embedded inside the template string:

    f"{path:s} - {errors:d} errors, {warnings:d}" = line

but I'm not convinced that's actually an advantage. I think I prefer the C-style for this.
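A rough sketch of such a helper, in the spirit of that old docs recipe (the name scanf and the sample line below are just for illustration; only %s and %d are handled):

    import re

    def scanf(template, string):
        # Translate a tiny subset of scanf tokens into capturing groups:
        # %d -> an optionally signed integer, %s -> a run of non-whitespace.
        token_regex = {'%d': r'([-+]?\d+)', '%s': r'(\S+)'}
        pattern, pos = '', 0
        for tok in re.finditer(r'%[ds]', template):
            pattern += re.escape(template[pos:tok.start()]) + token_regex[tok.group()]
            pos = tok.end()
        pattern += re.escape(template[pos:])
        match = re.fullmatch(pattern, string)
        if match is None:
            raise ValueError(f'{string!r} does not match {template!r}')
        kinds = re.findall(r'%[ds]', template)
        return tuple(int(g) if k == '%d' else g
                     for k, g in zip(kinds, match.groups()))

    path, errors, warnings = scanf('%s - %d errors, %d warnings',
                                   'spam.py - 3 errors, 1 warnings')
    # ('spam.py', 3, 1)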
- Would require a lot of work to implement and maintain (including maintaining the docs)
I don't think that implementation is that hard, although I'm not volunteering to do the work :-) And maintenance would be relatively little unless new functionality is added.
a feature which might not be used much. (IMO no convincing use cases have been presented so far. No doubt there are some. Just IMO not enough to justify the proposal.)
Do you think that there are no use-cases for scanf and sscanf in C? How about regexes? Pattern matching? This is just a variant of those.
Even providing reasonably helpful error messages could be quite a lot of work (a blanket ValueError, e.g., would move it towards the "unusable" end of the spectrum).
I'm not sure what the errors would be, but they surely wouldn't be more cryptic than what you get with a regex, namely silent failure :-)
- Is subject to greedy/non-greedy parsing *ambiguities*. Whatever the choice(s) made, the results will be surprising to some of the people some of the time. And trying to resolve them would surely slow down the parsing process.
scanf is not intended as a full-blown parser. It is an intentionally simple parser, to solve simple problems.
- The behaviour of said "black box" parsing engine _/could not be changed in any way/_ by users if it wasn't exactly what they wanted, which I suspect would happen quite often, and not just because of the greedy/non-greedy issue.
Just because a built-in solution exists, doesn't mean people can't write their own. If someone needs something that isn't handled by the scanf-style parser, they can always write their own regex, or their own parser. Sometimes we want a 500 gigawatt nuclear fusion reaction; but sometimes we just want a AA battery. This is the AA battery of parsing. Personally, I think people use more AA batteries than 500GW fusion reactors :-)
- As others have mentioned: could inject variables into locals() making debugging harder.
I'm dubious about that too. I think it would be better to keep the parsing separate from the name binding, and use regular assignment for that.
In short, "assigning" to f-strings is not and cannot be a simple reversal of having them in expressions. Rather, it is opening a big can of worms.
My feeling is that f-strings is the wrong solution to this problem. It strikes me as a case of "when all you have is a hammer": "f-strings are great! Let's use f-strings for scanning strings!" Yeah, f-strings might be great, but I don't think they are a good match for this functionality. But I would definitely use the functionality. -- Steve
On Tue, Oct 20, 2020 at 11:54 PM Steven D'Aprano <steve@pearwood.info> wrote:
The only advantage, I guess, of f-string like syntax is that the variable names can be embedded inside the template string:
f"{path:s} - {errors:d} errors, {warnings:d}" = line
but I'm not convinced that's actually an advantage. I think I prefer the C-style for this.
It's exactly the same advantage that they have as rvalues, and I think there's more to be said for it than is obvious at first glance. Consider that patterns might not be nice tidy things like this; they might be parsing an obscure sequence of tokens (because, let's face it, a lot of text out there isn't designed well), so your pattern might be something like:

    f"{path}: {errors:d}/{warnings:d}" = line

or even:

    f"{acc},{date},{type},{amount:[0-9.]} {description},{refnum:12s}{reference}" = line

which is only a very slight exaggeration from the "why the bleep didn't you just use actual CSV" parsing from one of my actual apps. (Banks. Banks are bizarre.)
Sometimes we want a 500 gigawatt nuclear fusion reaction; but sometimes we just want a AA battery. This is the AA battery of parsing.
Personally, I think people use more AA batteries than 500GW fusion reactors :-)
[citation needed] :-)
- As others have mentioned: could inject variables into locals() making debugging harder.
I'm dubious about that too. I think it would be better to keep the parsing separate from the name binding, and use regular assignment for that.
Not sure what the concern with "inject[ing] variables" is. The f-string has to be a literal - it's not a magic object that you can assign to and new locals appear.
Yeah, f-strings might be great, but I don't think they are a good match for this functionality.
But I would definitely use the functionality.
I'd personally prefer sscanf, myself, but I would use this functionality if it landed too. ChrisA
On Wed, Oct 21, 2020 at 02:15:40AM +1100, Chris Angelico wrote:
On Tue, Oct 20, 2020 at 11:54 PM Steven D'Aprano <steve@pearwood.info> wrote:
The only advantage, I guess, of f-string like syntax is that the variable names can be embedded inside the template string:
f"{path:s} - {errors:d} errors, {warnings:d}" = line
but I'm not convinced that's actually an advantage. I think I prefer the C-style for this.
It's exactly the same advantage that they have as rvalues, and I think there's more to be said for it than is obvious at first glance. Consider that patterns might not be nice tidy things like this; they might be parsing an obscure sequence of tokens (because, let's face it, a lot of text out there isn't designed well), so your pattern might be something like:
f"{path}: {errors:d}/{warnings:d}" = line
*scratches head* I'm not seeing that this is a more obscure or complex example. It looks like you have just changed the format from:

    path - 123 errors, 456 warnings

to:

    path: 123/456
or even:
f"{acc},{date},{type},{amount:[0-9.]} {description},{refnum:12s}{reference}" = line
At the point you have something that complicated, I think that existing solutions with regexes or parsers is better. Overloading something that looks like an f-string with a mini-language that is way more complicated than the mini-language used in actual f-strings is, I think, a losing proposition.
- As others have mentioned: could inject variables into locals() making debugging harder.
I'm dubious about that too. I think it would be better to keep the parsing separate from the name binding, and use regular assignment for that.
Not sure what the concern with "inject[ing] variables" is.
What you see as a feature, some of us see as a problem: the fact that *some* variables will be updated even if the pattern doesn't match. So if you have a pattern that doesn't match the given string, it could still have side-effects of creating or over-writing local variables.
The f-string has to be a literal - it's not a magic object that you can assign to and new locals appear.
Well, no, it's not an object at all, but surely "you can assign to and new locals appear" is the whole point of the feature? You assign to the f-string and it matches values from the string on the right hand side to create new locals, or over-write existing ones.

    f"{name}: {value:d}" = "eleventytwo: 112"

will have the side effect of binding:

    name = "eleventytwo"
    value = 112

-- Steve
On Wed, Oct 21, 2020 at 10:39 AM Steven D'Aprano <steve@pearwood.info> wrote:
The f-string has to be a literal - it's not a magic object that you can assign to and new locals appear.
Well, no, it's not an object at all, but surely "you can assign to and new locals appear" is the whole point of the feature? You assign to the f-string and it matches values from the string on the right hand side to create new locals, or over-write existing ones.
f"{name}: {value:d}" = "eleventytwo: 112"
will have the side effect of binding:
name = "eleventytwo" value = 112
Yes. It's *not an object*. The "spooky action at a distance" thing that I was responding to can't be an issue because the assignment is right there. Please, don't make me repeat all the same arguments again. Read the existing thread. ChrisA
On Wed, Oct 21, 2020 at 10:49:41AM +1100, Chris Angelico wrote:
On Wed, Oct 21, 2020 at 10:39 AM Steven D'Aprano <steve@pearwood.info> wrote:
The f-string has to be a literal - it's not a magic object that you can assign to and new locals appear.
Well, no, it's not an object at all, but surely "you can assign to and new locals appear" is the whole point of the feature? You assign to the f-string and it matches values from the string on the right hand side to create new locals, or over-write existing ones.
f"{name}: {value:d}" = "eleventytwo: 112"
will have the side effect of binding:
name = "eleventytwo" value = 112
Yes. It's *not an object*.
True. It's also not a list comprehension, or an import. Why does that matter? I think either we're talking past each other, or you're trying to squirm out of admitting that the whole point of this proposal is to do what you said it doesn't do, namely bind values to names. I don't see why not being an object makes a difference.
The "spooky action at a distance" thing that I was responding to can't be an issue because the assignment is right there.
But it is an issue for the very reason I gave: even if the pattern matching fails, it can still create or overwrite variables. Normal name binding doesn't work like that:

    a, b, c, d = 1, 2, 3

does not bind to a, b, c. Your take on this proposal would. You see that as a feature: "partial parsing". I, and a number of other people including the OP Dennis, see that as a problem, not a feature.
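For reference, that unpacking really is atomic in CPython today -- the length check happens before any name is bound, so nothing is partially assigned (fresh session):

>>> a, b, c, d = 1, 2, 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 4, got 3)
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined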
Please, don't make me repeat all the same arguments again. Read the existing thread.
I have read the entire thread. Just because I disagree with you doesn't mean I haven't read your take on it. If you want to change my mind, you need to be more persuasive in your arguments, not more dismissive of objections. -- Steve
On Wed, Oct 21, 2020 at 12:08 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Oct 21, 2020 at 10:49:41AM +1100, Chris Angelico wrote:
On Wed, Oct 21, 2020 at 10:39 AM Steven D'Aprano <steve@pearwood.info> wrote:
The f-string has to be a literal - it's not a magic object that you can assign to and new locals appear.
Well, no, it's not an object at all, but surely "you can assign to and new locals appear" is the whole point of the feature? You assign to the f-string and it matches values from the string on the right hand side to create new locals, or over-write existing ones.
f"{name}: {value:d}" = "eleventytwo: 112"
will have the side effect of binding:
name = "eleventytwo" value = 112
Yes. It's *not an object*.
True. It's also not a list comprehension, or an import. Why does that matter? I think either we're talking past each other, or you're trying to squirm out of admitting that the whole point of this proposal is to do what you said it doesn't do, namely bind values to names. I don't see why not being an object makes a difference.
The "spooky action at a distance" thing that I was responding to can't be an issue because the assignment is right there.
But it is an issue for the very reason I gave: even if the pattern matching fails, it can still create or overwrite variables. Normal name binding doesn't work like that:
Please explain how it's "spooky action at a distance" if it's a self-contained assignment statement? I know how much you love to argue, but really, this isn't productive. ChrisA
On Tue, Oct 20, 2020, 9:17 PM Chris Angelico
Please explain how it's "spooky action at a distance" if it's a self-contained assignment statement?
I'm not Steven, but I think I'm the first one in the thread to use Einstein's phrase. As I understand your current semantics, that phrase is not the problem.

My initial impression of your intent was:

    foo, bar = 42, 99
    # ... a million lines ...
    line = "123/"
    # ... more lines ...
    f"{foo}/{bar}" = line  # raises if bar wasn't previously set
                           # binds to prior value if it was set

But I think what you want is for the binding line never to raise, but also not to have any local means to know whether 'bar' is a name after that line. Or whether 'foo' is, for that matter. Of course, you could test the name, e.g.:

    try:
        bar
    except:
        bar = remediate()

But that's cumbersome. Overall, I'd find the spooky action at a distance less troubling.

In contrast, this is straightforward:

    line = "123/"
    pat = "{foo}/{bar}"
    dct = scan2dict(pat, line)

It's either going to create a dictionary or raise an exception. Or maybe a version of the function might make a dataclass or namedtuple instead. I'm not sure exactly what the pattern language might look like, but whatever its details, everything is locally available.
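No scan2dict exists today; a minimal regex-backed sketch of the idea (hypothetical name taken from the example above, all fields captured as plain strings) might look like:

    import re

    def scan2dict(pattern, string):
        # Split the pattern into literal text and {name} fields, escape the
        # literals, and turn each field into a named, non-greedy group.
        parts = re.split(r'\{(\w+)\}', pattern)
        regex = ''.join(re.escape(part) if i % 2 == 0 else f'(?P<{part}>.+?)'
                        for i, part in enumerate(parts))
        match = re.fullmatch(regex, string)
        if match is None:
            raise ValueError(f'{string!r} does not match pattern {pattern!r}')
        return match.groupdict()

    scan2dict("{foo}/{bar}", "123/456")   # {'foo': '123', 'bar': '456'}
    scan2dict("{foo}/{bar}", "123/")      # raises ValueError

The caller then binds names explicitly (foo = dct['foo'], or an ordinary unpacking), so nothing lands in locals() behind the reader's back.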
On Wed, Oct 21, 2020 at 1:03 PM David Mertz <mertz@gnosis.cx> wrote:
On Tue, Oct 20, 2020, 9:17 PM Chris Angelico
Please explain how it's "spooky action at a distance" if it's a self-contained assignment statement?
I'm not Steven, but I think I'm the first one in the thread to use Einstein's phrase. As I understand your current semantics, that phrase is not the problem.
My initial impression of your intent was:
foo, bar = 42, 99
# ... a million lines ...
line = "123/"
# ... more lines ...
f"{foo}/{bar}" = line  # raises if bar wasn't previously set
                       # binds to prior value if it was set
But I think what you want is for the binding line never to raise, but also not to have any local means to know whether 'bar' is a name after that line. Or whether 'foo' is, for that matter.
Easy: it always is. Whether it has had a value assigned to it is a separate consideration. Consider:

    if cond:
        x = 1

Is x a name after this line? Yes, yes it is. There's nothing spooky happening here.

ChrisA
On Wed, Oct 21, 2020, 12:12 AM Chris Angelico
But I think what you want is for the binding line never to raise, but also not to have any local means to know whether 'bar' is a name after that line. Or whether 'foo' is, for that matter.
Easy: it always is. Whether it has had a value assigned to it is a separate consideration. Consider:
if cond: x = 1
Is x a name after this line? Yes, yes it is. There's nothing spooky happening here.
I have no idea whether 'x' is a name after that line. Neither do you. I mean, in some Platonic abstraction, I guess 'x' is eternally a name and always has been. But I have no idea whether the next line of:

    print(x)

will raise an exception or print something out. I.e. is it a key in globals() and/or locals()? I've been stung many times by thinking "x was surely bound" and finding it wasn't. I'm certain you have been too.

But I know that conditional blocks have that special property of maybe being executed, maybe not. So among other things, you are introducing another way to spell a conditional block. Even the ternary expression cannot conditionally bind a name ('while' blocks can, of course; and try/except blocks). This is way too much baggage for something that is far clearer as a plain function.
On Tue, Oct 20, 2020, at 22:03, David Mertz wrote:
My initial impression of your intent was:
foo, bar = 42, 99
# ... a million lines ...
line = "123/"
# ... more lines ...
f"{foo}/{bar}" = line  # raises if bar wasn't previously set
                       # binds to prior value if it was set
But I think what you want is for the binding line never to raise, but also not to have any local means to know whether 'bar' is a name after that line. Or whether 'foo' is, for that matter.
"whether 'bar' is a name"? It is definitely a name, what you have no means to know is whether it has been assigned a value. I suspect you're trying to do the thing some people do where they insist on 'name' to avoid using the term 'variable', and I don't want to start a debate about that, but you've used it incorrectly here, and your language seems to suggest it would be reasonable to treat bar as *global* if unassigned here. In fact, all names assigned in the function are in scope for the entire function regardless of whether they have been assigned values, whether that assignment is in a conditional that wasn't taken or even simply later in the function than when it is used. Do you have similar objections to assigning inside an if statement? Because the suggested semantics are essentially the same as x = sscanf(line, "%d/%d") if len(x) >= 1: foo = x[0] if len(x) >= 2: bar = x[1]
See my very detailed posts on EXACTLY the concepts you discuss. "whether 'bar' is a name"? It is definitely a name, what you have no means
to know is whether it has been assigned a value. I suspect you're trying to do the thing some people do where they insist on 'name' to avoid using the term 'variable'
I have no idea what you are trying to make "is a name" mean. In an ordinary and Python sense, an unbound "name" isn't a name. I guess you can say "variable" if that makes you happier somehow. I'm not sure if you are trying to make some hair-splitting distinction between an UnboundLocalError and a NameError.
>>> def fun():
...     if False:
...         x = 1
...     print('x' in locals())
...     print('x' in globals())
...     try:
...         x
...     except Exception as err:
...         print(err)
...
>>> fun()
False
False
local variable 'x' referenced before assignment
On Fri, Oct 23, 2020 at 02:23:58AM -0400, David Mertz wrote:
See my very detailed posts on EXACTLY the concepts you discuss.
"whether 'bar' is a name"? It is definitely a name, what you have no means
to know is whether it has been assigned a value. I suspect you're trying to do the thing some people do where they insist on 'name' to avoid using the term 'variable'
I have no idea what you are trying to make "is a name" mean. In an ordinary and Python sense, an unbound "name" isn't a name.
I think you and Random (and possibly Chris?) are talking past each other.

At the level of syntax, of course 'bar' is a name. It's not a for-loop, or a keyword, or a float literal, etc. The language reference refers to things like 'bar' as identifiers:

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

and explicitly gives "name" as a synonym. So from that perspective, there is no doubt that 'bar' is a name.

At the level of Python code, identifiers/names are not values at all, let alone first-class values.

https://en.wikipedia.org/wiki/First-class_citizen

So within the universe of Python values, identifiers don't exist. Bound names *have* a value bound to them; unbound names don't even have that.

I'm not sure why this distinction is so relevant to the discussion, but the distinction does exist, and I think that consequently David and Random are arguing at cross-purposes. David, I think, is focused on the fact that an unbound name doesn't even have a value, so it can't be anything at all: unbound names are not things you can manipulate within the universe of Python values. Random is, I think, focused on the fact that a name is syntactically a name whether it is bound or not, or even whether it appears in the source code. Is "Albuquerque" a (Python) name or not?

And consequently I think the two points of view are incommensurable, a bit like centrifugal force. In some frames of reference, centrifugal force is very real, and in others, it doesn't exist. I don't think that arguing one perspective is right and the other is wrong is productive. But if we can't see both perspectives, we cannot effectively communicate with each other.

-- Steve
On Wed, Oct 21, 2020 at 12:16:16PM +1100, Chris Angelico wrote:
Please explain how it's "spooky action at a distance" if it's a self-contained assignment statement?
"Spooky action at a distance" is your phrase, not mine, or Rob's. (I think David Mertz may have used it first, but I don't have to justify his choice of wording or agree with it.) I've told you why I think that partial assignments are problematic: a failed pattern match may nevertheless overwrite variables or create new ones. A *failed match* has the side-effect of changing and creating variables. I don't know if that counts as "action at a distance", spooky or otherwise, and frankly I don't care what label you put on it. It's okay if you don't agree with me, but if you want to persuade me (and others) to come around to your way of thinking, you might have more success if you are less touchy and give some concrete examples of why the potential benefits of this outweigh the negatives.
I know how much you love to argue, but really, this isn't productive.
Are we arguing or are we trying to reach consensus on the desirable behaviour and syntax? You're right that this discussion is not very productive at the moment. I haven't quite given up hope that we'll reach some sort of agreement, or at least understanding, about the desired functionality, but if you think there's no chance of that happening, then I guess this thread is going nowhere. -- Steve
On Thu, Oct 22, 2020 at 10:36 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Oct 21, 2020 at 12:16:16PM +1100, Chris Angelico wrote:
Please explain how it's "spooky action at a distance" if it's a self-contained assignment statement?
"Spooky action at a distance" is your phrase, not mine, or Rob's.
(I think David Mertz may have used it first, but I don't have to justify his choice of wording or agree with it.)
I don't remember who it was, but I was responding to that exact phrase, and you jumped on me and said that my response wasn't a response for reasons. So you were arguing in support of the phrase.
I know how much you love to argue, but really, this isn't productive.
Are we arguing or are we trying to reach consensus on the desirable behaviour and syntax?
I'm really not sure. I'm arguing in favour of all the same things that made f-strings worth having, but from a parsing point of view. You're saying, hey, let's have all that, only not have any of it, and I have no idea what you're actually pushing for. I'm asking for something that has proven to be useful and practical in other languages, and you're saying that it wouldn't be useful or practical. Not really sure how this works. ChrisA
If I was the list moderator, at this point I would put the thread on a "cooling off" timeout for 8-16 hours. Both of you seem to be mainly complaining about each other's interaction style rather than adding anything of substance. Then again neither am I so why am I even writing this. :-) On Wed, Oct 21, 2020 at 5:51 PM Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Oct 22, 2020 at 10:36 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Oct 21, 2020 at 12:16:16PM +1100, Chris Angelico wrote:
Please explain how it's "spooky action at a distance" if it's a self-contained assignment statement?
"Spooky action at a distance" is your phrase, not mine, or Rob's.
(I think David Mertz may have used it first, but I don't have to justify his choice of wording or agree with it.)
I don't remember who it was, but I was responding to that exact phrase, and you jumped on me and said that my response wasn't a response for reasons. So you were arguing in support of the phrase.
I know how much you love to argue, but really, this isn't productive.
Are we arguing or are we trying to reach consensus on the desirable behaviour and syntax?
I'm really not sure. I'm arguing in favour of all the same things that made f-strings worth having, but from a parsing point of view. You're saying, hey, let's have all that, only not have any of it, and I have no idea what you're actually pushing for. I'm asking for something that has proven to be useful and practical in other languages, and you're saying that it wouldn't be useful or practical. Not really sure how this works.
ChrisA
--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
To bring it back to a concrete idea, here's how I see things:

1. The idea of f-string-like assignment targets has little support. Only Chris, and maybe the OP, who seems to have gone away.
2. The idea of a "scanning language" seems to garner a fair amount of enthusiasm from everyone who has commented.
3. Having the scanning language be "inspired by" f-strings seems to fit nicely with Python.
4. Lots of folks like C scanf() as another inspiration for the need. I was not being sarcastic in saying that I thought COBOL PICTURE clauses are another similar useful case. I think Perl 6 "rules" were trying to do something along those lines... but, well, Perl.
5. In my opinion, this is naturally a function, or several related functions, not new syntax (I think Steven agrees).

So the question is, what should the scanning language look like? Another question is: "Does this already exist?"

I'm looking around PyPI, and I see this that looks vaguely along the same lines. But most likely I am missing things: https://pypi.org/project/rebulk/

In terms of API, assuming functions, I think there are two basic models. We could have two (or more) functions that were related, though:

    # E.g.
    pat_with_names = "{foo:f}/{bar:4s}/{baz:3d}"
    # something like (different match objects are possible choices: dict, dataclass, etc.)
    matches = scan_to_obj(pat_with_names, haystack)
    print(matches.foo)
    print(matches['bar'])

Alternately:

    pat_only = "{:f}/{:4s}/{:3d}"
    # names, if bound, have the types indicated by scanning language
    foo, bar, baz = scan_to_tuple(pat_only, haystack)

There are questions open about partial matching, defaults, exceptions to raise, etc. But the general utility of something along those lines seems roughly consensus.

--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
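For concreteness, one minimal pure-Python sketch of the two hypothetical functions above, built on re with named groups and supporting only a toy subset of codes (d, f, s, optionally with a width); all names and details here are illustrative, not a concrete proposal:

    import re
    from types import SimpleNamespace

    # Tiny illustrative scanning language: {name:code} or {:code},
    # where code is one of d, f, s, optionally preceded by a width (4s, 3d).
    _TYPES = {"d": (r"[-+]?\d+", int),
              "f": (r"[-+]?\d+(?:\.\d+)?", float),
              "s": (r".+?", str),
              "":  (r".+?", str)}

    def _compile(pattern):
        parts, fields, pos = [], [], 0
        for i, m in enumerate(re.finditer(r"\{(\w*)(?::(\d*)([dfs]?))?\}", pattern)):
            parts.append(re.escape(pattern[pos:m.start()]))
            name, width, code = m.group(1), m.group(2) or "", m.group(3) or ""
            frag, conv = _TYPES[code]
            if width:
                frag = (r"\d{%s}" if code == "d" else r".{%s}") % width
            group = name or "_%d" % i
            parts.append("(?P<%s>%s)" % (group, frag))
            fields.append((group, conv))
            pos = m.end()
        parts.append(re.escape(pattern[pos:]))
        return re.compile("".join(parts)), fields

    def scan_to_obj(pattern, haystack):
        regex, fields = _compile(pattern)
        m = regex.fullmatch(haystack)
        if m is None:
            return None                      # or raise; an open question above
        return SimpleNamespace(**{n: conv(m.group(n)) for n, conv in fields})

    def scan_to_tuple(pattern, haystack):
        regex, fields = _compile(pattern)
        m = regex.fullmatch(haystack)
        return None if m is None else tuple(conv(m.group(n)) for n, conv in fields)

    # scan_to_tuple("{:f}/{:4s}/{:3d}", "2.5/spam/042")            -> (2.5, 'spam', 42)
    # scan_to_obj("{foo:f}/{bar:4s}/{baz:3d}", "2.5/spam/042").foo -> 2.5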
On Wed, Oct 21, 2020 at 7:12 PM David Mertz <mertz@gnosis.cx> wrote:
To bring it back to a concrete idea, here's how I see things:
1. The idea of f-string-like assignment targets has little support. Only Chris, and maybe the OP who seems to have gone away. 2. The idea of a "scanning language" seems to garner a fair amount of enthusiasm from everyone who has commented. 3. Having the scanning language be "inspired by" f-strings seems to fit nicely with Python 4. Lots of folks like C scanf() as another inspiration for the need. I was not being sarcastic in saying that I thought COBOL PICTURE clauses are another similar useful case. I think Perl 6 "rules" were trying to do something along those lines... but, well, Perl. 5. In my opinion, this is naturally a function, or several related functions, not new syntax (I think Steven agrees)
So the question is, what should the scanning language look like? Another question is: "Does this already exist?"
I'm looking around PyPI, and I see this that looks vaguely along the same lines. But most likely I am missing things: https://pypi.org/project/rebulk/
In terms of API, assuming functions, I think there are two basic models. We could have two (or more) functions that were related though:
# E.g.
pat_with_names = "{foo:f}/{bar:4s}/{baz:3d}"
# something like (different match objects are possible choices: dict, dataclass, etc.)
matches = scan_to_obj(pat_with_names, haystack)
print(matches.foo)
print(matches['bar'])
Alternately:
# pat_only = "{:f}/{:4s}/{:3d}" foo, bar, baz = scan_to_tuple(pat_only, haystack) # names, if bound, have the types indicated by scanning language
There are questions open about partial matching, defaults, exceptions to raise, etc. But the general utility of something along those lines seems roughly consensus.
Hmm, if the above is acceptable, maybe f-strings are still the logical next step, since they bring the format and the target name together again.

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On Thu, Oct 22, 2020 at 4:17 AM Guido van Rossum <guido@python.org> wrote:
In terms of API, assuming functions, I think there are two basic models. We could have two (or more) functions that were related though:
# E.g.
pat_with_names = "{foo:f}/{bar:4s}/{baz:3d}"
# something like (different match objects are possible choices: dict, dataclass, etc.)
matches = scan_to_obj(pat_with_names, haystack)
print(matches.foo)
print(matches['bar'])

pat_only = "{:f}/{:4s}/{:3d}"
# names, if bound, have the types indicated by scanning language
foo, bar, baz = scan_to_tuple(pat_only, haystack)
Hmm, if the above is acceptable, maybe f-strings are still the logical next step, since they bring the format and the target name together again.
Sure, but they need to be "f-like-strings". Some things are not allowed, e.g. no "{foo+1}". And there shouldn't be an 'f' prefix since they are NOT interpolated where written. But most of the format language can work. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
On Wed, Oct 21, 2020 at 07:17:21PM -0700, Guido van Rossum wrote:
Hmm, if the above is acceptable, maybe f-strings are still the logical next step, since they bring the format and the target name together again.
That's not the only way to bring the format and target name together. Both brace-style and percent-style can do that:

    '{number:d}'
    '%(number)d'

I'm not sure if you've caught up on the entire thread, but Eric Smith is opposed to this:

https://mail.python.org/archives/list/python-ideas@python.org/message/KTGQLT...

(For the benefit of those who aren't aware of the f-string history, Eric wrote the PEP and at least part of the implementation.)

f-strings also bring so much more to the table than is needed. For this syntax to be sensible, we would have to prohibit so many legal f-strings that it seems perverse to continue calling the subset of what's left over f-strings.

1. They don't behave like f-strings: text scanning, not evaluation of code. What meaning could we give an f-string target like this?

       f'{len(x)+1}' = string

2. They don't support the same formatting options as f-strings. Chris has suggested a superset of formatting options, similar to regexes; other f-string codes would be meaningless:

       f'a {spam:^5_d} b' = 'a 1234 b'  # centre align; group digits using underscore

3. As Eric explains, they don't share the same mechanism as f-strings.

4. Actual f-strings need the prefix to distinguish them from regular strings. But as an assignment target, there is no existing meaning to

       'email: {name}@{domain}' = string

   so the f prefix has no purpose.

When they have so little in common with f-strings, apart from a spurious and unnecessary f-prefix, why are we calling them f-strings?

Another problem is that using only a literal/display form as a target means you can't pre-assemble a pattern and apply it later:

    # Match only the given domain.
    domain = get_wanted_domain()
    pattern = 'email: {name}@%s' % domain
    # ... now what?

I guess you could fall back to eval:

    eval('f{!r} = {!r}'.format(pattern, string))

but given that both the pattern and the string to be scanned are likely to contain untrusted strings, that's probably not a good idea.

--
Steve
On Thu, Oct 22, 2020 at 8:22 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Oct 21, 2020 at 07:17:21PM -0700, Guido van Rossum wrote:
Hmm, if the above is acceptable, maybe f-strings are still the logical next step, since they bring the format and the target name together again.
That's not the only way to bring the format and target name together. Both brace-style and percent-style can do that:
'{number:d}' '%(number)d'
Maybe, but if it isn't an actual syntactic construct, there's no value - at best, all you can do is return a dictionary, and there's no convenient way to assign those to locals.
f-strings also bring so much more to the table than is needed. For this syntax to be sensible, we would have to prohibit so many legal f-strings that it seems perverse to continue calling the subset of what's left over f-strings.
1. They don't behave like f-strings: text scanning, not evaluation of code. What meaning could we give an f-string target like this?
f'{len(x)+1}' = string
...... mini-language inspired by format language. I'm really not sure how many times this has to be said. You don't see C programmers complaining that sscanf and sprintf don't support the exact same set of things. They're sufficiently similar that it is incredibly practical and useful; I think there might be something in the zen of Python about that...
4. Actual f-strings need the prefix to distinguish them from regular strings. But as an assignment target, there is no existing meaning to
'email: {name}@{domain}' = string
so the f prefix has no purpose.
When they have so little in common with f-strings, apart from a spurious and unnecessary f-prefix, why are we calling them f-strings?
If you're saying that this would be best done with an "assign to string literal" syntax, then maybe, but I think the similarity with RHS f-strings is useful enough to keep the prefix. Additionally, the fact that both of them have to be syntactic constructs rather than reusable objects is a useful parallel and a useful reminder.
Another problem is that using only a literal/display form as a target means you can't pre-assemble a pattern and apply it later:
# Match only the given domain. domain = get_wanted_domain() pattern = 'email: {name}@%s' % domain # ... now what?
Right, and you can't do that with f-strings on the RHS either. The specific thing you're asking about could easily be implemented as a feature of the minilanguage itself, but I'm not sure it'd actually be needed. Building patterns for future parsing is simply not the job of this feature - use a regex if you need that.
I guess you could fall back to eval:
eval('f{!r} = {!r}'.format(pattern, string))
but given that both the pattern and the string to be scanned are likely to contain untrusted strings, that's probably not a good idea.
Agreed. Don't do that. There are PLENTY of better options than eval. ChrisA
On Thu, 22 Oct 2020 at 13:36, Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Oct 22, 2020 at 8:22 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Oct 21, 2020 at 07:17:21PM -0700, Guido van Rossum wrote:
Hmm, if the above is acceptable, maybe f-strings are still the logical next step, since they bring the format and the target name together again.
That's not the only way to bring the format and target name together. Both brace-style and percent-style can do that:
'{number:d}' '%(number)d'
Maybe, but if it isn't an actual syntactic construct, there's no value - at best, all you can do is return a dictionary, and there's no convenient way to assign those to locals.
I think that there's *plenty* of value. Returning a dictionary (or maybe a dataclass) is far from merely being "at best", it's a common, well-known, and easy to use pattern. Libraries like argparse use it very successfully. Direct assignment to locals would require syntax, but I think a lot of people in this thread consider that to be at best a minor additional convenience - and many people have argued that the pitfalls are a lot worse than the benefits, and consider direct assignment to locals to be a disadvantage, rather than a benefit. Paul
On 10/22/2020 8:29 AM, Chris Angelico wrote:
On Thu, Oct 22, 2020 at 8:22 PM Steven D'Aprano <steve@pearwood.info> wrote:

Another problem is that using only a literal/display form as a target means you can't pre-assemble a pattern and apply it later:

    # Match only the given domain.
    domain = get_wanted_domain()
    pattern = 'email: {name}@%s' % domain
    # ... now what?

Right, and you can't do that with f-strings on the RHS either. The specific thing you're asking about could easily be implemented as a feature of the minilanguage itself, but I'm not sure it'd actually be needed. Building patterns for future parsing is simply not the job of this feature - use a regex if you need that.
In the case of f-strings, the fallback is str.format(), which uses the exact same format specifiers. What's the equivalent when you need dynamic 'f-string assignment targets'? Eric
On Fri, Oct 23, 2020 at 12:31 AM Eric V. Smith <eric@trueblade.com> wrote:
On 10/22/2020 8:29 AM, Chris Angelico wrote:
On Thu, Oct 22, 2020 at 8:22 PM Steven D'Aprano <steve@pearwood.info> wrote:

Another problem is that using only a literal/display form as a target means you can't pre-assemble a pattern and apply it later:

    # Match only the given domain.
    domain = get_wanted_domain()
    pattern = 'email: {name}@%s' % domain
    # ... now what?

Right, and you can't do that with f-strings on the RHS either. The specific thing you're asking about could easily be implemented as a feature of the minilanguage itself, but I'm not sure it'd actually be needed. Building patterns for future parsing is simply not the job of this feature - use a regex if you need that.
In the case of f-strings, the fallback is str.format(), which uses the exact same format specifiers. What's the equivalent when you need dynamic 'f-string assignment targets'?
Good question, but whatever it is, it can be part of the same proposal. A standard library function that does the same kind of thing but returns a dictionary would be useful, even if it isn't as important. Pike has sscanf (compiler feature that assigns directly to things) and array_sscanf (returns a sequential collection of matched items), and Python could do the same kind of thing. Returning a dict would be FAR less convenient for the most common cases, but as you say, it'd be the fallback for when you need dynamic parsing. ChrisA
On Thu, 22 Oct 2020 at 14:44, Chris Angelico <rosuav@gmail.com> wrote:
Returning a dict would be FAR less convenient for the most common cases, but as you say, it'd be the fallback for when you need dynamic parsing.
If you're that sure that direct assignment to locals would be a killer feature (I'm not, but you seem to be unimpressed by any scanning proposal that *doesn't* include it) maybe a better syntax proposal would be something that takes a dict (or more generally maybe a "namespace") and injects it into locals? That could be used by a scanner function, as well as any other library that wanted a "directly assign variables" interface. So rather than

    f"{var1} {var2} {var3}" = target

we'd have

    inject_vars parser("{var1} {var2} {var3}", target)

and people who don't like/want variable assignment can just use parser() as a normal function returning a dict.

If you're interested in something like that (I'm not) make it a new thread, though, as it's way off topic for this thread.

Paul
On Fri, Oct 23, 2020 at 12:56 AM Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 22 Oct 2020 at 14:44, Chris Angelico <rosuav@gmail.com> wrote:
Returning a dict would be FAR less convenient for the most common cases, but as you say, it'd be the fallback for when you need dynamic parsing.
If you're that sure that direct assignment to locals would be a killer feature (I'm not, but you seem to be unimpressed by any scanning proposal that *doesn't* include it) maybe a better syntax proposal would be something that takes a dict (or more generally maybe a "namespace") and injects it into locals? That could be used by a scanner function, as well as any other library that wanted a "directly assign variables" interface. So rather than
f"{var1} {var2} {var3}" = target
we'd have
inject_vars parser("{var1} {var2} {var3}", target)
and people who don't like/want variable assignment can just use parser() as a normal function returning a dict.
The trouble with this is that there's no way to know, at compilation time, what names are being assigned to. I think that basically makes it a non-starter. It'd have to be something like JavaScript's destructuring syntax, where you explicitly name the variables you want to grab: const {var1, var2, var3} = target; but that would mean the same duplication of names that was the reason for f-strings on the RHS. I think this kind of thing might be useful, but it wouldn't be enough to make this feature work. ChrisA
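For comparison, the closest spelling in current Python shows exactly that duplication: the names appear once as dictionary keys and once as assignment targets (the dict below is a stand-in for the output of a hypothetical dict-returning scanner):

    from operator import itemgetter

    # Stand-in for what a dict-returning scanner might produce.
    fields = {"var1": "11", "var2": "45", "var3": "PM"}

    # The names have to be written twice: as keys and as targets.
    var1, var2, var3 = itemgetter("var1", "var2", "var3")(fields)
    print(var1, var2, var3)   # 11 45 PM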
On 10/22/2020 5:21 AM, Steven D'Aprano wrote:

On Wed, Oct 21, 2020 at 07:17:21PM -0700, Guido van Rossum wrote:
Hmm, if the above is acceptable, maybe f-strings are still the logical next step, since they bring the format and the target name together again. That's not the only way to bring the format and target name together. Both brace-style and percent-style can do that:
'{number:d}' '%(number)d'
I'm not sure if you've caught up on the entire thread, but Eric Smith is opposed to this:
https://mail.python.org/archives/list/python-ideas@python.org/message/KTGQLT...
(For the benefit of those who aren't aware of the f-string history, Eric wrote the PEP and at least part of the implementation.)

Steven sums up my objections very well here. Thanks, Steven. I think we definitely should not use the "f" prefix, and using no prefix at all is probably better than using a new prefix.

The only thing this proposal brings that couldn't be done with either a str method or with a 3rd party library is automatic variable creation/binding when the LHS is a literal. If we're seriously thinking of doing that, I'd rather it be regex based than some newly invented scanf-like (or even __format__ format-spec-like) "language", for lack of a better term.

But again, I don't think this rises to the level of usefulness that would elevate it to needing syntax. Is:

    "{a:%d} {b:%d} {c:%d}" = "123 432 567"

such a big improvement over:

    a, b, c = sscanf("%d %d %d", "123 432 567")

? Reasonable people will disagree, but I think the answer is "no", especially when you factor in the inability to use a computed value for the LHS in the first example.

And if all f-strings brought to the table was the ability to use simple variables as the value to be formatted, I'd have been opposed to that as well. What's the point of:

    f"{a} {b} {c}"

when:

    "{} {} {}".format(a, b, c)

does the same thing? This was literally one of the proposals considered when f-strings were being designed, and I was opposed. It's the ability of f-strings to handle arbitrary expressions that gives them the usefulness (and love!) that they have. Since that's by necessity missing from the "f-strings as assignment targets" proposal (as I understand it), then I again think we could just live with a str method or library that implements this.

Eric
f-strings also bring so much more to the table than is needed. For this syntax to be sensible, we would have to prohibit so many legal f-strings that it seems perverse to continue calling the subset of what's left over f-strings.
1. They don't behave like f-strings: text scanning, not evaluation of code. What meaning could we give an f-string target like this?
f'{len(x)+1}' = string
2. They don't support the same formatting options as f-strings.
Chris has suggested a superset of formatting options, similar to regexes; other f-string codes would be meaningless:
f'a {spam:^5_d} b' = 'a 1234 b' # centre align; group digits using underscore
3. As Eric explains, they don't share the same mechanism as f-strings.
4. Actual f-strings need the prefix to distinguish them from regular strings. But as an assignment target, there is no existing meaning to
'email: {name}@{domain}' = string
so the f prefix has no purpose.
When they have so little in common with f-strings, apart from a spurious and unnecessary f-prefix, why are we calling them f-strings?
Another problem is that using only a literal/display form as a target means you can't pre-assemble a pattern and apply it later:
# Match only the given domain. domain = get_wanted_domain() pattern = 'email: {name}@%s' % domain # ... now what?
I guess you could fall back to eval:
eval('f{!r} = {!r}'.format(pattern, string))
but given that both the pattern and the string to be scanned are likely to contain untrusted strings, that's probably not a good idea.
On Fri, Oct 23, 2020 at 1:11 AM Eric V. Smith <eric@trueblade.com> wrote:
And if all f-strings brought to the table was the ability to use simple variables as the value to be formatted, I'd have been opposed to that as well. What's the point of:
f"{a} {b} {c}"
when:
"{} {} {}".format(a, b, c)
does the same thing? This was literally one of the proposals considered when f-strings were being designed, and I was opposed.
The rationale in PEP 498 advocates the benefits even with just simple names. You don't have to be embedding complex expressions in order to benefit from f-strings. (Also, assignment targets don't have to be simple names; targets like "x[1]" or "x.y" would be perfectly valid. Although I doubt anyone would want to assign to "a,b,c" inside an expression like this.) ChrisA
On 10/22/2020 10:15 AM, Chris Angelico wrote:
On Fri, Oct 23, 2020 at 1:11 AM Eric V. Smith <eric@trueblade.com> wrote:
And if all f-strings brought to the table was the ability to use simple variables as the value to be formatted, I'd have been opposed to that as well. What's the point of:
f"{a} {b} {c}"
when:
"{} {} {}".format(a, b, c)
does the same thing? This was literally one of the proposals considered when f-strings were being designed, and I was opposed.
The rationale in PEP 498 advocates the benefits even with just simple names. You don't have to be embedding complex expressions in order to benefit from f-strings.
(Also, assignment targets don't have to be simple names; targets like "x[1]" or "x.y" would be perfectly valid. Although I doubt anyone would want to assign to "a,b,c" inside an expression like this.)
None of which changes my opinion that it's not worth making this into syntax. As I said, reasonable people can disagree. But I think I'll wait until I see a PEP before commenting more broadly. Eric
Another way this could go: If PEP 634 (pattern matching, reborn) gets accepted, a future Python version could add f-string patterns (which the PEP currently forbids). E.g.

```
x = "123 456 789"
match x:
    case f"{a} {b}":
        print("A pair:", a, b)
    case f"{a} {b} {c}":
        print("Triple", a, b, c)
```

Everything else (e.g. how to design the formatting language) would be open. We could even designate "raw" f-strings (rf"...") as regular expressions (<wink 0.5>).

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
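For this particular whitespace-separated example, plain sequence patterns (assuming PEP 634 is accepted as proposed) already come close without any f-string patterns; the difference is that f-string patterns would generalise to arbitrary literal text between the fields:

    x = "123 456 789"
    match x.split():
        case [a, b]:
            print("A pair:", a, b)
        case [a, b, c]:
            print("Triple", a, b, c)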
On 10/22/2020 11:04 AM, Guido van Rossum wrote:
Another way this could go: If PEP 634 (pattern matching, reborn) gets accepted, a future Python version could add f-string patterns (which the PEP currently forbids). E.g.

```
x = "123 456 789"
match x:
    case f"{a} {b}":
        print("A pair:", a, b)
    case f"{a} {b} {c}":
        print("Triple", a, b, c)
```

Everything else (e.g. how to design the formatting language) would be open. We could even designate "raw" f-strings (rf"...") as regular expressions (<wink 0.5>).
I think it makes sense to move this into pattern matching, since it already has the variable binding baked in and well specified. As you say, it still leaves all of the issues about specifying the format language. But as I've said, I think that should be done outside of any syntax proposal, and could be done now. In particular it should address returning types other than str. As MRAB points out (and assuming the proposal allows type conversion), how would it specify datetimes, decimals, etc., and how would it be extensible? And if it does specify type conversions, I think it would need to be something that could be understood by mypy. Eric
On Thu, 22 Oct 2020 at 03:16, David Mertz <mertz@gnosis.cx> wrote:
To bring it back to a concrete idea, here's how I see things:
1. The idea of f-string-like assignment targets has little support. Only Chris, and maybe the OP who seems to have gone away.
2. The idea of a "scanning language" seems to garner a fair amount of enthusiasm from everyone who has commented.
3. Having the scanning language be "inspired by" f-strings seems to fit nicely with Python.
4. Lots of folks like C scanf() as another inspiration for the need. I was not being sarcastic in saying that I thought COBOL PICTURE clauses are another similar useful case. I think Perl 6 "rules" were trying to do something along those lines... but, well, Perl.
There's also SNOBOL (and Icon) matching functions, although they are less of a pattern "language" and more of a set of building blocks for making matchers.
In my opinion, this is naturally a function, or several related functions, not new syntax (I think Steven agrees)
So the question is, what should the scanning language look like? Another question is: "Does this already exist?"
I'm looking around PyPI, and I see this that looks vaguely along the same lines. But most likely I am missing things: https://pypi.org/project/rebulk/
The other one I know of (mentioned previously somewhere in all of this) is https://pypi.org/project/parse/
In terms of API, assuming functions, I think there are two basic models. We could have two (or more) functions that were related though:
# E.g.
pat_with_names = "{foo:f}/{bar:4s}/{baz:3d}"
# something like (different match objects are possible choices: dict, dataclass, etc.)
matches = scan_to_obj(pat_with_names, haystack)
print(matches.foo)
print(matches['bar'])
Alternately:
# pat_only = "{:f}/{:4s}/{:3d}" foo, bar, baz = scan_to_tuple(pat_only, haystack) # names, if bound, have the types indicated by scanning language
There are questions open about partial matching, defaults, exceptions to raise, etc. But the general utility of something along those lines seems roughly consensus.
Agreed Paul
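For reference, a small usage sketch of the parse package linked above, written from memory of its documentation, so treat the exact API details as approximate:

    import parse  # pip install parse

    # Named fields come back in a dict-like Result; :f converts the value to float.
    r = parse.parse("{acc},{date},{amount:f}", "A1,2020-10-22,19.99")
    if r is not None:
        print(r["acc"], r["amount"])   # A1 19.99
        print(r.named)                 # {'acc': 'A1', 'date': '2020-10-22', 'amount': 19.99}

    # Patterns can be precompiled and reused, which also covers the
    # "pre-assemble a pattern and apply it later" concern raised earlier.
    p = parse.compile("email: {name}@{domain}")
    print(p.parse("email: guido@python.org")["domain"])   # python.org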
On 22.10.2020 04:12, David Mertz wrote:
To bring it back to a concrete idea, here's how I see things:
1. The idea of f-string-like assignment targets has little support. Only Chris, and maybe the OP who seems to have gone away. 2. The idea of a "scanning language" seems to garner a fair amount of enthusiasm from everyone who has commented. 3. Having the scanning language be "inspired by" f-strings seems to fit nicely with Python 4. Lots of folks like C scanf() as another inspiration for the need. I was not being sarcastic in saying that I thought COBOL PICTURE clauses are another similar useful case. I think Perl 6 "rules" were trying to do something along those lines... but, well, Perl. 5. In my opinion, this is naturally a function, or several related functions, not new syntax (I think Steven agrees)
So the question is, what should the scanning language look like? Another question is: "Does this already exist?"
I'm looking around PyPI, and I see this that looks vaguely along the same lines. But most likely I am missing things: https://pypi.org/project/rebulk/
In terms of API, assuming functions, I think there are two basic models. We could have two (or more) functions that were related though:
# E.g.
pat_with_names = "{foo:f}/{bar:4s}/{baz:3d}"
# something like (different match objects are possible choices: dict, dataclass, etc.)
matches = scan_to_obj(pat_with_names, haystack)
print(matches.foo)
print(matches['bar'])
Alternately:
# pat_only = "{:f}/{:4s}/{:3d}" foo, bar, baz = scan_to_tuple(pat_only, haystack) # names, if bound, have the types indicated by scanning language
There are questions open about partial matching, defaults, exceptions to raise, etc. But the general utility of something along those lines seems roughly consensus.
I like this idea :-)

There are lots of use cases where regular expressions + subsequent type conversion are just overkill for a small parsing task.

The above would fit this space quite nicely, esp. since it already comes with a set of typical formats you have to parse, without having to worry about the nitty-gritty details (as you have to do with REs) or the type conversion from string to e.g. float.

--
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Experts (#1, Oct 22 2020)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
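For concreteness, the kind of small parsing task being described, spelled out today with a regular expression plus hand-written type conversions (reusing the {foo:f}/{bar:4s}/{baz:3d} example quoted above):

    import re

    line = "2.5/spam/042"
    m = re.fullmatch(r"([-+]?\d*\.?\d+)/(.{4})/(\d{3})", line)
    if m:
        # Every group comes back as a string and must be converted by hand,
        # and the intended types live far away from the pattern itself.
        foo = float(m.group(1))   # 2.5
        bar = m.group(2)          # 'spam'
        baz = int(m.group(3))     # 42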
On 2020-10-22 08:50, M.-A. Lemburg wrote:
On 22.10.2020 04:12, David Mertz wrote:
To bring it back to a concrete idea, here's how I see things:
1. The idea of f-string-like assignment targets has little support. Only Chris, and maybe the OP who seems to have gone away. 2. The idea of a "scanning language" seems to garner a fair amount of enthusiasm from everyone who has commented. 3. Having the scanning language be "inspired by" f-strings seems to fit nicely with Python 4. Lots of folks like C scanf() as another inspiration for the need. I was not being sarcastic in saying that I thought COBOL PICTURE clauses are another similar useful case. I think Perl 6 "rules" were trying to do something along those lines... but, well, Perl. 5. In my opinion, this is naturally a function, or several related functions, not new syntax (I think Steven agrees)
So the question is, what should the scanning language look like? Another question is: "Does this already exist?"
I'm looking around PyPI, and I see this that looks vaguely along the same lines. But most likely I am missing things: https://pypi.org/project/rebulk/
In terms of API, assuming functions, I think there are two basic models. We could have two (or more) functions that were related though:
# E.g.
pat_with_names = "{foo:f}/{bar:4s}/{baz:3d}"
# something like (different match objects are possible choices: dict, dataclass, etc.)
matches = scan_to_obj(pat_with_names, haystack)
print(matches.foo)
print(matches['bar'])
Alternately:
# pat_only = "{:f}/{:4s}/{:3d}" foo, bar, baz = scan_to_tuple(pat_only, haystack) # names, if bound, have the types indicated by scanning language
There are questions open about partial matching, defaults, exceptions to raise, etc. But the general utility of something along those lines seems roughly consensus.
I like this idea :-)
There are lots of use cases where regular expressions + subsequent type conversion are just overkill for a small parsing task.
The above would fit this space quite nicely, esp. since it already comes with a set of typical formats you have to parse, without having to worry about the nitty-gritty details (as you have to do with REs) or the type conversion from string to e.g. float.
One limitation is that only a few types would be supported: 's' for str, 'd' or 'x' for int, 'f' for float. But what if you wanted to scan to a Decimal instead of a float, or scan a date? A date could be formatted any number of ways!

So perhaps the scanning format should also let you specify the target type. For example, given "{?datetime:%H:%M}", it would look up the pre-registered name "datetime" to get a scanner; the scanner would be given the format, the string and the position, and would return the value and the new position. I used '?' in the scan format to distinguish it from a format string.

It might even be possible to use the same format for both formatting and scanning. For example, given "{?datetime:%H:%M}", string formatting would just ignore the "?datetime" part.
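A minimal sketch of that registration idea, using made-up names (register_scanner, scan) and wiring up only the datetime case; the '?name' marker and the scanner signature (format, string, position) -> (value, new position) follow the description above:

    import re
    from datetime import datetime

    _SCANNERS = {}

    def register_scanner(name, func):
        """Register a callable(fmt, string, pos) -> (value, new_pos)."""
        _SCANNERS[name] = func

    def _scan_datetime(fmt, string, pos):
        # Crude: try progressively shorter tails until strptime accepts one.
        for end in range(len(string), pos, -1):
            try:
                return datetime.strptime(string[pos:end], fmt), end
            except ValueError:
                continue
        raise ValueError("no %r found at position %d" % (fmt, pos))

    register_scanner("datetime", _scan_datetime)

    def scan(pattern, string):
        """Scan with a pattern mixing literal text and {?name:fmt} fields."""
        values, pos, last = [], 0, 0
        for m in re.finditer(r"\{\?(\w+):([^}]*)\}", pattern):
            literal = pattern[last:m.start()]
            if not string.startswith(literal, pos):
                raise ValueError("expected %r at position %d" % (literal, pos))
            pos += len(literal)
            name, fmt = m.groups()
            value, pos = _SCANNERS[name](fmt, string, pos)
            values.append(value)
            last = m.end()
        tail = pattern[last:]
        if not string.startswith(tail, pos):
            raise ValueError("expected %r at position %d" % (tail, pos))
        return values

    # scan("at {?datetime:%H:%M} sharp", "at 09:30 sharp")
    # -> [datetime.datetime(1900, 1, 1, 9, 30)]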
On 10/20/2020 7:38 PM, Steven D'Aprano wrote:

On Wed, Oct 21, 2020 at 02:15:40AM +1100, Chris Angelico wrote:

or even:

    f"{acc},{date},{type},{amount:[0-9.]} {description},{refnum:12s}{reference}" = line

At the point you have something that complicated, I think that existing solutions with regexes or parsers are better. Overloading something that looks like an f-string with a mini-language that is way more complicated than the mini-language used in actual f-strings is, I think, a losing proposition.
I'm completely opposed to this feature, but let me make a few comments here. Not only is the mini-language here more complicated, it's the exact antithesis of __format__ machinery. With __format__, the logic is "I've got an object, which obviously has a known type, ask it how to format itself with a format spec I haven't even looked at". With the discussion I've seen so far with f-string assignment targets, you're saying "I see a format spec, let me deeply inspect the format spec, guess the type of the object it might apply to, then I'll guess on how to go from a string to that type of object". In the example above, what's the type of "amount" after it's created? A str? An int? Double? Complex? Datetime?
- As others have mentioned: could inject variables into locals() making debugging harder.
I'm dubious about that too. I think it would be better to keep the parsing separate from the name binding, and use regular assignment for that. Not sure what the concern with "inject[ing] variables" is. What you see as a feature, some of us see as a problem: the fact that *some* variables will be updated even if the pattern doesn't match.
So if you have a pattern that doesn't match the given string, it could still have side-effects of creating or over-writing local variables.
The f-string has to be a literal - it's not a magic object that you can assign to and new locals appear.
f-strings have str.format to work around the literal vs. computed value problem. They use the identical format specs, because as I noted above, the machinery doesn't try to understand the format spec. Only the object itself (well, its type) interprets them, via __format__. What's the fallback equivalent in the f-string as assignment target case?

I think you should provide similar functionality using non-literals. If not the magic "locals pop into existence" version, then at least the same parsing ability. That is, you should be able to say:

    s = "{acc},{date},{type},{amount:[0-9.]} {description},{refnum:12s}{reference}"
    acc, date, type, amount, description, refnum, reference = super_scanf(s, line)

You'd also need this to allow for i18n, just as i18n can't use f-strings. I'd be opposed to adding something like this without being able to also operate on non-literals.

So first we should spec out how my super_scanf function would work, and how it would figure out the values (and especially their types) to return. I think this is the weakest, most hand-wavy part of any proposal being discussed here, and it needs to be solved as a prerequisite for the version that creates locals. And the beauty is that it could be written today, in pure Python.

Eric
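Something in that spirit can indeed be written today. A rough sketch, with super_scanf and its tiny spec subset ({name}, {name:12s}, {name:[0-9.]}) made up purely for illustration, and every field returned as str, leaving the typing question unanswered:

    import re

    def super_scanf(pattern, text):
        """Hypothetical scanner: returns one str per {field} in the pattern."""
        parts, names, pos = [], [], 0
        for m in re.finditer(r"\{(\w+)(?::(\[[^]]+\]|\d+s))?\}", pattern):
            parts.append(re.escape(pattern[pos:m.start()]))
            name, spec = m.groups()
            if spec is None:                 # {name}: lazily match anything
                frag = r".+?"
            elif spec.startswith("["):       # {name:[0-9.]}: character class
                frag = spec + "+"
            else:                            # {name:12s}: fixed width
                frag = ".{%s}" % spec[:-1]
            parts.append("(?P<%s>%s)" % (name, frag))
            names.append(name)
            pos = m.end()
        parts.append(re.escape(pattern[pos:]))
        m = re.fullmatch("".join(parts), text)
        if m is None:
            raise ValueError("pattern does not match %r" % (text,))
        return tuple(m.group(n) for n in names)

    # super_scanf("email: {name}@{domain}", "email: guido@python.org")
    # -> ('guido', 'python.org')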
participants (22)
- Alex Hall
- Ben Rudiak-Gould
- Brendan Barnwell
- Calvin Spealman
- Chris Angelico
- Christopher Barker
- David Mertz
- Dennis Sweeney
- Edwin Zimmerman
- Eric V. Smith
- Greg Ewing
- Guido van Rossum
- M.-A. Lemburg
- Marco Sulla
- MRAB
- Paul Moore
- Random832
- Ricky Teachey
- Rob Cliffe
- Stephen J. Turnbull
- Steven D'Aprano
- Wes Turner