On Wed, Oct 21, 2020 at 1:13 AM Rob Cliffe <rob.cliffe@btinternet.com> wrote:
On 20/10/2020 13:05, Chris Angelico wrote:
On Tue, Oct 20, 2020 at 10:37 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
In short, "assigning" to f-strings is not and cannot be a simple reversal of having them in expressions. Rather, it is opening a big can of worms.
It's not a reversal of them being in expressions any more than assigning to a list display is a reversal of constructing a list. It's a parallel operation, a counterpart.
It would be Python's way of offering an sscanf-like operation, which is something that I've frequently wished for. Regular expressions aren't ideal for all situations, and there are plenty of times when a simple set of parsing rules could be very nicely encoded into a compact form like this. C's sscanf and sprintf aren't perfect counterparts, but they're incredibly valuable. Python has percent formatting for sprintf, but no form of sscanf.
Er, well, why not just add a sscanf to Python?
A couple of reasons, but the main one is that you can't have "output variables" (in C, sscanf takes pointers, so you pass it the address of your variables), which means that all you'd get is a fancy string split and then you put all the assignment on the left. That leads to an all-or-nothing result for the simple form, and no easy way to do anything else. Consider:

    a, b, c = sscanf("%d %d %d", "123 432")

Boom! Instead of having a clean way to represent partial parsing. It'd still be better than regex partial parsing, where you get absolute silence and no way to debug it, but it's not nearly as good as could be done.

ChrisA
On Wed, Oct 21, 2020 at 02:18:49AM +1100, Chris Angelico wrote:
Er, well, why not just add a sscanf to Python?
A couple of reasons, but the main one is that you can't have "output variables" (in C, sscanf takes pointers, so you pass it the address of your variables), which means that all you'd get is a fancy string split and then you put all the assignment on the left.
That sounds like a big PLUS to me :-) A fancy string split is precisely what I want. The less magic the better.
That leads to an all-or-nothing result for the simple form, and no easy way to do anything else. Consider:
a, b, c = sscanf("%d %d %d", "123 432")
Boom! Instead of having a clean way to represent partial parsing.
Okay, let's try "partial parsing" with an f-string style:

    f"{spam:d} {eggs:d} {cheese:d}" = "123 456"

Now what? How do you use this in practice?

    try:
        spam
    except NameError:
        print("no match!")
    else:
        try:
            eggs
        except NameError:
            print("Matched spam only")
        else:
            try:
                cheese
            except NameError:
                print("Matches spam and eggs only")
            else:
                process(spam, eggs, cheese)

Gag me with a spoon.

In general, Python bindings are *all or nothing* -- either all the targets get bound, or none of them. You may be able to find some odd corner case where this is not true, but the most common case is all-or-nothing:

    spam, eggs, cheese = (123, 456)

I think having "Boom!" (an exception) when your pattern doesn't match is the right thing to do, at least by default. But if you want to get fancy, we can supply a missing value:

    spam, eggs, cheese = sscanf("%d %d %d", "123 432", missing=None)
    assert spam == 123
    assert eggs == 432
    assert cheese is None

--
Steve
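The function form with a `missing=` default can be tried out in pure Python. What follows is only a minimal sketch under assumptions the thread does not settle: the name `sscanf`, the `missing` keyword and its sentinel, and support for just `%d` and `%s` separated by whitespace are all illustrative choices, not an existing API.

```python
import re

_NO_DEFAULT = object()  # sentinel meaning "raise instead of padding"

# Conversions handled by this sketch; leading whitespace is skipped, as in C.
_CONVERSIONS = {
    "%d": (re.compile(r"\s*([-+]?\d+)"), int),
    "%s": (re.compile(r"\s*(\S+)"), str),
}


def sscanf(fmt, text, missing=_NO_DEFAULT):
    """Parse whitespace-separated fields of `text` as described by `fmt`."""
    specs = fmt.split()
    values = []
    pos = 0
    for index, spec in enumerate(specs):
        regex, convert = _CONVERSIONS[spec]
        m = regex.match(text, pos)
        if m is None:
            if missing is _NO_DEFAULT:
                raise ValueError(f"field {index} ({spec}) did not match {text!r}")
            # Pad this field and every later one with the default value.
            values.extend([missing] * (len(specs) - index))
            break
        values.append(convert(m.group(1)))
        pos = m.end()
    return tuple(values)


spam, eggs, cheese = sscanf("%d %d %d", "123 432", missing=None)
print(spam, eggs, cheese)
```

Without a `missing` argument a short input raises `ValueError` (the "Boom!" case). With `missing=None` the unmatched trailing fields are padded, so ordinary tuple unpacking on the left stays all-or-nothing.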
On Tue, 20 Oct 2020 at 17:04, Steven D'Aprano <steve@pearwood.info> wrote:
In general, Python bindings are *all or nothing* -- either all the targets get bound, or none of them.
I wonder if this could work with the proposed pattern matching statement. The current proposal doesn't allow it, but maybe a future enhancement would make something like this possible:

    match the_input:
        case "{one} {two} {three}":
            print("Got three numbers", one, two, three)

You'd need some sort of "matching object" rather than a string there (strings already have a meaning in the proposal) but maybe something could be made to work?

Paul
On Wed, Oct 21, 2020 at 3:04 AM Steven D'Aprano <steve@pearwood.info> wrote:
That leads to an all-or-nothing result for the simple form, and no easy way to do anything else. Consider:
a, b, c = sscanf("%d %d %d", "123 432")
Boom! Instead of having a clean way to represent partial parsing.
Okay, let's try "partial parsing" with an f-string style:
f"{spam:d} {eggs:d} {cheese:d}" = "123 456"
Now what? How do you use this in practice?
In every sscanf-like system I've used, there is a default value of some sort, either because variables are automatically initialized, or because the sscanf construct itself provides a default. You can always explicitly initialize them if you need to:

    spam = eggs = cheese = None
    f"{spam:d} {eggs:d} {cheese:d}" = "123 456"

Oh look, not nearly as ugly as your strawman :)
I think having "Boom!" (an exception) when your pattern doesn't match is the right thing to do, at least by default. But if you want to get fancy, we can supply a missing value:
    spam, eggs, cheese = sscanf("%d %d %d", "123 432", missing=None)
    assert spam == 123
    assert eggs == 432
    assert cheese is None
Sure, if that's what you want. The other thing that frequently comes up in my code, though, is something where the earlier matches indicate whether something else is needed. In that situation, the earlier entries - which would have been assigned to - control whether you even look at the later ones. There's no issues that way either. I don't think this is nearly as big a concern as you think, and it's way WAY easier to work with than a regex. Just because you don't want the partial assignment feature doesn't mean it isn't incredibly practical and useful in real-world situations :)

ChrisA
On Wed, Oct 21, 2020 at 03:19:20AM +1100, Chris Angelico wrote:
In every sscanf-like system I've used, there is a default value of some sort, either because variables are automatically initialized, or because the sscanf construct itself provides a default.
Then it won't go "Boom!" as you said. It will just return the default. So your boom objection is neutralised, yay!
You can always explicitly initialize them if you need to:
    spam = eggs = cheese = None
    f"{spam:d} {eggs:d} {cheese:d}" = "123 456"
Oh look, not nearly as ugly as your strawman :)
You still have to test for None. Perhaps not as awkward as try...except, but you still have to test each one.

Hey, it's past my bed time. I didn't think of that. But I did think of this:

    # Check whether the pattern was matched and bound to a variable.
    if 'spam' in locals():
        if 'eggs' in locals():
            # etc

--
Steve
On Wed, Oct 21, 2020 at 3:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Oct 21, 2020 at 03:19:20AM +1100, Chris Angelico wrote:
In every sscanf-like system I've used, there is a default value of some sort, either because variables are automatically initialized, or because the sscanf construct itself provides a default.
Then it won't go "Boom!" as you said. It will just return the default.
So your boom objection is neutralised, yay!
You can always explicitly initialize them if you need to:
    spam = eggs = cheese = None
    f"{spam:d} {eggs:d} {cheese:d}" = "123 456"
Oh look, not nearly as ugly as your strawman :)
You still have to test for None. Perhaps not as awkward as try...except, but you still have to test each one.
Often you don't have to test for it. It's not uncommon for the default to be a real value (eg 0 when parsing for numbers - lots of file formats will work that way), so there's no testing to be done. Or you can just do "cheese or X" to handle (a) unassigned and (b) blank, which again is a common situation. ChrisA
On Tue, Oct 20, 2020 at 6:20 PM Chris Angelico <rosuav@gmail.com> wrote:
    spam = eggs = cheese = None
    f"{spam:d} {eggs:d} {cheese:d}" = "123 456"
Wow! That sure looks like a strong anti-pattern. If some variable was assigned somewhere very distant in the program flow, a parser might succeed where it would otherwise fail! It's not technically spooky action at a distance, but it's troublingly close. In contrast a plain function has none of that problem behavior.
spam, eggs, cheese = sscanf("%d %d %d", "123 432", missing=None)
I wouldn't mind a version of this that was "inspired by" f-strings. But even there, the requirements are subtly different. E.g.:

    myvals = sscanf("{spam:d} {eggs:d} {cheese:d}", "123 432", missing=None)

I can see a good idea of `myvals` being a dictionary not a tuple. But that's close to the color of the bikeshed.

Just to repeat, a "scanf-string" just cannot be the same thing as an f-string. Here is a perfectly good f-string, similar to ones I use all the time:

    f"More than foo is {foo+1}"

There's just no way to make f-strings into an assignment target... what is POSSIBLE is making "some subset of f-strings (yet to be precisely specified)" patterns for a scan.

I hate the idea of putting that subset to the left of an equal sign. But defining a subset as an argument for a function sounds great.

Actually, however, I think the scanf-string rules should be probably not strictly a subset, but rather a "strongly overlapping set that is mostly a subset but also has a few new things."

--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
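A dictionary-returning scanner along these lines can be sketched with the standard library's `string.Formatter().parse`, which already tokenizes `{name:spec}` fields. Everything else here is an assumption made for illustration: the function name `scan_named`, the `missing` parameter, the supported specs (`d` and `s` only), and the choice to reject any field that is not a plain identifier, such as `{foo+1}`.

```python
import re
from string import Formatter

# Supported format specs in this sketch: a regex for the field and a converter.
_FIELD_TYPES = {
    "d": (re.compile(r"[-+]?\d+"), int),
    "s": (re.compile(r"\S+"), str),
    "": (re.compile(r"\S+"), str),
}


def scan_named(pattern, text, missing=None):
    """Return a dict mapping each field name in `pattern` to its parsed value."""
    result = {}
    pos = 0
    matching = True  # becomes False at the first point where `text` diverges
    for literal, name, spec, _conversion in Formatter().parse(pattern):
        if matching and literal:
            if text.startswith(literal, pos):
                pos += len(literal)
            else:
                matching = False
        if name is None:  # trailing literal text, no field to bind
            continue
        if not name.isidentifier():
            raise ValueError(f"cannot scan into {name!r}: not a plain name")
        if spec not in _FIELD_TYPES:
            raise ValueError(f"unsupported format spec {spec!r}")
        regex, convert = _FIELD_TYPES[spec]
        m = regex.match(text, pos) if matching else None
        if m is None:
            matching = False
            result[name] = missing
        else:
            result[name] = convert(m.group())
            pos = m.end()
    return result


myvals = scan_named("{spam:d} {eggs:d} {cheese:d}", "123 432", missing=None)
print(myvals)
```

Because the result is an ordinary dictionary, nothing in the caller's namespace is touched, and a pattern such as `"More than foo is {foo+1}"` is refused up front with a `ValueError` rather than being given some invented meaning.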
On Wed, Oct 21, 2020 at 5:07 AM David Mertz <mertz@gnosis.cx> wrote:
On Tue, Oct 20, 2020 at 6:20 PM Chris Angelico <rosuav@gmail.com> wrote:
    spam = eggs = cheese = None
    f"{spam:d} {eggs:d} {cheese:d}" = "123 456"
Wow! That sure looks like a strong anti-pattern.
If some variable was assigned somewhere very distant in the program flow, a parser might succeed where it would otherwise fail! It's not technically spooky action at a distance, but it's troublingly close.
Not sure what you mean. It either succeeds or fails based on the string it's given, but if it succeeds, it assigns only those that match. So if you want some or all of them to have default values, you assign them beforehand. It's only necessary if (a) you expect to have optional matches, and (b) you need a non-matching variable to have a value. If you want a variable to have a value, you assign it. It's that simple.
I wouldn't mind a version of this that was "inspired by" f-strings. But even there, the requirements are subtly different. E.g.:
myvals = sscanf("{spam:d} {eggs:d} {cheese:d}", "123 432", missing=None)
I can see a good idea of `myvals` being a dictionary not a tuple. But that's close to the color of the bikeshed.
Yes, but ONLY if we get dictionary unpacking. Otherwise, what's the point of any of this?
Just to repeat, a "scanf-string" just cannot be the same thing as an f-string. Here is a perfectly good f-string, similar to ones I use all the time:
f"More than foo is {foo+1}"
There's just no way to make f-strings into an assignment target... what is POSSIBLE is making "some subset of f-strings (yet to be precisely specified)" patterns for a scan.
*sigh*

Scanning and printing are not the same. This has been repeated many times, yet STILL people object on the basis that they won't be the same. You can do this:

    x = [a + 1, b + 1]

But you can't do this:

    [a + 1, b + 1] = x

Does that mean that multiple assignment is broken?

ChrisA
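The asymmetry between list displays and list targets can be checked directly with the standard `ast` module. This is just a small illustration; the helper name `is_valid_statement` is made up here, and the last line shows how current Python treats an f-string target, not how any proposal would.

```python
import ast


def is_valid_statement(source):
    """Return True if `source` parses as Python, False on SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(is_valid_statement("x = [a + 1, b + 1]"))  # list display on the right
print(is_valid_statement("[a, b] = x"))          # plain names as targets
print(is_valid_statement("[a + 1, b + 1] = x"))  # expressions as targets
print(is_valid_statement('f"{a} {b}" = x'))      # f-string as a target today
```

The first two statements parse; the last two are rejected with a `SyntaxError`, which is the sense in which only a restricted subset of what can be built on the right-hand side is meaningful on the left.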
On Tue, Oct 20, 2020 at 8:41 PM Chris Angelico <rosuav@gmail.com> wrote:
f"{spam:d} {eggs:d} {cheese:d}" = "123 456"
Wow! That sure looks like a strong anti-pattern.
If some variable was assigned somewhere very distant in the program flow, a parser might succeed where it would otherwise fail! It's not technically spooky action at a distance, but it's troublingly close.
Not sure what you mean. It either succeeds or fails based on the string it's given, but if it succeeds, it assigns only those that match. So if you want some or all of them to have default values, you assign them beforehand.
So in your idea, after the above line runs, `cheese` will remain undefined?! That's a different anti-pattern than the one I thought, but equally bad.

Notice that is a HUGE asymmetry with regular unpacking. I can never run:

    a, b, c = some_expression

And wind up with a and b defined, but silently continue the program with c undefined.

In your idea, is it the case that a line like yours above can simply NEVER fail, in the sense of raising exceptions? So maybe it runs and nothing is actually bound?! I hate that even more than what I previously thought the idea was.
Just to repeat, a "scanf-string" just cannot be the same thing as an f-string. Here is a perfectly good f-string, similar to ones I use all the time:
    f"More than foo is {foo+1}"

There's just no way to make f-strings into an assignment target... what is POSSIBLE is making "some subset of f-strings (yet to be precisely specified)" patterns for a scan.
Scanning and printing are not the same. This has been repeated many times, yet STILL people object on the basis that they won't be the same.
Well, "formatting" more generally, not only printing. But the fact they are different is EXACTLY the point I have tried to make a number of times. Trying to shoe-horn a "formatting string" into a role of a "scanning string" is exactly the problem. They are NOT the same.

As I say, developing a "scanning mini-language" that is *inspired by* the formatting language sounds great. Trying to make the actual f-string do double-duty as a scanning mini-language is a terrible idea. That way lies Perl.

It really does! Imagine trying to explain "What is an f-string?" in Python 3.11 under the proposal (I'm not sure which version is actually proposed, but under any of them). The answer becomes "Depending on context, the meaning and interpretation of the symbols is one of these several things."

For the most part, Python tries hard to avoid "context sensitive grammar." (Yes, I know exceptions, modulo is definitely different from string interpolation, for example... although grammatically it's actually not, just an operator that does very different things with strings versus numbers).

--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
On Wed, Oct 21, 2020 at 6:06 AM David Mertz <mertz@gnosis.cx> wrote:
On Tue, Oct 20, 2020 at 8:41 PM Chris Angelico <rosuav@gmail.com> wrote:
f"{spam:d} {eggs:d} {cheese:d}" = "123 456"
Wow! That sure looks like a strong anti-pattern.
If some variable was assigned somewhere very distant in the program flow, a parser might succeed where it would otherwise fail! It's not technically spooky action at a distance, but it's troublingly close.
Not sure what you mean. It either succeeds or fails based on the string it's given, but if it succeeds, it assigns only those that match. So if you want some or all of them to have default values, you assign them beforehand.
So in your idea, after the above line runs, `cheese` will remain undefined?! That's a different anti-pattern than the one I thought, but equally bad.
Notice that is a HUGE asymmetry with regular unpacking. I can never run:
a, b, c = some_expression
And wind up with a and b defined, but silently continue the program with c undefined.
In your idea, is it the case that a line like yours above can simply NEVER fail, in the sense of raising exceptions? So maybe it runs and nothing is actually bound?! I hate that even more than what I previously thought the idea was.
No; it can fail if the pattern actually rejects it. For instance, a pattern of "a{x}b{y}c" can match the string "a" but won't assign to y, however it can't match the string "q" because that doesn't match. This behaviour of leaving variables unassigned happens ONLY if there's a partial match. And just like with a regex, it would be easy enough to have a notation that demands end-of-string, and cannot match unless you reach that point in the pattern right as you reach the end of the string.
Just to repeat, a "scanf-string" just cannot be the same thing as an f-string. Here is a perfectly good f-string, similar to ones I use all the time:

    f"More than foo is {foo+1}"

There's just no way to make f-strings into an assignment target... what is POSSIBLE is making "some subset of f-strings (yet to be precisely specified)" patterns for a scan.
Scanning and printing are not the same. This has been repeated many times, yet STILL people object on the basis that they won't be the same.
Well, "formatting" more generally, not only printing. But the fact they are different is EXACTLY the point I have tried to make a number of times. Trying to shoe-horn a "formatting string" into a role of a "scanning string" is exactly the problem. They are NOT the same.
I said "print" because the comparison is between printf and scanf, but yes, formatting generally (sprintf isn't really "printing", but then again, Python's print function can print to anything too, and the pprint module has formatting functions that don't print).
As I say, developing a "scanning mini-language" that is *inspired by* the formatting language sounds great. Trying to make the actual f-string do double-duty as a scanning mini-language is a terrible idea. That way lies Perl.
Well... yes. And oddly enough, developing a scanning mini-language inspired by the formatting language is *exactly* what this proposal is, and has always been.
It really does! Imagine trying to explain "What is an f-string?" in Python 3.11 under the proposal (I'm not sure which version is actually proposed, but under any of them). The answer becomes "Depending on context, the meaning and interpretation of the symbols is one of these several things." For the most part, Python tries hard to avoid "context sensitive grammar." (Yes, I know exceptions, modulo is definitely different from string interpolation, for example... although grammatically it's actually not, just an operator that does very different things with strings versus numbers).
What is a comma? Please explain that to me with fewer convolutions than this.

What is an assignment target? Are all of these the same?

    TARGET = thing
    TARGET += thing
    with thing as TARGET:
    for TARGET in thing:
    except ThingException as TARGET:
    import thing as TARGET

Python already has many contexts in which something is slightly different. An f-string being assigned to isn't the same as an f-string being constructed. They are related, but distinct.

In real, practical terms, what will happen is that the vast majority of pattern strings can be used identically on the RHS and LHS, and if you parse a string using a particular pattern and then format it back, it'll give back the same thing you started with. But that's not a guarantee, it can never be guaranteed, and we wouldn't want it to be that restricted anyway.

ChrisA
On Wed, Oct 21, 2020 at 06:17:27AM +1100, Chris Angelico wrote:
No; it can fail if the pattern actually rejects it. For instance, a pattern of "a{x}b{y}c" can match the string "a" but won't assign to y,
It won't assign to x either, if I'm understanding it correctly, since there is nothing following the 'a' that matches. Or will x get the empty string?
however it can't match the string "q" because that doesn't match. This behaviour of leaving variables unassigned happens ONLY if there's a partial match.
And just like with a regex, it would be easy enough to have a notation that demands end-of-string, and cannot match unless you reach that point in the pattern right as you reach the end of the string.
But you'll still get a partial match. Using regex syntax, if you try to match "spam {x} eggs$" (i.e. the prefix "spam", space, substring to be extracted, space, suffix "eggs", end of string) against "spam is great" you will get the partial match x="is". If you're trying to match the address "{num:d} Main Street {city}$" against "3.1415" you will get num=3. Maybe I'm missing something, but I don't see why this is desirable behaviour. [...]
Well... yes. And oddly enough, developing a scanning mini-language inspired by the formatting language is *exactly* what this proposal is, and has always been.
That's incorrect. The first post in this thread:

https://mail.python.org/archives/list/python-ideas@python.org/thread/JEGSKOD...

literally describes it as f-strings as the target. There's no hint that only a subset of f-string functionality will be accepted. Dennis says:

    "you can assign a string to an f-string"

and proposes an exception:

    "ValueError: f-string assignment target does not match ..."

I can't work out how to see messages in non-threaded date order on the website, but it looks to me like the first suggestion that this might not *literally* be f-strings comes about a dozen emails into the thread, and from you, not Dennis.

The subject line doesn't say "Scanning mini-language inspired by f-strings", it says that f-strings should be allowed as an assignment target.

I think that people can be forgiven for thinking that a thread that says it is about making f-strings assignment targets is about making f-strings assignment targets *wink*

I think we have at least six mini-languages now?

- regexes
- string interpolation with %
- template strings
- format mini-language
- f-strings
- date/time format strings

I think it would be quite an uphill battle to have a seventh built into the language. You're welcome to write a PEP :-)

--
Steve
On Wed, Oct 21, 2020 at 10:22 AM Steven D'Aprano <steve@pearwood.info> wrote:
Well... yes. And oddly enough, developing a scanning mini-language inspired by the formatting language is *exactly* what this proposal is, and has always been.
That's incorrect.
The first post in this thread:
https://mail.python.org/archives/list/python-ideas@python.org/thread/JEGSKOD...
literally describes it as f-strings as the target. There's no hint that only a subset of f-string functionality will be accepted. Dennis says:
"you can assign a string to an f-string"
and proposes an exception:
"ValueError: f-string assignment target does not match ..."
The first post was just talking about the syntactic concept, without any explanation of how an f-string match pattern would work. Once we got to actual concrete proposals, they were always about a scanning mini-language inspired by the formatting language. I'm really not sure what you're arguing here.
I think we have at least six mini-languages now?
- regexes
- string interpolation with %
- template strings
- format mini-language
- f-strings
- date/time format strings
I think it would be quite an uphill battle to have a seventh built into the language. You're welcome to write a PEP :-)
If you're going to count date/time format strings, you should also count logging formats, and a ton of other things around the place. Mini-languages aren't a problem. They're just a compact form of structured data (or code, depending on your context). What's the issue here? Remember, this is *closely related to* an existing one, meaning that it won't have the cognitive load of a complete new system. ChrisA
On Wed, Oct 21, 2020 at 1:22 AM Steven D'Aprano <steve@pearwood.info> wrote:
I think we have at least six mini-languages now?

- regexes
- string interpolation with %
- template strings
- format mini-language
- f-strings
- date/time format strings
We also have the struct mini-language for interpreting packed bytes. I feel like we're forgetting something else. :-) (not being coy, I'm not sure what it is off the cuff). -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
On Tue, Oct 20, 2020 at 12:14 PM David Mertz <mertz@gnosis.cx> wrote:
Well, "formatting" more generally, not only printing. But the fact they are different is EXACTLY the point I have tried to make a number of times. Trying to shoe-horn a "formatting string" into a role of a "scanning string" is exactly the problem. They are NOT the same.
As I say, developing a "scanning mini-language" that is *inspired by* the formatting language sounds great. Trying to make the actual f-string do double-duty as a scanning mini-language is a terrible idea. That way lies Perl.
Agreed!

When this thread was started, I was strongly negative on the whole idea, because the formatting mini-language[*] seemed very poorly suited to be used as a scanning language.

But I've been thinking about it more, and I think my first impression was because a very common use is simple default string conversion:

    f"{x}, {y}, {z}"

And that is not very useful as a scanning pattern -- what types do you want x, y, and z to be? How do you handle whitespace?

But if we use a subset of the format specifiers, it starts to look pretty reasonable:

    x, y, z = "{:2d}, {:f}, {:10s}".scan("12, 32.4, Fred Jones")

results in:

    x == 12
    y == 32.4
    z == "Fred Jones"

Some careful thinking about whitespace would have to be done, but this could be pretty nice. Now that I think about it -- someone must have a version of this on PyPI :-)

As for the question of do we need a scanning language at all? We already have pretty full-featured string methods, and regex for the complex stuff.

I think yes -- for both simplicity for the simple stuff (the easy stuff should be easy) and performance. The fact is that while it's pretty easy to write a simple text file parser in Python with the usual string methods (I've done a LOT of that) -- it is code to write, and it's pretty darn slow.

The scipy community has a lot of need for fast and easy parsing of text files, and that is met by numpy's loadtxt() and genfromtxt(), and more recently Pandas' CSV reader. All written in C for speed. But these only handle fairly "ordinary" files, variations of CSV, which is indeed extremely common, but not universal.

I happen to have the need for fast reading of text files that are not really CSV-like, so I wrote, years ago, a C module for fast scanning of numbers from text files: essentially a wrapper around fscanf() -- it was orders of magnitude faster than using pure Python and string methods, and also easier to write the code. But it's not all that flexible, 'cause I only wrote it to do what I needed: read the next N floats from the file.

So a built-in, C-speed simple text parser would be very nice. And using a language inspired by the formatting mini-language would also be nice -- to make it all more familiar to Python users.

Eric Smith wrote (in a later message):
So first we should spec out how my super_scanf function would work, and how it would figure out the values (and especially their types) to return. I think this is the weakest, most hand-wavy part of any proposal being discussed here, and it needs to be solved as a prerequisite for the version that creates locals. And the beauty is that it could be written today, in pure Python.
Totally agree here, except for one thing -- I don't think we want/need a "super" scanf -- we need a simple one. I don't think there is any need at all to be able to construct an arbitrary type. Or probably even some basic built-ins like tuples and lists (though you may be able to do that).

This "epiphany" is what brought me around to the idea -- the formatting system is very powerful and flexible -- it is essentially impossible to make it reversible. But that doesn't mean we can't borrow some of the same syntax for a scanning language.

[*] I actually think f-strings are pretty much irrelevant here -- I don't want the variable names assigned to be buried in the string -- that makes it far less usable as a general scanner, where the scanning string may be generated far from where it's used. But f-strings and .format() use the same formatting language -- and that consistency is nice.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
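A positional scanner of this "simple, not super" kind can be prototyped in pure Python today. The sketch below is a set of assumptions, not a specification: `scan` is a plain function (a `str.scan` method does not exist), only the `d`, `f` and `s` types are handled, a width is treated as a maximum field length in the scanf manner (in `format()` it is a minimum), and literal text, including whitespace, must match exactly.

```python
import re
from string import Formatter

_SPEC = re.compile(r"(?P<width>\d*)(?P<kind>[dfs])")
_FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
_CONVERT = {"d": int, "f": float, "s": str}


def _field_regex(width, kind):
    """Build the regex fragment for one field of the given kind and width."""
    if kind == "d":
        digits = r"\d{1,%s}" % width if width else r"\d+"
        return r"[-+]?" + digits
    if kind == "f":
        return _FLOAT
    # kind == "s": any characters, limited to `width` when one is given
    return r".{1,%s}" % width if width else r".+?"


def scan(fmt, text):
    """Parse `text` against `fmt` and return the converted fields as a tuple."""
    regex_parts = []
    converters = []
    for literal, name, spec, _conversion in Formatter().parse(fmt):
        regex_parts.append(re.escape(literal))
        if name is None:  # trailing literal text with no field after it
            continue
        m = _SPEC.fullmatch(spec)
        if m is None:
            raise ValueError(f"unsupported format spec {spec!r}")
        regex_parts.append("(" + _field_regex(m["width"], m["kind"]) + ")")
        converters.append(_CONVERT[m["kind"]])
    match = re.fullmatch("".join(regex_parts), text)
    if match is None:
        raise ValueError(f"{text!r} does not match {fmt!r}")
    return tuple(conv(group) for conv, group in zip(converters, match.groups()))


x, y, z = scan("{:2d}, {:f}, {:10s}", "12, 32.4, Fred Jones")
print(x, y, z)
```

Since the pattern carries no variable names, it can be built far from where it is applied. The types come entirely from the format codes, which sidesteps the question of constructing arbitrary objects. A pure-Python version like this says nothing about the performance motivation, which would need a C implementation.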
On Wed, Oct 21, 2020 at 11:07:28AM -0700, Christopher Barker wrote:
As for the question of do we need a scanning language at all? We already have pretty full-featured string methods, and regex for the complex stuff.
I think yes -- for both simplicity for the simple stuff (the easy stuff should be easy) and performance. The fact is that while it's pretty easy to write a simple text file parser in Python with the usual string methods (I've done a LOT of that) -- it is code to write, and it's pretty darn slow.
I concur with your reasoning here. We have regexes for heavy duty string parsing, and third-party libraries for writing full-blown parsers of arbitrary complexity. We have Python string methods that can be used to parse strings, but beyond the simplest cases it soon becomes awkward and slow. There's a middle ground of text parsing tasks that would seem to be a good match for some sort of scanner, inspired by C's scanf, whether it uses % or {} format codes.
[*] I actually think f-strings are pretty much irrelevant here -- I don't want the variable names assigned to be buried in the string -- that makes it far less usable as a general scanner, where the scanning string may be generated far from where it's used.
Indeed. As I pointed out back in September: https://mail.python.org/archives/list/python-ideas@python.org/message/LNLCYR... having the template string built up separately from where it is applied to scanning is an important feature. -- Steve
On Wed, Oct 21, 2020, 7:21 PM Steven D'Aprano
There's a middle ground of text parsing tasks that would seem to be a good match for some sort of scanner, inspired by C's scanf, whether it uses % or {} format codes.
Maybe COBOL PICTURE clauses. Admittedly, I've never used COBOL, but I thought that one feature looks really good.
participants (5)

- Chris Angelico
- Christopher Barker
- David Mertz
- Paul Moore
- Steven D'Aprano