On Tue, Oct 20, 2020 at 12:14 PM David Mertz <mertz@gnosis.cx> wrote:
Well, "formatting" more generally, not only printing.  But the fact they are different is EXACTLY the point I have tried to make a number of times.  Trying to shoe-horn a "formatting string" into a role of a "scanning string" is exactly the problem.  They are NOT the same.

As I say, developing a "scanning mini-language" that is *inspired by* the formatting language sounds great.  Trying to make the actual f-string do double-duty as a scanning mini-language is a terrible idea.  That way lies Perl.

Agreed! When this thread was started, I was strongly negative on the whole idea, because the formatting mini-language[*] seemed very poorly suited to be used as a scanning language. But I've been thinking about it more, and I think my first impression came from the fact that a very common use is simple default string conversion:

f"{x}, {y}, {z}"

And that is not very useful as a scanning pattern -- what types do you want x, y, and z to be? How do you handle whitespace?

But if we use a subset of the format specifiers, it starts to look pretty reasonable:

x, y, z = "{:2d}, {:f}, {:10s}".scan("12, 32.4, Fred Jones")

results in:

x == 12
y == 32.4
z == "Fred Jones"

Some careful thinking about whitespace would have to be done, but this could be pretty nice.
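
Here is a rough sketch of how that could behave, written in pure Python on top of the re module. Everything in it is an assumption made for illustration: the function name scan(), the choice to support only the d, f, and s types, ignoring the width, and skipping whitespace around each field. It is not a spec, just one possible reading of the example above.

```python
import re

# Hypothetical subset of the format spec: an optional width plus d, f, or s.
_FIELD = re.compile(r"\{:(\d*)([dfs])\}")

# Regex fragment and converter for each supported type code.
_PATTERNS = {
    "d": r"[-+]?\d+",
    "f": r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?",
    "s": r".+?",
}
_CONVERTERS = {"d": int, "f": float, "s": str}


def scan(pattern, text):
    """Scan text against a format-like pattern and return a tuple of typed values."""
    regex_parts = []
    converters = []
    pos = 0
    for field in _FIELD.finditer(pattern):
        # Literal text between fields has to match exactly.
        regex_parts.append(re.escape(pattern[pos:field.start()]))
        # Assumption: whitespace around a field is skipped and the width is ignored.
        type_code = field.group(2)
        regex_parts.append(r"\s*(" + _PATTERNS[type_code] + r")\s*")
        converters.append(_CONVERTERS[type_code])
        pos = field.end()
    regex_parts.append(re.escape(pattern[pos:]))

    match = re.fullmatch("".join(regex_parts), text)
    if match is None:
        raise ValueError(f"{text!r} does not match scan pattern {pattern!r}")
    return tuple(conv(raw) for conv, raw in zip(converters, match.groups()))


x, y, z = scan("{:2d}, {:f}, {:10s}", "12, 32.4, Fred Jones")
```

Note that the types come entirely from the pattern, not from the text, which is what makes the result predictable.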

Now that I think about it -- someone must have a version of this on PyPI :-)

As for the question of whether we need a scanning language at all: we already have pretty full-featured string methods, and regex for the complex stuff.

I think yes -- both for simplicity (the easy stuff should be easy) and for performance. The fact is that while it's pretty easy to write a simple text file parser in Python with the usual string methods (I've done a LOT of that) -- it is code to write, and it's pretty darn slow.

The scipy community has a lot of need for fast and easy parsing of text files, and that is met by numpy's loadtxt() and genfromtxt(), and more recently Pandas' CSV reader. All written in C for speed. But these only handle fairly "ordinary" files, variations of CSV, which is indeed extremely common, but not universal.

I happen to have the need for fast reading of text files that are not really CSV-like, so I wrote, years ago, a C module for fast scanning of numbers from text files: essentially a wrapper around fscanf() -- it was orders of magnitude faster than using pure Python and string methods, and also made the code easier to write. But it's not all that flexible, 'cause I only wrote it to do what I needed: read the next N floats from the file.
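
For comparison, this is roughly what "read the next N floats" looks like with plain string methods. It is only a sketch of the slow pure-Python approach, not the C module: the class name FloatScanner, the method next_floats(), and the rule that numbers are separated by whitespace or commas are all made up for this example.

```python
class FloatScanner:
    """Hand out floats from an open text file, a few at a time."""

    def __init__(self, infile):
        self._infile = infile
        self._pending = []  # numbers read from the current line but not yet returned

    def next_floats(self, n):
        """Return the next n floats, reading as many lines as it takes."""
        values = []
        while len(values) < n:
            if not self._pending:
                line = self._infile.readline()
                if not line:
                    raise EOFError(f"wanted {n} floats, found only {len(values)}")
                # Assumption: tokens are separated by whitespace and/or commas.
                self._pending = [float(tok) for tok in line.replace(",", " ").split()]
            take = n - len(values)
            values.extend(self._pending[:take])
            del self._pending[:take]
        return values
```

Usage would be something like FloatScanner(open("data.txt")).next_floats(3), where data.txt is a placeholder file name. Every token goes through a Python-level split() and float() call, which is where the time goes.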

So a built-in, C-speed simple text parser would be very nice.

And using a language inspired by the formatting mini-language would also be nice -- to make it all more familiar to Python users.

Eric Smith wrote (in a later message):
So first we should spec out how my super_scanf function would work, and
how it would figure out the values (and especially their types) to
return. I think this is the weakest, most hand-wavy part of any proposal
being discussed here, and it needs to be solved as a prerequisite for
the version that creates locals. And the beauty is that it could be
written today, in pure Python.

Totally agree here, except for one thing -- I don't think we want/need a "super" scanf -- we need a simple one. I don't think there is any need at all to be able to construct an arbitrary type. Or probably even some basic built-ins like tuples and lists (though you may be able to do that). This "epiphany" is what brought me around to the idea -- the formatting system is very powerful and flexible -- it is essentially impossible to make it reversible. But that doesn't mean we can't borrow some of the same syntax for a scanning language.
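
A few quick cases show why formatting can't simply be run backwards: different inputs produce identical output, so the type, the precision, and even the field boundaries are lost. The variable names here are just for illustration.

```python
# The type is lost: an int and a str format to the same text.
from_int = "{}".format(12)
from_str = "{}".format("12")

# Precision is lost: two different floats round to the same text.
rounded_a = "{:.1f}".format(32.41)
rounded_b = "{:.1f}".format(32.44)

# Field boundaries are lost when a value contains the separator.
joined_a = "{}, {}".format("a, b", "c")
joined_b = "{}, {}".format("a", "b, c")

print(from_int == from_str, rounded_a == rounded_b, joined_a == joined_b)
```

A scanning pattern that states the type of each field sidesteps the first problem; the other two are just inherent to the text.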

[*] I actually think f-strings are pretty much irrelevant here -- I don't want the variable names assigned to be buried in the string -- that makes it far less usable as a general scanner, where the scanning string may be generated far from where it's used. But f-strings and .format() use the same formatting language -- and that consistency is nice.


-CHB


--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython