On 2020-10-22 08:50, M.-A. Lemburg wrote:
On 22.10.2020 04:12, David Mertz wrote:
To bring it back to a concrete idea, here's how I see things:
1. The idea of f-string-like assignment targets has little support. Only Chris, and maybe the OP who seems to have gone away.
2. The idea of a "scanning language" seems to garner a fair amount of enthusiasm from everyone who has commented.
3. Having the scanning language be "inspired by" f-strings seems to fit nicely with Python.
4. Lots of folks like C scanf() as another inspiration for the need. I was not being sarcastic in saying that I thought COBOL PICTURE clauses are another similar useful case. I think Perl 6 "rules" were trying to do something along those lines... but, well, Perl.
5. In my opinion, this is naturally a function, or several related functions, not new syntax (I think Steven agrees).
So the question is, what should the scanning language look like? Another question is: "Does this already exist?"
I'm looking around PyPI, and I see this that looks vaguely along the same lines. But most likely I am missing things: https://pypi.org/project/rebulk/
In terms of API, assuming functions, I think there are two basic models. We could have two (or more) functions that were related though:
# E.g.
pat_with_names = "{foo:f}/{bar:4s}/{baz:3d}"
matches = scan_to_obj(pat_with_names, haystack)
# something like (different match objects are possible choices, dict, dataclass, etc)
print(matches.foo)
print(matches['bar'])
Alternately:
pat_only = "{:f}/{:4s}/{:3d}"
foo, bar, baz = scan_to_tuple(pat_only, haystack)
# names, if bound, have the types indicated by scanning language
There are open questions about partial matching, defaults, exceptions to raise, etc. But there seems to be rough consensus on the general utility of something along those lines.
I like this idea :-)
There are lots of use cases where regular expressions + subsequent type conversion are just overkill for a small parsing task.
The above would fit this space quite nicely, esp. since it already comes with a set of typical formats you have to parse, without having to worry about the nitty-gritty details (as you have to do with REs) or the type conversion from string to e.g. float.
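To make the comparison with REs concrete, here is a rough sketch of how the two proposed functions could be layered on top of the re module. Nothing here is an existing API: scan_to_tuple and scan_to_obj are the names floated above, and the handling of widths (exact width for 's', maximum digits for 'd', ignored for 'f'), the use of SimpleNamespace as the match object, and raising ValueError on a failed match are all assumptions made only for illustration.

```python
import re
from types import SimpleNamespace

# Matches one scan field such as "{foo:f}", "{:4s}" or "{baz:3d}".
_FIELD = re.compile(r"\{(?P<name>\w*):(?P<width>\d*)(?P<code>[sdf])\}")

# Assumed mapping from type code to converter; only three codes are sketched.
_CONVERTERS = {"s": str, "d": int, "f": float}


def _fragment(code, width):
    """Return the regex fragment for one field (width rules are assumptions)."""
    if code == "s":
        return f".{{{width}}}" if width else ".+?"
    if code == "d":
        return rf"[+-]?\d{{1,{width}}}" if width else r"[+-]?\d+"
    return r"[+-]?(?:\d+\.?\d*|\.\d+)"  # 'f'; width is ignored in this sketch


def _scan(pattern, haystack):
    """Translate the scan pattern to a regex, match it, and convert the groups."""
    parts, converters, names, last = [], [], [], 0
    for field in _FIELD.finditer(pattern):
        parts.append(re.escape(pattern[last:field.start()]))  # literal text
        parts.append("(" + _fragment(field["code"], field["width"]) + ")")
        converters.append(_CONVERTERS[field["code"]])
        names.append(field["name"])
        last = field.end()
    parts.append(re.escape(pattern[last:]))
    match = re.fullmatch("".join(parts), haystack)
    if match is None:
        raise ValueError(f"{haystack!r} does not match {pattern!r}")
    values = [conv(text) for conv, text in zip(converters, match.groups())]
    return names, values


def scan_to_tuple(pattern, haystack):
    """Return the converted values as a tuple, ignoring any field names."""
    return tuple(_scan(pattern, haystack)[1])


def scan_to_obj(pattern, haystack):
    """Return the converted values as attributes of a simple namespace."""
    names, values = _scan(pattern, haystack)
    return SimpleNamespace(**dict(zip(names, values)))


print(scan_to_tuple("{:f}/{:4s}/{:3d}", "3.14/spam/042"))  # (3.14, 'spam', 42)
print(scan_to_obj("{foo:f}/{bar:4s}/{baz:3d}", "3.14/spam/042").baz)  # 42
```

A SimpleNamespace only supports attribute access, so matches['bar'] would need a different match object (a dict or a custom class); that choice is one of the open questions mentioned above.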
One limitation is that only a few types would be supported: 's' for str, 'd' or 'x' for int, 'f' for float. But what if you wanted to scan to a Decimal instead of a float, or scan a date? A date could be formatted any number of ways!
So perhaps the scanning format should also let you specify the target type. For example, given "{?datetime:%H:%M}", it would look up the pre-registered name "datetime" to get a scanner; the scanner would be given the format, the string and the position, and would return the value and the new position. I used '?' in the scan format to distinguish it from a format string.
It might even be possible to use the same format for both formatting and scanning. For example, given "{?datetime:%H:%M}", string formatting would just ignore the "?datetime" part.
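A rough sketch of what such a registry might look like follows. Everything in it is hypothetical: the SCANNERS dict, the register_scanner decorator, the scan function and the "decimal" scanner are invented for illustration; only the "{?name:format}" field shape and the (format, string, position) -> (value, new position) protocol come from the description above. The datetime scanner simply tries progressively shorter slices with datetime.strptime, which is slow but keeps the example short.

```python
import re
from datetime import datetime
from decimal import Decimal

SCANNERS = {}  # hypothetical registry: name -> scanner(fmt, string, pos)


def register_scanner(name):
    """Register a scanner under the name used in "{?name:format}" fields."""
    def decorator(func):
        SCANNERS[name] = func
        return func
    return decorator


@register_scanner("datetime")
def scan_datetime(fmt, string, pos):
    # Try the longest slice first so "09:30" is not cut short to "09:3".
    for end in range(len(string), pos, -1):
        try:
            return datetime.strptime(string[pos:end], fmt), end
        except ValueError:
            continue
    raise ValueError(f"no datetime matching {fmt!r} at position {pos}")


_DECIMAL = re.compile(r"[+-]?\d+(?:\.\d+)?")


@register_scanner("decimal")
def scan_decimal(fmt, string, pos):
    # The format part is unused in this sketch.
    match = _DECIMAL.match(string, pos)
    if match is None:
        raise ValueError(f"no decimal number at position {pos}")
    return Decimal(match.group()), match.end()


# A field is "{?name}" or "{?name:format}"; the format may itself contain ':'.
_FIELD = re.compile(r"\{\?(?P<name>\w+)(?::(?P<spec>[^{}]*))?\}")


def scan(pattern, string):
    """Walk the pattern, matching literal text and delegating fields to scanners."""
    values, pos, last = [], 0, 0
    for field in _FIELD.finditer(pattern):
        literal = pattern[last:field.start()]
        if not string.startswith(literal, pos):
            raise ValueError(f"expected {literal!r} at position {pos}")
        pos += len(literal)
        value, pos = SCANNERS[field["name"]](field["spec"] or "", string, pos)
        values.append(value)
        last = field.end()
    if string[pos:] != pattern[last:]:
        raise ValueError(f"unexpected trailing text at position {pos}")
    return tuple(values)


when, amount = scan("at {?datetime:%H:%M} pay {?decimal}", "at 09:30 pay 12.50")
print(when.hour, when.minute, amount)  # 9 30 12.50
```

Because each scanner reports the new position itself, the driver never needs to know how many characters a type consumes, which is what lets arbitrary types be plugged in.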