Maybe it would be best to talk about actual code?

What exceptions will be raised?

Is there a mismatch between the string format specification and regex?

Why would it be suboptimal to specify types for regex match groups within regex strings?

Is it any better to specify regexes with f-strings?

locals().update(**kwargs) doesn't work for a reason, but I don't remember what that reason is.
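For what it's worth, here is a minimal sketch of why that fails in CPython: inside a function, the dict returned by `locals()` is only a snapshot of the local namespace, so writing to it neither creates nor rebinds real local variables. The function and variable names below are hypothetical, chosen just for illustration.

```python
def try_update_locals(**kwargs):
    # Writing into the dict returned by locals() does not create real locals.
    locals().update(**kwargs)
    try:
        # `parsed_value` is never assigned in this function, so the compiler
        # treats it as a global lookup, which fails with NameError.
        return parsed_value
    except NameError:
        return "NameError"


def try_rebind_existing_local():
    value = "original"
    # Updating the snapshot does not rebind the real local variable.
    locals().update(value="changed")
    return value


print(try_update_locals(parsed_value=23))
print(try_rebind_existing_local())
```

So a parser that "assigns directly to variables" can't be written as a plain function that pokes at `locals()`; returning a dict or a `SimpleNamespace` is the workable alternative.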

More test cases in the pytest.mark.parametrize here might elucidate the situation:

```python

import re

def test_regex_comprehension():
    # Unnamed groups: groupdict() is empty, groups() holds the captures.
    rgx = re.compile(r"(\d{2})(\d{2})(\w{2})")
    teststr = "2345fg"
    assert not rgx.match(teststr).groupdict()
    assert rgx.match(teststr).groups() == ("23", "45", "fg")

    # Named groups populate both groups() and groupdict().
    rgx = re.compile(r"(?P<a>\d{2})(?P<b>\d{2})(?P<c>\w{2})")
    assert rgx.match(teststr).groups() == ("23", "45", "fg")
    assert rgx.match(teststr).groupdict() == dict(a="23", b="45", c="fg")

    from types import SimpleNamespace

    mdo = matchdictobj = SimpleNamespace(**rgx.match(teststr).groupdict())
    assert mdo.a == "23"
    assert mdo.b == "45"
    assert mdo.c == "fg"

def cast_match_groupdict(matchobj, typemap):
    matchdict = matchobj.groupdict()
    if not typemap:
        return matchdict
    for attr, castfunc in typemap.items():
        try:
            matchdict[attr] = castfunc(matchdict[attr])
        except ValueError as e:
            raise ValueError(("attr", attr), ("rgx", matchobj.re)) from e
    return matchdict

import pytest

def test_cast_match_groupdict():
    rgx = re.compile(r"(?P<a>\d{2})(?P<b>\d{2})(?P<c>\w{2})")
    teststr = "2345fg"
    matchobj = rgx.match(teststr)

    # int("fg") fails, so casting group c to int raises ValueError.
    with pytest.raises(ValueError):
        typemap = dict(a=int, b=int, c=int)
        cast_match_groupdict(matchobj, typemap)

    typemap = dict(a=int, b=int, c=str)
    output = cast_match_groupdict(matchobj, typemap)
    assert output == dict(a=23, b=45, c="fg")

from typing import Tuple

def generate_regex_and_typemap_from_fstring(fstr) -> Tuple[str, dict]:
    # Hard-coded stub, not a real format-string parser. These format strings
    # carry no field widths, so the two-character widths are an assumption
    # taken from the test data; lazy (?P<name>.*?) groups would each match
    # the empty string under re.match().
    rgxstr = r"(?P<a>\d{2})(?P<b>\d{2})(?P<c>\w{2})"
    if fstr == "{a}{b}{c}":
        return (rgxstr, None)
    elif fstr == "{a:d}{b:d}{c:d}":
        return (rgxstr, dict(a=int, b=int, c=int))
    elif fstr == "{a:d}{b:d}{c:s}":
        return (rgxstr, dict(a=int, b=int, c=str))
    else:
        raise NotImplementedError(("fstr", fstr))

def do_fstring_regex_magic(fstrpattern, string):
    rgxstr, typemap = generate_regex_and_typemap_from_fstring(fstrpattern)
    rgx = re.compile(rgxstr)
    matchobj = rgx.match(string)
    if matchobj is None:
        # Without this check, .groupdict() on None raises AttributeError.
        raise ValueError(("no match", rgxstr, string))
    try:
        output = cast_match_groupdict(matchobj, typemap)
        # update_locals()  # XXX: how to test this?
        return output
    except ValueError as e:
        raise ValueError(locals()) from e

@pytest.mark.parametrize(
    "fstrpattern,string,exceptions,expoutput",
    [
        ("{a}{b}{c}", "2345fg", None, dict(a="23", b="45", c="fg")),
        ("{a:d}{b:d}{c:d}", "2345fg", ValueError, None),
        # The :d fields are cast to int, so a and b come back as integers.
        ("{a:d}{b:d}{c:s}", "2345fg", None, dict(a=23, b=45, c="fg")),
    ],
)
def test_do_fstring_regex_magic(fstrpattern, string, exceptions, expoutput):
    if exceptions:
        with pytest.raises(exceptions):
            do_fstring_regex_magic(fstrpattern, string)
    else:
        output = do_fstring_regex_magic(fstrpattern, string)
        assert output == expoutput

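def update_locals(**kwargs):
    # NotImplemented is a comparison sentinel, not an exception class;
    # NotImplementedError is the one to raise.
    raise NotImplementedError
    # Unreachable, and locals() here would be this function's own
    # namespace rather than the caller's.
    locals().update(**kwargs)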

```
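As a rough sketch of what a non-stub generator could look like, the standard library's `string.Formatter().parse()` already splits a format string into literal text, field names, and format specs. The helper names (`fstring_to_regex`, `parse`, `TYPE_PATTERNS`, `SPEC_RGX`) and the supported subset (`d`, `s`, optional width, plain identifier field names, no padding or signs) are my assumptions for illustration, not a proposal for the actual semantics.

```python
import re
from string import Formatter

# Hypothetical mapping from a format type code to (regex atom, cast).
# Only "d", "s", and the empty spec are handled; this is an assumption for
# illustration, not a complete inverse of the format mini-language.
TYPE_PATTERNS = {
    "d": (r"\d", int),
    "s": (r".", str),
    "": (r".", str),
}

# Accepts an optional width followed by an optional type code, e.g. "2d".
SPEC_RGX = re.compile(r"(?P<width>\d*)(?P<type>[ds]?)")


def fstring_to_regex(fstr):
    """Return (regex string, typemap) for a str.format-style pattern."""
    parts, typemap = [], {}
    for literal, name, spec, _conversion in Formatter().parse(fstr):
        parts.append(re.escape(literal))
        if name is None:  # trailing literal text with no field after it
            continue
        m = SPEC_RGX.fullmatch(spec)
        if m is None:
            raise NotImplementedError(("spec", spec))
        atom, cast = TYPE_PATTERNS[m["type"]]
        # A width pins the field to exactly that many characters; without
        # one the group is lazy and relies on surrounding literal text.
        quant = "{%s}" % m["width"] if m["width"] else "+?"
        parts.append(f"(?P<{name}>{atom}{quant})")
        typemap[name] = cast
    return "".join(parts), typemap


def parse(fstr, string):
    """Match `string` against `fstr` and cast each field (sketch only)."""
    rgxstr, typemap = fstring_to_regex(fstr)
    matchobj = re.fullmatch(rgxstr, string)
    if matchobj is None:
        raise ValueError(("no match", rgxstr, string))
    return {k: typemap[k](v) for k, v in matchobj.groupdict().items()}


print(parse("{a:2d}{b:2d}{c:2s}", "2345fg"))
print(parse("{x:d},{y:d}", "23,45"))
```

With widths or delimiters the result is well defined. Without them, `parse("{a:d}{b:d}{c:d}", "234567")` still matches, but only as one arbitrary split (2, 3, 4567), which is exactly the ambiguity raised in the quoted thread below.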

On Thu, Sep 17, 2020 at 4:23 PM Chris Angelico <rosuav@gmail.com> wrote:

On Fri, Sep 18, 2020 at 6:19 AM Christopher Barker <pythonchb@gmail.com> wrote:
>
> I like the idea of a scanf-like ability, but I'm afraid the existing format language is poorly suited to that.
>
> It's simply not designed to be a two-way street:
>
> * ANYTHING can be stringified in Python -- so there is no defined way to turn a string back into a particular type.
>
> OK, so we restrict ourselves to builtins -- how would you reverse this:
>
> In [17]: x, y, z = 23, 45, 67
>
> In [18]: f"{x}{y}{z}"
> Out[18]: '234567'
>
> so we require a type specifier, but similar problem:
>
> In [19]: f"{x:d}{y:d}{z:d}"
> Out[19]: '234567'
>
> So we require a field size specifier:
>
> In [20]: f"{x:2d}{y:2d}{z:2d}"
> Out[20]: '234567'
>
> OK, I guess that is clearly defined, but we've now limited ourselves to a very small subset of the formatting language -- maybe it's not the right language for the job?
>

And that's why the directives are NOT just a pure mirroring of format
string directives. Look at C's scanf and printf functions - they
correspond in many ways, but they differ in order to be useful. The
point isn't to reverse format(), the point is to have a useful and
practical string parser that assigns directly to variables.

Also, PEP 622.

ChrisA
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/A67UEO6FONE6H5PTYCJCP5B5AXBMZSD3/