Maybe it would be best to talk about actual code? What exceptions will be raised? Is there a mismatch between the string format specification and regex? Why would it be suboptimal to specify types for regex match groups within regex strings? Is it any better to specify regexes with f-strings? locals().update(**kwargs) doesn't work for a reason, but I don't remember what that reason is. More test cases in the pytest.mark.parametrize here might elucidate the situation:

```python
import re
from types import SimpleNamespace
from typing import Optional, Tuple

import pytest


def test_regex_comprehension():
    # Unnamed groups: groupdict() is empty, groups() holds the strings.
    rgx = re.compile(r"(\d{2})(\d{2})(\w{2})")
    teststr = "2345fg"
    assert not rgx.match(teststr).groupdict()
    assert rgx.match(teststr).groups() == ("23", "45", "fg")

    # Named groups: both groups() and groupdict() are populated.
    rgx = re.compile(r"(?P<a>\d{2})(?P<b>\d{2})(?P<c>\w{2})")
    assert rgx.match(teststr).groups() == ("23", "45", "fg")
    assert rgx.match(teststr).groupdict() == dict(a="23", b="45", c="fg")

    mdo = matchdictobj = SimpleNamespace(**rgx.match(teststr).groupdict())
    assert mdo.a == "23"
    assert mdo.b == "45"
    assert mdo.c == "fg"


def cast_match_groupdict(matchobj, typemap):
    """Return matchobj.groupdict() with each value cast per typemap."""
    matchdict = matchobj.groupdict()
    if not typemap:
        return matchdict
    for attr, castfunc in typemap.items():
        try:
            matchdict[attr] = castfunc(matchdict[attr])
        except ValueError as e:
            raise ValueError(("attr", attr), ("rgx", matchobj.re)) from e
    return matchdict


def test_cast_match_groupdict():
    rgx = re.compile(r"(?P<a>\d{2})(?P<b>\d{2})(?P<c>\w{2})")
    teststr = "2345fg"
    matchobj = rgx.match(teststr)

    with pytest.raises(ValueError):
        typemap = dict(a=int, b=int, c=int)
        cast_match_groupdict(matchobj, typemap)

    typemap = dict(a=int, b=int, c=str)
    output = cast_match_groupdict(matchobj, typemap)
    assert output == dict(a=23, b=45, c="fg")


def generate_regex_and_typemap_from_fstring(fstr) -> Tuple[str, Optional[dict]]:
    # Stub: only three hard-coded patterns are recognized.
    # Each field is a fixed two-character group; a lazy `.*?` group would
    # match the empty string for every field.
    rgxstr = "".join(rf"(?P<{name}>.{{2}})" for name in "abc")
    if fstr == "{a}{b}{c}":
        return (rgxstr, None)
    elif fstr == "{a:d}{b:d}{c:d}":
        return (rgxstr, dict(a=int, b=int, c=int))
    elif fstr == "{a:d}{b:d}{c:s}":
        return (rgxstr, dict(a=int, b=int, c=str))
    else:
        raise NotImplementedError(("fstr", fstr))


def do_fstring_regex_magic(fstrpattern, string):
    rgxstr, typemap = generate_regex_and_typemap_from_fstring(fstrpattern)
    rgx = re.compile(rgxstr)
    matchobj = rgx.match(string)
    try:
        output = cast_match_groupdict(matchobj, typemap)
        # update_locals()  # XXX: how to test this?
        return output
    except ValueError as e:
        raise ValueError(locals()) from e


@pytest.mark.parametrize(
    "fstrpattern,string,exceptions,expoutput",
    [
        ("{a}{b}{c}", "2345fg", None, dict(a="23", b="45", c="fg")),
        ("{a:d}{b:d}{c:d}", "2345fg", ValueError, None),
        ("{a:d}{b:d}{c:s}", "2345fg", None, dict(a=23, b=45, c="fg")),
    ],
)
def test_do_fstring_regex_magic(fstrpattern, string, exceptions, expoutput):
    if exceptions:
        with pytest.raises(exceptions):
            do_fstring_regex_magic(fstrpattern, string)
    else:
        output = do_fstring_regex_magic(fstrpattern, string)
        assert output == expoutput


def update_locals(**kwargs):
    raise NotImplementedError
    locals().update(**kwargs)  # unreachable; kept to show the idea
```

On Thu, Sep 17, 2020 at 4:23 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 6:19 AM Christopher Barker <pythonchb@gmail.com> wrote:
I like the idea of a scanf-like ability, but I'm afraid the existing
format language is poorly suited to that.
It's simply not designed to be a two-way street:
* ANYTHING can be stringified in Python -- so there is no defined way to
turn a string back into a particular type.
OK, so we restrict ourselves to builtins -- how would you reverse this:
In [17]: x, y, z = 23, 45, 67
In [18]: f"{x}{y}{z}"
Out[18]: '234567'
so we require a type specifier, but similar problem:
In [19]: f"{x:d}{y:d}{z:d}"
Out[19]: '234567'
So we require a field size specifier:
In [20]: f"{x:2d}{y:2d}{z:2d}"
Out[20]: '234567'
OK, I guess that is clearly defined. But we've now limited ourselves to
a very small subset of the formatting language -- maybe it's not the right language for the job?
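To make the fixed-width case above concrete, here is a minimal sketch of reversing a pattern such as "{x:2d}{y:2d}{z:2d}". The helper name parse_fixed_width is hypothetical (it is not an existing or proposed stdlib API), and it assumes every field carries an explicit width and a "d" or "s" type code; anything else is rejected. It leans on string.Formatter().parse to split the pattern into fields.

```python
import re
from string import Formatter

# Hypothetical mapping from format type codes to cast functions.
_CASTS = {"d": int, "s": str}


def parse_fixed_width(pattern, text):
    """Reverse a format pattern whose fields all look like {name:Nd} or {name:Ns}."""
    parts, casts = [], {}
    for literal, name, spec, _conversion in Formatter().parse(pattern):
        parts.append(re.escape(literal))
        if name is None:  # trailing literal text with no field after it
            continue
        m = re.fullmatch(r"(\d+)([ds])", spec or "")
        if m is None:
            # No width means no defined way to split the string.
            raise NotImplementedError(("spec", spec))
        width, kind = int(m.group(1)), m.group(2)
        parts.append(rf"(?P<{name}>.{{{width}}})")
        casts[name] = _CASTS[kind]
    match = re.fullmatch("".join(parts), text)
    if match is None:
        raise ValueError(("pattern", pattern), ("text", text))
    return {name: casts[name](value) for name, value in match.groupdict().items()}


print(parse_fixed_width("{x:2d}{y:2d}{z:2d}", "234567"))
```

Even this is only well defined when every value fits its field: a width in the format language is a minimum, so f"{123:2d}" produces '123' and the fixed-width split no longer lines up.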
And that's why the directives are NOT just a pure mirroring of format string directives. Look at C's scanf and printf functions - they correspond in many ways, but they differ in order to be useful. The point isn't to reverse format(), the point is to have a useful and practical string parser that assigns directly to variables.
Also, PEP 622.
ChrisA
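On the locals().update(**kwargs) question raised at the top of this message: as far as I understand CPython, locals() called inside a function hands back a dict that is a snapshot of the local variables, so writing to it does not rebind them. The sketch below shows that, plus one alternative that returns an explicit namespace instead. Both function names are made up for illustration.

```python
from types import SimpleNamespace


def try_update_locals():
    a = 1
    # Updates the dict returned by locals(), not the function's real local `a`.
    locals().update(a=2)
    return a


def parsed_namespace(**kwargs):
    # Alternative: return the parsed values as attributes of an explicit object.
    return SimpleNamespace(**kwargs)


print(try_update_locals())
print(parsed_namespace(a=23, b=45, c="fg"))
```

At module level locals() is the same dict as globals(), so an update there does take effect, which may be why the idea looks like it ought to work inside a function too.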