Currently format strings (and f-string expressions) support three conversions: !s -- str, !r -- repr and !a for ascii.
I propose to add support for additional conversions: for int, float and operator.index. It will help to automatically convert printf-like format strings to f-string expressions: %d, %i, %u -- use int, %f -- use float, %o, %x -- use operator.index.
For float the conversion letter is obvious -- !f. But I am not sure what !i should be used for, int or operator.index. If we make it operator.index, then !d perhaps should be used for int. If we make it int, then !I perhaps should be used for operator.index. Or vice versa?
Also I propose to support applying multiple conversions to the same item. It is common when you output a path or URL object as a quoted string with all escaping, because in general it can contain special or non-printable characters. Currently I write f"path = {repr(str(path))}" or f"path = {str(path)!r}", but want to write f"path = {path!s!r}".
Do we need support for more standard conversions? Do we want to support custom conversions (registered globally, like encodings and error handlers)? re.escape, html.escape and shlex.quote could be very useful in some applications.
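For readers following along, here is a minimal sketch of what the three existing conversions do today, and of the explicit calls that the proposed conversions would abbreviate. The variable names are illustrative only, and nothing here uses the proposed !f, !d or !i syntax, which does not exist.

```python
import operator
from fractions import Fraction

half = Fraction(7, 2)

# The three conversions that exist today: str(), repr() and ascii().
existing = (f"{half!s}", f"{half!r}", f"{'café'!a}")

# The proposed conversions, spelled out with explicit calls instead.
as_int = f"{int(3.7):d}"               # lossy conversion, as "%d" does
as_float = f"{float(half):f}"          # conversion to float before formatting
as_index = f"{operator.index(255):x}"  # lossless: only accepts true integers

# operator.index() refuses a float, which is the point of using it for %x and %o.
try:
    operator.index(3.7)
    index_rejects_float = False
except TypeError:
    index_rejects_float = True
```

The distinction between int() and operator.index() is why the proposal needs two separate letters for integer conversions.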
On Wed, 21 Apr 2021 at 09:01, Serhiy Storchaka <storchaka@gmail.com> wrote:
Currently format strings (and f-string expressions) support three conversions: !s -- str, !r -- repr and !a for ascii.
I propose to add support of additional conversions: for int, float and operator.index. It will help to convert automatically printf-like format strings to f-string expressions: %d, %i, %u -- use int, %f -- use float, %o, %x -- use operator.index.
I've never had any particular need for these, but I can see that they would be logical additions.
For float the conversion letter is obvious -- !f. But I am not sure what !i should be used for, int or operator.index. If we make it operator.index, then !d perhaps should be used for int. If we make it int, then !I perhaps should be used for operator.index. Or vice versa?
I don't have a particularly strong opinion here, other than to say I'm not sure I like the upper case "I". It looks far too much like a lower case "L" in the font I'm using here, which makes me think of C's "long", so it's easy to confuse. So of the two options, I prefer !f, !d, !i over !f, !i, !I.
Also I propose to support applying multiple conversions for the same item. It is common when you output a path or URL object as quoted string with all escaping, because in general it can contain special or non-printable characters. Currently I write f"path = {repr(str(path))}" or f"path = {str(path)!r}", but want to write f"path = {path!s!r}".
This, I would definitely use. I use f"path = {str(path)!r}" quite a lot, and being able to reduce that to f"{path=!s!r}" would be really convenient for debugging (even if it does look a bit like a string of magic characters at first glance).
Do we need support of more standard conversions? Do we want to support custom conversions (registered globally as encodings and error handlers). re.escape, html.escape and shlex.quote could be very useful in some applications.
That appeals to me just because I like generic features in general, but I'm not sure there are sufficient benefits to justify the complexity for what would basically be a small convenience over calling the function directly. Paul
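As a small illustration of the path case discussed above, the two spellings that work today give the same quoted, escaped output. The path value is made up, and PurePosixPath is used only so the result does not depend on the platform. The "=" debug specifier (Python 3.8 and later) already combines with a single conversion; the chained form {path=!s!r} is the proposed syntax and is not shown running here.

```python
from pathlib import PurePosixPath

# A hypothetical path containing a space and a newline.
path = PurePosixPath("/tmp/my file\n.txt")

current = f"path = {str(path)!r}"    # explicit str(), then the !r conversion
nested = f"path = {repr(str(path))}" # the same thing with nested calls
debug = f"{path=!s}"                 # "=" specifier with one conversion
```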
On 21Apr2021 10:14, Paul Moore <p.f.moore@gmail.com> wrote:
On Wed, 21 Apr 2021 at 09:01, Serhiy Storchaka <storchaka@gmail.com> wrote:
Do we need support of more standard conversions? Do we want to support custom conversions (registered globally as encodings and error handlers). re.escape, html.escape and shlex.quote could be very useful in some applications.
That appeals to me just because I like generic features in general, but I'm not sure there are sufficient benefits to justify the complexity for what would basically be a small convenience over calling the function directly.
I frequently use str.format and str.format_map to produce strings from formats provided as configuration; in those methods I can't use arbitrary functions (injection, anyone?) For example:
    fstags ls -o format-string-here ...
to produce a particular listing format.
I would _frequently_ like to be able to provide custom conversions. At present I'm using elaborate hacks based on __getattr__ etc to recognise things like this:
    '{x} is {x_lc} in lowercase'
where the _lc suffix is caught and a value computed from "x".
Custom conversions would let me use this:
    '{x} is {x!lc} in lowercase'
just by registering 'lc' as a conversion from my code. Chaining then per Serhiy's other suggestion would bring a fair amount of power.
Cheers, Cameron Simpson <cs@cskk.id.au>
But IIUC once a conversion was added, it could not be removed without breaking backward compatibility. So I think we should be cautious about building what is in effect a Domain Specific Language and take care to get it (as) right (as possible) first time.
Rob Cliffe
On 21/04/2021 22:44, Cameron Simpson wrote:
On 21Apr2021 10:14, Paul Moore <p.f.moore@gmail.com> wrote:
On Wed, 21 Apr 2021 at 09:01, Serhiy Storchaka <storchaka@gmail.com> wrote:
Do we need support of more standard conversions? Do we want to support custom conversions (registered globally as encodings and error handlers). re.escape, html.escape and shlex.quote could be very useful in some applications.
That appeals to me just because I like generic features in general, but I'm not sure there are sufficient benefits to justify the complexity for what would basically be a small convenience over calling the function directly.
I frequently use str.format and str.format_map to produce strings from formats provided as configuration; in those methods I can't use arbitrary functions (injection, anyone?) For example:
fstags ls -o format-string-here ...
to produce a particular listing format.
I would _frequently_ like to be able to provide custom conversions. At present I'm using elaborate hacks based on __getattr__ etc to recognise things like this:
'{x} is {x_lc} in lowercase'
where the _lc suffix is caught and a value computed from "x".
Custom conversions would let me use this:
'{x} is {x!lc} in lowercase'
just by registering 'lc' as a conversion from my code. Chaining then per Serhiy's other suggestion would bring a fair amount of power.
Cheers, Cameron Simpson <cs@cskk.id.au>
Cameron Simpson writes:
I would _frequently_ like to be able to provide custom conversions. At present I'm using elaborate hacks based on __getattr__ etc to recognise things like this:
'{x} is {x_lc} in lowercase'
where the _lc suffix is caught and a value computed from "x".
Custom conversions would let me use this:
'{x} is {x!lc} in lowercase'
I don't understand how this is supposed to work. It looks to me like !code is a preprocessor:
    >>> print(f'{1!a:g}')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 'g' for object of type 'str'
If so,
    '{x} is {x!lc:foo} in lowercase'
will fail because str doesn't implement the 'foo' format code. Do we really need to extend format() rather than using
    def lc(x): return str(x).lower()
    '{x} is {lc(x)} in lowercase'
?
On Fri, Apr 23, 2021 at 7:26 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Cameron Simpson writes:
I would _frequently_ like to be able to provide custom conversions. At present I'm using elaborate hacks based on __getattr__ etc to recognise things like this:
'{x} is {x_lc} in lowercase'
where the _lc suffix is caught and a value computed from "x".
Custom conversions would let me use this:
'{x} is {x!lc} in lowercase'
I don't understand how this is supposed to work. It looks to me like !code is a preprocessor:
    >>> print(f'{1!a:g}')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 'g' for object of type 'str'
If so,
'{x} is {x!lc:foo} in lowercase'
will fail because str doesn't implement the 'foo' format code. Do we really need to extend format() rather than using
def lc(x): return str(x).lower()
'{x} is {lc(x)} in lowercase'
That works in an f-string but not in str.format(), so it's not i18n-compatible. I'm sympathetic to the plea for conversions, but I'm not sure that this is the best way to do it. What WOULD work, though, is actual attributes. I'm not sure why x_lc is a thing, but it would certainly work as x.lc - though not as x.lower(), since method calls aren't supported. So the built-in str type still won't work here. ChrisA
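To make the attribute point concrete, here is a minimal sketch. The Word class and its lc property are hypothetical, invented only for this illustration. Attribute lookups inside a replacement field work with str.format(), while a method call is treated as a strangely named attribute and fails.

```python
class Word:
    """Hypothetical wrapper exposing a lowercase form as a plain attribute."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    @property
    def lc(self):
        return self.text.lower()


# Attribute access is allowed in str.format() replacement fields.
result = "{x} is {x.lc} in lowercase".format(x=Word("ABC"))

# A method call is not: "lower()" is looked up as an attribute name.
method_call_error = None
try:
    "{x.lower()}".format(x="ABC")
except AttributeError as exc:
    method_call_error = exc
```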
On 23Apr2021 18:25, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Cameron Simpson writes:
I would _frequently_ like to be able to provide custom conversions. At present I'm using elaborate hacks based on __getattr__ etc to recognise things like this:
'{x} is {x_lc} in lowercase'
where the _lc suffix is caught and a value computed from "x".
Custom conversions would let me use this:
'{x} is {x!lc} in lowercase'
I don't understand how this is supposed to work. It looks to me like !code is a preprocessor: [...] If so,
'{x} is {x!lc:foo} in lowercase'
will fail because str doesn't implement the 'foo' format code.
Maybe we're talking about different things. In the example above, I'm talking about "lc", not "foo".
Do we really need to extend format() rather than using
def lc(x): return str(x).lower()
'{x} is {lc(x)} in lowercase'
I'm not talking about f'' strings, but str.format_map. They use the same {} placeholder syntax, but for obvious injection related reasons you can't call an arbitrary function from str.format_map. (You can play games with magic __getattr__ based attributes, like: {z.some_attr}.)
My use case is presupplied strings, eg a command line supplied format string. So _not_ an f'blah {lc(x)}' string, but a plain string using format_map. I'm enjoying using Python style string formats, and who wants to write yet another macro syntax?
Contrast:
    Python 3.9.2 (default, Feb 24 2021, 13:30:36)
    [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def lc(x): return x.lower()
    ...
    >>> z='ABC'
    >>> f'{lc(z)}'
    'abc'
    >>> fmt='{lc(z)}'
    >>> fmt.format_map({'z':"ABC"})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'lc(z)'
If I could provide additional formatters to format_map to extend beyond 'r' for repr and friends, I could go:
    fmt.format_map({'z':"ABC"}, conversion_map={'lc': str.lower})
and use:
    fmt = '{z!lc}'
Cheers, Cameron Simpson <cs@cskk.id.au>
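For comparison, something close to this can be sketched today with the convert_field() hook of string.Formatter from the standard library, without any change to str.format_map. The class name and the "l" and "u" codes below are assumptions made for the example. As far as I know the format-string parser accepts only a single character after "!", so the sketch uses {z!l} rather than {z!lc}.

```python
import string


class CaseFormatter(string.Formatter):
    """Hypothetical Formatter subclass adding !l and !u conversions."""

    def convert_field(self, value, conversion):
        if conversion == "l":
            return str(value).lower()
        if conversion == "u":
            return str(value).upper()
        # Fall back to the standard !s, !r and !a handling.
        return super().convert_field(value, conversion)


formatter = CaseFormatter()
# vformat() takes the mapping directly, much like str.format_map().
lowered = formatter.vformat("{z} is {z!l} in lowercase", (), {"z": "ABC"})
padded = formatter.vformat("[{z!l:>5}]", (), {"z": "ABC"})
```

The conversion runs before the format spec is applied, so {z!l:>5} pads the already lowercased string.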
Cameron Simpson writes:
On 23Apr2021 18:25, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I don't understand how this is supposed to work. It looks to me like !code is a preprocessor: [...] If so,
'{x} is {x!lc:foo} in lowercase'
will fail because str doesn't implement the 'foo' format code.
Maybe we're talking about different things. In the example above, I'm talking about "lc", not "foo".
I don't think so. I know you want to talk only about "lc", but I want to understand how this interacts with "foo", and why you can't use "foo" in your application. My basic issue with these proposals is that the existing !conversions are basically syntactic sugar for formats that could have been implemented as format codes. Like the debugging '=' code in f-strings, they are there because they're facilities that are commonly desired by programmers for introspecting the program, regardless of the type of object. I suppose the designers of the feature wanted to avoid *mandating* any interpretation of the format string, though they did provide a standard syntax for convenience. In fact I think that Serhiy's use case (casting numbers to a more appropriate type for the desired format) is more plausible than the !a and !r conversions.
My use case is presupplied strings, eg a command line supplied format string.
In that case the format string is user input, and x is a variable in the program that the user can have substituted into their string? Assuming that *exact* use case, wouldn't
    >>> class LowerableStr(str):
    ...     def __format__(self, fmt):
    ...         if fmt == 'lc':
    ...             return self.lower()
    ...         else:
    ...             return str.__format__(self, fmt)
    ...
    >>> "{x} is {x:lc} in lowercase".format_map({'x' : LowerableStr("This")})
    'This is this in lowercase'
do? For generic x you'd have to do something a bit more complicated, but the basic strategy would be the same. Similarly, it shouldn't be hard to design a fairly generic wrapper class that allows you to map format codes to arbitrary functions.
Certainly allowing an arbitrary map of preprocessors that can be invoked with "{x!key}" is more aesthetic and elegant in your source. But functionally, it seems to me it's just duplicative of the __format__ mechanism, and so it's probably not a great idea.
Steve
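One possible shape for the "fairly generic wrapper class" mentioned above, as a rough sketch only. FormatProxy and its codes argument are hypothetical names, not anything proposed in the thread. The wrapper maps format codes to functions and defers every other spec to the wrapped value's own formatting.

```python
class FormatProxy:
    """Hypothetical proxy: maps format codes to arbitrary functions."""

    def __init__(self, value, codes):
        self._value = value
        self._codes = codes  # e.g. {"lc": some_function}

    def __format__(self, spec):
        func = self._codes.get(spec)
        if func is not None:
            return func(self._value)
        # Unknown (or empty) specs go to the wrapped value unchanged.
        return format(self._value, spec)


codes = {
    "lc": lambda v: str(v).lower(),
    "uc": lambda v: str(v).upper(),
}
values = {"x": FormatProxy("This", codes), "n": FormatProxy(42, codes)}
text = "{x} is {x:lc} in lowercase, {n:>4} is padded".format_map(values)
```

Every value placed in the mapping has to be wrapped, which is the cost discussed in the replies that follow.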
On Sat, Apr 24, 2021 at 11:36 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Assuming that *exact* use case, wouldn't
    >>> class LowerableStr(str):
    ...     def __format__(self, fmt):
    ...         if fmt == 'lc':
    ...             return self.lower()
    ...         else:
    ...             return str.__format__(self, fmt)
    ...
    >>> "{x} is {x:lc} in lowercase".format_map({'x' : LowerableStr("This")})
    'This is this in lowercase'
do? For generic x you'd have to do something a bit more complicated, but the basic strategy would be the same. Similarly, it shouldn't be hard to design a fairly generic wrapper class that allows you to map format codes to arbitrary functions.
Now also add in that you will need uppercase and titlecase, and then allow actual formatting of the string as well as case selection. Then make sure that every message you format can have every piece of text supporting this, since you'll never know exactly what parts of what messages may need this sort of manipulation. It won't look nearly as simple any more. ChrisA
On 25Apr2021 01:01, Chris Angelico <rosuav@gmail.com> wrote:
On Sat, Apr 24, 2021 at 11:36 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Assuming that *exact* use case, wouldn't
    >>> class LowerableStr(str):
    ...     def __format__(self, fmt):
    ...         if fmt == 'lc':
    ...             return self.lower()
    ...         else:
    ...             return str.__format__(self, fmt)
    ...
    >>> "{x} is {x:lc} in lowercase".format_map({'x' : LowerableStr("This")})
    'This is this in lowercase'
do? For generic x you'd have to do something a bit more complicated, but the basic strategy would be the same. Similarly, it shouldn't be hard to design a fairly generic wrapper class that allows you to map format codes to arbitrary functions.
Now also add in that you will need uppercase and titlecase, and then allow actual formatting of the string as well as case selection.
Yes but those are just red herrings. To me that translates as "you need to provide the conversions you want to support". I suppose a mixin would let me present the suite my application wanted.
Then make sure that every message you format can have every piece of text supporting this, since you'll never know exactly what parts of what messages may need this sort of manipulation. It won't look nearly as simple any more.
That's where the __format__ side caused me trouble - everything wanting ":foo" needs a __format__ method. Whereas "!foo" and an augmentation to str.format_map to add conversions means I don't have to magic the values being used in the format string. See my adjacent much longer post. Maybe I'm trying to say that "!foo" would benefit from similar extensibility to what ":foo" already has. The former is for presenting arbitrary values, and the latter is for presenting particular types of values i.e. outside the class vs inside the class.
Cheers, Cameron Simpson <cs@cskk.id.au>
On 24Apr2021 22:35, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Cameron Simpson writes:
On 23Apr2021 18:25, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I don't understand how this is supposed to work. It looks to me like !code is a preprocessor: [...] If so,
'{x} is {x!lc:foo} in lowercase'
will fail because str doesn't implement the 'foo' format code.
Maybe we're talking about different things. In the example above, I'm talking about "lc", not "foo".
I don't think so. I know you want to talk only about "lc", but I want to understand how this interacts with "foo", and why you can't use "foo" in your application. [...]
My use case is presupplied strings, eg a command line supplied format string.
In that case the format string is user input, and x is a variable in the program that the user can have substituted into their string?
Assuming that *exact* use case, wouldn't
    >>> class LowerableStr(str):
    ...     def __format__(self, fmt):
    ...         if fmt == 'lc':
    ...             return self.lower()
    ...         else:
    ...             return str.__format__(self, fmt)
    ...
    >>> "{x} is {x:lc} in lowercase".format_map({'x' : LowerableStr("This")})
    'This is this in lowercase'
do?
You're perfectly correct. ":lc" can be shoehorned into doing what I ask. But __format__ is in the wrong place for how I'd like to do this.
First up, I have somehow missed this (":format_name") in the semirecursive mess which is the python format-and-friends descriptions. (object.__format__? str.format? str.format_map? f''? the format mini-language? all in separate places, for reasonable reasons, but my head has exploded multiple times trying to stitch them together).
On reflection, __format__ is something I may already have considered and rejected. Let me explain.
The "!r" conversion is applied _by_ the formatting, _to_ an arbitrary value. So I could write a single function for my hypothetical "!lc" which did what I want for an arbitrary object, because it would be called with the value being converted.
By contrast, the ":lc" format specifier form requires the value being formatted _itself_ to have a special __format__ method. This scales poorly when I might put almost anything into the format string.
I'm not speaking here of allowing an end user to inject arbitrary access code into my programme via a format string, but that the format strings I'm using via .format_map(mapping) are given a mapping which is a pretty rich view of an almost arbitrary data structure I've made available for display via the format string.
In case you care, my primary use case is a tag library _with_ an ontology, where tag values can be arbitrary Python values. (In reality, those values need to be JSON renderable just now, since they land in text files or database JSON blobs when persisted.)
Anyway, the format string lets me write formats like:
    track {track_id} has artist {artist._meta.fullname}
which hops off through the ontology, or:
    # from a config or command line or default
    # I like lowercased filenames a lot
    filename_format = '{artist_lc}--{album_lc}--{track_id}--{title_lc}.mp3'
    with open(filename_format.format_map(tagset.ns()), 'wb'):
        ... write file data ...
The filename_format is the example where I want some kind of "lc" conversion/formatting to apply to an arbitrary value.
In the code above, tagset.ns() returns a magic subclass of SimpleNamespace which has the following properties:
- it allows mapping-like attribute access so that I can pass it to format_map()
- it has attributes/keys computed from the tags in tagset so that I can use them in the format string
- it has an elaborate __getattr__ method recognising a number of suffixes like "_lc" - should there be no actual tag of that name it will find the prefix and lowercase that, for example - a tag title="My Name" gets "My Name" from {title} and "my_name" from {title_lc}
- the same __getattr__ recognises some things like _meta and returns another namespace containing metadata from the ontology, letting me say: {artist._meta.fullname}
All this is to support giving the user/config a fairly rich suite of stuff which can go in a format string using the Python format string syntax.
Regarding "!lc" vs ":lc":
The "!lc" approach:
If the format syntax let one supply a mapping of conversions to functions for something like "!lc" then I could rip out a big chunk of complexity from __getattr__, because the "_lc" suffixes above are essentially a syntax hack to work around that shortcoming.
The ":lc" approach:
The problem with __format__ is that it must be applied to a class. The values inside {foo.bar.zot} in the format string might be almost any type. The only way to get __format__ to do what I'd like is to wrap every such value in a proxy of some kind with a .__format__ method.
In principle I can do that in my magic namespace class (from .ns() above). But that's yet another layer of complexity in something I'm already unhappy with. Hence the word "shoehorn" earlier.
So to my mind, being able to plug in a mapping of additional (or overriding) conversion specifiers would be the most natural way to improve my situation.
Architecturally, that is where I would want my magic "lc" et al to land. I could move the magic "._meta" attribute in there as well producing fewer "magic" attributes, etc.
That is why I'm for being able to augment the "!conversion" stuff.
Cheers, Cameron Simpson <cs@cskk.id.au>
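To give a feel for the "_lc" suffix workaround described above, here is a much reduced sketch. It is not the actual tagset code: SuffixMap is a hypothetical dict subclass, it handles only the one suffix, and it does a plain lower() without whatever other normalisation the real library applies. str.format_map() calls __getitem__ on the mapping, so __missing__ sees any key that is not a real tag.

```python
class SuffixMap(dict):
    """Hypothetical mapping: "{name_lc}" falls back to lowercased "{name}"."""

    def __missing__(self, key):
        if key.endswith("_lc"):
            # Strip the suffix and lowercase the underlying value.
            return str(self[key[: -len("_lc")]]).lower()
        raise KeyError(key)


tags = SuffixMap(artist="The Band", album="Live", track_id=3, title="My Name")
filename_format = "{artist_lc}--{album_lc}--{track_id}--{title_lc}.mp3"
filename = filename_format.format_map(tags)
```

Each new transformation needs another suffix and another branch, which is the complexity a pluggable conversion map would remove.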
Cameron Simpson writes:
First up, I have somehow missed this (":format_name") in the semirecursive mess which is the python format-and-friends descriptions. (object.__format__? str.format? str.format_map? f''? the format mini-language? all in separate places, for reasonable reasons, but my head has exploded multiple times trying to stitch them together).
Agreed on the doc. I'm going to try to do something about that, but it may take a while.
The "!r" conversion is applied _by_ the formatting, _to_ an arbitrary value. So I could write a single function for my hypothetical "!lc" which did what I want for an arbitrary object, because it would be called with the value being converted.
True. Which works if all you want to do is convert the object. But this requires that you do all the work in your hypothetical "!lc", which does not take parameters other than the value it converts, and does not have access to the format specification.
By contrast, the ":lc" format specifier form requires the value being formatted _itself_ to have a special __format__ method. This scales poorly when I might put almost anything into the format string.
I don't understand what you're talking about. As you point out yourself:
The ":lc" approach:
The problem with __format__ is that it must be applied to a class. The values inside {foo.bar.zot} in the format string might be almost any type. The only way to get __format__ to do what I'd like is to wrap every such value in a proxy of some kind with a .__format__ method.
Sure. That's exactly what LowerableStr does, and it's very general, since it has access to both the value and the format spec, and can do both preprocessing of the value and postprocessing of the formatted string. The !conversion approach really is basically a cast, the internal function that does the conversion takes the object to be converted and a code (a str, possibly just a single character).
So to my mind, being able to plug in a mapping of additional (or overriding) conversion specifiers would be the most natural way to improve my situation.
I can see why you would want it for this particular application, but I'm not yet persuaded it's general enough to be worth making this aspect of Python user-configurable when there's another way to do it already. In particular, although *you* won't use it to allow users to invoke arbitrary functions, there's no reason it couldn't be used that way, as __format__ already can. I don't think it's a good idea to introduce more places to execute arbitrary code even as a purely theoretical matter. "Although practicality beats purity," of course, but that's my current opinion.
Architecturally, that is where I would want my magic "lc" et al to land. I could move the magic "._meta" attribute in there as well producing fewer "magic" attributes, etc.
I don't see why you couldn't get the same benefits from the __format__ approach. The main difference I see with the __format__ approach is that you do need a proxy object in general (if __format__ is mutable, I don't think you do although I haven't tried it). But you'd only have to write it once, and in the event someone did want to use the original __format__ and format spec in post-processing the resulting string, it would be right there.
Steve
On 25Apr2021 10:54, Cameron Simpson <cs@cskk.id.au> wrote:
On 24Apr2021 22:35, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote: [...]
My use case is presupplied strings, eg a command line supplied format string.
In that case the format string is user input, and x is a variable in the program that the user can have substituted into their string?
Assuming that *exact* use case, wouldn't
    >>> class LowerableStr(str):
    ...     def __format__(self, fmt):
    ...         if fmt == 'lc':
    ...             return self.lower()
    ...         else:
    ...             return str.__format__(self, fmt)
    ...
    >>> "{x} is {x:lc} in lowercase".format_map({'x' : LowerableStr("This")})
    'This is this in lowercase'
do?
You're perfectly correct. ":lc" can be shoehorned into doing what I ask. But __format__ is in the wrong place for how I'd like to do this.
Just to follow up on this, I've been experimenting with Stephen's suggestion of using ":foo" style format specifiers. With enough success that I'm probably going to run with it if I can make the implementation clean enough.
I'm currently making a subclass of the standard library's string.Formatter class, which supplies the string parser and lets one override various methods to implement the "{...}" parts.
Currently I can do this: I've got a little ontology, saying that a tag named "colour" is "a colour, a hue" and expects to be a Python str. It also has metadata for the particular colour "blue".
    {
      'type.colour': TagSet:{'description': 'a colour, a hue', 'type': 'str'},
      'meta.colour.blue': TagSet:{'url': 'https://en.wikipedia.org/wiki/Blue', 'wavelengths': '450nm-495nm'}
    }
And I've got a tag set describing some object which is blue:
    {'colour': 'blue', 'labels': ['a', 'b', 'c'], 'size': 9}
The TagSet class has a .format_as(format_string) method which hooks into the Formatter subclass and formats a string according to the TagSet. A little test programme to write information into some command line supplied format strings:
    [~/hg/css-tagsets(hg:tagsets)]fleet2*> py3 -m cs.tagset '{colour}' '{colour:meta}' '{colour:meta.url}'
    {colour} => 'blue'
    {colour:meta} => 'url="https://en.wikipedia.org/wiki/Blue" wavelengths="450nm-495nm"'
    {colour:meta.url} => 'https://en.wikipedia.org/wiki/Blue'
Looking promising to me.
Cheers, Cameron Simpson <cs@cskk.id.au>
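The following is a guess at how such a Formatter subclass could be put together, not the cs.tagset implementation. TagFormatter, Tagged and the layout of the meta dictionary are all assumptions made for this sketch. The idea is to carry the field name alongside the value so that format_field() can resolve a ":meta" or ":meta.key" spec against the ontology.

```python
import string
from collections import namedtuple

# Hypothetical helper pairing a tag name with its value.
Tagged = namedtuple("Tagged", "name value")


class TagFormatter(string.Formatter):
    def __init__(self, meta):
        super().__init__()
        self.meta = meta  # assumed layout: {tag_name: {tag_value: {key: text}}}

    def get_field(self, field_name, args, kwargs):
        obj, used_key = super().get_field(field_name, args, kwargs)
        return Tagged(field_name, obj), used_key

    def convert_field(self, value, conversion):
        # Apply !s, !r or !a to the underlying value, keeping the name.
        converted = super().convert_field(value.value, conversion)
        return Tagged(value.name, converted)

    def format_field(self, value, format_spec):
        name, obj = value
        if format_spec == "meta" or format_spec.startswith("meta."):
            info = self.meta.get(name, {}).get(str(obj), {})
            if format_spec == "meta":
                return " ".join(f'{k}="{v}"' for k, v in info.items())
            return info[format_spec[len("meta."):]]
        # Any other spec uses the value's ordinary formatting.
        return super().format_field(obj, format_spec)


meta = {"colour": {"blue": {"url": "https://en.wikipedia.org/wiki/Blue",
                            "wavelengths": "450nm-495nm"}}}
tags = {"colour": "blue", "labels": ["a", "b", "c"], "size": 9}
formatter = TagFormatter(meta)
plain = formatter.vformat("{colour}", (), tags)
all_meta = formatter.vformat("{colour:meta}", (), tags)
url = formatter.vformat("{colour:meta.url}", (), tags)
```

Because the specs are interpreted by the formatter rather than by each value's __format__, the tag values themselves need no wrapping.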
Maybe I'm missing something, but why do you need the SimpleNamespace at all? Why not make your own mapping as in
    class StringMapper:
        ...
        def __getitem__(self, s):
            # Whatever arbitrary behavior you want
            # Process suffixes, etc here, for example:
            if s.endswith(".lc"):
                return self.wrapped[s.removesuffix(".lc")].lower()
            return self.wrapped[s]
    format_string.format_map(StringMapper(whatever))
Maybe then the data can just be data and this wrapper can handle all the formatting conversions.
On 06May2021 03:43, Dennis Sweeney <sweeney.dennis650@gmail.com> wrote:
Maybe I'm missing something, but why do you need the SimpleNamespace at all? Why not make your own mapping as in
    class StringMapper:
        ...
        def __getitem__(self, s):
            # Whatever arbitrary behavior you want
            # Process suffixes, etc here, for example:
            if s.endswith(".lc"):
                return self.wrapped[s.removesuffix(".lc")].lower()
            return self.wrapped[s]
format_string.format_map(StringMapper(whatever))
Maybe then the data can just be data and this wrapper can handle all the formatting conversions.
All this is somewhat orthogonal to adding more !foo things. If we're not talking about extending the conversions, we should probably take this over to python-list, because the ":foo" stuff is just implementation.
That said, my tags can have dots in their names and I want to be able to write:
    {tag.name.with.dot}
and this is accomplished with an elaborate SimpleNamespace subclass to make the .name.with.dot stuff work as attributes. Just don't ask :-( Oh, you did :-)
Anyway, I may be shifting sideways per my recent ":foo" post, using a new class which is a magic view of the TagSet and a Formatter subclass which parses the field_names itself (thus grabbing the dotted identifier). Still a work in progress.
Cheers, Cameron Simpson <cs@cskk.id.au>
21.04.21 12:14, Paul Moore wrote:
I don't have a particularly strong opinion here, other than to say I'm not sure I like the upper case "I". It looks far too much like a lower case "L" in the font I'm using here, which makes me think of C's "long", so it's easy to confuse. So of the two options, I prefer !f, !d, !i over !f, !i, !I.
Thank you. The upper case "I" is used in many formatting strings: struct, memoryview, array, PyArg_Parse, Py_BuildValue, and there are no plans for an !l to confuse it with. But I will follow your suggestion if there are no other arguments. Also, %d is more common than %i, so using !d for lossy conversion to integer is understandable. Maybe we could even repurpose the rarely used %i in printf-style formatting for lossless conversion to integer.
Serhiy Storchaka writes:
Currently format strings (and f-string expressions) support three conversions: !s -- str, !r -- repr and !a for ascii.
It's not clear to me what these are good for, to be honest. Why not just have s, r, and a format codes? The !conversions don't compose with format codes:
    >>> f"{10!r:g}"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 'g' for object of type 'str'
So I don't think I want to go further. I have some sympathy for your proposal, in part because I'd like to see something done about moving I18N into the format() mechanism. But I'm going to play devil's advocate, mostly because I'm getting old enough to not like change so much. ;-)
I propose to add support of additional conversions: for int, float and operator.index. It will help to convert automatically printf- like format strings to f-string expressions: %d, %i, %u -- use int, %f -- use float, %o, %x -- use operator.index.
This makes more sense to me than !s, !r, and !a -- you might or might not want these conversions, I guess. But it seems like a lot of complexity to add. On the other hand, isn't the answer "fix __format__ in class definitions?" For example, currently
    >>> for i in "befgodxs":
    ...     print(format(10, i))
    ...
    1010
    1.000000e+01
    10.000000
    10
    12
    10
    a
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    ValueError: Unknown format code 's' for object of type 'int'
But we could change int.__format__ to allow 's' as a format code[1], automagically calling str(), just as 'efg' are allowed and automagically call float().
Currently I write f"path = {repr(str(path))}" or f"path = {str(path)!r}", but want to write f"path = {path!s!r}".
I have some sympathy for this; it's not a big change, and given the syntax you propose I doubt anyone would be confused about the semantics, including the order of conversions. However: To me, this seems like a clear case where you want to embed the conversions in the format code mechanism for those specific types: extend the __format__ method for URL objects to allow {url:h} where the 'h' format code applies "hex-escape", or you could repurpose the "u" code from the standard minilanguage to apply url-escape, or (I don't know if format allows) you could use {url:%}! How many types would need an additional format code to handle whatever use case wants repr(str())? Or are you envisioning heavy use of !f!a etc? (I can't see how any of the existing conversions could have an effect on the output of the numerical conversions you propose, though.)
Do we need support of more standard conversions?
I don't know about conversions, but I would like to see I18N substitutions somehow moved into format, so we could eventually allow Americans to "def _()" again! I know, I18N is hard, but isn't that the Holy Grail? {date_time!p} ('p' for "POSIX locale"), anyone? I think !!conversions might be useful if we could find a way to move I18N substitutions into the format mechanism, though. At the very least, {currency!p!a} might be used frequently. One thing: I18N is likely to eat up most of the ISO 8859 repertoire for conversion codes. Just consider the number of variations on ISO 8601 datetime formatting, let alone the scads of other variations on datetimes, and the other POSIX substitutions.
Do we want to support custom conversions (registered globally as encodings and error handlers). re.escape, html.escape and shlex.quote could be very useful in some applications.
This really seems to me to be the kind of thing that should be handled in __init__ whenever possible, and __format__ if not.

Footnotes:
[1] I recognize this may imply many changes, since some (all?) of the builtin types seem to be supported by object.__format__.
On Fri, Apr 23, 2021 at 7:24 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Serhiy Storchaka writes:
Currently format strings (and f-string expressions) support three conversions: !s -- str, !r -- repr and !a for ascii.
It's not clear to me what these are good for, to be honest. Why not just have s, r, and a format codes? The !conversions don't compose with format codes:
>>> f"{10!r:g}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'g' for object of type 'str'
They do, but they're applied in sequence. After !r, you're dealing with a string, not an integer.
>>> f"{2/3!r:8.6s}"
'0.6666  '
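The sequencing described above can be checked directly. This is a small sketch using only builtin behaviour: the conversion runs first, and the format spec is then applied to whatever the conversion returned.

```python
value = 10

# !r runs first and yields the string '10'; the spec is then handed to
# str.__format__, which understands alignment and width.
padded = f"{value!r:>6}"

# Without a conversion the spec goes to int.__format__, which accepts 'g'.
general = f"{value:g}"

# With !r the same spec reaches str.__format__, which rejects 'g'.
try:
    f"{value!r:g}"
    error = None
except ValueError as exc:
    error = str(exc)

# On a string, precision truncates and width then pads on the right.
truncated = f"{2/3!r:8.6s}"
```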
So I don't think I want to go further. I have some sympathy for your proposal, in part because I'd like to see something done about moving I18N into the format() mechanism. But I'm going to play devil's advocate, mostly because I'm getting old enough to not like change so much. ;-)
How would that work? I18n needs to work with the whole string, not a single value, as format() does. Or if you mean the str.format() method, that's already an i18n target.
I propose to add support of additional conversions: for int, float and operator.index. It will help to convert automatically printf-like format strings to f-string expressions: %d, %i, %u -- use int, %f -- use float, %o, %x -- use operator.index.
This makes more sense to me than !s, !r, and !a -- you might or might not want these conversions, I guess. But it seems like a lot of complexity to add. On the other hand, isn't the answer "fix __format__ in class definitions?"
For example, currently
>>> for i in "befgodxs":
...     print(format(10, i))
...
1010
1.000000e+01
10.000000
10
12
10
a
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: Unknown format code 's' for object of type 'int'
But we could change int.__format__ to allow 's' as a format code[1], automagically calling str(), just as 'efg' are allowed and automagically call float().
Are you asking for every single __format__ function to be required to support specific format codes, or will format() itself handle those? Currently, format codes are handled by the type, but the bang conversions are handled by the f-string itself.

ChrisA
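To make that division of labour visible, here is a small sketch with a hypothetical Recorder class (not part of any library) that logs every spec its __format__ receives. With a bang conversion, the object's own __format__ is never called.

```python
class Recorder:
    """Hypothetical helper that records each spec passed to __format__."""

    def __init__(self):
        self.seen = []

    def __repr__(self):
        return "Recorder()"

    def __format__(self, spec):
        self.seen.append(spec)
        return f"<formatted with {spec!r}>"


r = Recorder()

# No conversion: the type itself receives the raw spec.
direct = f"{r:abc}"

# With !r: repr() runs first, and '>12' goes to str.__format__ instead.
converted = f"{r!r:>12}"
```

After both expressions run, r.seen holds only 'abc'.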
On 4/23/2021 5:22 AM, Stephen J. Turnbull wrote:
But we could change int.__format__ to allow 's' as a format code[1], automagically calling str(), just as 'efg' are allowed and automagically call float(). ... [1] I recognize this may imply many changes, since some (all?) of the builtin types seem to be supported by object.__format__.
object.__format__ doesn't do anything except call str() (assuming the format spec is not specified). It's each builtin type that supports formatting itself. There's an int.__format__, str.__format__, float.__format__, datetime.datetime.__format__, decimal.Decimal.__format__, etc. Many of these use some shared code to do the heavy lifting, and many of them share the same format spec, or much of it. But conceptually they're all distinct and each one can interpret the format spec however it chooses. For example, int.__format__ chose not to support the "s" type. And datetime.datetime.__format__ chose to call .strftime(). It's unfortunate that PEP 3101 says "If an object does not define its own format specifiers, a standard set of format specifiers is used", because it's not true. It should say that "Many types will choose to support a standard format specifier".

Eric
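A short sketch of the behaviour described above, using only standard-library types. The Plain class is a hypothetical stand-in for any class that defines __str__ but not __format__.

```python
import datetime
from decimal import Decimal


class Plain:
    """Hypothetical class that defines __str__ but no __format__."""

    def __str__(self):
        return "plain"


# Empty spec: object.__format__ falls back to str(self).
default = format(Plain(), "")

# Non-empty spec: object.__format__ raises instead of applying a
# "standard set" of specifiers.
try:
    format(Plain(), ">10")
    plain_accepts_spec = True
except TypeError:
    plain_accepts_spec = False

# Each builtin type interprets the spec in its own way.
as_date = format(datetime.datetime(2021, 4, 23), "%Y-%m-%d")  # strftime codes
as_decimal = format(Decimal("1.5"), ".2f")

# int.__format__ chose not to support the 's' type.
try:
    format(10, "s")
    int_accepts_s = True
except ValueError:
    int_accepts_s = False
```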
23.04.21 12:22, Stephen J. Turnbull writes:
Serhiy Storchaka writes:
Currently format strings (and f-string expressions) support three conversions: !s -- str, !r -- repr and !a for ascii.
It's not clear to me what these are good for, to be honest. Why not just have s, r, and a format codes? The !conversions don't compose with format codes:
>>> f"{10!r:g}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'g' for object of type 'str'
Because it converts the value to a string, and string formatting does not support "g". The converters !s, !r and !a are separate from the format specifier, and this is an old and widely used feature. I only propose to add more converters, because they are needed for some compiler optimizations. I was going to add them as a private AST detail in any case, but if we are going to make this feature public, it is worth discussing it ahead of time to avoid name conflicts in the future. I asked what letters should be chosen for the converters for int() and index().
So I don't think I want to go further. I have some sympathy for your proposal, in part because I'd like to see something done about moving I18N into the format() mechanism. But I'm going to play devil's advocate, mostly because I'm getting old enough to not like change so much. ;-)
I am not sure what relation this has to I18N.
I propose to add support of additional conversions: for int, float and operator.index. It will help to convert automatically printf-like format strings to f-string expressions: %d, %i, %u -- use int, %f -- use float, %o, %x -- use operator.index.
This makes more sense to me than !s, !r, and !a -- you might or might not want these conversions, I guess. But it seems like a lot of complexity to add. On the other hand, isn't the answer "fix __format__ in class definitions?"
We need to format a value as an integer or float independently of the __format__ implementation, and raise an error if it cannot be converted to an integer or float. The purpose of the feature is to bypass __format__ and get the same result as in printf-style formatting.
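The gap between printf-style formatting and format specs can be seen with a float argument. The sketch below emulates the proposed converters by calling int() and operator.index() explicitly. This is only an approximation of the proposal, since the !d or !i syntax does not exist.

```python
import operator

x = 3.7

# printf-style %d converts the argument with int() before formatting.
printf_d = "%d" % x

# The 'd' format code is handled by float.__format__, which rejects it.
try:
    f"{x:d}"
    format_d_ok = True
except ValueError:
    format_d_ok = False

# Emulating the proposed int converter by converting explicitly.
emulated_d = f"{int(x):d}"

# printf-style %x requires a true integer (operator.index semantics).
try:
    "%x" % x
    printf_x_ok = True
except TypeError:
    printf_x_ok = False

# operator.index() accepts real integers and rejects floats in the same way.
emulated_x = f"{operator.index(255):x}"
```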
But we could change int.__format__ to allow 's' as a format code[1], automagically calling str(), just as 'efg' are allowed and automagically call float().
Yes, we could add support for "s" in int.__format__, but it was decided not to do this, for several reasons. It would confuse the format specifier with the converter, it would let some errors go uncaught (such as passing an integer where a string is expected), and it would require duplicating the code of str.__format__ in int.__format__ (and in every other __format__ where you want to support "s").
Currently I write f"path = {repr(str(path))}" or f"path = {str(path)!r}", but want to write f"path = {path!s!r}".
I have some sympathy for this; it's not a big change, and given the syntax you propose I doubt anyone would be confused about the semantics, including the order of conversions. However:
To me, this seems like a clear case where you want to embed the conversions in the format code mechanism for those specific types: extend the __format__ method for URL objects to allow {url:h} where the 'h' format code applies "hex-escape", or you could repurpose the "u" code from the standard minilanguage to apply url-escape, or (I don't know if format allows) you could use {url:%}!
Supporting the "h" format code in the __format__ method for URL objects is a reasonable idea, and it is the purpose of __format__ methods. But it does not relate to converters. If you want to dump the value of some variable using repr(), do you want to add support for "r" to every implementation of __format__ in the world (and what if some do not support it, or use it with different semantics)? "%r" % x just calls repr(), and we wanted this feature in the new formatting.
How many types would need an additional format code to handle whatever use case wants repr(str())?
All types with a custom __str__. If a value is convertible to str, you often want to see the string representation, because it is shorter and more human-readable than the result of repr(). But since it can contain any special characters, you want to see them and the boundaries of the string, and thus use repr() or ascii() on the resulting string.
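For illustration, here is how the two spellings that work today behave on a path containing special characters. PurePosixPath is used so that the result does not depend on the platform; the proposed {path!s!r} form is not valid syntax and is therefore not shown as code.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/tmp/report\t2021.txt")

# Plain str(): the tab is emitted raw and the boundaries are invisible.
plain = f"path = {path}"

# The two current spellings of "repr of str".
quoted = f"path = {str(path)!r}"
nested = f"path = {repr(str(path))}"

# ascii() additionally escapes non-ASCII characters.
accented = PurePosixPath("/tmp/café")
ascii_quoted = f"path = {str(accented)!a}"
```

By comparison, f"{path!r}" alone would give PurePosixPath('/tmp/report\t2021.txt'), which is the longer form the message above wants to avoid.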
Or are you envisioning heavy use of !f!a etc? (I can't see how any of the existing conversions could have an effect on the output of the numerical conversions you propose, though.)
No, I only need !s!r and !s!a. Maybe !f!s will have some use, but since repr of float is the same as str, !f!a is the same as !f!s.
Serhiy Storchaka writes:
Because it converts the value to a string, and string formatting does not support "g". The converters !s, !r and !a are separate from the format specifier, and this is an old and widely used feature.
And poorly documented, IMO. I'll see if I can do better.
but if we are going to make this feature public, it is worth discussing it ahead of time to avoid name conflicts in the future. I asked what letters should be chosen for the converters for int() and index().
I would go with !f for "float", !d for "decimal" (int) and !x or !i for "index". I like !x better because of the ambiguity of "i". But I don't think I'm likely to use any of them, so that's about +0.2 for all of them.
I am not sure which relation does it have to I18N.
First, the use of "_()" as the localizable string marker conflicts with the common use of "_" as a "throwaway" identifier. Second, POSIX locale lookup (and possibly gettext, too) seems like a very obvious candidate for !conversion before formatting a string field, similar to casting a number to float for e, f, or g formatting.
We need to format a value as an integer or float independently of the __format__ implementation, and raise an error if it cannot be converted to an integer or float. The purpose of the feature is to bypass __format__ and get the same result as in printf-style formatting.
Surely you don't mean "bypass format"? If you bypass format, how do you select among e, f, and g? This is exactly the confusion that caused me trouble: the use cases for !s, !r, and !a generally do bypass __format__ (but something like "x!r:>10" is occasionally useful and does work). So at the language level they appear to be syntactic sugar, since they are not programmable (see Cameron Simpson's posts).
Yes, we could add support for "s" in int.__format__, but it was decided not to do this, for several reasons.
I'm not really suggesting that we do this *now*, since the backward incompatibility is horrifying. I'm asking why converters were exposed in the first place, since they look like syntactic sugar. I guess the basic answer is "they're already needed for int -> float conversion, and they're nice abbreviations for str(), repr(), and ascii() because format() and friends don't allow function calls in {}". Anyway, I'm satisfied with that; you don't need to convince me.
Supporting the "h" format code in the __format__ method for URL objects is a reasonable idea, and it is the purpose of __format__ methods. But it does not relate to converters. If you want to dump the value of some variable using repr(), do you want to add support for "r" to every implementation of __format__ in the world (and what if some do not support it, or use it with different semantics)? "%r" % x just calls repr(), and we wanted this feature in the new formatting.
You could have it if the format codes 's', 'r', and 'a' were implemented in the formatting mechanisms, i.e., in print() itself, in format(), in str.format, and in f-strings. I'm not ready to argue for changing it now, but I think it would have been better if format specs included the preprocessor !conversion and also a postprocessor (for Cameron), and maybe allowed those to be programmable via a mapping of codes to type converters for the preprocessor and a mapping of codes to str -> str functions for the postprocessor, or passed these to __format__ as a triple (format_spec, pre_code, post_code).

Steve
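One way to picture the pre/post mappings described above is a free function that wraps format(). Everything here is a hypothetical sketch: the function name format_field, the code letters, and the contents of both tables are assumptions and not an existing or proposed API.

```python
import html
import shlex

# Hypothetical tables: code letter -> callable. The letters are assumptions.
PRE_CONVERTERS = {"s": str, "r": repr, "a": ascii, "d": int, "f": float}
POST_PROCESSORS = {"h": html.escape, "q": shlex.quote}


def format_field(value, format_spec="", pre_code=None, post_code=None):
    """Apply an optional converter, then format(), then an optional str -> str step."""
    if pre_code is not None:
        value = PRE_CONVERTERS[pre_code](value)
    text = format(value, format_spec)
    if post_code is not None:
        text = POST_PROCESSORS[post_code](text)
    return text


zero_padded = format_field("3", "05d", pre_code="d")   # int("3"), then '05d'
as_float = format_field(10, ".1f", pre_code="f")       # float(10), then '.1f'
html_safe = format_field("<b>", post_code="h")
shell_safe = format_field("a b", post_code="q")
```

Because both tables are ordinary dicts, registering a custom conversion in this sketch is just a dictionary assignment. Whether such a registry should be global is one of the questions the thread leaves open.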
participants (8)
- Cameron Simpson
- Chris Angelico
- Dennis Sweeney
- Eric V. Smith
- Paul Moore
- Rob Cliffe
- Serhiy Storchaka
- Stephen J. Turnbull