PEP 498 (interpolated f-string) tweak
While finishing up the implementation of PEP 498, I realized that the PEP has an error. It says that this code:
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3!s}ghi'
Is equivalent to:
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(expr3).__format__('') + 'ghi'
But that's not correct. The right way to call __format__ is:
type(expr1).__format__(expr1, spec1)
That is, the lookup of __format__ is done on the type, not the instance.
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
I've posted a new version of the code in issue 24965. I'll update the PEP itself sometime this weekend.
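The difference between the two lookups can be seen with a small sketch. The `Money` class and its values are made up for illustration; only `format()`, `type()` and f-strings themselves come from the message above.

```python
class Money:
    """Hypothetical class with a class-level __format__."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # Fall back to two decimals when no spec is given
        return "$" + format(self.amount, spec or ".2f")


m = Money(3.5)
# Shadow __format__ on this one instance only
m.__format__ = lambda spec: "instance-level"

via_instance = m.__format__("")        # attribute lookup finds the instance attribute
via_type = type(m).__format__(m, "")   # lookup on the type, as the message describes
via_builtin = format(m, "")            # format() also looks on the type
via_fstring = f"{m}"                   # f-strings follow format() semantics

print(via_instance, via_type, via_builtin, via_fstring)
```

Only the explicit `m.__format__("")` call sees the instance attribute; the other three spellings go through the class.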
Good catch, Eric. For those who wonder what the difference is or why it matters, it's what we do for all dunder operators: the magic method is looked up on the class, not on the instance. This means you can't, e.g., override + on a per-instance basis by having an instance variable named '__add__' pointing to some function. That's a rare use case, it's pretty obfuscated, it would slow down everything, and if you really need that pattern, there are other ways to do it (you could have a regular instance method __add__ on the class that looks for an instance variable, e.g. named _add, and calls it if it exists -- otherwise it calls super().__add__).
--Guido
On Sat, Sep 19, 2015 at 4:03 AM, Eric V. Smith <eric@trueblade.com> wrote:
While finishing up the implementation of PEP 498, I realized that the PEP has an error. It says that this code:
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3!s}ghi'
Is equivalent to:
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(expr3).__format__('') + 'ghi'
But that's not correct. The right way to call __format__ is:
type(expr1).__format__(expr1, spec1)
That is, the lookup of __format__ is done on the type, not the instance.
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
I've posted a new version of the code in issue 24965. I'll update the PEP itself sometime this weekend.
-- --Guido van Rossum (python.org/~guido)
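The workaround Guido describes (a regular `__add__` on the class that defers to an optional per-instance `_add`) might look like the sketch below. The class name `Tweakable` and the choice of `int` as the base class are assumptions made so that `super().__add__` has something to call.

```python
class Tweakable(int):
    """Hypothetical int subclass whose '+' can be customised per instance."""

    def __add__(self, other):
        # Look for a per-instance hook stored under the ordinary name _add
        hook = self.__dict__.get("_add")
        if hook is not None:
            return hook(self, other)
        # No hook: fall back to the inherited behaviour
        return super().__add__(other)


a = Tweakable(2)
b = Tweakable(2)

# An instance attribute named __add__ is ignored by the + operator,
# because the operator looks the method up on the type.
a.__add__ = lambda other: "ignored"

# The _add hook, by contrast, is honoured by the class-level __add__.
b._add = lambda self, other: "custom"

plain_sum = a + 1
hooked_sum = b + 1
print(plain_sum, hooked_sum)
```

The per-instance behaviour is opt-in and costs one dictionary lookup in `__add__`, rather than slowing down every operator dispatch.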
On 19.09.15 14:03, Eric V. Smith wrote:
While finishing up the implementation of PEP 498, I realized that the PEP has an error. It says that this code:
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3!s}ghi'
Is equivalent to:
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(expr3).__format__('') + 'ghi'
But that's not correct. The right way to call __format__ is:
type(expr1).__format__(expr1, spec1)
That is, the lookup of __format__ is done on the type, not the instance.
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Concatenating many strings is not efficient. A more efficient way is to use string formatting. Why not translate the f-string to
'abc%s%sdef%sghi' % (format(expr1, spec1), format(repr(expr2), spec2), expr3)
or even to
'abc{:spec1}{!r:spec2}def{!s}ghi'.format(expr1, expr2, expr3)
?
On 9/19/2015 3:22 PM, Serhiy Storchaka wrote:
On 19.09.15 14:03, Eric V. Smith wrote:
While finishing up the implementation of PEP 498, I realized that the PEP has an error. It says that this code:
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3!s}ghi'
Is equivalent to:
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(expr3).__format__('') + 'ghi'
But that's not correct. The right way to call __format__ is:
type(expr1).__format__(expr1, spec1)
That is, the lookup of __format__ is done on the type, not the instance.
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Concatenating many strings is not efficient. A more efficient way is to use string formatting. Why not translate the f-string to
'abc%s%sdef%sghi' % (format(expr1, spec1), format(repr(expr2), spec2), expr3)
As the PEP says, the expression with '+' is illustrative, not how it's actually implemented. The implementation currently uses ''.join, although I reserve the right to change it.
or even to
'abc{:spec1}{!r:spec2}def{!s}ghi'.format(expr1, expr2, expr3)
That's surprisingly difficult to get right. I've implemented the code at least 3 different ways, and the current implementation seems the most straightforward. I might add a special opcode to use a _PyUnicodeWriter, though.
Eric.
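For readers comparing the spellings discussed in this exchange, here is a small sketch that builds the same string five ways. The concrete values and specs are invented for illustration, and nothing here says which form any particular interpreter uses internally.

```python
# Made-up values and format specs for illustration
expr1, spec1 = 3.14159, ".2f"
expr2, spec2 = "x", ">5"
expr3 = [1, 2]

via_fstring = f"abc{expr1:{spec1}}{expr2!r:{spec2}}def{expr3!s}ghi"

# The illustrative '+' form, using format() instead of .__format__()
via_concat = (
    "abc" + format(expr1, spec1) + format(repr(expr2), spec2)
    + "def" + format(str(expr3), "") + "ghi"
)

# The ''.join form mentioned as the current implementation strategy
via_join = "".join([
    "abc", format(expr1, spec1), format(repr(expr2), spec2),
    "def", format(str(expr3), ""), "ghi",
])

# The two alternatives suggested in the question
via_percent = "abc%s%sdef%sghi" % (
    format(expr1, spec1), format(repr(expr2), spec2), expr3,
)
via_format = "abc{:{}}{!r:{}}def{!s}ghi".format(expr1, spec1, expr2, spec2, expr3)

print(via_fstring)
```

All five agree for these inputs; the discussion is about which is cheapest and least surprising to generate, not about the result.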
On 9/19/2015 3:36 PM, Eric V. Smith wrote:
On 9/19/2015 3:22 PM, Serhiy Storchaka wrote:
On 19.09.15 14:03, Eric V. Smith wrote:
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Concatenating many strings is not efficient. A more efficient way is to use string formatting. Why not translate the f-string to
'abc%s%sdef%sghi' % (format(expr1, spec1), format(repr(expr2), spec2), expr3)
As the PEP says, the expression with '+' is illustrative, not how it's actually implemented. The implementation currently uses ''.join, although I reserve the right to change it.
I should also note: an earlier version of the PEP showed the ''.join() version of the equivalent code, but the feedback was that it was confusing, and the '+' version was easier to understand.
And another reason that I don't use %-formatting or ''.format() as the implementation is for performance: the parser spends a lot of effort to parse the f-string. To then put it back together and have ''.format() immediately re-parse it didn't make much sense. And I'm not convinced there aren't edge cases where the f-string parser and the ''.format() parser differ, especially when dealing with nested format_specs with funky characters.
Instead, I generate two additional AST nodes: ast.FormattedValue and ast.JoinedStr. JoinedStr just has a list of string expressions which get joined together (as I said: currently using ''.join()). FormattedValue contains the expression and its optional conversion character and optional format_spec, which get formatted into a string (currently using the builtin format()).
Eric.
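The two node types can be inspected from Python itself on interpreters that support f-strings. The following is a minimal sketch, assuming the standard `ast` module; the sample f-string and variable names are made up.

```python
import ast

# A sample f-string with a conversion and a nested format spec
source = "f'abc{x!r:>{width}}def'"
joined = ast.parse(source, mode="eval").body

# JoinedStr.values holds the literal pieces and the FormattedValue nodes
node_kinds = [type(node).__name__ for node in joined.values]

field = joined.values[1]
# conversion is stored as an integer code point, or -1 when absent
conversion = chr(field.conversion) if field.conversion != -1 else None
# A format spec is itself a JoinedStr, so specs can contain nested fields
spec_kinds = [type(node).__name__ for node in field.format_spec.values]

print(type(joined).__name__, node_kinds, conversion, spec_kinds)
```

The nested spec being its own `JoinedStr` is what lets the compiler handle `{width}` inside the spec without handing the string back to a second parser.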
On Sun, Sep 20, 2015 at 5:36 AM, Eric V. Smith <eric@trueblade.com> wrote:
As the PEP says, the expression with '+' is illustrative, not how it's actually implemented. The implementation currently uses ''.join, although I reserve the right to change it.
or even to
'abc{:spec1}{!r:spec2}def{!s}ghi'.format(expr1, expr2, expr3)
That's surprisingly difficult to get right. I've implemented the code at least 3 different ways, and the current implementation seems the most straightforward. I might add a special opcode to use a _PyUnicodeWriter, though.
Since this is entirely under the control of the parser, there's no particular reason to promise anything about the implementation or its performance metrics. The semantics won't change if CPython 3.7.4 decides to change to using ''.join() instead of concatenation. Let's not bikeshed the implementation details. I'm sure Eric knows what he's doing.
ChrisA
On 19 September 2015 at 21:03, Eric V. Smith <eric@trueblade.com> wrote:
While finishing up the implementation of PEP 498, I realized that the PEP has an error. It says that this code:
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3!s}ghi'
Is equivalent to:
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(expr3).__format__('') + 'ghi'
But that's not correct. The right way to call __format__ is:
type(expr1).__format__(expr1, spec1)
That is, the lookup of __format__ is done on the type, not the instance.
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Does this mean overriding format at the module level or in builtins will affect the way f-strings are evaluated at runtime? (I don't have a strong preference one way or the other, but I think the PEP should be explicit as to the expected behaviour rather than leaving it as implementation defined).
Cheers,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 9/20/2015 8:37 AM, Nick Coghlan wrote:
On 19 September 2015 at 21:03, Eric V. Smith <eric@trueblade.com> wrote:
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Does this mean overriding format at the module level or in builtins will affect the way f-strings are evaluated at runtime? (I don't have a strong preference one way or the other, but I think the PEP should be explicit as to the expected behaviour rather than leaving it as implementation defined).
Yes, in the current implementation, if you mess with format(), str(), repr(), or ascii() you can break f-strings. The latter 3 are used to implement !s, !r, and !a.
I have a plan to change this, by adding one or more opcodes to implement the formatting and string joining. I'll defer a decision on updating the PEP until I can establish the feasibility (and desirability) of that approach.
Eric.
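Whether a given interpreter is sensitive to this can be checked directly. The sketch below shadows the four names in a private namespace and evaluates an f-string there; the `hijacked` helper and the namespace are hypothetical. On CPython 3.6 and later, where f-strings compile to dedicated bytecode rather than name lookups, the result should be the normally formatted text; an implementation that called the builtins by name would produce "hijacked" instead.

```python
def hijacked(*args, **kwargs):
    """Hypothetical replacement that makes any shadowing obvious."""
    return "hijacked"


# Shadow the names that the interim implementation relied on
namespace = {
    "format": hijacked,
    "str": hijacked,
    "repr": hijacked,
    "ascii": hijacked,
    "value": 42,
}

# Evaluate an f-string that exercises !r, !s, !a and a format spec
result = eval("f'{value!r:>5}|{value!s}|{value!a}'", namespace)
print(result)
```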
On 20.09.15 16:51, Eric V. Smith wrote:
On 9/20/2015 8:37 AM, Nick Coghlan wrote:
On 19 September 2015 at 21:03, Eric V. Smith <eric@trueblade.com> wrote:
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Does this mean overriding format at the module level or in builtins will affect the way f-strings are evaluated at runtime? (I don't have a strong preference one way or the other, but I think the PEP should be explicit as to the expected behaviour rather than leaving it as implementation defined).
Yes, in the current implementation, if you mess with format(), str(), repr(), or ascii() you can break f-strings. The latter 3 are used to implement !s, !r, and !a.
I have a plan to change this, by adding one or more opcodes to implement the formatting and string joining. I'll defer a decision on updating the PEP until I can establish the feasibility (and desirability) of that approach.
I propose adding an internal built-in formatter type. Instances should be marshallable and callable. The code generated for f-strings should just load a formatter constant and call it with arguments. The formatter builds the resulting string by concatenating literal strings and the results of formatting the arguments with the specified specifications.
Later we could change the compiler (just the peephole optimizer?) to replace literal_string.format(*args) and literal_string % args with a call to a precompiled formatter.
Later we could rewrite str.format, str.__mod__ and re.sub to create a temporary formatter object and call it.
Later we could expose a public API for creating formatter objects. It could be used by third-party template engines.
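A pure-Python sketch of what such a formatter object could look like follows. Everything in it is hypothetical: the class name `PrecompiledFormatter`, its constructor arguments and the `_CONVERSIONS` table are illustrative only, and a real version would be an internal C type that the marshal module understands.

```python
# Map conversion characters to the callables that implement them
_CONVERSIONS = {None: lambda value: value, "s": str, "r": repr, "a": ascii}


class PrecompiledFormatter:
    """Hypothetical callable holding the constant parts of one f-string."""

    def __init__(self, literals, fields):
        # literals: N+1 literal strings; fields: N (conversion, spec) pairs
        if len(literals) != len(fields) + 1:
            raise ValueError("need exactly one more literal than fields")
        self._literals = tuple(literals)
        self._fields = tuple(fields)

    def __call__(self, *args):
        if len(args) != len(self._fields):
            raise TypeError("expected %d arguments" % len(self._fields))
        parts = [self._literals[0]]
        for value, (conversion, spec), literal in zip(
            args, self._fields, self._literals[1:]
        ):
            # Apply the optional conversion, then the format spec
            parts.append(format(_CONVERSIONS[conversion](value), spec))
            parts.append(literal)
        return "".join(parts)


# Rough equivalent of f'abc{a:>4}{b!r}def{c!s}ghi'
fmt = PrecompiledFormatter(
    ("abc", "", "def", "ghi"),
    ((None, ">4"), ("r", ""), ("s", "")),
)
result = fmt(7, "x", 1.5)
print(result)
```

The constant data is fixed at construction time and only the values vary per call, which is the property that would let a compiler store the object as a constant.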
On Sep 20, 2015, at 11:15 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 20.09.15 16:51, Eric V. Smith wrote:
On 9/20/2015 8:37 AM, Nick Coghlan wrote:
On 19 September 2015 at 21:03, Eric V. Smith <eric@trueblade.com> wrote:
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Does this mean overriding format at the module level or in builtins will affect the way f-strings are evaluated at runtime? (I don't have a strong preference one way or the other, but I think the PEP should be explicit as to the expected behaviour rather than leaving it as implementation defined).
Yes, in the current implementation, if you mess with format(), str(), repr(), or ascii() you can break f-strings. The latter 3 are used to implement !s, !r, and !a.
I have a plan to change this, by adding one or more opcodes to implement the formatting and string joining. I'll defer a decision on updating the PEP until I can establish the feasibility (and desirability) of that approach.
I propose adding an internal built-in formatter type. Instances should be marshallable and callable. The code generated for f-strings should just load a formatter constant and call it with arguments. The formatter builds the resulting string by concatenating literal strings and the results of formatting the arguments with the specified specifications.
Later we could change compiler (just peephole optimizer?) to replace literal_string.format(*args) and literal_string % args with calling precompiled formatter.
Later we could rewrite str.format, str.__mod__ and re.sub to create temporary formatter object and call it.
Later we could expose public API for creating formatter object. It can be used by third-party template engines.
I think this is InterpolationTemplate from PEP 501.
Eric.
On 21 September 2015 at 05:22, Eric V. Smith <eric@trueblade.com> wrote:
On Sep 20, 2015, at 11:15 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 20.09.15 16:51, Eric V. Smith wrote:
On 9/20/2015 8:37 AM, Nick Coghlan wrote:
On 19 September 2015 at 21:03, Eric V. Smith <eric@trueblade.com> wrote:
Instead of calling __format__, I've changed the code generator to call format(expr1, spec1). As an optimization, I might add special opcodes to deal with this and string concatenation, but that's for another day (if ever).
Does this mean overriding format at the module level or in builtins will affect the way f-strings are evaluated at runtime? (I don't have a strong preference one way or the other, but I think the PEP should be explicit as to the expected behaviour rather than leaving it as implementation defined).
Yes, in the current implementation, if you mess with format(), str(), repr(), or ascii() you can break f-strings. The latter 3 are used to implement !s, !r, and !a.
I have a plan to change this, by adding one or more opcodes to implement the formatting and string joining. I'll defer a decision on updating the PEP until I can establish the feasibility (and desirability) of that approach.
I propose adding an internal built-in formatter type. Instances should be marshallable and callable. The code generated for f-strings should just load a formatter constant and call it with arguments. The formatter builds the resulting string by concatenating literal strings and the results of formatting the arguments with the specified specifications.
Later we could change compiler (just peephole optimizer?) to replace literal_string.format(*args) and literal_string % args with calling precompiled formatter.
Later we could rewrite str.format, str.__mod__ and re.sub to create temporary formatter object and call it.
Later we could expose public API for creating formatter object. It can be used by third-party template engines.
I think this is InterpolationTemplate from PEP 501.
It's certainly a similar idea, although PEP 501 just proposed storing strings and tuples on the code object, with the interpolation template itself still being a mutable object constructed at runtime. Serhiy's suggestion goes a step further to suggest making the template itself immutable, and passing in all the potentially mutable data as method arguments.
I think there's a simpler approach available though, which is to go the way we went in introducing first the __import__ builtin and later the __build_class__ builtin to encapsulate some of the complexity of their respective statements without requiring a raft of new opcodes.
The last draft of PEP 501 before I deferred it proposed the following for interpolation templates, since it was able to rely on having f-strings available as a primitive and wanted to offer more flexibility than string formatting needs:
    _raw_template = "Substitute {names} and {expressions()} at runtime"
    _parsed_template = (
        ("Substitute ", "names"),
        (" and ", "expressions()"),
        (" at runtime", None),
    )
    _field_values = (names, expressions())
    _format_specifiers = (f"", f"")
    template = types.InterpolationTemplate(_raw_template,
                                           _parsed_template,
                                           _field_values,
                                           _format_specifiers)
A __format__ builtin (or a dedicated opcode) could use a simpler data model that consisted of the following constant and variable elements:
    Compile time constant: tuple of (<leading_text>, <tuple_of_leading_specifier_elements>) pairs
    Runtime variable: tuple of (<substitution_field_value>, <tuple_of_specifier_substitution_field_values>) pairs
If the format string didn't end with a substitution field, then the runtime variable tuple would be 1 element shorter than the constant tuple.
With that approach, __format__ (or an opcode that popped these two tuples directly off the stack) could be defined as something like:
    def __format__(constant_parts, variable_parts):
        num_fields = len(variable_parts)
        segments = []
        # enumerate() supplies idx, which pairs each constant with its field
        for idx, (leading_text, specifier_constants) in enumerate(constant_parts):
            segments.append(leading_text)
            if idx < num_fields:
                field_value, specifier_variables = variable_parts[idx]
                if specifier_variables:
                    specifier = __format__(specifier_constants, specifier_variables)
                else:
                    assert len(specifier_constants) == 1
                    specifier = specifier_constants[0]
                if specifier.startswith("!"):
                    # Handle "!a", "!r", "!s" by modifying field_value *and* specifier
                    pass  # details left open in this sketch
                # An empty specifier is valid, so the field is always formatted
                segments.append(format(field_value, specifier))
        return "".join(segments)
Cheers,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
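To make the constant/variable split concrete, here is a simplified, runnable take on that data model. It is an assumption-laden sketch and not the proposal itself: the names `render` and `build_spec` are invented, specifier elements are plain strings interleaved with their runtime values, and the "!a"/"!r"/"!s" conversions are left out entirely.

```python
def build_spec(spec_constants, spec_values):
    """Interleave constant specifier text with runtime specifier values."""
    pieces = []
    for idx, text in enumerate(spec_constants):
        pieces.append(text)
        if idx < len(spec_values):
            pieces.append(format(spec_values[idx], ""))
    return "".join(pieces)


def render(constant_parts, variable_parts):
    """Join leading text and formatted fields into the final string."""
    segments = []
    for idx, (leading_text, spec_constants) in enumerate(constant_parts):
        segments.append(leading_text)
        # The last constant entry has no field if the string ends in text
        if idx < len(variable_parts):
            value, spec_values = variable_parts[idx]
            segments.append(format(value, build_spec(spec_constants, spec_values)))
    return "".join(segments)


# Rough equivalent of f"[{name:>{width}}] done" with name="hi", width=5
constant_parts = (("[", (">", "")), ("] done", ()))  # known at compile time
variable_parts = (("hi", (5,)),)                     # evaluated at runtime

result = render(constant_parts, variable_parts)
print(result)
```

Because the string ends with literal text, `variable_parts` is one element shorter than `constant_parts`, matching the rule stated above.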
participants (5)
- Chris Angelico
- Eric V. Smith
- Guido van Rossum
- Nick Coghlan
- Serhiy Storchaka