[Python-ideas] Briefer string format

Eric V. Smith eric at trueblade.com
Tue Jul 21 13:58:08 CEST 2015


On 7/21/2015 2:05 AM, Guido van Rossum wrote:
>     And now that I think about it, it's somewhat more complex than just
>     expanding the expression. In .format(), this:
>     '{a[0]}{b[c]}'
>     is evaluated roughly as
>     format(a[0]) + format(b['c'])
> 
> 
> Oooh, this is very unfortunate. I cannot support this. Treating b[c] as
> b['c'] in a "real" format string is one way, but treating it that way in
> an expression is just too weird.

I think you're right here, and my other emails were trying too hard to
simplify the implementation and keep the parallels with str.format().
The difference between str.format() and f-strings is that with
str.format() you can pass an arbitrarily complex expression as an
argument to .format(). With f-strings, you'd be limited to just what
can be extracted from the string itself: there are no arguments to be
passed in. So maybe we do want to allow arbitrary expressions inside
the f-string.

For example:

'{a.foo}'.format(a=b[c])

If we limit f-strings to just what str.format() string expressions can
represent, it would be impossible to represent this with an f-string,
without an intermediate assignment.

But if we allowed arbitrary expressions inside an f-string, then we'd have:
f'{b[c].foo}'

and similarly:
'{a.foo}'.format(a=b['c'])
would become:
f'{b["c"].foo}'
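
A rough sketch of the two behaviours side by side. It assumes an
interpreter where f-strings evaluate their contents as ordinary
expressions (Python 3.6 or later); the dict contents and the names
c and foo are made up purely for illustration:

```python
from types import SimpleNamespace

# Hypothetical data, invented for this illustration.
c = 'key'
b = {'key': SimpleNamespace(foo='via variable c'),
     'c': SimpleNamespace(foo='via literal "c"')}

# str.format(): any expression can be passed in as the argument.
s1 = '{a.foo}'.format(a=b[c])
s2 = '{a.foo}'.format(a=b['c'])

# Inside a str.format() field, [c] always means the string key 'c',
# never the variable c.
s3 = '{b[c].foo}'.format(b=b)

# f-string with arbitrary expressions: [c] is the variable,
# ["c"] is the string key.
f1 = f'{b[c].foo}'
f2 = f'{b["c"].foo}'

print(s1 == f1, s2 == s3 == f2)
```

The s3 line is the incompatibility: the same text b[c] means one thing
in a str.format() field and another in an expression.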

But now we'd be breaking compatibility with str.format(). Maybe it's
worth it, though. I can see 80% of the uses of str.format() being
replaced by f-strings. The remainder would be cases where format strings
are passed in to other functions. I do this a lot with custom logging [1].

The implementation complexity goes up by allowing arbitrary expressions.
Not that that is necessarily a reason to drive a design decision.

For example:
f'{a[2:3]:20d}'

We need to extract the expression "a[2:3]" and the format spec "20d". I
can't just scan for a colon any more, I've got to actually parse the
expression until I find a "}", ":", or "!" that's not part of the
expression so that I know where it ends. But since it's happening at
compile time, I surely have all of the tools at my disposal. I'll have
to look through the grammar to see what the complexities here are and
where this would fit in.
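
To make the scanning problem concrete, here is a simplified sketch in
pure Python. It is not the proposed compile-time implementation:
split_field is a hypothetical helper, and it only tracks bracket depth
and simple quotes. It deliberately ignores backslash escapes, lambdas,
and other corners that a real parser would have to handle.

```python
def split_field(field):
    """Split the text between the braces into (expr, conversion, spec).

    Simplified: a ':' or '!' only ends the expression when it is
    outside all brackets and outside any quoted string.
    """
    depth = 0        # nesting level of (), [] and {}
    quote = None     # the open quote character, if inside a string
    for i, ch in enumerate(field):
        if quote is not None:
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch in '([{':
            depth += 1
        elif ch in ')]}':
            depth -= 1
        elif depth == 0 and ch == '!' and field[i + 1:i + 2] != '=':
            # '!' starts a conversion unless it is the '!=' operator.
            conv, _, spec = field[i + 1:].partition(':')
            return field[:i], conv, spec
        elif depth == 0 and ch == ':':
            return field[:i], '', field[i + 1:]
    return field, '', ''


print(split_field('a[2:3]:20d'))
print(split_field('b["c"].foo:20d'))
print(split_field('x!r:>10'))
```

The colon inside a[2:3] is skipped because the bracket depth is 1 at
that point; only the second colon, at depth 0, starts the format spec.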

>     So given that, I think we should just support what .format() allows,
>     since it's really not quite as simple as "evaluate the expression inside
>     the braces".
> 
> Alas. And this is probably why we don't already have this feature.

Agreed. So I think it's either "don't be compatible with str.format
expressions" or "abandon the proposed f-strings".

>     > Not sure what you mean by "implicit merging" -- if you mean literal
>     > concatenation (e.g. 'foo' "bar" == 'foobar') then I think it should be
>     > allowed, just like we support mixing quotes and r''.
> 
>     If I understand it, I think the concern is:
> 
>     f'{a}{b}' 'foo{}' f'{c}{d}'
> 
>     would need to become:
>     f'{a}{b}foo{{}}{c}{d}'
> 
>     So you have to escape the braces in non-f-strings when merging strings
>     and any of them are f-strings, and make the result an f-string. But I
>     think that's the only complication.
> 
> 
> That's possible; another possibility would be to just have multiple
> .format() calls (one per f'...') and use the + operator to concatenate
> the pieces.

Right. I think the implementation would actually use _PyUnicodeWriter to
build the string up, but it would logically be equivalent to:

'foo ' f'b:{b["c"].foo:20d} is {on_off}' ' bar'

becoming:

'foo ' + 'b:' + format(b["c"].foo, '20d') + ' is ' +
   format(on_off) + ' bar'

At this point, the implementation wouldn't call str.format() because
it's not being used to evaluate the expression. It would just call
format() directly. And since it's doing that without having to look up
.format on the string, we'd get some performance back that str.format()
currently suffers from.
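
The equivalence can be checked by hand on an interpreter that has
f-strings (Python 3.6 or later). This is only an illustration of the
logical expansion, not of how the compiler would do it; the values of
b and on_off are invented, and the list-plus-join at the end is just a
pure-Python stand-in for a writer-style build:

```python
from types import SimpleNamespace

# Hypothetical values, invented for this illustration.
b = {'c': SimpleNamespace(foo=42)}
on_off = 'on'

# Adjacent literals, one of which is an f-string.
merged = 'foo ' f'b:{b["c"].foo:20d} is {on_off}' ' bar'

# The logical expansion: literal pieces plus direct format() calls,
# with no lookup of .format on a string object.
expanded = ('foo ' + 'b:' + format(b["c"].foo, '20d') + ' is ' +
            format(on_off) + ' bar')

# Writer-style build: collect the pieces, then join once at the end.
pieces = ['foo ', 'b:', format(b["c"].foo, '20d'), ' is ',
          format(on_off), ' bar']
joined = ''.join(pieces)

print(merged == expanded == joined)
```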

Nothing is really lost by not merging the adjacent strings, since the
f-strings by definition are replaced by function calls. Maybe the
optimizer could figure out that 'foo ' + 'b:' could be merged into
'foo b:'. Or maybe the user should refactor the strings if it's that
important.

I'm out of the office all day and won't be able to respond to any
follow-ups until later. But that's good, since I'll be forced to think
before typing!

Eric.

[1] Which makes me think of the crazy idea of passing unevaluated
f-strings into another function to be evaluated in its context. But
the code injection opportunities of doing this with arbitrary
user-specified strings are just too scary to think about. At least with
str.format() you're limited in what the expressions can do: basically
indexing and attribute access. No function calls: '{.exit()}'.format(sys) !
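
A quick check of that limit, using nothing beyond sys and str.format().
The field text after the dot is treated as an attribute name, so
'exit()' is looked up as a literal attribute and nothing is called:

```python
import sys

# Attribute access works inside a str.format() field.
ok = '{.maxsize}'.format(sys)

# A "call" does not: str.format() looks for an attribute literally
# named 'exit()', which does not exist, so nothing is executed.
try:
    '{.exit()}'.format(sys)
    raised = False
except AttributeError:
    raised = True

print(ok == str(sys.maxsize), raised)
```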


