[Python-Dev] PEP 498 f-string: is it a preprocessor?

Tue Aug 11 17:05:46 CEST 2015

On 08/10/2015 07:23 PM, Victor Stinner wrote:
> 
> 
> On Tuesday, 11 August 2015, Eric V. Smith <eric at trueblade.com>
> wrote:
> 
>     Oops, I was thinking of going the other way (str.format -> f''). Yes, I
>     think you're correct.
> 
> 
> Ah ok. 
> 
>     But in any event, I don't see the distinction between calling
>     str.format(), and calling each object's __format__ method. Both are
>     compliant with the PEP, which doesn't specify exactly how the
>     transformation is done.
> 
> 
> When I read the PEP for the first time, I understood that you
> reimplemented str.format() using the __format__() methods. So I
> understood that it's a new formatting language and it would be tricky to
> reimplement it, for example in a library providing i18n with f-string
> syntax (I'm not sure that it's feasible, it's just an example). I also
> expected many subtle differences between .format() and f-string.
> 
> In fact, f-string is quite standard and not new, it's just a compact
> syntax to call .format() (well, with some minor and acceptable subtle
> differences). For me, it's a good thing to rely on the existing
> .format() method because it's well known (no need to learn a new
> formatting language).
> 
> Maybe you should rephrase some parts of your PEP and rewrite some
> examples to say that it's "just" a compact syntax to call .format().
> 
> --
> 
> For me, calling __format__() multiple times or format() once matters
> for performance, because I contributed to the implementation of
> _PyUnicodeWriter. I spent a lot of time keeping performance good
> when the implementation of Unicode was rewritten for PEP 393. With
> this PEP, writing an efficient implementation is much harder. The dummy
> benchmark is to compare Python 2.7 str.format() (bytes!) to Python 3
> str.format() (Unicode!). Users want similar performance! If I recall
> correctly, Python 3 is not bad (faster in some corner cases).
> 
> Concatenating temporary strings is less efficient than _PyUnicodeWriter
> (single buffer) when you have UCS-1, UCS-2 and UCS-4 strings (1/2/4
> bytes per character). It's more efficient to write directly into the
> final format (UCS-1/2/4), even if you may need to convert the buffer
> from UCS-1 to UCS-2 (and maybe even one more time to UCS-4).

I think I've pinpointed what bothers me about building up a string for
str.format: You're building up a string which is then parsed, and after
it's parsed, you make the exact same function calls that you could
instead make directly.
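
A rough sketch of what I mean, using string.Formatter from the standard
library to make the parsing step visible. The value and the '>10' spec
are made up for illustration, and this only approximates what the C
implementation of str.format does:

```python
from string import Formatter

# Illustrative inputs (assumptions, not from the PEP).
value = 'abc'
template = 'value: {0:>10}'

# Step 1: the template string has to be parsed at run time into
# (literal_text, field_name, format_spec, conversion) tuples.
parsed = list(Formatter().parse(template))

# Step 2: once parsed, formatting boils down to one format() call per
# field, which is the same call that could have been made directly.
via_format = template.format(value)
direct = ''.join(['value: ', format(value, '>10')])

assert via_format == direct
```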

When Mark Dickinson and I implemented short float repr we spent a lot of
time taking apart code that did things like this.

But yes, there are some optimizations in str.format dealing with both
_PyUnicodeWriter and with not calling __format__ for some builtin
types. So maybe there's a win to be had there, even with the extra
parsing that would happen.

In any event, it should be driven by testing.

That said, I now think that when handling nested f-strings:

f'value: {value:{width}s}'

It will be easier to translate this to:

'value: {0:{1}s}'.format(value, width)

than:

''.join(['value: ',
          value.__format__(''.join([width.__format__(''), 's']))
         ])
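
As a quick sanity check, here is a small sketch comparing the three
spellings. The values 'abc' and 10 are assumptions chosen for
illustration, and the f-string line needs an interpreter that
implements this PEP:

```python
# Illustrative values (assumptions): a string padded to a given width.
value = 'abc'
width = 10

# The f-string form, with a nested field inside the format spec.
from_fstring = f'value: {value:{width}s}'

# Translation via str.format with positional arguments.
from_format = 'value: {0:{1}s}'.format(value, width)

# Translation via explicit __format__ calls: the inner spec is built
# first ('10s'), then passed to the outer object's __format__.
inner_spec = ''.join([width.__format__(''), 's'])
from_join = ''.join(['value: ', value.__format__(inner_spec)])

assert from_fstring == from_format == from_join
```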

But I don't see any need to modify the PEP. The exact mechanism used
isn't specified. I just want the PEP to be clear that it's using the
__format__ protocol. The implementation can either do so explicitly or
via str.format, and which one might change in the future.

Eric.