Re: [Python-Dev] PEP-498: Literal String Formatting

Aug. 17, 2015

On 8/17/2015 2:24 PM, Guido van Rossum wrote:
...
On Mon, Aug 17, 2015 at 7:13 AM, Eric V. Smith <eric@trueblade.com> wrote:
[...]
    My current plan is to replace an f-string with a call to .format_map:
    >>> foo = 100
    >>> bar = 20
    >>> f'foo: {foo} bar: { bar+1}'
    Would become:
    'foo: {foo} bar: { bar+1}'.format_map({'foo': 100, ' bar+1': 21})
    The string on which format_map is called is the identical string that's
    in the source code. With the exception noted in PEP 498, I think this
    satisfies the principle of least surprise.
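A quick sanity check of the translation above, runnable on any Python that has f-strings (3.6+). The dict passed to `format_map` is written out by hand as a stand-in for what the compiler would generate under this plan. The `{ bar+1}` field works only because `str.format` uses the raw text ` bar+1`, leading space included, as the lookup key.

```python
foo = 100
bar = 20

# What the programmer writes.
via_fstring = f'foo: {foo} bar: { bar+1}'

# The proposed equivalent: the untouched source string plus a mapping keyed
# by the exact expression text (whitespace included).
via_format_map = 'foo: {foo} bar: { bar+1}'.format_map({'foo': foo, ' bar+1': bar + 1})

print(via_fstring)     # foo: 100 bar: 21
print(via_format_map)  # foo: 100 bar: 21
```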
Does this really work? Shouldn't this be using some internal variant of
format_map() that doesn't attempt to interpret the keys in brackets in
any way? Otherwise there'd be problems with the different meaning of
e.g. {a[x]} (unless I misunderstand .format_map() -- I'm assuming it's
just like .format(**blah)).
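The problem is easy to reproduce with today's `str.format_map`. Inside braces it treats `[...]` as an index whose contents are a literal string, so the expression text `a[x]` cannot serve as a plain key. The names `a` and `x` are made up for illustration.

```python
a = {'x': 'string key', 7: 'int key'}
x = 7

# As an f-string, {a[x]} evaluates the variable x, i.e. a[7].
fstring_result = f'{a[x]}'

# str.format parses 'a[x]' as arg name 'a' plus index 'x'; since 'x' is not
# all digits, it is used as the string key 'x'.
format_result = '{a[x]}'.format_map({'a': a})

# Passing the whole expression text as the key doesn't help: format looks up 'a'.
try:
    '{a[x]}'.format_map({'a[x]': a[x]})
except KeyError as exc:
    missing = exc.args[0]

print(fstring_result)  # int key
print(format_result)   # string key
print(missing)         # a
```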
Good point. It will require a function similar to format_map that
doesn't interpret the contents of the braces (except to the extent that
the f-string parser already has to). For argument's sake in point #4
below, let's call this str.format_map_simple.
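A rough sketch of the hypothetical `str.format_map_simple`, written as a plain function because no such method exists. It relies on `string.Formatter.parse`, which splits a template into literal text, field name, format spec and conversion without interpreting the field name, and then looks each field name up verbatim. This is only an approximation: `parse` still treats `!` and `:` as separators, so expressions containing them (`a != b`, slices, dict displays) would be mis-split, and nested fields inside a format spec are not expanded. Handling those cases is the part the real f-string parser would have to take on.

```python
import string

def format_map_simple(template, mapping):
    """Hypothetical variant of str.format_map: each field's text is used
    verbatim as the key, with no attribute or index interpretation."""
    converters = {'r': repr, 's': str, 'a': ascii}
    parts = []
    for literal, field, spec, conv in string.Formatter().parse(template):
        parts.append(literal)
        if field is None:  # trailing literal text with no field after it
            continue
        value = mapping[field]  # the key is 'a[x]', not 'a' indexed by 'x'
        if conv is not None:
            value = converters[conv](value)
        parts.append(format(value, spec))
    return ''.join(parts)

# The {a[x]} case that plain format_map gets wrong:
print(format_map_simple('value: {a[x]}', {'a[x]': 5}))  # value: 5
```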
...
    As I've said elsewhere, we could then have some i18n function look up
    and replace the string before format_map is called on it. As long as it
    leaves the expression text alone, everything will work out fine. There
    are some quirks with having the same expression appear twice, if the
    expression has side effects. But I'm not so worried about that.
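A sketch of that i18n flow, with a plain dict standing in for a real message catalog (`CATALOG` and `translate` are hypothetical names, not gettext's API). The translation may reorder fields but must keep the expression text byte-for-byte. The second half shows one way the duplicate-expression quirk can surface, assuming the mapping is a dict keyed by expression text: a dict keeps one entry per key, so two side-effecting occurrences collapse to a single value.

```python
# Hypothetical catalog: source string -> translated string.
# The translation reorders the fields but leaves the expression text alone.
CATALOG = {
    'foo: {foo} bar: { bar+1}': 'BAR={ bar+1}, FOO={foo}',
}

def translate(template):
    """Return the translated template, or the original if none exists."""
    return CATALOG.get(template, template)

foo, bar = 100, 20
values = {'foo': foo, ' bar+1': bar + 1}
localized = translate('foo: {foo} bar: { bar+1}').format_map(values)
print(localized)  # BAR=21, FOO=100

# f'{next(it)} {next(it)}' would give '1 2', but a mapping keyed by the
# expression text can hold only one value for 'next(it)'.
it = iter([1, 2])
dup_values = {}
for expr_text in ['next(it)', 'next(it)']:
    # Each occurrence is evaluated; the second assignment overwrites the first.
    dup_values[expr_text] = next(it)
collapsed = '{next(it)} {next(it)}'.format_map(dup_values)
print(collapsed)  # 2 2
```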
The more I hear Barry's objections against arbitrary expressions from
the i18n POV the more I am thinking that this is just a square peg and a
round hole situation, and we should leave i18n alone. The requirements
for i18n are just too different than the requirements for other use
cases (i18n cares deeply about preserving the original text of the {...}
interpolations; the opposite is the case for the other use cases).
I think it would be possible to create a version of this that works for
both i18n and regular interpolation. As I see it, the open issues are:

1. Barry wants the substitutions to look like $identifier and possibly
${identifier}, and the PEP 498 proposal just uses {}.

2. There needs to be a way to identify interpolated strings and i18n
strings, and possibly combinations of those. This leads to PEP 501's i-
and iu- strings.

3. A way to enforce identifiers-only, instead of generalized expressions.

4. We need a "safe substitution" mode for str.format_map_simple (from
above).

#1 is just a matter of preference: there's no technical reason to prefer
{} over $ or ${}. We can make any decision here. I prefer {} because
it's the same as str.format.
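For comparison, both substitution styles already exist in the standard library: `string.Template` (PEP 292) uses `$identifier` and `${identifier}`, while `str.format` uses `{}`. The example values are made up.

```python
from string import Template

values = {'who': 'tim', 'what': 'kung pao'}

# $-style, as used by string.Template.
dollar = Template('$who likes ${what}').substitute(values)

# {}-style, as used by str.format and proposed for PEP 498.
braces = '{who} likes {what}'.format_map(values)

print(dollar)  # tim likes kung pao
print(braces)  # tim likes kung pao
```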

#2 needs to be decided in concert with the tooling needed to extract the
strings from the source code. The particular prefixes are up for debate.
I'm not a big fan of using "u" to have a meaning different from its
current "do nothing" interpretation in 3.5. But really any prefixes will
do, if we decide to use string prefixes. I think that's the question: do
we want to distinguish among these cases using string prefixes or
combinations thereof?
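Whatever marker is chosen, the extraction tool has to find it statically. As a point of comparison with string prefixes, here is a small `ast`-based extractor for the existing gettext convention of wrapping translatable strings in a call to `_()`, the convention tools such as pygettext already rely on. `extract_marked` is a hypothetical helper, and the sketch assumes Python 3.8+ for `ast.Constant`.

```python
import ast

def extract_marked(source, marker='_'):
    """Return string literals passed as the single argument to marker(...)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == marker
                and len(node.args) == 1
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(node.args[0].value)
    return found

SOURCE = '''
greeting = _('Hello, {name}')
debug = 'not for translation: {x}'
print(_('Goodbye'))
'''

# ast.walk is breadth-first, so sort for a stable order.
print(sorted(extract_marked(SOURCE)))  # ['Goodbye', 'Hello, {name}']
```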

#3 is doable, either at runtime or in the tooling that does the string
extraction.
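A sketch of the runtime flavor of #3: reject any template whose fields are not plain identifiers. The helper name `check_identifiers_only` is hypothetical; it reuses `string.Formatter.parse` to find field names and `str.isidentifier` plus `keyword.iskeyword` to vet them. The same check could run in the extraction tool instead.

```python
import keyword
import string

def check_identifiers_only(template):
    """Raise ValueError if any replacement field is not a plain identifier.

    Nested fields inside a format spec (e.g. '{x:{w}}') are not checked.
    """
    for _literal, field, _spec, _conv in string.Formatter().parse(template):
        if field is None:  # literal text with no field after it
            continue
        if not field.isidentifier() or keyword.iskeyword(field):
            raise ValueError(f'not an identifier: {field!r}')
    return template

check_identifiers_only('Hello, {name}!')  # accepted

rejected = []
for bad in ['{a[x]}', '{ bar+1}', '{}']:
    try:
        check_identifiers_only(bad)
    except ValueError:
        rejected.append(bad)
print(rejected)  # ['{a[x]}', '{ bar+1}', '{}']
```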

#4 is simple, as long as we always turn it on for the localized strings.
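Assuming "safe substitution" means what `string.Template.safe_substitute` already does (leave a placeholder untouched instead of raising when no value is available), a `{}`-style equivalent can be approximated with a `dict` subclass whose `__missing__` puts the braces back. `SafeDict` is an illustrative name; it only handles plain names, and the re-inserted placeholder loses any conversion or format spec.

```python
from string import Template

class SafeDict(dict):
    """Mapping that returns '{key}' for missing keys instead of raising."""
    def __missing__(self, key):
        return '{' + key + '}'

# $-style: the standard library already has a safe mode.
dollar = Template('$name owes $amount').safe_substitute(name='Bob')

# {}-style approximation: format_map consults __missing__ on dict subclasses.
braces = '{name} owes {amount}'.format_map(SafeDict(name='Bob'))

print(dollar)  # Bob owes $amount
print(braces)  # Bob owes {amount}
```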

Personally, I can go either way on including i18n. But I agree it's
beginning to sound like i18n is just too complicated for PEP 498, and I
think PEP 501 is already too complicated. I'd like to make a decision on
this one way or the other, so we can move forward.
...
[...]
    > The understanding here is that there are these new types of tokens:
    > F_STRING_OPEN for f'...{, F_STRING_MIDDLE for }...{, F_STRING_END for
    > }...', and I suppose we also need F_STRING_OPEN_CLOSE for f'...' (i.e.
    > not containing any substitutions). These token types can then be used in
    > the grammar. (A complication would be different kinds of string quotes;
    > I propose to handle that in the lexer; otherwise the number of
    > open/close token types would balloon out of proportion.)
    This would save a few hundred lines of C code. But from a quick glance
    at the lexer, I can't see how to make the opening quotes agree with the
    closing quotes.
The lexer would have to develop another stack for this purpose.
I'll give it some thought.
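To make the "another stack" idea concrete, here is a toy lexer (a hypothetical sketch, not CPython's tokenizer) that emits the token kinds named above. Each time it opens an f-string it pushes that string's quote character, so a `}` that ends a substitution resumes scanning text for the innermost f-string and stops only at its own quote. It also tracks brace depth so braces inside an expression aren't mistaken for the end of a substitution. Plain string literals, triple quotes, other prefixes such as `rf`, `!`/`:` handling and error recovery are all left out.

```python
def scan_text(src, i, quote):
    """Scan f-string literal text from index i.

    Returns (index just past the stop char, stop char), where the stop char
    is '{' (a substitution starts) or the closing quote. '{{' and '}}' are
    escapes and are skipped as literal text.
    """
    while i < len(src):
        c = src[i]
        if c in '{}' and src[i + 1:i + 2] == c:
            i += 2
        elif c == '{' or c == quote:
            return i + 1, c
        else:
            i += 1
    raise SyntaxError('unterminated f-string')


def tokenize(src):
    """Toy lexer for f-strings; NOT CPython's tokenizer."""
    tokens = []
    stack = []  # one [quote, brace_depth] entry per f-string we are inside
    i = 0
    while i < len(src):
        c = src[i]
        if c == 'f' and src[i + 1:i + 2] in ("'", '"'):
            quote = src[i + 1]
            end, stop = scan_text(src, i + 2, quote)
            if stop == quote:
                tokens.append(('F_STRING_OPEN_CLOSE', src[i:end]))
            else:
                tokens.append(('F_STRING_OPEN', src[i:end]))
                stack.append([quote, 0])
            i = end
        elif c == '{' and stack:
            stack[-1][1] += 1  # a brace inside the expression itself
            tokens.append(('OP', c))
            i += 1
        elif c == '}' and stack and stack[-1][1] > 0:
            stack[-1][1] -= 1
            tokens.append(('OP', c))
            i += 1
        elif c == '}' and stack:
            # End of a substitution: resume the innermost f-string's text
            # and stop only at *its* quote character.
            end, stop = scan_text(src, i + 1, stack[-1][0])
            if stop == '{':
                tokens.append(('F_STRING_MIDDLE', src[i:end]))
            else:
                tokens.append(('F_STRING_END', src[i:end]))
                stack.pop()
            i = end
        elif c.isalnum() or c == '_':
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == '_'):
                j += 1
            tokens.append(('NAME', src[i:j]))
            i = j
        elif c.isspace():
            i += 1
        else:
            tokens.append(('OP', c))
            i += 1
    if stack:
        raise SyntaxError('unterminated f-string')
    return tokens


# A nested f-string using the other quote character.
for tok in tokenize("""f'a{x}b{f"c{y}d"}e'"""):
    print(tok)
```

Because the inner f-string pushes `"` on top of `'`, the `"` after `d` closes only the inner string, and the outer one is closed later by its own `'`.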

Eric.