[Python-Dev] PEP-498: Literal String Formatting
Eric V. Smith
eric at trueblade.com
Mon Aug 17 22:26:13 CEST 2015
On 8/17/2015 2:24 PM, Guido van Rossum wrote:
> On Mon, Aug 17, 2015 at 7:13 AM, Eric V. Smith <eric at trueblade.com> wrote:
>
> [...]
> My current plan is to replace an f-string with a call to .format_map:
> >>> foo = 100
> >>> bar = 20
> >>> f'foo: {foo} bar: { bar+1}'
>
> Would become:
> 'foo: {foo} bar: { bar+1}'.format_map({'foo': 100, ' bar+1': 21})
>
> The string on which format_map is called is the identical string that's
> in the source code. With the exception noted in PEP 498, I think this
> satisfies the principle of least surprise.
>
>
> Does this really work? Shouldn't this be using some internal variant of
> format_map() that doesn't attempt to interpret the keys in brackets in
> any ways? Otherwise there'd be problems with the different meaning of
> e.g. {a[x]} (unless I misunderstand .format_map() -- I'm assuming it's
> just like .format(**blah)).
Good point. It will require a function similar to format_map that
doesn't interpret the contents of the braces (except to the extent that
the f-string parser already has to). For argument's sake in point #4
below, let's call this str.format_map_simple.
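Roughly, the behavior would be something like this pure-Python sketch
(the name and the code here are only for illustration; conversions,
format specs, and {{ }} escapes are ignored for brevity):

import re

def format_map_simple(template, mapping):
    # Treat the entire contents of each brace pair as an opaque key:
    # no attribute access, no indexing, no format-spec handling.
    def replace(match):
        return str(mapping[match.group(1)])
    return re.sub(r'\{([^{}]*)\}', replace, template)

# format_map_simple('foo: {foo} bar: { bar+1}',
#                   {'foo': 100, ' bar+1': 21})
# -> 'foo: 100 bar: 21'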
> As I've said elsewhere, we could then have some i18n function look up
> and replace the string before format_map is called on it. As long as it
> leaves the expression text alone, everything will work out fine. There
> are some quirks with having the same expression appear twice, if the
> expression has side effects. But I'm not so worried about that.
>
>
> The more I hear Barry's objections against arbitrary expressions from
> the i18n POV the more I am thinking that this is just a square peg and a
> round hole situation, and we should leave i18n alone. The requirements
> for i18n are just too different than the requirements for other use
> cases (i18n cares deeply about preserving the original text of the {...}
> interpolations; the opposite is the case for the other use cases).
I think it would be possible to create a version of this that works for
both i18n and regular interpolation. The open issues are:
1. Barry wants the substitutions to look like $identifier and possibly
${identifier}, and the PEP 498 proposal just uses {}.
2. There needs to be a way to identify interpolated strings and i18n
strings, and possibly combinations of those. This leads to PEP 501's i-
and iu- strings.
3. A way to enforce identifiers-only, instead of generalized expressions.
4. We need a "safe substitution" mode for str.format_map_simple (from
above).
#1 is just a matter of preference: there's no technical reason to prefer
{} over $ or ${}. We can make any decision here. I prefer {} because
it's the same as str.format.
#2 needs to be decided in concert with the tooling needed to extract the
strings from the source code. The particular prefixes are up for debate.
I'm not a big fan of using "u" to have a meaning different from its
current "do nothing" interpretation in 3.5. But really any prefixes will
do, if we decide to use string prefixes. I think that's the question: do
we want to distinguish among these cases using string prefixes or
combinations thereof?
#3 is doable, either at runtime or in the tooling that does the string
extraction.
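For the runtime side of #3, the check could be as small as something
like this (a sketch only; string.Formatter.parse already knows how to
split out the replacement fields):

import string

def check_identifiers_only(template):
    # Reject any replacement field that isn't a plain identifier.
    for _, field, _, _ in string.Formatter().parse(template):
        if field is not None and not field.isidentifier():
            raise ValueError('not a plain identifier: %r' % field)

# check_identifiers_only('foo: {foo}')      # fine
# check_identifiers_only('foo: { bar+1}')   # raises ValueError

An extraction tool could apply the same test to the fields it pulls
out of the source.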
#4 is simple, as long as we always turn it on for the localized strings.
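There's already a stdlib precedent for that mode in
string.Template.safe_substitute, which leaves unknown placeholders
alone instead of raising:

>>> from string import Template
>>> Template('foo: $foo bar: $bar').safe_substitute({'foo': 100})
'foo: 100 bar: $bar'

The {}-based equivalent would just mean having the substitution
function leave missing keys in place rather than raise KeyError.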
Personally I can go either way on including i18n. But I agree it's
beginning to sound like i18n is just too complicated for PEP 498, and I
think PEP 501 is already too complicated. I'd like to make a decision on
this one way or the other, so we can move forward.
> [...]
> > The understanding here is that there are these new types of tokens:
> > F_STRING_OPEN for f'...{, F_STRING_MIDDLE for }...{, F_STRING_END for
> > }...', and I suppose we also need F_STRING_OPEN_CLOSE for f'...' (i.e.
> > not containing any substitutions). These token types can then be used in
> > the grammar. (A complication would be different kinds of string quotes;
> > I propose to handle that in the lexer, otherwise the number of
> > open/close token types would balloon out of proportions.)
>
> This would save a few hundred lines of C code. But a quick glance at the
> lexer and I can't see how to make the opening quotes agree with the
> closing quotes.
>
>
> The lexer would have to develop another stack for this purpose.
I'll give it some thought.
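For reference, under that scheme the example above would tokenize
roughly like this (the F_STRING_* names are from the proposal being
quoted, not anything that exists in the tokenizer today):

f'foo: {foo} bar: { bar+1}'

    F_STRING_OPEN     f'foo: {
    <expression tokens for: foo>
    F_STRING_MIDDLE   } bar: {
    <expression tokens for: bar+1>
    F_STRING_END      }'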
Eric.