[Python-ideas] Briefer string format

Steven D'Aprano steve at pearwood.info
Sat Jul 25 08:05:58 CEST 2015


TL;DR:

Please let's just ban implicit concatenation between f strings (a 
runtime function call) and non-f strings. The user should be explicit in 
what they want, using either explicitly escaped braces or the + 
operator. Anything else is going to be surprising.


On Thu, Jul 23, 2015 at 06:57:25PM -0700, Bruce Leban wrote:
> On Thu, Jul 23, 2015 at 7:22 AM, Steven D'Aprano <steve at pearwood.info>
> wrote:
> 
> >
> > If I had a dollar for everytime somebody on the Internet misused
> > "strawman argument", I would be a rich man.
> 
> 
> You wouldn't get a dollar here. If you want to be strict, a strawman
> argument is misrepresenting an opponent's viewpoint to make it easier to
> refute but it also applies to similar arguments.

Are you saying that any good faith disagreement about people's position 
is a strawman? If not, I don't understand what you mean by "similar 
arguments".

A strawman argument is explicitly a bad-faith argument. Describing my 
argument as a strawman implies bad faith on my part. I don't mind if you 
think my argument is wrong, mistaken or even incoherent, but it is not 
made in bad faith and you should not imply that it is without good reason.

Moving on to the feature:

> You stated that "constant
> folding ... *would* change the semantics" *[emphasis added]*.

In context, I said that constant-folding the *explicit* + concatenation 
of f'{a}' + '{b}' to f'{a}{b}' would change the semantics. I'm sorry if 
it was not clear enough that I specifically meant that. I thought that 
the context was enough to show what I meant.

By constant-folding, I mean when the parser/lexer/compiler/whatever (I 
really don't care which) folds expressions like the following:

    'a' + 'b'

to this:

    'ab'

If the parser/whatever does that to mixed f and non-f strings, I think 
that would be harmful, because it would change the semantics:

    f'{a}' + '{b}'

executed at runtime with no constant-folding is not equivalent to the 
folded version:

    f'{a}{b}'

Hence, the peephole optimizer should not do that. I hoped that wouldn't 
be controversial.
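
To make the difference concrete, here is a rough sketch. f strings don't 
exist yet, so str.format serves as a stand-in, and the helper name f() 
is made up purely for illustration; it is not part of any proposal.

```python
# Hypothetical stand-in for the proposed f'...' syntax: format a
# template against some names at runtime.
def f(template, **namespace):
    return template.format(**namespace)

a, b = 'A', 'B'

# Plain literals: folding is invisible, the value is the same either way.
assert 'a' + 'b' == 'ab'

# f'{a}' + '{b}' executed at runtime, with no folding: only the
# first operand is formatted.
unfolded = f('{a}', a=a, b=b) + '{b}'

# What a naive folder would turn it into: f'{a}{b}'.
folded = f('{a}{b}', a=a, b=b)

print(unfolded)  # A{b}
print(folded)    # AB
```

The two results differ, so the fold is not a pure optimization in the 
mixed case.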


[...]
> So the straw here is imagining that the implementer of
> this feature would ignore the accepted rules regarding constant folding and
> then criticizing the implementer for doing that.

I'm taken aback that you seem to think my pointing out the above is a 
criticism of an implementer who doesn't even exist yet! We're still 
discussing what the semantics of f strings should be, and I don't think 
anyone should be offended or threatened by me being explicit about what 
the behaviour should be.

And for the record, it is not unheard of for constant-folding peephole 
optimizers to accidentally, or deliberately, change the semantics of 
code. For example, in D constant-folded 0.1 + 0.2 is not the same as 0.1 
+ 0.2 done at runtime (constant folding is done at single precision 
instead of double precision):

http://stackoverflow.com/questions/6874357/why-0-1-0-2-0-3-in-d

This paper discusses the many pitfalls of optimizing floating point 
code, and mentions that C may change the value of literal expressions 
depending on whether they are done at runtime or not:

    Another effect of this pragma is to change how much the 
    compiler can evaluate at compile time regarding constant 
    initialisations. [...] If it is set to OFF, the compiler can 
    evaluate floating-point constants at compile time, whereas
    if they had been evaluated at runtime, they would have 
    resulted in different values (because of different rounding
    modes) or floating-point exception.

http://arxiv.org/pdf/cs/0701192.pdf


Constant-folding *shouldn't* change the semantics of code, but 
programmers are only human. They make bad design decisions or write 
buggy code the same as all of us.


> (3) The hard case, when you mix f and non-f strings.
> >
> >     f'{spam}' '{eggs}'
> >
> > Notwithstanding raw strings, the behaviour which makes sense to me is
> > that the implicit string concatenation occurs first, followed by format.
> >
> 
> You talk about which happens "first" so let's recast this as an operator
> precedence question. Think of f as a unary operator. Does f bind tighter
> than implicit concatenation? Well, all other string operators like this
> bind more tightly than concatenation.
> f'{spam}' '{eggs}'

I don't think this is correct. Can you give an example? All the examples 
I can come up with show implicit concatenation binding more tightly 
(i.e. it occurs first), e.g.:

py> 'a' 'ba'.replace('a', 'z')
'zbz'

not 'abz'. And of course, you can't implicitly concat to a method call:

py> 'a'.replace('a', 'z') 'ba'
  File "<stdin>", line 1
    'a'.replace('a', 'z') 'ba'
                             ^
SyntaxError: invalid syntax


So I think it would be completely unprecedented if the f pseudo-operator 
bound more tightly than the implicit concatenation.
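
One way to check this without relying on the output alone is to look at 
the parse tree. This is only a sketch using the standard ast module; the 
variable names are mine.

```python
import ast

source = "'a' 'ba'.replace('a', 'z')"
tree = ast.parse(source, mode='eval')

call = tree.body            # the .replace(...) call
receiver = call.func.value  # the object .replace is called on

# The two adjacent literals are already a single string by the time
# the method call sees them.
print(ast.literal_eval(receiver))  # aba
print(eval(source))                # zbz
```

The receiver of the method call is the already-joined 'aba', which is 
what I mean by implicit concatenation occurring first.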


> > Secondly, it feels that this does the concatenation in the wrong order.
> > Implicit concatenation occurs as early as possible in every other case.
> > But here, we're delaying the concatenation until after the format. So
> > this feels wrong to me.
> >
> 
> Implicit concatenation does NOT happen as early as possible in every case.
> When I write:
> 
>     r'a\n' 'b\n'  ==>  'a\\nb\n'
> 
> the r is applied to the first string *before* the concatenation with the
> second string.

r isn't a function, it's syntax. There's nothing to apply. This is why I 
don't think that the behaviour of mixed raw and cooked strings is a good 
model for mixing f and non-f strings. Both raw and cooked strings are 
lexical features and should be read from left to right, in the order 
that they occur, not function calls which must be delayed until runtime.
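
For reference, a small sketch of the raw/cooked case as it behaves 
today. Each literal is decoded by its own rules and nothing is deferred 
to runtime:

```python
# Raw literal followed by a cooked literal, implicitly concatenated.
mixed = r'a\n' 'b\n'

# The same string written as a single cooked literal.
explicit = 'a\\nb\n'

print(mixed == explicit)  # True
print(len(mixed))         # 5: a, backslash, n, b, newline
```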


[...]
> Imagine that we have another prefix that escapes strings for regex. That is
> e'a+b' ==> 'a\\+b'. This is another function call in disguise, just calling
> re.escape. 

Now you're the one confusing interface with implementation :-) Such an e 
string need not be a function call, it could be a lexical feature like 
raw strings. In fact, I would expect that they should be.

These hypothetical e strings could be a lexical feature, or a runtime 
function, but f *must* be a runtime function since the variables being 
interpolated don't have values to interpolate until runtime. We have no 
choice in the matter, whereas we do have a choice with e strings.

In any case, I don't think it is a productive use of our time to discuss 
a hypothetical e string that neither of us intends to propose.


> Maybe you can't say that concatenation is an optimization but I can (new
> text underlined):
>
> Multiple adjacent string or bytes literals (delimited by whitespace),
> possibly using different quoting conventions, are allowed, and their
> meaning is the same as their concatenation. ... Thus, "hello" 'world' is
> equivalent to "helloworld". This feature can be used to reduce the number
> of backslashes needed, to split long strings conveniently across long
> lines, *to mix formatted and unformatted strings,* or even to add comments
> to parts of strings, for example:
>
> re.compile("[A-Za-z_]"       # letter or underscore
>            "[A-Za-z0-9_]*"   # letter, digit or underscore
>           )
> Note that this feature is defined at the syntactical level, but implemented
> at compile time *as an optimization*.

I don't think that flies. It's *not just an optimization* when it comes 
to f strings. It makes a difference to the semantics.

    f'{spam}' '{eggs}'

being turned into "format first, then concat" has a very different 
meaning to "concat first, then format".

To get the semantics you want, you need a third option:

    escape first, then concat, then format

But there's nothing obvious in the syntax '{eggs}' that tells anyone 
when it will be escaped and when it won't be. You need to be aware of 
the special case "when implicitly concat'ed to f strings, BUT NO OTHER 
TIME, braces in ordinary strings will be escaped".

I dislike special cases. They increase the number of things to memorise 
and lead to surprises.
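
Here is a sketch of the three orderings side by side. Again str.format 
stands in for the f prefix, and escape_braces() is a hypothetical helper 
I've invented to model the proposed automatic escaping.

```python
ns = {'spam': 'SPAM', 'eggs': 'EGGS'}

def escape_braces(text):
    """Double the braces so str.format treats them as literal text."""
    return text.replace('{', '{{').replace('}', '}}')

# 1. Format first, then concat.
format_then_concat = '{spam}'.format(**ns) + '{eggs}'

# 2. Concat first, then format.
concat_then_format = ('{spam}' + '{eggs}').format(**ns)

# 3. Escape first, then concat, then format.
escape_concat_format = ('{spam}' + escape_braces('{eggs}')).format(**ns)

print(format_then_concat)    # SPAM{eggs}
print(concat_then_format)    # SPAMEGGS
print(escape_concat_format)  # SPAM{eggs}
```

Options 1 and 3 give the same result, but option 3 only gets there by 
silently rewriting the second literal, which is the special case I am 
objecting to.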


> *If formatted strings are
> mixed with unformatted strings, they are concatenated at compile time and
> the unformatted parts are escaped so they will not be subject to format
> substitutions.*

That's your opinion of what the desirable behaviour should be.

I don't like it, I don't expect it. The fact that you have to explicitly 
document it shows that it is a special case that doesn't follow from the 
existing behaviour of Python's implicit concatenation rules. I don't 
think we should have such a special case, when there are already at 
least two other ways to get the same effect.

But since my preferred suggestion is unpopular, I'd much rather just ban 
implicit concat'ing of f and non-f strings and avoid the whole argument. 
That's not an onerous burden on the coder:

    result = (f'{this}' + '{that}')

is not that much more difficult to type than:

    result = (f'{this}' '{that}')


and it makes the behaviour clear.


-- 
Steve

