
TL;DR: Please let's just ban implicit concatenation between f strings (a runtime function call) and non-f strings. The user should be explicit in what they want, using either explicitly escaped braces or the + operator. Anything else is going to be surprising.

On Thu, Jul 23, 2015 at 06:57:25PM -0700, Bruce Leban wrote:
On Thu, Jul 23, 2015 at 7:22 AM, Steven D'Aprano <steve@pearwood.info> wrote:
If I had a dollar for every time somebody on the Internet misused "strawman argument", I would be a rich man.
You wouldn't get a dollar here. If you want to be strict, a strawman argument is misrepresenting an opponent's viewpoint to make it easier to refute but it also applies to similar arguments.
Are you saying that any good faith disagreement about people's position is a strawman? If not, I don't understand what you mean by "similar arguments". A strawman argument is explicitly a bad-faith argument. Describing my argument as a strawman implies bad faith on my part. I don't mind if you think my argument is wrong, mistaken or even incoherent, but it is not made in bad faith and you should not imply that it is without good reason.

Moving on to the feature:
You stated that "constant folding ... *would* change the semantics" *[emphasis added]*.
In context, I said that constant-folding the *explicit* + concatenation of f'{a}' + '{b}' to f'{a}{b}' would change the semantics. I'm sorry if it was not clear enough that I specifically meant that. I thought that the context was enough to show what I meant.

By constant-folding, I mean when the parser/lexer/compiler/whatever (I really don't care which) folds expressions like the following:

    'a' + 'b'

to this:

    'ab'

If the parser/whatever does that to mixed f and non-f strings, I think that would be harmful, because it would change the semantics:

    f'{a}' + '{b}'

executed at runtime with no constant-folding is not equivalent to the folded version:

    f'{a}{b}'

Hence, the peephole optimizer should not do that. I hoped that wouldn't be controversial.

[...]
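As a rough illustration of constant-folding on ordinary (non-f) strings, the sketch below compiles 'a' + 'b' and looks at the constants stored in the resulting code object. It assumes CPython; whether the folded value shows up in co_consts is an implementation detail, not something the language guarantees.

```python
# Sketch: observe constant folding of plain string literals.
# Assumes CPython; the contents of co_consts are an implementation detail.
code = compile("'a' + 'b'", "<example>", "eval")

# On builds that fold constants, the joined string is stored directly.
folded = "ab" in code.co_consts
print("folded at compile time:", folded)

# Folded or not, the observable result has to be the same.
result = eval(code)
print(result)
```

For plain strings the folding is harmless because the result is identical either way; the concern above is that the same cannot be said once one operand is an f string.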
So the straw here is imagining that the implementer of this feature would ignore the accepted rules regarding constant folding and then criticizing the implementer for doing that.
I'm taken aback that you seem to think my pointing out the above is a criticism of an implementer who doesn't even exist yet! We're still discussing what the semantics of f strings should be, and I don't think anyone should be offended or threatened by me being explicit about what the behaviour should be.

And for the record, it is not unheard of for constant-folding peephole optimizers to accidentally, or deliberately, change the semantics of code. For example, in D constant-folded 0.1 + 0.2 is not the same as 0.1 + 0.2 done at runtime (constant folding is done at single precision instead of double precision):

http://stackoverflow.com/questions/6874357/why-0-1-0-2-0-3-in-d

This paper discusses the many pitfalls of optimizing floating point code, and mentions that C may change the value of literal expressions depending on whether they are done at runtime or not:

    Another effect of this pragma is to change how much the compiler can evaluate at compile time regarding constant initialisations. [...] If it is set to OFF, the compiler can evaluate floating-point constants at compile time, whereas if they had been evaluated at runtime, they would have resulted in different values (because of different rounding modes) or floating-point exception.

http://arxiv.org/pdf/cs/0701192.pdf

Constant-folding *shouldn't* change the semantics of code, but programmers are only human. They make bad design decisions or write buggy code the same as all of us.
(3) The hard case, when you mix f and non-f strings.
f'{spam}' '{eggs}'
Notwithstanding raw strings, the behaviour which makes sense to me is that the implicit string concatenation occurs first, followed by format.
You talk about which happens "first" so let's recast this as an operator precedence question. Think of f as a unary operator. Does f bind tighter than implicit concatenation? Well, all other string operators like this bind more tightly than concatenation. f'{spam}' '{eggs}'
I don't think this is correct. Can you give an example? All the examples I can come up with show implicit concatenation binding more tightly (i.e. it occurs first), e.g.:

    py> 'a' 'ba'.replace('a', 'z')
    'zbz'

not 'abz'. And of course, you can't implicitly concat to a method call:

    py> 'a'.replace('a', 'z') 'ba'
      File "<stdin>", line 1
        'a'.replace('a', 'z') 'ba'
                                 ^
    SyntaxError: invalid syntax

So I think it would be completely unprecedented if the f pseudo-operator bound more tightly than the implicit concatenation.
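For anyone who wants to check the binding order without an interactive session, here is a small sketch. The variable names are only for illustration.

```python
# Implicit concatenation joins the adjacent literals before .replace() runs.
joined_then_replaced = 'a' 'ba'.replace('a', 'z')

# Spelled out with parentheses, this is the same expression.
explicit_grouping = ('a' 'ba').replace('a', 'z')

# With an explicit +, the method call binds first instead.
method_first = 'a' + 'ba'.replace('a', 'z')

print(joined_then_replaced)  # zbz
print(explicit_grouping)     # zbz
print(method_first)          # abz

# A literal cannot be implicitly concatenated onto a call result.
try:
    compile("'a'.replace('a', 'z') 'ba'", "<example>", "eval")
except SyntaxError:
    rejected = True
else:
    rejected = False
print(rejected)  # True
```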
Secondly, it feels that this does the concatenation in the wrong order. Implicit concatenation occurs as early as possible in every other case. But here, we're delaying the concatenation until after the format. So this feels wrong to me.
Implicit concatenation does NOT happen as early as possible in every case. When I write:
r'a\n' 'b\n' ==> 'a\\nb\n'
the r is applied to the first string *before* the concatenation with the second string.
r isn't a function, it's syntax. There's nothing to apply. This is why I don't think that the behaviour of mixed raw and cooked strings is a good model for mixing f and non-f strings. Both raw and cooked strings are lexical features and should be read from left to right, in the order that they occur, not function calls which must be delayed until runtime. [...]
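A short sketch of the raw/cooked case being discussed: each literal is read according to its own prefix, and the resulting pieces are joined in the order they appear.

```python
# The r prefix only affects how the first literal is read;
# the second literal is cooked as usual.
mixed = r'a\n' 'b\n'

print(repr(mixed))
# a, backslash, n, b, newline
print(len(mixed))  # 5

# Swapping the prefixes moves the literal backslash to the second piece.
swapped = 'a\n' r'b\n'
print(repr(swapped))
```

Nothing here is evaluated at runtime: both spellings are fully determined by the source text, which is the sense in which raw strings are a lexical feature.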
Imagine that we have another prefix that escapes strings for regex. That is e'a+b' ==> 'a\\+b'. This is another function call in disguise, just calling re.escape.
Now you're the one confusing interface with implementation :-) Such an e string need not be a function call, it could be a lexical feature like raw strings. In fact, I would expect that they should be. These hypothetical e strings could be a lexical feature, or a runtime function, but f *must* be a runtime function since the variables being interpolated don't have values to interpolate until runtime. We have no choice in the matter, whereas we do have a choice with e strings. In any case, I don't think it is a productive use of our time to discuss a hypothetical e string that neither of us intends to propose.
Maybe you can't say that concatenation is an optimization but I can (new text underlined):
Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. ... Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, *to mix formatted and unformatted strings,* or even to add comments to parts of strings, for example:
    re.compile("[A-Za-z_]"       # letter or underscore
               "[A-Za-z0-9_]*"   # letter, digit or underscore
              )

Note that this feature is defined at the syntactical level, but implemented at compile time *as an optimization*.
I don't think that flies. It's *not just an optimization* when it comes to f strings. It makes a difference to the semantics.

    f'{spam}' '{eggs}'

being turned into "format first, then concat" has a very different meaning to "concat first, then format". To get the semantics you want, you need a third option:

    escape first, then concat, then format

But there's nothing obvious in the syntax '{eggs}' that tells anyone when it will be escaped and when it won't be. You need to be aware of the special case "when implicitly concat'ed to f strings, BUT NO OTHER TIME, braces in ordinary strings will be escaped".

I dislike special cases. They increase the number of things to memorise and lead to surprises.
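The three orderings can be modelled with today's str.format. This is only a sketch: str.format stands in for the proposed f string formatting step, and the names values, f_part, plain_part and escape are made up for the illustration.

```python
# str.format stands in for the proposed f-string formatting step.
values = {"spam": "SPAM", "eggs": "EGGS"}

f_part = "{spam}"      # plays the role of f'{spam}'
plain_part = "{eggs}"  # plays the role of the ordinary string '{eggs}'


def escape(s):
    """Double the braces so that format() treats them as literal text."""
    return s.replace("{", "{{").replace("}", "}}")


# Option 1: format first, then concat.
format_then_concat = f_part.format(**values) + plain_part

# Option 2: concat first, then format.
concat_then_format = (f_part + plain_part).format(**values)

# Option 3: escape first, then concat, then format.
escape_concat_format = (f_part + escape(plain_part)).format(**values)

print(format_then_concat)    # SPAM{eggs}
print(concat_then_format)    # SPAMEGGS
print(escape_concat_format)  # SPAM{eggs}
```

In this simple case options 1 and 3 produce the same text, while option 2 interpolates eggs as well. The escaping step in option 3 is exactly the part that is invisible in the source '{eggs}'.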
*If formatted strings are mixed with unformatted strings, they are concatenated at compile time and the unformatted parts are escaped so they will not be subject to format substitutions.*
That's your opinion for the desirable behaviour. I don't like it, I don't expect it. The fact that you have to explicitly document it shows that it is a special case that doesn't follow from the existing behaviour of Python's implicit concatenation rules. I don't think we should have such a special case, when there are already at least two other ways to get the same effect.

But since my preferred suggestion is unpopular, I'd much rather just ban implicit concat'ing of f and non-f strings and avoid the whole argument. That's not an onerous burden on the coder:

    result = (f'{this}' + '{that}')

is not that much more difficult to type than:

    result = (f'{this}' '{that}')

and it makes the behaviour clear.

-- 
Steve