
On Wed, Jul 22, 2015 at 09:28:19PM -0700, Bruce Leban wrote:
On Wed, Jul 22, 2015 at 8:31 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Constant-folding 'a' + 'b' to 'ab' is an optimization, it doesn't change the semantics of the concat. But constant-folding f'{a}' + '{b}' would change the semantics of the concatenation, because f strings aren't constants, they only look like them.
It doesn't have to change semantics and it shouldn't. This is a strawman argument.
If I had a dollar for every time somebody on the Internet misused "strawman argument", I would be a rich man. Just because you disagree with me or think I'm wrong doesn't make my argument a strawman. It just makes me wrong-headed, or wrong :-)

I'm having trouble understanding what precisely you are disagreeing with. The example I give which you quote involves explicit concatenation with the + operator, but your examples below use implicit concatenation with no operator at all.

Putting aside the question of implementation, I think:

(1) Explicit concatenation with the + operator should be treated as occurring after the f strings are evaluated, *as if* the following occurs:

    f'{spam}' + '{eggs}'
    => compiles to format(spam) + '{eggs}'

If you can come up with a clever optimization that avoids the need to *actually* build two temporary strings and then concatenate them, I don't have a problem with that. I'm only talking about the semantics. I don't want this:

    f'{spam}' + '{eggs}'
    => compiles to format(spam) + format(eggs)  # not this!

Do you agree with those semantics for explicit + concatenation? If not, what behaviour do you want?

(2) Implicit concatenation should occur as early as possible, before the format. Take the easy case first: both fragments are f-strings.

    f'{spam}' f'{eggs}'
    => behaves as if you wrote f'{spam}{eggs}'
    => which compiles to format(spam) + format(eggs)

Do you agree with those semantics for implicit concatenation?

(3) The hard case, when you mix f and non-f strings.

    f'{spam}' '{eggs}'

Notwithstanding raw strings, the behaviour which makes sense to me is that the implicit string concatenation occurs first, followed by format. So, semantically, if the parser sees the above, it should concat the string:

    => f'{spam}{eggs}'

then transform it to a call to format:

    => format(spam) + format(eggs)

I described that as the f "infecting" the other string. Guido has said he doesn't like this, but I'm not sure what behaviour he wants instead.

I don't think I want this behaviour:

    f'{spam}' '{eggs}'
    => format(spam) + '{eggs}'

for two reasons. Firstly, I already have (at least!) one way of getting that behaviour, such as explicit + concatenation as above.

Secondly, it feels that this does the concatenation in the wrong order. Implicit concatenation occurs as early as possible in every other case. But here, we're delaying the concatenation until after the format. So this feels wrong to me.

(Again, I'm talking semantics, not implementation. Clever tricks with escaping the brackets don't matter.)

If there's no consensus on the behaviour of mixed f and non-f strings with implicit concatenation, rather than pick one and frustrate and surprise half the users, we should make it an error:

    f'{spam}' '{eggs}'
    => raises SyntaxError

and require people to be explicit about what they want, e.g.:

    f'{spam}' + '{eggs}'  # concatenation occurs after the format()
    f'{spam}' f'{eggs}'   # implicit concatenation before format()

(For the avoidance of doubt, I don't care whether the concatenation *actually* occurs after the format. I'm only talking about semantics, not implementation. Sorry to keep beating this dead horse.)
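To make the two candidate semantics for the mixed case concrete, here is a rough sketch that emulates them with str.format, so it runs without any f-string support. The helper names (concat_then_format, format_then_concat) and the (is_f, text) fragment representation are made up for illustration, and str.format with plain names only approximates what an f string would do.

```python
# Hypothetical emulation of the two candidate semantics for
#     f'{spam}' '{eggs}'
# Each fragment is (is_f, text): is_f marks an f-prefixed fragment.

def concat_then_format(fragments, namespace):
    """Join every fragment first, then format the joined string once.

    This models the f prefix "infecting" the non-f fragments.
    """
    joined = "".join(text for _, text in fragments)
    return joined.format(**namespace)


def format_then_concat(fragments, namespace):
    """Format only the f-prefixed fragments, then join the pieces."""
    return "".join(
        text.format(**namespace) if is_f else text
        for is_f, text in fragments
    )


fragments = [(True, "{spam}"), (False, "{eggs}")]
namespace = {"spam": "SPAM", "eggs": "EGGS"}

print(concat_then_format(fragments, namespace))  # SPAMEGGS
print(format_then_concat(fragments, namespace))  # SPAM{eggs}
```

The two functions disagree on exactly the mixed case, which is why the order cannot be dismissed as an implementation detail once f strings are involved.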
I would go further and allow all the f prefixes apart from the first to be optional. To put it another way, the first f prefix "infects" all the other string fragments:
I'd call that a bug. I suppose one person's bug is another person's feature. It violates the principle of least surprise. When I look at a line in isolation and it starts and ends with a quote, I would expect it to be just a plain string.
I don't think we can look at strings in isolation line-by-line.

    s = r'''This is a long \raw s\tring that goes
    over mul\tiple lines and contains "\backslashes"
    okay?
    '''
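As a small illustration with the string prefixes that already exist (the variable names are just for this sketch): a raw prefix governs the whole literal however many lines it spans, while under implicit concatenation each fragment keeps its own prefix.

```python
# A raw prefix applies to the whole literal, not to one source line.
s = r'''first \line
second \tline'''
assert "\\t" in s        # backslash followed by "t" is kept literally
assert "\t" not in s     # no tab character was produced on line two

# With implicit concatenation, each fragment keeps its own prefix:
# the raw fragment is two characters, the cooked one is a newline.
mixed = r'\n' '\n'
print(len(mixed))   # 3
print(repr(mixed))
```

So looking at the second line of s alone would not tell you it is raw; you have to find where the literal starts.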
(Implicit concatenation is a compile-time operation, the format(...) stuff is run-time, so there is a clear and logical order of operations.)
To you, maybe. To the average developer, I doubt it.
I'm not sure if you are complimenting me on being a genius, or putting the average developer down for being even more dimwitted than me :-)
I view the compile time evaluation of implicit concatenation as a compiler implementation detail as it makes essentially no difference to the semantics of the program.
But once you bring f strings into the picture, then it DOES make a very large semantic difference.

    f'{spam}' '{eggs}'

is very different depending on whether that is semantically the same as:

- concat '{spam}' and '{eggs}', then format
- format spam alone, then concat '{eggs}'

We can't just say that when the concatenation actually occurs is an optimization, as we can with raw and cooked string literals, because the f string is not a literal, it's actually a function call in disguise. So we have to pick one or the other (or refuse to guess and raise a syntax error).

You're right that it doesn't have to occur at compile time. (Although that has been the case all the way back to at least Python 1.5.) But it is a syntactic feature:

"Note that this feature is defined at the syntactical level, but implemented at compile time. The ‘+’ operator must be used to concatenate string expressions at run time."

https://docs.python.org/3/reference/lexical_analysis.html#string-literal-con...

which suggests to me that *semantically* it should occur as early as possible, before the format() operation. That is, it should be equivalent to:

- concat '{spam}' and '{eggs}', then format

and not format followed by concat.

You mentioned the principle of least surprise. I think it would be very surprising to have implicit concatenation behave *as if* it were occurring after the format, which is what you get if you escape the {{eggs}}. But YMMV. If we (the community) cannot reach consensus, perhaps the safest thing would be to just refuse to guess and raise an error on implicit concat of f and non-f strings.

--
Steve