Please don't cross-post: anyone replying to your email will split the conversation, since not everyone is subscribed to all of the mailing lists you sent this to. I have stripped out all but python-dev for my reply.
sorry, i’ll remember that! i just hit reply on the post and didn’t realize it was posted to more than python-dev.
I don't remember specifically seeing any email on this. Do you have a link to your post from the python-ideas archive showing your email actually made it to the list?
how do i go about changing the f-literal grammar before the beta hits?
Since the PEP has already been accepted and implemented with the current semantics, you can either post to python-ideas and start a discussion there, or ask here on python-dev for clarification of the reasoning behind the decision.
thanks. i’d like to hereby do the latter. i think the PEP’s wording is pretty clear:
Due to Python's string tokenizing rules, the f-string f'abc {a['x']} def' is invalid. The tokenizer parses this as 3 tokens: f'abc {a[', x, and ']} def'. Just like regular strings, this cannot be fixed by using raw strings. There are a number of correct ways to write this f-string.
i guess that means that python’s string tokenization rules are reused for f-literals, even though they aren’t actually strings.
They are still strings, there is just post-processing on the string itself to do the interpolation.
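You can see this with the stdlib tokenize module (the exact token types printed vary across CPython versions, so treat this as a quick sketch rather than a stable contract):

```python
import io
import tokenize

# Tokenize a line containing an f-literal and print the token stream.
# On CPython 3.6 the whole f-literal comes back as one STRING token;
# the interpolation is done as a later post-processing step.
src = "x = f'abc {a[0]} def'\n"
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(src).readline)
]
for name, text in tokens:
    print(name, repr(text))
```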
could someone please explain whether this is right and, if so, why it was decided to do this instead of writing more fitting tokenization code?
My suspicion is simplification of the implementation, but Eric Smith can tell me if I'm wrong. By doing it this way, the implementation can use Python itself to do the tokenizing of the string, while if you did the string interpolation beforehand you would then need to do it entirely at the C level, which is very messy and painful since you're explicitly avoiding Python's automatic handling of Unicode, etc. As I think you pointed out, doing it the way it is currently implemented allows for re-using the str.format() code, which is way easier. When it comes to an open source project where no one is paid to work on it, easy and pragmatic beats out purer and harder (it's in the Zen of Python, after all :).
You also make it harder to work with Unicode-based variable names (or at least to explain them). If you have Unicode in a variable name but can't use \N{} in the string to help express it, you then have to say "normal Unicode support in the string applies everywhere *but* in the string interpolation part".
Or another reason is that you can explain f-strings as "basically str.format_map({**globals(), **locals()}), but without having to make the actual method call" (with locals winning on clashing keys; I couldn't think of a way of using dict.update() in a single line). But your desired change kills this explanation: f-strings would no longer be like this but some magical string that does all of this stuff before normal string processing occurs.
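That mental model can be sketched as follows. Only an approximation: real f-strings compile arbitrary expressions inline, and the merged-dict trick here is just my quick stand-in for the key-clash handling:

```python
name = "world"
greeting = "hello"

# Roughly what f'{greeting}, {name}!' evaluates to. Merging the two
# dicts with locals last means locals shadow globals on key clashes.
# (At module level locals() and globals() are the same dict anyway.)
result = "{greeting}, {name}!".format_map({**globals(), **locals()})
assert result == f"{greeting}, {name}!"
print(result)
```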
naively i’d assume f'abc {a['x']} def' to tokenize as something like:
F_BEGIN
  F_STRING_BEGIN "a" "b" "c" " " F_STRING_END
  F_EXPR_BEGIN
    NAME_BEGIN "a" NAME_END
    GETITEM_BEGIN STRING_BEGIN "x" STRING_END GETITEM_END
  F_EXPR_END
  F_STRING_BEGIN " " "d" "e" "f" F_STRING_END
F_END

where f-literals are defined as F_BEGIN + F_STRING + (F_EXPR + F_STRING)* + F_END
all of this, of course, accounting for different delimiters and so on.
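a toy sketch of what such a dedicated tokenizer could do: split an f-literal body into text and expression parts, where quote characters inside {...} open ordinary strings, so the nested quotes in abc {a['x']} def parse cleanly. names and behavior are my own invention, not CPython's:

```python
def split_fliteral(body):
    """Split the body of an f-literal (outer quotes stripped) into
    ('text', ...) and ('expr', ...) parts.  Toy sketch only: no nested
    braces, escapes, or format specs."""
    parts, buf, i, n = [], [], 0, len(body)
    while i < n:
        ch = body[i]
        if ch == '{':
            if buf:
                parts.append(('text', ''.join(buf)))
                buf = []
            i += 1
            expr, quote = [], None
            while i < n:
                c = body[i]
                if quote is not None:
                    if c == quote:       # closing quote of a nested string
                        quote = None
                elif c in ('"', "'"):    # opening quote of a nested string
                    quote = c
                elif c == '}':           # end of the interpolated expression
                    break
                expr.append(c)
                i += 1
            parts.append(('expr', ''.join(expr)))
            i += 1  # skip the closing '}'
        else:
            buf.append(ch)
            i += 1
    if buf:
        parts.append(('text', ''.join(buf)))
    return parts

print(split_fliteral("abc {a['x']} def"))
```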
i consider this important enough to defer f-literals to 3.7 if it can’t get in in time.
I just wanted to let you know, Philipp, that your email comes off as somewhat demanding, e.g. "I want this changed". Had you asked why the decision was made then your email would not come off as "I'm right and you're wrong" and more about you asking for clarification to understand why, and then if you still disagreed with the thought process then bring up that you think it may have been a mistake.
sorry. i just wanted to make my feelings clear, since i think this is an overlooked issue and time is tight, in the hope that maybe someone is inspired to listen. i thought the PEP’s wording was hint enough to explain the rationale (convenient reuse of tokenization code).
i’ll patiently await clarification about this, and again: sorry for sounding demanding :(
Not a problem! I figured you didn't mean for it to, hence why I took the time to point it out and reply calmly (and if it didn't come off as that I'm sorry).
-Brett