
On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote:
> Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf'').
>
> The idea is that:
> >>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
>
> Might be translated as:
> >>> (b'datestamp:' +
> ...  bytes(format(datetime.datetime.now(),
> ...               str(b'%Y%m%d', 'ascii')),
> ...        'ascii') +
> ...  b'\r\n')

What's wrong with this?

    f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii')

This eliminates all your questions about which encoding we should guess is more useful (ascii? utf-8? something else?), allows the caller to set an error handler without inventing yet more cryptic format codes, and is nicely explicit.

If people are worried about the length of ".encode(...)", a helper function works great:

    def b(s):
        return bytes(s, 'utf-8')  # or whatever encoding makes sense for them

    b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n')

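Here is a minimal runnable comparison. It uses a fixed datetime in place of datetime.datetime.now() so the output is reproducible; that timestamp is an illustrative assumption. The hand-written translation from the quoted proposal, the explicit .encode() spelling, and the b() helper from the message all produce the same bytes for this ASCII-only input.

```python
import datetime

# Fixed timestamp (an assumption) so the result is reproducible.
now = datetime.datetime(2015, 9, 27, 21, 23, 30)

# The desugared form from the quoted proposal: format the field to str,
# encode it as ASCII, and concatenate it with the literal bytes parts.
translated = (b'datestamp:' +
              bytes(format(now, str(b'%Y%m%d', 'ascii')), 'ascii') +
              b'\r\n')

# The explicit alternative: build a str with an f-string, then encode once.
explicit = f'datestamp:{now:%Y%m%d}\r\n'.encode('ascii')


def b(s):
    """Encode a str to bytes; swap 'utf-8' for whatever the protocol needs."""
    return bytes(s, 'utf-8')


helper = b(f'datestamp:{now:%Y%m%d}\r\n')

print(translated == explicit == helper)  # True
print(explicit)  # b'datestamp:20150927\r\n'
```

The three spellings differ only in where the encoding decision is made. In the explicit forms the caller names it, so nothing needs to be guessed by the language.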
> Which would result in:
> b'datestamp:20150927\r\n'
>
> The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?
>
> Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.
>
> Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.

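The trade-offs in the quoted paragraphs can be sketched as follows. Place and BytesFormatter are hypothetical classes invented for this illustration, not names from the thread. The sketch shows three things. First, format() rejects a __format__ that returns bytes, so some encode step is always needed. Second, 'ascii' fails only on certain inputs. Third, the correct bytes for the same text depend on the protocol's encoding.

```python
class Place:
    """Hypothetical object whose __format__ returns non-ASCII text."""

    def __format__(self, spec):
        return 'café'


class BytesFormatter:
    """Hypothetical object that wrongly returns bytes from __format__."""

    def __format__(self, spec):
        return b'raw'


# format() insists on a str result, so bytes must come from a later encode.
try:
    format(BytesFormatter(), '')
except TypeError as exc:
    print('TypeError:', exc)

text = f'place:{Place()}'

# ASCII works for most values but fails here: 'é' is U+00E9, above 127.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed:', exc.reason)

# Different wire protocols need different bytes for the same text.
print(text.encode('utf-8'))    # b'place:caf\xc3\xa9'
print(text.encode('latin-1'))  # b'place:caf\xe9'

# With an explicit .encode(), the caller also chooses the error handler.
print(text.encode('ascii', 'replace'))  # b'place:caf?'
```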
Using UTF-8 is not sufficient, since there are strings that can't be encoded into UTF-8 because they contain surrogates:

py> '\uDA11'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed

but we surely don't want to suppress such errors by default. Sometimes they will be an error that needs fixing.

--
Steve
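The surrogate case can be demonstrated in runnable form. The strict default raises, which surfaces the bug. An explicit .encode() still lets a caller who knows what they want opt in to a standard codec error handler. Whether 'surrogatepass' or 'backslashreplace' suits any particular protocol is an assumption left to that caller.

```python
lone_surrogate = '\uDA11'

# The default 'strict' handler rejects lone surrogates.
try:
    lone_surrogate.encode('utf-8')
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# Explicitly chosen handlers turn the error into bytes instead.
print(lone_surrogate.encode('utf-8', 'surrogatepass'))     # b'\xed\xa8\x91'
print(lone_surrogate.encode('utf-8', 'backslashreplace'))  # b'\\uda11'
```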