[Python-ideas] Binary f-strings
Steven D'Aprano
steve at pearwood.info
Mon Sep 28 04:09:58 CEST 2015
On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote:
> Now that f-strings are in the 3.6 branch, I'd like to turn my attention
> to binary f-strings (fb'' or bf'').
>
> The idea is that:
>
> >>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
>
> Might be translated as:
>
> >>> (b'datestamp:' +
> ...  bytes(format(datetime.datetime.now(),
> ...               str(b'%Y%m%d', 'ascii')),
> ...        'ascii') +
> ...  b'\r\n')
What's wrong with this?
    f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii')
This eliminates all your questions about which encoding we should guess
is more useful (ascii? utf-8? something else?), allows the caller
to set an error handler without inventing yet more cryptic format codes,
and is nicely explicit.
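
To make that concrete, here is a rough sketch of what the caller gets with
plain .encode(). The fixed date and the 'café' label are made up for
illustration (real code would call now()), and the error handlers shown are
the standard ones that str.encode() already accepts:

```python
import datetime

# A fixed timestamp keeps the output reproducible; real code would use now().
stamp = datetime.datetime(2015, 9, 27)

text = f'datestamp:{stamp:%Y%m%d}\r\n'

# The caller picks the encoding explicitly; nothing is guessed.
ascii_line = text.encode('ascii')       # b'datestamp:20150927\r\n'
utf16_line = text.encode('utf-16-le')   # same text, different wire format

# A hypothetical field containing a non-ASCII character.
label = f'caf\u00e9:{stamp:%Y%m%d}'

# The default handler is 'strict', so the failure is loud.
strict_failed = False
try:
    label.encode('ascii')
except UnicodeEncodeError:
    strict_failed = True

# The caller can opt in to a more lenient handler when that is appropriate.
replaced = label.encode('ascii', errors='replace')           # b'caf?:20150927'
escaped = label.encode('ascii', errors='backslashreplace')   # b'caf\\xe9:20150927'
```

No new format codes are needed for any of this: the encoding and the error
handler are ordinary arguments to an ordinary method.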
If people are worried about the length of ".encode(...)", a helper
function works great:
    def b(s):
        # or whatever encoding makes sense for them
        return bytes(s, 'utf-8')

    b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n')
> Which would result in:
> b'datestamp:20150927\r\n'
>
> The only real question is: what encoding to use for the second parameter
> to bytes()? Since an object must return unicode from __format__(), I
> need to convert that to bytes in order to join everything together. But how?
>
> Here I suggest 'ascii'. Unfortunately, this would give an error if
> __format__ returned anything with a char greater than 127. I think we've
> learned that an API that only raises an exception with certain specific
> inputs is fragile.
>
> Guido has suggested using 'utf-8' as the encoding. That has some appeal,
> but if we're designing this for wire protocols, not all protocols will
> be using utf-8.
Using UTF-8 is not sufficient, since there are strings that can't be
encoded into UTF-8 because they contain surrogates:
    py> '\uDA11'.encode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed
but we surely don't want to suppress such errors by default. Sometimes
they will indicate a genuine bug that needs fixing.
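
As a rough sketch of the choices an explicit .encode() leaves open to the
caller (the variable names here are mine, purely for illustration; the error
handlers are the standard codec ones):

```python
lone = '\uda11'  # a lone surrogate, as in the traceback above

# Default ('strict') handling: the error is raised, not hidden.
try:
    lone.encode('utf-8')
    raised = False
except UnicodeEncodeError:
    raised = True

# Callers who really do want the surrogate on the wire must say so.
passed = lone.encode('utf-8', errors='surrogatepass')   # b'\xed\xa8\x91'

# Callers who prefer lossy output must also say so.
lossy = lone.encode('utf-8', errors='replace')          # b'?'

# Surrogates can also arise legitimately, e.g. from 'surrogateescape'
# decoding of undecodable bytes; the same handler round-trips them.
smuggled = b'\xff'.decode('utf-8', errors='surrogateescape')    # '\udcff'
restored = smuggled.encode('utf-8', errors='surrogateescape')   # b'\xff'
```

In each case the lenient behaviour is something the caller asked for, rather
than something a bf'' literal decided on their behalf.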
--
Steve