
Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions:
  https://bugs.python.org/issue3982#msg224023
I guess you'd need to dig up those earlier discussions and see what the issues were?

-n

On Sun, Sep 27, 2015 at 6:23 PM, Eric V. Smith <eric@trueblade.com> wrote:
Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf'').
The idea is that:
bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
Might be translated as:
  (b'datestamp:' +
   bytes(format(datetime.datetime.now(),
                str(b'%Y%m%d', 'ascii')),
         'ascii') +
   b'\r\n')
Which would result in: b'datestamp:20150927\r\n'
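To make the translation concrete, here is a minimal sketch that can be run today. It assumes a fixed datetime instead of now() so the output is reproducible, and the name `stamp` is only illustrative; no bf'' syntax is involved.

```python
import datetime

# Fixed value (an assumption for reproducibility; the example above uses now()).
stamp = datetime.datetime(2015, 9, 27)

# Hand-written equivalent of the proposed bf'datestamp:{stamp:%Y%m%d}\r\n'.
result = (b'datestamp:' +
          bytes(format(stamp, str(b'%Y%m%d', 'ascii')), 'ascii') +
          b'\r\n')

print(result)  # b'datestamp:20150927\r\n'
```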
The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?
Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.
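A small sketch of that fragility, assuming a hypothetical helper `to_ascii_bytes` that does what one replacement field would do under the 'ascii' choice. The same code path succeeds or raises depending only on the value being formatted.

```python
def to_ascii_bytes(value, spec=''):
    # Hypothetical helper: format to str, then encode as ASCII,
    # mirroring bytes(format(value, spec), 'ascii').
    return bytes(format(value, spec), 'ascii')

ok = to_ascii_bytes(42, '05d')
print(ok)  # b'00042'

try:
    to_ascii_bytes('caf\u00e9')  # contains a character above 127
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)  # True
```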
Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.
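To illustrate why a 'utf-8' default could be wrong for some protocols, here is a minimal sketch comparing it with 'latin-1' (picked purely as an example of a non-utf-8 wire encoding). Neither call raises, but the bytes differ.

```python
name = '\u00e9'  # LATIN SMALL LETTER E WITH ACUTE

as_utf8 = bytes(format(name), 'utf-8')      # two bytes
as_latin1 = bytes(format(name), 'latin-1')  # one byte

print(as_utf8)    # b'\xc3\xa9'
print(as_latin1)  # b'\xe9'
```

A peer expecting latin-1 would read the utf-8 output as two unrelated characters, with no error on the sending side.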
Another idea would be to extend the "conversion char" from just 's', 'r', or 'a', which don't make much sense for bytes, to instead be a string that specifies the encoding. The default could be ascii, and if you want to specify something else: bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n'
That would work for any encoding that doesn't have ':', '{', or '}' in the encoding name. Which seems like a reasonable restriction.
And I might be over-generalizing here, but you'd presumably want to make the encoding a non-constant: bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n'
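As a rough sketch of what a field with an encoding conversion might do, here is a hypothetical function `format_field` (not a real or proposed API name) that takes the encoding as an ordinary argument. It also mirrors the restriction on ':', '{', and '}' in the encoding name.

```python
import datetime

def format_field(value, spec='', encoding='ascii'):
    """Hypothetical stand-in for one {value!encoding:spec} field."""
    # These characters would be ambiguous inside a replacement field.
    if any(ch in encoding for ch in ':{}'):
        raise ValueError('unsupported encoding name: %r' % encoding)
    return bytes(format(value, spec), encoding)

stamp = datetime.datetime(2015, 9, 27)  # fixed value, for reproducibility
encoding = 'utf-8'                      # chosen at runtime, as in !{encoding}

line = b'datestamp:' + format_field(stamp, '%Y%m%d', encoding) + b'\r\n'
print(line)  # b'datestamp:20150927\r\n'
```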
I think my initial proposal will be to use 'ascii', and not support any conversion characters at all for fb-strings, not even 's', 'r', and 'a'. In the future, if we want to support encodings other than 'ascii', we could then add !conversions mapping to encodings.
My reasoning for using 'ascii' is that 'utf-8' could easily be an error for non-utf-8 protocols. And by using 'ascii', at least we'd give a runtime error and not put possibly bogus data into the resulting binary string. Granted, the tradeoff is that we now have a case where whether or not the code raises an exception is dependent upon the values being formatted. If 'ascii' is the default, we could later switch to 'utf-8', but we couldn't go the other way.
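The one-way nature of that switch can be checked directly: anything 'ascii' can encode produces identical bytes under 'utf-8', so widening the default later would not change existing output. A minimal check:

```python
# Every character that 'ascii' can encode gets the same bytes under 'utf-8'.
ascii_chars = [chr(i) for i in range(128)]
same = all(c.encode('ascii') == c.encode('utf-8') for c in ascii_chars)
print(same)  # True

# The reverse does not hold: 'utf-8' accepts input that 'ascii' rejects,
# so narrowing from 'utf-8' to 'ascii' would break working code.
try:
    '\u00e9'.encode('ascii')
    reverse_ok = True
except UnicodeEncodeError:
    reverse_ok = False
print(reverse_ok)  # False
```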
The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. Of course user-defined types can return any unicode chars from __format__.
Once we make a decision, I can apply the same logic to b''.format(), if that's desirable.
I'm open to suggestions on this.
Thanks for reading.
-- Eric.
-- Nathaniel J. Smith -- http://vorpus.org