[Python-ideas] Binary f-strings

Mon Sep 28 04:41:50 CEST 2015

Naively, I'd expect that since f-strings and str.format share the same
infrastructure, fb-strings should work the same way as bytes.format --
and in particular, that either both should be supported or neither.
bytes.format was apparently rejected during the PEP 460/PEP 461
discussions:
    https://bugs.python.org/issue3982#msg224023
so I guess you'd need to dig up those earlier discussions and see what
the issues were.
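
For reference, a quick check of where things stand today (a sketch, assuming Python 3.5 or later): bytes has no format method, while the %-interpolation that PEP 461 did accept works on bytes.

```python
# bytes objects have no .format() method, unlike str.
has_bytes_format = hasattr(bytes, "format")

# PEP 461 (Python 3.5+) added %-interpolation for bytes instead.
line = b"datestamp:%s\r\n" % b"20150927"

print(has_bytes_format)
print(line)
```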

-n

On Sun, Sep 27, 2015 at 6:23 PM, Eric V. Smith <eric at trueblade.com> wrote:
> Now that f-strings are in the 3.6 branch, I'd like to turn my attention
> to binary f-strings (fb'' or bf'').
>
> The idea is that:
>
> >>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
>
> Might be translated as:
>
> >>> (b'datestamp:' +
> ...  bytes(format(datetime.datetime.now(),
> ...               str(b'%Y%m%d', 'ascii')),
> ...        'ascii') +
> ...  b'\r\n')
>
>
> Which would result in:
> b'datestamp:20150927\r\n'
>
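
The translation quoted above can be tried out today as an ordinary function. This is only a sketch: bf_field is a hypothetical helper name, not part of the proposal, and a fixed datetime stands in for now() so the result is reproducible.

```python
import datetime


def bf_field(value, spec=b""):
    """Hypothetical helper: format one replacement field and return bytes."""
    # In a bf-string the format spec is itself bytes, so decode it first.
    text = format(value, str(spec, "ascii"))
    # __format__ returns str; encode it so it can be joined with bytes.
    return bytes(text, "ascii")


# Fixed timestamp (an assumption for the example) instead of now().
stamp = datetime.datetime(2015, 9, 27, 18, 23)
result = b"datestamp:" + bf_field(stamp, b"%Y%m%d") + b"\r\n"
print(result)
```
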
> The only real question is: what encoding to use for the second parameter
> to bytes()? Since an object must return unicode from __format__(), I
> need to convert that to bytes in order to join everything together. But how?
>
> Here I suggest 'ascii'. Unfortunately, this would give an error if
> __format__ returned anything with a char greater than 127. I think we've
> learned that an API that only raises an exception with certain specific
> inputs is fragile.
>
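
To make the value-dependent failure concrete, here is a minimal sketch (encode_ascii is a made-up name for illustration): the same code path succeeds for one string and raises for another.

```python
def encode_ascii(value):
    """Hypothetical stand-in for what a bf-string field would do with 'ascii'."""
    return bytes(format(value), "ascii")


ok = encode_ascii("plain")

try:
    encode_ascii("caf\u00e9")  # contains U+00E9, which is outside ASCII
    failed = False
except UnicodeEncodeError:
    failed = True

print(ok, failed)
```
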
> Guido has suggested using 'utf-8' as the encoding. That has some appeal,
> but if we're designing this for wire protocols, not all protocols will
> be using utf-8.
>
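
A small illustration of why a fixed 'utf-8' default could silently produce the wrong bytes for a protocol that expects something else. Latin-1 is used here purely as an example of such a protocol encoding; it is not mentioned in the proposal.

```python
text = "caf\u00e9"

as_utf8 = text.encode("utf-8")      # two bytes for the accented letter
as_latin1 = text.encode("latin-1")  # one byte for the accented letter

# A Latin-1 peer that receives the UTF-8 bytes sees mojibake, not an error.
misread = as_utf8.decode("latin-1")

print(as_utf8, as_latin1, misread)
```
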
> Another idea would be to extend the "conversion char" from just 's',
> 'r', or 'a', which don't make much sense for bytes, to instead be a
> string that specifies the encoding. The default could be ascii, and if
> you want to specify something else:
> bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n'
>
> That would work for any encoding that doesn't have ':', '{', or '}' in
> the encoding name. Which seems like a reasonable restriction.
>
> And I might be over-generalizing here, but you'd presumably want to make
> the encoding a non-constant:
> bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n'
>
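
Since the !encoding syntax is only an idea at this point, its intended effect can be approximated with a plain function that takes the encoding as a parameter. This is a sketch under that assumption; bf_field and its signature are hypothetical.

```python
import datetime


def bf_field(value, spec="", encoding="ascii"):
    """Hypothetical helper: format a value, then encode with a chosen codec."""
    return format(value, spec).encode(encoding)


# The encoding is an ordinary runtime value, as in the !{encoding} variant.
encoding = "utf-8"
stamp = datetime.datetime(2015, 9, 27)

line = b"datestamp:" + bf_field(stamp, "%Y%m%d", encoding) + b"\r\n"
name = b"name:" + bf_field("caf\u00e9", encoding=encoding)

print(line, name)
```
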
> I think my initial proposal will be to use 'ascii', and not support any
> conversion characters at all for fb-strings, not even 's', 'r', and 'a'.
> In the future, if we want to support encodings other than 'ascii', we
> could then add !conversions mapping to encodings.
>
> My reasoning for using 'ascii' is that 'utf-8' could easily be an error
> for non-utf-8 protocols. And by using 'ascii', at least we'd give a
> runtime error and not put possibly bogus data into the resulting binary
> string. Granted, the tradeoff is that we now have a case where whether
> or not the code raises an exception is dependent upon the values being
> formatted. If 'ascii' is the default, we could later switch to 'utf-8',
> but we couldn't go the other way.
>
> The only place this is likely to be a problem is when formatting unicode
> string values. No other built-in type is going to have a non-ascii
> compatible character in its __format__, unless you do tricky things with
> datetime format_specs. Of course user-defined types can return any
> unicode chars from __format__.
>
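
As an illustration of the last point, here is a sketch of a user-defined type whose __format__ returns a non-ASCII character, next to a built-in number that formats to pure ASCII. The Temperature class is invented for this example.

```python
class Temperature:
    """Made-up type whose __format__ appends a degree sign (U+00B0)."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        return format(self.degrees, spec) + "\u00b0C"


# A built-in number formats to ASCII-only text, so encoding succeeds.
number_bytes = bytes(format(1234567, ",d"), "ascii")

# The user-defined type does not, so the 'ascii' step raises.
text = format(Temperature(21.5), ".1f")
try:
    bytes(text, "ascii")
    raised = False
except UnicodeEncodeError:
    raised = True

print(number_bytes, text, raised)
```
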
> Once we make a decision, I can apply the same logic to b''.format(), if
> that's desirable.
>
> I'm open to suggestions on this.
>
> Thanks for reading.
>
> --
> Eric.


-- 
Nathaniel J. Smith -- http://vorpus.org