
Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf''). The idea is that:
bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
Might be translated as:
Which would result in:
b'datestamp:20150927\r\n'
The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?
Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.
Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.
Another idea would be to extend the "conversion char" from just 's', 'r', or 'a', which don't make much sense for bytes, to instead be a string that specifies the encoding. The default could be ascii, and if you want to specify something else:
bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n'
That would work for any encoding that doesn't have ':', '{', or '}' in the encoding name, which seems like a reasonable restriction.
And I might be over-generalizing here, but you'd presumably want to make the encoding a non-constant:
bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n'
I think my initial proposal will be to use 'ascii', and not support any conversion characters at all for fb-strings, not even 's', 'r', and 'a'. In the future, if we want to support encodings other than 'ascii', we could then add !conversions mapping to encodings.
My reasoning for using 'ascii' is that 'utf-8' could easily be an error for non-utf-8 protocols. And by using 'ascii', at least we'd give a runtime error and not put possibly bogus data into the resulting binary string. Granted, the tradeoff is that we now have a case where whether or not the code raises an exception is dependent upon the values being formatted. If 'ascii' is the default, we could later switch to 'utf-8', but we couldn't go the other way.
The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. Of course user-defined types can return any unicode chars from __format__.
Once we make a decision, I can apply the same logic to b''.format(), if that's desirable.
I'm open to suggestions on this. Thanks for reading.
-- Eric.
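A minimal sketch of the translation being discussed, assuming each replacement field is passed through format() and the result is then encoded. The helper name encode_field is hypothetical and not part of the proposal; it only illustrates why the outcome depends on the value being formatted.

```python
import datetime


def encode_field(value, format_spec='', encoding='ascii'):
    """Format value as text, then encode it (hypothetical helper)."""
    return bytes(format(value, format_spec), encoding)


stamp = datetime.datetime(2015, 9, 27)
line = b'datestamp:' + encode_field(stamp, '%Y%m%d') + b'\r\n'

# The same helper raises as soon as __format__ returns a non-ASCII character.
try:
    encode_field('\u1234')
    failed = False
except UnicodeEncodeError:
    failed = True
```

With 'ascii' the failure is loud; passing encoding='utf-8' would succeed for the same value, which is exactly the choice the message is weighing.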

On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote:
What's wrong with this?
f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii')
This eliminates all your questions about which encoding we should guess is more useful (ascii? utf-8? something else?), allows the caller to set an error handler without inventing yet more cryptic format codes, and is nicely explicit.
If people are worried about the length of ".encode(...)", a helper function works great:
def b(s):
    return bytes(s, 'utf-8')  # or whatever encoding makes sense for them
b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n')
Using UTF-8 is not sufficient, since there are strings that can't be encoded into UTF-8 because they contain surrogates:
py> '\uDA11'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed
but we surely don't want to suppress such errors by default. Sometimes they will be an error that needs fixing.
-- Steve
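The explicit approach can be sketched as follows. The variable names are illustrative only; the point is that the codec and the error handler are both chosen by the caller, and that a lone surrogate is rejected by UTF-8 unless a handler is requested.

```python
import datetime

stamp = datetime.datetime(2015, 9, 27)

# Explicit encoding: the caller picks the codec.
strict = f'datestamp:{stamp:%Y%m%d}\r\n'.encode('ascii')

# A lossy error handler is opt-in rather than a hidden default.
name = 'caf\u00e9'
lossy = f'name:{name}\r\n'.encode('ascii', errors='replace')

# A lone surrogate cannot be encoded to UTF-8 with the default 'strict' handler.
try:
    '\uda11'.encode('utf-8')
    surrogate_rejected = False
except UnicodeEncodeError:
    surrogate_rejected = True
```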

Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions: https://bugs.python.org/issue3982#msg224023 I guess you'd need to dig up those earlier discussions and see what the issues were? -n On Sun, Sep 27, 2015 at 6:23 PM, Eric V. Smith <eric@trueblade.com> wrote:
-- Nathaniel J. Smith -- http://vorpus.org

On Mon, Sep 28, 2015 at 12:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
The biggest issues are summarized in PEP 461: https://www.python.org/dev/peps/pep-0461/#proposed-variations Since the __format__ machinery is all based around text strings, there'll need to be some (explicit or implicit) encode step. Hence this thread. How bad would it be to simply say "there are no bf strings"? As Steven says, you can simply use a normal f''.encode() operation, with no confusion. Otherwise, there'll be these "format-like" operations that can do things that format() can't do... and then there'd be edge cases, too, like a string with a b-prefix that contains non-ASCII characters in it:
If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string? In any case, bf strings can be added later, but once they're added, their semantics would be locked in. I'd be inclined to leave them out for 3.6 and see what people say. A bit of real-world usage of f-strings might show a clear front-runner in terms of expectations (UTF-8, ASCII, or something else). ChrisA

On 28.09.2015 05:03, Chris Angelico wrote:
I don't think so. "{...}" indicates the injection of whatever "..." stands for, thus is not part of the resulting string. So, no issue here for me. (The only thing that would confuse me, is that "восток" is an allowed identifier in the first place. But that seems to be a different matter.)
I tend to agree here. Best, Sven

On Sep 27, 2015, at 18:23, Eric V. Smith <eric@trueblade.com> wrote:
The fact that it can't handle bytes and bytes-like types makes this much less useful than %. Beyond that, the fact that it only works reliably for the same types as %, minus bytes, plus a few others including datetime means the benefit isn't nearly as large as for f-strings and str.format, which work reliably for every type in the world, and extensibly so for many types. And meanwhile, the cost is much higher, from code that seems to work if you don't test it well to even higher performance costs (and usually in code that needs performance more). Of course you could create a __bformat__(*args, encoding, errors, **kw) protocol (where object.__bformat__ just returns self.__format__(*args, **kw).encode(encoding, errors)), which has the same effect as your proposal except that types that need to know they're being bytes-formatted to do something reasonable, or that just want to know so they can optimize, can do so. And this of course lets you add __bformat__ to bytes, etc.--although it doesn't seem to help for types that support the buffer protocol, so it's still not as good as %b. But I don't think anyone will want that.
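A rough sketch of the __bformat__ idea, under stated assumptions: the dispatcher function bformat, the PacketId class, and the simplified hook signature are all hypothetical, and the bytes/bytearray branch merely stands in for "adding __bformat__ to bytes". Nothing like this exists in Python.

```python
def bformat(value, format_spec='', encoding='ascii', errors='strict'):
    """Hypothetical dispatcher for a __bformat__ protocol."""
    hook = getattr(type(value), '__bformat__', None)
    if hook is not None:
        # The type knows it is being bytes-formatted and can act on that.
        return hook(value, format_spec, encoding=encoding, errors=errors)
    if isinstance(value, (bytes, bytearray)):
        # Stand-in for giving the built-in bytes types their own hook.
        return bytes(value)
    # Default described for object.__bformat__: format as text, then encode.
    return format(value, format_spec).encode(encoding, errors)


class PacketId:
    """Example type that renders itself as a single raw byte."""

    def __init__(self, number):
        self.number = number

    def __bformat__(self, format_spec, *, encoding, errors):
        return self.number.to_bytes(1, 'big')


line = b''.join(bformat(part) for part in (PacketId(7), b'Spam', 42))
```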

On 28.09.2015 03:23, Eric V. Smith wrote:
Cf. https://www.python.org/dev/peps/pep-0461/#interpolation It says: b"%x" % val is equivalent to: ("%x" % val).encode("ascii") So, ASCII would make a lot of sense to me as well.
Could you be more specific here? Best, Sven
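The PEP 461 equivalence quoted above can be checked directly on Python 3.5 or later. The variable names are illustrative only.

```python
val = 255

# bytes %-formatting of a number...
via_bytes = b'%x' % val

# ...matches text %-formatting followed by an ASCII encode.
via_text = ('%x' % val).encode('ascii')
```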

On 10/02/2015 10:20 AM, Sven R. Kunze wrote:
bf'{foo}' Might succeed or fail, depending on what foo returns for __format__. If foo is 'bar', it succeeds. If it's '\u1234', it fails. But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it. Eric.

On 02.10.2015 16:26, Eric V. Smith wrote:
I know a lot of functions that fail when passed the wrong kind of arguments. What's so wrong with it?
> But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.
I see. Can you think of an alternative solution?
I was digging deeper into the matter of binary/byte string formatting in order to understand why {} is not usable for binary protocols. Let's look at this practical example:
https://blog.tox.chat/2015/09/fuzzing-the-new-groupchats/
I hope the tox protocol fully qualifies as a wire protocol. What he's trying to do is fuzz his new groupchat implementation by creating more-or-less random packets and feeding them into tox core to see if it breaks. He conveniently uses this type of syntax to describe the structure of the first header:
Header 1: [ Packet ID (1 b) | Chat ID hash (4 b) | Sender PK (32 b) | nonce (24 b) ]
Interested in writing a fuzzer, I would find the following really helpful as it mirrors the description within his blog post:
header1 = bf'{packet_id}{chat_id}{sender_pk}{nonce}'
# which should be the same as
header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce)
I wouldn't mind specifying the encoding for all non-byte-string arguments. Why? Because I would be working with bytes anyway, so no formatting (as in format()) would be necessary in the first place. However, I like the syntax for specifying the structure of (byte) strings.
Does this make sense?
Best, Sven
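For reference, the %b spelling of that header already runs on Python 3.5 or later. This sketch reads the sizes in the layout line as byte counts, which is an assumption, and fills the fields with arbitrary values as a fuzzer might.

```python
import os

packet_id = bytes([0x5b])   # arbitrary example value, 1 byte
chat_id = os.urandom(4)     # 4 random bytes
sender_pk = os.urandom(32)  # 32 random bytes
nonce = os.urandom(24)      # 24 random bytes

# The %-formatting spelling from the message.
header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce)

# With bytes-only fields this is plain concatenation.
joined = b''.join([packet_id, chat_id, sender_pk, nonce])
```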

Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) apply to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only). On Fri, Oct 2, 2015 at 8:43 AM, Sven R. Kunze <srkunze@mail.de> wrote:
-- --Guido van Rossum (python.org/~guido)

On Sat, Oct 3, 2015 at 1:27 PM, Steven D'Aprano <steve@pearwood.info> wrote:
It should be technically legal, btw; it's just going to look very odd. The check for ASCII-only has to be done _after_ the fracturing into strings and expressions. But I don't like how that reads. ChrisA

Steven D'Aprano writes:
> What are we to make of something like this?
> bf'{αριθμός + 1}'
Greek authorship. What would you make of bf'{junban + 1}' and why is that better than bf'{順番 + 1}' ? (I would guess that is a sort of Japanese approximation to your Greek.) I think it was Alex Martelli who argued very strongly at the time of PEP 263 that using native identifiers and even comments in your native language is a very risky practice (at least from management's POV <wink/>). I think that's still true, but it's clearly in consenting adults territory.
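With ordinary f-strings (Python 3.6 or later) the point is easy to see: the non-ASCII identifier only names the value, and the formatted result can still be pure ASCII. The sketch below uses the explicit encode step because bf-strings do not exist.

```python
αριθμός = 41

# The identifier never reaches the output; only the formatted value does.
text = f'{αριθμός + 1}'
data = text.encode('ascii')
```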

On 3 October 2015 at 02:00, Guido van Rossum <guido@python.org> wrote:
Given that restriction, what if we dropped the format() call in the bytestring case, and instead always used printf-style formatting? That is:
bf'{packet_id}{chat_id}{sender_pk}{nonce}'
could be roughly equivalent to (with parens to help make the pieces clearer):
(b'%b' % packet_id) + (b'%b' % chat_id) + (b'%b' % sender_pk) + (b'%b' % nonce)
If a ":fmt" section is provided for the substitution field, it would replace the mod-format sequence for that section:
bf'{number:0.2d}' ===> b'%0.2d' % number
With that approach, over time, printf-style formatting (aka mod-formatting) may come to just be known as bytes formatting (even though text strings also support it).
Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol, while doing the higher level composition with bf-strings*:
bf"{header}{struct.pack('<10sHHb', record)}{footer}"
Cheers, Nick.
* which I am now tempted to call Big Friendly Strings**, since I read a lot of Roald Dahl books as a kid :)
** this would further mean that normal f-strings are friendly strings in addition to being format strings ;)
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"
Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.
bf'{header}{record:<10sHHb}{footer}'
Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.
$0.02
Cheers, Steve
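Nick's suggested translation can be approximated with today's bytes %-formatting. The helper render_field is hypothetical: it maps a field with no format spec to %b and otherwise prefixes the spec with %, which is only one reading of the proposal. The record values are made up for illustration.

```python
import struct


def render_field(value, spec=None):
    """Hypothetical: render one bf-string field via bytes %-formatting."""
    template = b'%b' if spec is None else b'%' + spec.encode('ascii')
    # Wrap in a tuple so the value is always treated as a single argument.
    return template % (value,)


header, footer = b'HDR', b'END'

# struct handles the fixed-layout subsection; the pieces are then joined.
record = struct.pack('<10sHHb', b'spam', 1, 2, -3)
message = render_field(header) + render_field(record) + render_field(footer)

# A field with an explicit printf-style spec.
padded = render_field(7, '03d')
```

Note that struct.pack takes the record's members as separate arguments, so a real record tuple would need to be unpacked with *record.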

On 10/03/2015 12:20 PM, Steve Dower wrote:
That appeals to me, too. There are a number of practical problems that would need to be worked out. We can argue those later :) I guess it comes down to: what would the commonest use case for fb-strings be?
This is one of my two big concerns. If we do something other than remove 'b' and encode, then we've got two similar looking things that have vastly different implementations. But maybe struct.pack or %-formatting are so compelling that it's worth breaking the equivalence. My other concern is non-ascii chars inside the braces in an fb-string. Eric.

On 03.10.2015 21:18, Eric V. Smith wrote:
I think that's where I reach the limit of my "binary" experience in Python. But if people (here Steve) found it useful, why not? If there are problems that cannot be solved easily, we can do a V1 and later a V2 that includes the struct packing.
> I guess it comes down to: what would the commonest use case for fb-strings be?
To me, it's the same as for all f-strings: the bloody easy string concatenation of easily distinguishable parts. I even have to admit I thought they were called *format strings* because they *give format/structure* to the resulting string. Well, now I know better (format refers to the formatting of the input expressions) but the analogy is still in my mind.
> My other concern is non-ascii chars inside the braces in an fb-string.
What's wrong with them? Best, Sven

On Oct 3, 2015, at 09:20, Steve Dower <steve.dower@python.org> wrote:
I love that at first glance. But if the point of bf-strings (like the point of bytes.__mod__ and the other str-like stuff added back to bytes since 3.0) is for things like ascii-based, partly-human-readable protocols and formats, it's obviously important to do things like hex and octal, space- and zero-padding, etc., and if the format specifier always means struct, there's no way to do that.
> Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.
But removing the b and encoding the resulting f-string is useless. For example:
header = b'Spam'
value = 42
lines.append(bf'{header}: {value}\r\n')
This gives you b"b'Spam': 42\r\n". Can you imagine ever wanting that?
The only way the feature makes sense is if it does something different. Nick's suggestion of having it do %-formatting makes sense. Yes, it means that {count:03} is an error and you need '{count:03d}', which is inconsistent with f-strings. But that seems like a much less serious problem than bytes formatting not being able to handle bytes.
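The difference is easy to reproduce with an ordinary f-string plus an explicit encode, compared against bytes %-formatting (Python 3.6 or later). The variable names encoded and percent are illustrative only.

```python
header = b'Spam'
value = 42

# "Remove the b and encode": the bytes object is formatted via its repr.
encoded = f'{header}: {value}\r\n'.encode('ascii')

# %-formatting on bytes inserts the raw bytes instead.
percent = b'%b: %d\r\n' % (header, value)
```

Running Python with the -b or -bb option flags the first form with a BytesWarning, which is a further hint that formatting bytes as text is rarely what is wanted.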

Maybe we could spell it {spam!p:<10sHHb}?

On 4 October 2015 at 08:25, Andrew Barnert <abarnert@yahoo.com> wrote:
Exactly, if someone is mistakenly thinking bf"{header}{content}{footer}" is equivalent to f"{header}{content}{footer}".encode(), they're likely to get immediate noisy errors when they start trying to format fields.
The parallel I'd attempt to draw is that:
f"{header}{content}{footer}"
is to
"{}{}{}".format(header, content, footer)
as:
bf"{header:b}{content:b}{footer:b}"
would be to
b"%b%b%b" % (header, content, footer)
To make the behaviour clearer in the latter case, it may be reasonable to *require* an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting.
I'm not sold on the idea of a struct.pack conversion specifier - if we added binary format strings, I think it would be better to start with explicit "pack(value, format)" expressions, and see how that goes for a release.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Oct 7, 2015, at 04:35, Nick Coghlan <ncoghlan@gmail.com> wrote:
Except that multiple people in this thread are saying that's exactly what it should mean (which I think is a very bad idea).
Are you suggesting that if a format specifier is given, it must include the format code (which seems perfectly reasonable to me--guessing that :3 means %3b is likely to be wrong more often than it's right…), or that a format specifier must always be given, with no default to :b (which seems more obtrusive and solves less of a problem)?

On 10/07/2015 12:25 PM, Guido van Rossum wrote:
> I think bf'...' should be compared to b'...' % rather than to f'...'. IOW bf'...' is to f'...' as b'...'% is to '...'%.
I'm leaning this way, at least in the sense of "there's a fixed number of known types supported, and there's no extensible protocol involved". Eric.

Of course that would still leave the door open for struct.pack support (maybe recognized by having the string start with <, =, > or @). Pro: everybody who currently uses struct.pack will love it. Con: the struct.pack mini-language is pretty inscrutable if you don't already know it. (And no, I don't propose to invent a different mini-language -- it's just easier to figure out where to find docs for this when the code explicitly imports the struct module.) On Wed, Oct 7, 2015 at 10:53 AM, Eric V. Smith <eric@trueblade.com> wrote:
-- --Guido van Rossum (python.org/~guido)
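For readers who do not know the struct mini-language, here is how the '<10sHHb' format used earlier in the thread breaks down. The field values and variable names are made up for illustration.

```python
import struct

# '<'   little-endian, no alignment padding
# '10s' a 10-byte string, padded with NUL bytes
# 'H'   unsigned 16-bit integer (appears twice)
# 'b'   signed 8-bit integer
layout = '<10sHHb'

size = struct.calcsize(layout)
packed = struct.pack(layout, b'record', 513, 2, -1)
name, first, second, flag = struct.unpack(layout, packed)
```

Unpacking returns the string field with its NUL padding intact, so name.rstrip(b'\x00') recovers the original value.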

On 7 October 2015 at 22:34, Andrew Barnert <abarnert@yahoo.com> wrote:
I was thinking the latter, but your idea of ":b" being implied only if there's no format specifier at all (and otherwise requiring an explicit "b" or other format code) might be better. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote:
What's wrong with this? f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii') This eliminates all your questions about which encoding we should guess is more useful (ascii? utf-8? something else?), allows the caller to set an error handler without inventing yet more cryptic format codes, and is nicely explicit. If people are worried about the length of ".encode(...)", a helper function works great: def b(s): return bytes(s, 'utf-8') # or whatever encoding makes sense for them b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n')
Using UTF-8 is not sufficient, since there are strings that can't be encoded into UTF-8 because they contain surrogates: py> '\uDA11'.encode('utf-8') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed but we surely don't want to suppress such errors by default. Sometimes they will be an error that needs fixing. -- Steve

Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions: https://bugs.python.org/issue3982#msg224023 I guess you'd need to dig up those earlier discussions and see what the issues were? -n On Sun, Sep 27, 2015 at 6:23 PM, Eric V. Smith <eric@trueblade.com> wrote:
-- Nathaniel J. Smith -- http://vorpus.org

On Mon, Sep 28, 2015 at 12:41 PM, Nathaniel Smith <njs@pobox.com> wrote:
The biggest issues are summarized into PEP 461: https://www.python.org/dev/peps/pep-0461/#proposed-variations Since the __format__ machinery is all based around text strings, there'll need to be some (explicit or implicit) encode step. Hence this thread. How bad would it be to simply say "there are no bf strings"? As Steven says, you can simply use a normal f''.encode() operation, with no confusion. Otherwise, there'll be these "format-like" operations that can do things that format() can't do... and then there'd be edge cases, too, like a string with a b-prefix that contains non-ASCII characters in it:
If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string? In any case, bf strings can be added later, but once they're added, their semantics would be locked in. I'd be inclined to leave them out for 3.6 and see what people say. A bit of real-world usage of f-strings might show a clear front-runner in terms of expectations (UTF-8, ASCII, or something else). ChrisA

On 28.09.2015 05:03, Chris Angelico wrote:
I don't think so. "{...}" indicates the injection of whatever "..." stands for, thus is not part of the resulting string. So, no issue here for me. (The only thing that would confuse me, is that "восток" is an allowed identifier in the first place. But that seems to be a different matter.)
I tend to agree here. Best, Sven

On Sep 27, 2015, at 18:23, Eric V. Smith <eric@trueblade.com> wrote:
The fact that it can't handle bytes and bytes-like types makes this much less useful than %. Beyond that, the fact that it only works reliably for the same types as %, minus bytes, plus a few others including datetime means the benefit isn't nearly as large as for f-strings and str.format, which work reliably for every type in the world, and extensibly so for many types. And meanwhile, the cost is much higher, from code that seems to work if you don't test it well to even higher performance costs (and usually in code that needs performance more). Of course you could create a __bformat__(*args, encoding, errors, **kw) protocol (where object.__bformat__ just returns self.__format__(*args, **kw).encode(encoding, errors)), which has the same effect as your proposal except that types that need to know they're being bytes-formatted to do something reasonable, or that just want to know so they can optimize, can do so. And this of course lets you add __bformat__ to bytes, etc.--although it doesn't seem to help for types that support the buffer protocol, so it's still not as good as %b. But I don't think anyone will want that.

On 28.09.2015 03:23, Eric V. Smith wrote:
Cf. https://www.python.org/dev/peps/pep-0461/#interpolation It says: b"%x" % val is equivalent to: ("%x" % val).encode("ascii") So, ASCII would make a lot of sense to me as well.
Could you be more specific here? Best, Sven

On 10/02/2015 10:20 AM, Sven R. Kunze wrote:
bf'{foo}' Might succeed or fail, depending on what foo returns for __format__. If foo is 'bar', it succeeds. If it's '\u1234', it fails. But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it. Eric.

On 02.10.2015 16:26, Eric V. Smith wrote:
I know a lot of functions that fail when passing the wrong kind of arguments. What's so wrong with it?
But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.
I see. Do you think of an alternative solution? I was digging deeper into the matter of binary/byte strings formatting in order to sympathise why {} is not usable for binary protocols. Let's look at this practical example https://blog.tox.chat/2015/09/fuzzing-the-new-groupchats/ I hope the tox protocol fully qualifies as a wireframe protocol. What he's trying to do it is to fuzzing his new groupchat implementation by creating more-or-less random packages and feed them into tox core to see if it breaks. He conveniently uses this type of syntax to describe the structure of the first header: Header 1: [ Packet ID (1 b) | Chat ID hash (4 b) | Sender PK (32 b) | nonce (24 b) ] Interested in writing a fuzzer, I would find the following really helpful as it mirrors the description within his blog post: header1 = bf'{packet_id}{chat_id}{sender_pk}{nonce}' # which should be the same as header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce) I wouldn't mind specifying the encoding for all non-byte-string arguments. Why? Because I would be working with bytes anyway, so no formatting (as in format()) would be necessary in the first place. However, I like the syntax for specifying the structure of (byte) strings. Does this makes sense? Best, Sven

Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only). On Fri, Oct 2, 2015 at 8:43 AM, Sven R. Kunze <srkunze@mail.de> wrote:
-- --Guido van Rossum (python.org/~guido)

Steven D'Aprano writes:
What are we to make of something like this?
bf'{αριθμός + 1}'
Greek authorship. What would you make of bf'{junban + 1}' and why is that better than bf'{順番 + 1}' ? (I would guess that is a sort of Japanese approximation to your Greek.) I think it was Alex Martelli who argued very strongly at the time of PEP 263 that using native identifiers and even comments in your native language is a very risky practice (at least from management's POV <wink/>). I think that's still true, but it's clearly in consenting adults territory.

On 3 October 2015 at 02:00, Guido van Rossum <guido@python.org> wrote:
Given that restriction, what if we dropped the format() call in the bytestring case, and instead always used printf-style formatting? That is: bf'{packet_id}{chat_id}{sender_pk}{nonce}' could be roughly equivalent to (with parens to help make the pieces clearer): (b'%b' % packet_id) + (b'%b' % chat_id) + (b'%b' % sender_pk) + (b'%b' % nonce) If a ":fmt" section is provided for the substitution field, it would replace the mod-format sequence for that section: bf'{number:0.2d} ===> b'%0.2d' % number With that approach, over time, printf-style formatting (aka mod-formatting) may come to just be known as bytes formatting (even though text strings also support it). Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol, while doing the higher level composition with bf-strings*: bf'{header}{struct.pack('<10sHHb', record)}{footer}' Cheers, Nick. * which I am now tempted to call Big Friendly Strings**, since I read a lot of Roald Dahl books as a kid :) ** this would further mean that normal f-strings are friendly strings in addition to being format strings ;) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol" Personally, if binary f-strings did struct packing by default, I'd want to use them all the time. bf'{header}{record:<10sHHb}{footer}' Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding. $0.02 Cheers, Steve Top-posted from my Windows Phone -----Original Message----- From: "Nick Coghlan" <ncoghlan@gmail.com> Sent: 10/3/2015 7:37 To: "Guido van Rossum" <guido@python.org> Cc: "Python-Ideas" <python-ideas@python.org> Subject: Re: [Python-ideas] Binary f-strings On 3 October 2015 at 02:00, Guido van Rossum <guido@python.org> wrote:
Given that restriction, what if we dropped the format() call in the bytestring case, and instead always used printf-style formatting? That is: bf'{packet_id}{chat_id}{sender_pk}{nonce}' could be roughly equivalent to (with parens to help make the pieces clearer): (b'%b' % packet_id) + (b'%b' % chat_id) + (b'%b' % sender_pk) + (b'%b' % nonce) If a ":fmt" section is provided for the substitution field, it would replace the mod-format sequence for that section: bf'{number:0.2d} ===> b'%0.2d' % number With that approach, over time, printf-style formatting (aka mod-formatting) may come to just be known as bytes formatting (even though text strings also support it). Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol, while doing the higher level composition with bf-strings*: bf'{header}{struct.pack('<10sHHb', record)}{footer}' Cheers, Nick. * which I am now tempted to call Big Friendly Strings**, since I read a lot of Roald Dahl books as a kid :) ** this would further mean that normal f-strings are friendly strings in addition to being format strings ;) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

On 10/03/2015 12:20 PM, Steve Dower wrote:
That appeals to me, too. There are a number of practical problems that would need to be worked out. We can argue those later :) I guess it comes down to: what would the commonest use case for fb-strings be?
This is one of my two big concerns. If we do something other than remove 'b' and encode, then we've got two similar looking things that have vastly different implementations. But maybe struct.pack or %-formatting are so compelling that it's worth breaking the equivalence. My other concern is non-ascii chars inside the braces in an fb-string. Eric.

On 03.10.2015 21:18, Eric V. Smith wrote:
I think that's where I reach the limit of my "binary" experience in Python. But if people (here Steve) found it useful, why not? If there are problems that cannot be solved easily, we can do a V1 and later a V2 that includes the struct packing.
I guess it comes down to: what would the commonest use case for fb-strings be?
To me, it's the same as for all f-strings: the bloody easy string concatenation of easily distinguishable parts. I even have to admit I thought they were called *format strings* because they *give format/structure* to the resulting string. Well, now I know better (format refers to the formatting of the input expressions) but the analogy is still in my mind.
My other concern is non-ascii chars inside the braces in an fb-string.
What's wrong with them? Best, Sven

On Oct 3, 2015, at 09:20, Steve Dower <steve.dower@python.org> wrote:
I love that at first glance. But if the point of bf-strings (like the point of bytes.__mod__ and the other str-like stuff added back to bytes since 3.0) is for things like ascii-based, partly-human-readable protocols and formats, it's obviously important to do things like hex and octal, space- and zero-padding, etc., and if the format specifier always means struct, there's no way to do that.
Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.
But removing the b and encoding the resulting f-string is useless. For example: header = b'Spam' value = 42 lines.append(bf'{header}: {value}\r\n') This gives you b"b'Spam': 42\r\n". Can you imagine ever wanting that? The only way the feature makes sense is if it does something different. Nick's suggestion of having it do %-formatting makes sense. Yes, it means that {count:03} is an error and you need '{count:03d}', which is inconsistent with f-strings. But that seems like a much less serious problem than bytes formatting not being able to handle bytes.
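Andrew's point can be checked with existing syntax. The sketch below compares "drop the b and encode" against printf-style bytes formatting, using the header and value from his example; the count variable is added only to illustrate the {count:03} versus %03d difference he mentions.

```python
header = b'Spam'
value = 42

# "Remove the b and encode": format() on a bytes object falls back to str(),
# so the repr of the bytes ends up in the output.
encoded = f'{header}: {value}\r\n'.encode('ascii')

# printf-style bytes formatting interpolates the bytes themselves.
formatted = b'%b: %d\r\n' % (header, value)

count = 7
# str.format accepts a spec with no type code...
text = '{count:03}'.format(count=count)
# ...while %-formatting needs the explicit 'd' conversion.
data = b'%03d' % count
```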

Maybe we could spell it {spam!p:<10sHHb}? Top-posted from my Windows Phone -----Original Message----- From: "Andrew Barnert" <abarnert@yahoo.com> Sent: 10/3/2015 15:31 To: "Steve Dower" <steve.dower@python.org> Cc: "Nick Coghlan" <ncoghlan@gmail.com>; "Guido van Rossum" <guido@python.org>; "Python-Ideas" <python-ideas@python.org> Subject: Re: [Python-ideas] Binary f-strings

On 4 October 2015 at 08:25, Andrew Barnert <abarnert@yahoo.com> wrote:
Exactly, if someone is mistakenly thinking bf"{header}{content}{footer}" is equivalent to f"{header}{content}{footer}".encode(), they're likely to get immediate noisy errors when they start trying to format fields. The parallel I'd attempt to draw is that: f"{header}{content}{footer}" is to "{}{}{}".format(header, content, footer) as: bf"{header:b}{content:b}{footer:b}" would be to b"%b%b%b" % (header, content, footer) To make the behaviour clearer in the latter case, it may be reasonable to *require* an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting. I'm not sold on the idea of a struct.pack conversion specifier - if we added binary format strings, I think it would be better to start with explicit "pack(value, format)" expressions, and see how that goes for a release. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
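A rough illustration of both halves of this message, using only what exists today: the "immediate noisy error" when a str reaches a %b field, and an explicit pack(value, format) expression feeding the composition. The pack helper is hypothetical; it is just struct.pack with the argument order from the message, and the choice to treat a tuple as several fields is an assumption of this sketch.

```python
import struct

def pack(value, fmt):
    """Hypothetical helper: struct.pack with the (value, format) argument order."""
    # Assumption: a tuple means several fields, anything else a single field.
    fields = value if isinstance(value, tuple) else (value,)
    return struct.pack(fmt, *fields)

header, footer = b'<<', b'>>'
content = pack((1, 2), '<HH')

# What bf"{header:b}{content:b}{footer:b}" would correspond to.
frame = b'%b%b%b' % (header, content, footer)

# A str in a bytes field fails right away instead of being silently encoded.
try:
    b'%b' % 'text'
except TypeError as exc:
    error = exc
else:
    error = None
```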

On Oct 7, 2015, at 04:35, Nick Coghlan <ncoghlan@gmail.com> wrote:
Except that multiple people in this thread are saying that's exactly what it should mean (which I think is a very bad idea).
Are you suggesting that if a format specifier is given, it must include the format code (which seems perfectly reasonable to me--guessing that :3 means %3b is likely to be wrong more often than it's right…), or that a format specifier must always be given, with no default to :b (which seems more obtrusive and solves less of a problem)?
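The first option in that question could be sketched as follows. The function name and the "must end in a letter" check are assumptions made for illustration, not anything proposed in the thread; the idea is only that a missing specifier defaults to %b, while a specifier that is present must carry its own format code.

```python
def render_field(value, spec=None):
    """Hypothetical rendering of one bf-string field as bytes."""
    if spec is None:
        # No specifier at all: ":b" is implied.
        return b'%b' % (value,)
    if not spec or not spec[-1].isalpha():
        # A specifier such as "3" has no format code, so refuse to guess "%3b".
        raise ValueError('format specifier %r must end with a format code' % spec)
    # Wrapping value in a tuple keeps tuple values from being unpacked by %.
    return (b'%' + spec.encode('ascii')) % (value,)
```

For example, render_field(b'Spam') and render_field(42, '03d') both succeed, while render_field(b'ab', '3') raises ValueError rather than silently choosing a conversion.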

On 10/07/2015 12:25 PM, Guido van Rossum wrote:
I think bf'...' should be compared to b'...' % rather than to f'...'. IOW bf'...' is to f'...' as b'...'% is to '...'%.
I'm leaning this way, at least in the sense of "there's a fixed number of known types supported, and there's no extensible protocol involved". Eric.

Of course that would still leave the door open for struct.pack support (maybe recognized by having the string start with <, =, > or @). Pro: everybody who currently uses struct.pack will love it. Con: the struct.pack mini-language is pretty inscrutable if you don't already know it. (And no, I don't propose to invent a different mini-language -- it's just easier to figure out where to find docs for this when the code explicitly imports the struct module.) On Wed, Oct 7, 2015 at 10:53 AM, Eric V. Smith <eric@trueblade.com> wrote:
-- --Guido van Rossum (python.org/~guido)
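One way the prefix idea might look, purely as a sketch: the function name and the tuple handling are assumptions. The dispatch relies on the fact that none of <, =, > or @ is a printf-style flag (those are #, 0, -, space and +), so a specifier starting with one of them cannot be a valid %-format.

```python
import struct

# Byte-order/alignment prefixes named in the message above.
STRUCT_PREFIXES = ('<', '=', '>', '@')

def render_with_struct(value, spec):
    """Hypothetical field renderer: struct.pack for prefixed specs, %-formatting otherwise."""
    if spec.startswith(STRUCT_PREFIXES):
        # Assumption: a tuple supplies several struct fields, anything else one field.
        fields = value if isinstance(value, tuple) else (value,)
        return struct.pack(spec, *fields)
    # Fall back to printf-style bytes formatting.
    return (b'%' + spec.encode('ascii')) % (value,)

packed = render_with_struct((b'name', 1, 2, -3), '<10sHHb')
plain = render_with_struct(42, '03d')
```

The "Con" above still applies: nothing in the calling code points a reader at the struct documentation.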

On 7 October 2015 at 22:34, Andrew Barnert <abarnert@yahoo.com> wrote:
I was thinking the latter, but your idea of ":b" being implied only if there's no format specifier at all (and otherwise requiring an explicit "b" or other format code) might be better. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (11)
- Andrew Barnert
- Chris Angelico
- Eric V. Smith
- Guido van Rossum
- João Bernardo
- Nathaniel Smith
- Nick Coghlan
- Stephen J. Turnbull
- Steve Dower
- Steven D'Aprano
- Sven R. Kunze