Mailman 3 Binary f-strings - Python-ideas


Binary f-strings

Eric V. Smith

28 Sep 2015

6:53 a.m.

Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf''). The idea is that:

>>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'

Might be translated as:

>>> (b'datestamp:' +
...  bytes(format(datetime.datetime.now(),
...               str(b'%Y%m%d', 'ascii')),
...        'ascii') +
...  b'\r\n')

Which would result in:

b'datestamp:20150927\r\n'

The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?

Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.

Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.

Another idea would be to extend the "conversion char" from just 's', 'r', or 'a', which don't make much sense for bytes, to instead be a string that specifies the encoding. The default could be ascii, and if you want to specify something else:

bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n'

That would work for any encoding that doesn't have ':', '{', or '}' in the encoding name. Which seems like a reasonable restriction.

And I might be over-generalizing here, but you'd presumably want to make the encoding a non-constant:

bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n'

I think my initial proposal will be to use 'ascii', and not support any conversion characters at all for fb-strings, not even 's', 'r', and 'a'. In the future, if we want to support encodings other than 'ascii', we could then add !conversions mapping to encodings.

My reasoning for using 'ascii' is that 'utf-8' could easily be an error for non-utf-8 protocols. And by using 'ascii', at least we'd give a runtime error and not put possibly bogus data into the resulting binary string. Granted, the tradeoff is that we now have a case where whether or not the code raises an exception is dependent upon the values being formatted. If 'ascii' is the default, we could later switch to 'utf-8', but we couldn't go the other way.

The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. Of course user-defined types can return any unicode chars from __format__.

Once we make a decision, I can apply the same logic to b''.format(), if that's desirable.

I'm open to suggestions on this.

Thanks for reading.

-- Eric.
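The translation described in this message can be emulated today with an ordinary function. The following is only a sketch under stated assumptions: `render_field` is a hypothetical helper name (nothing in the proposal defines it), and a fixed date replaces `now()` so the result is reproducible.

```python
import datetime


def render_field(value, format_spec=b'', encoding='ascii'):
    """Emulate one bf-string replacement field (hypothetical helper)."""
    # The format spec arrives as bytes and must become str for format().
    spec = str(format_spec, 'ascii')
    # __format__ always returns str, so the result has to be encoded.
    return bytes(format(value, spec), encoding)


stamp = datetime.datetime(2015, 9, 27)  # fixed date instead of now()
line = b'datestamp:' + render_field(stamp, b'%Y%m%d') + b'\r\n'
print(line)
```

The `encoding` parameter is where the open question of the message sits: every choice other than making the caller pass it explicitly is a guess about the target protocol.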


Steven D'Aprano

28 Sep

7:39 a.m.

On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote:

...

Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf'').

The idea is that:

>>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'

Might be translated as:

>>> (b'datestamp:' +
...  bytes(format(datetime.datetime.now(),
...               str(b'%Y%m%d', 'ascii')),
...        'ascii') +
...  b'\r\n')

What's wrong with this?

f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii')

This eliminates all your questions about which encoding we should guess is more useful (ascii? utf-8? something else?), allows the caller to set an error handler without inventing yet more cryptic format codes, and is nicely explicit.

If people are worried about the length of ".encode(...)", a helper function works great:

def b(s):
    return bytes(s, 'utf-8')  # or whatever encoding makes sense for them

b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n')
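To make the "explicit encode" alternative concrete, here is a small sketch. The values (`stamp`, `name`) are made up for illustration; the only APIs used are f-strings and `str.encode` with its standard `errors` argument.

```python
import datetime

stamp = datetime.datetime(2015, 9, 27)
name = 'caf\u00e9'  # contains one non-ASCII character

# The encoding is chosen by the caller, at the call site.
ascii_line = f'datestamp:{stamp:%Y%m%d}\r\n'.encode('ascii')
utf8_line = f'name:{name}\r\n'.encode('utf-8')

# An error handler can be passed without any new format codes.
lossy_line = f'name:{name}\r\n'.encode('ascii', errors='replace')
```

Whether silently replacing characters is acceptable depends on the protocol; the point is only that the decision stays visible in the code.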

...

Which would result in: b'datestamp:20150927\r\n'

The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?

Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.

Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.

Using UTF-8 is not sufficient, since there are strings that can't be encoded into UTF-8 because they contain surrogates:

py> '\uDA11'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed

but we surely don't want to suppress such errors by default. Sometimes they will be an error that needs fixing.

-- Steve
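The surrogate case can be checked directly. This sketch reproduces the failure and shows the explicit opt-in that `str.encode` already offers through the standard `surrogatepass` error handler; the variable names are illustrative only.

```python
lone_surrogate = '\uDA11'

try:
    lone_surrogate.encode('utf-8')
except UnicodeEncodeError as exc:
    strict_error = exc.reason
else:
    strict_error = None

# Opting in to 'surrogatepass' is explicit and round-trips the code point.
passed = lone_surrogate.encode('utf-8', errors='surrogatepass')
round_trip = passed.decode('utf-8', errors='surrogatepass')
```

A bf-string with a hard-wired encoding would have no obvious place to put that `errors=` argument, which is part of the argument for spelling out `.encode()`.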

Nathaniel Smith

8:11 a.m.

Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions:

https://bugs.python.org/issue3982#msg224023

I guess you'd need to dig up those earlier discussions and see what the issues were?

-n

On Sun, Sep 27, 2015 at 6:23 PM, Eric V. Smith wrote:

...

Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf'').

The idea is that:

>>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'

Might be translated as:

>>> (b'datestamp:' +
...  bytes(format(datetime.datetime.now(),
...               str(b'%Y%m%d', 'ascii')),
...        'ascii') +
...  b'\r\n')

Which would result in: b'datestamp:20150927\r\n'

The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?

Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.

Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.

Another idea would be to extend the "conversion char" from just 's', 'r', or 'a', which don't make much sense for bytes, to instead be a string that specifies the encoding. The default could be ascii, and if you want to specify something else: bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n'

That would work for any encoding that doesn't have ':', '{', or '}' in the encoding name. Which seems like a reasonable restriction.

And I might be over-generalizing here, but you'd presumably want to make the encoding a non-constant: bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n'

I think my initial proposal will be to use 'ascii', and not support any conversion characters at all for fb-strings, not even 's', 'r', and 'a'. In the future, if we want to support encodings other than 'ascii', we could then add !conversions mapping to encodings.

My reasoning for using 'ascii' is that 'utf-8' could easily be an error for non-utf-8 protocols. And by using 'ascii', at least we'd give a runtime error and not put possibly bogus data into the resulting binary string. Granted, the tradeoff is that we now have a case where whether or not the code raises an exception is dependent upon the values being formatted. If 'ascii' is the default, we could later switch to 'utf-8', but we couldn't go the other way.

The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. Of course user-defined types can return any unicode chars from __format__.

Once we make a decision, I can apply the same logic to b''.format(), if that's desirable.

I'm open to suggestions on this.

Thanks for reading.

-- Eric.

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

-- Nathaniel J. Smith -- http://vorpus.org

Chris Angelico

8:33 a.m.

On Mon, Sep 28, 2015 at 12:41 PM, Nathaniel Smith wrote:

...

Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions: https://bugs.python.org/issue3982#msg224023 I guess you'd need to dig up those earlier discussions and see what the issues were?

The biggest issues are summarized into PEP 461:

https://www.python.org/dev/peps/pep-0461/#proposed-variations

Since the __format__ machinery is all based around text strings, there'll need to be some (explicit or implicit) encode step. Hence this thread.

How bad would it be to simply say "there are no bf strings"? As Steven says, you can simply use a normal f''.encode() operation, with no confusion. Otherwise, there'll be these "format-like" operations that can do things that format() can't do... and then there'd be edge cases, too, like a string with a b-prefix that contains non-ASCII characters in it:

>>> восток = 1961
>>> apollo = 1969
>>> print(f"It took {apollo-восток} years to get from orbit to the moon.")
It took 8 years to get from orbit to the moon.
>>> print(b"It took {apollo-восток} years to get from orbit to the moon.")
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.

If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string?

In any case, bf strings can be added later, but once they're added, their semantics would be locked in. I'd be inclined to leave them out for 3.6 and see what people say. A bit of real-world usage of f-strings might show a clear front-runner in terms of expectations (UTF-8, ASCII, or something else).

ChrisA
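The current behaviour behind this edge case can be reproduced without a REPL. The sketch below uses `compile()` to trigger the bytes-literal error in a way that can be caught; the source string and file name are arbitrary examples.

```python
восток = 1961
apollo = 1969

# Non-ASCII identifiers are fine inside a text f-string.
text = f"It took {apollo-восток} years to get from orbit to the moon."

# A bytes literal containing the same characters does not compile.
source = 'b"It took {apollo-восток} years"'
try:
    compile(source, '<example>', 'eval')
except SyntaxError as exc:
    error_message = exc.msg
else:
    error_message = None
```

For a bf-string, the braces would contain code rather than data, so the ASCII restriction would have to apply only to the parts outside the braces.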

Steven D'Aprano

8:58 a.m.

On Mon, Sep 28, 2015 at 01:03:32PM +1000, Chris Angelico wrote: [...]

...

>>> восток = 1961
>>> apollo = 1969
>>> print(f"It took {apollo-восток} years to get from orbit to the moon.")
It took 8 years to get from orbit to the moon.
>>> print(b"It took {apollo-восток} years to get from orbit to the moon.")
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.

If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string?

It would confuse the heck out of me. I leave it to the reader to decide whether I am a human or a tool.

-- Steve

Sven R. Kunze

10:30 p.m.

On 28.09.2015 05:03, Chris Angelico wrote:

...

>>> восток = 1961
>>> apollo = 1969
>>> print(f"It took {apollo-восток} years to get from orbit to the moon.")
It took 8 years to get from orbit to the moon.
>>> print(b"It took {apollo-восток} years to get from orbit to the moon.")
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.

If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string?

I don't think so. "{...}" indicates the injection of whatever "..." stands for, thus is not part of the resulting string. So, no issue here for me. (The only thing that would confuse me, is that "восток" is an allowed identifier in the first place. But that seems to be a different matter.)

...

In any case, bf strings can be added later, but once they're added, their semantics would be locked in. I'd be inclined to leave them out for 3.6 and see what people say. A bit of real-world usage of f-strings might show a clear front-runner in terms of expectations (UTF-8, ASCII, or something else).

I tend to agree here. Best, Sven

Andrew Barnert

9:18 a.m.

On Sep 27, 2015, at 18:23, Eric V. Smith wrote:

...

The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. Of course user-defined types can return any unicode chars from __format__.

The fact that it can't handle bytes and bytes-like types makes this much less useful than %.

Beyond that, the fact that it only works reliably for the same types as %, minus bytes, plus a few others including datetime means the benefit isn't nearly as large as for f-strings and str.format, which work reliably for every type in the world, and extensibly so for many types. And meanwhile, the cost is much higher, from code that seems to work if you don't test it well to even higher performance costs (and usually in code that needs performance more).

Of course you could create a __bformat__(*args, encoding, errors, **kw) protocol (where object.__bformat__ just returns self.__format__(*args, **kw).encode(encoding, errors)), which has the same effect as your proposal except that types that need to know they're being bytes-formatted to do something reasonable, or that just want to know so they can optimize, can do so. And this of course lets you add __bformat__ to bytes, etc.--although it doesn't seem to help for types that support the buffer protocol, so it's still not as good as %b.

But I don't think anyone will want that.
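The `__bformat__` idea above can be sketched in a few lines. Everything here is hypothetical: `__bformat__` is not a real special method, and `bformat` and `RawField` are names invented for this illustration. The fallback branch is the "format as text, then encode" default described in the message.

```python
def bformat(value, format_spec='', *, encoding='ascii', errors='strict'):
    """Hypothetical dispatcher for a __bformat__ protocol (not a real API)."""
    hook = getattr(type(value), '__bformat__', None)
    if hook is not None:
        return hook(value, format_spec, encoding=encoding, errors=errors)
    # Fallback described in the message: format as text, then encode.
    return format(value, format_spec).encode(encoding, errors)


class RawField:
    """Example type that wants to know it is being bytes-formatted."""

    def __init__(self, payload):
        self.payload = bytes(payload)

    def __bformat__(self, format_spec, *, encoding, errors):
        # Already bytes, so no text round trip is needed.
        return self.payload


number = bformat(255, 'x')
raw = bformat(RawField(b'\x00\xff'))
```

As the message notes, this still would not cover arbitrary buffer-protocol objects unless each of them grew the hook, which is one reason `%b` remains the more general tool.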

Sven R. Kunze

2 Oct

7:50 p.m.

On 28.09.2015 03:23, Eric V. Smith wrote:

...

The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?

Cf. https://www.python.org/dev/peps/pep-0461/#interpolation

It says:

b"%x" % val

is equivalent to:

("%x" % val).encode("ascii")

So, ASCII would make a lot of sense to me as well.
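The equivalence quoted from PEP 461 is easy to verify on Python 3.5 or later, where bytes %-formatting is available. The value 255 is an arbitrary example.

```python
val = 255

via_bytes = b"%x" % val                  # bytes %-formatting (Python 3.5+)
via_text = ("%x" % val).encode("ascii")  # the equivalent text route
```

Numeric codes are safe to define this way because their output is always ASCII, which is not true of arbitrary `__format__` results.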

...

Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.

Could you be more specific here? Best, Sven

Eric V. Smith

7:56 p.m.

On 10/02/2015 10:20 AM, Sven R. Kunze wrote:

...

On 28.09.2015 03:23, Eric V. Smith wrote:

...
Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.

Could you be more specific here?

bf'{foo}'

Might succeed or fail, depending on what foo returns for __format__. If foo is 'bar', it succeeds. If it's '\u1234', it fails.

But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.

Eric.
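The value-dependent failure can be shown with a stand-in function. `emulate_bf` is a hypothetical name that mimics what bf'{foo}' would do under the 'ascii' proposal; it is not an existing API.

```python
def emulate_bf(foo):
    """Stand-in for bf'{foo}' under the 'ascii' proposal (hypothetical)."""
    return format(foo).encode('ascii')


ok = emulate_bf('bar')

try:
    emulate_bf('\u1234')
except UnicodeEncodeError:
    failed = True
else:
    failed = False
```

Both calls pass a `str`, so no type check would separate them; only the content of the value decides whether an exception is raised.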

João Bernardo

8:18 p.m.

...

But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.

What about rf''? (sorry for being off topic here)

Regex could benefit from it:

my_regex = rf"ˆ\w+\s*({'|'.join(expected_words)})$"

Eric V. Smith

8:22 p.m.

On 10/02/2015 10:48 AM, João Bernardo wrote:

...

But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.

What about rf''? (sorry for being off topic here)

Regex could benefit from it:

my_regex = rf"ˆ\w+\s*({'|'.join(expected_words)})$"

That's already implemented. Its use in regular expressions is mentioned in the PEP.

Chris Angelico

8:22 p.m.

On Sat, Oct 3, 2015 at 12:48 AM, João Bernardo wrote:

...

...
But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.

What about rf''? (sorry for being off topic here)

Regex could benefit from it:

my_regex = rf"ˆ\w+\s*({'|'.join(expected_words)})$"

Works fine:

rosuav@sikorsky:~$ python3
Python 3.6.0a0 (default:48943533965e, Sep 28 2015, 11:27:38)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> expected_words = ["foo", "bar"]
>>> my_regex = rf"ˆ\w+\s*({'|'.join(expected_words)})$"
>>> print(my_regex)
ˆ\w+\s*(foo|bar)$

ChrisA
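For completeness, a sketch of the raw f-string actually driving the `re` module. It uses the ASCII caret `^` as the start anchor, which is an assumption about what the pattern was meant to contain, and the test strings are invented.

```python
import re

expected_words = ["foo", "bar"]

# Raw f-string: backslashes stay literal, braces are still evaluated.
my_regex = rf"^\w+\s*({'|'.join(expected_words)})$"

match = re.match(my_regex, "say  foo")
miss = re.match(my_regex, "say baz")
```

If the words could contain regex metacharacters, wrapping each one in `re.escape()` before joining would be the usual precaution.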

Sven R. Kunze

9:13 p.m.

On 02.10.2015 16:26, Eric V. Smith wrote:

...

bf'{foo}'

Might succeed or fail, depending on what foo returns for __format__. If foo is 'bar', it succeeds. If it's '\u1234', it fails.

I know a lot of functions that fail when passing the wrong kind of arguments. What's so wrong with it?

...

But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.

I see. Can you think of an alternative solution?

I was digging deeper into the matter of binary/byte string formatting in order to understand why {} is not usable for binary protocols. Let's look at this practical example https://blog.tox.chat/2015/09/fuzzing-the-new-groupchats/ I hope the tox protocol fully qualifies as a wire protocol. What the author is trying to do is fuzz his new groupchat implementation by creating more-or-less random packets and feeding them into tox core to see if it breaks.

He conveniently uses this type of syntax to describe the structure of the first header:

Header 1: [ Packet ID (1 b) | Chat ID hash (4 b) | Sender PK (32 b) | nonce (24 b) ]

Interested in writing a fuzzer, I would find the following really helpful as it mirrors the description within his blog post:

header1 = bf'{packet_id}{chat_id}{sender_pk}{nonce}'

# which should be the same as

header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce)

I wouldn't mind specifying the encoding for all non-byte-string arguments. Why? Because I would be working with bytes anyway, so no formatting (as in format()) would be necessary in the first place. However, I like the syntax for specifying the structure of (byte) strings.

Does this make sense?

Best, Sven
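The `%b` spelling of that header already works on Python 3.5+. In this sketch the field sizes follow the header description (1 + 4 + 32 + 24 bytes), while the field values are random placeholders and have nothing to do with the real tox protocol.

```python
import os

# Field sizes follow the header description: 1 + 4 + 32 + 24 bytes.
packet_id = bytes([0x5A])      # arbitrary example value
chat_id = os.urandom(4)
sender_pk = os.urandom(32)
nonce = os.urandom(24)

header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce)

# Plain concatenation gives the same bytes.
joined = b''.join([packet_id, chat_id, sender_pk, nonce])
```

Since every field is already bytes, no encoding question arises here at all, which is the case this message argues a bf-string should serve.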

Guido van Rossum

9:30 p.m.

Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) apply to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).

On Fri, Oct 2, 2015 at 8:43 AM, Sven R. Kunze wrote:

...

On 02.10.2015 16:26, Eric V. Smith wrote:

...
bf'{foo}'

Might succeed or fail, depending on what foo returns for __format__. If foo is 'bar', it succeeds. If it's '\u1234', it fails.

I know a lot of functions that fail when passing the wrong kind of arguments. What's so wrong with it?

But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.

I see. Can you think of an alternative solution?

I was digging deeper into the matter of binary/byte string formatting in order to understand why {} is not usable for binary protocols. Let's look at this practical example https://blog.tox.chat/2015/09/fuzzing-the-new-groupchats/ I hope the tox protocol fully qualifies as a wire protocol. What the author is trying to do is fuzz his new groupchat implementation by creating more-or-less random packets and feeding them into tox core to see if it breaks.

He conveniently uses this type of syntax to describe the structure of the first header:

Header 1: [ Packet ID (1 b) | Chat ID hash (4 b) | Sender PK (32 b) | nonce (24 b) ]

Interested in writing a fuzzer, I would find the following really helpful as it mirrors the description within his blog post:

header1 = bf'{packet_id}{chat_id}{sender_pk}{nonce}'

# which should be the same as

header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce)

I wouldn't mind specifying the encoding for all non-byte-string arguments. Why? Because I would be working with bytes anyway, so no formatting (as in format()) would be necessary in the first place. However, I like the syntax for specifying the structure of (byte) strings.

Does this make sense?

Best, Sven


-- --Guido van Rossum (python.org/~guido)

Steven D'Aprano

3 Oct

8:57 a.m.

On Fri, Oct 02, 2015 at 09:00:56AM -0700, Guido van Rossum wrote:

...

Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).

As Chris A pointed out earlier, identifiers are not ASCII only. What are we to make of something like this?

bf'{αριθμός + 1}'

And don't say "re-write your code to only use ASCII variable names" :-)

-- Steve

Chris Angelico

9:03 a.m.

On Sat, Oct 3, 2015 at 1:27 PM, Steven D'Aprano wrote:

...

On Fri, Oct 02, 2015 at 09:00:56AM -0700, Guido van Rossum wrote:

...
Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).

As Chris A pointed out earlier, identifiers are not ASCII only. What are we to make of something like this?

bf'{αριθμός + 1}'

And don't say "re-write your code to only use ASCII variable names" :-)

It should be technically legal, btw; it's just going to look very odd. The check for ASCII-only has to be done _after_ the fracturing into strings and expressions. But I don't like how that reads.

ChrisA
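The "check after fracturing" order can be approximated with the standard library. This sketch uses `string.Formatter().parse`, which follows `str.format` grammar rather than the real f-string tokenizer, so it is only an approximation; `split_bf_template` is a hypothetical name.

```python
import string


def split_bf_template(template):
    """Split a template into literal and expression parts, then check
    that only the literal parts are ASCII (hypothetical check order)."""
    literals, expressions = [], []
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        # Raises UnicodeEncodeError if a literal part is not ASCII.
        literals.append(literal.encode('ascii'))
        if field is not None:
            expressions.append(field)
    return literals, expressions


literals, expressions = split_bf_template('count={αριθμός + 1}\r\n')
```

With this order, a non-ASCII identifier inside the braces passes, while a non-ASCII character in the literal text is still rejected.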

Guido van Rossum

9:14 a.m.

On Fri, Oct 2, 2015 at 8:33 PM, Chris Angelico wrote:

...

On Sat, Oct 3, 2015 at 1:27 PM, Steven D'Aprano wrote:

On Fri, Oct 02, 2015 at 09:00:56AM -0700, Guido van Rossum wrote:

Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).

As Chris A pointed out earlier, identifiers are not ASCII only. What are we to make of something like this?

bf'{αριθμός + 1}'

And don't say "re-write your code to only use ASCII variable names" :-)

It should be technically legal, btw; it's just going to look very odd. The check for ASCII-only has to be done _after_ the fracturing into strings and expressions. But I don't like how that reads.

I don't think this concern should be a showstopper. Honestly either approach sounds fine to me. :-) -- --Guido van Rossum (python.org/~guido)

Stephen J. Turnbull

8:23 p.m.

Steven D'Aprano writes:

...

What are we to make of something like this?

bf'{αριθμός + 1}'

Greek authorship. What would you make of

bf'{junban + 1}'

and why is that better than

bf'{順番 + 1}'

? (I would guess that is a sort of Japanese approximation to your Greek.)

I think it was Alex Martelli who argued very strongly at the time of PEP 263 that using native identifiers and even comments in your native language is a very risky practice (at least from management's POV <wink/>). I think that's still true, but it's clearly in consenting adults territory.

Nick Coghlan

8:07 p.m.

On 3 October 2015 at 02:00, Guido van Rossum wrote:

...

Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).

Given that restriction, what if we dropped the format() call in the bytestring case, and instead always used printf-style formatting?

That is:

bf'{packet_id}{chat_id}{sender_pk}{nonce}'

could be roughly equivalent to (with parens to help make the pieces clearer):

(b'%b' % packet_id) + (b'%b' % chat_id) + (b'%b' % sender_pk) + (b'%b' % nonce)

If a ":fmt" section is provided for the substitution field, it would replace the mod-format sequence for that section:

bf'{number:0.2d}' ===> b'%0.2d' % number

With that approach, over time, printf-style formatting (aka mod-formatting) may come to just be known as bytes formatting (even though text strings also support it).

Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol, while doing the higher level composition with bf-strings*:

bf'{header}{struct.pack('<10sHHb', record)}{footer}'

Cheers,
Nick.

* which I am now tempted to call Big Friendly Strings**, since I read a lot of Roald Dahl books as a kid :)

** this would further mean that normal f-strings are friendly strings in addition to being format strings ;)

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
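The right-hand sides of that translation are ordinary Python 3.5+ code, so they can be tried as they stand. In this sketch all values are invented, and the `<` and `>` framing bytes merely stand in for the header and footer of the example.

```python
import struct

packet_id, chat_id = b'\x01', b'\x00\x00\x00\x2a'
number = 5

# bf'{packet_id}{chat_id}' under the proposed translation
composed = (b'%b' % packet_id) + (b'%b' % chat_id)

# bf'{number:0.2d}' would become b'%0.2d' % number
padded = b'%0.2d' % number

# struct handles the fixed-layout part; %b splices it in.
record = struct.pack('<10sHHb', b'spam', 1, 2, -3)
framed = b'%b%b%b' % (b'<', record, b'>')
```

Note that `struct.pack` takes the record fields as separate arguments, so a tuple-valued `record` would need to be unpacked with `*record`.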

Steve Dower

9:50 p.m.

"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"

Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.

bf'{header}{record:<10sHHb}{footer}'

Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.

$0.02

Cheers,
Steve

Top-posted from my Windows Phone

-----Original Message-----
From: "Nick Coghlan"
Sent: 10/3/2015 7:37
To: "Guido van Rossum"
Cc: "Python-Ideas"
Subject: Re: [Python-ideas] Binary f-strings

On 3 October 2015 at 02:00, Guido van Rossum wrote:

...

Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).

Eric V. Smith

4 Oct

12:48 a.m.

On 10/03/2015 12:20 PM, Steve Dower wrote:

...

"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"

Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.

That appeals to me, too. There are a number of practical problems that would need to be worked out. We can argue those later :)

I guess it comes down to: what would the commonest use case for fb-strings be?

...

Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.

This is one of my two big concerns. If we do something other than remove 'b' and encode, then we've got two similar looking things that have vastly different implementations. But maybe struct.pack or %-formatting are so compelling that it's worth breaking the equivalence.

My other concern is non-ascii chars inside the braces in an fb-string.

Eric.

Sven R. Kunze

5 Oct

12:29 p.m.

On 03.10.2015 21:18, Eric V. Smith wrote:

...

On 10/03/2015 12:20 PM, Steve Dower wrote:

...
"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"

Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.

That appeals to me, too. There are a number of practical problems that would need to be worked out. We can argue those later :)

I think that's where I reach the limit of my "binary" experience in Python. But if people (here Steve) found it useful, why not? If there are problems that cannot be solved easily, we can do a V1 and later a V2 that includes the struct packing.

...

I guess it comes down to: what would the commonest use case for fb-strings be?

To me, it's the same as for all f-strings: the bloody easy string concatenation of easily distinguishable parts. I even have to admit I thought they were called *format strings* because they *give format/structure* to the resulting string. Well, now I know better (format refers to the formatting of the input expressions) but the analogy is still in my mind.

...

My other concern is non-ascii chars inside the braces in an fb-string.

What's wrong with them? Best, Sven

Eric V. Smith

2:16 p.m.

...

...
My other concern is non-ascii chars inside the braces in an fb-string.

What's wrong with them?

It has to do with the order of processing we defined for regular f-strings. I'm still working through it to see what the implications are. Eric.

Andrew Barnert

4 Oct

3:55 a.m.

On Oct 3, 2015, at 09:20, Steve Dower wrote:

...

"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"

Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.

bf'{header}{record:<10sHHb}{footer}'

I love that at first glance. But if the point of bf-strings (like the point of bytes.__mod__ and the other str-like stuff added back to bytes since 3.0) is for things like ascii-based, partly-human-readable protocols and formats, it's obviously important to do things like hex and octal, space- and zero-padding, etc., and if the format specifier always means struct, there's no way to do that.
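The formatting needs listed here (hex, octal, space- and zero-padding) are what the `%` codes on bytes already provide in Python 3.5+. A short sketch with an arbitrary value:

```python
value = 255

hex_field = b'%x' % value        # hexadecimal digits
octal_field = b'%o' % value      # octal digits
space_padded = b'%6d' % value    # right-aligned, space-padded
zero_padded = b'%06d' % value    # zero-padded
```

If the part after the colon always meant a struct layout, none of these spellings would be reachable from inside a bf-string.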

...

Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.

But removing the b and encoding the resulting f-string is useless. For example:

header = b'Spam'
value = 42
lines.append(bf'{header}: {value}\r\n')

This gives you b"b'Spam': 42\r\n". Can you imagine ever wanting that?

The only way the feature makes sense is if it does something different.

Nick's suggestion of having it do %-formatting makes sense. Yes, it means that {count:03} is an error and you need '{count:03d}', which is inconsistent with f-strings. But that seems like a much less serious problem than bytes formatting not being able to handle bytes.
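Both behaviours contrasted in this message can be reproduced side by side. The sketch uses the same `header` and `value` as the example; the names `naive` and `wanted` are illustrative.

```python
header = b'Spam'
value = 42

# "Remove the b and encode": str(bytes) leaks the repr into the output.
naive = f'{header}: {value}\r\n'.encode('ascii')

# %-formatting on bytes inserts the bytes themselves.
wanted = b'%b: %d\r\n' % (header, value)
```

Running the first form under `python -b` additionally emits a BytesWarning, which hints at how rarely formatting bytes through `str()` is what was intended.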

Steve Dower

9:04 p.m.

Maybe we could spell it {spam!p:<10sHHb}?

Nick Coghlan

7 Oct

5:05 p.m.

On 4 October 2015 at 08:25, Andrew Barnert wrote:

Nick's suggestion of having it do %-formatting makes sense. Yes, it means that {count:03} is an error and you need '{count:03d}', which is inconsistent with f-strings. But that seems like a much less serious problem than bytes formatting not being able to handle bytes.

Exactly, if someone is mistakenly thinking bf"{header}{content}{footer}" is equivalent to f"{header}{content}{footer}".encode(), they're likely to get immediate noisy errors when they start trying to format fields.

The parallel I'd attempt to draw is that:

f"{header}{content}{footer}" is to "{}{}{}".format(header, content, footer)

as:

bf"{header:b}{content:b}{footer:b}" would be to b"%b%b%b" % (header, content, footer)

To make the behaviour clearer in the latter case, it may be reasonable to *require* an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting.

I'm not sold on the idea of a struct.pack conversion specifier - if we added binary format strings, I think it would be better to start with explicit "pack(value, format)" expressions, and see how that goes for a release.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
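The right-hand sides of that parallel already exist, so they can be compared directly. The sketch below uses made-up field values; the bf-string itself is hypothetical syntax and is therefore not shown as code. It also illustrates the "noisy error" mentioned above, using a str f-string applied to bytes as a stand-in for the mistaken mental model.

```python
# str side: an f-string behaves like str.format on the same values.
header, content, footer = 'H|', 'body', '|F'
assert f"{header}{content}{footer}" == "{}{}{}".format(header, content, footer)

# bytes side: what the proposed bf"{header:b}{content:b}{footer:b}" would
# correspond to under this reading.
bheader, bcontent, bfooter = b'H|', b'body', b'|F'
packet = b"%b%b%b" % (bheader, bcontent, bfooter)
print(packet)  # b'H|body|F'

# The mistaken model (format as str, then encode) embeds the bytes reprs.
mistaken = f"{bheader}{bcontent}{bfooter}".encode()
print(mistaken)  # b"b'H|'b'body'b'|F'"

# As soon as a real format spec is applied to a bytes value, str formatting
# fails loudly, because bytes does not accept format specs.
try:
    f"{bheader:>10}"
except TypeError as exc:
    spec_error = exc
else:
    spec_error = None
```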

Andrew Barnert

6:04 p.m.

On Oct 7, 2015, at 04:35, Nick Coghlan wrote:

Exactly, if someone is mistakenly thinking bf"{header}{content}{footer}" is equivalent to f"{header}{content}{footer}".encode(), they're likely to get immediate noisy errors when they start trying to format fields.

Except that multiple people in this thread are saying that's exactly what it should mean (which I think is a very bad idea).

The parallel I'd attempt to draw is that:

f"{header}{content}{footer}" is to "{}{}{}".format(header, content, footer)

as:

bf"{header:b}{content:b}{footer:b}" would be to b"%b%b%b" % (header, content, footer)

To make the behaviour clearer in the latter case, it may be reasonable to *require* an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting.

Are you suggesting that if a format specifier is given, it must include the format code (which seems perfectly reasonable to me--guessing that :3 means %3b is likely to be wrong more often than it's right…), or that a format specifier must always be given, with no default to :b (which seems more obtrusive and solves less of a problem)?

Guido van Rossum

9:55 p.m.

I think bf'...' should be compared to b'...' % rather than to f'...'. IOW bf'...' is to f'...' as b'...'% is to '...'%.

-- --Guido van Rossum (python.org/~guido)

Eric V. Smith

11:23 p.m.

On 10/07/2015 12:25 PM, Guido van Rossum wrote:

I think bf'...' should be compared to b'...' % rather than to f'...'. IOW bf'...' is to f'...' as b'...'% is to '...'%.

I'm leaning this way, at least in the sense of "there's a fixed number of known types supported, and there's no extensible protocol involved."

Eric.
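The difference between an extensible protocol and a fixed set of types can be seen with existing str and bytes formatting. In this sketch, Temperature is a hypothetical class invented for the example: str formatting consults its __format__ method, while bytes %-formatting ignores it and only accepts the types its format codes know about (with __bytes__ and the buffer protocol being the hooks that %b looks at).

```python
class Temperature:
    """Hypothetical type that customises str formatting only."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # str.format and f-strings delegate to this method.
        return format(self.celsius, spec) + ' C'


t = Temperature(21.5)

# Extensible: the object decides how it is rendered.
text = f'{t:.1f}'
print(text)  # 21.5 C

# Fixed set of types: %b accepts bytes-like objects (or objects that define
# __bytes__), so an arbitrary object with only __format__ is rejected.
try:
    b'%b' % (t,)
except TypeError as exc:
    bytes_error = exc
else:
    bytes_error = None

# The caller has to pick a supported type and an explicit format code.
line = b'%b=%.1f\r\n' % (b'temp', t.celsius)
print(line)  # b'temp=21.5\r\n'
```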

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Guido van Rossum

11:28 p.m.

Of course that would still leave the door open for struct.pack support (maybe recognized by having the string start with <, =, > or @).

Pro: everybody who currently uses struct.pack will love it.

Con: the struct.pack mini-language is pretty inscrutable if you don't already know it. (And no, I don't propose to invent a different mini-language -- it's just easier to figure out where to find docs for this when the code explicitly imports the struct module.)

-- --Guido van Rossum (python.org/~guido)
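One way to picture the prefix idea is a per-field dispatch function. The sketch below is purely illustrative: format_field and STRUCT_PREFIXES are hypothetical names, nothing like them was specified in the thread, and the rule that a tuple supplies several struct values is an assumption made to keep the example short.

```python
import struct

# Byte-order/alignment prefixes named in the message above.
STRUCT_PREFIXES = ('<', '=', '>', '@')


def format_field(value, spec):
    """Hypothetical rendering of one bf-string field to bytes."""
    if spec.startswith(STRUCT_PREFIXES):
        # struct mode: a tuple supplies several values for one format.
        values = value if isinstance(value, tuple) else (value,)
        return struct.pack(spec, *values)
    # Otherwise treat the spec as a printf-style code such as '03d' or 'b'.
    return (b'%' + spec.encode('ascii')) % (value,)


print(format_field(7, '03d'))     # b'007'
print(format_field(b'Spam', 'b'))  # b'Spam'
print(format_field((b'spam', 1, 2, -1), '<10sHHb'))
```

In printf-style specs, left alignment is spelled with '-', so a leading '<' or '>' has no existing meaning there; that is what makes the prefix usable as a switch in this sketch.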

Eric V. Smith

11:31 p.m.

On 10/07/2015 01:58 PM, Guido van Rossum wrote:

Of course that would still leave the door open for struct.pack support (maybe recognized by having the string start with <,=, > or @). Pro: everybody who currently uses struct.pack will love it. Con: the struct.pack mini-language is pretty inscrutable if you don't already know it. (And no, I don't propose to invent a different mini-language -- it's just easier to figure out where to find docs for this when the code explicitly imports the struct module.)

Right. I think Steve Dower's idea of :p switching to struct.pack mode is reasonable. But as Nick says, we don't need to add it on day 1. Eric.

Eric V. Smith

8 Oct

2:57 a.m.

On Oct 7, 2015, at 2:01 PM, Eric V. Smith wrote:

Right. I think Steve Dower's idea of :p switching to struct.pack mode is reasonable. But as Nick says, we don't need to add it on day 1.

Make that "!p". Eric.

Nick Coghlan

1:09 p.m.

On 7 October 2015 at 22:34, Andrew Barnert wrote:

On Oct 7, 2015, at 04:35, Nick Coghlan wrote:

The parallel I'd attempt to draw is that:

f"{header}{content}{footer}" is to "{}{}{}".format(header, content, footer)

as:

bf"{header:b}{content:b}{footer:b}" would be to b"%b%b%b" % (header, content, footer)

To make the behaviour clearer in the latter case, it may be reasonable to *require* an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting.

Are you suggesting that if a format specifier is given, it must include the format code (which seems perfectly reasonable to me--guessing that :3 means %3b is likely to be wrong more often than it's right…), or that a format specifier must always be given, with no default to :b (which seems more obtrusive and solves less of a problem)?

I was thinking the latter, but your idea of ":b" being implied only if there's no format specifier at all (and otherwise requiring an explicit "b" or other format code) might be better.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
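The rule settled on here (an empty specifier means :b, anything else must end in an explicit format code) can be written down as a small translation step from a field specifier to a printf-style directive. This is a sketch under assumptions: to_percent_spec and PERCENT_CODES are hypothetical names, and the set of accepted codes is simply the conversion types that bytes %-formatting supports today.

```python
# Conversion types accepted by bytes %-formatting.
PERCENT_CODES = frozenset('bsardiouxXeEfFgGc')


def to_percent_spec(field_spec):
    """Translate a hypothetical bf-string field spec to a % directive."""
    if field_spec == '':
        # No specifier at all: ":b" is implied.
        return b'%b'
    if field_spec[-1] not in PERCENT_CODES:
        # A specifier was given but has no format code, e.g. ':3'.
        raise ValueError(
            'format specifier %r needs an explicit format code' % field_spec
        )
    return b'%' + field_spec.encode('ascii')


print(to_percent_spec('') % (b'Spam',))  # b'Spam'
print(to_percent_spec('03d') % (7,))     # b'007'

try:
    to_percent_spec('3')
except ValueError:
    bare_width_rejected = True
else:
    bare_width_rejected = False
```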

32 comments

11 participants

participants (11)

Andrew Barnert
Chris Angelico
Eric V. Smith
Guido van Rossum
João Bernardo
Nathaniel Smith
Nick Coghlan
Stephen J. Turnbull
Steve Dower
Steven D'Aprano
Sven R. Kunze
