Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf'').
The idea is that:
bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
Might be translated as:
(b'datestamp:' +
 bytes(format(datetime.datetime.now(),
              str(b'%Y%m%d', 'ascii')),
       'ascii') +
 b'\r\n')
Which would result in: b'datestamp:20150927\r\n'
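As a rough check of the translation above, the expression can be written out by hand and run today. This is only a sketch: the fixed date and the names `now` and `stamp` are assumptions chosen to reproduce the output shown, and are not part of the proposal.

```python
import datetime

# Assumed fixed date so the result matches the example output above.
now = datetime.datetime(2015, 9, 27)

# Hand-expanded form of the proposed bf'...' translation:
# decode the format spec to text, format the value, then encode as ASCII.
stamp = (b'datestamp:' +
         bytes(format(now, str(b'%Y%m%d', 'ascii')), 'ascii') +
         b'\r\n')
print(stamp)
```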
The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?
Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.
Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.
Another idea would be to extend the "conversion char" from just 's', 'r', or 'a', which don't make much sense for bytes, to instead be a string that specifies the encoding. The default could be ascii, and if you want to specify something else: bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n'
That would work for any encoding that doesn't have ':', '{', or '}' in the encoding name. Which seems like a reasonable restriction.
And I might be over-generalizing here, but you'd presumably want to make the encoding a non-constant: bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n'
I think my initial proposal will be to use 'ascii', and not support any conversion characters at all for fb-strings, not even 's', 'r', and 'a'. In the future, if we want to support encodings other than 'ascii', we could then add !conversions mapping to encodings.
My reasoning for using 'ascii' is that 'utf-8' could easily be an error for non-utf-8 protocols. And by using 'ascii', at least we'd give a runtime error and not put possibly bogus data into the resulting binary string. Granted, the tradeoff is that we now have a case where whether or not the code raises an exception is dependent upon the values being formatted. If 'ascii' is the default, we could later switch to 'utf-8', but we couldn't go the other way.
The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. Of course user-defined types can return any unicode chars from __format__.
Once we make a decision, I can apply the same logic to b''.format(), if that's desirable.
I'm open to suggestions on this.
Thanks for reading.
-- Eric.
On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote:
Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf'').
The idea is that:
bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
Might be translated as:
(b'datestamp:' +
 bytes(format(datetime.datetime.now(),
              str(b'%Y%m%d', 'ascii')),
       'ascii') +
 b'\r\n')
What's wrong with this?
f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii')
This eliminates all your questions about which encoding we should guess is more useful (ascii? utf-8? something else?), allows the caller to set an error handler without inventing yet more cryptic format codes, and is nicely explicit.
If people are worried about the length of ".encode(...)", a helper function works great:
def b(s): return bytes(s, 'utf-8')
# or whatever encoding makes sense for them
b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n')
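To illustrate the point about explicit encoding, here is a small sketch of how the caller keeps control of both the codec and the error handler. The fixed date and the `name` value are assumptions made up for the example.

```python
import datetime

now = datetime.datetime(2015, 9, 27)  # assumed fixed date for a stable result
line = f'datestamp:{now:%Y%m%d}\r\n'.encode('ascii')

name = 'caf\u00e9'  # assumed sample value containing a non-ASCII character

# Strict ASCII encoding fails loudly on the non-ASCII character.
strict_failed = False
try:
    f'name:{name}\r\n'.encode('ascii')
except UnicodeEncodeError:
    strict_failed = True

# The caller can choose an error handler, or another encoding entirely.
replaced = f'name:{name}\r\n'.encode('ascii', errors='replace')
utf8 = f'name:{name}\r\n'.encode('utf-8')
```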
Which would result in: b'datestamp:20150927\r\n'
The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?
Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.
Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8.
Using UTF-8 is not sufficient, since there are strings that can't be encoded into UTF-8 because they contain surrogates:
py> '\uDA11'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed
but we surely don't want to suppress such errors by default. Sometimes they will be an error that needs fixing.
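A short sketch of that trade-off: the default strict handler raises, while an explicitly requested handler such as 'surrogatepass' lets the lone surrogate through. The variable names are only illustrative.

```python
text = '\uDA11'  # a lone surrogate, as in the session above

# Default (strict) error handling: the encode call raises.
try:
    text.encode('utf-8')
    raised = False
except UnicodeEncodeError:
    raised = True

# Opting in explicitly: 'surrogatepass' encodes the surrogate anyway,
# and the same handler is needed to decode it again.
passed = text.encode('utf-8', errors='surrogatepass')
roundtrip = passed.decode('utf-8', errors='surrogatepass')
```

Because the handler has to be requested, suppressing the error stays a deliberate choice by the caller rather than a default.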
-- Steve
Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions: https://bugs.python.org/issue3982#msg224023 I guess you'd need to dig up those earlier discussions and see what the issues were?
-n
Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Nathaniel J. Smith -- http://vorpus.org
On Mon, Sep 28, 2015 at 12:41 PM, Nathaniel Smith njs@pobox.com wrote:
Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions: https://bugs.python.org/issue3982#msg224023 I guess you'd need to dig up those earlier discussions and see what the issues were?
The biggest issues are summarized into PEP 461:
https://www.python.org/dev/peps/pep-0461/#proposed-variations
Since the __format__ machinery is all based around text strings, there'll need to be some (explicit or implicit) encode step. Hence this thread.
How bad would it be to simply say "there are no bf strings"? As Steven says, you can simply use a normal f''.encode() operation, with no confusion. Otherwise, there'll be these "format-like" operations that can do things that format() can't do... and then there'd be edge cases, too, like a string with a b-prefix that contains non-ASCII characters in it:
>>> восток = 1961
>>> apollo = 1969
>>> print(f"It took {apollo-восток} years to get from orbit to the moon.")
It took 8 years to get from orbit to the moon.
>>> print(b"It took {apollo-восток} years to get from orbit to the moon.")
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string?
In any case, bf strings can be added later, but once they're added, their semantics would be locked in. I'd be inclined to leave them out for 3.6 and see what people say. A bit of real-world usage of f-strings might show a clear front-runner in terms of expectations (UTF-8, ASCII, or something else).
ChrisA
On Mon, Sep 28, 2015 at 01:03:32PM +1000, Chris Angelico wrote: [...]
>>> восток = 1961
>>> apollo = 1969
>>> print(f"It took {apollo-восток} years to get from orbit to the moon.")
It took 8 years to get from orbit to the moon.
>>> print(b"It took {apollo-восток} years to get from orbit to the moon.")
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string?
It would confuse the heck out of me. I leave it to the reader to decide whether I am a human or a tool.
-- Steve
On 28.09.2015 05:03, Chris Angelico wrote:
>>> восток = 1961
>>> apollo = 1969
>>> print(f"It took {apollo-восток} years to get from orbit to the moon.")
It took 8 years to get from orbit to the moon.
>>> print(b"It took {apollo-восток} years to get from orbit to the moon.")
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string?
I don't think so. "{...}" indicates the injection of whatever "..." stands for, thus is not part of the resulting string. So, no issue here for me.
(The only thing that would confuse me, is that "восток" is an allowed identifier in the first place. But that seems to be a different matter.)
In any case, bf strings can be added later, but once they're added, their semantics would be locked in. I'd be inclined to leave them out for 3.6 and see what people say. A bit of real-world usage of f-strings might show a clear front-runner in terms of expectations (UTF-8, ASCII, or something else).
I tend to agree here.
Best, Sven
On Sep 27, 2015, at 18:23, Eric V. Smith eric@trueblade.com wrote:
The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. Of course user-defined types can return any unicode chars from __format__.
The fact that it can't handle bytes and bytes-like types makes this much less useful than %.
Beyond that, the fact that it only works reliably for the same types as %, minus bytes, plus a few others including datetime means the benefit isn't nearly as large as for f-strings and str.format, which work reliably for every type in the world, and extensibly so for many types. And meanwhile, the cost is much higher, from code that seems to work if you don't test it well to even higher performance costs (and usually in code that needs performance more).
Of course you could create a __bformat__(args, encoding, errors, **kw) protocol (where object.__bformat__ just returns self.__format__(args, **kw).encode(encoding, errors)), which has the same effect as your proposal except that types that need to know they're being bytes-formatted to do something reasonable, or that just want to know so they can optimize, can do so. And this of course lets you add __bformat__ to bytes, etc.--although it doesn't seem to help for types that support the buffer protocol, so it's still not as good as %b. But I don't think anyone will want that.
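A minimal sketch of what such a protocol might look like, written as an ordinary helper function since no `__bformat__` hook exists in Python. The names `bformat` and `RawField`, and the exact argument order, are assumptions for illustration only.

```python
def bformat(obj, spec='', encoding='ascii', errors='strict'):
    """Hypothetical dispatcher: use __bformat__ if the type defines it."""
    hook = getattr(type(obj), '__bformat__', None)
    if hook is not None:
        return hook(obj, spec, encoding, errors)
    # Default described in the message: format as text, then encode.
    return format(obj, spec).encode(encoding, errors)


class RawField:
    """Hypothetical type that already holds bytes and skips text formatting."""

    def __init__(self, payload):
        self.payload = payload

    def __bformat__(self, spec, encoding, errors):
        # The type knows it is being bytes-formatted, so it returns bytes directly.
        return self.payload


number_field = bformat(42, '03d')
raw_field = bformat(RawField(b'\xff\x00'))
```

As the message notes, a hook like this still would not cover arbitrary objects that only expose the buffer protocol.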
On 28.09.2015 03:23, Eric V. Smith wrote:
The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how?
Cf. https://www.python.org/dev/peps/pep-0461/#interpolation
It says:
b"%x" % val
is equivalent to:
("%x" % val).encode("ascii")
So, ASCII would make a lot of sense to me as well.
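The equivalence quoted from PEP 461 can be checked directly on Python 3.5 or later; the value 255 is just an assumed sample.

```python
val = 255  # assumed sample value

as_bytes = b"%x" % val
as_text_then_encoded = ("%x" % val).encode("ascii")
```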
Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.
Could you be more specific here?
Best, Sven
On 10/02/2015 10:20 AM, Sven R. Kunze wrote:
On 28.09.2015 03:23, Eric V. Smith wrote:
Here I suggest 'ascii'. Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile.
Could you be more specific here?
bf'{foo}'
Might succeed or fail, depending on what foo returns for __format__. If foo is 'bar', it succeeds. If it's '\u1234', it fails.
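Since bf'' does not exist, the behaviour described here can only be emulated. The helper `emulate_bf` below is a hypothetical stand-in for bf'{foo}' under the ASCII proposal, nothing more.

```python
def emulate_bf(foo):
    # Hypothetical stand-in for bf'{foo}': format as text, then encode as ASCII.
    return format(foo, '').encode('ascii')


ok = emulate_bf('bar')

# The same call fails for a value whose formatted text is not ASCII.
try:
    emulate_bf('\u1234')
    failed = False
except UnicodeEncodeError:
    failed = True
```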
But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.
Eric.
On 10/02/2015 10:48 AM, João Bernardo wrote:
But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.
What about rf''? (sorry for being off topic here)
Regex could benefit from it:
my_regex = rf"^\w+\s*({'|'.join(expected_words)})$"
That's already implemented. Its use in regular expressions is mentioned in the PEP.
On Sat, Oct 3, 2015 at 12:48 AM, João Bernardo jbvsmo@gmail.com wrote:
But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.
What about rf''? (sorry for being off topic here)
Regex could benefit from it:
my_regex = rf"^\w+\s*({'|'.join(expected_words)})$"
Works fine:
rosuav@sikorsky:~$ python3
Python 3.6.0a0 (default:48943533965e, Sep 28 2015, 11:27:38)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> expected_words = ["foo", "bar"]
>>> my_regex = rf"^\w+\s*({'|'.join(expected_words)})$"
>>> print(my_regex)
^\w+\s*(foo|bar)$
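For completeness, a small sketch that compiles the resulting pattern and tries it on a few inputs. The sample strings are assumptions, not taken from the thread.

```python
import re

expected_words = ["foo", "bar"]
# Raw f-string: backslashes stay literal, the braces are still interpolated.
my_regex = rf"^\w+\s*({'|'.join(expected_words)})$"

pattern = re.compile(my_regex)
hit = pattern.match("hello  foo")   # assumed sample input
miss = pattern.match("hello baz")   # assumed sample input
```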
ChrisA
On 02.10.2015 16:26, Eric V. Smith wrote:
bf'{foo}'
Might succeed or fail, depending on what foo returns for __format__. If foo is 'bar', it succeeds. If it's '\u1234', it fails.
I know a lot of functions that fail when passing the wrong kind of arguments. What's so wrong with it?
But some of the other arguments are making me think bf'' is a bad idea, so now I'm leaning towards not implementing it.
I see. Can you think of an alternative solution?
I was digging deeper into the matter of binary/byte string formatting in order to understand why {} is not usable for binary protocols. Let's look at this practical example: https://blog.tox.chat/2015/09/fuzzing-the-new-groupchats/ I hope the tox protocol fully qualifies as a wire protocol. What the author is trying to do is fuzz his new groupchat implementation by creating more-or-less random packets and feeding them into tox core to see if it breaks.
He conveniently uses this type of syntax to describe the structure of the first header:
Header 1: [ Packet ID (1 b) | Chat ID hash (4 b) | Sender PK (32 b) | nonce (24 b) ]
Interested in writing a fuzzer, I would find the following really helpful as it mirrors the description within his blog post:
header1 = bf'{packet_id}{chat_id}{sender_pk}{nonce}'
# which should be the same as
header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce)
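The %b form already works on Python 3.5 and later. A quick sketch with placeholder field values (the byte contents are assumptions; only the field sizes come from the header description above):

```python
# Placeholder values with the sizes given in the header layout.
packet_id = b'\x01'        # 1 byte
chat_id = b'\x00' * 4      # 4 bytes
sender_pk = b'\xaa' * 32   # 32 bytes
nonce = b'\x00' * 24       # 24 bytes

header1 = b'%b%b%b%b' % (packet_id, chat_id, sender_pk, nonce)
print(len(header1))
```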
I wouldn't mind specifying the encoding for all non-byte-string arguments. Why? Because I would be working with bytes anyway, so no formatting (as in format()) would be necessary in the first place. However, I like the syntax for specifying the structure of (byte) strings.
Does this make sense?
Best, Sven
Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).
-- --Guido van Rossum (python.org/~guido)
On Fri, Oct 02, 2015 at 09:00:56AM -0700, Guido van Rossum wrote:
Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).
As Chris A pointed out earlier, identifiers are not ASCII only. What are we to make of something like this?
bf'{αριθμός + 1}'
And don't say "re-write your code to only use ASCII variable names" :-)
-- Steve
On Sat, Oct 3, 2015 at 1:27 PM, Steven D'Aprano steve@pearwood.info wrote:
On Fri, Oct 02, 2015 at 09:00:56AM -0700, Guido van Rossum wrote:
Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).
As Chris A pointed out earlier, identifiers are not ASCII only. What are we to make of something like this?
bf'{αριθμός + 1}'
And don't say "re-write your code to only use ASCII variable names" :-)
It should be technically legal, btw; it's just going to look very odd. The check for ASCII-only has to be done _after_ the fracturing into strings and expressions. But I don't like how that reads.
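One way to picture "check after fracturing" is to split a template into literal text and replacement fields, then test only the literal parts. The sketch below uses `string.Formatter().parse` as an approximation; it is not the f-string parser, and `literals_are_ascii` is a hypothetical helper.

```python
import string


def literals_are_ascii(template):
    """Return True if every literal segment of the template is pure ASCII.

    Replacement fields (the text inside the braces) are deliberately ignored.
    """
    for literal, field, spec, conversion in string.Formatter().parse(template):
        if any(ord(ch) > 127 for ch in literal):
            return False
    return True


field_only = literals_are_ascii('{αριθμός + 1}')   # non-ASCII only inside braces
in_literal = literals_are_ascii('αριθμός={n}')     # non-ASCII in the literal text
```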
ChrisA
On Fri, Oct 2, 2015 at 8:33 PM, Chris Angelico rosuav@gmail.com wrote:
On Sat, Oct 3, 2015 at 1:27 PM, Steven D'Aprano steve@pearwood.info wrote:
On Fri, Oct 02, 2015 at 09:00:56AM -0700, Guido van Rossum wrote:
Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).
As Chris A pointed out earlier, identifiers are not ASCII only. What are we to make of something like this?
bf'{αριθμός + 1}'
And don't say "re-write your code to only use ASCII variable names" :-)
It should be technically legal, btw; it's just going to look very odd. The check for ASCII-only has to be done _after_ the fracturing into strings and expressions. But I don't like how that reads.
I don't think this concern should be a showstopper. Honestly either approach sounds fine to me. :-)
-- --Guido van Rossum (python.org/~guido)
Steven D'Aprano writes:
What are we to make of something like this?
bf'{αριθμός + 1}'
Greek authorship. What would you make of
bf'{junban + 1}'
and why is that better than
bf'{順番 + 1}'
? (I would guess that is a sort of Japanese approximation to your Greek.)
I think it was Alex Martelli who argued very strongly at the time of PEP 263 that using native identifiers and even comments in your native language is a very risky practice (at least from management's POV <wink/>). I think that's still true, but it's clearly in consenting adults territory.
On 3 October 2015 at 02:00, Guido van Rossum guido@python.org wrote:
Bingo. IMO the exact same arguments that show why f'{x} {y}' is better than '%s %s' % (x, y) applies to byte strings. It would be totally acceptable if it only took bytes (and bytearray, and memoryview) and numbers (which we can guarantee are rendered in ASCII only).
Given that restriction, what if we dropped the format() call in the bytestring case, and instead always used printf-style formatting?
That is:
bf'{packet_id}{chat_id}{sender_pk}{nonce}'
could be roughly equivalent to (with parens to help make the pieces clearer):
(b'%b' % packet_id) + (b'%b' % chat_id) + (b'%b' % sender_pk) + (b'%b' % nonce)
If a ":fmt" section is provided for the substitution field, it would replace the mod-format sequence for that section:
bf'{number:0.2d}' ===> b'%0.2d' % number
With that approach, over time, printf-style formatting (aka mod-formatting) may come to just be known as bytes formatting (even though text strings also support it).
Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol, while doing the higher level composition with bf-strings:
bf'{header}{struct.pack('<10sHHb', record)}{footer}'
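The pieces of this idea that exist today can be tried with plain %-formatting. In the sketch below, the header, footer, and record values are assumptions; only the '<10sHHb' layout and the '%0.2d' format come from the message.

```python
import struct

header = b'HDR'   # assumed placeholder
footer = b'END'   # assumed placeholder

# Assumed record for '<10sHHb': a 10-byte name, two unsigned shorts, one signed byte.
record = (b'name', 1, 2, -3)
packed = struct.pack('<10sHHb', *record)

# Hand-written equivalent of the proposed per-field translation.
message = (b'%b' % header) + (b'%b' % packed) + (b'%b' % footer)

# A ":fmt" section would map straight onto a printf-style code.
number = 7
padded = b'%0.2d' % number
```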
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"
Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.
bf'{header}{record:<10sHHb}{footer}'
Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.
$0.02
Cheers, Steve
Top-posted from my Windows Phone
On 10/03/2015 12:20 PM, Steve Dower wrote:
"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"
Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.
That appeals to me, too. There are a number of practical problems that would need to be worked out. We can argue those later :)
I guess it comes down to: what would the commonest use case for fb-strings be?
Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.
This is one of my two big concerns. If we do something other than remove 'b' and encode, then we've got two similar looking things that have vastly different implementations. But maybe struct.pack or %-formatting are so compelling that it's worth breaking the equivalence.
My other concern is non-ascii chars inside the braces in an fb-string.
Eric.
On 03.10.2015 21:18, Eric V. Smith wrote:
On 10/03/2015 12:20 PM, Steve Dower wrote:
"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"
Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.
That appeals to me, too. There are a number of practical problems that would need to be worked out. We can argue those later :)
I think that's where I reach the limit of my "binary" experience in Python. But if people (here Steve) found it useful, why not? If there are problems that cannot be solved easily, we can do a V1 and later a V2 that includes the struct packing.
I guess it comes down to: what would the commonest use case for fb-strings be?
To me, it's the same as for all f-strings: the bloody easy string concatenation of easily distinguishable parts.
I even have to admit I thought they were called format strings because they give format/structure to the resulting string. Well, now I know better (format refers to the formatting of the input expressions) but the analogy is still in my mind.
My other concern is non-ascii chars inside the braces in an fb-string.
What's wrong with them?
Best, Sven
On Oct 3, 2015, at 09:20, Steve Dower steve.dower@python.org wrote:
"Something else that's neat with this: you could use the struct module for more complex subsections of a binary protocol"
Personally, if binary f-strings did struct packing by default, I'd want to use them all the time.
bf'{header}{record:<10sHHb}{footer}'
I love that at first glance. But if the point of bf-strings (like the point of bytes.__mod__ and the other str-like stuff added back to bytes since 3.0) is for things like ascii-based, partly-human-readable protocols and formats, it's obviously important to do things like hex and octal, space- and zero-padding, etc., and if the format specifier always means struct, there's no way to do that.
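For reference, these are the kinds of text-oriented conversions that bytes %-formatting already provides and that a struct-only format specifier would not cover. All values here are assumed samples.

```python
hex_field = b'%04x' % 255      # zero-padded hexadecimal
octal_field = b'%o' % 8        # octal
space_padded = b'%5d' % 42     # right-aligned in a field of width 5

# A typical ASCII-based protocol line built the same way.
line = b'Content-Length: %d\r\n' % 1024
```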
Practically, if they aren't equivalent to removing the b and encoding the resulting f-string, I expect we'll regularly hit confusion and misunderstanding.
But removing the b and encoding the resulting f-string is useless. For example:
header = b'Spam'
value = 42
lines.append(bf'{header}: {value}\r\n')
This gives you b"b'Spam': 42\r\n". Can you imagine ever wanting that?
The only way the feature makes sense is if it does something different.
Nick's suggestion of having it do %-formatting makes sense. Yes, it means that {count:03} is an error and you need '{count:03d}', which is inconsistent with f-strings. But that seems like a much less serious problem than bytes formatting not being able to handle bytes.
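The difference between the two readings can be seen with code that runs today. This sketch compares "drop the b and encode" against %-formatting for the example above; the variable names are only illustrative.

```python
header = b'Spam'
value = 42

# "Remove the b and encode": the bytes object is rendered through str(),
# so its repr ends up in the output.
encoded_text = f'{header}: {value}\r\n'.encode('ascii')

# %-formatting on bytes inserts the raw bytes instead.
percent_formatted = b'%b: %d\r\n' % (header, value)
```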
Maybe we could spell it {spam!p:<10sHHb}?
Top-posted from my Windows Phone
On 4 October 2015 at 08:25, Andrew Barnert abarnert@yahoo.com wrote:
Nick's suggestion of having it do %-formatting makes sense. Yes, it means that {count:03} is an error and you need '{count:03d}', which is inconsistent with f-strings. But that seems like a much less serious problem than bytes formatting not being able to handle bytes.
Exactly, if someone is mistakenly thinking bf"{header}{content}{footer}" is equivalent to f"{header}{content}{footer}".encode(), they're likely to get immediate noisy errors when they start trying to format fields.
The parallel I'd attempt to draw is that:
f"{header}{content}{footer}" is to "{}{}{}".format(header, content, footer)
as:
bf"{header:b}{content:b}{footer:b}" would be to b"%b%b%b" % (header, content, footer)
To make the behaviour clearer in the latter case, it may be reasonable to require an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting.
I'm not sold on the idea of a struct.pack conversion specifier - if we added binary format strings, I think it would be better to start with explicit "pack(value, format)" expressions, and see how that goes for a release.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
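A rough sketch of the parallel and of the explicit pack expression, using only what exists today. The pack() helper is hypothetical: the thread gives only its name and the argument order "pack(value, format)", so the body below is an assumption built on struct.pack.

```python
import struct

header, content, footer = b'<', b'payload', b'>'

# The two existing spellings that the parallel refers to.
as_text = '{}{}{}'.format('<', 'payload', '>')
as_bytes = b'%b%b%b' % (header, content, footer)


def pack(value, fmt):
    """Hypothetical helper: pack one value, or a tuple of values, with struct."""
    values = value if isinstance(value, tuple) else (value,)
    return struct.pack(fmt, *values)


# An explicit pack() expression used as an ordinary bytes field.
message = b'%b%b%b' % (header, pack((1, 2), '<HH'), footer)
```

Keeping the struct call explicit means the format string stays next to a name that can be looked up in the docs.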
On Oct 7, 2015, at 04:35, Nick Coghlan ncoghlan@gmail.com wrote:
On 4 October 2015 at 08:25, Andrew Barnert abarnert@yahoo.com wrote:
Nick's suggestion of having it do %-formatting makes sense. Yes, it means that {count:03} is an error and you need '{count:03d}', which is inconsistent with f-strings. But that seems like a much less serious problem than bytes formatting not being able to handle bytes.
Exactly, if someone is mistakenly thinking bf"{header}{content}{footer}" is equivalent to f"{header}{content}{footer}".encode(), they're likely to get immediate noisy errors when they start trying to format fields.
Except that multiple people in this thread are saying that's exactly what it should mean (which I think is a very bad idea).
The parallel I'd attempt to draw is that:
f"{header}{content}{footer}" is to "{}{}{}".format(header, content, footer)
as:
bf"{header:b}{content:b}{footer:b}" would be to b"%b%b%b" % (header, content, footer)
To make the behaviour clearer in the latter case, it may be reasonable to require an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting.
Are you suggesting that if a format specifier is given, it must include the format code (which seems perfectly reasonable to me--guessing that :3 means %3b is likely to be wrong more often than it's right…), or that a format specifier must always be given, with no default to :b (which seems more obtrusive and solves less of a problem)?
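As a small illustration of why guessing %3b for a bare width is risky, consider a width applied to an integer, shown here with today's bytes %-formatting rather than the proposed syntax:

```python
# A bare width is usually meant for a number, where %3d is the right code.
padded_int = b'%3d' % (7,)

# Guessing "b" instead fails for an int: %b needs a bytes-like object
# or an object that implements __bytes__.
try:
    b'%3b' % (7,)
except TypeError as exc:
    guess_error = str(exc)
else:
    guess_error = None
```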
I think bf'...' should be compared to b'...' % rather than to f'...'. IOW bf'...' is to f'...' as b'...'% is to '...'%.
-- --Guido van Rossum (python.org/~guido)
On 10/07/2015 12:25 PM, Guido van Rossum wrote:
I think bf'...' should be compared to b'...' % rather than to f'...'. IOW bf'...' is to f'...' as b'...'% is to '...'%.
I'm leaning this way, at least in the sense of "there's a fixed number of known types supported, and there's no extensible protocol involved."
Eric.
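The "no extensible protocol" point can be seen in the existing formatting machinery. In this sketch the Celsius class is a hypothetical user type: str formatting dispatches to its __format__, while bytes %-formatting only accepts the fixed set of types it knows about.

```python
class Celsius:
    """Hypothetical user type that customizes text formatting."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        return format(self.degrees, spec) + ' C'


reading = Celsius(21.5)

# str formatting is extensible: the f-string calls Celsius.__format__.
as_text = f'{reading:.1f}'

# bytes %-formatting has no such hook; %b rejects an arbitrary object.
try:
    b'%b' % (reading,)
except TypeError:
    bytes_supported = False
else:
    bytes_supported = True

# The caller has to hand over a supported type explicitly.
as_bytes = b'%.1f C' % (reading.degrees,)
```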
Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Of course that would still leave the door open for struct.pack support (maybe recognized by having the string start with <, =, > or @). Pro: everybody who currently uses struct.pack will love it. Con: the struct.pack mini-language is pretty inscrutable if you don't already know it. (And no, I don't propose to invent a different mini-language -- it's just easier to figure out where to find docs for this when the code explicitly imports the struct module.)
-- --Guido van Rossum (python.org/~guido)
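One way to picture the prefix-based recognition is a small dispatcher. This is only a sketch under assumptions: format_field is a hypothetical name, the prefix set is the one listed in the message above, and anything without such a prefix is treated as a %-formatting spec.

```python
import struct

# Prefix characters named in the message above (an assumption about the rule).
STRUCT_PREFIXES = ('<', '=', '>', '@')


def format_field(value, spec):
    """Hypothetical per-field formatter for a bf-string replacement field."""
    if spec.startswith(STRUCT_PREFIXES):
        # struct mode: a tuple supplies several values, anything else just one.
        values = value if isinstance(value, tuple) else (value,)
        return struct.pack(spec, *values)
    # Otherwise fall back to bytes %-formatting with the spec as written.
    return (b'%' + spec.encode('ascii')) % (value,)


packed = format_field((1, 2), '<HH')
number = format_field(42, '03d')
raw = format_field(b'Spam', 'b')
```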
On 10/07/2015 01:58 PM, Guido van Rossum wrote:
Of course that would still leave the door open for struct.pack support (maybe recognized by having the string start with <, =, > or @). Pro: everybody who currently uses struct.pack will love it. Con: the struct.pack mini-language is pretty inscrutable if you don't already know it. (And no, I don't propose to invent a different mini-language -- it's just easier to figure out where to find docs for this when the code explicitly imports the struct module.)
Right. I think Steve Dower's idea of :p switching to struct.pack mode is reasonable. But as Nick says, we don't need to add it on day 1.
Eric.
On Oct 7, 2015, at 2:01 PM, Eric V. Smith eric@trueblade.com wrote:
On 10/07/2015 01:58 PM, Guido van Rossum wrote:
Of course that would still leave the door open for struct.pack support (maybe recognized by having the string start with <, =, > or @). Pro: everybody who currently uses struct.pack will love it. Con: the struct.pack mini-language is pretty inscrutable if you don't already know it. (And no, I don't propose to invent a different mini-language -- it's just easier to figure out where to find docs for this when the code explicitly imports the struct module.)
Right. I think Steve Dower's idea of :p switching to struct.pack mode is reasonable. But as Nick says, we don't need to add it on day 1.
Make that "!p".
Eric.
On 7 October 2015 at 22:34, Andrew Barnert abarnert@yahoo.com wrote:
On Oct 7, 2015, at 04:35, Nick Coghlan ncoghlan@gmail.com wrote:
The parallel I'd attempt to draw is that:
f"{header}{content}{footer}" is to "{}{}{}".format(header, content, footer)
as:
bf"{header:b}{content:b}{footer:b}" would be to b"%b%b%b" % (header, content, footer)
To make the behaviour clearer in the latter case, it may be reasonable to require an explicit field format code, since that corresponds more closely to the mandatory field format codes in mod-formatting.
Are you suggesting that if a format specifier is given, it must include the format code (which seems perfectly reasonable to me--guessing that :3 means %3b is likely to be wrong more often than it's right…), or that a format specifier must always be given, with no default to :b (which seems more obtrusive and solves less of a problem)?
I was thinking the latter, but your idea of ":b" being implied only if there's no format specifier at all (and otherwise requiring an explicit "b" or other format code) might be better.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
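The rule being converged on here (":b" implied only when there is no specifier at all, an explicit code required otherwise) could be sketched as follows. The function names and the exact set of accepted codes are assumptions for illustration, not part of any proposal text.

```python
# A subset of the conversion codes that bytes %-formatting accepts (assumed set).
FORMAT_CODES = frozenset('bdioxXeEfFgGca')


def percent_template(spec):
    """Hypothetical translation of a bf-string field spec to a % template."""
    if not spec:
        return b'%b'  # no specifier at all: ":b" is implied
    if spec[-1] not in FORMAT_CODES:
        raise ValueError(f'format spec {spec!r} needs an explicit format code')
    return b'%' + spec.encode('ascii')


def render(value, spec=''):
    """Format one value according to the rule above."""
    return percent_template(spec) % (value,)


plain = render(b'Spam')     # implied %b
padded = render(7, '03d')   # explicit code supplied

try:
    render(7, '3')          # width without a code is rejected
except ValueError:
    rejected = True
else:
    rejected = False
```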