PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3

Okay, I included that last round of comments (from late February).

Barring typos, this should be the final version.

Final comments?

-----------------------------------------------------------------------------
PEP: 461
Title: Adding % formatting to bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25
Resolution:


Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's ``str``
type to ``bytes`` and ``bytearray`` [1]_ [2]_.


Rationale
=========

While interpolation is usually thought of as a string operation, there are
cases where interpolation on ``bytes`` or ``bytearrays`` makes sense, and the
work needed to make up for this missing functionality detracts from the overall
readability of the code.


Motivation
==========

With Python 3 and the split between ``str`` and ``bytes``, one small but
important area of programming became slightly more difficult, and much more
painful -- wire format protocols [3]_.

This area of programming is characterized by a mixture of binary data and
ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a
restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
writing new wire format code, and in porting Python 2 wire format code.

Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
formats, and ``FTP`` and ``HTTP`` communications, among many others.


Proposed semantics for ``bytes`` and ``bytearray`` formatting
=============================================================

%-interpolation
---------------

All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers. The only difference
will be that the results from these codes will be ASCII-encoded text, not
unicode. In other words, for any numeric formatting code ``%x``::

    b"%x" % val

is equivalent to::

    ("%x" % val).encode("ascii")

Examples::

    >>> b'%4x' % 10
    b'   a'

    >>> b'%#4x' % 10
    b' 0xa'

    >>> b'%04X' % 10
    b'000A'

``%c`` will insert a single byte, either from an ``int`` in range(256), or from
a ``bytes`` argument of length 1, not from a ``str``.

Examples::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

``%s`` is included for two reasons: 1) ``b`` is already a format code for
``format`` numerics (binary), and 2) it will make 2/3 code easier as Python 2.x
code uses ``%s``; however, it is restricted in what it will accept:

- input type supports ``Py_buffer`` [6]_?
  use it to collect the necessary bytes

- input type is something else?
  use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``

In particular, ``%s`` will not accept numbers (use a numeric format code for
that), nor ``str`` (encode it to ``bytes``).

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 'some string'.encode('utf8')
    b'some string'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: b'%s' does not accept numbers, use a numeric code instead

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`

``%a`` will call ``ascii()`` on the interpolated value. This is intended
as a debugging aid, rather than something that should be used in production.
Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
representation. Use cases include developing a new protocol and writing
landmarks into the stream; debugging data going into an existing protocol
to see if the problem is the protocol itself or bad data; a fall-back for a
serialization format; or even a rudimentary serialization format when
defining ``__bytes__`` would not be appropriate [8]_.

.. note::

    If a ``str`` is passed into ``%a``, it will be surrounded by quotes.


Unsupported codes
-----------------

``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.


Proposed variations
===================

It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been suggested to use ``%b`` for bytes as well as ``%s``. This was
rejected as not adding any value either in clarity or simplicity.

It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%s``.

- Rejected as this would lead to intermittent failures. Better to have the
  operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have ``%s`` return the ascii-encoded repr when the
value is a ``str`` (b'%s' % 'abc' --> b"'abc'").

- Rejected as this would lead to hard to debug failures far from the problem
  site. Better to have the operation always fail so the trouble-spot can be
  easily fixed.

Originally this PEP also proposed adding format-style formatting, but it was
decided that format and its related machinery were all strictly text (aka
``str``) based, and it was dropped.

Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be visited again later if real-world use shows deficiencies with this solution.


Objections
==========

The objections raised against this PEP were mainly variations on two themes:

- the ``bytes`` and ``bytearray`` types are for pure binary data, with no
  assumptions about encodings

- offering %-interpolation that assumes an ASCII encoding will be an
  attractive nuisance and lead us back to the problems of the Python 2
  ``str``/``unicode`` text model

As was seen during the discussion, ``bytes`` and ``bytearray`` are also used
for mixed binary data and ASCII-compatible segments: file formats such as
``dbf`` and ``pdf``, network protocols such as ``ftp`` and ``email``, etc.

``bytes`` and ``bytearray`` already have several methods which assume an ASCII
compatible encoding. ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
just a few. %-interpolation, with its very restricted mini-language, will not
be any more of a nuisance than the already existing methods.

Some have objected to allowing the full range of numeric formatting codes with
the claim that decimal alone would be sufficient. However, at least two
formats (dbf and pdf) make use of non-decimal numbers.


Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] neither string.Template, format, nor str.format are under consideration
.. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
.. [4] to use a str object in a bytes interpolation, encode it first
.. [5] %c is not an exception as neither of its possible arguments are str
.. [6] http://docs.python.org/3/c-api/buffer.html
       examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
.. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
.. [8] https://mail.python.org/pipermail/python-dev/2014-February/132750.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
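
The numeric-code rule and the ``%c`` rule from the PEP text above can be checked with a short script. This is only a sketch, assuming an interpreter where the proposal is implemented (CPython 3.5 or later); the names ``samples``, ``rendered``, ``byte_from_int`` and ``byte_from_bytes`` are illustrative and do not come from the PEP.

```python
# Sketch only: assumes Python 3.5+, where bytes %-formatting is available.
samples = [("%4x", 10), ("%#4x", 10), ("%04X", 10), ("%+.2f", 3.14159), ("%e", 1234.5)]

rendered = {}
for fmt, value in samples:
    via_bytes = fmt.encode("ascii") % value    # b"%x" % val
    via_str = (fmt % value).encode("ascii")    # ("%x" % val).encode("ascii")
    assert via_bytes == via_str                # the equivalence the PEP states
    rendered[fmt] = via_bytes

# %c takes an int in range(256) or a length-1 bytes object, never a str.
byte_from_int = b"%c" % 48
byte_from_bytes = b"%c" % b"a"
```

The loop compares the two spellings for each sample, so any divergence between ``str`` and ``bytes`` formatting of a numeric code would surface as an assertion failure.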

2014-03-25 23:37 GMT+01:00 Ethan Furman <ethan@stoneleaf.us>:
``%a`` will call ``ascii()`` on the interpolated value.
I'm not sure that I understood correctly: is the "%a" format supported? The result of ascii() is a Unicode string. Does it mean that ("%a" % obj) should give the same result as ascii(obj).encode('ascii', 'strict')?

Would it be possible to add a table or list to summarize supported format characters? I found:

- single byte: %c
- integer: %d, %u, %i, %o, %x, %X, %f, %g, "etc." (can you please complete "etc." ?)
- bytes and __bytes__ method: %s
- ascii(): %a

I guess that the implementation of %a can avoid a conversion from ASCII ("PyUnicode_DecodeASCII" in the following code) and then a conversion to ASCII again (in bytes%args):

    PyObject *
    PyObject_ASCII(PyObject *v)
    {
        PyObject *repr, *ascii, *res;

        repr = PyObject_Repr(v);
        if (repr == NULL)
            return NULL;

        if (PyUnicode_IS_ASCII(repr))
            return repr;

        /* repr is guaranteed to be a PyUnicode object by PyObject_Repr */
        ascii = _PyUnicode_AsASCIIString(repr, "backslashreplace");
        Py_DECREF(repr);
        if (ascii == NULL)
            return NULL;

        res = PyUnicode_DecodeASCII(    /* <==== HERE */
            PyBytes_AS_STRING(ascii),
            PyBytes_GET_SIZE(ascii),
            NULL);

        Py_DECREF(ascii);
        return res;
    }
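
The question of whether ``%a`` matches ``ascii(obj).encode('ascii', 'strict')`` can be checked from Python. The sketch below assumes CPython 3.5 or later, where ``%a`` is available on ``bytes``; the helper ``percent_a_candidates`` and the sample values are made up for illustration.

```python
def percent_a_candidates(obj):
    """Return three spellings that are expected to produce the same bytes."""
    via_ascii = ascii(obj).encode("ascii", "strict")
    via_repr = repr(obj).encode("ascii", "backslashreplace")
    via_percent_a = b"%a" % (obj,)   # tuple wrapper keeps a single argument unambiguous
    return via_ascii, via_repr, via_percent_a

# Illustrative values: numbers, str, bytes, non-ASCII text, None.
samples = [3.14, "def", b"abc", "caf\xe9", "\u20ac", "\U0001f600", None]
all_agree = all(len(set(percent_a_candidates(obj))) == 1 for obj in samples)
```

For these samples the three spellings agree, which is consistent with the observation that ``ascii()`` is ``repr()`` followed by a ``backslashreplace`` round trip.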
This is intended as a debugging aid, rather than something that should be used in production.
I don't understand the purpose of this sentence. Does it mean that %a must not be used? IMO this sentence can be removed.
Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn`` representation.
Unicode is larger than that! print(ascii(chr(0x10ffff))) => '\U0010ffff'
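
A quick way to see all three escape widths that ``ascii()`` can produce (a sketch; the dictionary name ``escapes`` and its keys are illustrative only):

```python
# ascii() picks the shortest escape that can represent each code point.
escapes = {
    "latin1": ascii("\xe9"),          # fits in one byte: \xNN
    "bmp": ascii("\u20ac"),           # Basic Multilingual Plane: \uNNNN
    "astral": ascii(chr(0x10FFFF)),   # beyond the BMP: \UNNNNNNNN
}
```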
I understand the debug use case. I'm not convinced by the serialization idea :-)
.. note::
If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
And:

- bytes gets a "b" prefix and surrounded by quotes as well (b'...')
- the quote ' is escaped as \' if the string contains quotes ' and "

Can you also please add examples for %a?
The following more complex examples are maybe not needed:
Proposed variations ===================
It would be fair to mention also a whole different PEP, Antoine's PEP 460! Victor

On 03/26/2014 03:10 AM, Victor Stinner wrote:
Changed to:
-------------------------------------------------------------------------------
``%a`` will give the equivalent of
``repr(some_obj).encode('ascii', 'backslashreplace')`` on the interpolated
value. Use cases include developing a new protocol and writing landmarks into
the stream; debugging data going into an existing protocol to see if the
problem is the protocol itself or bad data; a fall-back for a serialization
format; or any situation where defining ``__bytes__`` would not be appropriate
but a readable/informative representation is needed [8].
-------------------------------------------------------------------------------
Changed to:
-------------------------------------------------------------------------------
%-interpolation
---------------

All the numeric formatting codes (``d``, ``i``, ``o``, ``u``, ``x``, ``X``,
``e``, ``E``, ``f``, ``F``, ``g``, ``G``, and any that are subsequently added
to Python 3) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers (currently ``#``, ``0``,
``-``, `` `` (space), and ``+`` (plus any added to Python 3)). The only
non-numeric codes allowed are ``c``, ``s``, and ``a``.

For the numeric codes, the only difference between ``str`` and ``bytes`` (or
``bytearray``) interpolation is that the results from these codes will be
ASCII-encoded text, not unicode. In other words, for any numeric formatting
code ``%x``::
-------------------------------------------------------------------------------
I don't understand the purpose of this sentence. Does it mean that %a must not be used? IMO this sentence can be removed.
The sentence about %a being for debugging has been removed.
Removed. With the explicit reference to the 'backslashreplace' error handler any who want to know what it might look like can refer to that.
Shouldn't be an issue now with the new definition which no longer references the ascii() function.
Can you also please add examples for %a?
Examples::

    >>> b'%a' % 3.14
    b'3.14'

    >>> b'%a' % b'abc'
    b"b'abc'"

    >>> b'%a' % 'def'
    b"'def'"
-------------------------------------------------------------------------------
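
The quoting behaviour of ``%a`` follows ``repr()``, which is easiest to see on a few values. This is a sketch assuming CPython 3.5 or later; the variable names are illustrative.

```python
# %a is repr() pushed through the 'backslashreplace' handler, so the
# quoting and prefix rules of repr() carry over unchanged.
quoted_str = b"%a" % ("def",)            # str: surrounded by quotes
quoted_bytes = b"%a" % (b"abc",)         # bytes: b prefix plus quotes
both_quotes = b"%a" % ("it's \"x\"",)    # both quote kinds: ' gets a backslash
non_ascii = b"%a" % ("\u20ac",)          # non-ASCII: escaped
```

Because a ``bytes`` argument keeps its ``b`` prefix, ``%a`` is not a substitute for ``%s`` when the raw bytes themselves are wanted.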
My apologies for the omission.
-------------------------------------------------------------------------------
A competing PEP, ``PEP 460 Add binary interpolation and formatting`` [9],
also exists.

.. [9] http://python.org/dev/peps/pep-0460/
-------------------------------------------------------------------------------

Thank you, Victor.

The PEP 461 looks good to me. It's a nice addition to Python 3.5 and the PEP is well defined. I can help to implement it. Maybe, it would be nice to provide an implementation as a third-party module on PyPI for Python 2.6-3.4. Note: I fixed a typo in your PEP (reST syntax). Victor 2014-03-26 23:47 GMT+01:00 Ethan Furman <ethan@stoneleaf.us>:

On 27 March 2014 20:47, Victor Stinner <victor.stinner@gmail.com> wrote:
The PEP 461 looks good to me. It's a nice addition to Python 3.5 and the PEP is well defined.
+1 from me as well. One minor request is that I don't think the rationale for rejecting numbers from "%s" is complete - IIRC, the problem there is that the normal path for handling those is the coercion via str() and this proposal deliberately *doesn't* allow that path. That means supporting numbers would mean writing a lot of *additional* code, and that isn't needed since 2/3 compatible code can just be adjusted to use an appropriate numeric code.
Note: I fixed a typo in your PEP (reST syntax).
I also committed a couple of markup tweaks, since it seemed easier to just fix them than explain what was broken. However, there are also two dead footnotes (4 & 5), which I have left alone - I'm not sure if the problem is a missing reference, or if the footnote can go away now. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 03/27/2014 04:26 AM, Nick Coghlan wrote:
Changed to
---------------------------------------------------------------------------------
In particular, ``%s`` will not accept numbers nor ``str``. ``str`` is rejected
as the string to bytes conversion requires an encoding, and we are refusing to
guess; numbers are rejected because:

- what makes a number is fuzzy (float? Decimal? Fraction? some user type?)

- allowing numbers would lead to ambiguity between numbers and textual
  representations of numbers (3.14 vs '3.14')

- given the nature of wire formats, explicit is definitely better than implicit
---------------------------------------------------------------------------------
Thanks to both of you for that.
Fixed. -- ~Ethan~

On Thu, Mar 27, 2014 at 3:47 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
The PEP 461 looks good to me. It's a nice addition to Python 3.5 and the PEP is well defined.
+1
That is possible and would enable bytes formatting on earlier 3.x versions. I'm not sure if there is any value in backporting to 2.x as those already have such formatting with Python 2's str.__mod__ % operator. Though I don't know what it'd look like as an API as a module. Brainstorming: It'd either involve function calls to format instead of % or a container class to wrap format strings in with a __mod__ method that calls the bytes formatting code instead of native str % formatting when needed.
-gps

On 30 March 2014 05:01, Gregory P. Smith <greg@krypto.org> wrote:
The "future" project already contains a full backport of a true bytes type, rather than relying on Python 2 str objects: http://python-future.org/what_else.html#bytes It seems to me that the easiest way to make any forthcoming Python 3.5 enhancements (both binary interpolation and the other cleanups we are discussing over on Python ideas) available to single source 2/3 code bases is to commit to an API freeze for *those particular builtins* early, and then update "future" accordingly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Mar 25, 2014 at 11:37 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
I don't understand this restriction, and there isn't a rationale for it in the PEP (other than "you can already use numeric formats", which doesn't explain why it's undesirable to have it anyway.) It is extremely common in existing 2.x code to use %s for anything, just like people use {} for anything with str.format. Not supporting this feels like it would be problematic for porting code. Did this come up in the earlier discussions? -- Thomas Wouters <thomas@python.org> Hi! I'm an email virus! Think twice before sending your email to help me spread!

On 03/26/2014 08:14 AM, Thomas Wouters wrote:
And that's the problem -- in 2.x %s works always, but in 3.x, for bytes and bytearray, %s will fail in numerous situations. It seems to me the main reason for using %s instead of %d is that 'some_var' may have a number, or it may have the textual representation of that number; in 3.x the first would succeed, the second would fail. That's the kind of intermittent failure we do not want. The PEP is not designed to make it so 2.x code can be ported as-is, but rather that 2.x code can be cleaned up (if necessary) and then run the same in both 2.x and 3.x (at least as far as byte and bytearray %-formatting is concerned).
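
To make the number-versus-text distinction concrete, here is a small sketch of how each code treats each kind of value under the PEP's rules. It assumes CPython 3.5 or later; ``try_format`` and ``outcomes`` are hypothetical names used only for this illustration.

```python
def try_format(template, value):
    """Return the formatted bytes, or the name of the exception raised."""
    try:
        return template % (value,)
    except TypeError as exc:
        return type(exc).__name__

# Each code accepts exactly one kind of value, so a mismatch always fails
# instead of failing only for some inputs.
outcomes = {
    ("%d", "number"): try_format(b"%d", 42),
    ("%d", "text"): try_format(b"%d", b"42"),
    ("%s", "number"): try_format(b"%s", 42),
    ("%s", "text"): try_format(b"%s", b"42"),
}
```

The two mismatched combinations raise ``TypeError`` every time, which is the "always fail" behaviour the PEP prefers over intermittent failure.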
Did this come up in the earlier discussions?
https://mail.python.org/pipermail/python-dev/2014-January/131576.html -- ~Ethan~

On 27 March 2014 21:24, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm the one that raised the "discourage misuse of __bytes__" concern, so I'd like %a to stay in at least for that reason. %a is a perfectly well defined format code (albeit one you'd only be likely to use while messing about with serialisation protocols, as the PEP describes - for example, if a %b code was ending up producing wrong data, you might switch to %a temporarily to get a better idea of where the bad data was coming from), while using __bytes__ to make %s behave the way %a is defined in the PEP would just be wrong in most cases. I consider %a the preemptive PEP 308 of binary interpolation format codes - in the absence of %a, I'm certain that users would end up abusing __bytes__ and %s to get the same effect, just as they used the known bug magnet that was the and/or hack for a long time in the absence of PEP 308. I also seem to recall Guido saying he liked it, which flipped the discussion from "do we have a good rationale for including it?" to "do we have a good rationale for the BDFL to ignore his instincts?". However, it would be up to Guido to confirm that recollection, and if "Guido likes it" is part of the reason for inclusion of the %a code, the PEP should mention that explicitly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Actually, I had ignored this discussion for so long that I was surprised by the outcome. My main use case isn't printing a number that may already be a string (I understand why that isn't reasonable when the output is expected to be bytes); it's printing a usually numeric value that may sometimes be None. It's a little surprising to have to use %a for this, but I guess I can live with it. On Thu, Mar 27, 2014 at 8:58 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
-- --Guido van Rossum (python.org/~guido)
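
The "usually numeric, sometimes None" case can be sketched as follows, assuming CPython 3.5 or later. The function ``retry_field`` and the ``retries=`` field are invented for the example and do not come from any real protocol.

```python
def retry_field(value):
    """Format a value that is usually an int but may be None."""
    return b"retries=%a" % (value,)

with_number = retry_field(3)
with_none = retry_field(None)

# A numeric code refuses None outright.
try:
    b"retries=%d" % (None,)
    d_rejects_none = False
except TypeError:
    d_rejects_none = True
```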

I also don't understand why we can't use %b instead of %s. AFAIK %b currently doesn't mean anything and I somehow don't expect we're likely to add it for other reasons (unless there's a proposal I'm missing?). Just like we use %a instead of %r to remind people that it's not quite the same (since it applies .encode('ascii', 'backslashreplace')), shouldn't we use anything *but* %s to remind people that that is also not the same (not at all, in fact)? The PEP's argument against %b ("rejected as not adding any value either in clarity or simplicity") is hardly a good reason. On Thu, Mar 27, 2014 at 10:20 AM, Guido van Rossum <guido@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

On 03/27/2014 10:29 AM, Guido van Rossum wrote:
The biggest reason to use %s is to support a common code base for 2/3 endeavors. The biggest reason to not include %b is that it means binary number in format(); given that each type can invent its own mini-language, this probably isn't a very strong argument against it. I have moderate feelings for keeping %s as a synonym for %b for backwards compatibility with Py2 code (when it's appropriate). -- ~Ethan~

On 03/27/2014 10:55 AM, Ethan Furman wrote:
Changed to:
----------------------------------------------------------------------------------
``%b`` will insert a series of bytes. These bytes are collected in one of two
ways:

- input type supports ``Py_buffer`` [4]_?
  use it to collect the necessary bytes

- input type is something else?
  use its ``__bytes__`` method [5]_ ; if there isn't one, raise a ``TypeError``

In particular, ``%b`` will not accept numbers nor ``str``. ``str`` is rejected
as the string to bytes conversion requires an encoding, and we are refusing to
guess; numbers are rejected because:

- what makes a number is fuzzy (float? Decimal? Fraction? some user type?)

- allowing numbers would lead to ambiguity between numbers and textual
  representations of numbers (3.14 vs '3.14')

- given the nature of wire formats, explicit is definitely better than implicit

``%s`` is included as a synonym for ``%b`` for the sole purpose of making 2/3
code bases easier to maintain. Python 3 only code should use ``%b``.

Examples::

    >>> b'%b' % b'abc'
    b'abc'

    >>> b'%b' % 'some string'.encode('utf8')
    b'some string'

    >>> b'%b' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: b'%b' does not accept 'float'

    >>> b'%b' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: b'%b' does not accept 'str'
----------------------------------------------------------------------------------

-- ~Ethan~
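
The two collection paths for ``%b`` (buffer protocol and ``__bytes__``) can be exercised with a short sketch. It assumes CPython 3.5 or later; the ``Frame`` class is a hypothetical type invented here, and the exact ``TypeError`` messages are deliberately not checked since the wording in the draft may differ from what an implementation prints.

```python
import array

class Frame:
    """Hypothetical type that knows its own wire representation."""
    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b"<" + self.payload + b">"

# Buffer-protocol inputs are copied in as-is.
from_bytes = b"%b" % (b"abc",)
from_memoryview = b"%b" % (memoryview(b"abc"),)
from_array = b"%b" % (array.array("B", [65, 66, 67]),)

# Anything else is asked for its __bytes__.
from_dunder = b"%b" % (Frame(b"abc"),)

# %s behaves as a synonym.
synonym = b"%s" % (b"abc",)

# Numbers and str are refused.
rejected = []
for bad in (3.14, "hello world!"):
    try:
        b"%b" % (bad,)
    except TypeError:
        rejected.append(type(bad).__name__)
```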

On Thu Mar 27 2014 at 2:42:40 PM, Guido van Rossum <guido@python.org> wrote:
Much better, but I'm still not happy with including %s at all. Otherwise it's accept-worthy. (How's that for pressure. :-)
But if we only add %b and leave out %s then how is this going to lead to Python 2/3 compatible code since %b is not in Python 2? Or am I misunderstanding you? -Brett

So what's the use case for Python 2/3 compatible code? IMO the main use case for the PEP is simply to be able to construct bytes from a combination of a template and some input that may include further bytes and numbers. E.g. in asyncio when you write an HTTP client or server you have to construct bytes to write to the socket, and I'd be happy if I could write b'HTTP/1.0 %d %b\r\n' % (status, message) rather than having to use str(status).encode('ascii') and concatenation or join(). On Thu, Mar 27, 2014 at 11:47 AM, Brett Cannon <bcannon@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
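
The HTTP status-line example can be written out both ways for comparison. This is a sketch assuming CPython 3.5 or later; the function names ``status_line`` and ``status_line_manual`` are illustrative.

```python
def status_line(status, message):
    """Build the status line with bytes interpolation."""
    return b"HTTP/1.0 %d %b\r\n" % (status, message)

def status_line_manual(status, message):
    """The same line built with str(...).encode('ascii') and join()."""
    return b" ".join([b"HTTP/1.0", str(status).encode("ascii"), message]) + b"\r\n"

line = status_line(200, b"OK")
```

Both functions return identical bytes; the interpolated form keeps the shape of the wire format visible in one literal.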

On 03/27/2014 11:53 AM, Guido van Rossum wrote:
My own dbf module [1] would make use of this feature, and I'm sure some of the pdf modules would as well (I recall somebody chiming in about their own pdf module). -- ~Ethan~ [1] https://pypi.python.org/pypi/dbf

On Thu, Mar 27, 2014 at 2:53 PM, Guido van Rossum <guido@python.org> wrote:
It seems to be notoriously difficult to understand or explain why Unicode can still be very hard in Python 3 or in code that is in the middle of being ported or has to run in both interpreters. As far as I can tell part of it is when a symbol has type(str or bytes) depending (declared as if we had a static type system with union types); some of it is because incorrect mixing can happen without an exception, only to be discovered later and far away in space and time from the error (worst of all in a serialized file), and part of it is all of the not easily checkable "types" a particular Unicode object has depending on whether it contains surrogates or codes > n. Sometimes you might simply disagree about whether an API should be returning bytes or Unicode in mildly ambiguous cases like base64 encoding. Sometimes Unicode is just intrinsically complicated. For me this PEP holds the promise of being able to do work in the bytes domain, with no accidental mixing ever, when I *really* want bytes. For 2+3 I would get exceptions sometimes in Python 2 and exceptions all the time in Python 3 for mistakes. I hope this is less error prone in strict domains than for example u"string processing".encode('latin1'). And I hope that there is very little type(str or int) in HTTP for example or other "legitimate" bytes domains but I don't know; I suspect that if you have a lot of problems with bytes' %s then it's a clue you should use (u"%s" % (argument)).encode() instead. sprintf()'s version of %s just takes a char* and puts it in without doing any type conversion of course. IANACL (I am not a C lawyer).

On Thu, 27 Mar 2014 18:47:59 +0000 Brett Cannon <bcannon@gmail.com> wrote:
I think we have reached a point where adding porting-related facilities in 3.5 may actually slow down the pace of porting, rather than accelerate it (because people will then wait for 3.5 to start porting stuff). Regards Antoine.

On Thu, Mar 27, 2014 at 3:05 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I understand that sentiment but that is an unjustified fear. It is not a good reason not to do it. Projects are already trying to port stuff today and running into roadblocks when it comes to ascii-compatible bytes formatting for real world data formats in code needing to be 2.x compatible. I'm pulling out my practicality beats purity card here. Mercurial is one of the large Python 2.4-2.7 code bases that needs this feature in order to support Python 3 in a sane manner. (+Augie Fackler to look at the latest http://legacy.python.org/dev/peps/pep-0461/ to confirm usefulness) -gps

On Sat, 29 Mar 2014 11:53:45 -0700 "Gregory P. Smith" <greg@krypto.org> wrote:
"Roadblocks" is an unjustified term here. Important code bases such as Tornado have already achieved this a long time ago. While lack of bytes formatting does make porting harder, it is not a "roadblock" as in you can't work around it.
http://www.selenic.com/pipermail/mercurial-devel/2014-March/057474.html Regards Antoine.

On 30 March 2014 07:01, Ethan Furman <ethan@stoneleaf.us> wrote:
I tend to call them "barriers to migration". Up to Python 3.4, my focus has been more on general "barriers to entry" for Python 3 that applied as much or more to new users as they did to existing ones - hence working on getting pip incorporated, providing a better path to mastery for the codec system, helping Larry with Argument Clinic, helping Eric with the simpler import customisation, trying to help improve the integration with the POSIX text model, assorted tweaks to make the type system more accessible etc. I think Python 3.4 is now in a pretty good place on that front, particularly with Larry stating up front that he considers the ongoing rollout of Argument Clinic usage to be in scope for Python 3.4.x maintenance releases. So for 3.5, I think it makes sense to focus on those "barriers to migration" and other activities that benefit existing Python 2 users more so than users that are completely new to Python and starting directly with Python 3. Binary interpolation is a big one (thanks Ethan!), as is the proposed policy change to allow network security features to evolve within Python 2.7 maintenance releases. Our community has done a lot of work to support us in our goal of modernising and migrating a large fraction of the ecosystem to a new version of the language, even though the full implications of the revised models for binary and text data turned out to be more profound than I think any of us realised back in 2006 when Guido first turned the previously hypothetical "Py3k" into a genuine active effort to create a new revision of the language, better suited to the global nature of the 21st century. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Mar 27, 2014 at 3:05 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think we have reached a point where adding porting-related facilities
AFAIK, the only porting-specific feature is %s as a synonym for %b. Not pretty, but tolerable. Otherwise, I have the impression that the PEP pretty much stands on its own.
Or, they should download the source and compile and continue or start porting as soon as the bytes % is added. Having earlier Windows and Mac preview binaries might help a tiny bit. If you are saying that Py3 development should not be driven by Py2 concerns, I agree. -- Terry Jan Reedy

On Mar 29, 2014, at 2:53 PM, Gregory P. Smith <greg@krypto.org> wrote:
That looks sufficient to me - the biggest thing is being able to do "abort: %s is broken" % some_filename_that_is_bytes and have that work sanely, as well as the numerics. This looks like exactly what we need, but I'd love to test it soon (I'm happy to build a 3.5 from tip for testing) so that if it's not Right[0] changes can be made before it's permanent. Feel encouraged to CC me on patches or something for testing (or mail me directly when it lands). Thanks! AF
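
The filename case looks like this in practice. The sketch assumes CPython 3.5 or later; ``abort_message`` and the sample filename are made up for illustration.

```python
def abort_message(filename):
    """Interpolate a bytes filename without decoding it."""
    return b"abort: %s is broken" % (filename,)

# A Latin-1 encoded name that is not valid UTF-8 passes through untouched.
latin1_name = "caf\xe9.txt".encode("latin-1")
message = abort_message(latin1_name)

# A str filename is refused rather than silently encoded.
try:
    abort_message("cafe.txt")
    str_rejected = False
except TypeError:
    str_rejected = True
```

No encoding is guessed at any point, so filenames in arbitrary byte encodings survive the round trip.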
-gps

On 4/12/2014 11:08 AM, Augie Fackler wrote:
Add yourself as nosy to http://bugs.python.org/issue20284 "patch to implement PEP 461 (%-interpolation for bytes)" Indeed, you could help test the latest version, and others as posted. -- Terry Jan Reedy

I feel not including %s is nuts. Should I write .replace('%b', '%s')? All I desperately need are APIs that provide enough unicode / str type safety that I get an exception when mixing them accidentally... in my own code, dynamic typing is usually a bug. As has been endlessly discussed, %s for bytes is a bit like exposing sprintf()... On Thu, Mar 27, 2014 at 2:41 PM, Guido van Rossum <guido@python.org> wrote:

On Thu, Mar 27, 2014 at 11:52 AM, Daniel Holth <dholth@gmail.com> wrote:
I feel not including %s is nuts. Should I write .replace('%b', '%s')?
I assume you meant .replace('%s', '%b') (unless you're converting Python 3 code to Python 2, which would mean you really are nuts :-). But that's not going to help for the majority of code using %s -- as I am trying to argue, %s doesn't mean "expect the argument to be a str" and neither is that how it's commonly used (although it's *possible* that that is how *you* use it exclusively -- that doesn't make you nuts, just more strict than most people).
I don't understand that last claim (I can't figure out whether in this context exposing sprintf() is considered good or bad). But apart from that, can you give some specific examples? PS. I am not trying to be difficult. I honestly don't understand the use case yet, and the PEP doesn't do much to support it. -- --Guido van Rossum (python.org/~guido)

On 3/27/2014 11:59 AM, Guido van Rossum wrote:
That _is_ how it is commonly used in Py2 when dealing with binary data in mixed ASCII/binary protocols, is what I've been hearing in this discussion, and what small use I've made of Py2 when some unported module forced me to use it (I started Python about the time Py3 was released)... the expected argument is a (Py2) str containing binary data (would be bytes in Py3). While there are many other reasons to use %s in other coding situations, this is the only way to do bytes interpolations using %. And there is no %b in Py2, so for Py2/3 compatibility, %s needs to do bytes interpolations in Py3. And if it does, there is no need for %b in Py3 %, because they would be identical and redundant.

On 03/27/2014 11:59 AM, Guido van Rossum wrote:
PS. I am not trying to be difficult. I honestly don't understand the use case yet, and the PEP doesn't do much to support it.
How's this?
----------------------------------------------------------------------------
Compatibility with Python 2
===========================

As noted above, ``%s`` is being included solely to help ease migration from,
and/or have a single code base with, Python 2. This is important as there are
modules both in the wild and behind closed doors that currently use the
Python 2 ``str`` type as a ``bytes`` container, and hence are using ``%s``
as a bytes interpolator.

However, ``%b`` should be used in new, Python 3 only code, so ``%s`` will
immediately be deprecated, but not removed until the next major Python
release.
----------------------------------------------------------------------------

-- ~Ethan~

On 03/27/2014 11:41 AM, Guido van Rossum wrote:
Much better, but I'm still not happy with including %s at all. Otherwise it's accept-worthy. (How's that for pressure. :-)
FWIW, I feel the same, but the need for compatible 2/3 code bases is real. Hey, how's this? We'll let %s in, but immediately deprecate it. ;) Of course, we won't remove it until Python IV. -- ~Ethan~

On Thu, Mar 27, 2014 at 10:55 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
But it's mostly useless for that purpose. In Python 2, in practice %s doesn't mean "string". It means "use the default formatting just as if I was using print." And in theory it also means that -- in fact "call __str__()" is the formal definition, and print is also defined as using __str__, and this is all intentional. (I also intended __str__ to be *mostly* the same as __repr__, with a specific exception for the str type itself. In practice some frameworks have adopted a different interpretation, making __repr__ produce something *more* "user friendly" than __str__ but including newlines, because some people believe the main use case for __repr__ is the interactive prompt. I believe this causes problems for some *other* uses of __repr__, such as for producing an "unambiguous" representation useful for e.g. logging -- but I don't want to be too bitter about it. :-)
The biggest reason to not include %b is that it means binary number in format(); given that each type can invent its own mini-language, this probably isn't a very strong argument against it.
Especially since I can't imagine the spelling in format() includes '%'.
I have moderate feelings for keeping %s as a synonym for %b for backwards compatibility with Py2 code (when it's appropriate).
I think its mere existence (with the restrictions currently in the PEP) would cause more confusion than that is worth. -- --Guido van Rossum (python.org/~guido)

On Thu, Mar 27, 2014 at 11:34 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
That is true. And we can't change Python 2. I still have this idea in my head that *most* cases where %s is used in Python 2 will break in Python 3 under the PEP's rules, but perhaps they are not the majority of situations where the context is manipulating bytes. And I suppose that *very* few internet protocols are designed to accept either an integer or the literal string None, so that use case (which I brought up) isn't very realistic -- in fact it may be better to raise an exception rather than sending a protocol violation. So, I think you have changed my mind. I still like the idea of promoting %b in pure Python 3 code to emphasize that it really behaves very differently from %s; but I now have peace with %s as an alias. (It might also benefit cases where somehow there's a symmetry in some Python 3 code between bytes and str.) -- --Guido van Rossum (python.org/~guido)

On 2014-03-27 15:58, Ethan Furman wrote:
Date: Mon, 13 Jan 2014 12:09:23 -0800 Subject: Re: [Python-Dev] PEP 460 reboot """If we have %b for strictly interpolating bytes, I'm fine with adding %a for calling ascii() on the argument and then interpolating the result after ASCII-encoding it."""

On Thu, 27 Mar 2014 12:24:49 +0100, Antoine Pitrou <solipsis@pitrou.net> wrote:
The use cases came from someone else (Jim Jewett?) so you should be asking him, not Ethan :) As for the "did you actually do those things in real life", I know I've done the "dump the repr into the data (protocol) stream to see what I've really got here" debug trick in the string context, so I have no doubt that I will want to do it in the bytes context as well. In fact, it is probably somewhat more likely in the bytes context, since I know I've been in situations with data exchange protocols where I couldn't get console output and setting up logging was much more painful than just dumping the debug data into the data stream. Or where doing so made it much clearer what was going on than separate logging would. I've done the 'landmark' thing as well, in the string context; that can be very useful when doing incremental test driven development. (Granted, you could do that with __bytes__; you might well be writing a __bytes__ method anyway as the next step, but it *is* more overhead/boilerplate than just starting with %a...and it gets people used to reaching for __bytes__ for the "wrong" purpose, which is Nick's concern). In theory I can see using %a for serialization in certain limited contexts (I've done that with string repr in private utility scripts), but in practice I doubt that would happen in a binary context, since those are much more likely to be actually going over a "wire" of some sort (ie: places you really don't want to use eval even when it would work). So yeah, I think %a has *practical* utility. --David

2014-03-25 23:37 GMT+01:00 Ethan Furman <ethan@stoneleaf.us>:
``%a`` will call ``ascii()`` on the interpolated value.
I'm not sure that I understood correctly: is the "%a" format supported? The result of ascii() is a Unicode string. Does it mean that ("%a" % obj) should give the same result as ascii(obj).encode('ascii', 'strict')?

Would it be possible to add a table or list to summarize supported format characters? I found:

- single byte: %c
- integer: %d, %u, %i, %o, %x, %X, %f, %g, "etc." (can you please complete "etc." ?)
- bytes and __bytes__ method: %s
- ascii(): %a

I guess that the implementation of %a can avoid a conversion from ASCII ("PyUnicode_DecodeASCII" in the following code) and then a conversion to ASCII again (in bytes%args):

    PyObject *
    PyObject_ASCII(PyObject *v)
    {
        PyObject *repr, *ascii, *res;

        repr = PyObject_Repr(v);
        if (repr == NULL)
            return NULL;

        if (PyUnicode_IS_ASCII(repr))
            return repr;

        /* repr is guaranteed to be a PyUnicode object by PyObject_Repr */
        ascii = _PyUnicode_AsASCIIString(repr, "backslashreplace");
        Py_DECREF(repr);
        if (ascii == NULL)
            return NULL;

        res = PyUnicode_DecodeASCII(          /* <==== HERE */
            PyBytes_AS_STRING(ascii),
            PyBytes_GET_SIZE(ascii),
            NULL);

        Py_DECREF(ascii);
        return res;
    }
This is intended as a debugging aid, rather than something that should be used in production.
I don't understand the purpose of this sentence. Does it mean that %a must not be used? IMO this sentence can be removed.
Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn`` representation.
Unicode is larger than that! print(ascii(chr(0x10ffff))) => '\U0010ffff'
I understand the debug use case. I'm not convinced by the serialization idea :-)
.. note::
If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
And:

- bytes gets a "b" prefix and is surrounded by quotes as well (b'...')
- the quote ' is escaped as \' if the string contains both quotes ' and "

Can you also please add examples for %a?
The following more complex examples are maybe not needed:
Proposed variations ===================
It would be fair to mention also a whole different PEP, Antoine's PEP 460! Victor

On 03/26/2014 03:10 AM, Victor Stinner wrote:
Changed to:

-------------------------------------------------------------------------------
``%a`` will give the equivalent of ``repr(some_obj).encode('ascii', 'backslashreplace')`` on the interpolated value. Use cases include developing a new protocol and writing landmarks into the stream; debugging data going into an existing protocol to see if the problem is the protocol itself or bad data; a fall-back for a serialization format; or any situation where defining ``__bytes__`` would not be appropriate but a readable/informative representation is needed [8].
-------------------------------------------------------------------------------
Changed to:

-------------------------------------------------------------------------------
%-interpolation
---------------

All the numeric formatting codes (``d``, ``i``, ``o``, ``u``, ``x``, ``X``, ``e``, ``E``, ``f``, ``F``, ``g``, ``G``, and any that are subsequently added to Python 3) will be supported, and will work as they do for str, including the padding, justification and other related modifiers (currently ``#``, ``0``, ``-``, `` `` (space), and ``+`` (plus any added to Python 3)). The only non-numeric codes allowed are ``c``, ``s``, and ``a``.

For the numeric codes, the only difference between ``str`` and ``bytes`` (or ``bytearray``) interpolation is that the results from these codes will be ASCII-encoded text, not unicode. In other words, for any numeric formatting code ``%x``::
-------------------------------------------------------------------------------
I don't understand the purpose of this sentence. Does it mean that %a must not be used? IMO this sentence can be removed.
The sentence about %a being for debugging has been removed.
Removed. With the explicit reference to the 'backslashreplace' error handler any who want to know what it might look like can refer to that.
Shouldn't be an issue now with the new definition which no longer references the ascii() function.
Can you also please add examples for %a?
Examples::

    >>> b'%a' % 3.14
    b'3.14'

    >>> b'%a' % b'abc'
    b"b'abc'"

    >>> b'%a' % 'def'
    b"'def'"

-------------------------------------------------------------------------------
My apologies for the omission.

-------------------------------------------------------------------------------
A competing PEP, ``PEP 460 Add binary interpolation and formatting`` [9], also exists.

.. [9] http://python.org/dev/peps/pep-0460/
-------------------------------------------------------------------------------

Thank you, Victor.
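The revised ``%a`` wording above can be checked with a minimal sketch, assuming Python 3.5 or later (where bytes %-formatting is available). The helper name ``percent_a`` is hypothetical and only spells out the expression quoted in the PEP text; it is not how the interpreter implements the conversion.

```python
def percent_a(obj):
    """Hypothetical helper that mirrors the PEP wording for %a."""
    return repr(obj).encode('ascii', 'backslashreplace')


# A float, an ASCII str, non-ASCII text in each escape range, and None.
samples = [3.14, 'def', 'caf\xe9', '\u20ac', chr(0x10ffff), None]

for obj in samples:
    # The built-in conversion and the spelled-out definition should agree.
    assert b'%a' % (obj,) == percent_a(obj)

# A str keeps its surrounding quotes, as the note in the PEP says.
assert b'%a' % ('def',) == b"'def'"
```

The sample list deliberately includes a code point above U+FFFF, since that was the case raised about the ``\xnn`` / ``\unnnn`` wording.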

The PEP 461 looks good to me. It's a nice addition to Python 3.5 and the PEP is well defined. I can help to implement it. Maybe, it would be nice to provide an implementation as a third-party module on PyPI for Python 2.6-3.4. Note: I fixed a typo in your PEP (reST syntax). Victor 2014-03-26 23:47 GMT+01:00 Ethan Furman <ethan@stoneleaf.us>:

On 27 March 2014 20:47, Victor Stinner <victor.stinner@gmail.com> wrote:
The PEP 461 looks good to me. It's a nice addition to Python 3.5 and the PEP is well defined.
+1 from me as well. One minor request: I think the rationale for rejecting numbers from "%s" is incomplete - IIRC, the problem there is that the normal path for handling those is the coercion via str() and this proposal deliberately *doesn't* allow that path. That means supporting numbers would mean writing a lot of *additional* code, and that isn't needed since 2/3 compatible code can just be adjusted to use an appropriate numeric code.
Note: I fixed a typo in your PEP (reST syntax).
I also committed a couple of markup tweaks, since it seemed easier to just fix them than explain what was broken. However, there are also two dead footnotes (4 & 5), which I have left alone - I'm not sure if the problem is a missing reference, or if the footnote can go away now. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 03/27/2014 04:26 AM, Nick Coghlan wrote:
Changed to

---------------------------------------------------------------------------------
In particular, ``%s`` will not accept numbers nor ``str``. ``str`` is rejected as the string to bytes conversion requires an encoding, and we are refusing to guess; numbers are rejected because:

- what makes a number is fuzzy (float? Decimal? Fraction? some user type?)
- allowing numbers would lead to ambiguity between numbers and textual representations of numbers (3.14 vs '3.14')
- given the nature of wire formats, explicit is definitely better than implicit
---------------------------------------------------------------------------------
Thanks to both of you for that.
Fixed. -- ~Ethan~

On Thu, Mar 27, 2014 at 3:47 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
The PEP 461 looks good to me. It's a nice addition to Python 3.5 and the PEP is well defined.
+1
That is possible and would enable bytes formatting on earlier 3.x versions. I'm not sure if there is any value in backporting to 2.x as those already have such formatting with Python 2's str.__mod__ % operator. Though I don't know what it'd look like as an API as a module. Brainstorming: It'd either involve function calls to format instead of % or a container class to wrap format strings in with a __mod__ method that calls the bytes formatting code instead of native str % formatting when needed.
-gps
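The "container class with a ``__mod__`` method" idea brainstormed above might look roughly like the sketch below. Everything here is an assumption made for illustration: the names ``BytesTemplate`` and ``_as_bytes`` are hypothetical, only positional arguments are handled, width and precision are honoured for numeric codes only, and ``%a`` and ``%c`` are left out. It is written for Python 3 and is not taken from any existing backport.

```python
import re

# One conversion spec: '%', optional flags, width, precision, then a code.
_SPEC = re.compile(rb'%[#0\- +]*\d*(?:\.\d+)?([a-zA-Z%])')
_NUMERIC = b'diouxXeEfFgG'


def _as_bytes(value):
    # Accept buffer-like objects and objects defining __bytes__; refuse
    # str and numbers instead of guessing an encoding or representation.
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    if hasattr(type(value), '__bytes__'):
        return bytes(value)
    raise TypeError('%%b does not accept %r' % type(value).__name__)


class BytesTemplate:
    """Hypothetical wrapper: BytesTemplate(b'...') % args -> bytes."""

    def __init__(self, template):
        self.template = bytes(template)

    def __mod__(self, args):
        if not isinstance(args, tuple):
            args = (args,)
        pending = list(args)

        def render(match):
            code = match.group(1)
            if code == b'%':
                return b'%'
            if not pending:
                raise TypeError('not enough arguments for format string')
            value = pending.pop(0)
            if code in (b'b', b's'):
                return _as_bytes(value)
            if code in _NUMERIC:
                # Reuse str formatting, then ASCII-encode the result.
                spec = match.group(0).decode('ascii')
                return (spec % value).encode('ascii')
            raise ValueError('unsupported format code %r' % code)

        result = _SPEC.sub(render, self.template)
        if pending:
            raise TypeError('not all arguments converted')
        return result


print(BytesTemplate(b'%b:%05d %x%%') % (b'ab', 42, 255))
```

Wrapping the template, rather than calling a function, keeps call sites looking like ordinary ``%`` formatting, which is the property the brainstorm was after.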

On 30 March 2014 05:01, Gregory P. Smith <greg@krypto.org> wrote:
The "future" project already contains a full backport of a true bytes type, rather than relying on Python 2 str objects: http://python-future.org/what_else.html#bytes It seems to me that the easiest way to make any forthcoming Python 3.5 enhancements (both binary interpolation and the other cleanups we are discussing over on Python ideas) available to single source 2/3 code bases is to commit to an API freeze for *those particular builtins* early, and then update "future" accordingly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Mar 25, 2014 at 11:37 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
I don't understand this restriction, and there isn't a rationale for it in the PEP (other than "you can already use numeric formats", which doesn't explain why it's undesirable to have it anyway.) It is extremely common in existing 2.x code to use %s for anything, just like people use {} for anything with str.format. Not supporting this feels like it would be problematic for porting code. Did this come up in the earlier discussions? -- Thomas Wouters <thomas@python.org> Hi! I'm an email virus! Think twice before sending your email to help me spread!

On 03/26/2014 08:14 AM, Thomas Wouters wrote:
And that's the problem -- in 2.x %s works always, but 3.x for bytes and bytearray %s will fail in numerous situations. It seems to me the main reason for using %s instead of %d is that 'some_var' may have a number, or it may have the textual representation of that number; in 3.x the first would succeed, the second would fail. That's the kind of intermittent failure we do not want. The PEP is not designed to make it so 2.x code can be ported as-is, but rather that 2.x code can be cleaned up (if necessary) and then run the same in both 2.x and 3.x (at least as far as bytes and bytearray %-formatting is concerned).
Did this come up in the earlier discussions?
https://mail.python.org/pipermail/python-dev/2014-January/131576.html -- ~Ethan~

On 27 March 2014 21:24, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm the one that raised the "discourage misuse of __bytes__" concern, so I'd like %a to stay in at least for that reason. %a is a perfectly well defined format code (albeit one you'd only be likely to use while messing about with serialisation protocols, as the PEP describes - for example, if a %b code was ending up producing wrong data, you might switch to %a temporarily to get a better idea of where the bad data was coming from), while using __bytes__ to make %s behave the way %a is defined in the PEP would just be wrong in most cases. I consider %a the preemptive PEP 308 of binary interpolation format codes - in the absence of %a, I'm certain that users would end up abusing __bytes__ and %s to get the same effect, just as they used the known bug magnet that was the and/or hack for a long time in the absence of PEP 308. I also seem to recall Guido saying he liked it, which flipped the discussion from "do we have a good rationale for including it?" to "do we have a good rationale for the BDFL to ignore his instincts?". However, it would be up to Guido to confirm that recollection, and if "Guido likes it" is part of the reason for inclusion of the %a code, the PEP should mention that explicitly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Actually, I had ignored this discussion for so long that I was surprised by the outcome. My main use case isn't printing a number that may already be a string (I understand why that isn't reasonable when the output is expected to be bytes); it's printing a usually numeric value that may sometimes be None. It's a little surprising to have to use %a for this, but I guess I can live with it. On Thu, Mar 27, 2014 at 8:58 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
-- --Guido van Rossum (python.org/~guido)
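The "usually numeric value that may sometimes be None" case can be sketched as follows, assuming Python 3.5 or later. The function name ``format_length`` and the ``Length:`` field are hypothetical, chosen only to give the value some context.

```python
def format_length(value):
    # %a falls back to the repr of whatever it is given, so both an int
    # and None are accepted.
    return b'Length: %a\r\n' % (value,)


assert format_length(42) == b'Length: 42\r\n'
assert format_length(None) == b'Length: None\r\n'

# A numeric code, by contrast, insists on a number.
try:
    b'Length: %d\r\n' % (None,)
except TypeError as exc:
    print('rejected:', exc)
```

Whether sending the literal text ``None`` is acceptable depends on the protocol; the sketch only shows what each code does with the value.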

I also don't understand why we can't use %b instead of %s. AFAIK %b currently doesn't mean anything and I somehow don't expect we're likely to add it for other reasons (unless there's a proposal I'm missing?). Just like we use %a instead of %r to remind people that it's not quite the same (since it applies .encode('ascii', 'backslashreplace')), shouldn't we use anything *but* %s to remind people that that is also not the same (not at all, in fact)? The PEP's argument against %b ("rejected as not adding any value either in clarity or simplicity") is hardly a good reason. On Thu, Mar 27, 2014 at 10:20 AM, Guido van Rossum <guido@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

On 03/27/2014 10:29 AM, Guido van Rossum wrote:
The biggest reason to use %s is to support a common code base for 2/3 endeavors. The biggest reason to not include %b is that it means binary number in format(); given that each type can invent its own mini-language, this probably isn't a very strong argument against it. I have moderate feelings for keeping %s as a synonym for %b for backwards compatibility with Py2 code (when it's appropriate). -- ~Ethan~

On 03/27/2014 10:55 AM, Ethan Furman wrote:
Changed to:

----------------------------------------------------------------------------------
``%b`` will insert a series of bytes. These bytes are collected in one of two ways:

- input type supports ``Py_buffer`` [4]_? use it to collect the necessary bytes
- input type is something else? use its ``__bytes__`` method [5]_ ; if there isn't one, raise a ``TypeError``

In particular, ``%b`` will not accept numbers nor ``str``. ``str`` is rejected as the string to bytes conversion requires an encoding, and we are refusing to guess; numbers are rejected because:

- what makes a number is fuzzy (float? Decimal? Fraction? some user type?)
- allowing numbers would lead to ambiguity between numbers and textual representations of numbers (3.14 vs '3.14')
- given the nature of wire formats, explicit is definitely better than implicit

``%s`` is included as a synonym for ``%b`` for the sole purpose of making 2/3 code bases easier to maintain. Python 3 only code should use ``%b``.

Examples::

    >>> b'%b' % b'abc'
    b'abc'

    >>> b'%b' % 'some string'.encode('utf8')
    b'some string'

    >>> b'%b' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: b'%b' does not accept 'float'

    >>> b'%b' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: b'%b' does not accept 'str'
----------------------------------------------------------------------------------

-- ~Ethan~
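The two collection paths described for ``%b`` (buffer support, then ``__bytes__``) can be exercised with a small sketch, assuming Python 3.5 or later. The ``Packet`` class is hypothetical and exists only to show a type opting in through ``__bytes__``.

```python
class Packet:
    """Hypothetical type that opts in to %b by defining __bytes__."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b'<' + self.payload + b'>'


# Objects exposing the buffer protocol are copied in as-is.
assert b'%b' % (b'abc',) == b'abc'
assert b'%b' % (bytearray(b'abc'),) == b'abc'
assert b'%b' % (memoryview(b'abc'),) == b'abc'

# Anything else must provide __bytes__.
assert b'%b' % (Packet(b'abc'),) == b'<abc>'

# %s behaves as a synonym for %b.
assert b'%s' % (b'abc',) == b'abc'

# Numbers and str are refused rather than guessed at.
for bad in (3.14, 42, 'hello world!'):
    try:
        b'%b' % (bad,)
    except TypeError as exc:
        print(type(bad).__name__, '->', exc)
```

The exact wording of the ``TypeError`` message in a released interpreter may differ from the draft text quoted in the message, which is why the sketch prints it instead of asserting on it.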

On Thu Mar 27 2014 at 2:42:40 PM, Guido van Rossum <guido@python.org> wrote:
Much better, but I'm still not happy with including %s at all. Otherwise it's accept-worthy. (How's that for pressure. :-)
But if we only add %b and leave out %s then how is this going to lead to Python 2/3 compatible code since %b is not in Python 2? Or am I misunderstanding you? -Brett

So what's the use case for Python 2/3 compatible code? IMO the main use case for the PEP is simply to be able to construct bytes from a combination of a template and some input that may include further bytes and numbers. E.g. in asyncio when you write an HTTP client or server you have to construct bytes to write to the socket, and I'd be happy if I could write b'HTTP/1.0 %d %b\r\n' % (status, message) rather than having to use str(status).encode('ascii') and concatenation or join(). On Thu, Mar 27, 2014 at 11:47 AM, Brett Cannon <bcannon@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
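The HTTP status-line case mentioned above can be written out both ways as a minimal sketch, assuming Python 3.5 or later. The function names ``status_line`` and ``status_line_manual`` are hypothetical.

```python
def status_line(status, message):
    # With bytes interpolation: %d formats the number as ASCII digits,
    # %b copies the bytes through unchanged.
    return b'HTTP/1.0 %d %b\r\n' % (status, message)


def status_line_manual(status, message):
    # Without it: encode the number by hand and join the pieces.
    parts = [b'HTTP/1.0', str(status).encode('ascii'), message]
    return b' '.join(parts) + b'\r\n'


assert status_line(404, b'Not Found') == status_line_manual(404, b'Not Found')
print(status_line(200, b'OK'))
```

Both functions produce the same bytes; the difference is only in how much encoding boilerplate the caller has to write.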

On 03/27/2014 11:53 AM, Guido van Rossum wrote:
My own dbf module [1] would make use of this feature, and I'm sure some of the pdf modules would as well (I recall somebody chiming in about their own pdf module). -- ~Ethan~ [1] https://pypi.python.org/pypi/dbf

On Thu, Mar 27, 2014 at 2:53 PM, Guido van Rossum <guido@python.org> wrote:
It seems to be notoriously difficult to understand or explain why Unicode can still be very hard in Python 3 or in code that is in the middle of being ported or has to run in both interpreters. As far as I can tell part of it is when a symbol has type(str or bytes) depending (declared as if we had a static type system with union types); some of it is because incorrect mixing can happen without an exception, only to be discovered later and far away in space and time from the error (worst of all in a serialized file), and part of it is all of the not easily checkable "types" a particular Unicode object has depending on whether it contains surrogates or codes > n. Sometimes you might simply disagree about whether an API should be returning bytes or Unicode in mildly ambiguous cases like base64 encoding. Sometimes Unicode is just intrinsically complicated. For me this PEP holds the promise of being able to do work in the bytes domain, with no accidental mixing ever, when I *really* want bytes. For 2+3 I would get exceptions sometimes in Python 2 and exceptions all the time in Python 3 for mistakes. I hope this is less error prone in strict domains than for example u"string processing".encode('latin1'). And I hope that there is very little type(str or int) in HTTP for example or other "legitimate" bytes domains but I don't know; I suspect that if you have a lot of problems with bytes' %s then it's a clue you should use (u"%s" % (argument)).encode() instead. sprintf()'s version of %s just takes a char* and puts it in without doing any type conversion of course. IANACL (I am not a C lawyer).
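The fallback suggested at the end of that message, formatting in the text domain and encoding once, can be sketched as below, assuming Python 3.5 or later. The helper name ``format_text_then_encode`` and the header used in the example are hypothetical.

```python
def format_text_then_encode(template, *args, encoding='ascii'):
    # Hypothetical helper: let str's permissive %s do the conversion,
    # then make the encoding step explicit.
    return (template % args).encode(encoding)


# str's %s accepts anything, so a number is fine here.
assert format_text_then_encode(u'Content-Length: %s\r\n', 42) == \
    b'Content-Length: 42\r\n'

# The bytes-domain %s is strict and refuses the same argument.
try:
    b'Content-Length: %s\r\n' % (42,)
except TypeError as exc:
    print('bytes %s rejected an int:', exc)
```

The strict behaviour is the point of working in the bytes domain: a mistake surfaces as an exception at the formatting call instead of as bad data further down the line.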

On Thu, 27 Mar 2014 18:47:59 +0000 Brett Cannon <bcannon@gmail.com> wrote:
I think we have reached a point where adding porting-related facilities in 3.5 may actually slow down the pace of porting, rather than accelerate it (because people will then wait for 3.5 to start porting stuff). Regards Antoine.

On Thu, Mar 27, 2014 at 3:05 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I understand that sentiment but that is an unjustified fear. It is not a good reason not to do it. Projects are already trying to port stuff today and running into roadblocks when it comes to ascii-compatible bytes formatting for real world data formats in code needing to be 2.x compatible. I'm pulling out my practicality beats purity card here. Mercurial is one of the large Python 2.4-2.7 code bases that needs this feature in order to support Python 3 in a sane manner. (+Augie Fackler to look at the latest http://legacy.python.org/dev/peps/pep-0461/ to confirm usefulness) -gps

On Sat, 29 Mar 2014 11:53:45 -0700 "Gregory P. Smith" <greg@krypto.org> wrote:
"Roadblocks" is an unjustified term here. Important code bases such as Tornado have already achieved this a long time ago. While lack of bytes formatting does make porting harder, it is not a "roadblock" as in you can't work it around.
http://www.selenic.com/pipermail/mercurial-devel/2014-March/057474.html Regards Antoine.

On 30 March 2014 07:01, Ethan Furman <ethan@stoneleaf.us> wrote:
I tend to call them "barriers to migration". Up to Python 3.4, my focus has been more on general "barriers to entry" for Python 3 that applied as much or more to new users as they did to existing ones - hence working on getting pip incorporated, providing a better path to mastery for the codec system, helping Larry with Argument Clinic, helping Eric with the simpler import customisation, trying to help improve the integration with the POSIX text model, assorted tweaks to make the type system more accessible etc. I think Python 3.4 is now in a pretty good place on that front, particularly with Larry stating up front that he considers the ongoing rollout of Argument Clinic usage to be in scope for Python 3.4.x maintenance releases. So for 3.5, I think it makes sense to focus on those "barriers to migration" and other activities that benefit existing Python 2 users more so than users that are completely new to Python and starting directly with Python 3. Binary interpolation is a big one (thanks Ethan!), as is the proposed policy change to allow network security features to evolve within Python 2.7 maintenance releases. Our community has done a lot of work to support us in our goal of modernising and migrating a large fraction of the ecosystem to a new version of the language, even though the full implications of the revised models for binary and text data turned out to be more profound than I think any of us realised back in 2006 when Guido first turned the previously hypothetical "Py3k" into a genuine active effort to create a new revision of the language, better suited to the global nature of the 21st century. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Mar 27, 2014 at 3:05 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think we have reached a point where adding porting-related facilities
AFAIK, the only porting-specific feature is %s as a synonym for %b. Not pretty, but tolerable. Otherwise, I have the impression that the PEP pretty much stands on its own.
Or, they should download the source and compile and continue or start porting as soon as the bytes % is added. Having earlier Windows and Mac preview binaries might help a tiny bit. If you are saying that Py3 development should not be driven by Py2 concerns, I agree. -- Terry Jan Reedy

On Mar 29, 2014, at 2:53 PM, Gregory P. Smith <greg@krypto.org> wrote:
That looks sufficient to me - the biggest thing is being able to do "abort: %s is broken" % some_filename_that_is_bytes and have that work sanely, as well as the numerics. This looks like exactly what we need, but I'd love to test it soon (I'm happy to build a 3.5 from tip for testing) so that if it's not Right[0] changes can be made before it's permanent. Feel encouraged to CC me on patches or something for testing (or mail me directly when it lands). Thanks! AF
-gps
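The error-message case described above can be tried with a minimal sketch, assuming Python 3.5 or later. The filename is a made-up value that is deliberately not valid UTF-8, and ``abort_message`` is a hypothetical helper name.

```python
def abort_message(filename):
    # %s copies the bytes through untouched; no decoding is attempted,
    # so filenames in an unknown encoding survive intact.
    return b'abort: %s is broken' % (filename,)


filename = b'caf\xe9.txt'  # Latin-1 encoded name, not valid UTF-8
assert abort_message(filename) == b'abort: caf\xe9.txt is broken'

# Numeric codes work alongside it.
assert b'%d files, %s' % (3, filename) == b'3 files, caf\xe9.txt'
```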

On 4/12/2014 11:08 AM, Augie Fackler wrote:
Add yourself as nosy to http://bugs.python.org/issue20284 "patch to implement PEP 461 (%-interpolation for bytes)" Indeed, you could help test the latest version, and others as posted. -- Terry Jan Reedy

I feel not including %s is nuts. Should I write .replace('%b', '%s')? All I desperately need are APIs that provide enough unicode / str type safety that I get an exception when mixing them accidentally... in my own code, dynamic typing is usually a bug. As has been endlessly discussed, %s for bytes is a bit like exposing sprintf()... On Thu, Mar 27, 2014 at 2:41 PM, Guido van Rossum <guido@python.org> wrote:

On Thu, Mar 27, 2014 at 11:52 AM, Daniel Holth <dholth@gmail.com> wrote:
I feel not including %s is nuts. Should I write .replace('%b', '%s')?
I assume you meant .replace('%s', '%b') (unless you're converting Python 3 code to Python 2, which would mean you really are nuts :-). But that's not going to help for the majority of code using %s -- as I am trying to argue, %s doesn't mean "expect the argument to be a str" and neither is that how it's commonly used (although it's *possible* that that is how *you* use it exclusively -- that doesn't make you nuts, just more strict than most people).
I don't understand that last claim (I can't figure out whether in this context exposing sprintf() is considered good or bad). But apart from that, can you give some specific examples? PS. I am not trying to be difficult. I honestly don't understand the use case yet, and the PEP doesn't do much to support it. -- --Guido van Rossum (python.org/~guido)

participants (15)

- Antoine Pitrou
- Augie Fackler
- Brett Cannon
- Daniel Holth
- Ethan Furman
- Glenn Linderman
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- MRAB
- Nick Coghlan
- R. David Murray
- Terry Reedy
- Thomas Wouters
- Victor Stinner