[Python-Dev] PEP 461 - Adding % and {} formatting to bytes

Brett Cannon brett at python.org
Wed Jan 15 15:45:10 CET 2014


bytes.format() below. I'll leave it to you to decide if they warrant using,
leaving as an open question, or rejecting.


On Tue, Jan 14, 2014 at 2:56 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> Duh.  Here's the text, as well.  ;)
>
>
> PEP: 461
> Title: Adding % and {} formatting to bytes
> Version: $Revision$
> Last-Modified: $Date$
> Author: Ethan Furman <ethan at stoneleaf.us>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-01-13
> Python-Version: 3.5
> Post-History: 2014-01-13
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes adding the % and {} formatting operations from str to
> bytes.
>
>
> Proposed semantics for bytes formatting
> =======================================
>
> %-interpolation
> ---------------
>
> All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
> will be supported, and will work as they do for str, including the
> padding, justification and other related modifiers.
>
> Example::
>
>    >>> b'%4x' % 10
>    b'   a'
>
> %c will insert a single byte, either from an int in range(256), or from
> a bytes argument of length 1.
>
> Example::
>
>     >>> b'%c' % 48
>     b'0'
>
>     >>> b'%c' % b'a'
>     b'a'
>
> %s, because it is the most general, has the most convoluted resolution:
>
>   - input type is bytes?
>     pass it straight through
>
>   - input type is numeric?
>     use its __xxx__ [1] [2] method and ascii-encode it (strictly)
>
>   - input type is something else?
>     use its __bytes__ method; if there isn't one, raise an exception [3]
>
> Examples::
>
>     >>> b'%s' % b'abc'
>     b'abc'
>
>     >>> b'%s' % 3.14
>     b'3.14'
>
>     >>> b'%s' % 'hello world!'
>     Traceback (most recent call last):
>     ...
>     TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?
>
> .. note::
>
>    Because the str type does not have a __bytes__ method, attempts to
>    directly use 'a string' as a bytes interpolation value will raise an
>    exception.  To use 'string' values, they must be encoded or otherwise
>    transformed into a bytes sequence::
>
>       'a string'.encode('latin-1')
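
The %s resolution order quoted above can be sketched in pure Python. This is
only an illustration of the draft text, not an implementation: the helper
name resolve_percent_s is hypothetical, using str() for numbers is an
assumption (footnote [1] leaves str vs. repr open), and TypeError is taken
from the example traceback even though footnote [3] leaves the exception
type undecided.

```python
import numbers


def resolve_percent_s(value):
    """Hypothetical helper: bytes for one %s argument under the draft rules."""
    # 1. bytes pass straight through.
    if isinstance(value, bytes):
        return value
    # 2. Numbers: take the text form and encode it strictly as ASCII.
    #    str() is an assumption; the draft has not chosen between str and repr.
    if isinstance(value, numbers.Number):
        return str(value).encode("ascii", "strict")
    # 3. Anything else must supply __bytes__; str does not, so it fails here.
    to_bytes = getattr(type(value), "__bytes__", None)
    if to_bytes is None:
        raise TypeError(
            "%r has no __bytes__ method, perhaps you need to encode it?" % (value,)
        )
    return to_bytes(value)
```

With this sketch, resolve_percent_s(b'abc') and resolve_percent_s(3.14) give
b'abc' and b'3.14', while resolve_percent_s('hello world!') raises TypeError.
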
>
>
> format
> ------
>
> The format mini language will be used as-is, with the behaviors as listed
> for %-interpolation.
>

That's too vague; %-interpolation and str.format() do not support format
codes in the same way. %-interpolation has dedicated code to handle %d,
etc., whereas str.format() supports {:d} not through special code but
because, e.g., int.__format__(value, 'd') works. So you can't say
"bytes.format() supports {:d} just like %d works with string interpolation",
since the mechanisms are fundamentally different.
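
A small illustration of that difference, using today's str behaviour. The
Tagged class is hypothetical and exists only to show where str.format()
sends the format spec.

```python
class Tagged:
    """Hypothetical type that records the format spec it is handed."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        # str.format() and format() hand the spec to the value's own type.
        return "<{}|{}>".format(spec, self.value)


# str.format() has no special case for 'd'; it simply delegates.
via_format = "{:d}".format(Tagged(7))

# For int, the same delegation ends up in int.__format__.
via_builtin = format(10, "d")
via_dunder = int.__format__(10, "d")

# %-interpolation handles %d itself and never calls __format__.
via_percent = "%d" % 10
```

Here via_format is '<d|7>', and the three int results are all '10', reached
through two different mechanisms.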

This is why I have argued that if you specify it as "if a format spec is
given, then the return value from calling __format__() will have
str.encode('ascii', 'strict') called on it", you get support for the
various number-specific format specs for free. It also means that if you
pass in a string and just want its strict ASCII bytes, you can get them
with {:s}.
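
A minimal sketch of that rule for a single replacement field. The helper
name encode_field is hypothetical, and treating an empty spec as "bytes
only" is an assumption drawn from this message, not from the PEP text.

```python
def encode_field(value, spec):
    """Hypothetical helper for one replacement field of bytes.format()."""
    if spec:
        # A format spec was given: let the value's own __format__ do the
        # work, then encode the resulting str strictly as ASCII.
        return format(value, spec).encode("ascii", "strict")
    # No spec: only bytes are accepted (assumption based on this message).
    if isinstance(value, bytes):
        return value
    raise TypeError("a format spec is required for non-bytes arguments")
```

So encode_field(255, 'x') gives b'ff' and encode_field('abc', 's') gives
b'abc', while a non-ASCII string with 's' raises UnicodeEncodeError.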

I also think that a 'b' conversion should be added to bytes.format(). This
doesn't have the same issue as %b if you make {} implicitly mean {!b} in
Python 3.5, as {} will then mean whatever is most accurate for
bytes.format() in either version. It also allows for explicit support where
you know you only want bytes, and allows {!s} to mean you only want a
string (and thus throw an error otherwise).
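
Roughly, the conversion check could look like the sketch below. The helper
name apply_conversion is hypothetical, and encoding the {!s} result as
strict ASCII is an assumption, since how a str becomes bytes after {!s} is
not spelled out here.

```python
def apply_conversion(value, conversion=None):
    """Hypothetical helper: enforce the !b / !s conversions on one argument."""
    if conversion in (None, "b"):
        # A bare {} is treated as {!b}: only bytes are acceptable.
        if not isinstance(value, bytes):
            raise TypeError("{!b} requires bytes, got %s" % type(value).__name__)
        return value
    if conversion == "s":
        # {!s}: only str is acceptable; strict ASCII encoding is an assumption.
        if not isinstance(value, str):
            raise TypeError("{!s} requires str, got %s" % type(value).__name__)
        return value.encode("ascii", "strict")
    raise ValueError("unknown conversion %r" % (conversion,))
```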

And all of this means that much like %s only taking bytes, the only way for
bytes.format() to accept a non-byte argument is for some format spec to be
specified to trigger the .encode('ascii', 'strict') call.

-Brett


>
>
> Open Questions
> ==============
>
> For %s there has been some discussion of trying to use the buffer protocol
> (Py_buffer) before trying __bytes__.  This question should be answered
> before the PEP is implemented.
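
For reference, a buffer-first lookup could look like the sketch below at the
Python level. It assumes memoryview() is an acceptable stand-in for the
C-level Py_buffer check; the names to_bytes_buffer_first and Packet are
hypothetical.

```python
from array import array


def to_bytes_buffer_first(value):
    """Hypothetical helper: try the buffer protocol, then fall back to __bytes__."""
    try:
        view = memoryview(value)  # raises TypeError without buffer support
    except TypeError:
        pass
    else:
        return view.tobytes()
    to_bytes = getattr(type(value), "__bytes__", None)
    if to_bytes is None:
        raise TypeError(
            "%r supports neither the buffer protocol nor __bytes__" % (value,)
        )
    return to_bytes(value)


class Packet:
    """Hypothetical type that only offers __bytes__."""

    def __bytes__(self):
        return b"PKT"
```

With this ordering, bytearray(b'ab') and array('B', [104, 105]) are handled
through the buffer protocol, Packet() falls back to __bytes__, and a str is
still rejected with TypeError.
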
>
>
> Proposed variations
> ===================
>
> It has been suggested to use %b for bytes instead of %s.
>
>   - Rejected as %b does not exist in Python 2.x %-interpolation, which is
>     why we are using %s.
>
> It has been proposed to automatically use .encode('ascii','strict') for str
> arguments to %s.
>
>   - Rejected as this would lead to intermittent failures.  Better to have
>     the operation always fail so the trouble-spot can be correctly fixed.
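
The "intermittent" part is easy to show with plain str.encode(): the same
call succeeds or fails depending only on the data. The helper name
implicit_ascii is a hypothetical stand-in for the rejected behaviour, and
the file names are made up.

```python
def implicit_ascii(value):
    """Hypothetical stand-in for the rejected auto-encoding of str arguments."""
    return value.encode("ascii", "strict")


# Works for every ASCII-only value seen during testing...
ok = implicit_ascii("report.txt")

# ...and then fails on the first non-ASCII value seen in production.
try:
    implicit_ascii("r\u00e9sum\u00e9.txt")
except UnicodeEncodeError:
    failed = True
else:
    failed = False
```
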
>
> It has been proposed to have %s return the ascii-encoded repr when the
> value is a str  (b'%s' % 'abc'  --> b"'abc'").
>
>   - Rejected as this would lead to hard to debug failures far from the
>     problem site.  Better to have the operation always fail so the
>     trouble-spot can be easily fixed.
>
>
> Footnotes
> =========
>
> .. [1] Not sure if this should be the numeric __str__ or the numeric
>        __repr__, or if there's any difference
> .. [2] Any proper numeric class would then have to provide an ascii
>        representation of its value, either via __repr__ or __str__
>        (whichever we choose in [1]).
> .. [3] TypeError, ValueError, or UnicodeEncodeError?
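
On footnote [1]: for the built-in int, float and complex types, str() and
repr() agree in Python 3, but they differ for some standard-library numeric
types. A quick check, using Decimal and Fraction as examples:

```python
from decimal import Decimal
from fractions import Fraction

# Built-in numeric types: str() and repr() produce the same text.
same_for_builtins = [str(v) == repr(v) for v in (10, 3.14, 2 + 3j)]

# Standard-library numeric types: repr() includes the type name.
differs = {
    "Decimal": (str(Decimal("1.5")), repr(Decimal("1.5"))),
    "Fraction": (str(Fraction(1, 2)), repr(Fraction(1, 2))),
}
```

Here differs["Decimal"] is ('1.5', "Decimal('1.5')") and differs["Fraction"]
is ('1/2', 'Fraction(1, 2)'), so the choice matters for non-built-in types.
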
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>

