[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3

Victor Stinner victor.stinner at gmail.com
Wed Mar 26 11:10:14 CET 2014


2014-03-25 23:37 GMT+01:00 Ethan Furman <ethan at stoneleaf.us>:
> ``%a`` will call ``ascii()`` on the interpolated value.

I'm not sure that I understood correctly: is the "%a" format
supported? The result of ascii() is a Unicode string. Does it mean
that ("%a" % obj) should give the same result as
ascii(obj).encode('ascii', 'strict')?
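
To make the question concrete, here is a minimal sketch of the equivalence
I have in mind. It assumes an interpreter where bytes % formatting with %a
is actually available (CPython 3.5 or later); Point is a hypothetical class
used only for illustration.

```python
class Point:
    """Hypothetical class whose repr contains a non-ASCII character."""

    def __repr__(self):
        return "Point(x=1, label='\xe9')"


samples = (123, "text", b"bytes", "euro:\u20ac", Point())

for obj in samples:
    expected = ascii(obj).encode("ascii", "strict")
    # Wrap obj in a 1-tuple so it is always treated as a single argument.
    assert b"%a" % (obj,) == expected
```

If the assertions hold, %a behaves like ascii() followed by a strict ASCII
encoding, at least for these inputs.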

Would it be possible to add a table or list to summarize supported
format characters? I found:

- single byte: %c
- integer and float: %d, %u, %i, %o, %x, %X, %f, %g, "etc." (can you
  please complete "etc."?)
- bytes and __bytes__ method: %s
- ascii(): %a
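
Something like the following could be a starting point for such a summary.
It is only a sketch, assuming the PEP is implemented as described (it needs
CPython 3.5 or later); Packet is a hypothetical class and the list is
deliberately incomplete.

```python
class Packet:
    """Hypothetical type that defines __bytes__."""

    def __bytes__(self):
        return b"PKT"


examples = [
    (b"%c", 65),        # single byte from an integer
    (b"%c", b"A"),      # single byte from a bytes object of length 1
    (b"%d", 42),
    (b"%x", 255),
    (b"%X", 255),
    (b"%o", 8),
    (b"%f", 1.5),
    (b"%g", 1.5),
    (b"%s", b"abc"),    # bytes are inserted unchanged
    (b"%s", Packet()),  # __bytes__ is called
    (b"%a", "abc"),     # ascii() of the value, so the quotes are included
]

results = [fmt % (value,) for fmt, value in examples]
for (fmt, _), result in zip(examples, results):
    print(fmt, "->", result)
```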


I guess that the implementation of %a can avoid a conversion from
ASCII ("PyUnicode_DecodeASCII" in the following code) and then a
conversion to ASCII again (in bytes%args):

PyObject *
PyObject_ASCII(PyObject *v)
{
    PyObject *repr, *ascii, *res;

    repr = PyObject_Repr(v);
    if (repr == NULL)
        return NULL;

    if (PyUnicode_IS_ASCII(repr))
        return repr;

    /* repr is guaranteed to be a PyUnicode object by PyObject_Repr */
    ascii = _PyUnicode_AsASCIIString(repr, "backslashreplace");
    Py_DECREF(repr);
    if (ascii == NULL)
        return NULL;

    res = PyUnicode_DecodeASCII(   /* <==== HERE */
        PyBytes_AS_STRING(ascii),
        PyBytes_GET_SIZE(ascii),
        NULL);

    Py_DECREF(ascii);
    return res;
}
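
In Python terms, my reading of the function above is roughly the following.
This is only a model, not the C implementation; ascii_via_roundtrip and
ascii_as_bytes are hypothetical names. The second function stops at the
intermediate bytes object, which is what bytes % args would need.

```python
def ascii_via_roundtrip(obj):
    """Model of PyObject_ASCII: repr, encode to ASCII, decode back to str."""
    text = repr(obj)
    # The C code returns early when repr is already pure ASCII; this model
    # skips that shortcut because the result is the same either way.
    encoded = text.encode("ascii", "backslashreplace")
    return encoded.decode("ascii")  # the step marked HERE in the C code


def ascii_as_bytes(obj):
    """Hypothetical shortcut: keep the encoded bytes, skip the decode."""
    return repr(obj).encode("ascii", "backslashreplace")


for obj in (123, "text", b"bytes", "euro:\u20ac", chr(0x10FFFF)):
    assert ascii_via_roundtrip(obj) == ascii(obj)
    assert ascii_as_bytes(obj) == ascii(obj).encode("ascii")
```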

>  This is intended
> as a debugging aid, rather than something that should be used in production.

I don't understand the purpose of this sentence. Does it mean that %a
must not be used? IMO this sentence can be removed.

> Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
> representation.

Unicode is larger than that! print(ascii(chr(0x10ffff))) => '\U0010ffff'
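
For reference, all three escape forms can be seen with a short sketch. It
uses plain ascii(), so it does not depend on the PEP; the names samples and
escaped are only illustrative.

```python
samples = {
    "latin-1": "\xe9",          # fits in one byte: \xnn
    "BMP": "\u20ac",            # fits in 16 bits: \unnnn
    "astral": chr(0x10FFFF),    # needs more than 16 bits: \Unnnnnnnn
}

escaped = {name: ascii(char) for name, char in samples.items()}
for name, text in escaped.items():
    print(name, text)
```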

> Use cases include developing a new protocol and writing
> landmarks into the stream; debugging data going into an existing protocol
> to see if the problem is the protocol itself or bad data; a fall-back for a
> serialization format; or even a rudimentary serialization format when
> defining ``__bytes__`` would not be appropriate [8].

I understand the debug use case. I'm not convinced by the serialization idea :-)

> .. note::
>
>     If a ``str`` is passed into ``%a``, it will be surrounded by quotes.

And:

- bytes gets a "b" prefix and is surrounded by quotes as well (b'...')
- the quote ' is escaped as \' if the string contains both quotes ' and "

Can you also please add examples for %a?

>>> b"%a" % 123
b'123'
>>> b"%a" % b"bytes"
b"b'bytes'"
>>> b"%a" % "text"   # hum, it's not easy to see the surrounding quotes in this example
b"'text'"

The following, more complex examples may not be needed:

>>> b"%a" % "euro:€"
b"'euro:\\u20ac'"
>>> b"%a" % """quotes >'"<"""
b'\'quotes >\\\'"<\''

> Proposed variations
> ===================
>

It would also be fair to mention a completely different PEP, Antoine's PEP 460!

Victor

