[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3
Victor Stinner
victor.stinner at gmail.com
Wed Mar 26 11:10:14 CET 2014
2014-03-25 23:37 GMT+01:00 Ethan Furman <ethan at stoneleaf.us>:
> ``%a`` will call ``ascii()`` on the interpolated value.
I'm not sure that I understood correctly: is the "%a" format
supported? The result of ascii() is a Unicode string. Does it mean
that (b"%a" % obj) should give the same result as
ascii(obj).encode('ascii', 'strict')?
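
To make the question concrete, here is a small sketch of the equivalence I have in mind. It assumes an interpreter where bytes % formatting is available (Python 3.5 or newer); the Point class and the sample values are made up for illustration.

```python
# Sketch: does b"%a" % obj match ascii(obj).encode('ascii', 'strict')?
# Point is a hypothetical class whose repr() contains a non-ASCII character.
class Point:
    def __repr__(self):
        return "Point(caf\u00e9)"

samples = [123, "text", b"bytes", "euro:\u20ac", Point()]

results = []
for obj in samples:
    # Wrap obj in a 1-tuple so that it is never mistaken for an argument tuple.
    via_format = b"%a" % (obj,)
    via_ascii = ascii(obj).encode("ascii", "strict")
    results.append((via_format, via_ascii))

all_equal = all(left == right for left, right in results)
```

The 'strict' error handler can never fail here, because ascii() already escapes everything outside the ASCII range.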
Would it be possible to add a table or list to summarize supported
format characters? I found:
- single byte: %c
- numbers: %d, %u, %i, %o, %x, %X (integers), %f, %g (floats), "etc."
(can you please complete "etc."?)
- bytes and __bytes__ method: %s
- ascii(): %a
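
As a rough illustration of that list, the following sketch exercises one format character per category. It assumes an interpreter where bytes % formatting is available (Python 3.5 or newer); the Packet class is hypothetical and only exists to show the __bytes__ path.

```python
# Sketch: one example per category of format character listed above.
# Packet is a hypothetical class used to exercise the __bytes__ method.
class Packet:
    def __bytes__(self):
        return b"payload"

examples = {
    "%c with int": b"%c" % (65,),
    "%c with bytes": b"%c" % (b"A",),
    "%d": b"%d" % (42,),
    "%x": b"%x" % (255,),
    "%X": b"%X" % (255,),
    "%o": b"%o" % (8,),
    "%f": b"%f" % (1.5,),
    "%g": b"%g" % (1.5,),
    "%s with bytes": b"%s" % (b"raw",),
    "%s with __bytes__": b"%s" % (Packet(),),
    "%a": b"%a" % ("text",),
}

for name, value in examples.items():
    print(name, "=>", value)
```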
I guess that the implementation of %a can avoid a conversion from
ASCII ("PyUnicode_DecodeASCII" in the following code) and then a
conversion to ASCII again (in bytes%args):
PyObject *
PyObject_ASCII(PyObject *v)
{
    PyObject *repr, *ascii, *res;

    repr = PyObject_Repr(v);
    if (repr == NULL)
        return NULL;

    if (PyUnicode_IS_ASCII(repr))
        return repr;

    /* repr is guaranteed to be a PyUnicode object by PyObject_Repr */
    ascii = _PyUnicode_AsASCIIString(repr, "backslashreplace");
    Py_DECREF(repr);
    if (ascii == NULL)
        return NULL;

    res = PyUnicode_DecodeASCII(    <==== HERE
        PyBytes_AS_STRING(ascii),
        PyBytes_GET_SIZE(ascii),
        NULL);

    Py_DECREF(ascii);
    return res;
}
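
In Python terms, the shortcut I am suggesting looks roughly like the sketch below: encode repr() directly with the "backslashreplace" error handler and keep the bytes, instead of decoding them back to str. The helper name ascii_bytes is made up for illustration; it is not an existing function, and this is not necessarily how the final implementation has to look.

```python
# Sketch of the shortcut: stop after the "backslashreplace" step and keep
# the bytes, instead of decoding them back to a str as PyObject_ASCII does.
# ascii_bytes is a hypothetical helper name, not an existing API.
def ascii_bytes(obj):
    return repr(obj).encode("ascii", "backslashreplace")

samples = [123, "text", b"bytes", "euro:\u20ac", chr(0x10ffff), ["caf\u00e9", 1.5]]

# The round trip through str (ascii() then encode) gives the same bytes.
matches = [ascii_bytes(obj) == ascii(obj).encode("ascii") for obj in samples]
print(all(matches))
```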
> This is intended
> as a debugging aid, rather than something that should be used in production.
I don't understand the purpose of this sentence. Does it mean that %a
must not be used? IMO this sentence can be removed.
> Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
> representation.
Unicode is larger than that! print(ascii(chr(0x10ffff))) => '\U0010ffff'
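
A short sketch showing the three escape forms that ascii() can produce, depending on the code point (the sample characters are arbitrary picks):

```python
# Sketch: ascii() uses \xnn, \unnnn or \Unnnnnnnn depending on the code point.
samples = {
    "latin-1 range": "\xe9",           # U+00E9
    "BMP": "\u20ac",                   # U+20AC
    "outside the BMP": "\U0001f600",   # U+1F600
}

escapes = {name: ascii(ch) for name, ch in samples.items()}

for name, text in escapes.items():
    print(name, "=>", text)
```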
> Use cases include developing a new protocol and writing
> landmarks into the stream; debugging data going into an existing protocol
> to see if the problem is the protocol itself or bad data; a fall-back for a
> serialization format; or even a rudimentary serialization format when
> defining ``__bytes__`` would not be appropriate [8].
I understand the debug use case. I'm not convinced by the serialization idea :-)
> .. note::
>
> If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
And:
- bytes gets a "b" prefix and surrounded by quotes as well (b'...')
- the quote ' is escaped as \' if the string contains both ' and " quotes
Can you also please add examples for %a?
>>> b"%a" % 123
b'123'
>>> b"%a" % b"bytes"
b"b'bytes'"
>>> b"%a" % "text"  # hmm, the surrounding quotes are not easy to see in this example
b"'text'"
The following, more complex examples may not be needed:
>>> b"%a" % "euro:€"
b"'euro:\\u20ac'"
>>> b"%a" % """quotes >'"<"""
b'\'quotes >\\\'"<\''
> Proposed variations
> ===================
>
It would be fair to also mention a completely different PEP, Antoine's PEP 460!
Victor